Checking for unique values in a feature class field

Unique Values

As part of a larger set of scripts, I needed to check for unique values in a feature class field. There are a number of manual ways to do this in ArcGIS Desktop, but here is a snippet to create a list of non-unique values and print them to the console. You could also write them to a file, or incorporate the snippet into a fix for the issue.

The basic concept is a “list within a list”. The Search Cursor is the master “list” that gets iterated over, and then values get added either to a unique value list or a duplicate value list (if they already exist in the unique value list). To speed up the analysis, I set up the Search Cursor parameters so that it only uses a single field. I don’t need any geometry or other fields in order to analyze the unique values in one field.

String formatting gets used in a couple of places here, substituting the {0} in an output string with the value in parentheses after str.format(). The “str” in the string methods represents the string you are formatting or otherwise altering. Another string method, str.join(), is used to print the list values, in this case separated by a new line: “\n”.join(list). If you wanted commas, you could use “,”.join(list).

import arcpy
# input feature class and field name to check for duplicates
inputFC = r"c:\mygeodatabase.gdb\myfeatureclass"
myField = "myField"
# create new, empty lists for duplicate values and unique values
dupeList = []
uniqueList = []
# create a search cursor with only a single field in the second parameter
# - the one you are analyzing.
# this is faster than the default "*" which returns all fields.
# iterate through all the values
with arcpy.da.SearchCursor(inputFC, [myField]) as cursor:
for row in cursor:
# get value in your field for this row of the ursor
value = row[0]
# check to see if the value already exists in the unique value list
if value in uniqueList:
# if it exists, append it to the list of duplicates
dupeList.append(value)
else:
# if it doesn't exist, append it to the list of unique values
uniqueList.append(value)
# if the duplicate values list has any values, print the # of them and then list them
if len(dupeList) > 0:
# uses string formatting to print the number of duplicates: the "length" of the list
# cast from an integer to a string for printing
print("Duplicate IDs found ({0})\n".format(str(len(dupeList))))
# prints each item in the duplicate value list separated by \n (new line)
print("\n".join(dupeList))
else:
print("No duplicates found in {0} values.".format(str(len(uniqueList))))

Python documentation:

ArcGIS Desktop documentation:

Truncating feature class fields

Failed on input OID…

Ever had an append or other operation fail because you are trying to squeeze a longer field into a shorter field?

append failure

While it’s neat that you can now change text field lengths in ArcCatalog, you can only increase them, not decrease them. When I was working on standardizing schemas with another agency, we went through a long process to create a new field, calculate the right number of characters into it, delete the old field, create a new field with the old name and the new length, calculate the values from the “holding” field, then delete the “holding” field.

Here is a snippet (which can also be modified to be a script tool) which takes the feature class, the name of the field to truncate, and a new length, and does this process for you. It doesn’t have messaging or error checking, so you’d have to add those in.

import arcpy
# get parameters
fc = "c:/geodatabase.gdb/featureclass" ## feature class
field = "field" ## name of field to truncate
newLength = 100 ## new length: whatever is needed
# create new name, length, expression to truncate existing data
newField = field + "tmp"
newLengthSlice = newLength - 1
# the expression slices up to the new field length
# uses string formatting to insert field name and new length
exp = "!{0}![0:{1}]".format(field, newLengthSlice)
# get list of field names using list comprehension
fieldNames = [f.name for f in arcpy.ListFields(fc)]
# if the "new" field already exists, delete it
if newField in fieldNames:
arcpy.DeleteField_management(fc, newField)
# add new field specifying target length
arcpy.AddField_management(fc,newField,"TEXT","","",newLength)
# calculate the new field using the expression which brings in only
# the left-most characters up to the character limit you specified (the new length)
arcpy.CalculateField_management(fc,newField,exp,"PYTHON_9.3")
# delete the old, too-big field
arcpy.DeleteField_management(fc,field)
# add a new field, named the same as the old field, with the new length
arcpy.AddField_management(fc,field,"TEXT","","",newLength)
# create second expression to populate the new/old field
exp2 = "!{0}!".format(newField)
# calculate field using second expression
arcpy.CalculateField_management(fc,field,exp2,"PYTHON_9.3")
# delete the field you used to make the transfer
arcpy.DeleteField_management(fc,newField)

ArcGIS Desktop documentation:

 

Working with feature class fields

A few snippets for working with feature class fields.

The basic method of accessing fields in arcpy is arcpy.ListFields().

import arcpy
featureClass = "c:/path/Geodatabase.gdb/fc"
# Get a list of field objects from a feature class
fcFields = arcpy.ListFields(featureClass)
# Print the field names, types and lengths using string formatting
for field in fcFields:
print("{0}: type = {1} length = {2}".format(field.name, field.type, field.length))
# Get a list of field names from a feature class using list comprehension
fieldNames = [field.name for field in arcpy.ListFields(featureClass)]

What is returned from arcpy.ListFields() is not field names, but a list of field objects which have accessible properties, including field name, type and length. These are printed above using string formatting. Getting a Python list of field names using list comprehension is straight out of the ArcGIS documentation. There are tons of great code samples there when you start sifting through it for specific tasks. I have learned to name the returned list of fields something like fieldList, and the returned list of names by naming it fieldNames so I can remember which data type I’m using.

Restricting the returned fields or field names

The following snippet illustrates ways you can restrict the returned fields: it’s always a good idea to limit the processing to what you actually need for the task at hand. For example, you can use a wildcard for field names in the middle parameter, or a field type in the third parameter. Another tidbit I found buried in a post by Drew Flater on an ArcGIS blog is how to prevent the return of required fields, such as OID or Shape, which can complicate things when using different data source types. Using list comprehension, he returns only those field names where the property “required” is false.

fc = "c:/path/geodatabase.gdb/featureclass"
# restrict returned fields by wildcard
fieldList = arcpy.ListFields(fc,"OWN*")
# restrict returned fields by field type
fieldList = arcpy.ListFields(fc,"","String")
# exclude required fields such as Shape and OID
# (https://blogs.esri.com/esri/arcgis/2012/08/30/field-mapping-and-python-scripting)
fieldNames = [field.name for field in arcpy.ListFields(fc) if not field.required]
Using dictionaries to compare fields

I have used Python dictionaries to compare the field characteristics of two feature classes. For example, I tested for fields of the same name but different lengths before appending. When the fields are automatically mapped by having the same name, the append operation will fail if the source string field is longer than the destination string field.

In the snippet below, two dictionaries are created using the field names and lengths as the key:value pairs. This means we can access the fields by name. Iterating through the destination feature class dictionary, the script looks for key matches (in this case, field names) in the source dictionary. Then, it can access the length of the field in each dictionary by referencing the key (field name) and compare the two integers.

import arcpy
srcFC = "c:/path/Geodatabase.gdb/srcFc"
destFC = "c:/path/Geodatabase.gdb/destFc"
# get field lists from each feature class, restricting to "String" feature types
srcFields = arcpy.ListFields(srcFC,"","String")
destFields = arcpy.ListFields(destFC,"","String")
# create empty dictionaries
srcDict = {}
destDict = {}
# populate dictionaries with key value pairs of field name: field length
# so that the properties can be accessed by field name
for srcField in srcFields:
srcDict[srcField.name] = [srcField.length]
for destField in destFields:
destDict[destField.name] = [destField.length]
# iterate through destination dictionary looking
# for matching fields in source
# access field lengths in dictionary by field name
# compare field lengths
for fieldName, fieldLen in destDict.items():
if fieldName in srcDict:
srcLen = srcDict[fieldName][0]
destLen = destDict[fieldName][0]
if destLen < srcLen:
print("ERROR String field length mismatch")
print(fieldName + " field is too long")
else:
print(fieldName + " all is good")