Concatenate parsed fields

I have a feature class with parsed address fields and I need a single concatenated field. Some of the addresses have directionals like “N” or “S”, some have units, and some don’t. A simple expression will litter the field with extra spaces.

Figure: Parsed addresses (table of parsed address fields)

Here is a short Python snippet that uses string formatting & the Update Cursor to populate a field with only the relevant data for each record.

# concatenate parsed address fields without extra spaces
import arcpy
featureClass = "c:\\myfakepath\\data.gdb\\fc"
# create an expression to find records where the STREETNO field is not null
# this indicates an address to be concatenated
expression = "STREETNO IS NOT NULL"
# create a list of the fields to be concatenated
# (the last field in the list is the field to be concatenated into)
fields = ["STREETNO","STREETDIR","STREETNAME","STREETSUF","STREETUNIT","STREETADDR"]
# create an update cursor limited to the fields in the fields list, and by the expression
try:
    with arcpy.da.UpdateCursor(featureClass,fields,expression) as cursor:
        for row in cursor:
            if not row[1]: ## if streetdir is null
                exp = "{0} {1}".format(str(row[0]),row[2]) ## streetno + streetname
            else: ## streetdir is not null
                exp = "{0} {1} {2}".format(row[0],row[1],row[2]) ## streetno + streetdir + streetname
            if row[3]: ## if streetsuf is not null
                exp += " {0}".format(row[3]) ## add streetsuf to exp
            if row[4]: ## if streetunit is not null
                exp += " {0}".format(row[4]) ## add streetunit to exp
            row[5] = exp ## set streetaddr field = exp
            cursor.updateRow(row)
except Exception as e:
    error = "Failed: concat fields\n {0} {1}".format(arcpy.GetMessages(), str(e))
    print(error)

Fields returned by the cursor are accessed by index in the order they are listed in the list, so row[0] refers to the value in the STREETNO field, row[1] is the STREETDIR field, and so on.
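If the index positions are hard to keep straight, one option is to pair each row with the field list so values can be read by name. This is only a sketch and not part of the script above: row_as_dict is a hypothetical helper, and the sample row is made up to stand in for a row returned by the cursor.

```python
# Field order matches the cursor's field list in the script above
fields = ["STREETNO", "STREETDIR", "STREETNAME", "STREETSUF", "STREETUNIT", "STREETADDR"]

def row_as_dict(field_names, row):
    """Pair each field name with the value at the same index in the row."""
    return dict(zip(field_names, row))

# Made-up sample row standing in for one row from the cursor
sample_row = (123, None, "MAIN", "ST", None, None)
record = row_as_dict(fields, sample_row)

# Same value as sample_row[2], but looked up by field name
print(record["STREETNAME"])
```

Writing a value back to the row still goes through the index, which can be looked up with fields.index("STREETADDR").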

A series of conditionals tests whether the “extra” fields have content and successively builds the final concatenated string using string formatting. Here I am using numbered placeholders ({0}, {1}) in the format strings, but the numbers are not strictly necessary: in Python 2.7 and later, empty braces ({}) are filled in order.

The last two lines inside the loop set the STREETADDR field (which has an index of 5) to this string, and then call “updateRow” to write the change.
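The same result can probably be reached without the chain of conditionals by filtering out the empty parts and joining what is left. The sketch below is an alternative to the approach described above, not the original method; concat_address is a hypothetical helper and the sample values are made up.

```python
def concat_address(parts):
    """Join the non-empty parts with single spaces.

    None, empty strings, and whitespace-only strings are skipped,
    so no extra spaces end up in the result.
    """
    cleaned = [str(part).strip() for part in parts if part is not None]
    return " ".join(text for text in cleaned if text)

# Made-up rows: (STREETNO, STREETDIR, STREETNAME, STREETSUF, STREETUNIT)
print(concat_address([123, "N", "MAIN", "ST", None]))
print(concat_address([45, None, "OAK", "", "APT 2"]))
```

Inside the cursor loop this would look something like row[5] = concat_address(row[:5]), followed by the usual cursor.updateRow(row).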

Figure: Concatenated addresses (table of concatenated addresses)

Checking for unique values in a feature class field

As part of a larger set of scripts, I needed to check for unique values in a feature class field. There are a number of manual ways to do this in ArcGIS Desktop, but here is a snippet to create a list of non-unique values and print them to the console. You could also write them to a file, or incorporate the snippet into a fix for the issue.

The basic concept is a “list within a list”. The Search Cursor is the master “list” that gets iterated over, and then values get added either to a unique value list or a duplicate value list (if they already exist in the unique value list). To speed up the analysis, I set up the Search Cursor parameters so that it only uses a single field. I don’t need any geometry or other fields in order to analyze the unique values in one field.

String formatting gets used in a couple of places here, substituting the {0} in an output string with the value in parentheses after str.format(). The “str” in the string methods represents the string you are formatting or otherwise altering. Another string method, str.join(), is used to print the list values, in this case separated by a new line: “\n”.join(list). If you wanted commas, you could use “,”.join(list).
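To see the two string methods on their own, here is a small sketch that runs without arcpy. The list values are made up for illustration.

```python
# Made-up duplicate values standing in for what the cursor would find
dupes = ["A101", "B202", "C303"]

# {0} is replaced by the first argument passed to format()
header = "Duplicate IDs found ({0})".format(len(dupes))

# join() is called on the separator string and takes the list as its argument
one_per_line = "\n".join(dupes)
comma_separated = ",".join(dupes)

# join() only accepts strings, so other types need converting first
numbers = [101, 202, 303]
numbers_joined = ",".join(str(n) for n in numbers)

print(header)
print(one_per_line)
print(comma_separated)
print(numbers_joined)
```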

import arcpy
# input feature class and field name to check for duplicates
inputFC = r"c:\mygeodatabase.gdb\myfeatureclass"
myField = "myField"
# create new, empty lists for duplicate values and unique values
dupeList = []
uniqueList = []
# create a search cursor with only a single field in the second parameter
# - the one you are analyzing.
# this is faster than the default "*" which returns all fields.
# iterate through all the values
with arcpy.da.SearchCursor(inputFC, [myField]) as cursor:
    for row in cursor:
        # get value in your field for this row of the cursor
        value = row[0]
        # check to see if the value already exists in the unique value list
        if value in uniqueList:
            # if it exists, append it to the list of duplicates
            dupeList.append(value)
        else:
            # if it doesn't exist, append it to the list of unique values
            uniqueList.append(value)
# if the duplicate values list has any values, print the # of them and then list them
if len(dupeList) > 0:
    # uses string formatting to print the number of duplicates: the "length" of the list
    # cast from an integer to a string for printing
    print("Duplicate IDs found ({0})\n".format(str(len(dupeList))))
    # prints each item in the duplicate value list separated by \n (new line)
    # values are converted to strings first because join() only accepts strings
    print("\n".join(str(value) for value in dupeList))
else:
    print("No duplicates found in {0} values.".format(str(len(uniqueList))))
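On a large feature class, checking "value in uniqueList" for every row gets slow, because each check scans the whole list. One possible alternative is to count the values with collections.Counter from the standard library. This is a sketch of a different technique, not the method used above; find_duplicates is a hypothetical helper and the sample values are made up.

```python
from collections import Counter

def find_duplicates(values):
    """Return a dict mapping each value that occurs more than once to its count."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}

# Made-up values standing in for row[0] from each row of the cursor
sample_values = ["A101", "B202", "A101", "C303", "B202", "A101"]
dupes = find_duplicates(sample_values)

for value, n in sorted(dupes.items()):
    print("{0} appears {1} times".format(value, n))
```

With a real cursor, the values could be passed in as a generator, for example find_duplicates(row[0] for row in cursor) inside the "with" block.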

Working with feature class fields

A few snippets for working with feature class fields.

The basic method of accessing fields in arcpy is arcpy.ListFields().

import arcpy
featureClass = "c:/path/Geodatabase.gdb/fc"
# Get a list of field objects from a feature class
fcFields = arcpy.ListFields(featureClass)
# Print the field names, types and lengths using string formatting
for field in fcFields:
    print("{0}: type = {1} length = {2}".format(field.name, field.type, field.length))
# Get a list of field names from a feature class using list comprehension
fieldNames = [field.name for field in arcpy.ListFields(featureClass)]

What is returned from arcpy.ListFields() is not a list of field names, but a list of field objects with accessible properties, including name, type and length. These are printed above using string formatting. Getting a Python list of field names using list comprehension is straight out of the ArcGIS documentation, which has plenty of useful code samples once you start sifting through it for specific tasks. I have learned to name the returned list of field objects something like fieldList, and the list of names fieldNames, so I can remember which data type I’m using.
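Because each item is an object rather than a string, its properties can be used to organize the fields. The sketch below groups field names by field type. It runs without arcpy by using a namedtuple as a hypothetical stand-in for the field objects; the field names, types and lengths are made up, and names_by_type is a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for arcpy field objects (name, type, length properties)
Field = namedtuple("Field", ["name", "type", "length"])

# Made-up field list standing in for the result of arcpy.ListFields()
fieldList = [
    Field("OBJECTID", "OID", 4),
    Field("OWNER", "String", 50),
    Field("OWNADDR", "String", 100),
    Field("ACRES", "Double", 8),
]

def names_by_type(fields):
    """Group field names by their type property."""
    grouped = defaultdict(list)
    for field in fields:
        grouped[field.type].append(field.name)
    return dict(grouped)

for fieldType, names in sorted(names_by_type(fieldList).items()):
    print("{0}: {1}".format(fieldType, ", ".join(names)))
```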

Restricting the returned fields or field names

The following snippet illustrates ways you can restrict the returned fields: it’s always a good idea to limit the processing to what you actually need for the task at hand. For example, you can use a wildcard for field names in the middle parameter, or a field type in the third parameter. Another tidbit I found buried in a post by Drew Flater on an ArcGIS blog is how to prevent the return of required fields, such as OID or Shape, which can complicate things when using different data source types. Using list comprehension, he returns only those field names where the property “required” is false.

fc = "c:/path/geodatabase.gdb/featureclass"
# restrict returned fields by wildcard
fieldList = arcpy.ListFields(fc,"OWN*")
# restrict returned fields by field type
fieldList = arcpy.ListFields(fc,"","String")
# exclude required fields such as Shape and OID
# (https://blogs.esri.com/esri/arcgis/2012/08/30/field-mapping-and-python-scripting)
fieldNames = [field.name for field in arcpy.ListFields(fc) if not field.required]

Using dictionaries to compare fields

I have used Python dictionaries to compare the field characteristics of two feature classes. For example, I tested for fields of the same name but different lengths before appending. When the fields are automatically mapped by having the same name, the append operation will fail if the source string field is longer than the destination string field.

In the snippet below, two dictionaries are created using the field names and lengths as the key:value pairs. This means we can access the fields by name. Iterating through the destination feature class dictionary, the script looks for key matches (in this case, field names) in the source dictionary. Then, it can access the length of the field in each dictionary by referencing the key (field name) and compare the two integers.

import arcpy
srcFC = "c:/path/Geodatabase.gdb/srcFc"
destFC = "c:/path/Geodatabase.gdb/destFc"
# get field lists from each feature class, restricting to "String" field types
srcFields = arcpy.ListFields(srcFC,"","String")
destFields = arcpy.ListFields(destFC,"","String")
# create empty dictionaries
srcDict = {}
destDict = {}
# populate dictionaries with key value pairs of field name: field length
# so that the properties can be accessed by field name
for srcField in srcFields:
    srcDict[srcField.name] = [srcField.length]
for destField in destFields:
    destDict[destField.name] = [destField.length]
# iterate through destination dictionary looking
# for matching fields in source
# access field lengths in dictionary by field name
# compare field lengths
for fieldName, fieldLen in destDict.items():
    if fieldName in srcDict:
        srcLen = srcDict[fieldName][0]
        destLen = destDict[fieldName][0]
        if destLen < srcLen:
            print("ERROR String field length mismatch")
            print(fieldName + " source field is too long for the destination")
        else:
            print(fieldName + " all is good")
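The comparison itself does not depend on arcpy, so it can be pulled into a small function and tried out on plain dictionaries. This is a sketch under that assumption: find_length_mismatches is a hypothetical helper, the lengths are stored as plain integers rather than single-item lists, and the sample field names and lengths are made up.

```python
def find_length_mismatches(src_lengths, dest_lengths):
    """Return (name, src_len, dest_len) for shared fields where the source is longer."""
    mismatches = []
    for name, dest_len in dest_lengths.items():
        src_len = src_lengths.get(name)  # None if the field is not in the source
        if src_len is not None and src_len > dest_len:
            mismatches.append((name, src_len, dest_len))
    return mismatches

# Made-up dictionaries of field name: field length
src = {"OWNER": 100, "CITY": 30, "NOTES": 255}
dest = {"OWNER": 50, "CITY": 30, "ZONING": 10}

for name, src_len, dest_len in find_length_mismatches(src, dest):
    print("{0}: source length {1} exceeds destination length {2}".format(name, src_len, dest_len))
```

With real data, the input dictionaries could be built with a dictionary comprehension, for example {f.name: f.length for f in arcpy.ListFields(srcFC, "", "String")}.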