string formatting – mtgeogeek

Concatenate parsed fields

I have a feature class with parsed address fields and I need a single concatenated field. Some of the addresses have directionals like “N” or “S”, some have units, and some don’t. A simple expression that joins every field with a space will litter the result with extra spaces wherever a field is empty.

Figure: Parsed addresses (a table of the parsed address fields)

Here is a short Python snippet that uses string formatting & the Update Cursor to populate a field with only the relevant data for each record.

	# concatenate parsed address fields without extra spaces
	import arcpy
	featureClass = "c:\\myfakepath\\data.gdb\\fc"

	# create an expression to find records where the STREETNO field is not blank
	# this indicates an address to be concatenated
	expression = "STREETNO IS NOT NULL"

	# create a list of the fields to be concatenated
	# (the last field in the list is the field to be concatenated in to)
	fields = ["STREETNO","STREETDIR","STREETNAME","STREETSUF","STREETUNIT","STREETADDR"]

	# create an update cursor limited to the fields in the fields list, and by the expression
	try:
	    with arcpy.da.UpdateCursor(featureClass, fields, expression) as cursor:
	        for row in cursor:
	            if not row[1]:  ## if streetdir is null
	                exp = "{0} {1}".format(str(row[0]), row[2])  ## streetno + streetname
	            else:  ## streetdir is not null
	                exp = "{0} {1} {2}".format(row[0], row[1], row[2])  ## streetno + streetdir + streetname
	            if row[3]:  ## if streetsuf is not null
	                exp += " {0}".format(row[3])  ## add streetsuf to exp
	            if row[4]:  ## if streetunit is not null
	                exp += " {0}".format(row[4])  ## add streetunit to exp
	            row[5] = exp  ## set streetaddr field = exp
	            cursor.updateRow(row)
	except Exception as e:
	    error = "Failed: concat fields\n {0} {1}".format(arcpy.GetMessages(), str(e))
	    print(error)


Fields returned by the cursor are accessed by index in the order they are listed in the list, so row[0] refers to the value in the STREETNO field, row[1] is the STREETDIR field, and so on.
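If the positional indexes are hard to keep straight, one option is to build a lookup from field name to index. This is a small sketch and not part of the original snippet; the index_of name and the sample row are made up for illustration.

```python
fields = ["STREETNO", "STREETDIR", "STREETNAME", "STREETSUF", "STREETUNIT", "STREETADDR"]

# Map each field name to its position in the cursor row
index_of = {name: position for position, name in enumerate(fields)}

# A made-up row, in the same order as the fields list
row = [123, "N", "Main", "St", None, None]

# Look up a value by field name instead of a bare number
street_dir = row[index_of["STREETDIR"]]
print(street_dir)
```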

A series of conditionals tests whether the “extra” fields have content, and successively builds the final concatenation expression using string formatting. Here I am using numbers for placeholders in the string formatting substitution, but it is not strictly necessary.

The statement row[5] = exp sets the appropriate field (STREETADDR, which has an index of 5) to this expression, and the following call to cursor.updateRow(row) writes the change.
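As a possible alternative to the chain of conditionals, the non-empty parts can be filtered and joined in one step. The sketch below runs without arcpy: build_address is a hypothetical helper and the rows are made-up sample data standing in for cursor rows. It also shows that the numbers in the placeholders can be left out in Python 2.7 and 3.1 or later.

```python
def build_address(parts):
    """Join the non-empty parts with single spaces (hypothetical helper)."""
    # Convert everything to text, skipping NULLs (None)
    cleaned = [str(part).strip() for part in parts if part is not None]
    # Drop empty strings so no extra spaces end up in the result
    return " ".join(part for part in cleaned if part)

# Explicit and automatic placeholder numbering give the same result
assert "{0} {1}".format(123, "Main") == "{} {}".format(123, "Main")

# Made-up rows: STREETNO, STREETDIR, STREETNAME, STREETSUF, STREETUNIT
rows = [
    (123, "N", "Main", "St", "Apt 4"),
    (456, None, "Oak", "Ave", ""),
    (789, "", "Elm", None, None),
]

addresses = [build_address(row) for row in rows]
for address in addresses:
    print(address)
```

Inside the update cursor loop, this would amount to something like row[5] = build_address(row[:5]) before calling updateRow.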

Figure: Concatenated addresses (a table of the concatenated address field)

ArcGIS Desktop documentation:

arcpy.da.UpdateCursor

Checking for unique values in a feature class field

Unique Values

As part of a larger set of scripts, I needed to check for unique values in a feature class field. There are a number of manual ways to do this in ArcGIS Desktop, but here is a snippet to create a list of non-unique values and print them to the console. You could also write them to a file, or incorporate the snippet into a fix for the issue.

The basic concept is a “list within a list”. The Search Cursor is the master “list” that gets iterated over, and then values get added either to a unique value list or a duplicate value list (if they already exist in the unique value list). To speed up the analysis, I set up the Search Cursor parameters so that it only uses a single field. I don’t need any geometry or other fields in order to analyze the unique values in one field.

String formatting gets used in a couple of places here, substituting the {0} in an output string with the value in parentheses after str.format(). The "str" in the string methods represents the string you are formatting or otherwise altering. Another string method, str.join(), is used to print the list values, in this case separated by a new line: "\n".join(list). If you wanted commas, you could use ",".join(list). Note that str.join() only accepts strings, so numeric values need to be converted with str() first.

	import arcpy

	# input feature class and field name to check for duplicates
	inputFC = r"c:\mygeodatabase.gdb\myfeatureclass"
	myField = "myField"

	# create new, empty lists for duplicate values and unique values
	dupeList = []
	uniqueList = []

	# create a search cursor with only a single field in the second parameter
	# - the one you are analyzing.
	# this is faster than the default "*" which returns all fields.
	# iterate through all the values

	with arcpy.da.SearchCursor(inputFC, [myField]) as cursor:
	    for row in cursor:
	        # get value in your field for this row of the cursor
	        value = row[0]
	        # check to see if the value already exists in the unique value list
	        if value in uniqueList:
	            # if it exists, append it to the list of duplicates
	            dupeList.append(value)
	        else:
	            # if it doesn't exist, append it to the list of unique values
	            uniqueList.append(value)

	# if the duplicate values list has any values, print the # of them and then list them
	if len(dupeList) > 0:
	    # uses string formatting to print the number of duplicates: the "length" of the list
	    # cast from an integer to a string for printing
	    print("Duplicate IDs found ({0})\n".format(str(len(dupeList))))
	    # prints each item in the duplicate value list separated by \n (new line)
	    # values are converted to strings first, since join() only accepts strings
	    print("\n".join(str(value) for value in dupeList))
	else:
	    print("No duplicates found in {0} values.".format(str(len(uniqueList))))
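The "value in uniqueList" test scans the whole list for every row, which can get slow on large tables. One way around that is to count the values with collections.Counter from the standard library. The sketch below is not the original approach: find_duplicates is a hypothetical helper, and the values list is made-up sample data standing in for what the cursor would return.

```python
from collections import Counter

def find_duplicates(values):
    """Return a dict of value -> count for values that occur more than once."""
    counts = Counter(values)
    return {value: count for value, count in counts.items() if count > 1}

# Made-up sample data standing in for the values read by the search cursor
values = [101, 102, 103, 102, 104, 101, 102]
duplicates = find_duplicates(values)

if duplicates:
    print("Duplicate IDs found ({0})".format(len(duplicates)))
    print("\n".join("{0} occurs {1} times".format(value, count)
                    for value, count in sorted(duplicates.items())))
else:
    print("No duplicates found in {0} values.".format(len(values)))
```

Unlike the list-based version, this reports each duplicated value once along with how many times it occurs. With a real feature class, the values list could be filled with [row[0] for row in cursor].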


Python documentation:

ArcGIS Desktop documentation:

arcpy.da.SearchCursor

Working with feature class fields

A few snippets for working with feature class fields.

The basic method of accessing fields in arcpy is arcpy.ListFields().

	import arcpy
	featureClass = "c:/path/Geodatabase.gdb/fc"

	# Get a list of field objects from a feature class
	fcFields = arcpy.ListFields(featureClass)

	# Print the field names, types and lengths using string formatting
	for field in fcFields:
	    print("{0}: type = {1} length = {2}".format(field.name, field.type, field.length))

	# Get a list of field names from a feature class using list comprehension
	fieldNames = [field.name for field in arcpy.ListFields(featureClass)]


What is returned from arcpy.ListFields() is not a list of field names, but a list of field objects with accessible properties, including the field name, type and length. These are printed above using string formatting. Getting a Python list of field names using list comprehension is straight out of the ArcGIS documentation. There are tons of great code samples there when you start sifting through it for specific tasks. I have learned to name the returned list of field objects something like fieldList, and the list of names fieldNames, so I can remember which data type I’m using.
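To make the difference between the two lists concrete, here is a sketch that runs without arcpy. The Field namedtuple is only a hypothetical stand-in for the objects that arcpy.ListFields() returns, and the three fields are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the field objects returned by arcpy.ListFields()
Field = namedtuple("Field", ["name", "type", "length"])

# A list of field objects (made-up fields)
fieldList = [
    Field("OBJECTID", "OID", 4),
    Field("OWNER", "String", 50),
    Field("ACRES", "Double", 8),
]

# A list of plain strings built from the list of objects
fieldNames = [field.name for field in fieldList]

# Keying the objects by name keeps their other properties within reach
fieldsByName = {field.name: field for field in fieldList}

print(fieldNames)
print(fieldsByName["OWNER"].length)
```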

Restricting the returned fields or field names

The following snippet illustrates ways you can restrict the returned fields. It’s always a good idea to limit the processing to what you actually need for the task at hand. For example, you can pass a wildcard for field names as the second parameter, or a field type as the third parameter. Another tidbit I found buried in a post by Drew Flater on an ArcGIS blog is how to prevent the return of required fields, such as OID or Shape, which can complicate things when using different data source types. Using list comprehension, he returns only those field names where the property “required” is false.

	import arcpy

	fc = "c:/path/geodatabase.gdb/featureclass"

	# restrict returned fields by wildcard
	fieldList = arcpy.ListFields(fc,"OWN*")

	# restrict returned fields by field type
	fieldList = arcpy.ListFields(fc,"","String")

	# exclude required fields such as Shape and OID
	# (https://blogs.esri.com/esri/arcgis/2012/08/30/field-mapping-and-python-scripting)
	fieldNames = [field.name for field in arcpy.ListFields(fc) if not field.required]
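The same list comprehension can combine more than one condition. The sketch below runs without arcpy and assumes a hypothetical Field stand-in with made-up fields; with real data the list would come from arcpy.ListFields().

```python
from collections import namedtuple

# Hypothetical stand-in for arcpy field objects (made-up fields)
Field = namedtuple("Field", ["name", "type", "required"])

fieldList = [
    Field("OBJECTID", "OID", True),
    Field("Shape", "Geometry", True),
    Field("OWNER", "String", False),
    Field("OWN_ADDR", "String", False),
    Field("ACRES", "Double", False),
]

# Names of all fields that are not required
editableNames = [field.name for field in fieldList if not field.required]

# Names of non-required fields that are also text fields
editableStrings = [
    field.name for field in fieldList
    if not field.required and field.type == "String"
]

print(editableNames)
print(editableStrings)
```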


Using dictionaries to compare fields

I have used Python dictionaries to compare the field characteristics of two feature classes. For example, I tested for fields of the same name but different lengths before appending. When the fields are automatically mapped by having the same name, the append operation will fail if the source string field is longer than the destination string field.

In the snippet below, two dictionaries are created using the field names and lengths as the key:value pairs. This means we can access the fields by name. Iterating through the destination feature class dictionary, the script looks for key matches (in this case, field names) in the source dictionary. Then, it can access the length of the field in each dictionary by referencing the key (field name) and compare the two integers.

	import arcpy

	srcFC = "c:/path/Geodatabase.gdb/srcFc"
	destFC = "c:/path/Geodatabase.gdb/destFc"

	# get field lists from each feature class, restricting to "String" field types
	srcFields = arcpy.ListFields(srcFC, "", "String")
	destFields = arcpy.ListFields(destFC, "", "String")

	# create empty dictionaries
	srcDict = {}
	destDict = {}

	# populate dictionaries with key value pairs of field name: field length
	# so that the properties can be accessed by field name
	for srcField in srcFields:
	    srcDict[srcField.name] = srcField.length

	for destField in destFields:
	    destDict[destField.name] = destField.length

	# iterate through destination dictionary looking
	# for matching fields in source
	# access field lengths in dictionary by field name
	# compare field lengths
	for fieldName in destDict:
	    if fieldName in srcDict:
	        srcLen = srcDict[fieldName]
	        destLen = destDict[fieldName]
	        if destLen < srcLen:
	            print("ERROR String field length mismatch")
	            print(fieldName + " source field is too long for the destination")
	        else:
	            print(fieldName + " all is good")
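The comparison itself can be tried out without arcpy. In this sketch the two dictionaries are filled with made-up field names and lengths, and a dictionary comprehension collects only the fields that would cause the append to fail; tooLong is a hypothetical name.

```python
# Made-up field name -> field length pairs for each feature class
srcLengths = {"OWNER": 100, "ADDRESS": 50, "NOTES": 255}
destLengths = {"OWNER": 50, "ADDRESS": 50, "ZONING": 10}

# Keep fields present in both dictionaries where the source is longer
tooLong = {
    name: (srcLengths[name], destLen)
    for name, destLen in destLengths.items()
    if name in srcLengths and srcLengths[name] > destLen
}

for name, (srcLen, destLen) in sorted(tooLong.items()):
    print("{0}: source length {1} exceeds destination length {2}".format(
        name, srcLen, destLen))
```

Fields that exist in only one of the two dictionaries (NOTES and ZONING here) are skipped, since they would not be mapped by name.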
