Python – mtgeogeek

Data integration – on the edge

Snap to it

One of the challenges of integrating data across boundaries is edge matching, or in this case, line matching. Our data sharing group took the first step toward solving a seamless roads dataset by agreeing on “snap points”, where each agency would snap their roads at the boundary line with the next. Theoretically, if two people snapped the lines in their individual datasets to the same point, the data would be lined up and ready for integration. In the image, the bright green points are the snap points. At this scale, everything looks nice and happy.

Cross-jurisdictional roads data with snap points

unsnapped line — Unsnapped line near snap point

Zooming in, problems emerge. How to address this without manually examining every point in an eight-county area of over 8500 square miles? We know that looks are deceiving at large scales in ArcMap: on screen the line(s) might appear snapped when not, and vice versa.

Goal

The goal was provide an automated tool to check each snap point to make sure the nearby line endpoints are coincident. Hopefully, this tool can be run once by the person integrating the data. If problems are found, the source agency can be notified and encouraged to fix their data. Once fixed, the data should (again in theory) stay fixed.

I tried many things

Topology? I don’t know how to set up a topology rule which states that every point must be covered by at least two endpoints (not just one).

Intersect? This had promise – every point should be intersected by two lines. But wait, what if a line goes through instead of snapping at the endpoint? Errors I could see on the screen continued to be missed, causing me to mistrust the results. After many, many iterations of Select by Location, I cast my net a little further. I found disjoint.

Continue reading “Data integration – on the edge”

Concatenate parsed fields

I have a feature class with parsed address fields and I need a single concatenated field. Some of the addresses have directionals like “N” or “S”, some have units, and some don’t. A simple expression will litter the field with extra spaces.

table of parsed addresses — Parsed addresses

Here is a short Python snippet that uses string formatting & the Update Cursor to populate a field with only the relevant data for each record.

	# concatenate parsed address fields without extra spaces
	import arcpy
	featureClass = "c:\\myfakepath\\data.gdb\\fc"

	# create an expression to find records where the STREETNO field is not blank
	# this indicates an address to be concatenated
	expression = "STREETNO IS NOT NULL"

	# create a list of the fields to be concatenated
	# (the last field in the list is the field to be concatenated in to)
	fields = ["STREETNO","STREETDIR","STREETNAME","STREETSUF","STREETUNIT","STREETADDR"]

	# create an update cursor limited to the fields in the fields list, and by the expression
	try:
	with arcpy.da.UpdateCursor(featureClass,fields,expression) as cursor:
	for row in cursor:
	if not row[1]: ## if streetdir is null
	exp = "{0} {1}".format(str(row[0]),row[2]) ## streetno + streetname
	else: ## streetdir is not null
	exp = "{0} {1} {2}".format(row[0],row[1],row[2])## streetno + streetdir + streetname
	if row[3]: ## if streetsuf is not null
	exp += " {0}".format(row[3]) ## add streetsuf to exp
	if row[4]: ## if streetunit is not null
	exp += " {0}".format(row[4]) ## add streetunit to exp
	row[5] = exp ## set streetaddr field = exp
	cursor.updateRow(row)
	except Exception as e:
	error = "Failed: concat fields\n {0} {1}".format(arcpy.GetMessages(), str(e))
	print(error)

view raw snippet-concat.py hosted with ❤ by GitHub

Fields returned by the cursor are accessed by index in the order they are listed in the list, so row[0] refers to the value in the STREETNO field, row[1] is the STREETDIR field, and so on.

A series of conditionals tests whether the “extra” fields have content, and successively builds the final concatenation expression using string formatting. Here I am using numbers for placeholders in the string formatting substitution, but it is not strictly necessary.

Line 25 sets the appropriate field (which has an index of 5) to this expression, and then line 26 calls “updateRow” to make the change.

table of concatenated addresses — Concatenated addresses

ArcGIS Desktop documentation:

arcpy.da.updateCursor

Checking for unique values in a feature class field

Unique Values

As part of a larger set of scripts, I needed to check for unique values in a feature class field. There are a number of manual ways to do this in ArcGIS Desktop, but here is a snippet to create a list of non-unique values and print them to the console. You could also write them to a file, or incorporate the snippet into a fix for the issue.

The basic concept is a “list within a list”. The Search Cursor is the master “list” that gets iterated over, and then values get added either to a unique value list or a duplicate value list (if they already exist in the unique value list). To speed up the analysis, I set up the Search Cursor parameters so that it only uses a single field. I don’t need any geometry or other fields in order to analyze the unique values in one field.

String formatting gets used in a couple of places here, substituting the {0} in an output string with the value in parentheses after str.format(). The “str” in the string methods represents the string you are formatting or otherwise altering. Another string method, str.join(), is used to print the list values, in this case separated by a new line: “\n”.join(list). If you wanted commas, you could use “,”.join(list).

	import arcpy

	# input feature class and field name to check for duplicates
	inputFC = r"c:\mygeodatabase.gdb\myfeatureclass"
	myField = "myField"

	# create new, empty lists for duplicate values and unique values
	dupeList = []
	uniqueList = []

	# create a search cursor with only a single field in the second parameter
	# - the one you are analyzing.
	# this is faster than the default "*" which returns all fields.
	# iterate through all the values

	with arcpy.da.SearchCursor(inputFC, [myField]) as cursor:
	for row in cursor:
	# get value in your field for this row of the ursor
	value = row[0]
	# check to see if the value already exists in the unique value list
	if value in uniqueList:
	# if it exists, append it to the list of duplicates
	dupeList.append(value)
	else:
	# if it doesn't exist, append it to the list of unique values
	uniqueList.append(value)

	# if the duplicate values list has any values, print the # of them and then list them
	if len(dupeList) > 0:
	# uses string formatting to print the number of duplicates: the "length" of the list
	# cast from an integer to a string for printing
	print("Duplicate IDs found ({0})\n".format(str(len(dupeList))))
	# prints each item in the duplicate value list separated by \n (new line)
	print("\n".join(dupeList))
	else:
	print("No duplicates found in {0} values.".format(str(len(uniqueList))))

view raw snippet-unique-values.py hosted with ❤ by GitHub

Python documentation:

ArcGIS Desktop documentation:

arcpy.da.SearchCursor

Truncating feature class fields

Failed on input OID…

Ever had an append or other operation fail because you are trying to squeeze a longer field into a shorter field?

append failure

While it’s neat that you can now change text field lengths in ArcCatalog, you can only increase them, not decrease them. When I was working on standardizing schemas with another agency, we went through a long process to create a new field, calculate the right number of characters into it, delete the old field, create a new field with the old name and the new length, calculate the values from the “holding” field, then delete the “holding” field.

Here is a snippet (which can also be modified to be a script tool) which takes the feature class, the name of the field to truncate, and a new length, and does this process for you. It doesn’t have messaging or error checking, so you’d have to add those in.

	import arcpy

	# get parameters
	fc = "c:/geodatabase.gdb/featureclass" ## feature class
	field = "field" ## name of field to truncate
	newLength = 100 ## new length: whatever is needed

	# create new name, length, expression to truncate existing data
	newField = field + "tmp"
	newLengthSlice = newLength - 1

	# the expression slices up to the new field length
	# uses string formatting to insert field name and new length
	exp = "!{0}![0:{1}]".format(field, newLengthSlice)

	# get list of field names using list comprehension
	fieldNames = [f.name for f in arcpy.ListFields(fc)]

	# if the "new" field already exists, delete it
	if newField in fieldNames:
	arcpy.DeleteField_management(fc, newField)

	# add new field specifying target length
	arcpy.AddField_management(fc,newField,"TEXT","","",newLength)

	# calculate the new field using the expression which brings in only
	# the left-most characters up to the character limit you specified (the new length)
	arcpy.CalculateField_management(fc,newField,exp,"PYTHON_9.3")

	# delete the old, too-big field
	arcpy.DeleteField_management(fc,field)

	# add a new field, named the same as the old field, with the new length
	arcpy.AddField_management(fc,field,"TEXT","","",newLength)

	# create second expression to populate the new/old field
	exp2 = "!{0}!".format(newField)

	# calculate field using second expression
	arcpy.CalculateField_management(fc,field,exp2,"PYTHON_9.3")

	# delete the field you used to make the transfer
	arcpy.DeleteField_management(fc,newField)

view raw snippet-fieldtruncate.py hosted with ❤ by GitHub

ArcGIS Desktop documentation:

Working with feature class fields

A few snippets for working with feature class fields.

The basic method of accessing fields in arcpy is arcpy.ListFields().

	import arcpy
	featureClass = "c:/path/Geodatabase.gdb/fc"

	# Get a list of field objects from a feature class
	fcFields = arcpy.ListFields(featureClass)

	# Print the field names, types and lengths using string formatting
	for field in fcFields:
	print("{0}: type = {1} length = {2}".format(field.name, field.type, field.length))

	# Get a list of field names from a feature class using list comprehension
	fieldNames = [field.name for field in arcpy.ListFields(featureClass)]

view raw snippet-fc-fields.py hosted with ❤ by GitHub

What is returned from arcpy.ListFields() is not field names, but a list of field objects which have accessible properties, including field name, type and length. These are printed above using string formatting. Getting a Python list of field names using list comprehension is straight out of the ArcGIS documentation. There are tons of great code samples there when you start sifting through it for specific tasks. I have learned to name the returned list of fields something like fieldList, and the returned list of names by naming it fieldNames so I can remember which data type I’m using.

Restricting the returned fields or field names

The following snippet illustrates ways you can restrict the returned fields: it’s always a good idea to limit the processing to what you actually need for the task at hand. For example, you can use a wildcard for field names in the middle parameter, or a field type in the third parameter. Another tidbit I found buried in a post by Drew Flater on an ArcGIS blog is how to prevent the return of required fields, such as OID or Shape, which can complicate things when using different data source types. Using list comprehension, he returns only those field names where the property “required” is false.

	fc = "c:/path/geodatabase.gdb/featureclass"

	# restrict returned fields by wildcard
	fieldList = arcpy.ListFields(fc,"OWN*")

	# restrict returned fields by field type
	fieldList = arcpy.ListFields(fc,"","String")

	# exclude required fields such as Shape and OID
	# (https://blogs.esri.com/esri/arcgis/2012/08/30/field-mapping-and-python-scripting)
	fieldNames = [field.name for field in arcpy.ListFields(fc) if not field.required]

view raw snippet-get-fields-restrict.py hosted with ❤ by GitHub

Using dictionaries to compare fields

I have used Python dictionaries to compare the field characteristics of two feature classes. For example, I tested for fields of the same name but different lengths before appending. When the fields are automatically mapped by having the same name, the append operation will fail if the source string field is longer than the destination string field.

In the snippet below, two dictionaries are created using the field names and lengths as the key:value pairs. This means we can access the fields by name. Iterating through the destination feature class dictionary, the script looks for key matches (in this case, field names) in the source dictionary. Then, it can access the length of the field in each dictionary by referencing the key (field name) and compare the two integers.

	import arcpy

	srcFC = "c:/path/Geodatabase.gdb/srcFc"
	destFC = "c:/path/Geodatabase.gdb/destFc"

	# get field lists from each feature class, restricting to "String" feature types
	srcFields = arcpy.ListFields(srcFC,"","String")
	destFields = arcpy.ListFields(destFC,"","String")

	# create empty dictionaries
	srcDict = {}
	destDict = {}

	# populate dictionaries with key value pairs of field name: field length
	# so that the properties can be accessed by field name
	for srcField in srcFields:
	srcDict[srcField.name] = [srcField.length]

	for destField in destFields:
	destDict[destField.name] = [destField.length]

	# iterate through destination dictionary looking
	# for matching fields in source
	# access field lengths in dictionary by field name
	# compare field lengths
	for fieldName, fieldLen in destDict.items():
	if fieldName in srcDict:
	srcLen = srcDict[fieldName][0]
	destLen = destDict[fieldName][0]
	if destLen < srcLen:
	print("ERROR String field length mismatch")
	print(fieldName + " field is too long")
	else:
	print(fieldName + " all is good")

view raw snippet-fieldlen-compare.py hosted with ❤ by GitHub