Data integration – on the edge

Snap to it

One of the challenges of integrating data across boundaries is edge matching, or in this case, line matching. Our data sharing group took the first step toward a seamless roads dataset by agreeing on “snap points”, where each agency would snap its roads at the boundary line with the neighboring agency. Theoretically, if two people snapped the lines in their individual datasets to the same point, the data would be lined up and ready for integration. In the image, the bright green points are the snap points. At this scale, everything looks nice and happy.

Cross-jurisdictional roads data with snap points

 

Unsnapped line near snap point

Zooming in, problems emerge. How to address this without manually examining every point in an eight-county area of over 8500 square miles? We know that looks are deceiving at large scales in ArcMap: on screen, a line might appear snapped when it is not, and vice versa.

Goal

The goal was to provide an automated tool that checks each snap point to make sure the nearby line endpoints are coincident. Ideally, this tool can be run once by the person integrating the data. If problems are found, the source agency can be notified and encouraged to fix their data. Once fixed, the data should (again in theory) stay fixed.
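To make the goal concrete, here is a minimal sketch of the check in plain Python, outside ArcGIS. It is not the tool described in this post: the function name check_snap_points, the tolerance value, and the sample coordinates are all made up for illustration, and in practice the coordinates would have to be read from the snap point and road feature classes. The idea is just to count how many line endpoints sit on each snap point and flag any point with fewer than two.

```python
import math

def check_snap_points(snap_points, line_endpoints, tolerance=0.001, min_endpoints=2):
    """Return (point, hit_count) for every snap point with too few coincident endpoints.

    Hypothetical helper: snap_points and line_endpoints are lists of (x, y) tuples,
    and tolerance is in the same units as the coordinates.
    """
    problems = []
    for sx, sy in snap_points:
        # count the line endpoints that fall within the tolerance of this snap point
        hits = sum(
            1 for ex, ey in line_endpoints
            if math.hypot(ex - sx, ey - sy) <= tolerance
        )
        if hits < min_endpoints:
            problems.append(((sx, sy), hits))
    return problems

# made-up sample data: the second snap point has one endpoint that missed it by 0.4 units
snap_points = [(0.0, 0.0), (100.0, 50.0)]
line_endpoints = [(0.0, 0.0), (0.0, 0.0), (100.0, 50.0), (100.4, 50.0)]

for point, hits in check_snap_points(snap_points, line_endpoints):
    print("Snap point {0} has {1} coincident endpoint(s)".format(point, hits))
```

Because only endpoints are compared, a line that passes straight through a snap point without ending there does not count as snapped.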

I tried many things

Topology? I don’t know how to set up a topology rule which states that every point must be covered by at least two endpoints (not just one).

Intersect? This had promise – every point should be intersected by two lines. But wait, what if a line goes through instead of snapping at the endpoint? Errors I could see on the screen continued to be missed, causing me to mistrust the results. After many, many iterations of Select by Location, I cast my net a little further. I found disjoint.


Checking for unique values in a feature class field

Unique Values

As part of a larger set of scripts, I needed to check for unique values in a feature class field. There are a number of manual ways to do this in ArcGIS Desktop, but here is a snippet to create a list of non-unique values and print them to the console. You could also write them to a file, or incorporate the snippet into a fix for the issue.

The basic concept is a “list within a list”. The Search Cursor is the master “list” that gets iterated over, and then values get added either to a unique value list or a duplicate value list (if they already exist in the unique value list). To speed up the analysis, I set up the Search Cursor parameters so that it only uses a single field. I don’t need any geometry or other fields in order to analyze the unique values in one field.

String formatting is used in a couple of places here: str.format() substitutes the {0} in an output string with the first value in its parentheses. The “str” in these method names stands for the string you are formatting or otherwise altering. Another string method, str.join(), is used to print the list values, in this case separated by a new line: "\n".join(list). If you wanted commas, you could use ",".join(list). Note that str.join() only accepts strings, so values of any other type (integer IDs, for example) have to be converted with str() first.
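Here is a small standalone illustration of those two string methods, using made-up sample values rather than anything read from a feature class:

```python
# made-up sample values standing in for field values
values = ["A101", "A101", "B202"]

# {0} is replaced by the first argument to format(); no str() cast is needed
message = "Duplicate IDs found ({0})".format(len(values))

# join() glues the list items together with the separator string
newline_listing = "\n".join(values)
comma_listing = ",".join(values)

# join() only accepts strings, so convert other types first
number_listing = ", ".join(str(v) for v in [101, 101, 202])

print(message)
print(newline_listing)
print(comma_listing)
print(number_listing)
```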

import arcpy
# input feature class and field name to check for duplicates
inputFC = r"c:\mygeodatabase.gdb\myfeatureclass"
myField = "myField"
# create new, empty lists for duplicate values and unique values
dupeList = []
uniqueList = []
# create a search cursor with only a single field in the second parameter
# - the one you are analyzing.
# this is faster than using "*", which returns all fields.
# iterate through all the values
with arcpy.da.SearchCursor(inputFC, [myField]) as cursor:
    for row in cursor:
        # get value in your field for this row of the cursor
        value = row[0]
        # check to see if the value already exists in the unique value list
        if value in uniqueList:
            # if it exists, append it to the list of duplicates
            dupeList.append(value)
        else:
            # if it doesn't exist, append it to the list of unique values
            uniqueList.append(value)
# if the duplicate values list has any values, print the # of them and then list them
if len(dupeList) > 0:
    # uses string formatting to print the number of duplicates: the "length" of the list
    print("Duplicate IDs found ({0})\n".format(len(dupeList)))
    # prints each item in the duplicate value list separated by \n (new line)
    # each value is cast to a string first, since join() only accepts strings
    print("\n".join(str(v) for v in dupeList))
else:
    print("No duplicates found in {0} values.".format(len(uniqueList)))
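One possible variation, not part of the original script: checking "value in uniqueList" scans the whole list each time, which can get slow on large tables. A sketch using collections.Counter from the standard library counts every value in one pass. The function name find_duplicates and the sample values are made up for illustration. With a cursor like the one above, I would expect Counter(row[0] for row in cursor) to produce the same kind of counts, but that part is an untested assumption here.

```python
from collections import Counter

def find_duplicates(values):
    """Return a dict of {value: count} for every value that appears more than once."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}

# made-up sample values standing in for the field values from a cursor
sample = ["A101", "B202", "A101", "C303", "B202", "A101"]

dupes = find_duplicates(sample)
if dupes:
    print("Duplicate IDs found ({0})\n".format(len(dupes)))
    # sorted() assumes the values are all of one comparable type (no mix of None and text)
    for value, n in sorted(dupes.items()):
        print("{0} appears {1} times".format(value, n))
else:
    print("No duplicates found in {0} values.".format(len(sample)))
```

Unlike the duplicate list above, this reports each repeated value once, along with how many times it occurs.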
