Group by query in a file geodatabase

File geodatabase feature classes and tables lack some of the more advanced ability to query that a true relational database supports. Sometimes these queries can be simulated with arcpy cursors; one good example is a SQL group by query.

Typical SQL:

SELECT field, count(*) from table group by field order by count(*) desc;

In arcpy:

from collections import Counter

def group_by_count(table_or_fc, fields):
    """ Returns dictionary containing count of unique items """
    counter = Counter()
    with arcpy.da.SearchCursor(table_or_fc, fields) as curs:
        for row in curs:
            # no need to store as a tuple if only 1 field, just store the value
            if len(row) == 1:
                row = row[0]
            counter[row] += 1
    return counter

def group_by_count_formatted(table_or_fc, fields):
    """ prints out counts of unique values """
    counter = group_by_count(table_or_fc, fields)
    # sort yields highest count records first (order by count(*) desc)
    for key, count in sorted(counter.items(), reverse=True, key=lambda item: item[1]):
        print("{}: {:,}".format(str(key), count))

Example usage in ArcMap Python console (single field):

>>> group_by_count('junctions', 'ImpedanceType')
Counter({u'SmallStreet': 455145, u'LargeStreet': 28714, u'Stream': 9375, u'RailRoad': 1742})

>>> group_by_count_formatted('junctions', 'ImpedanceType')
SmallStreet: 455,145
LargeStreet: 28,714
Stream: 9,375
RailRoad: 1,742

Example usage in ArcMap Python console (multiple fields):

>>> group_by_count('junctions', ['ImpedanceType', 'InfrastructureType'])
Counter({('SmallStreet', 'New'): 318834, ('LargeStreet', 'New'): 28710, ('Stream', 'New'): 18379, ('Stream', 'Aerial'): 5806, ('RailRoad', 'New'): 4043, ('Stream', 'Underground'): 3227, ('RailRoad', 'Aerial'): 1035, ('RailRoad', 'Underground'): 668})

>>> group_by_count_formatted('junctions', ['ImpedanceType', 'InfrastructureType'])
('SmallStreet', 'New'): 318,834
('LargeStreet', 'New'): 28,710
('Stream', 'New'): 18,379
('Stream', 'Aerial'): 5,806
('RailRoad', 'New'): 4,043
('Stream', 'Underground'): 3,227
('RailRoad', 'Aerial'): 1,035
('RailRoad', 'Underground'): 668

These functions aren’t going to be as efficient as a SQL query, but they can be quite useful sometimes for ad-hoc data exploration - especially in the Arcmap/Pro console.

comments powered by Disqus