Querying Esri File Geodatabase?

Personal geodatabases are great because they can be accessed outside of ArcGIS. But they are slow and, in practice, performance starts degrading once they grow beyond 400-500 MB in size. If you can, do migrate to file geodatabases.

There are many ways you can query your data using SQL-like syntax:

  1. The Select By Attributes geoprocessing tool (or the equivalent menu command) lets you run a SQL WHERE clause.

  2. You can use the Python window in ArcMap to run Python code. For instance, arcpy.da.SearchCursor provides a rich interface for running SQL queries while limiting the recordset. Look for the sql_clause argument in this help link. The SQL prefix supports None, DISTINCT, and TOP; the SQL postfix supports None, ORDER BY, and GROUP BY.

Some examples of the SQL queries:

  • Select distinct sub-regions from the states table

The code:

import arcpy
states = r'C:\Program Files (x86)\ArcGIS\Desktop10.4\TemplateData\TemplateData.gdb\USA\states'
sql = ('DISTINCT', None)
with arcpy.da.SearchCursor(states,"SUB_REGION",sql_clause=sql) as cur:
    for row in cur:
        print row[0]

#E N Cen
#S Atl
#Mid Atl
#Mtn
#E S Cen
#N Eng
#W S Cen
#W N Cen
#Pacific
  • Group counties by state name and FIPS code

The code:

import arcpy
counties = r'C:\Program Files (x86)\ArcGIS\Desktop10.4\TemplateData\TemplateData.gdb\USA\counties'
sql = (None, 'GROUP BY STATE_NAME, STATE_FIPS')
with arcpy.da.SearchCursor(counties,["STATE_NAME","STATE_FIPS"],sql_clause=sql) as cur:
    for row in cur:
        print row[0],row[1]
#Minnesota 27
#Washington 53
#Idaho 16
#Montana 30
#North Dakota 38
#Maine 23
#Wisconsin 55  
  3. The pandas Python package provides a rich interface for data analysis. It is pre-installed with ArcGIS 10.4 and can be installed easily for pre-10.4 versions of ArcGIS. There are great samples here that translate SQL queries into pandas syntax.

The code:

import arcpy
import pandas as pd

states = r'C:\Program Files (x86)\ArcGIS\Desktop10.4\TemplateData\TemplateData.gdb\USA\states'
fields = ['AREA','STATE_NAME','SUB_REGION']
df = pd.DataFrame.from_records([f for f in arcpy.da.SearchCursor(states,fields,"SUB_REGION = 'Pacific'")])
df.columns = fields
print df

    AREA  STATE_NAME SUB_REGION
0   6381      Hawaii    Pacific
1  67290  Washington    Pacific
2  97074      Oregon    Pacific
3 157776  California    Pacific
4 576594      Alaska    Pacific

Running SQL ORDER BY:

df.sort_values('AREA',ascending=False)
         AREA  STATE_NAME SUB_REGION
4  576594.104      Alaska    Pacific
3  157776.310  California    Pacific
2   97073.594      Oregon    Pacific
1   67290.061  Washington    Pacific
0    6380.614      Hawaii    Pacific

Running SQL TOP:

df.nlargest(3,'AREA')
         AREA  STATE_NAME SUB_REGION
4  576594.104      Alaska    Pacific
3  157776.310  California    Pacific
2   97073.594      Oregon    Pacific

As a last resort, you might consider keeping a copy of your data in another free DBMS (such as SQL Server Express, PostGIS, or SQLite) to take advantage of full SQL querying. You can execute SQL queries against the DBMS from arcpy using any decent Python package such as pymssql, SQLAlchemy, or psycopg2. Python also ships with the sqlite3 module for working with SQLite databases (you can copy your data into a file-based database and run queries there).
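To illustrate the sqlite3 route, here is a minimal sketch using Python's built-in sqlite3 module. The table, field names, and values mirror the TemplateData examples above but are typed in by hand here; in practice you would copy the rows out of your geodatabase with a SearchCursor first.

```python
import sqlite3

# Use a path like r'C:\data\states.sqlite' to persist; ':memory:' is for demo only
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical copy of a few rows from the states feature class
cur.execute('CREATE TABLE states (STATE_NAME TEXT, SUB_REGION TEXT, AREA REAL)')
cur.executemany('INSERT INTO states VALUES (?, ?, ?)', [
    ('Alaska', 'Pacific', 576594.104),
    ('California', 'Pacific', 157776.310),
    ('Oregon', 'Pacific', 97073.594),
    ('Maine', 'N Eng', 33214.0),
])

# Equivalent of sql_clause=('DISTINCT', None)
distinct = [r[0] for r in cur.execute('SELECT DISTINCT SUB_REGION FROM states')]

# Equivalent of the TOP / nlargest example: ORDER BY + LIMIT
top = cur.execute(
    'SELECT STATE_NAME FROM states ORDER BY AREA DESC LIMIT 3').fetchall()

conn.close()
```

Unlike the file geodatabase SQL dialect, SQLite gives you joins, subqueries, and the full range of aggregate functions.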


I would recommend staying with .gdb since it sounds like only ArcGIS desktop users will be querying the data. Here are some general concepts to help enhance querying:

  • To query data, the first recommendation is to use arcpy.da cursors (they read only the fields you specify, which should make querying tables with many fields more efficient) together with `if` conditional logic to evaluate the values you are looking for. You may also use the Select Layer By Attribute arcpy method.
  • Apply attribute and spatial indexes to improve query performance.
  • Use a filter expression on cursors to query subsets within subsets of data.
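The cursor tips above can be sketched as follows (the feature class path and field names are hypothetical, and arcpy is only importable inside an ArcGIS installation):

```python
import arcpy

# Hypothetical feature class path
counties = r'C:\data\usa.gdb\counties'

# Limit the cursor to just the fields you need and pre-filter
# with a SQL where clause...
with arcpy.da.SearchCursor(counties,
                           ['STATE_NAME', 'POP1990'],
                           where_clause="STATE_NAME = 'Minnesota'") as cur:
    for state_name, pop in cur:
        # ...then refine the subset further with conditional logic
        if pop > 100000:
            print(state_name, pop)
```

Adding an attribute index on STATE_NAME (Add Attribute Index tool) would speed up the where clause on large tables.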