How is the data access cursor performance so enhanced compared to previous versions?

One of the developers of arcpy.da here. We got the performance where it is because performance was our primary concern: the main gripe with the old cursors were that they were slow, not that they lacked any particular functionality. The code uses the same underlying ArcObjects available in ArcGIS since 8.x (the CPython implementation of the search cursor, for example, looks a lot like code samples like this in its implementation except, you know, in C++ instead of C#).

The main two things we did to get the speedup are thus:

  1. Eliminate layers of abstraction: the initial implementation of the Python cursor was based on the old Dispatch/COM based GPDispatch object, which enabled one to use the same API in any language that could consume COM Dispatch objects. This means that you had an API that was not particularly well-optimized for any single environment, but it also meant that there were a lot of layers of abstraction for the COM objects to advertise and resolve methods at runtime, for example. If you remember before ArcGIS 9.3, it was possible to write geoprocessing scripts using that same clunky interface many languages, even Perl and Ruby. The extra paperwork an object needs to do to handle the IDispatch stuff adds a lot of complexity and slowdown to function calls.
  2. Make a tightly integrated, Python specific C++ library using Pythonic idioms and data structures: the idea of a Row object and the really strange while cursor.Next(): dance were just plain inefficient in Python. Fetching an item from a list is a very fast operation, and simplifies down to just a couple of CPython function calls (basically a __getitem__ call, heavily optimized on lists). Doing row.getValue("column") by comparison is more heavyweight: it does a __getattr__ to fetch the method (on which it needs to create a new bound method object), then call that method with the given arguments (__call__). Each part of the arcpy.da implementation is very closely integrated with the CPython API with a lot of hand-tuned C++ to make it fast, using native Python data structures (and numpy integration, too, for even more speed and memory efficiency).

You'll also notice that in nearly any benchmark (see these slides for example), arcobjects in .Net and C++ are still over twice as fast as arcpy.da in most tasks. Python code using arcpy.da is faster, but still not faster than a compiled, lower-level language.

TL;DR: da is faster because da is implemented in straight-up, unadulterated Arcobjects/C++/CPython which was specifically designed to result in fast Python code.

Performance related

  • Cursor only iterates through set list of fields by default (not the entire database)

Other not directly related to performance, but nice enhancements:

  • Ability to use tokens (e.g. SHAPE@LENGTH, SHAPE@XY) for accessing feature geometry
  • Ability to walk through databases (using arcpy.da.Walk method)