How to improve performance when using ArcGIS cursors in Python with big tables?

What if you fed the points into a NumPy array and used a SciPy cKDTree to look for neighbors? I process LiDAR point clouds with large numbers of points (> 20 million) in several MINUTES using this technique. There is documentation here for the kd-tree and here for NumPy conversion. Basically, you read the x,y coordinates into an array and iterate over each point, finding the indices of points within a certain distance (neighborhood) of it. You can then use those indices to calculate other attributes.
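A minimal sketch of that workflow, assuming the x,y pairs have already been pulled out of the feature class (e.g. with a single `arcpy.da.SearchCursor` pass); the random points, radius, and the neighbor-count attribute are illustrative stand-ins:

```python
import numpy as np
from scipy.spatial import cKDTree

# Stand-in for real x,y pairs read out of the table in one cursor pass.
rng = np.random.default_rng(0)
points = rng.random((10_000, 2)) * 1000.0

# Build the kd-tree once over all points.
tree = cKDTree(points)

# For each point, get the indices of all points within 25 units of it.
neighbor_idx = tree.query_ball_point(points, r=25.0)

# Use those indices to derive per-point attributes, e.g. neighbor count
# (each point is within distance 0 of itself, so counts are at least 1).
counts = np.array([len(idx) for idx in neighbor_idx])
```

The key point is that the tree is built once and every neighborhood query is then a fast in-memory lookup, instead of repeated cursor passes over the table.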


I am with Barbarossa... arcpy cursors are insanely lame, so I only use them to traverse a table or feature class exactly once. If I can't get the job done in one pass, I use the cursor to fill up some other kind of data structure and work with that.

If you do not want to hassle with NumPy, just build a simple Python dictionary: use your coordinates as a text key, and store the attributes you need for the calculation in a list as the value of the dictionary item.

In a second step you can easily get the values you need to calculate a point by simply looking them up in your dictionary (which is incredibly fast because of the dictionary's hash index).
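A minimal sketch of that two-step approach; the key format, field names, and sample rows are assumptions, and in ArcGIS the rows would come from one `arcpy.da.SearchCursor` pass:

```python
def coord_key(x, y, precision=3):
    """Build a text key from a coordinate pair (rounding is assumed here
    to make keys from floating-point coordinates match reliably)."""
    return f"{x:.{precision}f},{y:.{precision}f}"

# Step 1: one pass over the table fills the dictionary.
# Stand-in rows: (x, y, elevation, class) tuples.
rows = [
    (100.0, 200.0, 5.2, "A"),
    (100.0, 300.0, 7.1, "B"),
]
lookup = {coord_key(x, y): [elev, cls] for x, y, elev, cls in rows}

# Step 2: each lookup by coordinate is an O(1) hash lookup, no cursor.
elev, cls = lookup[coord_key(100.0, 200.0)]
```

Rounding the coordinates when building the key matters: two floats that differ in the 10th decimal would otherwise produce different keys.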


For a regular grid, it should be far more efficient to work in a raster format. Convert your first grid into a raster, then resample at the same resolution using a bilinear interpolator but shifting your output image by 1/2 pixel in X and Y, and convert back to points if you still need points.
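To see what the half-pixel bilinear shift computes, here is a plain NumPy illustration (not arcpy): sampling exactly halfway between cells makes every output cell the average of the four surrounding input cells. The 4x4 grid is a stand-in raster:

```python
import numpy as np

grid = np.arange(16, dtype=float).reshape(4, 4)  # stand-in raster band

# Bilinear sample at +1/2 pixel in X and Y: with equal weights of 0.25,
# each output cell is the mean of a 2x2 window of the input.
shifted = (grid[:-1, :-1] + grid[:-1, 1:]
           + grid[1:, :-1] + grid[1:, 1:]) / 4.0
```

In ArcGIS the same result would come from a bilinear Resample with the output origin offset by half a cell; the point is that this runs as one raster operation instead of per-point cursor work.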

EDIT : for complex decision rules, you can convert each of the fields that you need into a new raster band, then make four copies of those bands and shift your raster in the 4 diagonal directions by 1/2 pixel: (+50, -50), (+50, +50), (-50, -50) and (-50, +50). Then you can use regular map algebra.
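A NumPy stand-in for that map-algebra pattern (the arcpy equivalent would shift each band copy and combine them with Spatial Analyst operators); the band values, threshold, and the spread rule are illustrative assumptions:

```python
import numpy as np

band = np.arange(25, dtype=float).reshape(5, 5)  # stand-in raster band

# Four copies shifted diagonally: on a regular grid the half-pixel
# shift lands between cells, so each interior location sees its four
# diagonal neighbours.
ul = band[:-1, :-1]
ur = band[:-1, 1:]
ll = band[1:, :-1]
lr = band[1:, 1:]

# Example decision rule in map-algebra style: flag locations where the
# spread of the four shifted values exceeds a threshold.
spread = (np.maximum.reduce([ul, ur, ll, lr])
          - np.minimum.reduce([ul, ur, ll, lr]))
flag = np.where(spread > 5.0, 1, 0)
```

All the shifted copies stay aligned on the same grid, so any element-wise rule over them is a single vectorized (or map-algebra) expression rather than a per-feature loop.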