Streamlining Python Code for Big Data

The first thing I would do is monitor your system's resource utilization, using something like Resource Monitor in Windows 7 or perfmon in Vista/XP, to get a feel for whether you are CPU-, memory-, or IO-bound.

If you are memory- or IO-bound, there is likely very little you can do but upgrade hardware, reduce the problem size, or change the approach entirely.

If you determine that you are CPU-bound, I would experiment with the multiprocessing module, or one of the many other Python-based parallel processing packages available, to see if you can use more CPU cores to speed up your operations.

The trick to multiprocessing and parallelism in general is finding a good partitioning scheme that:

  1. Allows you to split up the inputs into smaller working sets, then recombine the results in a way that makes sense,
  2. Adds the least amount of overhead (some is unavoidable compared to serial processing), and
  3. Allows you to adjust the size of the working set to best utilize the system's resources for optimal performance (see the sketch below).
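For example, here is a minimal multiprocessing.Pool sketch of that partition/recombine pattern; process_chunk() and the chunk size are placeholders you would replace with your own geoprocessing workload:

    import multiprocessing

    def process_chunk(chunk):
        # Placeholder for the real per-chunk work; returns a partial result.
        return sum(chunk)

    def chunks(seq, size):
        # Point 1: split the inputs into smaller working sets.
        for i in range(0, len(seq), size):
            yield seq[i:i + size]

    if __name__ == "__main__":  # this guard is required on Windows
        data = list(range(1000000))
        chunk_size = 50000  # point 3: tune this to best utilize your cores
        pool = multiprocessing.Pool()  # defaults to one worker per CPU core
        partials = pool.map(process_chunk, chunks(data, chunk_size))
        pool.close()
        pool.join()
        result = sum(partials)  # point 1 again: recombine the results

The overhead in point 2 shows up mostly in pickling chunks to and from the worker processes, which is another reason the chunk size is worth tuning.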

You can use the script I created in this answer as a starting point: Porting Avenue code for Producing Building Shadows to ArcPy/Python for ArcGIS Desktop?

See also this ESRI Geoprocessing blog post on the subject: Python Multiprocessing – Approaches and Considerations

I think that your case is going to be even more challenging due to the "black box" nature of the tools you are using, compared with the more fine-grained geometry arrays I was working with. Perhaps working with NumPy arrays will come in handy; a hypothetical sketch follows.
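For instance, a hedged sketch using arcpy.da.FeatureClassToNumPyArray (available in ArcGIS 10.1+); the path and the Z field are placeholder assumptions:

    import arcpy

    # Hypothetical: pull coordinates and an assumed "Z" attribute into a
    # NumPy structured array for fast, vectorized filtering.
    arr = arcpy.da.FeatureClassToNumPyArray(r"C:\data\tile_0001.shp",
                                            ["SHAPE@XY", "Z"])
    low = arr[arr["Z"] < 10]  # same z < 10 filter as the workflow below
    print("{0} of {1} features pass the filter".format(len(low), len(arr)))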

I also came across some interesting reading material if you wanted to look beyond arcpy:

  • Parallel Processing Algorithms for GIS. Richard Healey, Steve Dowers, Bruce Gittings, Mike J. Mineter. CRC Press, 1997.
  • Accelerating Raster Processing with Fine and Coarse Grain Parallelism in GRASS. Onil Nazra Persada, Thierry Goubier. Proceedings of the FOSS/GRASS Users Conference 2004, Bangkok, Thailand, 12-14 September 2004.
  • Accelerating batch processing of spatial raster analysis using GPU. Mathias Steinbach, Reinhard Hemmerling. Computers & Geosciences, Volume 45, August 2012, Pages 212–220.

Here are some algorithm changes that should help you.

Execute your selection first, before the merge or integrate. This will significantly cut down on the amount of data flowing into the later functions, which are the most expensive.
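A hedged sketch of that ordering, using the z < 10 criterion from the workflow below (the path and field name are placeholders):

    import arcpy

    src = r"C:\data\tile_0001.shp"
    sel = r"in_memory\sel_0001"

    # Select first, so the expensive merge/integrate steps only ever
    # see the features you actually need.
    arcpy.Select_analysis(src, sel, where_clause='"Z" < 10')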

Merge and integrate are both memory expensive, so you want to keep eliminating features as you bring in feature classes, and try to do your merges in a binary tree to keep the size of the merges and integrates down. For example, with four shapefiles you merge two shapefiles and integrate; merge the other two shapefiles and integrate; then merge the two resulting feature classes and integrate.
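As a serial illustration of that binary-tree pattern (feature class names and the cluster tolerance are hypothetical, and note that Integrate modifies its input in place):

    import arcpy

    def tree_merge_integrate(feature_classes, tolerance="1 Meters"):
        level = 0
        layers = list(feature_classes)
        while len(layers) > 1:
            next_level = []
            # Merge and integrate adjacent pairs; an odd leftover passes through.
            for i in range(0, len(layers) - 1, 2):
                out_fc = r"in_memory\merge_{0}_{1}".format(level, i)
                arcpy.Merge_management([layers[i], layers[i + 1]], out_fc)
                arcpy.Integrate_management(out_fc, tolerance)
                for fc in (layers[i], layers[i + 1]):
                    if fc.lower().startswith("in_memory"):
                        arcpy.Delete_management(fc)  # free memory as you go
                next_level.append(out_fc)
            if len(layers) % 2:
                next_level.append(layers[-1])
            layers = next_level
            level += 1
        return layers[0]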

Your job queue starts as a queue of shapefile references, and you also have a result queue to place results into. The run() method for your parallel processing worker does the following:

  1. Take two items off the job queue. If no item is taken (the queue is empty), terminate the worker. If only one item is taken, put it straight into the result queue.
  2. If two items are taken, then for each item: if it is a shapefile, select for z < 10 and write the selection to an in_memory feature class; if it is already an in_memory feature class, skip the selection step.
  3. Merge the two in_memory feature classes to create a new in_memory feature class, and delete the original two.
  4. Execute integrate on the new feature class, then place it into the result queue.

Then run an outer while loop. The loop starts with the shapefile queue and, while its length is greater than 1, runs the queue through the workers. If the result queue ends up holding more than one feature class, the loop feeds it back through the workers for another parallel pass, repeating until the result queue contains a single in_memory feature class.
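Here is a minimal sketch of that worker/queue loop with multiprocessing; merge_pair() is a stand-in for the select/merge/integrate work above. One caveat: arcpy's in_memory workspace is local to each process, so in a real run the workers would write intermediate results to an on-disk scratch workspace rather than passing in_memory paths between processes:

    import multiprocessing

    def merge_pair(a, b):
        # Stand-in for the real work: select z < 10 (if given a shapefile),
        # merge the two feature classes, integrate, delete the inputs.
        return "({0}+{1})".format(a, b)

    def worker(job_q, result_q):
        # The run() logic described above: consume pairs until the queue is empty.
        while True:
            first = job_q.get()
            if first is None:                  # no item taken: terminate
                result_q.put(("done", None))
                return
            second = job_q.get()
            if second is None:                 # one item taken: pass it through
                result_q.put(("item", first))
                result_q.put(("done", None))
                return
            result_q.put(("item", merge_pair(first, second)))

    if __name__ == "__main__":
        items = ["fc{0}".format(i) for i in range(3500)]  # shapefile references
        n_workers = multiprocessing.cpu_count()
        while len(items) > 1:                  # outer loop: repeat until one remains
            job_q = multiprocessing.Queue()
            result_q = multiprocessing.Queue()
            for item in items:
                job_q.put(item)
            for _ in range(n_workers):
                job_q.put(None)                # one termination sentinel per worker
            procs = [multiprocessing.Process(target=worker,
                                             args=(job_q, result_q))
                     for _ in range(n_workers)]
            for p in procs:
                p.start()
            items, finished = [], 0
            while finished < n_workers:        # drain until every worker reports done
                kind, value = result_q.get()
                if kind == "done":
                    finished += 1
                else:
                    items.append(value)
            for p in procs:
                p.join()
        print(items[0])                        # the single remaining feature class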

e.g. If you start with 3500 shapefiles, your first queue will have 3500 jobs, the second 1750, then 875, 438, 219, 110, 55, 28, 14, 7, 4, 2, and finally 1. Your big bottleneck will be memory. If you do not have enough (and if so, you will run out while creating the first result queue), modify your algorithm to merge more than two feature classes at once before integrating; that cuts down the size of your first result queue in exchange for longer processing time. Optionally, you could write output files and skip using in_memory feature classes. This will slow you down considerably but will get past the memory bottleneck.

Only after you have performed merge and integrate on all of the shapefiles, ending with one single feature class, do you perform the buffer, poly to raster, and reclassify steps. That way those three operations are performed only once, and you keep your geometry simple.
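A hedged sketch of that final, run-once chain (the buffer distance, cell size, and remap values are hypothetical placeholders; Reclassify requires the Spatial Analyst extension):

    import arcpy
    from arcpy.sa import Reclassify, RemapRange

    arcpy.CheckOutExtension("Spatial")

    final_fc = r"in_memory\merged_all"  # single feature class from the tree above
    buffered = r"in_memory\buffered"
    raster = r"C:\scratch\buffered.tif"

    # Run each expensive operation exactly once, on simple geometry.
    arcpy.Buffer_analysis(final_fc, buffered, "100 Meters",
                          dissolve_option="ALL")
    arcpy.PolygonToRaster_conversion(buffered, "OBJECTID", raster, cellsize=10)
    reclassified = Reclassify(raster, "Value",
                              RemapRange([[0, 0, 1], [1, 255, 2]]))
    reclassified.save(r"C:\scratch\reclassified.tif")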