What's the easiest way to do a one-time mass geocode? (580,000 US addresses)

You could try the Street Address to Coordinates tool from the Data Science Toolkit.

This API takes either a single string representing a postal address, or a JSON-encoded  
array of addresses, and returns a JSON object with a key for every address. The value 
for each key is either null if no information was found for the address, or an object 
containing location information, including country, region, city and latitude/longitude 
coordinates. Here's an example:

I'm not sure what the API limits are for Pete Warden's hosted copy, but you could run the toolkit yourself and do your processing offline, as @Devdatta suggests. There is a downloadable virtual machine that contains all the tools on the website. Good luck :)
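As a rough sketch of the batch approach described above, the addresses can be split into chunks, each chunk posted as a JSON-encoded array, and the keyed response read back. The host URL, the batch size, and the helper names (`batches`, `parse_response`, `geocode_batch`) are assumptions for illustration, as are the `latitude` and `longitude` key names, which are inferred from the description quoted above. Check them against your own copy of the toolkit before relying on this.

```python
import json
from urllib import request

# Assumption: a self-hosted copy of the toolkit; replace with your VM's address.
DSTK_URL = "http://localhost/street2coordinates"
# Assumption: an arbitrary chunk size, not a documented limit.
BATCH_SIZE = 100


def batches(addresses, size=BATCH_SIZE):
    """Yield successive lists of at most `size` addresses."""
    for start in range(0, len(addresses), size):
        yield addresses[start:start + size]


def parse_response(body):
    """Map each address to (latitude, longitude), or None when the value is null."""
    result = {}
    for address, info in json.loads(body).items():
        # Key names are assumed from the API description, not verified here.
        result[address] = (info["latitude"], info["longitude"]) if info else None
    return result


def geocode_batch(batch, url=DSTK_URL):
    """POST one JSON-encoded array of addresses and parse the reply."""
    req = request.Request(url, data=json.dumps(batch).encode("utf-8"))
    with request.urlopen(req) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

With 580,000 addresses and a batch size of 100, this would mean 5,800 requests, which is far more manageable against a local virtual machine than against a shared server.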


With this amount of data, I would suggest that you geocode offline. The HTTP requests alone for this many records would be classified as a DoS attack by any server.


Geocoding will result in points, 580,000 of them. Are you sure you want to display them all on a map? So many clustered points will likely make the map illegible. And all of this assumes you find a way to geocode so many records in the first place.

The City of Philadelphia's parcel records are available as a polygon layer. Furthermore, those polygons are already available as a map service. If the data/service is suitable for your needs then you don't have to worry about geocoding so many points, and the polygons will most likely look better than so many points on the map.

Information about the data (including metadata and download) and map service:

http://www.pasda.psu.edu/uci/MapService.aspx?Dataset=462

Preview of the map service (zoom in for better view)

http://maps.psiee.psu.edu/preview/map.ashx?layer=462
