What is the best hack for importing large datasets into PostGIS?

I made a test for you:

  • PostgreSQL 9.3
  • PostGIS 2.1
  • Windows 7
  • Intel Core i7 processor
  • GDAL 2.0-dev 64-bit
  • shapefile of 1.14 million polygons, file size 748 MB

Ogr2ogr command:

    ogr2ogr -f PostgreSQL PG:"dbname='databasename' host='addr' port='5432' user='x' password='y'" test.shp --config PG_USE_COPY YES -nlt MULTIPOLYGON

Total time: 1 minute 30 seconds


Following the suggestions of user30184 and Paul Ramsey, and after my own experiments, I decided to answer this question.

I failed to mention in this question that I am importing data to a remote server (although this is described in the blog post I refer to). Operations such as inserts are subject to network latency when performed over the internet. It is also relevant that the server is on Amazon RDS, which prevents me from SSHing into the machine and running operations locally.

With this in mind, I re-engineered my approach, using the "\copy" directive to load a dump of the data into a new table. I think this strategy is essential, and it was also mentioned in the comments/answers to this question.

    psql database -U user -h host.eu-west-1.rds.amazonaws.com -c "\copy newt_table from 'data.csv' with DELIMITER ','"

This operation was incredibly fast. Since I imported a CSV, I still had all the work of populating the geometry column, adding a spatial index, etc. Even so, it was remarkably fast, since those queries were then running on the server itself.
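To make this concrete, the post-import work was along these lines (a sketch only: the lon/lat column names, the Point geometry type and the 4326 SRID are assumptions about the CSV, not my actual schema):

    # add and populate a geometry column on the table loaded via \copy
    # (column names, geometry type and SRID are assumptions)
    psql database -U user -h host.eu-west-1.rds.amazonaws.com -c "ALTER TABLE newt_table ADD COLUMN geom geometry(Point, 4326);"
    psql database -U user -h host.eu-west-1.rds.amazonaws.com -c "UPDATE newt_table SET geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326);"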

I also decided to benchmark the suggestions from user30184 and Paul Ramsey. My data file was a point shapefile with 3,035,369 records and a file size of 82 MB.

The ogr2ogr approach (using the PG_USE_COPY directive) finished in 1:03:00 (one hour and three minutes), which is still *much better* than before.

The shp2pgsql approach (using the -D directive) finished in only 00:01:04 (one minute and four seconds).
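For reference, that pipeline looks roughly like this (a sketch: the SRID, file and table names are placeholders, and -I, which would build the spatial index during the import, is deliberately left out, as explained below):

    # -D writes PostgreSQL dump (COPY) format instead of INSERT statements;
    # without -I, no spatial index is created during the load
    shp2pgsql -D -s 4326 points.shp public.points | psql database -U user -h host.eu-west-1.rds.amazonaws.com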

It is worth saying that ogr2ogr created a spatial index during the operation, while shp2pgsql did not. I found out that it is much more efficient to create the index after the import, rather than bloating the import operation with this kind of request.
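Building the index afterwards then amounts to something like the following (table and column names follow the sketch above; GiST is the standard PostGIS spatial index type, and if I remember correctly ogr2ogr can also be told to skip index creation with -lco SPATIAL_INDEX=NO):

    # create the spatial index after the bulk load, then refresh planner statistics
    psql database -U user -h host.eu-west-1.rds.amazonaws.com -c "CREATE INDEX points_geom_idx ON public.points USING GIST (geom);"
    psql database -U user -h host.eu-west-1.rds.amazonaws.com -c "ANALYZE public.points;"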

The conclusion is: shp2pgsql, when properly parameterized, is extremely well suited to performing large imports, namely those to be accommodated within Amazon Web Services.

*Spatial table with more than 3 million records, imported using shp2pgsql*

You can read a more detailed description of these conclusions in the update of this post.