Alternatives to ogr2ogr for loading large GeoJSON file(s) into PostGIS

Unfortunately JSON, much like XML, is badly suited for stream processing, so almost all implementations require the whole dataset to be loaded into memory. While this is fine for small sets, in your case there is no other option than breaking the dataset into smaller, manageable chunks.

Improving on Pablo's solution, here's one that does not require you to actually open and load the file into an editor and split it by hand, but tries to automate the whole process as much as possible.

Copy the JSON file onto a Unix host (Linux, OS X) or install Cygwin tools on Windows. Then open a shell and use vim to remove the first and last lines from the file:

$ vim places.json

Type dd to remove the first line, then SHIFT-G to move to the end of the file, and type dd again to remove the last line. Now type :wq to save the changes. This should take just a couple of minutes at most.
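
If you'd rather not do this interactively, a small Python one-off can strip those two lines as well. This is only a sketch: it assumes the header and the closing ]} each sit on a single line, and it writes the result to a new file (places-body.json) that you would then feed to the split step below instead of places.json:

first = True
previous = None
with open('places.json') as src, open('places-body.json', 'w') as dst:
    for line in src:
        if first:
            # skip the FeatureCollection header on the first line
            first = False
            continue
        if previous is not None:
            dst.write(previous)
        # hold each line back by one iteration so the final "]}" is never written
        previous = line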

Now we will harness the sheer power of Unix to split the file into more manageable chunks. In the shell type:

$ split -l 10000 places.json places-chunks-

Go grab a beer. This will split the file into many smaller files, each containing 10000 lines. You can increase the number of lines, as long as you keep it small enough that ogr2ogr can manage it.

Now we are going to stick a head and a tail onto each of the files:

$ echo '{"type":"FeatureCollection","features":[' > head
$ echo ']}' > tail
$ for f in places-chunks-* ; do cat head $f tail > $f.json && rm -f $f ; done

Go grab a snack. The first two commands simply create a header and a footer file with the correct contents (just for convenience, really), while the last one adds the header and footer to each of the chunks we split above and removes the headerless/footerless chunk (to save space).

At this point you can hopefully process the many places-chunks-*.json files with ogr2ogr:

$ for f in places-chunks-*.json ; do ogr2ogr -your-options-here $f ; done
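
In case it helps, the same loop can also be driven from Python with subprocess. The connection string, table name, and flags below are only an example of what -your-options-here might look like for a PostGIS target, and will need to be adapted to your database:

import glob
import subprocess

# Example PostGIS connection string and layer name -- adjust before running.
pg = "PG:host=localhost dbname=gis user=postgres password=postgres"

for path in sorted(glob.glob("places-chunks-*.json")):
    # -f PostgreSQL selects the PostGIS driver, -nln names the target layer,
    # and -append adds each chunk to the same table instead of recreating it.
    subprocess.check_call(["ogr2ogr", "-f", "PostgreSQL", pg, path,
                           "-nln", "places", "-append"])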

The sample that you sent shows that it may be possible to split the file manually using an editor like Notepad++.

1) For each chunk, create a header:

{"type":"FeatureCollection","features":[

2) After the header, place many features:

{"geometry": {"type": "Point", "coordinates": [-103.422819, 20.686477]}, "type": "Feature", "id": "SG_3TspYXmaZcMIB8GxzXcayF_20.686477_-103.422819@1308163237", "properties": {"website": "http://www.buongiorno.com", "city": "M\u00e9xico D.F. ", "name": "Buongiorno", "tags": ["mobile", "vas", "community", "social-networking", "connected-devices", "android", "tablets", "smartphones"], "country": "MX", "classifiers": [{"category": "Professional", "type": "Services", "subcategory": "Computer Services"}], "href": "http://api.simplegeo.com/1.0/features/[email protected]", "address": "Le\u00f3n Tolstoi #18 PH Col. Anzures", "owner": "simplegeo", "postcode": "11590"}},

3) Finish the chunk with:

]}

EDIT - Here is Python code that will split the file into pieces of a defined size (in number of features):

class JsonFile(object):
    def __init__(self, filename):
        self.file = open(filename, 'r')

    def split(self, csize):
        # the first line is the FeatureCollection header, repeated in every chunk
        header = self.file.readline()
        number = 0
        while True:
            feature = self.file.readline()
            # stop on the closing "]}" or at end of file
            if not feature or feature.strip() == ']}':
                break
            features = [feature]
            for i in range(csize - 1):
                feature = self.file.readline()
                if not feature or feature.strip() == ']}':
                    break
                features.append(feature)
            # drop the trailing comma of the last feature so the chunk stays valid JSON
            features[-1] = features[-1].rstrip().rstrip(',') + '\n'
            output = open("chunk_%s.geojson" % number, 'w')
            output.write(header)
            output.writelines(features)
            output.write(']}')
            output.close()
            number += 1
        self.file.close()

if __name__ == "__main__":
    myfile = JsonFile('places_mx.geojson')
    myfile.split(2000)  # size of the chunks, in number of features

It should be straightforward to write a lazy reader and writer in Python that would convert your GeoJSON file to the much smaller shapefile format, or directly to SQL, without doing it all in memory. Once converted, the native PostGIS tools can import large data sets. The GeoJSON support in OGR is relatively new and there aren't any flags for handling large files.
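
As a rough sketch of that idea (not a finished tool): the reader below walks the file one line at a time and emits an INSERT per feature for a hypothetical places(name, geom) table, assuming one feature per line as in your sample, and quietly ignoring most of the properties:

import json

def features(path):
    """Yield one parsed feature at a time, assuming one feature per line."""
    with open(path) as src:
        next(src)                            # skip the FeatureCollection header
        for line in src:
            line = line.strip().rstrip(',')
            if not line or line == ']}':     # blank line or the closing footer
                continue
            yield json.loads(line)

def to_sql(feature):
    """Build one INSERT for a hypothetical places(name, geom) table."""
    lon, lat = feature['geometry']['coordinates']
    # naive quote escaping, good enough for a one-off conversion
    name = feature['properties'].get('name', '').replace("'", "''")
    return ("INSERT INTO places (name, geom) "
            "VALUES ('%s', ST_SetSRID(ST_MakePoint(%f, %f), 4326));"
            % (name, lon, lat))

if __name__ == '__main__':
    with open('places_mx.sql', 'w') as out:
        for feature in features('places_mx.geojson'):
            out.write(to_sql(feature) + '\n')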

If you can somehow share a manageable chunk of your file, I could help you out.