Extracting vector/point data from single-layer, non-georeferenced vector PDF file

You can check what GDAL can find in the PDF document with ogrinfo and gdalinfo.

ogrinfo Afghan_Mingeol_V2.pdf
Warning 4: Failed to open Afghan_Mingeol_V2.pdf, No error.
Unable to open datasource `Afghan_Mingeol_V2.pdf' with the following drivers.

This result means that GDAL could not find any vector data in the PDF.

gdalinfo Afghan_Mingeol_V2.pdf
Driver: PDF/Geospatial PDF
Files: Afghan_Mingeol_V2.pdf
Size is 11400, 8100
Coordinate System is `'
  CREATOR=Adobe Illustrator CS2
  PRODUCER=Adobe PDF library 7.77
Image Structure Metadata:
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0, 8100.0)
Upper Right (11400.0,    0.0)
Lower Right (11400.0, 8100.0)
Center      ( 5700.0, 4050.0)
Band 1 Block=1024x1024 Type=Byte, ColorInterp=Red
Band 2 Block=1024x1024 Type=Byte, ColorInterp=Green
Band 3 Block=1024x1024 Type=Byte, ColorInterp=Blue

This result means that the PDF file was written as a single layer. If the categories had been written as separate layers, you could select a particular layer with gdal_translate, write it to a new raster file, and vectorize it with gdal_polygonize.py (http://www.gdal.org/gdal_polygonize.html). As it stands, I fear that you are pretty much out of luck with GDAL and QGIS.
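
For reference, here is a rough sketch of what the two-step workflow could look like for a PDF that does have separate layers (it does not apply to this particular file). The file names and the layer name "Lithology" are made up for illustration. GDAL_PDF_LAYERS is the PDF driver's configuration option for switching layers on, but check the driver documentation for your GDAL version. The script only builds and prints the commands; nothing is executed.

```python
import shlex


def build_layer_extraction_commands(pdf_path, layer_name, raster_path, vector_path):
    """Return the two command lines: rasterize one PDF layer, then polygonize it.

    All arguments are placeholders supplied by the caller.
    """
    # Step 1: render only the requested PDF layer into a GeoTIFF.
    translate = [
        "gdal_translate", "-of", "GTiff",
        "--config", "GDAL_PDF_LAYERS", layer_name,
        pdf_path, raster_path,
    ]
    # Step 2: vectorize the raster (gdal_polygonize.py uses band 1 by default).
    polygonize = [
        "gdal_polygonize.py", raster_path,
        "-f", "ESRI Shapefile", vector_path,
    ]
    return [translate, polygonize]


# Hypothetical names, for illustration only.
commands = build_layer_extraction_commands(
    "map.pdf", "Lithology", "lithology.tif", "lithology.shp"
)
for cmd in commands:
    print(shlex.join(cmd))
```

To actually run the commands you could pass each list to subprocess.run(cmd, check=True), assuming the GDAL utilities are on your PATH.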

Just change your approach. In fact, you may not need to fight with the technical issue at all.


I was able to find metadata for the map data you are trying to extract. The metadata references each shapefile used to produce the map.

I was then able to find the original layers themselves, not only their references in the metadata. Look at this other USGS link and just use the shapefiles.

PS: I may be wrong, as I didn't inspect all the datasets in detail.

Since this map was likely created in Illustrator, try deconstructing it with Illustrator.

Open the PDF in Illustrator; all 272 layers appear and are correctly named.

Turn off/delete any unneeded raster such as the shaded relief

Alternatively delete ALL unneeded layers and only keep the lithology/symbols you want.

Export the map to DWG or DXF

Open in ArcMap

Of course, DXF/DWG is vector-based, so instead of points you will get the actual polygons/outlines of the symbols, but you could convert them to centroids with attributes using a simple script. On the other hand, you will get all the vectors with a "layer name" attribute. You can control editability (appearance vs. maximum editability) in the DXF/DWG export options dialog.
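
As an illustration of the "simple script" idea, here is a minimal sketch in plain Python. It assumes you have already read the symbol outlines into a list of (layer_name, ring) pairs, where each ring is a list of (x, y) vertices. That structure is hypothetical; how you get there depends on the tool you use to read the DXF. The centroid is the standard area-weighted (shoelace) centroid of a simple polygon.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    n = len(ring)
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # Degenerate outline (a line or a point): fall back to the vertex mean.
        return (sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def symbols_to_points(features):
    """Turn (layer_name, ring) pairs into point records that keep the layer name."""
    points = []
    for layer_name, ring in features:
        x, y = polygon_centroid(ring)
        points.append({"layer": layer_name, "x": x, "y": y})
    return points


# Made-up example: one square symbol outline on a hypothetical layer.
features = [("Copper", [(0, 0), (2, 0), (2, 2), (0, 2)])]
print(symbols_to_points(features))
# [{'layer': 'Copper', 'x': 1.0, 'y': 1.0}]
```

The layer name carried over from the DXF/DWG ends up as the attribute on each point, so you can still tell the symbol types apart afterwards.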

The benefit of this approach is that ALL layers are preserved.

As far as georeferencing goes: convert everything to shapefiles and use the spatial adjustment tools in ArcGIS. Since coordinates are given on the map, create projected points matching those coordinates and snap the corners of the grid and the tics (which also come through in the same DXF exported from Illustrator) to these points.
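
If you would rather see what such an adjustment does numerically, the sketch below fits an affine transformation to control-point pairs by least squares. This is not the ArcGIS tool itself, just the same idea in plain Python. The control points are invented for illustration: the drawing corners are taken from the gdalinfo extent above, and the map coordinates are placeholders you would replace with the values printed on the map.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("need at least 3 non-collinear control points")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, 4):
                    a[r][c] -= factor * a[col][c]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(pairs):
    """Least-squares affine fit from [((x, y), (X, Y)), ...].

    Returns (a, b, c, d, e, f) with X = a*x + b*y + c and Y = d*x + e*y + f.
    """
    n = len(pairs)
    # Center the source coordinates to keep the normal equations well conditioned.
    mx = sum(p[0][0] for p in pairs) / n
    my = sum(p[0][1] for p in pairs) / n
    normal = [[0.0] * 3 for _ in range(3)]
    rhs_x = [0.0] * 3
    rhs_y = [0.0] * 3
    for (x, y), (big_x, big_y) in pairs:
        row = (x - mx, y - my, 1.0)
        for i in range(3):
            rhs_x[i] += row[i] * big_x
            rhs_y[i] += row[i] * big_y
            for j in range(3):
                normal[i][j] += row[i] * row[j]
    a, b, c0 = solve3(normal, rhs_x)
    d, e, f0 = solve3(normal, rhs_y)
    # Undo the centering so the coefficients apply to the raw coordinates.
    return (a, b, c0 - a * mx - b * my, d, e, f0 - d * mx - e * my)


def apply_affine(coeffs, x, y):
    a, b, c, d, e, f = coeffs
    return (a * x + b * y + c, d * x + e * y + f)


# Invented control points: drawing corners -> placeholder map coordinates.
control = [
    ((0, 0), (60.0, 38.0)),
    ((11400, 0), (71.4, 38.0)),
    ((0, 8100), (60.0, 29.9)),
    ((11400, 8100), (71.4, 29.9)),
]
coeffs = fit_affine(control)
print(apply_affine(coeffs, 5700, 4050))
```

With more than three control points the fit is overdetermined, so the residuals at each point give you a quick check on how well the tics were placed.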

Screen capture from Illustrator: [image]

After export to DWG, opened in ArcMap: [image]