Convert huge XYZ CSV to GeoTIFF

You can do this using GDAL, it directly supports XYZ format. It doesn't matter if your coordinates are UTM, gdal_translate will output in the same coordinate system.

So to convert to GeoTIFF is as simple as:

gdal_translate test.xyz test.tif

Look at the GeoTIFF doc for output options (such as compression) and the gdal_translate doc for more usage info. In particular, you should specify what the coordinate system is with the -a_srs parameter.

-a_srs srs_def:

Override the projection for the output file. The srs_def may be any of the usual GDAL/OGR forms, complete WKT, PROJ.4, EPSG:n or a file containing the WKT.

gdal_translate -a_srs EPSG:12345 test.xyz test.tif

Comma/space separated and fixed column widths, with and without a header row are supported.

The supported column separators are space, comma, semicolon and tabulations.

$ head -n 2 test_space.xyz 
x y z
146.360047076550984 -39.0631214488636616 0.627969205379486084

$ gdalinfo test_space.xyz
Driver: XYZ/ASCII Gridded XYZ
Files: test_space.xyz
Size is 84, 66
Coordinate System is `'
Origin = (146.359922066953317,-39.062997159090934)
Pixel Size = (0.000250019195332,-0.000248579545455)
Corner Coordinates:
Upper Left  ( 146.3599221, -39.0629972) 
Lower Left  ( 146.3599221, -39.0794034) 
Upper Right ( 146.3809237, -39.0629972) 
Lower Right ( 146.3809237, -39.0794034) 
Center      ( 146.3704229, -39.0712003) 
Band 1 Block=84x1 Type=Float32, ColorInterp=Undefined
  Min=0.336 Max=0.721 

$ head -n 2 test_commas.xyz 
x, y, z
146.360047076550984, -39.0631214488636616, 0.627969205379486084

$ gdalinfo test_commas.xyz
Driver: XYZ/ASCII Gridded XYZ
etc...

$ head -n 2 test_formatted.xyz
x                       y                       z
146.3600471            -39.06312145             0.627969205

$ gdalinfo test_formatted.xyz
Driver: XYZ/ASCII Gridded XYZ
etc...

The only gotchas I'm aware of are:

  1. The opening of a big dataset can be slow as the driver must scan the whole file to determine the dataset size and spatial resolution; and
  2. The file has to be sorted correctly (by Y, then X).

    Cells with same Y coordinates must be placed on consecutive lines. For a same Y coordinate value, the lines in the dataset must be organized by increasing X values. The value of the Y coordinate can increase or decrease however.

    $ head -n 5 test.csv
    x,y,z
    146.3707979,-39.07778764,0.491866767
    146.3787985,-39.07157315,0.614820838
    146.3637974,-39.07132457,0.555555582
    146.3630473,-39.07579901,0.481217861
    
    $ gdalinfo test.csv
    ERROR 1: Ungridded dataset: At line 3, too many stepY values
    gdalinfo failed - unable to open 'test.csv'.
    
    $ tail -n +2 test.csv| sort -n -t ',' -k2 -k1 > test_sorted.xyz
    
    $ head -n 5 test_sorted.xyz 
    146.3600471,-39.07927912,0.606096148
    146.3602971,-39.07927912,0.603663027
    146.3605471,-39.07927912,0.603663027
    146.3607971,-39.07927912,0.589507282
    146.3610472,-39.07927912,0.581049323
    
    $ gdalinfo test_sorted.xyz
    Driver: XYZ/ASCII Gridded XYZ
    etc...
    

Open Data DGM200 of Germany: https://gdz.bkg.bund.de/index.php/default/open-data/digitales-gelandemodell-gitterweite-200-m-dgm200.html

Download the xyz file: https://daten.gdz.bkg.bund.de/produkte/dgm/dgm200/aktuell/dgm200.utm32s.xyzascii.zip

Convert xyz to GeoTIFF file:

gdal_translate -a_srs utm32s.prj dgm200_utm32s.xyz germanyDGM200.tif