How to reproject 500 CSV files efficiently and easily using QGIS?

If you're looking to reproject csv files from the Python Console in QGIS then you could use the following script. All you would need to change are the three paths which are mentioned in the comments.

Essentially, the script imports your csv files into QGIS as shapefiles (assuming your geometric fields are named X and Y). It then uses the qgis:reprojectlayer and qgis:fieldcalculator algorithms from the Processing Toolbox to reproject and update the X and Y fields with the new coordinates. It then saves these in a folder and converts them to csv files in a path you specify. So in the end, you have updated shapefiles and csv files in separate folders.

import glob, os, processing

path_to_csv = "C:/Users/You/Desktop/Testing//"  # Change path to the directory of your csv files
shape_result = "C:/Users/You/Desktop/Testing/Shapefile results//"  # Change path to where you want the shapefiles saved

os.chdir(path_to_csv)  # Sets current directory to path of csv files
for fname in glob.glob("*.csv"):  # Finds each .csv file and applies following actions
        uri = "file:///" + path_to_csv + fname + "?delimiter=%s&crs=epsg:4326&xField=%s&yField=%s" % (",", "x", "y")
        name = fname.replace('.csv', '')
        lyr = QgsVectorLayer(uri, name, 'delimitedtext')
        QgsMapLayerRegistry.instance().addMapLayer(lyr)  # Imports csv files to QGIS canvas (assuming 'X' and 'Y' fields exist)

crs = 'EPSG:32633'  # Set crs
shapefiles = QgsMapLayerRegistry.instance().mapLayers().values()  # Identifies loaded layers before transforming and updating 'X' and 'Y' fields
for shapes in shapefiles:
        outputs_0 = processing.runalg("qgis:reprojectlayer", shapes, crs, None)
        outputs_1 = processing.runalg("qgis:fieldcalculator", outputs_0['OUTPUT'], 'X', 0, 10, 10, False, '$x', None)
        outputs_2 = processing.runalg("qgis:fieldcalculator", outputs_1['OUTPUT_LAYER'], 'Y', 0, 10, 10, False, '$y', shape_result + shapes.name())

os.chdir(shape_result)  # Sets current directory to path of new shapefiles
for layer in glob.glob("*.shp"):  # Finds each .shp file and applies following actions
        new_layer = QgsVectorLayer(layer, os.path.basename(layer), "ogr")
        new_name = layer.replace('.shp', '')
        csvpath = "C:/Users/You/Desktop/Testing/CSV results/" + new_name + ".csv"  # Change path to where you want the csv(s) saved
        QgsVectorFileWriter.writeAsVectorFormat(new_layer, csvpath, 'utf-8', None, "CSV")   

Hope this helps!


A quick solution for transforming a space separated file containing "lon lat" in WGS84 to UTM33N but you don't get any other data:

#!/bin/bash
#
for i in $( ls *.csv ); do
    gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 < ${i} > utm${i}
done

That works and it preserves the order of the data so maybe another loop using e.g. awk to combine the descriptive data with the coordinates?

Edit. Due to the messy comments I made below I'll edit the answer here instead.

The following script should do the job of reading multiple csv files, adding new coordinate columns to each file.

#!/bin/bash
#
for i in $( ls *.csv ); do
 paste -d',' ${i} <(awk -v OFS="," -F " " 'NR>1 {print $1 " " $2}' ${i} | gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 | awk '{gsub(" ",",",$0); print $0}' | /usr/local/bin/sed "1i\X,Y,Z") > utm${i}
#
 #paste -d',' ${i} <(awk -v OFS="," -F " " 'NR>1 {print $1 " " $2}' ${i} | gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 | awk '{gsub(" ",",",$0); print $0}' |sed "1i\X,Y,Z") > utm${i}
#
done

On OSX you will need to install the latest (2009) version of sed and use the first, uncommented line in the loop. For Linux comment out the first and use the second. Adjust the -F " " according to the format of the separator in your csv files e.g. -F "," for comma separated. Also note that the elevation transformation is to the ellipsoid,not the geoid, so be sure to transform the heights accordingly.


Using qgis or even OGR is overkill for this.
Use pyproj (https://pypi.python.org/pypi/pyproj) combined with the python csv writer and a few standard library tricks. You do not need to install anything other than pyproj for this!

import csv
import pyproj
from functools import partial
from os import listdir, path

#Define some constants at the top
#Obviously this could be rewritten as a class with these as parameters

lon = 'lon' #name of longitude field in original files
lat = 'lat' #name of latitude field in original files
f_x = 'x' #name of new x value field in new projected files
f_y = 'y' #name of new y value field in new projected files
in_path = u'D:\\Scripts\\csvtest\\input' #input directory
out_path = u'D:\\Scripts\\csvtest\\output' #output directory
input_projection = 'epsg:4326' #WGS84
output_projecton = 'epsg:32633' #UTM33N

#Get CSVs to reproject from input path
files= [f for f in listdir(in_path) if f.endswith('.csv')]

#Define partial function for use later when reprojecting
project = partial(
    pyproj.transform,
    pyproj.Proj(init=input_projection),
    pyproj.Proj(init=output_projecton))

for csvfile in files:
    #open a writer, appending '_project' onto the base name
    with open(path.join(out_path, csvfile.replace('.csv','_project.csv')), 'wb') as w:
        #open the reader
        with open(path.join( in_path, csvfile), 'rb') as r:
            reader = csv.DictReader(r)
            #Create new fieldnames list from reader
            # replacing lon and lat fields with x and y fields
            fn = [x for x in reader.fieldnames]
            fn[fn.index(lon)] = f_x
            fn[fn.index(lat)] = f_y
            writer = csv.DictWriter(w, fieldnames=fn)
            #Write the output
            writer.writeheader()
            for row in reader:
                x,y = (float(row[lon]), float(row[lat]))
                try:
                    #Add x,y keys and remove lon, lat keys
                    row[f_x], row[f_y] = project(x, y)
                    row.pop(lon, None)
                    row.pop(lat, None)
                    writer.writerow(row)
                except Exception as e:
                    #If coordinates are out of bounds, skip row and print the error
                    print e