Raster reclassify using python, gdal and numpy

Instead of doing the reclassification as a double for loop described by dmh126, do it using np.where:

# reclassification    
lista[np.where( lista < 200 )] = 1
lista[np.where((200 < lista) & (lista < 400)) ] = 2
lista[np.where((400 < lista) & (lista < 600)) ] = 3
lista[np.where((600 < lista) & (lista < 800)) ] = 4
lista[np.where( lista > 800 )] = 5

On an array of 6163 by 3537 pixels (41.6mb) the classification is done in 1.59 seconds, where it takes 12min 41s using the double for loop. In total just a speedup of 478x.

Bottomline, never use a double for loop using numpy


Here's a basic example using rasterio and numpy:

import rasterio as rio
import numpy as np


with rio.open('~/rasterio/tests/data/rgb1.tif') as src:
    # Read the raster into a (rows, cols, depth) array,
    # dstack this into a (depth, rows, cols) array,
    # the sum along the last axis (~= grayscale)
    grey = np.mean(np.dstack(src.read()), axis=2)

    # Read the file profile
    srcprof = src.profile.copy()

classes = 5
# Breaks is an array of the class breaks: [   0.   51.  102.  153.  204.]
breaks = (np.arange(classes) / float(classes)) * grey.max()

# classify the raster
classified = np.sum(np.dstack([(grey < b) for b in breaks]), axis=2).reshape(1, 400, 400).astype(np.int32)

# Update the file opts to one band
srcprof.update(count=1, nodata=None, dtype=classified.dtype)

with rio.open('/tmp/output.tif', 'w', **srcprof) as dst:
    # Write the output
    dst.write(classified)

Here you have a simple python script for reclassification, I wrote it and it works for me:

from osgeo import gdal

driver = gdal.GetDriverByName('GTiff')
file = gdal.Open('/home/user/workspace/raster.tif')
band = file.GetRasterBand(1)
lista = band.ReadAsArray()

# reclassification
for j in  range(file.RasterXSize):
    for i in  range(file.RasterYSize):
        if lista[i,j] < 200:
            lista[i,j] = 1
        elif 200 < lista[i,j] < 400:
            lista[i,j] = 2
        elif 400 < lista[i,j] < 600:
            lista[i,j] = 3
        elif 600 < lista[i,j] < 800:
            lista[i,j] = 4
        else:
            lista[i,j] = 5

# create new file
file2 = driver.Create( 'raster2.tif', file.RasterXSize , file.RasterYSize , 1)
file2.GetRasterBand(1).WriteArray(lista)

# spatial ref system
proj = file.GetProjection()
georef = file.GetGeoTransform()
file2.SetProjection(proj)
file2.SetGeoTransform(georef)
file2.FlushCache()

Just change the ranges.

I hope it will help.