Looping through all raster cell values using GDAL via Python?

You can read it as an array using NumPy:

from osgeo import gdal

src_ds = gdal.Open("INPUT.tif")

print("[ RASTER BAND COUNT ]: ", src_ds.RasterCount)
# GDAL band indices start at 1, not 0
for band_index in range(1, src_ds.RasterCount + 1):
    print("[ GETTING BAND ]: ", band_index)
    srcband = src_ds.GetRasterBand(band_index)

    # GetStatistics(approx_ok, force)
    stats = srcband.GetStatistics(True, True)

    print("[ STATS ] =  Minimum=%.3f, Maximum=%.3f, Mean=%.3f, StdDev=%.3f" % (
        stats[0], stats[1], stats[2], stats[3]))

    # Read the current band (not always band 1) as a NumPy array
    rast_array = srcband.ReadAsArray()
    print(rast_array)

Using a sample raster, the above code will return:

[ RASTER BAND COUNT ]:  1
[ GETTING BAND ]:  1
[ STATS ] =  Minimum=1683.000, Maximum=1900.000, Mean=1820.854, StdDev=59.329
[[1900 1900 1898 1895 1892 1887 1879 1871 1863 1852 1845 1837 1824 1802
  1743 1725 1713 1705 1699 1693 1687 1683]
 [1897 1896 1894 1892 1890 1884 1877 1869 1862 1854 1847 1838 1820 1800
  1745 1729 1719 1712 1706 1701 1696 1695]
 [1892 1891 1890 1888 1885 1881 1875 1868 1861 1855 1849 1837 1817 1794
  1747 1732 1725 1720 1714 1710 1707 1706]
 [1887 1885 1884 1882 1880 1878 1873 1867 1860 1855 1849 1833 1815 1789
  1749 1738 1732 1728 1723 1720 1718 1715]
 [1882 1880 1878 1876 1875 1873 1871 1866 1861 1855 1849 1832 1817 1795
  1756 1744 1740 1737 1733 1730 1728 1725]
 [1880 1877 1874 1873 1870 1868 1867 1865 1860 1855 1850 1841 1834 1817
  1795 1769 1749 1746 1743 1740 1736 1731]
 [1880 1876 1873 1870 1869 1866 1863 1862 1859 1856 1852 1847 1843 1841
  1824 1812 1802 1775 1758 1747 1740 1733]
 [1879 1876 1873 1870 1869 1866 1863 1860 1858 1855 1852 1850 1847 1843
  1831 1819 1803 1782 1763 1747 1738 1730]
 [1879 1877 1874 1872 1869 1866 1864 1861 1858 1855 1852 1850 1850 1848
  1836 1816 1794 1775 1754 1744 1736 1728]
 [1880 1877 1875 1872 1869 1867 1864 1862 1858 1854 1850 1848 1850 1850
  1840 1806 1786 1767 1749 1742 1734 1726]
 [1881 1879 1876 1873 1870 1866 1864 1861 1857 1851 1843 1840 1841 1850
  1827 1797 1782 1769 1752 1742 1733 1723]
 [1882 1879 1876 1873 1870 1867 1864 1861 1855 1848 1839 1835 1833 1836
  1810 1794 1783 1771 1758 1747 1737 1729]
 [1882 1880 1876 1873 1869 1866 1862 1858 1854 1849 1838 1833 1826 1814
  1800 1792 1782 1773 1762 1752 1742 1733]
 [1881 1878 1874 1870 1867 1863 1860 1856 1853 1849 1840 1835 1821 1813
  1798 1790 1783 1774 1766 1757 1748 1738]]

(If you want to print each value separately, the code is easy to adapt.)
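A minimal sketch of visiting every cell together with its row and column, using `np.ndenumerate`. The small array is a hypothetical stand-in for `srcband.ReadAsArray()` (values copied from the top-left corner of the sample output above), so the snippet runs without GDAL; `iter_cells` is an illustrative helper name.

```python
import numpy as np


def iter_cells(array):
    """Yield (row, col, value) for every cell of a 2-D raster array.

    ReadAsArray() returns an array of shape (YSize, XSize), so `row` is the
    line (y) index and `col` is the pixel (x) index.
    """
    for (row, col), value in np.ndenumerate(array):
        yield row, col, value


# Hypothetical stand-in for srcband.ReadAsArray()
rast_array = np.array([[1900, 1900, 1898],
                       [1897, 1896, 1894]])

for row, col, value in iter_cells(rast_array):
    print(row, col, value)
```

A Python-level loop like this handles one cell at a time and gets slow on large rasters; when the per-cell work can be written as a NumPy expression (for example `rast_array > 1850`), that is usually much faster.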


This problem was resolved in the question "GDAL does not ignore NoData value". With GetStatistics:

from osgeo import gdal

f = gdal.Open("a.tif")
bands = f.RasterCount
print(bands)  # prints 3 for this file
for j in range(bands):
    band = f.GetRasterBand(j + 1)
    stats = band.GetStatistics(True, True)
    print("[ STATS ] =  Minimum=%.3f, Maximum=%.3f, Mean=%.3f, StdDev=%.3f" % (stats[0], stats[1], stats[2], stats[3]))

[ STATS ] =  Minimum=17.000, Maximum=255.000, Mean=220.586, StdDev=39.705
[ STATS ] =  Minimum=64.000, Maximum=255.000, Mean=214.975, StdDev=36.926
[ STATS ] =  Minimum=45.000, Maximum=255.000, Mean=179.029, StdDev=68.234

But if you use band.ReadAsArray(), which returns a NumPy array, the values differ:

for j in range(bands):
    band = f.GetRasterBand(j + 1)
    data = band.ReadAsArray()
    print("[ Numpy ] =  Minimum=%.3f, Maximum=%.3f, Mean=%.3f, StdDev=%.3f" % (data.min(), data.max(), data.mean(), data.std()))

[ Numpy ] =  Minimum=0.000, Maximum=255.000, Mean=220.477, StdDev=42.584
[ Numpy ] =  Minimum=31.000, Maximum=255.000, Mean=214.955, StdDev=39.558
[ Numpy ] =  Minimum=0.000, Maximum=255.000, Mean=178.856, StdDev=69.535

Why the difference? The explanation given in "GDAL does not ignore NoData value":

GetStatistics reuses previously computed statistics if they exist (i.e., statistics computed before you set the NoData value). Use stats = band.ComputeStatistics(0) instead of GetStatistics to force the statistics to be recomputed. The argument is approx_ok, so 0 (False) asks for exact statistics over all pixels rather than an approximation.

for j in range(bands):
    band = f.GetRasterBand(j + 1)
    # approx_ok=0: recompute exact statistics instead of reusing cached ones
    stats = band.ComputeStatistics(0)
    print("[ STATS ] =  Minimum=%.3f, Maximum=%.3f, Mean=%.3f, StdDev=%.3f" % (stats[0], stats[1], stats[2], stats[3]))

[ STATS ] =  Minimum=0.000, Maximum=255.000, Mean=220.477, StdDev=42.584
[ STATS ] =  Minimum=31.000, Maximum=255.000, Mean=214.955, StdDev=39.558
[ STATS ] =  Minimum=0.000, Maximum=255.000, Mean=178.856, StdDev=69.535
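ReadAsArray() returns every pixel, NoData cells included, so plain NumPy statistics count them too. A minimal sketch of NumPy statistics that skip NoData, assuming the NoData value comes from band.GetNoDataValue() (which returns None when none is set). The helper name `stats_ignoring_nodata` and the small test array are hypothetical. It uses NumPy's default population standard deviation (ddof=0), which is consistent with the matching StdDev figures above.

```python
import numpy as np


def stats_ignoring_nodata(data, nodata):
    """Return (min, max, mean, std) of a band array, skipping NoData cells.

    `nodata` would normally be band.GetNoDataValue(); None means "no NoData".
    Raises ValueError if every cell is NoData (the valid array is empty).
    """
    if nodata is None:
        valid = data.ravel()
    elif np.isnan(nodata):
        # NaN never compares equal to itself, so test with isnan instead
        valid = data[~np.isnan(data)]
    else:
        valid = data[data != nodata]
    return valid.min(), valid.max(), valid.mean(), valid.std()


# Hypothetical stand-in for band.ReadAsArray() with NoData = 0
data = np.array([[0, 10, 20],
                 [30, 0, 40]], dtype=np.uint8)
mn, mx, mean, std = stats_ignoring_nodata(data, 0.0)
print("[ Numpy ] =  Minimum=%.3f, Maximum=%.3f, Mean=%.3f, StdDev=%.3f" % (mn, mx, mean, std))
```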

...Or you could convert it to an ESRI ASCII raster and achieve effectively the same result in much less time.

Here's an example of an ASCII raster from the documentation:

ncols 480
nrows 450
xllcorner 378923
yllcorner 4072345
cellsize 30
nodata_value -32768

43 2 45 7 3 56 2 5 23 65 34 6 32 54 57 34 2 2 54 6 
35 45 65 34 2 6 78 4 2 6 89 3 2 7 45 23 5 8 4 1 62 ...

GDAL can do the conversion very quickly (for example with gdal_translate -of AAIGrid); you can then read the file or do whatever else is required. There is nothing wrong with the other answer; I only suggest this because I find NumPy very slow for cell-by-cell operations.
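A minimal pure-Python sketch of reading such a file and visiting every cell. It assumes the header layout shown above (each header line starts with a keyword beginning with a letter, and no data value does, so a grid containing literal "nan" values would need extra handling) and that the cell values are whitespace-separated in row order. `read_ascii_grid` and the tiny sample grid are hypothetical, for illustration only.

```python
def read_ascii_grid(text):
    """Parse ESRI ASCII grid text into (header dict, list of rows)."""
    header = {}
    values = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0][0].isalpha():
            # Header line such as "ncols 480" or "nodata_value -32768"
            header[parts[0].lower()] = float(parts[1])
        else:
            # Data line: values may wrap, so collect them into one flat list
            values.extend(float(v) for v in parts)
    ncols = int(header["ncols"])
    nrows = int(header["nrows"])
    if len(values) != ncols * nrows:
        raise ValueError("expected %d values, found %d" % (ncols * nrows, len(values)))
    # Split the flat list into rows of ncols values (top row first)
    rows = [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, rows


# Hypothetical 3 x 2 grid using the header keywords from the example above
SAMPLE = """ncols 3
nrows 2
xllcorner 378923
yllcorner 4072345
cellsize 30
nodata_value -32768
43 2 45
7 -32768 56
"""

header, rows = read_ascii_grid(SAMPLE)
nodata = header.get("nodata_value")  # None if the file has no NoData line

valid_cells = []
for r, row in enumerate(rows):
    for c, value in enumerate(row):
        if value != nodata:  # skip NoData cells
            valid_cells.append((r, c, value))
print(valid_cells)
```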
