Resample xarray object to lower resolution spatially

Update

@clausmichele's answer using coarsen is now the best way to do this. Note that coarsen now includes the ability to specify desired output coordinates.
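
For example, the coord_func argument lets you choose how the coordinate values of each window are reduced; a minimal sketch, reusing the output_ds built further down in this post ('mean' is the default, 'min' keeps the left edge of each window):

# average the data over 6x6 windows, but label each window with its left edge
coarse = output_ds.coarsen(x=6, y=6, coord_func='min').mean()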

Original post

As piman314 suggests, groupby is the only way to do this in xarray. Resample can only be used for datetime coordinates.

Since xarray currently does not handle multidimensional groupby, this has to be done in two stages:

# this results in bin centers on 100, 300, ...
reduced = (
    output_ds
    .groupby(((output_ds.x//200) + 0.5) * 200)
    .mean(dim='x')
    .groupby(((output_ds.y//200) + 0.5) * 200)
    .mean(dim='y'))

If you simply want to downsample your data, you can use positional slicing (note that positional slicing only works on a DataArray such as output_da, not on a Dataset):

output_da[:, ::200, ::200]

or, using named dims:

output_ds[{'x': slice(None, None, 200), 'y': slice(None, None, 200)}]

Finally, there are other packages out there that are specifically designed for fast regridding compatible with xarray. xESMF is a good one.
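
As a rough xESMF sketch (assuming xesmf is installed; the target grid spacing below is arbitrary, and the latitude/longitude variables from the dataset in the next answer are promoted to coordinates named lat/lon, which is what xESMF expects):

import numpy as np
import xarray as xr
import xesmf as xe

# arbitrary regular target grid covering roughly the same area at 0.05 degrees
target = xr.Dataset({
    'lat': (['lat'], np.arange(40, 50, 0.05)),
    'lon': (['lon'], np.arange(0, 15.56, 0.05)),
})

# xESMF looks for coordinates named lat/lon, so promote and rename them
ds_in = output_ds.set_coords(['latitude', 'longitude']).rename(
    {'latitude': 'lat', 'longitude': 'lon'})

regridder = xe.Regridder(ds_in, target, 'bilinear')  # build the weights once
ds_out = regridder(ds_in)                            # regrid the whole dataset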


Recently the coarsen method has been added to xarray, and I think it's the best way for spatial downsampling, even though it's not possible to set a desired final resolution and have the window size computed automatically. Coarsen will perform an operation (mean, max, min, etc.) over non-overlapping windows, and depending on the window size you set you will get your desired final resolution.

Original input data from the author:

import pandas as pd
import numpy as np
import xarray as xr


time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)


latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)

BHR_SW = np.ones((365, 1200, 1200))

output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})

output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)


<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56

To reduce the spatial resolution from 1200x1200 to 200x200 with the coarsen method, we need 6x6 windows:

output_ds.coarsen(x=6).mean().coarsen(y=6).mean()
# or output_ds.coarsen(x=6,y=6).mean()
<xarray.Dataset>
Dimensions:    (time: 365, x: 200, y: 200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
  * x          (x) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.02 40.07 40.12 40.17 ... 49.88 49.93 49.98
    longitude  (y, x) float64 0.03244 0.03244 0.03244 ... 15.52 15.52 15.52
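
If you would rather start from a desired output size than from a window size, you can compute the window yourself. A small sketch (it assumes the dimension length is evenly divisible by the target size; otherwise pass e.g. boundary='trim' to coarsen):

target_size = 200
window = output_ds.sizes['x'] // target_size  # 1200 // 200 = 6

output_ds.coarsen(x=window, y=window).mean()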

To do it using xarray, the most obvious way is to use groupby_bins; however, it turns out this is incredibly slow. It's probably much more efficient to drop into numpy and use superfast indexing ([:, :, frequency]).

nsamples = 200
# 200 evenly spaced bin edges along x; .first() keeps the first value in each bin
bins = np.linspace(output_ds.x.min(),
                   output_ds.x.max(), nsamples).astype(int)
output_ds = output_ds.groupby_bins('x', bins).first()
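
The same "every n-th point" idea can also be written directly in xarray with a strided isel, which avoids groupby_bins entirely while keeping the coordinates attached (a small sketch, using a stride of 6 to go from 1200 to 200 points per dimension):

stride = 6  # 1200 / 200

# plain subsampling: take every 6th point along x and y, no averaging
subsampled = output_ds.isel(x=slice(None, None, stride),
                            y=slice(None, None, stride))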

As you are using a NetCDF file which was already manipulated with CDO, you could also use either CDO's SAMPLEGRID function or NCO's bilinear_interp function:

SAMPLEGRID (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) does not interpolate; it just removes every n-th grid point.

bilinear_interp (http://nco.sourceforge.net/nco.html#Bilinear-interpolation) does interpolation.

As you probably want mean, max, or similar aggregated albedo values, you would probably prefer NCO's bilinear_interp. But CDO's SAMPLEGRID can give you the grid_out you need for NCO's bilinear_interp.