How do I compute a weighted moving average using pandas?

With pandas, you can calculate a weighted moving average (WMA) by combining .rolling() with .apply().

Here's an example with 3 weights and window=3:

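import numpy as np
import pandas as pd

data = {'colA': np.random.randint(1, 6, 10)}
df = pd.DataFrame(data)

weights = np.array([0.5, 0.25, 0.25])
sum_weights = np.sum(weights)

# raw=True passes each window as a NumPy array (faster than a Series);
# weights[0] multiplies the oldest value in the window, weights[-1] the newest
df['weighted_ma'] = (df['colA']
    .rolling(window=3, center=True)
    .apply(lambda x: np.sum(weights * x) / sum_weights, raw=True)
)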


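Note that I passed center=True to .rolling(), which labels each window at its middle row, so the window for a row includes the row after it.
Check whether this suits your use case or whether you need center=False (the default), where each window ends at the current row.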
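To see the difference, here is a small sketch on a fixed, made-up series (the values and the helper name weighted are illustrative only). It computes the same weighted window both ways: the centered result is simply the trailing result shifted up by one row, which means the centered version uses the next row's value.

```python
import numpy as np
import pandas as pd

# Small fixed series so the alignment is easy to inspect
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
weights = np.array([0.5, 0.25, 0.25])  # weights[0] -> oldest value in the window

def weighted(x):
    # x is a NumPy array of length 3 because raw=True
    return np.dot(weights, x) / weights.sum()

trailing = s.rolling(window=3, center=False).apply(weighted, raw=True)
centered = s.rolling(window=3, center=True).apply(weighted, raw=True)

print(pd.DataFrame({'value': s, 'center=False': trailing, 'center=True': centered}))
```

With center=False the first two rows are NaN; with center=True the first and last rows are NaN.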


Construct a kernel with the weights, and apply it to your series using numpy.convolve.

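import pandas as pd
import numpy as np

def wma(arr, period):
    # Linear weights period, period-1, ..., 1 (the first entry applies to the newest value)
    kernel = np.arange(period, 0, -1)
    # Normalize so the weights sum to 1.0, then left-pad with period - 1 zeros so that
    # in 'same' mode only the current and past values contribute.
    # Assumes len(arr) >= 2 * period - 1, so the 'same' output has len(arr) elements.
    kernel = np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])
    return np.convolve(arr, kernel, 'same')

df = pd.DataFrame({'value': np.arange(11)})
df['wma'] = wma(df['value'], 4)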

Here I am interpreting WMA according to this page: https://en.wikipedia.org/wiki/Moving_average

For this type of WMA, the weights are a linear range of n values (n, n-1, ..., 1, with the largest weight on the most recent value), normalized so they add up to 1.0.
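Written out, that weighting is the following (with x_t the newest value in the window). For n = 4 the weights are 4, 3, 2, 1 divided by 10, i.e. 0.4, 0.3, 0.2, 0.1, which is what the kernel in wma() contains:

```latex
\mathrm{WMA}_t
  = \frac{n\,x_t + (n-1)\,x_{t-1} + \cdots + 1\cdot x_{t-n+1}}{n + (n-1) + \cdots + 1}
  = \frac{2}{n(n+1)} \sum_{k=0}^{n-1} (n-k)\, x_{t-k}
```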

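Note that I pad the front of the kernel with zeros. Because 'same' mode centers the kernel on each output point, the padding shifts all the nonzero weights onto the current and past values. This gives a 'one-sided' window function, so 'future' values in the time series do not affect the moving average.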
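One way to sanity-check the convolution is to compare it with an explicit trailing weighted sum computed via .rolling().apply() (the column name check is just for illustration). This is a sketch under the same setup as above: from row period - 1 onward the two agree. The first period - 1 rows of wma are computed against the implicit zeros before the series starts, so they come out biased toward zero (roughly 0.0, 0.4 and 1.1 here), while the rolling version leaves them as NaN.

```python
import numpy as np
import pandas as pd

def wma(arr, period):
    # Same kernel as above: linear weights, newest value weighted most, left-padded with zeros
    kernel = np.arange(period, 0, -1)
    kernel = np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])
    return np.convolve(arr, kernel, 'same')

period = 4
df = pd.DataFrame({'value': np.arange(11)})
df['wma'] = wma(df['value'], period)

# Reference: explicit weighted sum over each trailing window, weights 1..period (oldest..newest)
w = np.arange(1, period + 1)
df['check'] = df['value'].rolling(period).apply(lambda x: np.dot(w, x) / w.sum(), raw=True)

print(df)
```

If the partial leading values are unwanted, you can set df['wma'].iloc[:period - 1] to NaN.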

numpy.convolve is vectorized, so it is much faster than .apply(), which calls a Python function once per window. You can also use numpy.correlate if you reverse the kernel.
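A quick sketch of the numpy.correlate variant: correlation does not flip its second argument, so passing the reversed kernel gives the same result as convolving with the original one. The helper name linear_kernel is hypothetical; it just rebuilds the padded kernel from wma() above.

```python
import numpy as np

def linear_kernel(period):
    # Padded kernel from wma(): period - 1 zeros, then weights period..1, normalized
    k = np.arange(period, 0, -1)
    return np.concatenate([np.zeros(period - 1), k / k.sum()])

arr = np.arange(11, dtype=float)
kernel = linear_kernel(4)

via_convolve = np.convolve(arr, kernel, 'same')
# correlate does not flip its second argument, so flip the kernel ourselves
via_correlate = np.correlate(arr, kernel[::-1], 'same')

print(np.allclose(via_convolve, via_correlate))
```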


If data is a pandas DataFrame or Series and you want a single WMA over all of its rows (one value per column for a DataFrame), you can compute it with

wma = data[::-1].cumsum().sum() * 2 / data.shape[0] / (data.shape[0] + 1)
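Why this works: summing the cumulative sums of the reversed data counts each row once for every cumulative sum it appears in, which gives row i (0-based, oldest first) a weight of i + 1, so the newest row gets weight n. Dividing by n(n + 1)/2 normalizes the weights. A small check against an explicit weighted sum, using made-up sample values:

```python
import numpy as np
import pandas as pd

data = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])
n = data.shape[0]

# Reversed cumulative-sum trick: row i ends up with weight i + 1 (newest row gets weight n)
trick = data[::-1].cumsum().sum() * 2 / n / (n + 1)

# Explicit version with weights 1..n (oldest..newest)
w = np.arange(1, n + 1)
explicit = np.dot(w, data.to_numpy()) / w.sum()

print(trick, explicit)
```

Both equal 46 / 15 here: 1*3 + 2*1 + 3*4 + 4*1 + 5*5 = 46, and 1 + 2 + 3 + 4 + 5 = 15.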

If you want a rolling WMA of window length n, use

data.rolling(n).apply(lambda x: x[::-1].cumsum().sum() * 2 / n / (n + 1))

This works because, inside the lambda, n equals x.shape[0] (the window length). Note that this solution may be somewhat slower than the one by Sander van den Oord, but you don't have to specify the weights explicitly.
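If .apply() is too slow, one vectorized alternative (a different technique from the answers above, and an assumption that NumPy 1.20 or newer is available) is numpy.lib.stride_tricks.sliding_window_view, which exposes every trailing window as a row of a 2-D view so the WMA becomes a single matrix-vector product. The function name rolling_wma is hypothetical; this is a sketch, not a pandas API.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_wma(s: pd.Series, n: int) -> pd.Series:
    """Trailing linear WMA: weights 1..n (oldest..newest), normalized to sum to 1."""
    w = np.arange(1, n + 1) / (n * (n + 1) / 2)
    values = s.to_numpy(dtype=float)
    # Each row of `windows` is one trailing window of length n (a read-only view, no copy)
    windows = sliding_window_view(values, n)
    out = np.full(values.shape, np.nan)  # first n - 1 rows have no full window
    out[n - 1:] = windows @ w
    return pd.Series(out, index=s.index)

s = pd.Series(np.arange(11, dtype=float))
fast = rolling_wma(s, 4)
slow = s.rolling(4).apply(lambda x: x[::-1].cumsum().sum() * 2 / 4 / (4 + 1), raw=True)
print(pd.DataFrame({'fast': fast, 'slow': slow}))
```

sliding_window_view raises a ValueError if the series is shorter than n, so very short inputs need a guard.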


No, pandas has no implementation of that exact algorithm. I created a GitHub issue about it here:

https://github.com/pydata/pandas/issues/886

I'd be happy to take a pull request for this; the implementation should be straightforward Cython code and could be integrated into pandas.stats.moments.

Tags:

Python

Pandas