Reducing noise on Data

It depends how you define the "noise" and how it is caused. Since you didn't provide much information about your case, I'll take your question as "how to make the curve smooth". Kalman filter can do this, but it's too complex, I'd prefer simple IIR filter

import matplotlib.pyplot as plt

mu, sigma = 0, 500

x = np.arange(1, 100, 0.1)  # x axis
z = np.random.normal(mu, sigma, len(x))  # noise
y = x ** 2 + z # data
plt.plot(x, y, linewidth=2, linestyle="-", c="b")  # it include some noise

enter image description here

After filter

from scipy.signal import lfilter

n = 15  # the larger n is, the smoother curve will be
b = [1.0 / n] * n
a = 1
yy = lfilter(b,a,y)
plt.plot(x, yy, linewidth=2, linestyle="-", c="b")  # smooth by filter

enter image description here

lfilter is a function from scipy.signal.

By the way, if you do want to use Kalman filter for smoothing, scipy also provides an example. Kalman filter should also work on this case, just not so necessary.


If you are dealing with timeseries I suggest you tsmoothie: A python library for timeseries smoothing and outlier detection in a vectorized way.

It provides different smoothing algorithms together with the possibility to computes intervals.

Here I use a ConvolutionSmoother but you can also test it others. (Also KalmanSmoother is available)

import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.smoother import *

mu, sigma = 0, 500
x = np.arange(1, 100, 0.1)  # x axis
z = np.random.normal(mu, sigma, len(x))  # noise
y = x ** 2 + z # data

# operate smoothing
smoother = ConvolutionSmoother(window_len=30, window_type='ones')
smoother.smooth(y)

# generate intervals
low, up = smoother.get_intervals('sigma_interval', n_sigma=3)

# plot the smoothed timeseries with intervals
plt.figure(figsize=(11,6))
plt.plot(smoother.data[0], color='orange')
plt.plot(smoother.smooth_data[0], linewidth=3, color='blue')
plt.fill_between(range(len(smoother.data[0])), low[0], up[0], alpha=0.3)

enter image description here

I point out also that tsmoothie can carry out the smoothing of multiple timeseries in a vectorized way


Depending on how much you like to remove the noise, you can also use the Savitzky-Golay filter from scipy.

The following takes the example from @lyken-syu:

import matplotlib.pyplot as plt
import numpy as np
mu, sigma = 0, 500
x = np.arange(1, 100, 0.1)  # x axis
z = np.random.normal(mu, sigma, len(x))  # noise
y = x ** 2 + z # data
plt.plot(x, y, linewidth=2, linestyle="-", c="b")  # it include some noise

enter image description here

and applies the Savitzky-Golay filter

from scipy.signal import savgol_filter
w = savgol_filter(y, 101, 2)
plt.plot(x, w, 'b')  # high frequency noise removed

window_length = 101

Increasing the window_length to 501:

window_length = 501

Read more about the filter here