Getting rid of "glitches" in DiscreteWaveletTransform-de-noised data

I do not think that this is a bug. It is quiet normal to get "glitches" during the IDWT of thresholded wavelet coefficients. In some cases it is obvious from the structure of the data, in other cases the overall relation of all the data points, as an ensemble, can give raise to glitch in a not so obvious (i.e. visual) way.

Let’s see the use case from DMWood

If we plot the original curve together with the smooth reconstructed one we can understand what has happened.

ListPlot[{eg, Wavel[eg]}, Joined -> True, PlotRange -> {{200, 250}, All}]

enter image description here

We can see that the glitch, at 236 time units, is induced by the shape of the data around this area. That means that maybe one or more coefficients overfit(s) the area of the curve at the time when the glitch occurs.

1st approach

Let's break down the process:

symWavlet =DiscreteWaveletTransform[eg[[All, 2]], SymletWavelet[7], 6];
symWavletThreshold = WaveletThreshold[symWavlet];
WaveletListPlot[symWavletThreshold,DataRange -> {eg[[1, 1]], eg[[-1, 1]]},ImageSize -> 500,Ticks -> Full]

enter image description here

Thus, the coefficient {0,0,1} of the thresholded wavelength at 236 time units is not smoothed around this area, since the wavelet symWavlet overfitted the original curve for this coefficient

WaveletListPlot[symWavlet, DataRange -> {eg[[1, 1]], eg[[-1, 1]]},ImageSize -> 500, Ticks -> Full]

enter image description here

This is caused from the default threshold settings of the WaveletThreshold command

symWavletThreshold["ThresholdTable"]

\begin{array}{l|cc} \text{Wavelet Index} & \text{Threshold Value} & \\ \hline \{1\} & 27.8544 & \text{} \\ \{0,1\} & 27.8544 & \text{} \\ \{0,0,1\} & 27.8544 & \text{} \\ \{0,0,0,1\} & 27.8544 & \text{} \\ \{0,0,0,0,1\} & 27.8544 & \text{} \\ \{0,0,0,0,0,1\} & 27.8544 & \text{} \\ \end{array}

One solution is to change the threshold for the individual coefficient leaving all the others unchanged (I will set the threshold 4 times its standard deviation):

thrLim[coeff_, {1}] := 27.854
thrLim[coeff_, {0, 1}] :=  27.854
thrLim[coeff_, {0, 0, 1}] := 4 StandardDeviation[coeff]
thrLim[coeff_, {0, 0, 0, 1}] :=  27.854
thrLim[coeff_, {0, 0, 0, 0, 1}] :=  27.854
thrLim[coeff_, {0, 0, 0, 0, 0, 1}] :=  27.854
thrLim[coeff_, ___] := 0.0

With these settings a estimate the new thresholded wavelet:

symWavletThresholdNew=WaveletThreshold[any, {"Soft", thrLim}, Automatic];
symWavletThresholdNew["ThresholdTable"]

\begin{array}{l|cc} \text{Wavelet Index} & \text{Threshold Value} & \\ \hline \{1\} & 27.854 & \text{} \\ \{0,1\} & 27.854 & \text{} \\ \{0,0,1\} & 29.6791 & \text{} \\ \{0,0,0,1\} & 27.854 & \text{} \\ \{0,0,0,0,1\} & 27.854 & \text{} \\ \{0,0,0,0,0,0\} & 0. & \text{} \\ \{0,0,0,0,0,1\} & 27.854 & \text{} \\ \end{array} Yielding no glitch for {0,0,1}

WaveletListPlot[symWavletThresholdNew, DataRange -> {eg[[1, 1]], eg[[-1, 1]]}, ImageSize -> 500,Ticks -> Full]

enter image description here

And, as expected, no glitch in the final reconstructed curve

ListPlot[Transpose[{eg[[All, 1]],InverseWaveletTransform[symWavletThresholdNew]}], Joined -> True]

enter image description here

2nd approach

We can find a total threshold for the overall signal in which a portion of the data is below a fixed value.

alternativeTransform =WaveletThreshold[transform, {"Soft",Abs[FindThreshold[#, Method -> {"BlackFraction", 10^-4}]] &}];

Yielding

ListPlot[Transpose[{eg[[All, 1]],InverseWaveletTransform[alternativeTransform]}], Joined -> True]

enter image description here

Final comments For the use case presented by flinty: The glitch occurs at position 736

k = wiv@w[dat]; Position[k, Min[k]]

The original data set looks like this

ListPlot[dat]

enter image description here

and has a minimum at the same position i.e. 736

l=Position[dat, Min[dat]]

with an actual value 26.3233

dat[[l[[1, 1]]]]

Let's visualise it:

ListPlot[dat, 
 Epilog -> Circle[{Position[k, Min[k]][[1, 1]], Min[dat]}, {10, 1}]]

enter image description here

This is big deviation from the parent distribution. The probability of getting a value equal or smaller than this is quiet low (~$1.1\times10^{-6}$)

Probability[x <= Min[dat], x \[Distributed] NormalDistribution[50, 5]]

This point is causing the glitch and by bringing it closer to the other points the feature disappears (you can play with the Manipulate)

w[in_] := DiscreteWaveletTransform[in, DaubechiesWavelet[5], 5];
wiv[in_] := InverseWaveletTransform@WaveletThreshold@in; Manipulate[
 SeedRandom[123456]; 
 Module[{dat = RandomVariate[NormalDistribution[50, 5], 1000], l, 
   newPoint}, l = Position[dat, Min[dat]][[1, 1]]; 
  newPoint = ReplacePart[dat, l -> dat[[l]]*i]; 
  GraphicsRow[{ListPlot[newPoint, ImageSize -> 600, 
     PlotRange -> {10, 100}, 
     Epilog -> Circle[{l, dat[[l]]*i}, {10, 1.5}]], 
    ListPlot[wiv@w[newPoint], Joined -> True, 
     PlotRange -> {0, 70}]}]], {{i, 1, "Multiplication factor"}, 1, 4,
   0.1}]

As flinty mentions, by dropping some values from the original data set one might end up with no glitches due to the way that all the data interact, even a single point e.g.

    ListPlot[wiv@w[Delete[dat, {23}]], Joined -> True, 
 PlotRange -> {0, 70}]

enter image description here

Also for the other use case with the SeedRandom[1234567] the same occurs since the glitch occurs exactly where the maximum of the data set occurs. The value of the maximum is quiet big 74.498 (probability $4.8\times10^{-7}$).

In both cases this sudden changes in the original data are quiet big and thus the resulting wavelengths overfit the signal around these areas. Same techniques as the ones discussed above can be applied to overcome the overfitting and thus the glitches in the resulting reconstructed signal.


Update 30/05/2020: this answer explains the motivation for thinking it's a bug - but see @demm's answer. There are good reasons the spikes appear and are not thresholded out by default. I'm leaving this answer up as @demm's own answer references it.

Unless somebody can explain otherwise, I think it's a bug. Take the following example containing a spike around x=735:

SeedRandom[123456];
w[in_] := DiscreteWaveletTransform[in, DaubechiesWavelet[5], 4];
wiv[in_] := InverseWaveletTransform@WaveletThreshold@in
dat = RandomVariate[NormalDistribution[50, 5], 1000];
ListPlot[wiv@w[dat], Joined -> True, PlotRange -> {0, 70}]

wavelet thresholded with spike

However, if we remove the first 12 data points the spike disappears:

ListPlot[wiv@w[dat[[12 ;;]]], Joined -> True, PlotRange -> {0, 70}]

wavelet thresholded data trimmed without spike

As far as I can tell, no changes to padding or wavelet size remove these spikes in general and they're always likely to arise with random data like this. I will submit this to Wolfram Support.

You could try GaussianFilter or TotalVariationFilter on your data if you are happy to do the denoising without wavelets.

Another example with a different wavelet occuring with a different seed:

SeedRandom[1234567];
w[in_] := DiscreteWaveletTransform[in, HaarWavelet[], 4]; 
wiv[in_] := InverseWaveletTransform@WaveletThreshold@in
dat = RandomVariate[NormalDistribution[50, 5], 1000];
ListPlot[wiv@w[dat], Joined -> True, PlotRange -> {0, 70}]

haar wavelet threshold with spike