Padding time-series subsequences for LSTM-RNN training

The most powerful attribute of LSTMs and RNNs in general is that their parameters are shared along the time frames(Parameters recur over time frames) but the parameter sharing relies upon the assumption that the same parameters can be used for different time steps i.e. the relationship between the previous time step and the next time step does not depend on t as explained here in page 388, 2nd paragraph.

In short, padding zeros at the end, theoretically should not change the accuracy of the model. I used the adverb theoretically because at each time step LSTM's decision depends on its cell state among other factors and this cell state is kind of a short summary of the past frames. As far as I understood, that past frames may be missing in your case. I think what you have here is a little trade-off.

I would rather pad zeros at the end because it doesn't completely conflict with the underlying assumption of RNNs and it's more convenient to implement and keep track of.

On the implementation side, I know tensorflow calculates the loss function once you give it the sequences and the actual sequence size of each sample(e.g. for 4 5 6 7 0 0 0 0 0 0 you also need to give it the actual size which is 4 here) assuming you're implementing the option 2. I don't know whether there is an implementation for option 1, though.


Better go for padding zeroes in the beginning, as this paper suggests Effects of padding on LSTMs and CNNs,

Though post padding model peaked it’s efficiency at 6 epochs and started to overfit after that, it’s accuracy is way less than pre-padding.

Check table 1, where the accuracy of pre-padding(padding zeroes in the beginning) is around 80%, but for post-padding(padding zeroes in the end), it is only around 50%