Recurrent neural networks in 11.1: explicit examples?

Here is a simple example that may help you get started. In this example, we are going to predict a simple time series: a sinusoid.

data = Table[Sin[x], {x, 0, 100, 0.04}];

(plot of the sinusoid data)

We cut the data into overlapping windows of 51 data points. The first 50 points of each window form the input X, and the last point is the output Y.

(* each example: the first 50 points of a window (as length-1 vectors) -> the 51st point *)
training = 
  RandomSample[
   List /@ Most[#] -> List@Last[#] & /@ (Partition[data, 51, 1])];
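
As a quick sanity check (using the definitions above), each training example should map a 50×1 input to a length-1 output:

Dimensions /@ First[training]
(* {50, 1} -> {1} *)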

We use a single gated recurrent layer in our neural network

net = NetChain[{
   GatedRecurrentLayer[10],
   LinearLayer[1]}, "Input" -> {50, 1}, "Output" -> 1
  ]

and train with the training data

trained = NetTrain[net, training]
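
NetTrain chooses the number of training rounds and the optimizer automatically; if you want more control, you can pass options explicitly (the values below are only illustrative, not the ones used here):

trained = NetTrain[net, training, MaxTrainingRounds -> 100, 
  Method -> {"ADAM", "InitialLearningRate" -> 0.001}]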

After training, we can use the net to predict the time series. We first feed it a window of 50 data points and then repeatedly feed its own output back in to generate the next point. Here is a comparison between the ground truth and our predictions, which shows very good agreement.

ListPlot[{Flatten@
   NestList[Append[Rest[#], trained[#]] &, 
     List /@ Sin[Range[-49*0.04, 0, 0.04]], 500][[All, -1]], 
  Table[Sin[x], {x, 0, 500*0.04, 0.04}]}, Joined -> True, 
 PlotLegends -> {"predicted", "ground truth"}]

(plot: predicted vs. ground truth)
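
If you prefer the feedback loop spelled out step by step, an equivalent sketch (using the trained net from above) is:

window = List /@ Sin[Range[-49*0.04, 0, 0.04]];  (* last 50 known points, as length-1 vectors *)
predicted = {};
Do[
 next = trained[window];               (* a length-1 vector {y} *)
 AppendTo[predicted, First[next]];
 window = Append[Rest[window], next],  (* slide the window forward *)
 {500}]
ListLinePlot[predicted]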


Taking inspiration from the answer by xslittlegrass, I came up with the following solution.

Recall the sample data from this question. We have an observable obs that we want to predict:

(plot of obs)

and three parameters par1, par2, par3 that are correlated with the observable to some extent:

(plots of par1, par2, par3)

We use only the first 700 time steps of data to train the model and then try to predict the following 300 time steps.

We create a training set in which each input is a run of tlen consecutive data points (each of length featn), and the corresponding output is the data point that immediately follows (also of length featn). Then we train a model that returns featn outputs:

dat = Transpose[{par1/Max[par1], par2/Max[par2], par3/Max[par3], obs/Max[obs]}];
tlen = 300; featn = Length[dat[[1]]];
training = Table[dat[[i ;; i + tlen - 1]] -> dat[[i + tlen]], {i, 1, Length[dat] - tlen}];
net = NetChain[{
    GatedRecurrentLayer[tlen, 
     "Dropout" -> {"VariationalInput" -> 0.1, "VariationalState" -> 0.5}],
    LinearLayer[featn]},
   "Input" -> {tlen, featn}, "Output" -> {featn}]
trained = NetTrain[net, training, Method -> {"ADAM", "InitialLearningRate" -> 0.0001}]
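
A quick shape check (using the definitions above) confirms that each example matches the declared "Input" -> {tlen, featn} and "Output" -> {featn}:

Dimensions /@ First[training]
(* {300, 4} -> {4} *)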

The training takes about two minutes. Finally, we can iteratively predict the next 300 steps:

datt = dat;
Dynamic[i]  (* monitor progress of the loop *)
Do[
 start = datt[[-tlen ;;]];        (* the last tlen points seen so far *)
 AppendTo[datt, trained[start]];  (* predict the next point and append it *)
 , {i, 1, 300}]
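
Note that the predictions in datt are on the normalized scale; to put the predicted observable back on its original scale (it was divided by Max[obs] above), rescale the fourth column:

predObs = Max[obs]*datt[[-300 ;;, 4]];  (* the 300 predicted values of obs *)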

Amazingly, the prediction is qualitatively correct, with amplitude deviation growing to about 15% over the course of 300 time steps!

ListPlot[{datt[[;; , 4]], tab}]

(plot: predicted vs. reference observable)

Any suggestions for how to improve upon the above?


For instance, let's assume you have sequences of length 3 and 8 input variables (features) in X.

Let Y be the output, with a value of "yes" or "no" for each sequence of X.

Let X have 195 rows, so its dimensions are 195 × 8.

You split X into sequences of 3:

Xpartition = Partition[X, 3]

Now, you create your trainingData (Y must contain one label per sequence, 65 in this case):

trainingData = MapThread[Rule, {Xpartition, Y}]

You build your model:

net = NetChain[
    { LongShortTermMemoryLayer[32]
    , SequenceLastLayer[]
    , LinearLayer[2]
    , SoftmaxLayer[]
    }
  , "Input" -> {3, 8}
  , "Output" -> NetDecoder[{"Class", {"no", "yes"}}]
]

where 3 is the number of vectors in each sequence and 8 is the length of each vector.
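
To see the whole pipeline end to end, here is a minimal sketch that trains and applies the net defined above, using synthetic placeholder data (the random X and Y below are stand-ins, not data from the original question):

X = RandomReal[1, {195, 8}];                           (* 195 observations, 8 features *)
Xpartition = Partition[X, 3];                          (* 65 sequences of 3 vectors *)
Y = RandomChoice[{"yes", "no"}, Length[Xpartition]];   (* one placeholder label per sequence *)
trainingData = MapThread[Rule, {Xpartition, Y}];

trained = NetTrain[net, trainingData, MaxTrainingRounds -> 20];
trained[Xpartition[[1]]]                   (* -> "yes" or "no" *)
trained[Xpartition[[1]], "Probabilities"]  (* class probabilities *)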