How to combine numerical and categorical values in a vector as input for LSTM?

There is a variety of preprocessing techniques worth looking at when dealing with inputs of varying ranges (normalization, standardization, etc.). One-hot encoding is certainly a good way to represent categories.

Embeddings are used when there are too many categories, which would make a one-hot encoding very large. They provide a (potentially trainable) dense vector representation of a given input. You can read more about them in the link below. Embeddings are very common in NLP.

https://towardsdatascience.com/deep-learning-4-embedding-layers-f9a02d55ac12
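As a quick illustration of the idea (the vocabulary size and vector dimension below are made up for the example), a Keras Embedding layer maps an integer category ID to a dense vector:

```python
import numpy as np
from tensorflow.keras.layers import Embedding, Input
from tensorflow.keras.models import Model

# Hypothetical setup: 1,000 possible category IDs, each mapped
# to a trainable 8-dimensional vector
cat_input = Input(shape=(1,), dtype="int32")
embedded = Embedding(input_dim=1000, output_dim=8)(cat_input)
model = Model(cat_input, embedded)

# Look up the vector for category ID 42
vec = model.predict(np.array([[42]]))
print(vec.shape)  # (1, 1, 8)
```

The 8-dimensional vector is learned during training, so similar categories can end up with similar representations, unlike a fixed one-hot encoding.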

That aside, you can take advantage of the fact that the Keras functional API supports multiple input layers.

For your specific case, here is a made-up example that might help you get started. I added a few dense hidden layers just to demonstrate the point; the code should be self-explanatory.

import numpy as np
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Stand-ins for your data: 7 samples of one float feature,
# plus a one-hot day-of-week encoding
X1 = np.random.random((7, 1))  # your `rands`
X2 = np.eye(7)                 # your `df_days_onehot`
Y = np.random.random(7)

float_input = Input(shape=(1,))
one_hot_input = Input(shape=(7,))

first_dense = Dense(3)(float_input)
second_dense = Dense(50)(one_hot_input)

# Merge the two branches and add a couple of dense layers on top
merge_one = concatenate([first_dense, second_dense])
dense_inner = Dense(10)(merge_one)
dense_output = Dense(1)(dense_inner)

model = Model(inputs=[float_input, one_hot_input], outputs=dense_output)

model.compile(loss='mean_squared_error',
              optimizer='adam')

model.summary()

model.fit([X1, X2], Y, epochs=2)

Another way (probably more elegant) is to condition on the categorical variables (whose values do not change over time).

Let's take an example with weather data from two different cities: Paris and San Francisco. You want to predict the next temperature based on historical data. But at the same time, you expect the weather to change based on the city. You can either:

  • Combine the auxiliary features with the time series data (what you suggested here).

  • Concatenate the auxiliary features with the output of the RNN layer. It's some kind of post-RNN adjustment since the RNN layer won't see this auxiliary info.

  • Or just initialize the RNN states with a learned representation of the condition (e.g. Paris or San Francisco).
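The second option above can be sketched in a few lines with the Keras functional API. The shapes here (30 past temperature readings, two cities) are invented for the example:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, concatenate
from tensorflow.keras.models import Model

# Hypothetical data: 30 past temperature readings per sample,
# plus a one-hot city indicator (Paris / San Francisco)
series_input = Input(shape=(30, 1))  # the time series
city_input = Input(shape=(2,))       # the static categorical condition

rnn_out = LSTM(16)(series_input)             # summarize the history
merged = concatenate([rnn_out, city_input])  # post-RNN adjustment
output = Dense(1)(merged)                    # next-temperature prediction

model = Model([series_input, city_input], output)
model.compile(loss="mse", optimizer="adam")

X_series = np.random.random((8, 30, 1))
X_city = np.eye(2)[np.random.randint(0, 2, 8)]
y = np.random.random(8)
model.fit([X_series, X_city], y, epochs=1, verbose=0)
```

Since the city one-hot bypasses the LSTM, the recurrent layer never sees it; the final dense layer learns a per-city correction on top of the series summary.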

I wrote a library to condition on auxiliary inputs. It abstracts all the complexity and has been designed to be as user-friendly as possible:

https://github.com/philipperemy/cond_rnn/

Hope it helps!