Hidden Markov Model for multiple observed variables

I found that this can be achieved by modelling the system as a Dynamic Naive Bayes classifier (DNB), which is a slight extension of an ordinary (single-variable) HMM that can cater for multi-observation scenarios as shown in the figure.

A word of caution: a DNB still has a hidden state, so it should not be regarded as a direct sequential extension of the original Naive Bayes classifier. The 'naive' in the algorithm's name comes from the assumption that all observed variables are independent of each other, given the hidden state variable.

As with an HMM, the parameters of this model can be estimated with the Baum-Welch algorithm (which is EM applied to HMMs). Since the emission distribution at each time step is now the product of P(Y_t^i | X_t) over the observed variables Y_t^i (the i-th observed variable at time t), the forward, backward, and joint variable equations need to be slightly modified, as described in section 3 of this paper by Aviles-Arriaga et al.
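To give a rough idea of what that modification looks like, here is a minimal sketch of the scaled forward pass for discrete observations. The function names, array layouts, and toy numbers are my own assumptions for illustration, not taken from the paper. The only change from the single-variable HMM is that the emission term becomes a product over the observed variables.

```python
import numpy as np

def emission_likelihoods(obs, emissions):
    """obs: (T, M) integer array, one column per observed variable.
    emissions: list of M arrays; emissions[j][k, v] = P(Y^j = v | X = k).
    Returns B with B[t, k] = prod_j P(Y_t^j = obs[t, j] | X_t = k)."""
    T, _ = obs.shape
    K = emissions[0].shape[0]
    B = np.ones((T, K))
    for j, E in enumerate(emissions):
        # E[:, obs[:, j]] has shape (K, T); transpose to (T, K)
        B *= E[:, obs[:, j]].T
    return B

def forward(pi, A, B):
    """Scaled forward recursion.
    Returns alpha (each row normalised to sum to 1) and the log-likelihood."""
    T, K = B.shape
    alpha = np.zeros((T, K))
    log_lik = 0.0
    for t in range(T):
        a = pi * B[0] if t == 0 else (alpha[t - 1] @ A) * B[t]
        c = a.sum()            # scaling factor P(y_t | y_1..t-1)
        alpha[t] = a / c
        log_lik += np.log(c)
    return alpha, log_lik

# Toy model (made-up numbers): 2 hidden states, a binary and a ternary observation
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
emissions = [
    np.array([[0.9, 0.1],
              [0.3, 0.7]]),
    np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]]),
]
obs = np.array([[0, 0],
                [1, 2],
                [1, 1]])

B = emission_likelihoods(obs, emissions)
alpha, log_lik = forward(pi, A, B)
print(alpha)
print(log_lik)
```

The backward pass and the posterior (gamma/xi) computations change in the same way: wherever a single-variable HMM uses its emission probability, use the corresponding entry of `B`. For long sequences or many observed variables it is probably safer to accumulate the product in log space.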


The simplest way to do this, and have the model remain generative, is to make the y_is conditionally independent given the x_is. This leads to trivial estimators, and relatively few parameters, but is a fairly restrictive assumption in some cases (it's basically the HMM form of the Naive Bayes classifier).

EDIT: what this means. For each timestep i, you have a multivariate observation y_i = {y_i1...y_in}. You treat the y_ij as being conditionally independent given x_i, so that:

p(y_i | x_i) = \prod_{j=1}^{n} p(y_{ij} | x_i)

You're then effectively learning a naive Bayes model in which the hidden variable x plays the role of the class: one set of per-variable distributions p(y_ij | x) for each possible value of x. (Conditional independence is the important part here: the ys are still dependent on each other in the unconditional distribution.) This can be learned with standard EM for an HMM.
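To make the "trivial estimators" concrete, here is a hedged sketch of the emission part of the M-step for discrete observations. It assumes you already have the state posteriors gamma[t, k] = P(x_t = k | all observations) from a forward-backward pass; the function name and array layout are my own choices for illustration. Each observed variable gets its own table, estimated from posterior-weighted counts, exactly as in a naive Bayes model with soft class labels.

```python
import numpy as np

def update_emissions(obs, gamma, n_values):
    """obs: (T, M) integer array of observations.
    gamma: (T, K) state posteriors from the E-step.
    n_values: list of M ints, the number of values each variable can take.
    Returns a list of M arrays; result[j][k, v] estimates p(y_j = v | x = k).
    Assumes every state has non-zero total posterior mass (no smoothing)."""
    T, M = obs.shape
    K = gamma.shape[1]
    emissions = []
    for j in range(M):
        counts = np.zeros((n_values[j], K))
        # np.add.at accumulates correctly when the same value occurs repeatedly
        np.add.at(counts, obs[:, j], gamma)
        counts = counts.T                      # shape (K, V_j)
        emissions.append(counts / counts.sum(axis=1, keepdims=True))
    return emissions

# Toy example with made-up posteriors: 3 time steps, 2 states, 2 binary variables
gamma = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5]])
obs = np.array([[0, 1],
                [1, 0],
                [1, 1]])
new_emissions = update_emissions(obs, gamma, n_values=[2, 2])
for E in new_emissions:
    print(E)
```

The initial-state and transition updates are unchanged from the ordinary HMM, since the factorisation only touches the emission term.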

You could also, as one commenter said, treat the concatenation of the y_ijs as a single observation, but if the dimensionality of any of the j variables is beyond trivial this will lead to a lot of parameters (for discrete variables, the joint emission table grows with the product of the number of values each variable can take), and you'll need far more training data.
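A quick back-of-the-envelope comparison for the discrete case, counting free emission parameters only. The helper names and the example sizes are hypothetical, chosen just to show the scale of the difference.

```python
import math

def n_params_factorised(n_states, cardinalities):
    """Conditionally independent emissions: one table per variable per state,
    each with (V_j - 1) free parameters."""
    return n_states * sum(v - 1 for v in cardinalities)

def n_params_joint(n_states, cardinalities):
    """Concatenated observation: one table over all value combinations per state."""
    return n_states * (math.prod(cardinalities) - 1)

# 3 hidden states, 3 observed variables with 10 possible values each
factorised = n_params_factorised(3, [10, 10, 10])
joint = n_params_joint(3, [10, 10, 10])
print(factorised)  # 81
print(joint)       # 2997
```

The factorised count grows with the sum of the cardinalities, the joint count with their product, which is why the concatenation approach gets data-hungry so quickly.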

Do you specifically need the model to be generative? If you're only looking for inference in the x_is, you'd probably be much better served with a conditional random field, which through its feature functions can have far more complex observations without the same restrictive assumptions of independence.