What is hidden in Hidden Markov Models?

The unobserved state.

Let's consider a hidden Markov model for my cat's behavior. Bella can be in five states: hungry, tired, playful, cuddly, bored. She can respond to these states with six behaviors: whining, scratching, cuddling, pouncing, sleeping and stalking.

A hidden Markov model would consist of two matrices, one 5x5 and the other 5x6. The 5x5 transition matrix gives the probability that, if she is hungry at time $t$, she will be tired at time $t+1$, and so forth. Taking powers of this matrix lets us compute the probability that she is in each emotional state at later times.
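To make this concrete, here is a minimal sketch in Python; every transition probability below is invented purely for the example:

```python
import numpy as np

# Hypothetical 5x5 transition matrix for Bella's hidden states; all
# numbers are made up. Row i gives the distribution over her state at
# time t+1 given state i at time t, so each row sums to 1.
states = ["hungry", "tired", "playful", "cuddly", "bored"]
T = np.array([
    [0.3, 0.2, 0.2, 0.2, 0.1],  # from hungry
    [0.1, 0.5, 0.1, 0.2, 0.1],  # from tired
    [0.2, 0.2, 0.3, 0.1, 0.2],  # from playful
    [0.2, 0.2, 0.1, 0.4, 0.1],  # from cuddly
    [0.3, 0.1, 0.3, 0.1, 0.2],  # from bored
])

# If she starts out hungry, her state distribution k steps later is
# the initial distribution times the k-th power of T.
start = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
after_three_steps = start @ np.linalg.matrix_power(T, 3)
print(dict(zip(states, after_three_steps.round(3))))
```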

However, we can't observe her emotions -- they are hidden. The 5x6 emission matrix gives the probability that, if she is hungry at time $t$, she will whine at time $t$ (a probability very close to $1$). These behaviors are what we actually observe.
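Continuing the sketch above (reusing `T` and `start`, with equally invented emission probabilities), the standard forward algorithm then gives the probability of an observed behavior sequence, summed over all possible hidden mood paths:

```python
# Hypothetical 5x6 emission matrix: B[i, j] is the probability that
# state i produces behavior j. All numbers are made up; note the
# near-1 entry for hungry -> whining.
behaviors = ["whining", "scratching", "cuddling",
             "pouncing", "sleeping", "stalking"]
B = np.array([
    [0.90, 0.04, 0.02, 0.01, 0.02, 0.01],  # hungry
    [0.02, 0.02, 0.06, 0.01, 0.85, 0.04],  # tired
    [0.02, 0.10, 0.03, 0.45, 0.05, 0.35],  # playful
    [0.05, 0.02, 0.80, 0.03, 0.08, 0.02],  # cuddly
    [0.10, 0.30, 0.05, 0.15, 0.20, 0.20],  # bored
])

def forward_likelihood(obs, start, T, B):
    """Probability of an observed behavior sequence, summed over all
    hidden state paths (the forward algorithm)."""
    alpha = start * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ T) * B[:, o]
    return alpha.sum()

# P(whining, then sleeping, then cuddling), starting out hungry:
obs = [behaviors.index(b) for b in ["whining", "sleeping", "cuddling"]]
print(forward_likelihood(obs, start, T, B))
```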

In an ordinary Markov model, there would just be a single 6x6 matrix, directly describing the probability of transitions like whining → scratching. As you can see, an ordinary Markov model is less able to reflect the complexity of my cat's inner life.

See the Wikipedia article for much more information.


Another standard application area of (various refinements of) HMMs is the analysis of genomic or proteomic sequences. For genomes, the observations (the six behaviours of David's cat) could be the four well-known nucleotides A, C, G and T, and the states (the moods of the cat) would be attributes of portions of the genome such as, in the most basic version of segmentation, being a coding or a non-coding region. A sketch of this setup appears below.
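As a toy illustration of that segmentation idea (all probabilities invented; real models are trained on annotated genomes), here is a two-state HMM over nucleotides, with Viterbi decoding recovering the most probable coding/non-coding labeling:

```python
import numpy as np

# Toy segmentation HMM: hidden states are "noncoding"/"coding",
# observations are nucleotides A, C, G, T. Every number is made up.
states = ["noncoding", "coding"]
nuc = {"A": 0, "C": 1, "G": 2, "T": 3}
T = np.array([[0.95, 0.05],    # noncoding tends to stay noncoding
              [0.10, 0.90]])   # coding tends to stay coding
B = np.array([[0.30, 0.20, 0.20, 0.30],   # noncoding: roughly uniform
              [0.15, 0.35, 0.35, 0.15]])  # coding: GC-rich (made up)
start = np.array([0.5, 0.5])

def viterbi(seq):
    """Most probable hidden state path for a nucleotide string,
    computed in log space to avoid underflow."""
    obs = [nuc[c] for c in seq]
    logT, logB = np.log(T), np.log(B)
    V = np.log(start) + logB[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = V[:, None] + logT  # scores[i, j]: best path ending in i, then i -> j
        back.append(scores.argmax(axis=0))
        V = scores.max(axis=0) + logB[:, o]
    path = [int(V.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi("ATATATGCGCGCGCATAT"))
```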

The relevant literature is huge; one starting point could be Anders Krogh's Wikipedia page.


It may also help to consider the standard application areas of HMMs. In speech recognition, the goal is to decode an audio signal into the text the person was saying; here, the observations are the audio signal and the unobserved states are the syllables being spoken. In other natural language processing tasks like part-of-speech tagging, named entity recognition, or information extraction, the observations are the words in a document, while the hidden states are characteristics of those words (their grammatical parts of speech, whether or not they refer to a person, etc.) that we wish to infer but that are not present in the data itself. Google these topics for more information, and see Lawrence Rabiner's HMM tutorial.
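To give a flavor of the tagging case, here is a toy part-of-speech HMM; the tags, vocabulary, and every probability are invented, and forward-backward computes each word's posterior tag distribution given the whole sentence:

```python
import numpy as np

# Toy POS-tagging HMM: hidden states are tags, observations are words.
# All numbers are made up; a real tagger estimates them from a corpus.
tags = ["DET", "NOUN", "VERB"]
vocab = {"the": 0, "cat": 1, "purrs": 2}
T = np.array([[0.05, 0.90, 0.05],   # DET  -> mostly NOUN
              [0.10, 0.20, 0.70],   # NOUN -> mostly VERB
              [0.40, 0.40, 0.20]])  # VERB -> DET or NOUN
B = np.array([[0.98, 0.01, 0.01],   # DET emits "the"
              [0.01, 0.89, 0.10],   # NOUN emits mostly "cat"
              [0.01, 0.10, 0.89]])  # VERB emits mostly "purrs"
start = np.array([0.6, 0.3, 0.1])

def posterior_tags(words):
    """Forward-backward: for each word, the marginal probability of
    each tag given the entire sentence."""
    obs = [vocab[w] for w in words]
    n, k = len(obs), len(tags)
    fwd = np.zeros((n, k))
    fwd[0] = start * B[:, obs[0]]
    for t in range(1, n):
        fwd[t] = (fwd[t - 1] @ T) * B[:, obs[t]]
    bwd = np.zeros((n, k))
    bwd[-1] = 1.0
    for t in range(n - 2, -1, -1):
        bwd[t] = T @ (B[:, obs[t + 1]] * bwd[t + 1])
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)

sentence = ["the", "cat", "purrs"]
for word, p in zip(sentence, posterior_tags(sentence)):
    print(word, dict(zip(tags, p.round(3))))
```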