I was exposed briefly to Hidden Markov Models (HMMs) as an undergraduate, but now as a graduate student I have revisited HMMs in much more detail (I need to know it “forward” and “backward”, if you catch my drift). Our instructor taught us with very simple examples (two hidden states, very short sequences), and in each case the notation he used was specific to that example. This made it easy to grasp the concepts initially, but I’m not sure I would have been able to implement more complicated HMMs with only that notation to help me. In preparing for a recent exam, however, I’ve finally come across some notation that is more generalizable. I’ll use it here to describe the forward, backward, and Viterbi algorithms.
First let us provide some formal definitions. We have an observed sequence $O = o_1 o_2 \cdots o_T$ where $o_t \in V$ for all $t \in \{1, \ldots, T\}$, with $V = \{v_1, v_2, \ldots, v_M\}$ the alphabet of possible output symbols. For each $o_t$, there is an unobserved state $q_t \in S = \{s_1, s_2, \ldots, s_N\}$ forming the corresponding hidden state sequence $Q = q_1 q_2 \cdots q_T$. Additionally we define initial state probabilities $\pi$ (where $\pi_k$ is the probability of starting a sequence in state $s_k$), transition probabilities $A$ (where $a_{kl}$ is the probability of moving from state $s_k$ to state $s_l$), and emission (or output) probabilities $B$ (where $b_k(o_t)$ is the probability of observing the symbol $o_t$ given that we know the corresponding hidden state is $s_k$).
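Throughout the rest of the post I’ll pair each algorithm with a small NumPy sketch. Here is a toy model to run them against; the parameter values (and the names `pi`, `A`, `B`, `obs`) are invented purely for illustration.

```python
import numpy as np

# A toy HMM with N = 2 hidden states and an alphabet of M = 2 symbols.
# All of these numbers are made up for illustration only.
pi = np.array([0.6, 0.4])        # pi[k]   = P(sequence starts in state k)
A = np.array([[0.7, 0.3],        # A[k, l] = P(moving from state k to state l)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # B[k, o] = P(emitting symbol o while in state k)
              [0.2, 0.8]])

obs = np.array([0, 1, 0, 0])     # an observed sequence o_1 ... o_T, with T = 4
```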
The forward algorithm provides the most straightforward way to determine the probability of an observed sequence given the HMM (i.e. $P(O \mid \lambda)$, where $\lambda = (\pi, A, B)$ denotes the model parameters). This algorithm introduces a probability $\alpha_k(t)$, which we can conceptualize as an $N \times T$ matrix. We initialize the first column of this matrix like so.

$$\alpha_k(1) = \pi_k \, b_k(o_1)$$

We can then continue to fill out the rest of the matrix using the following recurrence.

$$\alpha_l(t) = \left[ \sum_{k=1}^{N} \alpha_k(t-1) \, a_{kl} \right] b_l(o_t)$$

Once the matrix is filled out, we can get our desired probability by summing over the values in the last column of the matrix.

$$P(O \mid \lambda) = \sum_{k=1}^{N} \alpha_k(T)$$
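To make the recurrence concrete, here is a minimal sketch of the forward algorithm using the toy model above (the function name and structure are my own, not from any library).

```python
def forward(obs, pi, A, B):
    """Fill the N x T forward matrix and return it along with P(O)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((N, T))
    # Initialization: alpha_k(1) = pi_k * b_k(o_1)
    alpha[:, 0] = pi * B[:, obs[0]]
    # Recurrence: alpha_l(t) = [sum_k alpha_k(t-1) * a_kl] * b_l(o_t)
    for t in range(1, T):
        alpha[:, t] = (alpha[:, t - 1] @ A) * B[:, obs[t]]
    # Termination: P(O) = sum_k alpha_k(T)
    return alpha, alpha[:, -1].sum()

alpha, prob = forward(obs, pi, A, B)   # prob is P(O | lambda)
```

One caveat: for long sequences these repeated products underflow, so real implementations usually work in log space or rescale each column; I’ve omitted that here to keep the correspondence with the equations obvious.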
The backward algorithm can also be used to find the probability of an observed sequence, but it is most useful in combination with the forward algorithm to determine the probability of a state at a specific position given an output sequence.

The backward algorithm introduces a probability $\beta_k(t)$, which we can also conceptualize as an $N \times T$ matrix. We first initialize the last column of the matrix like so.

$$\beta_k(T) = 1$$

Then, as the algorithm’s name suggests, we work backwards to fill out the rest of the matrix using the following recurrence.

$$\beta_k(t) = \sum_{l=1}^{N} a_{kl} \, b_l(o_{t+1}) \, \beta_l(t+1)$$

Combining the two matrices then gives the probability of being in state $s_k$ at position $t$ given the observed sequence.

$$P(q_t = s_k \mid O, \lambda) = \frac{\alpha_k(t) \, \beta_k(t)}{P(O \mid \lambda)}$$
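A matching sketch of the backward recurrence, followed by the forward–backward combination that yields the posterior state probabilities (again, the names and structure are my own):

```python
def backward(obs, A, B):
    """Fill the N x T backward matrix beta."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((N, T))
    # Initialization: beta_k(T) = 1
    beta[:, -1] = 1.0
    # Recurrence: beta_k(t) = sum_l a_kl * b_l(o_{t+1}) * beta_l(t+1)
    for t in range(T - 2, -1, -1):
        beta[:, t] = A @ (B[:, obs[t + 1]] * beta[:, t + 1])
    return beta

beta = backward(obs, A, B)
# Posterior state probabilities: each column of this matrix sums to 1.
posterior = alpha * beta / prob    # posterior[k, t] = P(q_t = s_k | O, lambda)
```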
The Viterbi algorithm is a variation of the forward algorithm that answers a slightly different question: what is the most probable hidden state sequence given an observed sequence (in formal terms, which hidden state sequence $Q$ maximizes the probability $P(Q \mid O, \lambda)$)?

Here we introduce another probability $v_k(t)$, which we can conceptualize as an $N \times T$ matrix.

We initialize this matrix exactly the same way we do in the forward algorithm.

$$v_k(1) = \pi_k \, b_k(o_1)$$

We then continue to fill out the rest of the matrix using the following recurrence.

$$v_l(t) = \max_{k} \left[ v_k(t-1) \, a_{kl} \right] b_l(o_t)$$

We also maintain a matrix of pointers where $ptr_l(t)$ indicates the value of $k$ which maximizes $v_k(t-1) \, a_{kl}$ (used for backtracing). To find the most probable hidden state sequence, identify the highest value in the last column of the $v$ matrix (i.e. $\max_k v_k(T)$) and use the matrix of pointers to trace backward and identify the sequence of states that produced the maximal probability.
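Finally, a sketch of the Viterbi algorithm with backtracing, using the same toy parameters as before:

```python
def viterbi(obs, pi, A, B):
    """Return the most probable state path and its probability."""
    N, T = A.shape[0], len(obs)
    v = np.zeros((N, T))
    ptr = np.zeros((N, T), dtype=int)
    # Initialization is identical to the forward algorithm's.
    v[:, 0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = v[:, t - 1, None] * A       # scores[k, l] = v_k(t-1) * a_kl
        ptr[:, t] = scores.argmax(axis=0)    # best predecessor k for each state l
        v[:, t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrace from the highest-scoring state in the last column.
    path = [int(v[:, -1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(ptr[path[-1], t]))
    return path[::-1], v[:, -1].max()

path, best = viterbi(obs, pi, A, B)   # for the toy data, path == [0, 1, 0, 0]
```

Note how closely the code mirrors the forward algorithm: the only changes are swapping the sum for a max and recording where each max came from.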