Notation for HMM algorithms

I was exposed briefly to Hidden Markov Models (HMMs) as an undergraduate, but now as a graduate student I have revisited HMMs in much more detail (I need to know it “forward” and “backward”, if you catch my drift). Our instructor taught us with very simple examples (two hidden states, very short sequences), and in each case the notation he used was specific to that example. This made it easy to grasp the concepts initially, but I’m not sure I would have been able to implement more complicated HMMs with only that notation to help me. In preparing for a recent exam, however, I’ve finally come across some notation that is more generalizable. I’ll use it here to describe the forward, backward, and Viterbi algorithms.

Terminology

First, let us provide some formal definitions. We have an observed sequence O = (O_1 O_2 ... O_N), where O_i \in A = \{ a_1, a_2, ..., a_r \} for all i \in \{1, 2, ..., N\}. For each O_i, there is an unobserved state Q_i \in S = \{ S_1, S_2, ..., S_s \}, forming the corresponding hidden state sequence Q = (Q_1 Q_2 ... Q_N). Additionally, we define initial state probabilities \pi (where \pi_{S_i} is the probability of starting a sequence in state S_i), transition probabilities \tau (where \tau_{S_i S_j} is the probability of moving from state S_i to state S_j), and emission (or output) probabilities \theta (where \theta_{a_i | S_j} is the probability of observing the symbol a_i given that we know the corresponding hidden state is S_j).
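To make the notation concrete, here is one way these quantities might be stored as NumPy arrays. The specific values, the two-state/three-symbol sizes, and the 0-based indexing below are illustrative assumptions on my part, not something the notation prescribes.

import numpy as np

# Hypothetical toy HMM: s = 2 hidden states, an alphabet of r = 3 symbols.
# The probability values are invented purely for illustration.
pi = np.array([0.6, 0.4])              # pi[j] ~ pi_{S_j}: probability of starting in state j
tau = np.array([[0.7, 0.3],            # tau[k, j] ~ tau_{S_k S_j}: transition from state k to state j
                [0.4, 0.6]])
theta = np.array([[0.5, 0.4, 0.1],     # theta[j, a] ~ theta_{a | S_j}: emission of symbol a in state j
                  [0.1, 0.3, 0.6]])
obs = [0, 2, 1, 0]                     # an observed sequence O, written as 0-based symbol indices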

Forward algorithm

The forward algorithm provides the most straightforward way to determine the probability of an observed sequence given the HMM (i.e. Prob\{O_1 O_2 ... O_N\} ). This algorithm introduces a probability F_{i,j} = Prob\{O_1 O_2 ... O_i, Q_i = S_j\}, which we can conceptualize as an s \times N matrix. We initialize the first column of this matrix like so.

F_{1,j} = \pi_{S_j}\theta_{O_1 | S_j}

We can then continue to fill out the rest of the matrix using the following recurrence.

F_{i,j} = \sum_{k=1}^{s} F_{i-1,k} \tau_{S_k S_j} \theta_{O_i | S_j}

Once the matrix is filled out, we can get our desired probability Prob\{O\} = Prob\{O_1 O_2 ... O_N\} by summing over the values in the last column of the matrix.

Prob\{O\} = \sum_{k=1}^{s} F_{N, k}
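As a sanity check on the notation, here is a minimal sketch of the forward algorithm in Python/NumPy. It assumes the array layout from the illustrative snippet above (pi as a length-s vector, tau as an s x s matrix with tau[k, j] = \tau_{S_k S_j}, theta as an s x r matrix, and the observed sequence as 0-based symbol indices); it is a sketch, not a definitive implementation.

import numpy as np

def forward(obs, pi, tau, theta):
    # obs:   observed sequence as 0-based symbol indices, length N
    # pi:    length-s vector of initial state probabilities
    # tau:   s x s transition matrix, tau[k, j] = tau_{S_k S_j}
    # theta: s x r emission matrix, theta[j, a] = theta_{a | S_j}
    s, N = len(pi), len(obs)
    F = np.zeros((s, N))
    # Initialization: F_{1,j} = pi_{S_j} * theta_{O_1 | S_j}
    F[:, 0] = pi * theta[:, obs[0]]
    # Recurrence: F_{i,j} = sum_k F_{i-1,k} * tau_{S_k S_j} * theta_{O_i | S_j}
    for i in range(1, N):
        F[:, i] = (F[:, i - 1] @ tau) * theta[:, obs[i]]
    # Termination: Prob{O} = sum_k F_{N,k}
    return F, F[:, -1].sum()

With the toy arrays above, this would be called as F, prob_O = forward(obs, pi, tau, theta).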

Backward algorithm

The backward algorithm can also be used to find the probability of an observed sequence, but it is most useful in combination with the forward algorithm to determine the probability of a state at a specific position given an output sequence.

The backward algorithm introduces a probability B_{i,j} = Prob\{O_{i+1} O_{i+2} ... O_N | Q_i = S_j\}, which we can also conceptualize as an s \times N matrix. We first initialize the last column of the matrix like so.

B_{N, j} = 1

Then, as the algorithm’s name suggests, we work backwards to fill out the rest of the matrix using the following recurrence.

B_{i,j} = \sum_{k=1}^{s} B_{i+1,k}\tau_{S_j S_k}\theta_{O_{i+1}|S_k}
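Here is a matching sketch of the backward pass, under the same array-layout assumptions as the forward sketch above.

import numpy as np

def backward(obs, pi, tau, theta):
    # Same array conventions as the forward sketch above.
    s, N = len(pi), len(obs)
    B = np.zeros((s, N))
    # Initialization: B_{N,j} = 1
    B[:, -1] = 1.0
    # Recurrence, filled right to left:
    # B_{i,j} = sum_k B_{i+1,k} * tau_{S_j S_k} * theta_{O_{i+1} | S_k}
    for i in range(N - 2, -1, -1):
        B[:, i] = tau @ (B[:, i + 1] * theta[:, obs[i + 1]])
    return B

Combining the two passes as described above, the posterior probability Prob\{Q_i = S_j | O\} is F_{i,j} B_{i,j} / Prob\{O\} (in the 0-based array layout used here, F[j, i] * B[j, i] / prob_O).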

Viterbi algorithm

The Viterbi algorithm is a slight variation of the forward algorithm that answers a different question: what is the most probable hidden state sequence given an observed sequence? In formal terms, which hidden state sequence Q maximizes the probability Prob\{Q | O\}?

Here we introduce another probability which we can conceptualize as an s \times N matrix.

V_{i,j} = \max_{Q_1...Q_{i-1}}Prob\{O_1 ... O_i, Q_1 ... Q_{i-1}, Q_i = S_j\}

We initialize this matrix exactly the same way we do in the forward algorithm.

V_{1,j} = \pi_{S_j}\theta_{O_1 | S_j}

We then continue to fill out the rest of the matrix using the following recurrence.

V_{i,j} = \theta_{O_i | S_j} \max_{k=1 ... s}V_{i-1,k}\tau_{S_k S_j}

We also maintain a matrix of pointers V' where V'_{i,j} indicates the value of k which maximizes V_{i-1,k}\tau_{S_k S_j} (used for backtracing). To find the most probable hidden state sequence, identify the highest value in the last column of the matrix (or \max_{j=1 ... s} V_{N,j}) and use the matrix of pointers V' to trace backward and identify the sequence of states that produced the maximal probability.
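Finally, here is a sketch of the Viterbi recurrence and backtrace, again under the same array-layout assumptions as the earlier snippets; the pointer array ptr plays the role of V' in the text.

import numpy as np

def viterbi(obs, pi, tau, theta):
    # Same array conventions as the forward sketch above.
    s, N = len(pi), len(obs)
    V = np.zeros((s, N))
    ptr = np.zeros((s, N), dtype=int)      # the pointer matrix V' from the text
    # Initialization: V_{1,j} = pi_{S_j} * theta_{O_1 | S_j}
    V[:, 0] = pi * theta[:, obs[0]]
    # Recurrence: V_{i,j} = theta_{O_i | S_j} * max_k V_{i-1,k} * tau_{S_k S_j}
    for i in range(1, N):
        scores = V[:, i - 1, None] * tau   # scores[k, j] = V_{i-1,k} * tau_{S_k S_j}
        ptr[:, i] = scores.argmax(axis=0)
        V[:, i] = theta[:, obs[i]] * scores.max(axis=0)
    # Backtrace: start from the highest value in the last column, then follow V'
    path = [int(V[:, -1].argmax())]
    for i in range(N - 1, 0, -1):
        path.append(int(ptr[path[-1], i]))
    path.reverse()
    return path                            # 0-based state indices, one per position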
