# Notation for HMM algorithms

I was exposed briefly to Hidden Markov Models (HMMs) as an undergraduate, but now as a graduate student I have revisited HMMs in much more detail (I need to know it “forward” and “backward”, if you catch my drift). Our instructor taught us with very simple examples (two hidden states, very short sequences) and in each case, the notation he used was specific to each example. This made it easy to grasp the concepts initially, but I’m not sure I would have been able to implement more complicated HMMs with only that notation to help me. In preparing for a recent exam, however, I’ve finally come across some notation that is more generalizable. I’ll use it here to describe the forward, backward, and Viterbi algorithms.

## Terminology

First let us provide some formal definitions. We have an observed sequence $O = (O_1 O_2 ... O_N)$ where $O_i \in A = \{ a_1, a_2, ..., a_r \}$ for all $i \in \{1, 2, ..., N\}$. For each $O_i$, there is an unobserved state $Q_i \in S = \{ S_1, S_2, ..., S_s \}$ forming the corresponding hidden state sequence $Q = (Q_1 Q_2 ... Q_N)$. Additionally, we define initial state probabilities $\pi$ (where $\pi_{S_i}$ is the probability of starting a sequence in state $S_i$), transition probabilities $\tau$ (where $\tau_{S_i S_j}$ is the probability of moving from state $S_i$ to state $S_j$), and emission (or output) probabilities $\theta$ (where $\theta_{a_i | S_j}$ is the probability of observing the symbol $a_i$ given that the corresponding hidden state is $S_j$).
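In code, these parameters map naturally onto arrays indexed by state and symbol. Here is a minimal sketch in Python using a hypothetical two-state, two-symbol model (the numbers are purely illustrative, not part of the notation itself):

```python
# States S_1..S_s and symbols a_1..a_r are represented by indices
# 0..s-1 and 0..r-1. This is a hypothetical model with illustrative numbers.
pi = [0.6, 0.4]                    # pi[j]       = P(Q_1 = S_j)
tau = [[0.7, 0.3],                 # tau[k][j]   = P(Q_{i+1} = S_j | Q_i = S_k)
       [0.4, 0.6]]
theta = [[0.5, 0.5],               # theta[j][a] = P(O_i = a | Q_i = S_j)
         [0.1, 0.9]]

# Sanity check: each probability distribution must sum to 1.
assert abs(sum(pi) - 1) < 1e-12
assert all(abs(sum(row) - 1) < 1e-12 for row in tau + theta)
```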

## Forward algorithm

The forward algorithm provides the most straightforward way to determine the probability of an observed sequence given the HMM (i.e. $Prob\{O_1 O_2 ... O_N\}$). This algorithm introduces a probability $F_{i,j} = Prob\{O_1 O_2 ... O_i, Q_i = S_j\}$, which we can conceptualize as an $s \times N$ matrix. We initialize the first column of this matrix like so.

$F_{1,j} = \pi_{S_j}\theta_{O_1 | S_j}$

We can then continue to fill out the rest of the matrix using the following recurrence.

$F_{i,j} = \sum_{k=1}^{s} F_{i-1,k} \tau_{S_k S_j} \theta_{O_i | S_j}$

Once the matrix is filled out, we can get our desired probability $Prob\{O\} = Prob\{O_1 O_2 ... O_N\}$ by summing over the values in the last column of the matrix.

$Prob\{O\} = \sum_{k=1}^{s} F_{N, k}$
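Putting the initialization, recurrence, and final sum together, here is a Python sketch of the forward algorithm. The two-state model and the observed sequence are illustrative assumptions, not taken from any particular example:

```python
# Forward algorithm: F[j][i] = Prob{O_1 .. O_i, Q_i = S_j}, stored as an
# s x N matrix (states as rows, positions as columns, zero-indexed).
def forward(O, pi, tau, theta):
    s, N = len(pi), len(O)
    F = [[0.0] * N for _ in range(s)]
    for j in range(s):              # initialization: F_{1,j} = pi_{S_j} * theta_{O_1|S_j}
        F[j][0] = pi[j] * theta[j][O[0]]
    for i in range(1, N):           # recurrence over positions 2..N
        for j in range(s):
            F[j][i] = sum(F[k][i - 1] * tau[k][j] for k in range(s)) * theta[j][O[i]]
    return F

# A hypothetical two-state model (illustrative numbers only).
pi = [0.6, 0.4]
tau = [[0.7, 0.3], [0.4, 0.6]]
theta = [[0.5, 0.5], [0.1, 0.9]]

F = forward([0, 1, 1], pi, tau, theta)
prob_O = sum(row[-1] for row in F)  # Prob{O} = sum over the last column
```

For sequences of realistic length the entries of $F$ underflow quickly, so practical implementations work in log space or rescale each column; the sketch above keeps plain probabilities for clarity.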

## Backward algorithm

The backward algorithm can also be used to find the probability of an observed sequence, but it is most useful in combination with the forward algorithm to determine the probability of a state at a specific position given an output sequence: $Prob\{Q_i = S_j | O\} = \frac{F_{i,j} B_{i,j}}{Prob\{O\}}$.

The backward algorithm introduces a probability $B_{i,j} = Prob\{O_{i+1} O_{i+2} ...O_N | Q_i = S_j\}$, which we can also conceptualize as an $s \times N$ matrix. We first initialize the last column of the matrix like so.

$B_{N, j} = 1$

Then, as the algorithm’s name suggests, we work backwards to fill out the rest of the matrix using the following recurrence.

$B_{i,j} = \sum_{k=1}^{s} B_{i+1,k}\tau_{S_j S_k}\theta_{O_{i+1}|S_k}$

Once the matrix is filled out, the first column also recovers the sequence probability: $Prob\{O\} = \sum_{k=1}^{s} \pi_{S_k}\theta_{O_1 | S_k}B_{1,k}$, which should match the result of the forward algorithm.
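Here is a matching Python sketch of the backward algorithm, again using a hypothetical two-state, two-symbol model with illustrative parameters. It also recovers $Prob\{O\}$ from the first column of $B$:

```python
# Backward algorithm: B[j][i] = Prob{O_{i+1} .. O_N | Q_i = S_j}, stored as
# an s x N matrix (states as rows, positions as columns, zero-indexed).
def backward(O, pi, tau, theta):
    s, N = len(pi), len(O)
    B = [[0.0] * N for _ in range(s)]
    for j in range(s):              # initialization: B_{N,j} = 1
        B[j][N - 1] = 1.0
    for i in range(N - 2, -1, -1):  # work backwards through the positions
        for j in range(s):
            B[j][i] = sum(B[k][i + 1] * tau[j][k] * theta[k][O[i + 1]]
                          for k in range(s))
    return B

# A hypothetical two-state model (illustrative numbers only).
pi = [0.6, 0.4]
tau = [[0.7, 0.3], [0.4, 0.6]]
theta = [[0.5, 0.5], [0.1, 0.9]]

O = [0, 1, 1]
B = backward(O, pi, tau, theta)
# Prob{O} recovered from the first column of B:
prob_O = sum(pi[k] * theta[k][O[0]] * B[k][0] for k in range(len(pi)))
```

Running both directions on the same sequence and checking that the two values of $Prob\{O\}$ agree is a convenient way to test an implementation.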

## Viterbi algorithm

The Viterbi algorithm is a slight variation of the forward algorithm that answers a slightly different question: what is the most probable hidden state sequence given an observed sequence (in formal terms, which hidden state sequence $Q$ maximizes the probability $Prob\{Q | O\}$)?

Here we introduce another probability which we can conceptualize as an $s \times N$ matrix.

$V_{i,j} = \max_{Q_1...Q_{i-1}}Prob\{O_1 ... O_i, Q_1 ... Q_{i-1}, Q_i = S_j\}$

We initialize this matrix exactly the same way we do in the forward algorithm.

$V_{1,j} = \pi_{S_j}\theta_{O_1 | S_j}$

We then continue to fill out the rest of the matrix using the following recurrence.

$V_{i,j} = \theta_{O_i | S_j} \max_{k \in \{1, ..., s\}}V_{i-1,k}\tau_{S_k S_j}$

We also maintain a matrix of pointers $V'$, where $V'_{i,j}$ records the value of $k$ that maximizes $V_{i-1,k}\tau_{S_k S_j}$ (used for backtracing). To find the most probable hidden state sequence, identify the highest value in the last column of the matrix (i.e. $\max_{j \in \{1, ..., s\}} V_{N,j}$) and use the pointer matrix $V'$ to trace backward through the sequence of states that produced that maximal probability.
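The whole procedure — forward-style fill with max in place of sum, plus the pointer matrix and backtrace — can be sketched in Python. As before, the two-state model is a hypothetical example with illustrative numbers:

```python
# Viterbi algorithm: V[j][i] is the probability of the best path ending in
# state j at position i; ptr[j][i] records which previous state achieved it.
def viterbi(O, pi, tau, theta):
    s, N = len(pi), len(O)
    V = [[0.0] * N for _ in range(s)]
    ptr = [[0] * N for _ in range(s)]
    for j in range(s):              # initialization, exactly as in the forward algorithm
        V[j][0] = pi[j] * theta[j][O[0]]
    for i in range(1, N):
        for j in range(s):
            # The forward algorithm's sum becomes a max; remember the argmax.
            best_k = max(range(s), key=lambda k: V[k][i - 1] * tau[k][j])
            V[j][i] = theta[j][O[i]] * V[best_k][i - 1] * tau[best_k][j]
            ptr[j][i] = best_k
    # Backtrace from the highest-scoring state in the last column.
    last = max(range(s), key=lambda j: V[j][N - 1])
    path = [last]
    for i in range(N - 1, 0, -1):
        path.append(ptr[path[-1]][i])
    return list(reversed(path)), V[last][N - 1]

# A hypothetical two-state model (illustrative numbers only).
pi = [0.6, 0.4]
tau = [[0.7, 0.3], [0.4, 0.6]]
theta = [[0.5, 0.5], [0.1, 0.9]]

path, best_prob = viterbi([0, 1, 1], pi, tau, theta)
# path is the most probable state sequence as zero-based state indices,
# e.g. [0, 1, 1] here means (S_1, S_2, S_2).
```

Note that the backtrace only visits one pointer per position, so decoding remains $O(s^2 N)$ overall, the same as the matrix fill.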