Hidden Markov Models¶

This notebook implements a simple example of a Hidden Markov Model (HMM) and demonstrates how different implementations have different runtime characteristics.

Prerequisites¶

This tutorial assumes you have completed the Introduction to FlipPy tutorial.

Learning Objectives¶

By the end of this tutorial, you will be able to:

  1. Understand the structure of Hidden Markov Models
  2. Implement HMM inference in FlipPy
  3. Recognize why naive inference is slow and how to fix it

What are Hidden Markov Models?¶

HMMs are probabilistic models for sequential data where:

  • There's a sequence of hidden states we can't observe directly
  • At each timestep, we receive an observation that depends on the hidden state
  • States transition according to a Markov process (next state depends only on current state)

Real-world applications:

  • Speech recognition: Hidden phonemes → observed audio
  • Genomics: Hidden gene functions → observed DNA sequences
  • Finance: Hidden market regimes → observed prices
  • Weather prediction: Hidden atmospheric states → observed conditions
In [1]:
from flippy import infer, condition, flip, draw_from
import pandas as pd
import seaborn as sns

The Rainy/Sunny HMM Example¶

This classic example (from Russell & Norvig, 2010) models weather prediction:

  • Hidden states: The actual weather (rainy or sunny)
  • Observations: Whether someone is carrying an umbrella
    Hidden:     rainy ──→ sunny ──→ rainy ──→ sunny ──→ ...
                  ↓         ↓         ↓         ↓
    Observed: umbrella  no umbrella  umbrella  umbrella

HMM Components:

Component Notation Description
Initial distribution $P(x_0)$ Probability of starting in each state
Transition model $P(x_{t+1} \mid x_t)$ How states evolve over time
Sensor/observation model $P(y_t \mid x_t)$ How observations depend on states

Our model's parameters:

Transition model:

Transition From Rainy From Sunny
→ Rainy 70% 30%
→ Sunny 30% 70%

Observation model:

Observation When Rainy When Sunny
Umbrella 90% 20%
No umbrella 10% 80%

The joint distribution over states and observations is: $$ P(x_{1:T}, y_{1:T}) = P(x_0)\prod_{t = 1}^T P(y_t \mid x_t) P(x_{t+1} \mid x_{t}) $$

In [2]:
def transition(state):
    if state == "rainy":
        next_state = "rainy" if flip(.7) else "sunny"
    else:
        next_state = "rainy" if flip(.3) else "sunny"
    return next_state

def sensor_model(state):
    if state == "rainy":
        return "umbrella" if flip(.9) else "no umbrella"
    else:
        return "umbrella" if flip(.2) else "no umbrella"
In [3]:
def generate_sequence(length):
    state_seq = ()
    obs_seq = ()
    state = draw_from(["rainy", "sunny"])
    for t in range(length):
        obs = sensor_model(state)
        obs_seq += (obs,)
        state_seq += (state,)
        state = transition(state)
    return state_seq, obs_seq

Inferring latent states from observations (version 1)¶

HMMs can be used to perform smoothing, in which we observe a sequence of observations $y_{1:T}$ and attempt to infer the hidden state at each time $t$, where $1 \leq t \leq T$. That is, for each $t$, calculate: $$ P(x_t \mid y_{1:T}) = \frac{\sum_{x_{1:t-1}, x_{t-1:T}}P(x_{1:T}, y_{1:T})}{\sum_{x_{1:T}}P(x_{1:T}, y_{1:T})} $$

In FlipPy, the simplest way to implement this inference is to infer a posterior over state sequences given observations and then compute marginals for each timestep like in the following cells.

In [4]:
@infer
def state_seq_model(obs_seq):
    state = draw_from(["rainy", "sunny"])
    state_seq = ()
    for obs in obs_seq:
        # Save the state at time t
        state_seq += (state,)

        # Generate the observation 
        obs_ = sensor_model(state)

        # Condition on the observation
        condition(obs == obs_)

        # Transition to the next state
        state = transition(state)

    return state_seq
In [5]:
obs_seq = ("umbrella", "umbrella", "umbrella", "umbrella", "no umbrella")
state_seq_dist = state_seq_model(obs_seq)
In [6]:
state_seq_marginals_df = []
for t in range(len(obs_seq)):
    marginal_dist = state_seq_dist.marginalize(lambda seq: seq[t])
    state_seq_marginals_df.append({'t': t, **dict(marginal_dist)})
ax = sns.lineplot(
    data=pd.DataFrame(state_seq_marginals_df),
    x='t', y='rainy'
)
ax.set_ylim(0, 1)
Out[6]:
(0.0, 1.0)
No description has been provided for this image

Why is This Slow?¶

The naive approach above computes the full joint distribution over all possible state sequences. The problem is that the number of possible sequences grows exponentially with the sequence length:

Sequence Length Number of Possible Sequences Computation
5 $2^5 = 32$ Fast
10 $2^{10} = 1,024$ Okay
15 $2^{15} = 32,768$ Slow
20 $2^{20} ≈ 1$ million Very slow

This defeats the purpose of HMMs! The Markov property tells us that the future is independent of the past given the present—we should be able to exploit this for efficient computation.

The fix: Instead of computing the joint over sequences, compute marginals at each timestep independently. This is the key insight behind the forward-backward algorithm.

Try running the cell below—it takes about 30 seconds because it's enumerating all $2^{15} = 32,768$ possible state sequences:

In [7]:
# Running this cell will take about 30 seconds!
long_state_seq_dist = state_seq_model(obs_seq*3)
In [8]:
long_state_seq_marginals_df = []
for t in range(len(obs_seq)*3):
    marginal_dist = long_state_seq_dist.marginalize(lambda seq: seq[t])
    long_state_seq_marginals_df.append({'t': t, **dict(marginal_dist)})
ax = sns.lineplot(
    data=pd.DataFrame(long_state_seq_marginals_df),
    x='t', y='rainy'
)
ax.set_ylim(0, 1)
Out[8]:
(0.0, 1.0)
No description has been provided for this image

The Efficient Approach: Compute Marginals Directly¶

The key insight is that we don't need the full joint distribution—we only need the marginal probability of each state at each timestep. By restructuring our code to return the state at a specific time t (rather than the full sequence), FlipPy can exploit the Markov structure.

Why does this work?

When we call state_marginal_model(obs, t=2), FlipPy only needs to track:

  1. The probability distribution over states at time 2
  2. How observations before and after time 2 constrain this distribution

It doesn't need to enumerate all possible sequences—it can sum over "past" and "future" states efficiently.

Computational complexity:

  • Naive approach: $O(|S|^T)$ where $|S|$ is the number of states and $T$ is sequence length
  • Efficient approach: $O(|S|^2 \cdot T)$—linear in sequence length!

This is the same insight behind the classic forward-backward algorithm, but expressed naturally in a probabilistic programming framework.

In [9]:
@infer
def state_marginal_model(observations, t):
    assert t < len(observations)
    state = draw_from(["rainy", "sunny"])

    # We will return this at the end
    state_t = None 
    for t_, obs in list(enumerate(observations)):
        # Save the state at time t
        if t_ == t:
            state_t = state

        # generate observation
        obs_ = sensor_model(state)

        # Condition on the observation
        condition(obs == obs_)

        # Transition to the next state
        state = transition(state)

    return state_t
In [10]:
state_marginals_df = []
for t in range(len(obs_seq)*3):
    marginal_dist = state_marginal_model(obs_seq*3, t)
    state_marginals_df.append({'t': t, **dict(marginal_dist)})
ax = sns.lineplot(
    data=pd.DataFrame(state_marginals_df),
    x='t', y='rainy',
)
ax.set_ylim(0, 1)
Out[10]:
(0.0, 1.0)
No description has been provided for this image

Interpreting the results: The plot shows the probability of rain at each timestep, given the full observation sequence. Notice:

  • When we see umbrellas (timesteps 0-4, 5-9, 10-14), the probability of rain is high
  • When we see "no umbrella" (timesteps 4, 9, 14), the probability drops
  • The effect of observations propagates both forward and backward in time (this is "smoothing")

Summary¶

In this tutorial, you learned:

  • HMMs model sequential data with hidden states and observations
  • Naive inference over full sequences is exponentially slow
  • Marginal inference exploits the Markov property for linear-time computation
  • FlipPy can automatically exploit this structure when you return marginals instead of sequences

Key takeaway: How you structure your probabilistic program affects efficiency. Returning what you actually need (marginals) rather than intermediate representations (full sequences) can yield dramatic speedups.

Extensions¶

The HMM framework can be extended to:

  • Continuous observations: Replace the discrete sensor model with Gaussian observations
  • Continuous states: Leads to Kalman filters and particle filters
  • Switching dynamics: Different transition models in different regimes
  • Input-driven transitions: HMMs where actions influence state transitions (leading to POMDPs)
In [ ]: