This activity is split into two sessions designed to be completed in order.
Each session has its own worksheet with auto-saving answers, progress tracking, and export.
Students work in pairs or small groups, making predictions before running experiments on the
interactive simulation.
Session 1
Explore why TD(0) typically learns faster than Monte Carlo on the 5-state Random Walk,
and reproduce the classic RMS-error comparison from the textbook.
- Predict which states change after one TD(0) episode
- Compare TD(0) and MC convergence after 100 episodes
- Run the Figure 6.3 comparison experiment
- Reflect on why bootstrapping helps despite using estimates
~25 minutes
Start Session 1 →
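Before starting, it may help to see the update Session 1 revolves around. The sketch below is a minimal, illustrative TD(0) value-learner for the 5-state Random Walk; the state indexing, step size, and function names are assumptions for this sketch, not the simulation's actual code.

```python
import random

# 5 non-terminal states (A..E as indices 0..4), terminals on both ends.
# Reward is 1 for exiting right, 0 for exiting left; true values are i/6.
N_STATES = 5
TRUE_V = [i / 6 for i in range(1, 6)]

def td0_episode(V, alpha=0.1):
    """Run one episode from the center state, updating V in place
    with the TD(0) rule V[s] += alpha * (r + V[s'] - V[s])."""
    s = N_STATES // 2  # start in the center state (C)
    while True:
        s2 = s + random.choice([-1, 1])
        if s2 == N_STATES:            # right terminal: reward 1
            V[s] += alpha * (1 - V[s])
            return
        if s2 < 0:                    # left terminal: reward 0
            V[s] += alpha * (0 - V[s])
            return
        # Non-terminal step: bootstrap from the current estimate V[s2],
        # so V[s] can improve before the episode's outcome is known.
        V[s] += alpha * (V[s2] - V[s])
        s = s2

random.seed(0)
V = [0.5] * N_STATES                  # all values initialized to 0.5
for _ in range(100):
    td0_episode(V)
rms = (sum((v - t) ** 2 for v, t in zip(V, TRUE_V)) / N_STATES) ** 0.5
```

Note the bootstrapping line: unlike Monte Carlo, which waits for the episode to end and uses the actual return, TD(0) updates every visited state immediately from the next state's estimate. That is the behavior the first worksheet asks you to predict.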
Session 2
Discover why SARSA learns a safe path while Q-Learning finds the optimal-but-risky
cliff-edge route, and understand the on-policy vs off-policy distinction.
- Predict shortest vs cautious paths before training
- Train both algorithms and compare learned paths
- Investigate how ε affects SARSA's behavior
- Compare heatmaps and fill in the on-policy / off-policy summary
~30 minutes
Start Session 2 →
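The on-policy / off-policy distinction in Session 2 comes down to a one-line difference in the update target. The sketch below isolates that difference; the function names, Q-table layout, and ε-greedy helper are illustrative assumptions, not the activity's implementation.

```python
import random

def epsilon_greedy(Q, s, n_actions, eps):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

def sarsa_target(Q, s2, a2, r, gamma):
    # On-policy: bootstrap from the action the behavior policy
    # actually takes next (a2), so exploratory slips near the cliff
    # drag down the values of cliff-edge states.
    return r + gamma * Q[s2][a2]

def q_learning_target(Q, s2, r, gamma):
    # Off-policy: bootstrap from the greedy (max) action regardless of
    # what is taken next, so values reflect the optimal path even if
    # the ε-greedy policy sometimes falls off the cliff following it.
    return r + gamma * max(Q[s2])

# Tiny illustration: same transition, different targets.
Q = {0: [0.0, 1.0], 1: [2.0, 5.0]}
t_sarsa = sarsa_target(Q, s2=1, a2=0, r=-1, gamma=1.0)      # uses Q[1][0] = 2
t_qlearn = q_learning_target(Q, s2=1, r=-1, gamma=1.0)      # uses max(Q[1]) = 5
```

This asymmetry is why, in the worksheet, SARSA settles on a cautious path away from the cliff while Q-Learning hugs the edge: SARSA's values account for its own exploration, Q-Learning's do not.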