In-Class Activity: TD Learning

Predict, Experiment, Explain — two hands-on sessions using the TD Learning simulation

This activity is split into two sessions designed to be completed in order. Each session has its own worksheet with auto-saving answers, progress tracking, and export. Students work in pairs or small groups, making predictions before running each experiment in the interactive simulation.
Session 1

Random Walk — TD vs Monte Carlo

Explore why TD(0) learns faster than Monte Carlo on the 5-state Random Walk, and reproduce the classic RMS-error comparison from the textbook.

  • Predict which states change after one TD(0) episode
  • Compare TD(0) and MC convergence after 100 episodes
  • Run the Figure 6.3 comparison experiment
  • Reflect on why bootstrapping helps despite using estimates
~25 minutes
Start Session 1 →
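The TD(0) update at the heart of Session 1 fits in a few lines. The sketch below is an illustrative stand-in, not the simulation's own code: the state numbering, the 0.5 initial values, and α = 0.1 are assumptions chosen to match the classic setup.

```python
import random

random.seed(0)  # for a repeatable illustration

# States 1..5 (A..E) with terminals 0 and 6; only the right exit pays
# reward 1. True values are 1/6 .. 5/6 under the equiprobable policy.
TRUE_VALUES = {s: s / 6 for s in range(1, 6)}

def td0_episode(V, alpha=0.1):
    """Run one episode from the center state, updating V in place."""
    s = 3
    while s not in (0, 6):
        s2 = s + random.choice((-1, 1))
        r = 1.0 if s2 == 6 else 0.0
        v_next = 0.0 if s2 in (0, 6) else V[s2]  # terminals have value 0
        V[s] += alpha * (r + v_next - V[s])      # bootstrap on estimate V[s2]
        s = s2

V = {s: 0.5 for s in range(1, 6)}
for _ in range(100):
    td0_episode(V)  # values drift toward 1/6 .. 5/6
```

Note for the first prediction prompt: with every value initialized to 0.5, the TD error is zero on every interior transition, so after one episode only the state adjacent to the exit that was actually taken has changed.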
Session 2

Cliff Walking — SARSA vs Q-Learning

Discover why, under ε-greedy exploration, SARSA learns a safe path while Q-Learning finds the optimal-but-risky route along the cliff edge, and understand the on-policy vs off-policy distinction.

  • Predict shortest vs cautious paths before training
  • Train both algorithms and compare learned paths
  • Investigate how ε affects SARSA's behavior
  • Compare heatmaps and fill in the on-policy / off-policy summary
~30 minutes
Start Session 2 →
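The contrast you will see in Session 2 comes down to one line in the update rule. A minimal sketch, assuming tabular Q-values and ε-greedy action selection; the helper names, state labels, and the α, γ values are illustrative, not the simulation's API:

```python
import random
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")

def epsilon_greedy(Q, s, eps=0.1):
    """Behaviour policy used by both algorithms (illustrative helper)."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=1.0):
    # On-policy: the target bootstraps on a2, the action the
    # epsilon-greedy policy actually chose in s2.
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, alpha=0.5, gamma=1.0):
    # Off-policy: the target bootstraps on the greedy max over s2,
    # regardless of which action will actually be executed.
    best = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

# Tiny demo of the divergence (hypothetical states):
Q = defaultdict(float)
Q[("cliff_edge", "right")] = 1.0
sarsa_update(Q, "start", "up", 0.0, "cliff_edge", a2="down")
q_learning_update(Q, "start", "left", 0.0, "cliff_edge")
# SARSA's target used Q[("cliff_edge", "down")] = 0.0, so
# Q[("start", "up")] is unchanged; Q-learning's target used the
# max, 1.0, so Q[("start", "left")] moved to 0.5.
```

Because SARSA's target accounts for the occasional exploratory step off the cliff, it learns to keep its distance; Q-Learning's greedy target ignores exploration and so converges on the shorter cliff-edge path.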