Reinforcement Learning: Interactive Simulations

Eight interactive visualizations designed to help students learn key concepts from Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

  1. Tic-Tac-Toe RL Agent (Ch. 1): Play against a temporal-difference learning agent that improves its value estimates in real time, illustrating the core RL idea from Section 1.5 (TD update sketch after this list).
  2. 10-Armed Bandit Testbed (Ch. 2): Explore the exploration-exploitation trade-off by comparing epsilon-greedy, UCB, and gradient bandit strategies on a 10-armed testbed (epsilon-greedy sketch below).
  3. Gridworld Value Function (Ch. 3): Visualize state-value functions and optimal policies on the 5×5 Gridworld with special jump states (Example 3.5 / Figure 3.2; policy-evaluation sketch below).
  4. Gambler's Problem (Dynamic Programming, Ch. 4): Watch value iteration solve the Gambler's Problem step by step, revealing how the optimal policy emerges from successive sweeps (value-iteration sketch below).
  5. Monte Carlo Tree Search (MCTS, Ch. 8): Step through the four MCTS phases (Selection, Expansion, Simulation, Backpropagation) on a Tic-Tac-Toe board, with a live tree visualization and tunable UCB1 exploration (UCB1 sketch below).
  6. Policy Gradient — REINFORCE (Ch. 13): See the REINFORCE algorithm learn a stochastic policy on a simple left-right game, with live plots of policy probabilities and reward curves (REINFORCE sketch below).
  7. PPO vs A2C — Actor-Critic Methods (Ch. 13): Compare Proximal Policy Optimization and Advantage Actor-Critic side by side on a simple navigation task, highlighting PPO's clipped objective and multi-epoch updates (clipped-objective sketch below).
  8. Flappy Bird RL — PPO, DQN & A2C (Deep RL): Play Flappy Bird yourself, then train neural networks with PPO, DQN, and A2C to master it. Includes a pretrained DQN model ready to play immediately, a real-time training dashboard, live AI demos, and model export/import.
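
Algorithm sketches

The Python snippets below are minimal, illustrative sketches of the update rules behind several of the demos above. Variable names, reward values, and hyperparameters are assumptions chosen for exposition; they are not taken from the simulations' source code.

The tic-tac-toe agent (item 1) relies on the temporal-difference backup from Section 1.5: after each move, the value of the previous state is nudged toward the value of the state that follows it. A sketch with an assumed dict-based value table and made-up state keys:

```python
def td_update(values, state, next_state, alpha=0.1, default=0.5):
    """V(s) <- V(s) + alpha * [V(s') - V(s)]  (the Section 1.5 backup)."""
    v = values.get(state, default)
    target = values.get(next_state, default)
    values[state] = v + alpha * (target - v)
    return values[state]

# Back up along the states visited in one (illustrative) game the agent won.
values = {"terminal-win": 1.0}                 # winning positions are worth 1
episode = ["s0", "s1", "s2", "terminal-win"]   # placeholder state keys
for s, s_next in zip(episode, episode[1:]):
    td_update(values, s, s_next)
print(values)
```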
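
The 10-armed testbed (item 2) pits exploration strategies against each other. A sketch of the simplest of them, sample-average epsilon-greedy, under assumed Gaussian reward distributions (the demo's actual testbed parameters may differ):

```python
import random

def run_epsilon_greedy(true_means, steps=1000, epsilon=0.1):
    """Sample-average epsilon-greedy on a k-armed bandit (Ch. 2)."""
    k = len(true_means)
    Q = [0.0] * k          # action-value estimates
    N = [0] * k            # pull counts
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                 # explore
        else:
            a = max(range(k), key=lambda i: Q[i])   # exploit
        r = random.gauss(true_means[a], 1.0)        # noisy reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                   # incremental sample average
        total += r
    return Q, total / steps

true_means = [random.gauss(0, 1) for _ in range(10)]   # 10-armed testbed
print(run_epsilon_greedy(true_means))
```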
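
For the Gridworld of Example 3.5 (item 3), the state values shown in Figure 3.2 come from evaluating the equiprobable random policy. A sketch of that iterative policy evaluation, with the grid layout and rewards taken from the book's example and everything else (function name, tolerance) assumed:

```python
def gridworld_random_policy_values(gamma=0.9, theta=1e-6):
    """Policy evaluation for the equiprobable random policy on the 5x5 grid of
    Example 3.5: A and B jump to A' and B' with rewards +10 and +5."""
    A, A_jump, B, B_jump = (0, 1), (4, 1), (0, 3), (2, 3)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # up, down, left, right
    V = [[0.0] * 5 for _ in range(5)]
    while True:
        delta = 0.0
        new_V = [[0.0] * 5 for _ in range(5)]
        for r in range(5):
            for c in range(5):
                expected = 0.0
                for dr, dc in moves:
                    if (r, c) == A:
                        (nr, nc), reward = A_jump, 10.0
                    elif (r, c) == B:
                        (nr, nc), reward = B_jump, 5.0
                    else:
                        nr, nc, reward = r + dr, c + dc, 0.0
                        if not (0 <= nr < 5 and 0 <= nc < 5):
                            nr, nc, reward = r, c, -1.0   # hit the wall: stay put, -1
                    expected += 0.25 * (reward + gamma * V[nr][nc])
                new_V[r][c] = expected
                delta = max(delta, abs(expected - V[r][c]))
        V = new_V
        if delta < theta:
            break
    return V

for row in gridworld_random_policy_values():
    print(" ".join("%5.1f" % v for v in row))
```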
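
Item 4's value iteration repeatedly applies the Bellman optimality backup to every capital level until the values stop changing, and the stakes are then read off greedily. A sketch using the Example 4.3 setup (goal of 100, heads probability p_h = 0.4 by default); the greedy tie-breaking here is an assumption, so the extracted policy may differ from the book's figure while achieving the same values:

```python
def gamblers_value_iteration(p_h=0.4, theta=1e-9):
    """Value iteration for the Gambler's Problem: states are capital 0..100."""
    V = [0.0] * 101
    V[100] = 1.0                                     # reaching the goal is worth 1
    while True:
        delta = 0.0
        for s in range(1, 100):
            stakes = range(1, min(s, 100 - s) + 1)   # bet no more than you have or need
            best = max(p_h * V[s + a] + (1 - p_h) * V[s - a] for a in stakes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    policy = [0] * 101                               # greedy stake for each capital level
    for s in range(1, 100):
        stakes = range(1, min(s, 100 - s) + 1)
        policy[s] = max(stakes, key=lambda a: p_h * V[s + a] + (1 - p_h) * V[s - a])
    return V, policy

V, policy = gamblers_value_iteration()
print(policy[1:100])
```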
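
The Selection phase of MCTS (item 5) repeatedly descends to the child with the highest UCB1 score, which trades off a node's average return against how rarely it has been visited. The Node class below is a hypothetical stand-in for the demo's tree nodes; only the scoring rule itself follows the standard UCB1 formula:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)

def ucb1(child, parent_visits, c=1.414):
    """UCB1: average value plus an exploration bonus that shrinks with visits."""
    if child.visits == 0:
        return float("inf")              # always try unvisited children first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select_child(node, c=1.414):
    return max(node.children, key=lambda ch: ucb1(ch, node.visits, c))

root = Node(visits=10)
root.children = [Node(visits=4, total_value=3.0), Node(visits=6, total_value=2.0)]
print(select_child(root))                # the less-visited, higher-value child wins here
```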
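
REINFORCE (item 6) increases the log-probability of an action in proportion to the return that followed it. A stateless, bandit-style simplification of the demo's left-right game, with an assumed reward of +1 for "right" and 0 for "left":

```python
import math
import random

prefs = [0.0, 0.0]                       # action preferences for [left, right]
alpha = 0.1

def softmax(p):
    e = [math.exp(x - max(p)) for x in p]
    s = sum(e)
    return [x / s for x in e]

for episode in range(500):
    pi = softmax(prefs)
    a = 0 if random.random() < pi[0] else 1
    G = 1.0 if a == 1 else 0.0           # assumed reward: "right" is the good action
    # REINFORCE: prefs += alpha * G * grad log pi(a); for softmax, grad_i = 1[i=a] - pi_i
    for i in range(2):
        prefs[i] += alpha * G * ((1.0 if i == a else 0.0) - pi[i])

print(softmax(prefs))                    # probability of "right" should climb toward 1
```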
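
PPO's clipped surrogate objective (item 7) limits how far a batch of updates can move the policy by clipping the new-to-old probability ratio. A NumPy sketch of the objective alone (the clip range and the toy batch are illustrative assumptions; the demo's full training loop also involves a value loss and multiple epochs):

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1-eps, 1+eps) * A), where r = pi_new / pi_old."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))   # maximize this (or minimize its negative)

# Toy batch: log-probs under the old and new policies, plus estimated advantages.
old_logp = np.log(np.array([0.5, 0.4, 0.9]))
new_logp = np.log(np.array([0.7, 0.2, 0.95]))
adv = np.array([1.0, -0.5, 2.0])
print(ppo_clip_objective(new_logp, old_logp, adv))
```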