Dynamic Programming In-Class Activity

The Gambler's Problem — Sutton & Barto, Example 4.3

Activity Overview

Time: ~15 minutes
Format: Individual or pairs
Materials: Laptop with DynamicProgramming.html
Method: Predict → Experiment → Explain

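A gambler repeatedly bets on coin flips. On heads the gambler wins the amount staked; on tails the gambler loses it. The game ends when the gambler reaches $100 (a win) or runs out of money (a loss). Value iteration finds the betting strategy that maximizes the probability of reaching $100 from each starting capital. Type your answers below; they are auto-saved.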
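For reference, the computation behind the simulation can be sketched in a few lines of Python. This is a minimal sketch, not the code in DynamicProgramming.html: the function name `value_iteration`, the tolerance `tol`, and the tie-breaking rule for the policy are all assumptions made for illustration.

```python
def value_iteration(ph=0.40, goal=100, tol=1e-9):
    """Return (V, policy) for the gambler's problem (illustrative sketch)."""
    # V[s] estimates the probability of reaching `goal` from capital s.
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # reaching the goal counts as a win; V[0] stays 0

    while True:
        delta = 0.0  # largest change seen in this sweep
        for s in range(1, goal):
            # Legal stakes: at least 1, at most what is held or still needed.
            stakes = range(1, min(s, goal - s) + 1)
            best = max(ph * V[s + a] + (1 - ph) * V[s - a] for a in stakes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # assumed stopping rule
            break

    # Greedy policy. Ties are common in this problem, so this sketch
    # (an assumption) picks the smallest stake that is within 1e-7 of the best.
    policy = [0] * (goal + 1)
    for s in range(1, goal):
        stakes = range(1, min(s, goal - s) + 1)
        q = [ph * V[s + a] + (1 - ph) * V[s - a] for a in stakes]
        best = max(q)
        policy[s] = next(a for a, v in zip(stakes, q) if v >= best - 1e-7)
    return V, policy
```

Calling `value_iteration(ph=0.40)` returns a list `V` indexed by capital and a list `policy` of stakes. A different tie-breaking rule can produce a different-looking policy chart with the same values, so the stakes from this sketch may not match the simulation bar for bar.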

Part 1 — Predict & First Look (3 min)
1A — Predict Before You Run (2 min)

Open DynamicProgramming.html. The default probability of heads is ph = 0.40, so the coin is biased against the gambler. The value function V(s) gives the probability of reaching $100 from capital $s when the gambler follows the optimal policy.

  • Before running anything: guess V(50) — the probability of reaching $100 from $50 with a 40% coin. Would you bet big or small? Why?
1B — Run to Convergence (1 min)
  • Click "Run to Convergence". Wait for the status to read "Converged after … sweeps".
  • Hover over the V(s) chart at s=25, s=50, and s=75 to read the converged values.
State    Converged V(s)
V(25)    ____
V(50)    ____
V(75)    ____
  • How close was your prediction? Were you surprised by V(50)?
Part 2 — The Optimal Policy (4 min)
2A — Reading the Policy Chart (2 min)

Look at the Optimal Policy π*(s) chart. The green bars show the optimal stake at each capital level.

  • Hover over the policy chart to read specific stake values.
Capital    Optimal Stake
s = 25     ____
s = 50     ____
s = 75     ____
  • The policy chart has a distinctive spiky pattern — certain states have very high stakes while neighboring states bet small. Why do spikes appear at s=25, s=50, and s=75?
    Hint: what is special about these numbers relative to the goal of $100?
2B — Why Go All-In? (2 min)
  • The policy says the gambler should sometimes bet everything (e.g., at s=50 the stake might equal 50). Why is "all-in" optimal when the coin is unfavorable?
    Hint: with a bad coin, does the gambler benefit from many flips or few flips?
Part 3 — Change the Coin (5 min)
3A — Fair Coin (ph = 0.50) (2 min)
  • Move the ph slider to 0.50 (changing ph resets the simulation automatically). Click "Run to Convergence".
V(50) at ph = 0.50: ____
  • What shape does V(s) have with a fair coin? Why does this make sense?
    Hint: with ph = 0.50, the gambler's chance of reaching $100 from $s is just s/100.
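The hint can be checked directly. The sketch below assumes V(x) = x/100 and a fair coin, then confirms that every legal stake from every capital gives back exactly s/100, so no stake is better than any other. The helper name `fair_coin_backup` is hypothetical.

```python
from fractions import Fraction


def fair_coin_backup(s, a, goal=100):
    """Expected value of staking a from capital s, assuming V(x) = x/goal and ph = 1/2."""
    half = Fraction(1, 2)
    win = Fraction(s + a, goal)   # value after heads
    lose = Fraction(s - a, goal)  # value after tails
    return half * win + half * lose


# Check every capital 1..99 and every legal stake using exact arithmetic.
all_match = all(
    fair_coin_backup(s, a) == Fraction(s, 100)
    for s in range(1, 100)
    for a in range(1, min(s, 100 - s) + 1)
)
print(all_match)  # prints True
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.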
3B — Favorable Coin (ph = 0.55) (1 min)
  • Set ph = 0.55. Run to convergence. Look at both V(s) and the policy.
V(50) at ph = 0.55: ____
  • How has the policy changed? Is it still spiky, or does the gambler use a different strategy?
    Hint: with a favorable coin, is it better to take many small bets or a few large ones?
3C — The Big Picture (2 min)
  • Summarize: how does the optimal strategy change as ph goes from unfavorable (0.40) to fair (0.50) to favorable (0.55)? Why is the difference so dramatic?
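After writing your summary, one way to check it is to compare two fixed strategies from $50: always staking $1, and staking all $50 on a single flip. The sketch below uses the standard gambler's-ruin formula for the $1 strategy. Both function names are hypothetical, and these two strategies are chosen for illustration; they are not necessarily what the simulation's policy chart shows.

```python
def timid_win_probability(s, ph, goal=100):
    """Chance of reaching `goal` from s when always staking $1 (gambler's ruin formula)."""
    if ph == 0.5:
        return s / goal
    r = (1 - ph) / ph  # odds ratio of losing to winning a single flip
    return (1 - r ** s) / (1 - r ** goal)


def all_in_win_probability_from_half(ph):
    """Chance of reaching the goal from exactly half of it by staking everything once."""
    return ph  # a single flip decides the game


for ph in (0.40, 0.50, 0.55):
    print(ph, timid_win_probability(50, ph), all_in_win_probability_from_half(ph))
```

Compare the two columns for each ph and note which strategy comes out ahead on each side of 0.50.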
Part 4 — Simulate & Reflect (3 min)
4A — Watch It Play (2 min)

Set ph back to 0.40 and run to convergence. Then scroll down to "Simulate Optimal Policy".

  • Set starting capital to $50 and click "Play Episode" three times. Tally wins and losses.
Run    Result (Win/Loss)
1      ____
2      ____
3      ____
  • How many of your 3 runs ended in a win? Is this consistent with V(50)?
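Three episodes is a very small sample, so it helps to know how often each tally would occur by chance. The sketch below treats the runs as independent and computes the probability of 0, 1, 2, or 3 wins for a given per-episode win probability. The function name `win_count_distribution` is hypothetical; pass in the V(50) you read from the chart.

```python
from math import comb


def win_count_distribution(p_win, runs=3):
    """Probability of exactly k wins in `runs` independent episodes, for k = 0..runs."""
    return [
        comb(runs, k) * p_win ** k * (1 - p_win) ** (runs - k)
        for k in range(runs + 1)
    ]


# Example with a placeholder value; replace 0.5 with your converged V(50).
for wins, prob in enumerate(win_count_distribution(0.5)):
    print(wins, prob)
```

If your tally has a reasonable probability under this distribution, it is consistent with V(50) even when the win fraction looks quite different from it.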
4B — Fill in the Blanks (1 min)
Value iteration repeatedly applies the ____ equation to update V(s).
The algorithm converges when the maximum change Δ falls below the threshold ____.
When the coin is unfavorable (ph < 0.5), the optimal strategy is to bet ____ to minimize the number of flips.
When the coin is favorable (ph > 0.5), the optimal strategy is to bet ____ to let the edge compound over many flips.
The spikes in the policy at s = 25, 50, 75 occur because these capitals can reach ____ in a single winning bet.
Your answers are auto-saved in your browser. Use the buttons above to export for submission.