Dynamic Programming In-Class Activity
The Gambler's Problem (Sutton & Barto, Example 4.3)
Activity Overview
Time: ~15 minutes
Format: Individual or pairs
Materials: Laptop with DynamicProgramming.html
Method: Predict → Experiment → Explain
A gambler repeatedly bets on coin flips. Heads wins the stake, tails loses it. The goal is to reach $100 starting from some capital. Value iteration finds the optimal betting strategy. Type your answers below — they are auto-saved.
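For reference, the value iteration mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not the code behind DynamicProgramming.html. It assumes the textbook formulation: stakes are whole dollars between 1 and min(s, 100 - s), there is no discounting, and winning is encoded by fixing V(100) = 1. The function name `gambler_value_iteration`, the threshold `theta`, and the tie-breaking rule (smallest stake among equally good ones) are choices made for this sketch.

```python
def gambler_value_iteration(ph=0.40, goal=100, theta=1e-9):
    """Sketch of value iteration for the Gambler's Problem (assumed formulation)."""
    # V[0] = 0 (ruin) and V[goal] = 1 (win) act as fixed terminal values.
    V = [0.0] * (goal + 1)
    V[goal] = 1.0

    def action_value(s, a):
        # Heads (probability ph) moves capital to s + a; tails moves it to s - a.
        return ph * V[s + a] + (1 - ph) * V[s - a]

    while True:
        delta = 0.0
        for s in range(1, goal):
            stakes = range(1, min(s, goal - s) + 1)
            best = max(action_value(s, a) for a in stakes)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update within the sweep
        if delta < theta:  # stop once the largest change in a sweep is tiny
            break

    # Greedy policy with respect to the converged values.
    policy = [0] * (goal + 1)
    for s in range(1, goal):
        stakes = range(1, min(s, goal - s) + 1)
        # Rounding before comparing makes near-ties resolve to the smallest stake.
        policy[s] = max(stakes, key=lambda a: round(action_value(s, a), 6))
    return V, policy
```

Calling `V, policy = gambler_value_iteration(ph=0.40)` returns two lists indexed by capital, so `V[50]` and `policy[50]` can be compared with what the charts show. Several stakes are often equally good, so the policy from this sketch may differ from the chart at some states even when the values agree.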
Part 1 — Predict & First Look (3 min)
1A — Predict Before You Run
2 min
Open DynamicProgramming.html. The probability of heads, ph, defaults to 0.40, so the coin is biased against the gambler. After convergence, the value function V(s) gives the probability of reaching $100 from capital $s under the optimal policy.
- Before running anything: guess V(50) — the probability of reaching $100 from $50 with a 40% coin. Would you bet big or small? Why?
1B — Run to Convergence
1 min
- Click "Run to Convergence". Wait for the status to read "Converged after … sweeps".
- Hover over the V(s) chart at s=25, s=50, and s=75 to read the converged values.
| State | Converged V(s) |
|---|---|
| V(25) | |
| V(50) | |
| V(75) | |
- How close was your prediction? Were you surprised by V(50)?
Part 2 — The Optimal Policy (4 min)
2A — Reading the Policy Chart
2 min
Look at the Optimal Policy π*(s) chart. The green bars show the optimal stake at each capital level.
- Hover over the policy chart to read specific stake values.
| Capital | Optimal Stake |
|---|---|
| s = 25 | |
| s = 50 | |
| s = 75 | |
- The policy chart has a distinctive spiky pattern — certain states have very high stakes while neighboring states bet small. Why do spikes appear at s=25, s=50, and s=75?
Hint: what is special about these numbers relative to the goal of $100?
2B — Why Go All-In?
2 min
- The policy says the gambler should sometimes bet everything (e.g., at s=50 the stake might equal 50). Why is "all-in" optimal when the coin is unfavorable?
Hint: with a bad coin, does the gambler benefit from many flips or few flips?
Part 3 — Change the Coin (5 min)
3A — Fair Coin (ph = 0.50)
2 min
- Move the ph slider to 0.50 (changing ph resets the simulation automatically). Click "Run to Convergence".
| Quantity | Converged value |
|---|---|
| V(50) at ph = 0.50 | |
- What shape does V(s) have with a fair coin? Why does this make sense?
Hint: with ph = 0.50, the gambler's chance of reaching $100 from $s is just s/100.
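The hint can be checked by substitution. As a sketch, assume the usual update for this problem, in which a stake a from capital s leads to s + a with probability ph and to s - a otherwise, with V(0) = 0 and V(100) = 1. Plugging in the candidate V(s) = s/100 with ph = 0.5 gives:

```latex
p_h \, V(s+a) + (1 - p_h) \, V(s-a)
  = \tfrac{1}{2} \cdot \frac{s+a}{100} + \tfrac{1}{2} \cdot \frac{s-a}{100}
  = \frac{s}{100}
```

The stake a cancels, so with a fair coin every allowed stake yields the same value, and the straight line s/100 is consistent at every state as well as at the two boundary values.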
3B — Favorable Coin (ph = 0.55)
1 min
- Set ph = 0.55. Run to convergence. Look at both V(s) and the policy.
| Quantity | Converged value |
|---|---|
| V(50) at ph = 0.55 | |
- How has the policy changed? Is it still spiky, or does the gambler use a different strategy?
Hint: with a favorable coin, is it better to take many small bets or a few large ones?
3C — The Big Picture
2 min
- Summarize: how does the optimal strategy change as ph goes from unfavorable (0.40) to fair (0.50) to favorable (0.55)? Why is the difference so dramatic?
Part 4 — Simulate & Reflect (3 min)
4A — Watch It Play
2 min
Set ph back to 0.40 and run to convergence. Then scroll down to "Simulate Optimal Policy".
- Set starting capital to $50 and click "Play Episode" three times. Tally wins and losses.
| Run | Result (Win/Loss) |
|---|---|
| 1 | |
| 2 | |
| 3 | |
- How many of your 3 runs ended in a win? Is this consistent with V(50)?
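Three episodes are a very small sample, so the tally can differ a lot from V(50) without contradicting it. The sketch below, which assumes each episode is an independent win/loss trial with the same win probability, computes how likely each possible number of wins is. The function name `win_count_distribution` and its parameters are made up for this illustration.

```python
from math import comb

def win_count_distribution(p_win, n_runs=3):
    """Binomial probabilities of 0, 1, ..., n_runs wins, assuming independent episodes."""
    return [
        comb(n_runs, k) * p_win**k * (1 - p_win) ** (n_runs - k)
        for k in range(n_runs + 1)
    ]
```

Pass the V(50) you read from the chart as `p_win`; entry k of the returned list is the probability of seeing exactly k wins in three runs.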
4B — Fill in the Blanks
1 min
Value iteration repeatedly applies the ______ equation to update V(s).
The algorithm converges when the maximum change Δ falls below the threshold ______.
When the coin is unfavorable (ph < 0.5), the optimal strategy is to bet ______ to minimize the number of flips.
When the coin is favorable (ph > 0.5), the optimal strategy is to bet ______ to let the edge compound over many flips.
The spikes in the policy at s = 25, 50, 75 occur because these capitals can reach ______ in a single winning bet.
Your answers are auto-saved in your browser. Use the buttons above to export for submission.