Dynamic Programming In-Class Activity

The Gambler's Problem — Sutton & Barto, Example 4.3

Activity Overview

Name: ________   Partner: ________   ID: ________

Time: ~15 minutes

Format: Individual or pairs

Materials: Laptop with DynamicProgramming.html

Method: Predict → Experiment → Explain


A gambler repeatedly bets on coin flips. Heads wins the stake; tails loses it. Starting from some capital, the gambler wins by reaching $100 and loses by running out of money ($0). Value iteration finds the optimal betting strategy. Type your answers below; they are auto-saved.
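For reference, here is a minimal Python sketch of value iteration for this problem. It assumes the Sutton & Barto setup: integer capital s from 1 to 99, integer stakes a from 1 to min(s, 100 - s), and fixed boundary values V(0) = 0 and V(100) = 1. Each sweep applies the backup V(s) ← max over a of [p_h·V(s + a) + (1 - p_h)·V(s - a)] in place and stops once the largest change Δ in a sweep falls below a threshold θ. The function name `value_iteration`, the default θ, and the tie-breaking rule (smallest stake among near-ties) are illustrative assumptions, not necessarily what DynamicProgramming.html does.

```python
GOAL = 100  # target capital; reaching it counts as a win


def value_iteration(p_h, theta=1e-9, goal=GOAL):
    """Return (V, policy, sweeps) for the Gambler's Problem.

    V[s] approximates the probability of reaching `goal` from capital s
    under optimal play; policy[s] is a greedy stake for 1 <= s <= goal - 1.
    """
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # boundary values: V(0) = 0 (broke), V(goal) = 1 (won)
    sweeps = 0
    while True:
        delta = 0.0
        for s in range(1, goal):
            # Bellman optimality backup: best expected value over legal stakes
            best = max(
                p_h * V[s + a] + (1 - p_h) * V[s - a]
                for a in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, reused later in the same sweep
        sweeps += 1
        if delta < theta:  # largest change this sweep is below the threshold
            break

    # Extract a greedy policy; rounding keeps float noise from deciding near-ties
    policy = [0] * (goal + 1)
    for s in range(1, goal):
        stakes = range(1, min(s, goal - s) + 1)
        returns = [round(p_h * V[s + a] + (1 - p_h) * V[s - a], 6) for a in stakes]
        policy[s] = stakes[returns.index(max(returns))]  # smallest stake among ties
    return V, policy, sweeps


V, policy, sweeps = value_iteration(0.40)
print(f"Converged after {sweeps} sweeps")
for s in (25, 50, 75):
    print(f"s={s}: V={V[s]:.4f}, stake={policy[s]}")
```

Calling `value_iteration(0.50)` or `value_iteration(0.55)` mirrors Part 3. Several stakes can be (nearly) equally good at some capitals, so the exact spike pattern this sketch produces may differ from the simulation's chart depending on how ties are broken.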

Part 1 — Predict & First Look (3 min)

1A — Predict Before You Run 2 min

Open DynamicProgramming.html. The default coin probability is p_h = 0.40 (the coin is biased against the gambler). The value function V(s) gives the probability of eventually reaching $100 from capital $s when the gambler bets optimally.

Before running anything: guess V(50) — the probability of reaching $100 from $50 with a 40% coin. Would you bet big or small? Why?

1B — Run to Convergence 1 min

Click "Run to Convergence". Wait for the status to read "Converged after … sweeps".
Hover over the V(s) chart at s=25, s=50, and s=75 to read the converged values.

State	Converged V(s)
s = 25	________
s = 50	________
s = 75	________

How close was your prediction? Were you surprised by V(50)?

Part 2 — The Optimal Policy (4 min)

2A — Reading the Policy Chart 2 min

Look at the Optimal Policy π*(s) chart. The green bars show the optimal stake at each capital level.

Hover over the policy chart to read specific stake values.

Capital	Optimal Stake
s = 25	________
s = 50	________
s = 75	________

The policy chart has a distinctive spiky pattern — certain states have very high stakes while neighboring states bet small. Why do spikes appear at s=25, s=50, and s=75?
Hint: what is special about these numbers relative to the goal of $100?
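One way to see what makes these capitals special: assume the gambler plays boldly, always staking min(s, 100 - s). Then $50 is decided by a single flip, and $25 and $75 are each one bet away from $50 or from a boundary ($0 or $100). Writing q = 1 - p_h, the bold-play values at p_h = 0.40 work out as follows.

```latex
\begin{aligned}
V(50) &= p_h\,V(100) + q\,V(0)  = p_h = 0.40\\
V(25) &= p_h\,V(50)  + q\,V(0)  = p_h^2 = 0.16\\
V(75) &= p_h\,V(100) + q\,V(50) = p_h + q\,p_h = 0.40 + 0.60 \times 0.40 = 0.64
\end{aligned}
```

Bold play is known to be optimal when p_h < 0.5, so these values should agree with the converged values from 1B up to the convergence threshold.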

2B — Why Go All-In? 2 min

The policy says the gambler should sometimes bet everything (e.g., at s=50 the stake might equal 50). Why is "all-in" optimal when the coin is unfavorable?
Hint: with a bad coin, does the gambler benefit from many flips or few flips?
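To make the "few flips" intuition concrete, compare going all-in once from $50 with the most timid strategy, betting $1 on every flip. The timid strategy is the classic gambler's-ruin random walk, whose win probability has a closed form. This is a sketch: the helper name `timid_win_prob` is hypothetical, and the formula assumes p_h ≠ 0.5.

```python
def timid_win_prob(s, p, goal=100):
    """Probability of reaching `goal` from capital s when always staking $1.

    Classic gambler's-ruin formula; valid only for p != 0.5.
    """
    r = (1 - p) / p  # ratio of losing to winning probability per flip
    return (1 - r**s) / (1 - r**goal)


p = 0.40
bold = p  # from $50, one all-in bet: win with probability p, otherwise broke
timid = timid_win_prob(50, p)
print(f"all-in once: {bold:.2f}   $1 per flip: {timid:.2e}")
```

With p_h = 0.40 the $1-at-a-time gambler reaches $100 with probability on the order of 10⁻⁹, while the all-in gambler wins 40% of the time: every extra flip gives the house edge another chance to act.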

Part 3 — Change the Coin (5 min)

3A — Fair Coin (p_h = 0.50) 2 min

Move the p_h slider to 0.50 (the simulation resets automatically), then click "Run to Convergence".

V(50) at p_h = 0.50: ________

What shape does V(s) have with a fair coin? Why does this make sense?
Hint: with p_h = 0.50, the gambler's chance of reaching $100 from $s is just s/100.
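The hint can be checked directly: with p_h = 0.5, plugging V(s) = s/100 into the value-iteration backup gives the same value for every legal stake, because a fair bet leaves the expected capital unchanged. A small exact-arithmetic check in Python, using `fractions.Fraction` to avoid rounding:

```python
from fractions import Fraction

GOAL = 100
p = Fraction(1, 2)  # fair coin
V = [Fraction(s, GOAL) for s in range(GOAL + 1)]  # candidate V(s) = s/100

# For each capital, collect the backed-up value of every legal stake
for s in range(1, GOAL):
    backups = {
        p * V[s + a] + (1 - p) * V[s - a]
        for a in range(1, min(s, GOAL - s) + 1)
    }
    # Every stake gives exactly V(s), so the backup leaves V unchanged
    assert backups == {V[s]}

print("V(s) = s/100 is a fixed point of the backup when p_h = 0.5")
```

Because every stake ties for optimal at a fair coin, the policy chart at p_h = 0.50 reflects how ties are broken rather than a real preference.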

3B — Favorable Coin (p_h = 0.55) 1 min

Set p_h = 0.55. Run to convergence. Look at both V(s) and the policy.

V(50) at p_h = 0.55: ________

How has the policy changed? Is it still spiky, or does the gambler use a different strategy?
Hint: with a favorable coin, is it better to take many small bets or a few large ones?
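For a favorable coin the comparison reverses. This sketch pits one all-in bet from $50 against betting $1 per flip at p_h = 0.55, using the classic gambler's-ruin formula for the $1 strategy. The helper name `timid_win_prob` is hypothetical, and the formula assumes p_h ≠ 0.5.

```python
def timid_win_prob(s, p, goal=100):
    """Probability of reaching `goal` from capital s when always staking $1.

    Classic gambler's-ruin formula; valid only for p != 0.5.
    """
    r = (1 - p) / p  # ratio of losing to winning probability per flip
    return (1 - r**s) / (1 - r**goal)


p = 0.55
all_in = p  # from $50, a single all-in bet wins with probability p
small = timid_win_prob(50, p)
print(f"all-in once: {all_in:.2f}   $1 per flip: {small:.5f}")
```

Betting $1 per flip reaches $100 from $50 with probability about 0.99996, versus 0.55 for a single all-in bet: many small bets let a small per-flip edge compound.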

3C — The Big Picture 2 min

Summarize: how does the optimal strategy change as p_h goes from unfavorable (0.40) to fair (0.50) to favorable (0.55)? Why is the difference so dramatic?

Part 4 — Simulate & Reflect (3 min)

4A — Watch It Play 2 min

Set p_h back to 0.40 and run to convergence. Then scroll down to "Simulate Optimal Policy".

Set starting capital to $50 and click "Play Episode" three times. Tally wins and losses.

Run	Result (Win/Loss)
1	________
2	________
3	________

How many of your 3 runs ended in a win? Is this consistent with V(50)?
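Three episodes are far too few to estimate a probability. With V(50) = 0.4, three runs give 0 wins with probability 0.6³ ≈ 0.22 and exactly 1 win with probability 3 × 0.4 × 0.6² ≈ 0.43, so almost any tally is consistent. The Monte Carlo sketch below assumes the gambler follows bold play (stake min(s, 100 - s)), the all-in pattern at $50; `play_bold_episode` is a hypothetical helper, not part of the simulation. From $50 a bold episode is a single flip, so the win rate should approach 0.4.

```python
import random


def play_bold_episode(start, p_h, goal=100, rng=random):
    """Simulate one episode of bold play; return True if `goal` is reached."""
    s = start
    while 0 < s < goal:
        stake = min(s, goal - s)  # bold play: bet as much as is useful
        s += stake if rng.random() < p_h else -stake
    return s == goal


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000
wins = sum(play_bold_episode(50, 0.40, rng=rng) for _ in range(n))
print(f"win rate over {n} episodes from $50: {wins / n:.3f}")
```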

4B — Fill in the Blanks 1 min

Value iteration repeatedly applies the ________ equation to update V(s).
The algorithm converges when the maximum change Δ falls below the threshold ________.
When the coin is unfavorable (p_h < 0.5), the optimal strategy is to bet ________ to minimize the number of flips.
When the coin is favorable (p_h > 0.5), the optimal strategy is to bet ________ to let the edge compound over many flips.
The spikes in the policy at s = 25, 50, 75 occur because these capitals can reach ________ in a single winning bet.
