MCTS In-Class Activity — Monte Carlo Tree Search
Predict, Experiment, Explain
Activity Overview
Type your answers directly into the text boxes below. Your work is auto-saved to your browser. When finished, click "Copy All Answers" or "Download as Text" to submit. Hover over the phase bar labels in the simulation for built-in tooltips.
Open MCTS.html. The board is empty and X moves first. The tree shows a single root node with "0" visits.
MCTS has four phases: Selection, Expansion, Simulation, Backpropagation.
- Before clicking anything: on the very first iteration, there is nothing to "select" yet. What do you think Expansion will do to the tree?
- After a new node is expanded, Simulation plays random moves to the end of the game. Why use random moves instead of smart ones?
Hint: think about computational cost and bias.
- Click "Step Phase" four times — once for each phase. Watch the colored phase bar at the top of the tree and note which nodes highlight at each step.
- After all four phases, read the Iteration Log panel on the right.
- Which move was expanded (e.g., A1, B2)? How many nodes are now in the tree?
- What was the simulation result (Win, Loss, or Draw)? After backpropagation, what are the root node's visit count and win rate?
- Click "Step Phase" four more times to run a second iteration. Was the same move expanded, or a different one? Why?
Hint: with only 1 visit each, how does UCB1 decide?
- Click "Reset Tree".
- Click "+100" to run 100 iterations instantly.
- Look at the Move Rankings table and the tree visualization. Click "Fit" if needed.
- Which move has the most visits after 100 iterations? Which has the highest win rate? Are they the same move?
- Look at the tree: are all first-level children equally deep, or are some branches deeper than others? Why?
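The four phases you just stepped through can be sketched in a few lines of Python. This is a minimal toy version, not the simulation's actual code: the `Node` class, the "3 untried moves per node" assumption, and the coin-flip stand-in for a real rollout are all invented for illustration.

```python
import math
import random

class Node:
    """One node in a toy search tree (hypothetical minimal version)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0.0
        self.untried_moves = 3  # pretend every position has 3 legal moves

def ucb1(child, c=math.sqrt(2)):
    # Selection criterion: exploitation term + exploration bonus.
    if child.visits == 0:
        return float("inf")
    return (child.wins / child.visits) + c * math.sqrt(
        math.log(child.parent.visits) / child.visits)

def one_iteration(root):
    # 1. Selection: descend while the current node is fully expanded.
    node = root
    while node.untried_moves == 0 and node.children:
        node = max(node.children, key=ucb1)
    # 2. Expansion: add one new child for an untried move.
    if node.untried_moves > 0:
        child = Node(parent=node)
        node.children.append(child)
        node.untried_moves -= 1
        node = child
    # 3. Simulation: stand-in random playout (win / loss / draw).
    result = random.choice([1.0, 0.0, 0.5])
    # 4. Backpropagation: update visit and win counts up to the root.
    while node is not None:
        node.visits += 1
        node.wins += result
        node = node.parent

root = Node()
one_iteration(root)
```

Note that on the very first iteration the selection loop does nothing (the root still has untried moves), so the iteration goes straight to expansion — exactly the behavior the first question above asks you to predict.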
After running 100 iterations (from step 1C), look at the Move Rankings table. It shows each move's Visits, Win%, Q (average value), and UCB1 score.
- Click on a tree node in the visualization. The UCB1 Live panel shows the UCB1 breakdown for all its children.
- Pick the most-visited root child. Record its Q (exploitation) and exploration bonus from the UCB1 Live panel.
| Component | Most-Visited Child | Least-Visited Child |
|---|---|---|
| Move | | |
| Visits | | |
| Q (win rate) | | |
| Exploration bonus | | |
| UCB1 total | | |
- Which child has a larger exploration bonus — the most-visited or the least-visited? Explain why using the UCB1 formula.
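To check your table entries, you can compute the same breakdown the UCB1 Live panel shows. The counts below are hypothetical, not taken from the simulation; the point is only that `visits` sits in the denominator under the square root, so fewer visits means a larger bonus.

```python
import math

def ucb1_parts(wins, visits, parent_visits, c=math.sqrt(2)):
    """Return (Q, exploration bonus, UCB1 total) for one child."""
    q = wins / visits                                        # exploitation term
    bonus = c * math.sqrt(math.log(parent_visits) / visits)  # exploration term
    return q, bonus, q + bonus

# Hypothetical counts for a root with 100 visits:
most = ucb1_parts(wins=40, visits=60, parent_visits=100)   # most-visited child
least = ucb1_parts(wins=2, visits=5, parent_visits=100)    # least-visited child
# The least-visited child gets the larger exploration bonus.
```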
The slider labeled "C" controls the exploration constant in UCB1. Default is √2 ≈ 1.41.
- Reset. Set C = 0.1 (low exploration). Run +100. Note the tree shape and best move.
- Reset. Set C = 3.0 (high exploration). Run +100. Note the tree shape and best move.
| | C = 0.1 | C = 1.41 (default) | C = 3.0 |
|---|---|---|---|
| Best move after 100 iter. | | | |
| Visits for best move | | | |
- With C = 0.1, does the tree look wide or narrow? Why?
- With C = 3.0, does the tree look wide or narrow? Why?
- What is the risk of setting C too low? What about too high?
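One way to see why low C narrows the tree and high C widens it: with made-up stats for two root children (a strong, well-explored move and a weak, barely-explored one — the numbers and move names are invented for illustration), check which child UCB1 selects at each extreme.

```python
import math

def ucb1(q, visits, parent_visits, c):
    return q + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical root children: move -> (Q, visits), root has 100 visits.
children = {"A1": (0.60, 60), "B2": (0.30, 5)}

def pick(c):
    # The child with the highest UCB1 score is selected next.
    return max(children, key=lambda m: ucb1(*children[m], parent_visits=100, c=c))
```

With C = 0.1 the small exploration bonus barely matters, so selection keeps returning to the high-Q move A1 (a narrow, deep tree); with C = 3.0 the bonus dominates and selection is pulled toward the under-visited B2 (a wide, shallow tree).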
- Reset. Set C back to default (~1.41). Run +10. Record the best move and its win rate.
- Then click +100. Record again. Then click +1000. Record one more time.
| | After 10 iter. | After 110 iter. | After 1110 iter. |
|---|---|---|---|
| Best move | | | |
| Win rate | | | |
| Tree nodes | | | |
- Look at the Win Rate Over Iterations chart at the bottom right. Does the win rate stabilize? Around what value?
- Did the best move change between 10 and 1110 iterations? What does this tell you about the reliability of MCTS with few iterations?
The Rollout policy dropdown controls how simulations are played. "Random" makes purely random moves. "Informed" tries to win or block winning moves first.
- Reset. Set rollout policy to Random. Run +100. Record the best move and win rate.
- Reset. Set rollout policy to Informed (win/block). Run +100. Record again.
| | Random Rollout | Informed Rollout |
|---|---|---|
| Best move | | |
| Win rate of best move | | |
- Did the two rollout policies pick the same best move? Did the win rates differ? Why?
- If informed rollouts are "smarter," why might someone still prefer random rollouts?
Hint: think about what happens in games where we can't easily write a good informed policy.
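The two policies can be sketched for tic-tac-toe as follows. This is a hedged illustration of the idea, not the simulation's implementation; the board encoding (a 9-character string, `.` for empty) and the helper names are assumptions.

```python
import random

# The 8 three-in-a-row lines on a 3x3 board, indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def winning_move(board, player):
    # Return a cell index that completes three-in-a-row for `player`, if any.
    for i in range(9):
        if board[i] == ".":
            board[i] = player
            won = winner(board) == player
            board[i] = "."
            if won:
                return i
    return None

def rollout(board, player, policy="random"):
    """Play out a game; return 'X', 'O', or None for a draw."""
    board = list(board)
    while winner(board) is None and "." in board:
        move = None
        if policy == "informed":
            opponent = "O" if player == "X" else "X"
            move = winning_move(board, player)       # try to win first...
            if move is None:
                move = winning_move(board, opponent)  # ...otherwise block
        if move is None:  # random policy, or no win/block available
            move = random.choice([i for i, v in enumerate(board) if v == "."])
        board[move] = player
        player = "O" if player == "X" else "X"
    return winner(board)
```

The informed policy costs extra work per move (scanning for wins and blocks) and needs game-specific knowledge, which is exactly what the question below asks you to weigh against the cheap, unbiased random policy.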
Use Setup Board to create this position (click cells to cycle empty/X/O):
| | A | B | C |
|---|---|---|---|
| 1 | X | O | · |
| 2 | · | X | · |
| 3 | · | · | O |
It is X's turn. Click "Done Editing", then run +100 iterations.
- What move does MCTS recommend for X? What is its win rate?
- Can you see why this move is strong? Describe X's winning strategy from here.
- Now run +1000 more. Did the recommended move change? Did the win rate become more confident?
Use Setup Board to create this position (O to move):
| | A | B | C |
|---|---|---|---|
| 1 | X | · | · |
| 2 | · | X | · |
| 3 | O | · | · |
Click "Done Editing". Run +500 iterations.
- What move does MCTS recommend for O? Why is this move critical for O?
Hint: look at X's diagonal threat.
The UCB1 formula balances __________ (choosing nodes with high win rates) and __________ (trying nodes with few visits).
Increasing C makes MCTS explore __________, producing a __________ tree.
With more iterations, the best move's win rate __________ and becomes more reliable.
- MCTS uses random rollouts to estimate a position's value. How is this similar to the Monte Carlo methods from Chapter 5 (which estimate state values by averaging returns from sampled episodes)?
- The UCB1 formula comes from the multi-armed bandit problem (Chapter 2). In MCTS, what are the "arms" and what is the "reward"?
Bonus Challenges (if time permits)
- Set up the empty board. Run +100 with C = 0 (pure exploitation, no exploration). What happens? Does MCTS find a good move?
- Set up a mid-game position where it's obvious who will win. Does MCTS agree? How many iterations does it need to reach >90% confidence?