MCTS In-Class Activity — Monte Carlo Tree Search
Predict, Experiment, Explain
Activity Overview
Type your answers directly into the text boxes below. Your work is auto-saved to your browser. When finished, click "Copy All Answers" or "Download as Text" to submit. Hover over the phase bar labels in the simulation for built-in tooltips.
Open MCTS.html. The board is empty and X moves first. The tree shows a single root node with "0" visits.
MCTS has four phases: Selection, Expansion, Simulation, Backpropagation.
- Before clicking anything: on the very first iteration, there is nothing to "select" yet. What do you think Expansion will do to the tree?
- After a new node is expanded, Simulation plays random moves to the end of the game. Why use random moves instead of smart ones?
Hint: think about computational cost and bias.
- Click "Step Phase" four times — once for each phase. Watch the colored phase bar at the top of the tree and note which nodes highlight at each step.
- After all four phases, read the Iteration Log panel on the right.
- Which move was expanded (e.g., A1, B2)? How many nodes are now in the tree?
- What was the simulation result (Win, Loss, or Draw)? After backpropagation, what are the root node's visit count and win rate?
- Click "Step Phase" four more times to run a second iteration. Was the same move expanded, or a different one? Why?
Hint: with only 1 visit each, how does UCB1 decide?
- Click "Reset Tree".
- Click "+100" to run 100 iterations instantly.
- Look at the Move Rankings table and the tree visualization. Click "Fit" if needed.
- Which move has the most visits after 100 iterations? Which has the highest win rate? Are they the same move?
- Look at the tree: are all first-level children equally deep, or are some branches deeper than others? Why?
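The four phases you just stepped through can be sketched in a few lines of Python. This is a minimal toy version, not the simulation's actual code: the `Node` class, the "3 untried moves per node" assumption, and the coin-flip stand-in for a real rollout are all invented for illustration.

```python
import math
import random

class Node:
    """One node in a toy search tree (hypothetical minimal version)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0.0
        self.untried_moves = 3  # pretend every position has 3 legal moves

def ucb1(child, c=math.sqrt(2)):
    # Selection criterion: exploitation term + exploration bonus.
    if child.visits == 0:
        return float("inf")
    return (child.wins / child.visits) + c * math.sqrt(
        math.log(child.parent.visits) / child.visits)

def one_iteration(root):
    # 1. Selection: descend while the current node is fully expanded.
    node = root
    while node.untried_moves == 0 and node.children:
        node = max(node.children, key=ucb1)
    # 2. Expansion: add one new child for an untried move.
    if node.untried_moves > 0:
        child = Node(parent=node)
        node.children.append(child)
        node.untried_moves -= 1
        node = child
    # 3. Simulation: stand-in random playout (win / loss / draw).
    result = random.choice([1.0, 0.0, 0.5])
    # 4. Backpropagation: update visit and win counts up to the root.
    while node is not None:
        node.visits += 1
        node.wins += result
        node = node.parent

root = Node()
one_iteration(root)
```

Note that on the very first iteration the selection loop does nothing (the root still has untried moves), so the iteration goes straight to expansion — exactly the behavior the first question above asks you to predict.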
After running 100 iterations (from step 1C), look at the Move Rankings table. It shows each move's Visits, Win%, Q (average value), and UCB1 score.
- Click on a tree node in the visualization. The UCB1 Live panel shows the UCB1 breakdown for all its children.
- Pick the most-visited root child. Record its Q (exploitation) and exploration bonus from the UCB1 Live panel.
| Component | Most-Visited Child | Least-Visited Child |
|---|---|---|
| Move | | |
| Visits | | |
| Q (win rate) | | |
| Exploration bonus | | |
| UCB1 total | | |
- Which child has a larger exploration bonus — the most-visited or the least-visited? Explain why using the UCB1 formula.
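To check your table entries, you can compute the same breakdown the UCB1 Live panel shows. The counts below are hypothetical, not taken from the simulation; the point is only that `visits` sits in the denominator under the square root, so fewer visits means a larger bonus.

```python
import math

def ucb1_parts(wins, visits, parent_visits, c=math.sqrt(2)):
    """Return (Q, exploration bonus, UCB1 total) for one child."""
    q = wins / visits                                        # exploitation term
    bonus = c * math.sqrt(math.log(parent_visits) / visits)  # exploration term
    return q, bonus, q + bonus

# Hypothetical counts for a root with 100 visits:
most = ucb1_parts(wins=40, visits=60, parent_visits=100)   # most-visited child
least = ucb1_parts(wins=2, visits=5, parent_visits=100)    # least-visited child
# The least-visited child gets the larger exploration bonus.
```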
The slider labeled "C" controls the exploration constant in UCB1. Default is √2 ≈ 1.41.
- Reset. Set C = 0.1 (low exploration). Run +100. Note the tree shape and best move.
- Reset. Set C = 3.0 (high exploration). Run +100. Note the tree shape and best move.
| | C = 0.1 | C = 1.41 (default) | C = 3.0 |
|---|---|---|---|
| Best move after 100 iter. | | | |
| Visits for best move | | | |
- With C = 0.1, does the tree look wide or narrow? Why?
- With C = 3.0, does the tree look wide or narrow? Why?
- What is the risk of setting C too low? What about too high?
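One way to see why low C narrows the tree and high C widens it: with made-up stats for two root children (a strong, well-explored move and a weak, barely-explored one — the numbers and move names are invented for illustration), check which child UCB1 selects at each extreme.

```python
import math

def ucb1(q, visits, parent_visits, c):
    return q + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical root children: move -> (Q, visits), root has 100 visits.
children = {"A1": (0.60, 60), "B2": (0.30, 5)}

def pick(c):
    # The child with the highest UCB1 score is selected next.
    return max(children, key=lambda m: ucb1(*children[m], parent_visits=100, c=c))
```

With C = 0.1 the small exploration bonus barely matters, so selection keeps returning to the high-Q move A1 (a narrow, deep tree); with C = 3.0 the bonus dominates and selection is pulled toward the under-visited B2 (a wide, shallow tree).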
- Reset. Set C back to default (~1.41). Run +10. Record the best move and its win rate.
- Then click +100. Record again. Then click +1000. Record one more time.
| | After 10 iter. | After 110 iter. | After 1110 iter. |
|---|---|---|---|
| Best move | | | |
| Win rate | | | |
| Tree nodes | | | |
- Look at the Win Rate Over Iterations chart at the bottom right. Does the win rate stabilize? Around what value?
- Did the best move change between 10 and 1110 iterations? What does this tell you about the reliability of MCTS with few iterations?
The Rollout policy dropdown controls how simulations are played. "Random" makes purely random moves. "Informed" tries to win or block winning moves first.
- Reset. Set rollout policy to Random. Run +100. Record the best move and win rate.
- Reset. Set rollout policy to Informed (win/block). Run +100. Record again.
| | Random Rollout | Informed Rollout |
|---|---|---|
| Best move | | |
| Win rate of best move | | |
- Did the two rollout policies pick the same best move? Did the win rates differ? Why?
- If informed rollouts are "smarter," why might someone still prefer random rollouts?
Hint: think about what happens in games where we can't easily write a good informed policy.
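The two policies can be sketched for tic-tac-toe as follows. This is a hedged illustration of the idea, not the simulation's implementation; the board encoding (a 9-character string, `.` for empty) and the helper names are assumptions.

```python
import random

# The 8 three-in-a-row lines on a 3x3 board, indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def winning_move(board, player):
    # Return a cell index that completes three-in-a-row for `player`, if any.
    for i in range(9):
        if board[i] == ".":
            board[i] = player
            won = winner(board) == player
            board[i] = "."
            if won:
                return i
    return None

def rollout(board, player, policy="random"):
    """Play out a game; return 'X', 'O', or None for a draw."""
    board = list(board)
    while winner(board) is None and "." in board:
        move = None
        if policy == "informed":
            opponent = "O" if player == "X" else "X"
            move = winning_move(board, player)       # try to win first...
            if move is None:
                move = winning_move(board, opponent)  # ...otherwise block
        if move is None:  # random policy, or no win/block available
            move = random.choice([i for i, v in enumerate(board) if v == "."])
        board[move] = player
        player = "O" if player == "X" else "X"
    return winner(board)
```

The informed policy costs extra work per move (scanning for wins and blocks) and needs game-specific knowledge, which is exactly what the question below asks you to weigh against the cheap, unbiased random policy.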
Use Setup Board to create this position (click cells to cycle empty/X/O):
| | A | B | C |
|---|---|---|---|
| 1 | X | O | · |
| 2 | · | X | · |
| 3 | · | · | O |
It is X's turn. Click "Done Editing", then run +100 iterations.
- What move does MCTS recommend for X? What is its win rate?
- Can you see why this move is strong? Describe X's winning strategy from here.
- Now run +1000 more. Did the recommended move change? Did the win rate become more confident?
Use Setup Board to create this position (O to move):
| | A | B | C |
|---|---|---|---|
| 1 | X | · | · |
| 2 | · | X | · |
| 3 | O | · | · |
Click "Done Editing". Run +500 iterations.
- What move does MCTS recommend for O? Why is this move critical for O?
Hint: look at X's diagonal threat.
The UCB1 formula balances __________ (choosing nodes with high win rates) and __________ (trying nodes with few visits).
Increasing C makes MCTS explore __________, producing a __________ tree.
With more iterations, the best move's win rate __________ and becomes more reliable.
- MCTS uses random rollouts to estimate a position's value. How is this similar to the Monte Carlo methods from Chapter 5 (which estimate state values by averaging returns from sampled episodes)?
- The UCB1 formula comes from the multi-armed bandit problem (Chapter 2). In MCTS, what are the "arms" and what is the "reward"?
Bonus Challenges (if time permits)
- Set up the empty board. Run +100 with C = 0 (pure exploitation, no exploration). What happens? Does MCTS find a good move?
- Set up a mid-game position where it's obvious who will win. Does MCTS agree? How many iterations does it need to reach >90% confidence?