LLM-Guided MCTS vs Naive MCTS
Comparing LLM position evaluation against random rollouts in Monte Carlo Tree Search on Tic-Tac-Toe
Guided Walkthrough
Algorithm Overview
Traditional MCTS uses random rollouts (random play to game end) to estimate position value. LLM-guided MCTS replaces this with an LLM evaluation — the model directly estimates the win probability. This is analogous to how AlphaGo replaced random rollouts with a neural network value function. At low iteration counts, the quality difference is most visible.
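The difference comes down to how a leaf is scored once selection reaches it. A minimal sketch of the naive side (the flat-list board with 'X'/'O'/' ' cells is an assumption of this sketch, not necessarily the demo's actual representation):

```python
import random

# Indices of the eight winning lines on a 3x3 board stored as a flat list.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player completed a line,
    'draw' if the board is full, else None (game still running)."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None

def random_rollout_value(board, to_move, perspective):
    """Naive MCTS leaf evaluation: play uniformly random moves to the end.
    Returns 1.0 win / 0.5 draw / 0.0 loss from `perspective`'s point of view."""
    board = board[:]  # don't mutate the caller's board
    result = winner(board)
    while result is None:
        move = random.choice([i for i, c in enumerate(board) if c == ' '])
        board[move] = to_move
        to_move = 'O' if to_move == 'X' else 'X'
        result = winner(board)
    if result == 'draw':
        return 0.5
    return 1.0 if result == perspective else 0.0
```

The LLM-guided variant replaces `random_rollout_value` with a single model call that returns an estimated win probability directly, so no moves are simulated at all.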
Naive MCTS
1. Selection (UCB1)
2. Expansion
3. Random Rollout (key difference)
4. Backpropagation
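The Selection step in both variants ranks children with the standard UCB1 score, which balances exploiting a high observed win rate against exploring rarely visited moves. A sketch, using the common exploration constant c = sqrt(2):

```python
import math

def ucb1(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for the Selection step: mean win rate (exploitation)
    plus a bonus that grows for rarely visited children (exploration)."""
    if child_visits == 0:
        return float('inf')  # unvisited children are always tried first
    exploit = child_wins / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

Selection repeatedly descends to the child with the highest UCB1 score until it reaches a node that has not been fully expanded.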
LLM-Guided MCTS
1. Selection (UCB1)
2. Expansion
3. LLM Evaluation (key difference)
4. Backpropagation
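Step 3 of the guided loop might look like the sketch below. The `call_llm` callable (prompt in, reply string out) and the prompt wording are assumptions for illustration, not the demo's actual API; the reply is parsed defensively and clamped to a valid probability:

```python
def llm_evaluate(board, to_move, call_llm):
    """LLM-guided leaf evaluation: one model call instead of a rollout.
    `call_llm` is a hypothetical function mapping a prompt string to a
    reply string; the reply is parsed into a win probability in [0, 1]."""
    rows = [' '.join(board[i:i + 3]) for i in (0, 3, 6)]
    prompt = ("Tic-Tac-Toe position (rows top to bottom):\n"
              + '\n'.join(rows)
              + f"\n{to_move} to move. Reply with only the probability "
              f"(0.0-1.0) that {to_move} wins.")
    try:
        p = float(call_llm(prompt).strip())
    except ValueError:
        p = 0.5  # unparseable reply: fall back to a neutral estimate
    return min(1.0, max(0.0, p))
```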
LLM Configuration: click to set up.
Board Position
X to move. A 3×3 grid with columns A–C and rows 1–3; the best move found by each algorithm is marked (Naive best / LLM best). Click empty cells to set up positions.
Analysis Results
Click "Analyze Position" to compare both algorithms.
Tree legend: each circle is a node in the MCTS search tree. Top number = visit count, bottom = win rate.
- Green: high win rate (good for the root player)
- Red: low win rate (bad)
- Gray: unvisited
- Blue border: root (Naive); purple border: root (LLM)
Hover any node for details.
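The two numbers in the legend come from the Backpropagation step: every evaluation result is pushed from the leaf back to the root, incrementing visit counts and accumulating value. A minimal sketch (the value is flipped at each level because the players alternate):

```python
class Node:
    """MCTS tree node as drawn in the legend: `visits` is the top number,
    `wins / visits` is the win rate shown underneath."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.wins = 0.0  # accumulated value from rollouts / LLM evaluations

    def win_rate(self):
        return self.wins / self.visits if self.visits else 0.0  # unvisited: gray

def backpropagate(node, value):
    """Step 4: push a leaf evaluation up to the root, flipping the value
    at each level because the player to move alternates."""
    while node is not None:
        node.visits += 1
        node.wins += value
        value = 1.0 - value
        node = node.parent
```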
Naive MCTS (Random Rollouts)
Metrics (populated after analysis): Best Move, Win%, Nodes, Time.
Move Rankings: a table of Move, Visits, and Win% for each candidate move.
LLM-Guided MCTS
Metrics (populated after analysis): Best Move, Win%, Nodes, Time, plus LLM Calls, Cache Hits, and Avg Latency.
Move Rankings: a table of Move, Visits, and Win% for each candidate move.
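The LLM Calls and Cache Hits counters suggest that evaluations are memoized by position: Tic-Tac-Toe has few distinct states, so the search revisits positions often, and each repeat can be served from a cache instead of a new model call. A sketch of that idea (the function and counter names here are illustrative assumptions):

```python
def make_cached_evaluator(evaluate):
    """Wrap an evaluation function with a per-position cache. Repeated
    visits to the same position become cache hits rather than fresh
    LLM calls, which also drives down average latency."""
    cache = {}
    stats = {'llm_calls': 0, 'cache_hits': 0}

    def cached(board, to_move):
        key = (''.join(board), to_move)
        if key in cache:
            stats['cache_hits'] += 1
        else:
            stats['llm_calls'] += 1
            cache[key] = evaluate(board, to_move)
        return cache[key]

    return cached, stats
```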
Win Rate Convergence
Chart of each algorithm's root win-rate estimate as iterations accumulate.
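A convergence series like this can be produced by sampling the root's running win rate every few iterations. In the sketch below, `run_iteration` is a hypothetical callable standing in for one full select/expand/evaluate/backpropagate pass that updates `root.visits` and `root.wins`:

```python
def track_convergence(run_iteration, root, n_iters, every=10):
    """Sample the root's win-rate estimate as iterations accumulate;
    this is the series a convergence chart would plot."""
    history = []
    for i in range(1, n_iters + 1):
        run_iteration(root)  # one select/expand/evaluate/backprop pass
        if i % every == 0:
            history.append((i, root.wins / root.visits))
    return history
```

With few iterations the estimate is noisy, which is why the quality gap between random rollouts and LLM evaluation is most visible at low iteration counts.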