
LLM-Guided MCTS vs Naive MCTS

Comparing LLM position evaluation against random rollouts in Monte Carlo Tree Search on Tic-Tac-Toe

Guided Walkthrough

Algorithm Overview

Traditional MCTS estimates a position's value with random rollouts: from a leaf, both sides play random moves until the game ends, and the results are averaged over many playouts. LLM-guided MCTS replaces this step with an LLM evaluation — the model directly estimates the win probability of the position. This is analogous to how AlphaGo replaced random rollouts with a neural-network value function. The quality difference is most visible at low iteration counts, where a few accurate evaluations are worth more than many noisy rollouts.

Naive MCTS

1. Selection (UCB1)
2. Expansion
3. Random Rollout (key difference)
4. Backpropagation
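The four steps above can be sketched end to end for Tic-Tac-Toe. This is a minimal, self-contained version; the class and function names (`Node`, `mcts`, `random_rollout`) are illustrative, not the simulation's actual code:

```python
import math
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i in range(9) if board[i] == ' ']

class Node:
    def __init__(self, board, player, parent=None):
        self.board, self.player = board, player   # player = side to move
        self.parent, self.children = parent, {}   # children keyed by move index
        self.visits, self.wins = 0, 0.0           # wins from the root player's view

def ucb1(parent, child, root_player, c=1.41):
    if child.visits == 0:
        return float('inf')                       # always try unvisited children first
    q = child.wins / child.visits
    if parent.player != root_player:              # opponent picks moves bad for root
        q = 1.0 - q
    return q + c * math.sqrt(math.log(parent.visits) / child.visits)

def random_rollout(board, player, root_player):
    """Step 3 (naive): play uniformly random moves to the end of the game."""
    board = list(board)
    while True:
        w = winner(board)
        if w:
            return 1.0 if w == root_player else 0.0
        moves = legal_moves(board)
        if not moves:
            return 0.5                            # draw
        board[random.choice(moves)] = player
        player = 'O' if player == 'X' else 'X'

def mcts(board, root_player, iterations):
    root = Node(tuple(board), root_player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while node.children and len(node.children) == len(legal_moves(node.board)):
            node = max(node.children.values(),
                       key=lambda ch: ucb1(node, ch, root_player))
        # 2. Expansion: add one untried child, unless the game is over.
        untried = [m for m in legal_moves(node.board) if m not in node.children]
        if untried and not winner(node.board):
            m = random.choice(untried)
            child_board = list(node.board)
            child_board[m] = node.player
            next_player = 'O' if node.player == 'X' else 'X'
            child = Node(tuple(child_board), next_player, node)
            node.children[m] = child
            node = child
        # 3. Random rollout from the new leaf.
        value = random_rollout(node.board, node.player, root_player)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node:
            node.visits += 1
            node.wins += value
            node = node.parent
    # Final answer: the most-visited child of the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Because an immediately winning reply evaluates as a certain win on every visit, even a modest iteration budget concentrates visits on it; the most-visited root child is then returned as the best move.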

LLM-Guided MCTS

1. Selection (UCB1)
2. Expansion
3. LLM Evaluation (key difference)
4. Backpropagation
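In the guided variant, step 3 swaps the rollout for a model call. Model calls are slow, so repeated positions are typically served from a cache — the "LLM Calls" and "Cache Hits" statistics reported below count exactly this. A sketch of that design, with a stand-in heuristic in place of a real LLM API (all names here are hypothetical, not the simulation's actual code):

```python
class LLMEvaluator:
    """Estimates P(root player wins) for a board position.

    Results are cached by (position, side to move), so a position that
    recurs in the search tree costs only one model call.
    """

    def __init__(self, model_fn):
        self.model_fn = model_fn   # e.g. a wrapper that prompts an LLM
        self.cache = {}
        self.calls = 0             # actual model invocations
        self.hits = 0              # positions answered from the cache

    def evaluate(self, board, player_to_move):
        key = (tuple(board), player_to_move)
        if key in self.cache:
            self.hits += 1
        else:
            self.calls += 1
            self.cache[key] = self.model_fn(board, player_to_move)
        return self.cache[key]

# Stand-in "model": a trivial material heuristic, used here only so the
# sketch runs offline. A real evaluator would prompt an LLM instead.
def fake_model(board, player_to_move):
    return 0.5 + 0.05 * (board.count('X') - board.count('O'))
```

The returned probability plugs into step 4 exactly where the rollout result would go; the rest of the search loop is unchanged.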

LLM Configuration

Exploration constant: 1.41 (≈ √2)
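The value 1.41 (≈ √2) is the standard exploration constant for UCB1, which both algorithms use in their selection step. A sketch of the score (the function name `ucb1_score` is illustrative):

```python
import math

def ucb1_score(child_wins, child_visits, parent_visits, c=1.41):
    """UCB1: exploitation (observed win rate) plus an exploration bonus
    that shrinks as the child is visited more often."""
    if child_visits == 0:
        return float('inf')   # unvisited children are always tried first
    win_rate = child_wins / child_visits
    bonus = c * math.sqrt(math.log(parent_visits) / child_visits)
    return win_rate + bonus
```

Raising c favors exploring rarely visited moves; lowering it favors exploiting the move with the best observed win rate.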

Board Position

Interactive 3×3 board (columns A–C, rows 1–3), X to move. Markers highlight each algorithm's preferred move (Naive best vs LLM best); click empty cells to set up a position.

Analysis Results

Click "Analyze Position" to compare both algorithms.
Tree legend: each circle is a node in the MCTS search tree.
- Top number: visit count; bottom number: win rate.
- Color: green = high win rate (good for the root player), red = low win rate, gray = unvisited.
- Border: blue = root (Naive), purple = root (LLM).
Hover any node for details.

Naive MCTS (Random Rollouts)

Best Move
Win%
Nodes
Time
Move Rankings
Move | Visits | Win%

LLM-Guided MCTS

Best Move
Win%
Nodes
Time
LLM Calls
Cache Hits
Avg Latency
Move Rankings
Move | Visits | Win%
Win Rate Convergence