LLM-Guided MCTS vs Naive MCTS
Comparing LLM position evaluation against random rollouts in Monte Carlo Tree Search on Tic-Tac-Toe
Guided Walkthrough
Algorithm Overview
Traditional MCTS uses random rollouts (random play to game end) to estimate position value. LLM-guided MCTS replaces this with an LLM evaluation — the model directly estimates the win probability. This is analogous to how AlphaGo replaced random rollouts with a neural network value function. At low iteration counts, the quality difference is most visible.
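The difference comes down to how a leaf is scored once selection reaches it. A minimal sketch of the naive side (the flat-list board with 'X'/'O'/' ' cells is an assumption of this sketch, not necessarily the demo's actual representation):

```python
import random

# Indices of the eight winning lines on a 3x3 board stored as a flat list.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player completed a line,
    'draw' if the board is full, else None (game still running)."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None

def random_rollout_value(board, to_move, perspective):
    """Naive MCTS leaf evaluation: play uniformly random moves to the end.
    Returns 1.0 win / 0.5 draw / 0.0 loss from `perspective`'s point of view."""
    board = board[:]  # don't mutate the caller's board
    result = winner(board)
    while result is None:
        move = random.choice([i for i, c in enumerate(board) if c == ' '])
        board[move] = to_move
        to_move = 'O' if to_move == 'X' else 'X'
        result = winner(board)
    if result == 'draw':
        return 0.5
    return 1.0 if result == perspective else 0.0
```

The LLM-guided variant replaces `random_rollout_value` with a single model call that returns an estimated win probability directly, so no moves are simulated at all.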
Naive MCTS
1. Selection (UCB1)
2. Expansion
3. Random Rollout (key difference)
4. Backpropagation
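The Selection step in both variants ranks children with the standard UCB1 score, which balances exploiting a high observed win rate against exploring rarely visited moves. A sketch, using the common exploration constant c = sqrt(2):

```python
import math

def ucb1(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for the Selection step: mean win rate (exploitation)
    plus a bonus that grows for rarely visited children (exploration)."""
    if child_visits == 0:
        return float('inf')  # unvisited children are always tried first
    exploit = child_wins / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

Selection repeatedly descends to the child with the highest UCB1 score until it reaches a node that has not been fully expanded.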
LLM-Guided MCTS
1. Selection (UCB1)
2. Expansion
3. LLM Evaluation (key difference)
4. Backpropagation
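Step 3 of the guided loop might look like the sketch below. The `call_llm` callable (prompt in, reply string out) and the prompt wording are assumptions for illustration, not the demo's actual API; the reply is parsed defensively and clamped to a valid probability:

```python
def llm_evaluate(board, to_move, call_llm):
    """LLM-guided leaf evaluation: one model call instead of a rollout.
    `call_llm` is a hypothetical function mapping a prompt string to a
    reply string; the reply is parsed into a win probability in [0, 1]."""
    rows = [' '.join(board[i:i + 3]) for i in (0, 3, 6)]
    prompt = ("Tic-Tac-Toe position (rows top to bottom):\n"
              + '\n'.join(rows)
              + f"\n{to_move} to move. Reply with only the probability "
              f"(0.0-1.0) that {to_move} wins.")
    try:
        p = float(call_llm(prompt).strip())
    except ValueError:
        p = 0.5  # unparseable reply: fall back to a neutral estimate
    return min(1.0, max(0.0, p))
```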
LLM Configuration: click to set up.
Board Position
X to move. A 3×3 grid with columns A–C and rows 1–3; the best move found by each algorithm is marked (Naive best / LLM best). Click empty cells to set up positions.
Analysis Results
Click "Analyze Position" to compare both algorithms.
Tree legend: each circle is a node in the MCTS search tree. Top number = visit count, bottom = win rate.
- Green: high win rate (good for the root player)
- Red: low win rate (bad)
- Gray: unvisited
- Blue border: root (Naive); purple border: root (LLM)
Hover any node for details.
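The two numbers in the legend come from the Backpropagation step: every evaluation result is pushed from the leaf back to the root, incrementing visit counts and accumulating value. A minimal sketch (the value is flipped at each level because the players alternate):

```python
class Node:
    """MCTS tree node as drawn in the legend: `visits` is the top number,
    `wins / visits` is the win rate shown underneath."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.wins = 0.0  # accumulated value from rollouts / LLM evaluations

    def win_rate(self):
        return self.wins / self.visits if self.visits else 0.0  # unvisited: gray

def backpropagate(node, value):
    """Step 4: push a leaf evaluation up to the root, flipping the value
    at each level because the player to move alternates."""
    while node is not None:
        node.visits += 1
        node.wins += value
        value = 1.0 - value
        node = node.parent
```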
Naive MCTS (Random Rollouts)
Metrics (populated after analysis): Best Move, Win%, Nodes, Time.
Move Rankings: a table of Move, Visits, and Win% for each candidate move.
LLM-Guided MCTS
Metrics (populated after analysis): Best Move, Win%, Nodes, Time, plus LLM Calls, Cache Hits, and Avg Latency.
Move Rankings: a table of Move, Visits, and Win% for each candidate move.
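The LLM Calls and Cache Hits counters suggest that evaluations are memoized by position: Tic-Tac-Toe has few distinct states, so the search revisits positions often, and each repeat can be served from a cache instead of a new model call. A sketch of that idea (the function and counter names here are illustrative assumptions):

```python
def make_cached_evaluator(evaluate):
    """Wrap an evaluation function with a per-position cache. Repeated
    visits to the same position become cache hits rather than fresh
    LLM calls, which also drives down average latency."""
    cache = {}
    stats = {'llm_calls': 0, 'cache_hits': 0}

    def cached(board, to_move):
        key = (''.join(board), to_move)
        if key in cache:
            stats['cache_hits'] += 1
        else:
            stats['llm_calls'] += 1
            cache[key] = evaluate(board, to_move)
        return cache[key]

    return cached, stats
```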
Win Rate Convergence
Chart of each algorithm's root win-rate estimate as iterations accumulate.
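A convergence series like this can be produced by sampling the root's running win rate every few iterations. In the sketch below, `run_iteration` is a hypothetical callable standing in for one full select/expand/evaluate/backpropagate pass that updates `root.visits` and `root.wins`:

```python
def track_convergence(run_iteration, root, n_iters, every=10):
    """Sample the root's win-rate estimate as iterations accumulate;
    this is the series a convergence chart would plot."""
    history = []
    for i in range(1, n_iters + 1):
        run_iteration(root)  # one select/expand/evaluate/backprop pass
        if i % every == 0:
            history.append((i, root.wins / root.visits))
    return history
```

With few iterations the estimate is noisy, which is why the quality gap between random rollouts and LLM evaluation is most visible at low iteration counts.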