Connect 4 — RL, MCTS & LLM Evaluation
Train RL agents, explore MCTS, and evaluate LLM board understanding
Game
Game Traces
How to Play Connect 4
- Objective: Be the first player to connect four of your pieces in a row — horizontally, vertically, or diagonally.
- Taking turns: Red always goes first. Players alternate dropping one piece per turn into any column that is not full.
- Dropping pieces: Click a column to drop your piece. It falls to the lowest available row in that column.
- Winning: The game ends immediately when a player forms an unbroken line of four pieces. Winning cells are highlighted with a glow.
- Draw: If all 42 cells are filled and neither player has four in a row, the game is a draw.
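The drop-and-win mechanics above can be sketched as follows. The board representation and helper names here are illustrative, not the app's actual code:

```python
ROWS, COLS = 6, 7

def drop(board, col, piece):
    """Drop `piece` into `col`; it falls to the lowest empty row.
    Returns the landing row, or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):          # scan bottom-up
        if board[row][col] == ".":
            board[row][col] = piece
            return row
    return None

def wins(board, piece):
    """True if `piece` has four in a row horizontally, vertically,
    or on either diagonal."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                if all(0 <= r + i * dr < ROWS and 0 <= c + i * dc < COLS
                       and board[r + i * dr][c + i * dc] == piece
                       for i in range(4)):
                    return True
    return False

board = [["."] * COLS for _ in range(ROWS)]      # empty 6x7 grid
```

Scanning each cell as the start of a 4-cell line in four directions covers every possible winning line exactly once per direction.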
Game modes:
- Player vs Player — Two humans take turns on the same screen.
- Player vs Minimax AI — Play against a classic search-based AI. Higher depth = stronger play (depth 4+ recommended).
- Player vs RL Agent — Play against a neural-network agent trained in the Train RL tab. Train a model first, then select it here.
- Player vs MCTS Agent — Play against Monte Carlo Tree Search. More iterations = stronger play.
- RL Agent vs AI — Watch a trained RL model (Red) play automatically against a configurable AI opponent (Yellow). Adjust speed, step through moves, or auto-play with a live scoreboard.
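The "higher depth = stronger play" behavior of the minimax mode can be sketched as depth-limited negamax with alpha-beta pruning (the standard minimax formulation for two-player zero-sum games). The board helpers and the flat score of 0 at the depth cutoff are illustrative simplifications, not the app's actual evaluation:

```python
import math

ROWS, COLS = 6, 7

def drop(board, col, piece):
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == ".":
            board[row][col] = piece
            return row
    return None

def undo(board, col):
    """Remove the topmost piece from `col` (used to unwind search moves)."""
    for row in range(ROWS):
        if board[row][col] != ".":
            board[row][col] = "."
            return

def wins(board, piece):
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                if all(0 <= r + i * dr < ROWS and 0 <= c + i * dc < COLS
                       and board[r + i * dr][c + i * dc] == piece
                       for i in range(4)):
                    return True
    return False

def negamax(board, depth, piece, alpha=-math.inf, beta=math.inf):
    """Return (score, best column) for the player `piece` to move."""
    opp = "Y" if piece == "R" else "R"
    if wins(board, opp):                 # the previous move just won
        return -1000 - depth, None       # prefer later losses / earlier wins
    moves = [c for c in range(COLS) if board[0][c] == "."]
    if depth == 0 or not moves:
        return 0, None                   # flat heuristic at the cutoff
    best, best_move = -math.inf, moves[0]
    for col in moves:
        drop(board, col, piece)
        score = -negamax(board, depth - 1, opp, -beta, -alpha)[0]
        undo(board, col)
        if score > best:
            best, best_move = score, col
        alpha = max(alpha, score)
        if alpha >= beta:                # alpha-beta cutoff
            break
    return best, best_move
```

At depth 2 this search already blocks an opponent's immediate four-in-a-row threat; a real engine would add a positional heuristic at the cutoff instead of returning 0.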
Solved Game & Optimal Play
Connect 4 is a solved game. With perfect play, the first player (Red) can always force a win by opening in the center column. Opening in one of the two columns adjacent to the center leads to a draw with perfect play, while opening in any of the four outer columns allows the second player to force a win.
The game was independently solved by James D. Allen (October 1, 1988) and Victor Allis (October 16, 1988). Allis used a knowledge-based approach combining nine strategic rules with alpha-beta search, while Allen developed a combinatorial analysis of "threats" — categorizing them as major, minor, and useless. Both proofs demonstrated that the first player wins within at most 41 moves.
John Tromp later computed a complete 8-ply opening database and extended the solution to boards of various sizes up to width+height=15, requiring approximately 40,000 CPU hours at CWI Amsterdam. His Fhourstones program remains a widely used benchmark for integer performance.
References:
- Allis, V. (1988). A Knowledge-Based Approach of Connect-Four: The Game is Solved: White Wins. M.Sc. Thesis, Report No. IR-163, Faculty of Mathematics and Computer Science, Vrije Universiteit, Amsterdam. [PDF]
- Allen, J. D. (1990). Expert Play in Connect-Four. [Link]
- Tromp, J. (2008). Solving Connect-4 on Medium Board Sizes. ICGA Journal, 31(2), 110–112. [DOI]
- Edelkamp, S. & Kissmann, P. (2008). Symbolic Classification of General Two-Player Games. Proc. KI 2008, LNAI 5243, pp. 185–192. Springer.
RL Training
Episode Reward
Win Rate (per 100)
Loss
Algorithm Detail
Training Log
Models
Algorithm Reference
Deep Q-Network (DQN) approximates the action-value function Q(s, a) with a neural network instead of a lookup table, and uses experience replay and a target network to stabilize learning.
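A single DQN-style training step, reduced to a linear Q-function so it stays dependency-free, can be sketched as below. The function names and the linear model are illustrative stand-ins for the app's neural network:

```python
import random

def q_value(w, state, action):
    """Q(s, a) as a linear model: dot(w[action], state)."""
    return sum(wi * si for wi, si in zip(w[action], state))

def train_step(w, w_target, replay, batch_size, lr, gamma, n_actions):
    """One DQN-style update: sample a minibatch from the replay buffer
    and take a gradient step toward the TD target, which is computed
    with the frozen target weights for stability."""
    for s, a, r, s2, done in random.sample(replay, min(batch_size, len(replay))):
        target = r if done else r + gamma * max(
            q_value(w_target, s2, a2) for a2 in range(n_actions))
        error = target - q_value(w, s, a)
        for i in range(len(s)):          # gradient step on squared TD error
            w[a][i] += lr * error * s[i]

def sync_target(w, w_target):
    """Periodically copy the online weights into the target network."""
    for a in range(len(w)):
        w_target[a] = list(w[a])
```

The key stabilizers are visible here: the minibatch comes from stored transitions rather than the latest game, and the bootstrap target uses `w_target`, which only changes when `sync_target` is called.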
| Parameter | Description |
|---|---|
| Learning Rate | Step size for gradient descent (log scale: 10^x). Lower = slower but more stable. |
| Gamma | Discount factor for future rewards. Higher values make the agent plan further ahead. |
| ε Start | Initial exploration rate. At 1.0 the agent explores randomly at first. |
| ε End | Final exploration rate. A small value ensures some exploration always remains. |
| Replay Size | Max transitions stored in the replay buffer. Larger = more diverse training samples. |
| Batch Size | Transitions sampled per training step. Larger = more stable gradients. |
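The ε Start / ε End schedule and the bounded replay buffer from the table above can be sketched as follows; the names and default values are illustrative:

```python
from collections import deque

def epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal the exploration rate from eps_start down to
    eps_end over decay_steps, then hold it at eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

# Bounded replay buffer: once full, the oldest transitions are evicted.
replay = deque(maxlen=50_000)
```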
Current Board
Move Rankings
| Column | Visits | Win% | UCB1 |
|---|---|---|---|
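The UCB1 column ranks each move by the standard UCB1 score, which balances observed win rate against an exploration bonus that grows for under-visited children: w/n + c·√(ln N / n). A sketch (c = √2 is the textbook default, not necessarily the app's constant):

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: exploitation (win rate) plus exploration bonus."""
    if visits == 0:
        return math.inf          # unvisited children are expanded first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

Two children with the same win rate tie on exploitation, but the less-visited one gets the larger bonus and is selected more often until the estimates separate.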
Win Rate Over Iterations
Iteration Log
LLM Configuration
Evaluation Tasks
Results
Response Log
LLM-Guided MCTS
MCTS where the simulation (rollout) step is replaced by LLM position evaluation. Instead of random playouts, the LLM estimates “who is more likely to win?” for each position.
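The swap described above, replacing the random playout with a single evaluator call, can be sketched as follows. The `Position` interface (`terminal`, `moves`, `play`, `result`) and the `evaluate` callback are illustrative, not the app's actual API:

```python
import random

def random_rollout(pos, rng=random):
    """Classic MCTS simulation: play uniformly random moves until the
    game ends, then return the terminal result."""
    while not pos.terminal():
        pos = pos.play(rng.choice(pos.moves()))
    return pos.result()

def llm_rollout(pos, evaluate):
    """LLM-guided variant: one call to a position evaluator replaces
    the whole playout. `evaluate` returns an estimate in [0, 1] of the
    current player's winning chances, backed up like a rollout result."""
    return evaluate(pos)
```

Everything else in the tree (selection via UCB1, expansion, backpropagation) is unchanged; only the value fed back up the tree comes from the evaluator instead of a played-out game.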
Configuration
Uses LLM settings from the LLM Eval tab.
Board Position
Results
Move Rankings
| Column | Visits | Win% | Source |
|---|---|---|---|