Connect 4 — RL, MCTS & LLM Evaluation

Train RL agents, explore MCTS search, and evaluate LLM board understanding

Game

Red's turn

Game Traces

0 traces saved

    How to Play Connect 4

    1. Objective: Be the first player to connect four of your pieces in a row — horizontally, vertically, or diagonally.
    2. Taking turns: Red always goes first. Players alternate dropping one piece per turn into any column that is not full.
    3. Dropping pieces: Click a column to drop your piece. It falls to the lowest available row in that column.
    4. Winning: The game ends immediately when a player forms an unbroken line of four pieces. Winning cells are highlighted with a glow.
    5. Draw: If all 42 cells are filled and neither player has four in a row, the game is a draw.
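
Rules 2–5 amount to a few lines of board logic. A minimal Python sketch, assuming a 6×7 grid stored top-down (an illustration, not the app's actual implementation):

```python
# Minimal Connect 4 board logic: pieces fall to the lowest empty row,
# and a win is four in a row horizontally, vertically, or diagonally.
ROWS, COLS = 6, 7

def new_board():
    return [[None] * COLS for _ in range(ROWS)]  # row 0 = top

def drop(board, col, player):
    """Drop player's piece into col; return the landing row, or None if full."""
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom up
        if board[row][col] is None:
            board[row][col] = player
            return row
    return None

def is_win(board, row, col):
    """Check whether the piece just placed at (row, col) completes four in a row."""
    player = board[row][col]
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):  # extend in both directions along the line
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 4:
            return True
    return False
```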

    Game modes:

    • Player vs Player — Two humans take turns on the same screen.
    • Player vs Minimax AI — Play against a classic search-based AI. Higher depth = stronger play (depth 4+ recommended).
    • Player vs RL Agent — Play against a neural-network agent trained in the Train RL tab. Train a model first, then select it here.
    • Player vs MCTS Agent — Play against Monte Carlo Tree Search. More iterations = stronger play.
    • RL Agent vs AI — Watch a trained RL model (Red) play automatically against a configurable AI opponent (Yellow). Adjust speed, step through moves, or auto-play with a live scoreboard.

    Solved Game & Optimal Play

    Connect 4 is a solved game. With perfect play, the first player (Red) can always force a win, but only by opening in the center column. Opening in either column adjacent to the center leads to a draw under perfect play, and opening in any of the four outer columns allows the second player to force a win.

    The game was independently solved by James D. Allen (October 1, 1988) and Victor Allis (October 16, 1988). Allis used a knowledge-based approach combining nine strategic rules with alpha-beta search, while Allen developed a combinatorial analysis of "threats" — categorizing them as major, minor, and useless. Both proofs demonstrated that the first player wins within at most 41 moves.

    John Tromp later computed a complete 8-ply opening database and extended the solution to boards of various sizes up to width+height=15, requiring approximately 40,000 CPU hours at CWI Amsterdam. His Fhourstones program remains a widely used benchmark for integer performance.

    References:

    1. Allis, V. (1988). A Knowledge-Based Approach of Connect-Four: The Game is Solved: White Wins. M.Sc. Thesis, Report No. IR-163, Faculty of Mathematics and Computer Science, Vrije Universiteit, Amsterdam.
    2. Allen, J. D. (1990). Expert Play in Connect-Four.
    3. Tromp, J. (2008). Solving Connect-4 on Medium Board Sizes. ICGA Journal, 31(2), 110–112.
    4. Edelkamp, S. & Kissmann, P. (2008). Symbolic Classification of General Two-Player Games. Proc. KI 2008, LNAI 5243, pp. 185–192. Springer.

    RL Training

    Defaults: Learning Rate 3e-4 · Gamma 0.99 · ε Start 1.00 · ε End 0.05 · Replay Size 10000 · Batch Size 32

    Live stats: Episode 0 · Win Rate (100) 0% · Avg Reward 0 · Avg Loss 0

    Charts: Episode Reward · Win Rate (per 100) · Loss

    Algorithm Detail

    Training Log

    Models

    Algorithm Reference

    Deep Q-Network (DQN) approximates the Q-value table with a neural network, so it scales to state spaces far too large to enumerate. It uses experience replay and a target network to stabilize learning.

    Learning Rate: Step size for gradient descent (log scale: 10^x). Lower = slower but more stable.
    Gamma: Discount factor for future rewards. Higher values make the agent plan further ahead.
    ε Start: Initial exploration rate. At 1.0 the agent explores randomly at first.
    ε End: Final exploration rate. A small value ensures some exploration always remains.
    Replay Size: Max transitions stored in the replay buffer. Larger = more diverse training samples.
    Batch Size: Transitions sampled per training step. Larger = more stable gradients.
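
A minimal Python sketch of how these pieces fit together: the bounded replay buffer, a linear ε decay, and the Bellman target computed from the target network's Q-values. The constants mirror the panel defaults; `decay_episodes` is an assumed schedule, not a value taken from the app:

```python
import random
from collections import deque

# Hyperparameter defaults matching the training panel.
GAMMA = 0.99                  # discount factor for future rewards
EPS_START, EPS_END = 1.00, 0.05
REPLAY_SIZE = 10_000          # max transitions kept
BATCH_SIZE = 32

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity=REPLAY_SIZE):
        self.buffer = deque(maxlen=capacity)  # old transitions drop off the left

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=BATCH_SIZE):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def epsilon(episode, decay_episodes=1000):
    """Linear decay from EPS_START to EPS_END over decay_episodes episodes."""
    frac = min(episode / decay_episodes, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def td_target(reward, next_q_values, done):
    """Bellman target, using the *target network's* Q-values for the next state."""
    return reward if done else reward + GAMMA * max(next_q_values)
```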
    Defaults: Iterations 10 · Exploration constant (UCB1 C) 1.41

    1. Selection
    2. Expansion
    3. Simulation
    4. Backpropagation
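
The four phases can be sketched compactly over a generic game tree, using the UCB1 formula with C = 1.41 (≈ √2). The `expand` and `simulate` callbacks are placeholders for game-specific logic, not the app's implementation:

```python
import math

C = 1.41  # UCB1 exploration constant, approximately sqrt(2)

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0.0

def ucb1(node):
    """UCB1 score: exploitation (win rate) plus an exploration bonus."""
    if node.visits == 0:
        return float("inf")  # unvisited children are always tried first
    return node.wins / node.visits + C * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def iterate(root, expand, simulate):
    """One MCTS iteration: selection, expansion, simulation, backpropagation."""
    node = root
    while node.children:                       # 1. Selection: follow max UCB1
        node = max(node.children, key=ucb1)
    expand(node)                               # 2. Expansion: add child nodes
    leaf = node.children[0] if node.children else node
    result = simulate(leaf)                    # 3. Simulation: estimate outcome
    while leaf is not None:                    # 4. Backpropagation: update path
        leaf.visits += 1
        leaf.wins += result
        leaf = leaf.parent
```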

    Current Board

    Iterations: 0 · Tree Nodes: 1 · Best Move: - · Best Win%: -

    Move Rankings

    Column | Visits | Win% | UCB1

    Win Rate Over Iterations

    Iteration Log

    Waiting for first iteration...

    LLM Configuration

    Evaluation Tasks

    Results

    Accuracy: - · Questions: 0 · Correct: 0 · Avg Latency: -

    Response Log

    LLM-Guided MCTS

    MCTS where the simulation (rollout) step is replaced by LLM position evaluation. Instead of random playouts, the LLM estimates “who is more likely to win?” for each position.
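
A sketch of the evaluator that stands in for random rollouts, including the position cache behind the LLM Calls and Cache Hits stats. `llm_estimate_win_prob` is a hypothetical stand-in for the real API call:

```python
def llm_estimate_win_prob(board_key):
    """Placeholder for an LLM call returning P(current player wins) in [0, 1]."""
    return 0.5

class LLMEvaluator:
    """Replaces the MCTS rollout: one LLM estimate per position, cached."""
    def __init__(self, llm=llm_estimate_win_prob):
        self.llm = llm
        self.cache = {}       # board_key -> estimated win probability
        self.calls = 0        # "LLM Calls" stat
        self.cache_hits = 0   # "Cache Hits" stat

    def evaluate(self, board_key):
        if board_key in self.cache:
            self.cache_hits += 1
            return self.cache[board_key]
        self.calls += 1
        value = self.llm(board_key)
        self.cache[board_key] = value
        return value
```

Caching matters here because MCTS revisits transpositions constantly, and each cache hit avoids a slow, costly LLM round trip.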

    Configuration

    Uses LLM settings from the LLM Eval tab.

    Exploration constant (UCB1 C): 1.41

    Board Position

    Results

    LLM Calls: 0 · Cache Hits: 0 · Avg Latency: - · Cache Size: 0

    Move Rankings

    Column | Visits | Win% | Source

    Log