Connect 4 — Getting Started
A quick tour of three AI approaches to Connect 4
Activity Overview
The Connect 4 simulation has five tabs: Play, Train RL, MCTS, LLM Eval, and LLM-MCTS. In this quick tour you will explore the first three, then brainstorm how LLMs could play a role. Your answers are auto-saved.
Open Connect4.html. You start on the Play tab. The default mode is Player vs Minimax AI.
- Play a quick game against the Minimax AI (depth 4). Try to win — or at least survive.
- After the game ends, click "New Game" and try one more time.
- Did you win, lose, or draw? Was the AI's play predictable or surprising?
- Connect 4 is a solved game — the first player can always force a win with perfect play. The Minimax AI searches a game tree to depth 4. Why can't it play perfectly even though the game is solved?
Hint: how deep is the full game tree for a 7×6 board?
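One way to make the hint concrete is a back-of-envelope count. The numbers below are loose upper bounds (they ignore illegal and repeated positions), not the simulation's actual search statistics:

```python
# Why a depth-4 Minimax can't be "perfect play": the full game tree is
# astronomically larger than what a depth-4 search ever visits.
BRANCHING = 7    # at most 7 columns to drop a piece into
MAX_PLIES = 42   # 7 x 6 = 42 cells, so a game lasts at most 42 moves

depth4_nodes = BRANCHING ** 4            # upper bound on a depth-4 search
full_tree_nodes = BRANCHING ** MAX_PLIES  # loose upper bound on the full tree

print(f"depth-4 upper bound:   {depth4_nodes:,}")      # a few thousand nodes
print(f"full-tree upper bound: {full_tree_nodes:.2e}")  # astronomically larger
```

Even with alpha-beta pruning and the true count being far below `7**42`, the gap is the point: the solved-game result requires search (or lookup tables) far beyond depth 4.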
Click the game mode dropdown. You will see five modes: Player vs Player, Player vs Minimax AI, Player vs RL Agent, Player vs MCTS Agent, and RL Agent vs AI.
- Each mode uses a different kind of AI (or no AI). In one sentence each, predict how Minimax, RL, and MCTS approach the game differently.
Click the Train RL tab. You see algorithm choices (DQN, PPO, REINFORCE, TD(λ), SARSA), hyperparameter sliders, and training controls.
- Leave the algorithm on DQN (default) and opponent on Random.
- Set episodes to 1000 and click "Start Training".
- Watch the four charts update in real time. When training finishes, record the stats below.
| Metric | After 1000 episodes |
|---|---|
| Win Rate (last 100) | |
| Avg Reward | |
- Look at the Win Rate per 100 chart. Does the win rate climb steadily or plateau? What does the curve shape tell you about how DQN learns?
- Go back to the Play tab. Change the mode to "Player vs RL Agent".
- Play a game against the agent you just trained. (If no model appears, save it first on the Train RL tab.)
- How did the RL agent play? Did it seem to have a strategy, or was it mostly random? Why might 1000 episodes against a random opponent not be enough?
Click the MCTS tab. You see a Connect 4 board, an MCTS tree canvas, and step controls. MCTS builds a search tree by repeating four phases: Selection → Expansion → Simulation → Backpropagation.
- Click "Step Phase" four times (one full iteration). Watch the phase bar highlight each step and look at the Iteration Log.
- Then click "+100" to run 100 iterations quickly. Check the Move Rankings table.
| After 100 iterations | Value |
|---|---|
| Best Move (column) | |
| Best Win% | |
| Tree Nodes | |
- MCTS doesn't train a model — it thinks fresh each turn by running simulations. How is this fundamentally different from the RL approach you just tried?
- Click "+1000" to add 1000 more iterations (1100 total). Check the Move Rankings again.
- Did the recommended move change? Did the win% become more confident? What is the trade-off of running more MCTS iterations?
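The four phases you stepped through can be sketched in miniature. This toy version collapses Connect 4 to a single decision with made-up per-move win rates (`WIN_PROB` is hypothetical, not from the simulation), so each MCTS phase fits on one line:

```python
import math, random

WIN_PROB = {0: 0.2, 1: 0.8, 2: 0.4}  # hypothetical win rate per "column"

class Node:
    def __init__(self, move):
        self.move, self.visits, self.wins = move, 0, 0.0

def uct(parent_visits, child, c=1.4):
    """Selection score: average win rate plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(iterations=1000, seed=0):
    random.seed(seed)
    children = [Node(m) for m in WIN_PROB]  # Expansion (done up front in this toy)
    root_visits = 0
    for _ in range(iterations):
        root_visits += 1
        node = max(children, key=lambda ch: uct(root_visits, ch))        # Selection
        won = 1.0 if random.random() < WIN_PROB[node.move] else 0.0      # Simulation
        node.visits += 1                                                 # Backpropagation
        node.wins += won
    return max(children, key=lambda ch: ch.visits).move  # most-visited move

print(mcts())  # the high-win-rate move should dominate the visit counts
```

More iterations sharpen the visit counts toward the best move, which mirrors the trade-off you just observed: confidence grows with iterations, but so does thinking time per turn.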
You have now seen three approaches to Connect 4:
- Minimax — searches the game tree with a depth limit
- RL (DQN) — learns an action-value function from thousands of training games
- MCTS — builds a search tree on-the-fly using random simulations
- Fill in the comparison below. For each approach, note one strength and one weakness.
| Approach | Strength | Weakness |
|---|---|---|
| Minimax | ||
| RL (DQN) | ||
| MCTS | | |
Large Language Models (like GPT-4 or Claude) can read a text description of a board and suggest moves. The simulation has two more tabs — LLM Eval and LLM-MCTS — that you can explore later. For now, just think:
- If you asked an LLM "Here is a Connect 4 board — what column should I play?", what kind of knowledge would it draw on? Would it "search" like Minimax or MCTS?
- MCTS uses random rollouts (random games to the end) to estimate who is winning. What if you replaced those random rollouts with an LLM that evaluates the board instead? What might improve? What might go wrong?
Hint: think about speed, accuracy, and cost.
- Name one task where an LLM could clearly help in a game setting (e.g., explaining moves, analyzing boards, coaching) and one task where traditional search/RL would still be better.