Connect 4 — Getting Started
A quick tour of three AI approaches to Connect 4
Activity Overview
The Connect 4 simulation has five tabs: Play, Train RL, MCTS, LLM Eval, and LLM-MCTS. In this quick tour you will explore the first three, then brainstorm how LLMs could play a role. Your answers are auto-saved.
Open Connect4.html. You start on the Play tab. The default mode is Player vs Minimax AI.
- Play a quick game against the Minimax AI (depth 4). Try to win — or at least survive.
- After the game ends, click "New Game" and try one more time.
- Did you win, lose, or draw? Was the AI's play predictable or surprising?
- Connect 4 is a solved game — the first player can always force a win with perfect play. The Minimax AI searches a game tree to depth 4. Why can't it play perfectly even though the game is solved?
Hint: how deep is the full game tree for a 7×6 board?
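One way to make the hint concrete is a back-of-envelope count. The numbers below are loose upper bounds (they ignore illegal and repeated positions), not the simulation's actual search statistics:

```python
# Why a depth-4 Minimax can't be "perfect play": the full game tree is
# astronomically larger than what a depth-4 search ever visits.
BRANCHING = 7    # at most 7 columns to drop a piece into
MAX_PLIES = 42   # 7 x 6 = 42 cells, so a game lasts at most 42 moves

depth4_nodes = BRANCHING ** 4            # upper bound on a depth-4 search
full_tree_nodes = BRANCHING ** MAX_PLIES  # loose upper bound on the full tree

print(f"depth-4 upper bound:   {depth4_nodes:,}")      # a few thousand nodes
print(f"full-tree upper bound: {full_tree_nodes:.2e}")  # astronomically larger
```

Even with alpha-beta pruning and the true count being far below `7**42`, the gap is the point: the solved-game result requires search (or lookup tables) far beyond depth 4.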
Click the game mode dropdown. You will see five modes: Player vs Player, Player vs Minimax AI, Player vs RL Agent, Player vs MCTS Agent, and RL Agent vs AI.
- Each mode uses a different kind of AI (or no AI). In one sentence each, predict how Minimax, RL, and MCTS approach the game differently.
Click the Train RL tab. You see algorithm choices (DQN, PPO, REINFORCE, TD(λ), SARSA), hyperparameter sliders, and training controls.
- Leave the algorithm on DQN (default) and opponent on Random.
- Set episodes to 1000 and click "Start Training".
- Watch the four charts update in real time. When training finishes, record the stats below.
| Metric | After 1000 episodes |
|---|---|
| Win Rate (last 100) | |
| Avg Reward | |
- Look at the Win Rate per 100 chart. Does the win rate climb steadily or plateau? What does the curve shape tell you about how DQN learns?
- Go back to the Play tab. Change the mode to "Player vs RL Agent".
- Play a game against the agent you just trained. (If no model appears, save it first on the Train RL tab.)
- How did the RL agent play? Did it seem to have a strategy, or was it mostly random? Why might 1000 episodes against a random opponent not be enough?
Click the MCTS tab. You see a Connect 4 board, an MCTS tree canvas, and step controls. MCTS builds a search tree by repeating four phases: Selection → Expansion → Simulation → Backpropagation.
- Click "Step Phase" four times (one full iteration). Watch the phase bar highlight each step and look at the Iteration Log.
- Then click "+100" to run 100 iterations quickly. Check the Move Rankings table.
| After 100 iterations | Value |
|---|---|
| Best Move (column) | |
| Best Win% | |
| Tree Nodes | |
- MCTS doesn't train a model — it thinks fresh each turn by running simulations. How is this fundamentally different from the RL approach you just tried?
- Click "+1000" to add 1000 more iterations (1100 total). Check the Move Rankings again.
- Did the recommended move change? Did the win% become more confident? What is the trade-off of running more MCTS iterations?
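The four phases you stepped through can be sketched in miniature. This toy version collapses Connect 4 to a single decision with made-up per-move win rates (`WIN_PROB` is hypothetical, not from the simulation), so each MCTS phase fits on one line:

```python
import math, random

WIN_PROB = {0: 0.2, 1: 0.8, 2: 0.4}  # hypothetical win rate per "column"

class Node:
    def __init__(self, move):
        self.move, self.visits, self.wins = move, 0, 0.0

def uct(parent_visits, child, c=1.4):
    """Selection score: average win rate plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(iterations=1000, seed=0):
    random.seed(seed)
    children = [Node(m) for m in WIN_PROB]  # Expansion (done up front in this toy)
    root_visits = 0
    for _ in range(iterations):
        root_visits += 1
        node = max(children, key=lambda ch: uct(root_visits, ch))        # Selection
        won = 1.0 if random.random() < WIN_PROB[node.move] else 0.0      # Simulation
        node.visits += 1                                                 # Backpropagation
        node.wins += won
    return max(children, key=lambda ch: ch.visits).move  # most-visited move

print(mcts())  # the high-win-rate move should dominate the visit counts
```

More iterations sharpen the visit counts toward the best move, which mirrors the trade-off you just observed: confidence grows with iterations, but so does thinking time per turn.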
You have now seen three approaches to Connect 4:
- Minimax — searches the game tree with a depth limit
- RL (DQN) — learns an action-value function from thousands of training games
- MCTS — builds a search tree on-the-fly using random simulations
- Fill in the comparison below. For each approach, note one strength and one weakness.
| Approach | Strength | Weakness |
|---|---|---|
| Minimax | ||
| RL (DQN) | ||
| MCTS | | |
Large Language Models (like GPT-4 or Claude) can read a text description of a board and suggest moves. The simulation has two more tabs — LLM Eval and LLM-MCTS — that you can explore later. For now, just think:
- If you asked an LLM "Here is a Connect 4 board — what column should I play?", what kind of knowledge would it draw on? Would it "search" like Minimax or MCTS?
- MCTS uses random rollouts (random games to the end) to estimate who is winning. What if you replaced those random rollouts with an LLM that evaluates the board instead? What might improve? What might go wrong?
Hint: think about speed, accuracy, and cost.
- Name one task where an LLM could clearly help in a game setting (e.g., explaining moves, analyzing boards, coaching) and one task where traditional search/RL would still be better.