
⚔️ Natural Language RTS Arena

Engine Command Log

This panel shows every command the game engine executes, in real time.

Valid commands (executed)
Invalid commands (rejected)

Example commands:

W1 move (5,3)
all workers harvest
train 2 light
squad Alpha attack (6,6)
build barracks

Blue commands align left, Red commands align right.

Toggle raw actions to see per-tick unit actions.

View full Engine Specification →

🟦 Player 1 (Blue)

▶ 1. Game Rules (read-only)

            
▶ 2. Live Board Info (read-only)
=== TURN 5/100 ===
RESOURCES: You=8, Enemy=5

YOUR FORCES & ORDERS:
  Base at (1,1) [10/10hp]
  W1 (worker, 1/1hp) at (2,1): harvesting
  W2 (worker, 1/1hp) at (1,2): moving to (5,10)
  L1 (light, 4/4hp) at (3,3): attacking (6,6)

ENEMY FORCES:
  Base at (6,6) [10/10hp]
  Worker at (5,6) [1/1hp]
  Worker at (6,5) [1/1hp]

MAP (you=UPPERCASE, enemy=lowercase):
  01234567
0 ........
1 .BW.....
2 .W......
3 ...L..$.
4 ......$.
5 ......w.
6 .....wb.
7 ........

(This section is auto-generated each turn)
Edit freely or pick a preset. Click Save to persist across sessions.

🔍 Interpreter LLM (NL fallback parser)

Configure to enable NL fallback for ambiguous commands.
Configure both players, then hit Play.

🟥 Player 2 (Red)

This strategy is prepended to the prompt. Edit freely or pick a preset.

About NL RTS Arena

Research Prototype

The Natural Language RTS Arena is a research prototype for studying how Large Language Models (LLMs) perform as strategic decision-makers in real-time strategy games. Instead of traditional click-based controls or scripted AI, players (both human and LLM) issue commands entirely in natural language.

Motivation

RTS games require a unique combination of cognitive skills: strategic planning, resource management, tactical execution, and real-time adaptation. These are precisely the capabilities we want to evaluate in LLMs. By constraining the interface to natural language, we create a testbed that measures:

1. Strategic reasoning — Can the LLM formulate and execute multi-step strategies (economy first, then military buildup, then attack)?

2. Spatial understanding — Can it reason about unit positions, distances, and map control from text-based state descriptions?

3. Instruction following — Does it produce well-structured commands that the game engine can parse, or does it ramble?

4. Adaptation — Does it change strategy based on the opponent's actions, resource levels, and unit losses?

Architecture

The system uses a two-tier command processing architecture. A fast deterministic parser handles common command patterns instantly. Commands the parser can't handle are forwarded to an interpreter LLM for best-effort translation. This is transparent to the user — all natural language input is accepted.

The parser's grammar is defined in an external specification file (parser-spec.json) that can be edited by a game designer without modifying code. Changes take effect on page reload.

How to Use

Configure both players (LLM models or Human), select an interpreter LLM for natural language fallback parsing, and click Play. Human players type natural language commands. LLM players receive the game state as a text prompt and respond with commands.

All prompts, responses, and game engine commands are logged and can be examined via the status bar (click any section) and the Engine Command Log (left edge tab). Saved games include full replay data and raw LLM logs.

Links

Game Engine Specification  |  Parser Grammar (BNF)  |  Command Architecture


Disclaimer

Free to Play

This game is free to play. You can use it with local LLMs via Ollama (free), with the built-in "On the House" models (free), or by supplying your own API key for a cloud LLM provider.

API Keys & Privacy

API keys are stored locally in your browser's localStorage. They are never transmitted to any server other than the LLM provider you selected.

Estimated Costs

Provider / Model  |  Est. Cost per Game
Ollama (local)  |  Free
On the House models  |  Free
DeepSeek Chat  |  ~$0.01
GPT-4o-mini / 4.1-mini  |  ~$0.01
Gemini 2.0 Flash  |  ~$0.01
GPT-4o / 4.1  |  ~$0.05
Claude Sonnet 4  |  ~$0.08
GPT-5  |  ~$0.08
Claude Opus 4  |  ~$0.15

Estimates based on ~2K input + ~500 output tokens per LLM consultation, ~20 consultations per 100-turn game. Prices as of March 2026.
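
The arithmetic behind these estimates can be sketched as follows. The token counts and consultation count come from the note above; the per-million-token prices in the usage line are hypothetical placeholders, not any provider's actual rates.

```python
def estimate_game_cost(
    price_in_per_mtok: float,
    price_out_per_mtok: float,
    input_tokens: int = 2_000,
    output_tokens: int = 500,
    consultations: int = 20,
) -> float:
    """Estimated USD cost of one game under the stated assumptions."""
    per_call = (
        input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok
    ) / 1_000_000
    return consultations * per_call


# Hypothetical prices (USD per million tokens), for illustration only.
cost = estimate_game_cost(price_in_per_mtok=1.0, price_out_per_mtok=4.0)
print(f"${cost:.2f}")  # prints $0.08
```

Longer games, more frequent consultations, or larger prompts scale the result linearly.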

No Warranty

This software is provided "as is", without warranty of any kind, express or implied. The authors are not liable for any damages, data loss, or unexpected API charges arising from the use of this application. You are responsible for any costs incurred through your own API keys.

Age Requirement

You must be at least 13 years old to use this application. If you are under 18, you should have permission from a parent or guardian.

Data Collection

All interactions in the game may be recorded, including your IP address. By playing, you acknowledge and accept these terms.

Game Rules & Mechanics

Objective

Destroy all enemy bases and units. If the game reaches the turn limit (default 100) first, the player with the higher total Hit Points (HP) wins.
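
A minimal sketch of this win check, assuming each side is represented by a list of the current HP of its surviving units and buildings. The function name and the "Draw" result for ties are assumptions; the rules above do not say how ties are resolved.

```python
from typing import List, Optional


def decide_winner(
    blue_hp: List[int], red_hp: List[int], turn: int, max_turns: int = 100
) -> Optional[str]:
    """Return "Blue", "Red", "Draw", or None while the game is still running."""
    if not blue_hp and not red_hp:
        return "Draw"  # assumption: simultaneous elimination is a draw
    if not red_hp:
        return "Blue"  # every enemy base and unit destroyed
    if not blue_hp:
        return "Red"
    if turn >= max_turns:
        blue_total, red_total = sum(blue_hp), sum(red_hp)
        if blue_total == red_total:
            return "Draw"  # assumption: tie handling is unspecified
        return "Blue" if blue_total > red_total else "Red"
    return None


# Base (10) + two Workers (1 each) + one Light (4) against Base + two Workers.
print(decide_winner([10, 1, 1, 4], [10, 1, 1], turn=100))  # prints Blue
```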

Starting Conditions

Selectable grid size (8×8 or 16×16) with symmetric layout. Each player starts with 1 Base, 2 Workers, and 5 resources.

Units

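Unit  |  HP  |  Cost  |  Build  |  Speed  |  Dmg  |  Range  |  Built By
Worker (W)  |  1  |  1  |  4t  |  1  |  1  |  1  |  Base
Light (L)  |  4  |  2  |  5t  |  2  |  2  |  1  |  Barracks
Heavy (H)  |  8  |  3  |  8t  |  1  |  4  |  1  |  Barracks
Ranged (R)  |  2  |  2  |  6t  |  1  |  1  |  3  |  Barracks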

Workers harvest resources. Light units are fast. Heavy units are tanks. Ranged units attack from 3 cells away.
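
The unit stats can be held in a small lookup table for quick matchup arithmetic. This is an illustrative sketch: the names UnitStats, UNITS, and hits_to_kill are hypothetical, and it assumes each hit removes exactly Dmg HP with no armor or other modifiers, which the rules above do not confirm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitStats:
    hp: int
    cost: int
    build_ticks: int
    speed: int
    damage: int
    attack_range: int
    built_by: str


UNITS = {
    "worker": UnitStats(1, 1, 4, 1, 1, 1, "Base"),
    "light": UnitStats(4, 2, 5, 2, 2, 1, "Barracks"),
    "heavy": UnitStats(8, 3, 8, 1, 4, 1, "Barracks"),
    "ranged": UnitStats(2, 2, 6, 1, 1, 3, "Barracks"),
}


def hits_to_kill(attacker: str, target: str) -> int:
    """Hits needed for one attacker to destroy a full-HP target (flat damage assumed)."""
    return math.ceil(UNITS[target].hp / UNITS[attacker].damage)


print(hits_to_kill("light", "heavy"))  # prints 4
print(hits_to_kill("heavy", "light"))  # prints 1
```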

Buildings

Building  |  HP  |  Cost  |  Build  |  Produces
Base (B)  |  10  |  10  |  15t  |  Worker
Barracks (K)  |  5  |  5  |  10t  |  Light, Heavy, Ranged

NL Commands

Example  |  Description
W1 move to 5 10  |  Move a specific unit
All workers harvest  |  Assign workers to gather
Build barracks  |  Build barracks near base (costs 5)
Train 2 light  |  Queue units for production
Squad Alpha attack the enemy base  |  Send a squad to attack
Everyone defend the base  |  Rally all units to defend
Form squad Bravo with H1 L2  |  Create a named squad

Timing & Turn Model

Two modes are available, selectable before the match:

Sync Mode (default):

1. If it is time to consult Blue (every N ticks), the game pauses and waits for Blue's LLM to respond.

2. Then, if it is time to consult Red, the game pauses and waits for Red's LLM.

3. Both players' orders are executed simultaneously, then one tick advances.

4. After the tick, the game waits for the Tick Speed delay before the next tick.

LLM calls are sequential (Blue first). A slow LLM freezes the game until it responds.
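
The four Sync Mode steps can be sketched as a blocking loop. This is a sketch under assumptions, not the engine's code: run_sync and the stub LLM callables are hypothetical names, and consulting on tick 0 is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Orders = List[str]


def run_sync(
    total_ticks: int,
    consult_every: Dict[str, int],
    ask: Dict[str, Callable[[int], Orders]],
) -> List[Tuple[int, Orders, Orders]]:
    """Return one (tick, blue_orders, red_orders) entry per executed tick."""
    standing: Dict[str, Orders] = {"Blue": [], "Red": []}
    executed = []
    for tick in range(total_ticks):
        # Steps 1-2: consult Blue, then Red; each call blocks the loop.
        for player in ("Blue", "Red"):
            if tick % consult_every[player] == 0:
                standing[player] = ask[player](tick)
        # Step 3: both players' orders are applied within the same tick.
        executed.append((tick, list(standing["Blue"]), list(standing["Red"])))
        # Step 4: a real loop would wait for the Tick Speed delay here.
    return executed


calls = []


def make_llm(name: str) -> Callable[[int], Orders]:
    def llm(tick: int) -> Orders:
        calls.append((name, tick))  # record the order in which LLMs are asked
        return [f"{name} plan from tick {tick}"]
    return llm


history = run_sync(
    10, {"Blue": 5, "Red": 5}, {"Blue": make_llm("Blue"), "Red": make_llm("Red")}
)
print(calls)  # [('Blue', 0), ('Red', 0), ('Blue', 5), ('Red', 5)]
```

Because each stub call returns before the loop continues, a slow callable here would stall every later tick, which matches the freeze described above.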

Async Mode:

Ticks advance on a fixed timer (Tick Speed, e.g. 800ms) regardless of LLM latency. LLM calls run in the background. If an LLM is too slow, it misses turns — its previous orders keep executing. "Consult every N ticks" means the engine will attempt a consultation every N ticks, but only if the previous call has finished. Fast LLMs get more consultations; slow LLMs miss opportunities.

This mode penalizes slow models and rewards fast ones, creating a more realistic real-time pressure.
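
The effect of latency in Async Mode can be estimated with a small simulation. It assumes latency is expressed in whole ticks (response time divided by Tick Speed) and that a skipped attempt is simply dropped; count_consultations is a hypothetical helper, not part of the engine.

```python
def count_consultations(total_ticks: int, every_n: int, latency_ticks: int) -> int:
    """Count consultations that actually start over a game.

    An attempt is made every `every_n` ticks and is skipped while the
    previous call is still in flight.
    """
    busy_until = 0  # first tick at which the LLM is free again
    started = 0
    for tick in range(0, total_ticks, every_n):
        if tick >= busy_until:
            started += 1
            busy_until = tick + latency_ticks
    return started


# 100-tick game, consultation attempted every 5 ticks.
print(count_consultations(100, 5, latency_ticks=3))   # prints 20
print(count_consultations(100, 5, latency_ticks=8))   # prints 10
print(count_consultations(100, 5, latency_ticks=12))  # prints 7
```

In this model, a response time just over the consultation interval halves the number of consultations a model receives.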

Standing Orders

In NL mode, each unit has its own standing order in a per-unit order table. Orders persist until overridden by a new command with equal or higher specificity. Units start idle.

Map Legend

UPPERCASE = your units  |  lowercase = enemy units

B/b  |  Base
K/k  |  Barracks
W/w  |  Worker
L/l  |  Light
H/h  |  Heavy
R/r  |  Ranged
$  |  Resource
#  |  Wall
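
A short sketch of reading a text map with this legend. It assumes x is the column index and y is the row index, and that the axis labels have already been stripped from each row; parse_map is a hypothetical helper.

```python
from typing import List, Optional, Tuple

LEGEND = {
    "B": "base",
    "K": "barracks",
    "W": "worker",
    "L": "light",
    "H": "heavy",
    "R": "ranged",
}

Entity = Tuple[str, Optional[str], int, int]  # (kind, owner, x, y)


def parse_map(rows: List[str]) -> List[Entity]:
    """Turn map rows into entities; '.' cells are empty and skipped."""
    entities: List[Entity] = []
    for y, row in enumerate(rows):
        for x, ch in enumerate(row):
            if ch == "$":
                entities.append(("resource", None, x, y))
            elif ch == "#":
                entities.append(("wall", None, x, y))
            elif ch.upper() in LEGEND:
                # UPPERCASE marks your units, lowercase marks enemy units.
                owner = "you" if ch.isupper() else "enemy"
                entities.append((LEGEND[ch.upper()], owner, x, y))
    return entities


print(parse_map([".B..", "..w$", "#..."]))
```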

NL Mode: How the Game Engine Interacts with the LLM

Natural Language Commands

In NL mode, LLMs issue natural language commands instead of rigid structured commands. Examples: W1 move to 5 10, Squad Alpha attack the enemy base, All workers harvest.

Unit Tags & Squads

Each unit gets a persistent tag (W1, L2, H1, R3). Units can be grouped into named squads with Form squad Alpha with W1 L1 R1. Commands can target individual units, squads, unit types, spatial regions, or all units.

Two-Stage Parsing

Commands are parsed in two stages:

1. Deterministic parser (● green): Regex-based, instant, handles standard command formats.

2. Interpreter LLM fallback (○ yellow): If the deterministic parser can't understand a line, an optional interpreter LLM translates it to structured JSON.

Lines that neither parser can handle show as ✗ red (unparsed).
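
The two stages can be sketched as a regex pass with an optional fallback. The patterns below are hypothetical and cover only a few of the example commands; the actual grammar lives in parser-spec.json. The interpreter argument stands in for the interpreter LLM and is assumed to return a dict, or None when it cannot translate the line.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical patterns for illustration only.
PATTERNS = [
    ("move", re.compile(
        r"^(?P<unit>[WLHR]\d+) move(?: to)? \(?(?P<x>\d+)[, ]\s*(?P<y>\d+)\)?$", re.I)),
    ("harvest", re.compile(r"^all workers harvest$", re.I)),
    ("train", re.compile(
        r"^train (?P<count>\d+) (?P<kind>worker|light|heavy|ranged)$", re.I)),
    ("build", re.compile(r"^build barracks$", re.I)),
]

Interpreter = Callable[[str], Optional[Dict]]


def parse_line(
    line: str, interpreter: Optional[Interpreter] = None
) -> Tuple[str, Optional[Dict]]:
    """Return (stage, command); stage is deterministic, interpreter, or unparsed."""
    text = line.strip()
    for action, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return "deterministic", {"action": action, **match.groupdict()}
    if interpreter is not None:
        translated = interpreter(text)
        if translated is not None:
            return "interpreter", translated
    return "unparsed", None


print(parse_line("W1 move (5,3)"))
print(parse_line("W1 move to 5 10"))
print(parse_line("please win the game"))  # no interpreter configured
```

The three stage names correspond to the green, yellow, and red markers described above.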

Standing Orders

Unlike the classic mode, NL mode maintains a per-unit order table. Each unit retains its last order until a new order of equal or higher specificity overrides it. Selector specificity, from highest to lowest priority, is: unit ID > squad > spatial > type > universal.
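
A minimal sketch of such an order table, following the "equal or higher specificity" wording from the Game Rules section. The class name, method names, and numeric ranks are hypothetical; only the ordering of selector kinds comes from the text.

```python
from typing import Dict, List, Tuple

# Higher number = more specific selector.
SPECIFICITY = {"unit": 4, "squad": 3, "spatial": 2, "type": 1, "universal": 0}


class OrderTable:
    def __init__(self) -> None:
        self._orders: Dict[str, Tuple[str, int]] = {}  # tag -> (order, rank)

    def apply(self, tags: List[str], order: str, selector: str) -> List[str]:
        """Apply an order to the selected units; return the tags that changed."""
        rank = SPECIFICITY[selector]
        changed = []
        for tag in tags:
            current = self._orders.get(tag)
            # Override only with equal or higher specificity.
            if current is None or rank >= current[1]:
                self._orders[tag] = (order, rank)
                changed.append(tag)
        return changed

    def order_of(self, tag: str) -> str:
        # Units start idle.
        return self._orders.get(tag, ("idle", -1))[0]


table = OrderTable()
table.apply(["W1", "W2"], "harvest", "type")     # "All workers harvest"
table.apply(["W1"], "move (5,3)", "unit")        # "W1 move (5,3)"
table.apply(["W1", "W2"], "defend", "universal")  # too general, changes nothing
print(table.order_of("W1"), "|", table.order_of("W2"))  # move (5,3) | harvest
```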

Player 2 Human Mode

Player 2 can be set to "Human" mode. Type natural language commands (or shortcuts) and press Enter — commands are sent to the interpreter immediately. If the interpreter is busy, commands queue up and are sent when ready.

Consultation Frequency & Timing

Same as the classic head-to-head (H2H) mode: the LLM is consulted every N ticks. Between consultations, standing orders keep executing via the NL Tactical AI.

Warmup

For Ollama models, a warmup call is sent before the game starts to load the model into GPU memory.

LLM Raw Log