Why RTS Arena?
On the idea of real-time strategy games as testbeds for AI, and a short history of the field.
Motivation
Real-time strategy (RTS) games have long been regarded as one of the grand challenges for artificial intelligence.
Unlike turn-based board games such as chess or Go, RTS games demand that agents reason under partial observability,
manage real-time concurrent actions, and plan over combinatorially vast action spaces—all
at once. A single decision cycle in a typical RTS game can involve dozens of units, each with multiple possible actions,
yielding a branching factor that dwarfs even Go's famously large search space.
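A back-of-the-envelope calculation makes the branching-factor claim concrete. The unit and action counts below are illustrative round numbers, not measurements from any particular game:

```python
import math

# Illustrative numbers: a mid-game RTS position with 50 controllable units,
# each independently choosing among 5 actions (e.g. move in 4 directions or idle).
units = 50
actions_per_unit = 5

# Joint actions available in a single decision cycle: 5^50.
joint_actions = actions_per_unit ** units

# Go, by comparison, offers on the order of a few hundred legal moves per turn.
go_moves_per_turn = 250

print(f"RTS joint actions per cycle: ~10^{math.log10(joint_actions):.0f}")
print(f"Go moves per turn:           ~{go_moves_per_turn}")
```

Even with these modest assumptions, the joint action space per cycle is on the order of 10^35, which is why RTS agents must rely on action abstractions rather than enumerating moves.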
These properties make RTS games an ideal proving ground for AI techniques ranging from classical planning and
Monte Carlo tree search to modern deep reinforcement learning and large language model (LLM) reasoning.
As Yannakakis and Togelius observe, games are "the perfect testbed for artificial intelligence" because they
provide well-defined goals, measurable performance, and controllable complexity [2].
Why This Simple RTS Game?
RTS Arena is modeled after MicroRTS (μRTS), a minimalist real-time strategy game
originally created by Santiago Ontañón in 2013 [3] specifically as a research platform.
MicroRTS strips away the elaborate graphics and sprawling tech trees of commercial RTS titles
(StarCraft, Age of Empires, Dota 2) while preserving the core computational challenges:
resource gathering, base building, unit production, and tactical combat on a small grid map.
This simplicity is a feature, not a limitation. By reducing the game to its strategic essence,
MicroRTS allows researchers to iterate quickly, compare algorithms fairly, and focus on the
fundamental AI problems rather than engineering around game-engine quirks.
The annual MicroRTS AI Competition,
held since 2017 at the IEEE Conference on Games (formerly CIG), has become a premier venue for testing RTS AI agents [6].
RTS Arena brings this tradition to the browser. It lets anyone pit an LLM-driven commander against
scripted AI opponents (or other LLMs), observe games in real time, and replay past battles.
The goal is to make RTS AI research accessible—no Java environment, no complex setup,
just a web page and a model.
A Brief History of AI Game Agents
The idea of machines playing games stretches back to the very origins of computer science.
Claude Shannon's 1950 paper on programming a computer to play chess and Alan Turing's hand-simulated
"paper machine" chess routine were among the earliest works in AI. Over the following decades, games became the
canonical benchmark for measuring machine intelligence—from Samuel's checkers player (1959)
to the momentous events of the late 1990s and beyond.
Timeline: AI Defeats Human Champions
- 1992 — TD-Gammon (Gerald Tesauro, IBM) learns backgammon through self-play using temporal-difference learning, reaching expert-level play [12].
- 1997 — IBM Deep Blue defeats world chess champion Garry Kasparov in a six-game match, the first time a computer beat a reigning champion under standard tournament conditions.
- 2007 — Checkers solved. Chinook (Jonathan Schaeffer et al.) proves that perfect play from both sides results in a draw.
- 2011 — IBM Watson defeats Jeopardy! champions Ken Jennings and Brad Rutter.
- 2013 — Atari DQN (DeepMind). Deep Q-Networks learn to play dozens of Atari 2600 games from raw pixels, achieving superhuman scores on several titles [13].
- 2016 — AlphaGo (DeepMind) defeats Go world champion Lee Sedol 4–1 in Seoul, a landmark previously thought decades away [14].
- 2017 — AlphaGo Zero surpasses all prior Go programs by learning entirely from self-play with no human data [15]. Libratus (CMU) defeats top poker professionals in no-limit Texas Hold'em.
- 2018 — AlphaZero masters chess, shogi, and Go with a single general algorithm [16].
- 2019 — AlphaStar (DeepMind) reaches Grandmaster level in StarCraft II, the first AI to achieve top-tier performance in a major commercial RTS game [19]. OpenAI Five defeats the Dota 2 world champions OG [24].
- 2020 — MuZero masters Atari, Go, chess, and shogi by learning the game model itself, without being told the rules [17]. Tencent's AI reaches top human performance in Honor of Kings, a full MOBA game [25].
- 2022 — AlphaCode reaches median human competitor level in programming contests [20]. AlphaTensor discovers faster matrix multiplication algorithms [21].
- 2023 — AlphaDev discovers faster sorting algorithms now used in standard C++ libraries [22].
- 2024 — AlphaGeometry solves International Mathematical Olympiad geometry problems approaching gold-medalist performance [23].
Thread 1: Reinforcement Learning
Reinforcement learning (RL)—the framework in which an agent learns by trial-and-error
interaction with an environment—is the theoretical backbone of nearly all game-playing AI.
The foundational text is Sutton and Barto's Reinforcement Learning: An Introduction [1],
first published in 1998 and revised in a 2018 second edition, which lays out the core ideas: Markov decision
processes, temporal-difference learning, policy gradient methods, and function approximation.
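The tabular form of these ideas fits in a few lines. The sketch below runs Q-learning (a temporal-difference method) on a toy five-state corridor; the environment is invented here purely for illustration:

```python
import random

random.seed(0)

# Toy corridor MDP: states 0..4; actions 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(2000):                       # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection (ties broken at random)
        if random.random() < EPSILON or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = step(s, a)
        # temporal-difference (Q-learning) update toward the bootstrapped target
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# Optimal action values satisfy Q(s, right) = GAMMA ** (GOAL - s - 1)
print(round(Q[3][1], 2), round(Q[2][1], 2))
```

After enough episodes the learned values approach the discounted optimum (1.0 one step from the goal, 0.9 two steps away), which is exactly the bootstrapping behavior that deep RL later scaled up with function approximation.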
The progression from tabular Q-learning to deep RL was catalyzed by DeepMind's 2015 DQN paper [13],
which demonstrated that a single neural network could learn control policies from raw Atari pixels.
This sparked an explosion of deep RL research, leading directly to the AlphaGo family of agents
and eventually to multi-agent RL systems capable of handling the complexity of RTS games.
Silver et al. provocatively argued in 2021 that "reward is enough"—that the maximization
of a suitably rich reward signal could, in principle, give rise to all aspects of intelligence [26].
Whether or not this thesis holds in full generality, it captures the spirit that has driven
game AI research for decades: build agents that learn to win, and intelligence follows.
Thread 2: The MicroRTS Competition
Santiago Ontañón introduced MicroRTS in 2013 as a lightweight testbed for RTS AI
research [3]. Unlike full-scale commercial RTS games, MicroRTS runs in Java with a simple API,
small maps, and fast simulation—enabling thousands of games per hour on modest hardware.
The first MicroRTS AI Competition was organized in 2017 at the IEEE Conference on
Computational Intelligence and Games (CIG) [6], and it has remained an annual fixture
at that conference's successor, the IEEE Conference on Games
(2023,
2024).
The competition has driven a remarkable progression of approaches:
- Search-based methods — Early winners used Monte Carlo tree search (MCTS) and variants
adapted for the large branching factor of RTS games [4, 5].
- Programmatic strategies — Scripted bots such as
UTS_Imass (2019 winner),
Coac AI (2020 winner), and
Mayari (2021 winner)
demonstrated the effectiveness of hand-crafted strategies combined with planning abstractions.
- Deep RL agents — Huang et al.'s Gym-μRTS [10] provided a Python/OpenAI Gym interface,
enabling deep RL research on MicroRTS with PPO and other algorithms.
Goodfriend's deep RL agent won the 2023 competition [11], showing that learned policies
can outperform sophisticated scripted bots.
- Program synthesis — Moraes, Aleixo, Lelis et al. explored asymmetric action abstractions [7],
behavioral cloning from weak demonstrations [8], and local-learner methods for synthesizing
programmatic strategies [9]—the 2L baseline placed second in the 2023 competition despite
being intended as a reference agent.
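Much of Gym-μRTS's appeal is that it exposes the standard Gym control loop. The sketch below uses a toy stand-in environment (the `ToyEnv` class is invented for illustration; the real Gym-μRTS observation and action spaces are far richer), but the classic reset/step contract has the same shape:

```python
import random

class ToyEnv:
    """Stand-in environment with the classic Gym reset/step contract.
    NOT Gym-uRTS: a 1-D gridworld where the agent walks toward a goal."""
    def __init__(self, size=8):
        self.size = size

    def reset(self):
        self.pos = 0
        return self.pos                       # initial observation

    def step(self, action):                   # action: 0 = left, 1 = right
        delta = 1 if action == 1 else -1
        self.pos = min(self.size - 1, max(0, self.pos + delta))
        done = self.pos == self.size - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}     # classic 4-tuple: obs, reward, done, info

# The agent-environment loop that any Gym-compatible algorithm (PPO, DQN, ...)
# is built on: observe, act, receive reward, repeat.
random.seed(0)
env = ToyEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random.randrange(2)              # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```

Swapping the random policy for a learned one is the only change an RL algorithm needs, which is why a Gym wrapper was enough to open MicroRTS to the deep RL community.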
The MicroRTS ecosystem now includes the
original Java implementation,
a Python wrapper maintained by the
Farama Foundation, and an active
research community.
Thread 3: The Alpha Series
DeepMind's "Alpha" lineage of agents represents perhaps the most sustained and celebrated
arc in AI game research, led by David Silver and collaborators.
- AlphaGo (2016) combined deep neural networks with Monte Carlo tree search
to defeat Lee Sedol at Go [14]—a game where the number of possible positions exceeds
the number of atoms in the observable universe.
- AlphaGo Zero (2017) eliminated all reliance on human game data, learning
entirely through self-play and surpassing its predecessor within days [15].
- AlphaZero (2018) generalized the approach to chess and shogi, demonstrating
that a single algorithm could master multiple games from scratch [16].
- MuZero (2020) went further still, learning the dynamics model of the game
environment itself rather than requiring explicit rules [17].
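The tree-search half of this recipe can be sketched in miniature. Below is a bare-bones UCT (MCTS with UCB1 selection) player for single-pile Nim, a toy game chosen here for illustration: players take 1 to 3 stones and whoever takes the last stone wins, so any multiple of 4 is a lost position. This is not AlphaGo's network-guided search, but the select/expand/rollout/backpropagate skeleton is the same:

```python
import math, random

random.seed(0)

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones, self.parent, self.move = stones, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = [m for m in (1, 2, 3) if m <= stones]

def ucb1(parent, child, c=1.4):
    # Exploitation (win rate) plus exploration bonus for rarely tried children.
    return child.wins / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def rollout(stones):
    """Random playout; returns 1 if the player to move at `stones` wins."""
    turn = 0
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        turn ^= 1
    return 1 if turn == 1 else 0   # whoever took the last stone won

def mcts(root_stones, iters=4000):
    root = Node(root_stones)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(node, ch))
        # 2. Expansion: add one untried child, if any.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Rollout from the new position.
        result = rollout(node.stones)
        # 4. Backpropagation, flipping perspective at each level:
        #    node.wins counts wins for the player who moved INTO the node.
        while node:
            node.visits += 1
            node.wins += 1 - result
            result = 1 - result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

best = mcts(5)
print("best move from 5 stones:", best)
```

From 5 stones the unique winning move is to take 1, leaving the opponent at the lost position 4; the search converges on it by visit count. AlphaGo's innovation was replacing the random rollout and uniform expansion with a policy network and a value network.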
The most directly relevant to RTS is AlphaStar (2019), which achieved Grandmaster
level in the full commercial game StarCraft II [19]. AlphaStar used a transformer-based architecture,
multi-agent training in a league of specialists, and operated under conditions close to those of
human players (limited actions per minute, camera restrictions). It demonstrated that deep RL could
scale to imperfect-information, real-time, multi-agent domains with massive action spaces.
Beyond games, the Alpha methodology has been applied to discover faster matrix multiplication
algorithms (AlphaTensor [21]), faster sorting routines (AlphaDev [22]), competitive-level code
generation (AlphaCode [20]), and mathematical reasoning (AlphaGeometry [23]). Czarnecki et al.'s
"spinning tops" analysis [18] offers a general framework for understanding which strategies
dominate in different types of games.
References
Books
- Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018. [Online] [PDF]
- Yannakakis, G. N. and Togelius, J. Artificial Intelligence and Games, 2nd ed. Springer, 2025. [Website]
MicroRTS
- Ontañón, S. "The combinatorial multi-armed bandit problem and its application to real-time strategy games." In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), vol. 9, no. 1, pp. 58–64, 2013. [Paper]
- Ontañón, S. "Informed Monte Carlo tree search for real-time strategy games." In Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG), 2016. [IEEE]
- Ontañón, S. "Experiments on learning unit-action models from replay data from RTS games." In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), vol. 12, no. 2, pp. 9–14, 2016.
- Ontañón, S., Barriga, N. A., Silva, C. R., Moraes, R. O., and Lelis, L. H. S. "The first MicroRTS artificial intelligence competition." AI Magazine, vol. 39, no. 1, pp. 75–83, 2018.
- Moraes, R. O., Nascimento, M. A., and Lelis, L. H. S. "Asymmetric action abstractions for planning in real-time strategy games." Journal of Artificial Intelligence Research, vol. 75, pp. 1103–1137, 2022. [Google Scholar]
- Medeiros, L. C., Aleixo, D. S., and Lelis, L. H. S. "What can we learn even from the weakest? Learning sketches for programmatic strategies." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 7, pp. 7761–7769, 2022. [Code]
- Moraes, R. O., Aleixo, D. S., and Lelis, L. H. S. "Choosing well your opponents: How to guide the synthesis of programmatic strategies." arXiv preprint arXiv:2307.04893 (IJCAI 2023). [arXiv] [Code]
- Huang, S., Ontañón, S., Bamford, C., and Grela, L. "Gym-μRTS: Toward affordable full game real-time strategy games research with deep reinforcement learning." In 2021 IEEE Conference on Games (CoG), pp. 1–8, IEEE, 2021. [PDF] [Code]
- Goodfriend, S. "A competition winning deep reinforcement learning agent in microRTS." In 2024 IEEE Conference on Games (CoG), pp. 1–8, IEEE, 2024. [IEEE] [arXiv] [ICLR 2024 Review]
Deep RL & the Alpha Series
- Tesauro, G. "Temporal difference learning and TD-Gammon." Communications of the ACM, vol. 38, no. 3, 1995. [ACM]
- Mnih, V., Kavukcuoglu, K., Silver, D., et al. "Human-level control through deep reinforcement learning." Nature, vol. 518, no. 7540, pp. 529–533, 2015. [Nature]
- Silver, D., Huang, A., Maddison, C. J., et al. "Mastering the game of Go with deep neural networks and tree search." Nature, vol. 529, no. 7587, pp. 484–489, 2016. [Nature]
- Silver, D., Schrittwieser, J., Simonyan, K., et al. "Mastering the game of Go without human knowledge." Nature, vol. 550, no. 7676, pp. 354–359, 2017. [Nature]
- Silver, D., Hubert, T., Schrittwieser, J., et al. "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play." Science, vol. 362, no. 6419, pp. 1140–1144, 2018. [Science]
- Schrittwieser, J., Antonoglou, I., Hubert, T., et al. "Mastering Atari, Go, chess and shogi by planning with a learned model." Nature, vol. 588, no. 7839, pp. 604–609, 2020. [Nature]
- Czarnecki, W. M., Gidel, G., Tracey, B., Tuyls, K., Omidshafiei, S., Balduzzi, D., and Jaderberg, M. "Real world games look like spinning tops." Advances in Neural Information Processing Systems, vol. 33, pp. 17443–17454, 2020. [PDF]
- Vinyals, O., Babuschkin, I., Czarnecki, W. M., et al. "Grandmaster level in StarCraft II using multi-agent reinforcement learning." Nature, vol. 575, no. 7782, pp. 350–354, 2019. [Nature]
- Li, Y., Choi, D., Chung, J., et al. "Competition-level code generation with AlphaCode." Science, vol. 378, no. 6624, pp. 1092–1097, 2022. [Science]
- Fawzi, A., Balog, M., Huang, A., et al. "Discovering faster matrix multiplication algorithms with reinforcement learning." Nature, vol. 610, pp. 47–53, 2022. [Nature]
- Mankowitz, D. J., Michi, A., Zhernov, A., et al. "Faster sorting algorithms discovered using deep reinforcement learning." Nature, vol. 618, no. 7964, pp. 257–263, 2023. [Nature]
- Trinh, T. H., Wu, Y., Le, Q. V., et al. "Solving olympiad geometry without human demonstrations." Nature, vol. 625, pp. 476–482, 2024. [Nature]
Other RTS Agents & General
- Berner, C., Brockman, G., Chan, B., et al. "Dota 2 with large scale deep reinforcement learning." arXiv preprint arXiv:1912.06680, 2019.
- Ye, D., Chen, G., Zhang, W., et al. "Towards playing full MOBA games with deep reinforcement learning." Advances in Neural Information Processing Systems, vol. 33, pp. 621–632, 2020.
- Silver, D., Singh, S., Precup, D., and Sutton, R. S. "Reward is enough." Artificial Intelligence, vol. 299, 103535, 2021. [Elsevier]