Claude's Cycles

How Generative AI Solved a Combinatorial Open Problem — Knuth's Shock! Shock! paper (2026)

Introduction

Donald Knuth posed an open problem: can the arcs of a certain digraph (a directed graph, i.e. a set of vertices connected by directed edges called arcs) with m³ vertices be decomposed into exactly three directed Hamiltonian cycles (closed paths that visit every vertex exactly once before returning to the start)? In February 2026, Claude Opus 4.6 (Anthropic's AI model) found the answer through 31 explorations — producing an elegant set of rules that works for all odd m > 1.

This page lets you interactively explore the digraph, understand the fiber structure, step through each Hamiltonian cycle, and verify that the decomposition is correct: the three cycles partition the digraph's arcs into disjoint subsets, so together they use every arc exactly once.

Self-Guided Tour

Follow these 8 steps to explore every interactive feature on this page. Each step links to a section below — click to jump there.

Tip: Use the m = 3 | 5 | 7 | 9 | 11 tab bar (below the 3D illustration) to change the cube dimension at any time. All sections update instantly — try starting with m = 3, then increase to see how the construction scales.

  1. Visualizing the Digraph — See the 3D graph for m = 3. Drag to rotate, click a node to highlight its arcs.
  2. The Digraph — Read the adjacency grid. Click any vertex to see its three outgoing arcs.
  3. Fiber Structure — Understand how vertices group into fibers. Toggle fiber coloring in the 3D view.
  4. Bump Rules — Study the rule table that assigns each arc to a cycle. Try the rule tester with your own vertex.
  5. Cycle Explorer — Watch a cycle traverse all vertices. Step or Autoplay through the path.
  6. Verification — Confirm every arc is used exactly once. Click Verify to run the check.
  7. The Story — Read about Claude’s 31 explorations that solved the problem.
  8. RL Exercises — Advanced: explore reinforcement learning connections with the REINFORCE training script.

Visualizing the Digraph

Each vertex (i, j, k) has exactly 3 outgoing arcs — bump i, bump j, bump k — where "bump" means increment that coordinate mod m. Below are the two smallest cases drawn as actual graphs.

m = 1 (trivial case): 1 vertex, 3 self-loops. Every bump wraps back to (0,0,0), so each "cycle" is just a single self-loop.

m = 3: 27 vertices, 81 arcs, shown in the 3D view. Drag to rotate. Click a vertex to see its 3 arcs.

The Digraph

The digraph has m³ vertices, each labeled (i, j, k) where i, j, k ∈ {0, …, m−1}. Each vertex has exactly 3 outgoing arcs, one for each coordinate "bump": bumping i means i ← (i+1) mod m, leaving j and k unchanged, and similarly for bumping j or k. This gives 3m³ arcs total, which must be partitioned into 3 Hamiltonian cycles of length m³.

Click any vertex to highlight its 3 outgoing arcs: ● bump i, ● bump j, ● bump k
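The bump operation is easy to make concrete. A minimal sketch (the function name `outgoing` is my own, not from the page's code):

```python
def outgoing(v, m):
    """Return the 3 outgoing arcs of vertex v = (i, j, k): bump each coordinate mod m."""
    i, j, k = v
    return [((i + 1) % m, j, k),   # bump i
            (i, (j + 1) % m, k),   # bump j
            (i, j, (k + 1) % m)]   # bump k

# Example for m = 3: the three arcs out of (2, 0, 1)
print(outgoing((2, 0, 1), 3))  # -> [(0, 0, 1), (2, 1, 1), (2, 0, 2)]
```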

Fiber Structure

Define s = (i + j + k) mod m. This partitions the vertices into m fibers of m² vertices each. All three arcs from fiber s go to fiber (s+1) mod m. Why? Because bumping any single coordinate increases the sum i+j+k by 1, so (i+j+k) mod m goes from s to (s+1) mod m regardless of which coordinate is bumped. The digraph is therefore "layered," and this layered structure makes the decomposition possible.
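The layering claim can be checked mechanically for any m. A small sketch (the helper names `fiber` and `check_layering` are my own):

```python
from itertools import product

def fiber(v, m):
    """Fiber index s = (i + j + k) mod m."""
    return sum(v) % m

def check_layering(m):
    """Every arc goes from fiber s to fiber (s + 1) mod m, because any single
    bump raises i + j + k by exactly 1."""
    for v in product(range(m), repeat=3):
        for axis in range(3):          # try all three bumps
            w = list(v)
            w[axis] = (w[axis] + 1) % m
            assert fiber(tuple(w), m) == (fiber(v, m) + 1) % m
    return True

print(check_layering(7))  # -> True
```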

Claude's Solution — The Bump Rules

Claude discovered that the direction string d — which tells each cycle which coordinate to bump — depends only on the fiber index s and the vertex coordinates. The string d is a permutation of "012": d[0] tells cycle 0 what to bump, d[1] tells cycle 1, d[2] tells cycle 2.

| Fiber s     | Condition | Direction d | Cycle 0 bumps | Cycle 1 bumps | Cycle 2 bumps |
|-------------|-----------|-------------|---------------|---------------|---------------|
| s = 0       | j = m−1   | "012"       | i             | j             | k             |
| s = 0       | j ≠ m−1   | "210"       | k             | j             | i             |
| 0 < s < m−1 | i = m−1   | "201"       | k             | i             | j             |
| 0 < s < m−1 | i ≠ m−1   | "102"       | j             | i             | k             |
| s = m−1     | i > 0     | "120"       | j             | k             | i             |
| s = m−1     | i = 0     | "210"       | k             | j             | i             |

Rule Tester

Enter a vertex and click Test to see the rule applied.
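The rule table translates directly into code. A sketch of the tester's logic, assuming the rules as stated in the table above (the function name `direction` is mine):

```python
def direction(v, m):
    """Direction string for vertex v = (i, j, k).

    d[c] names the coordinate that cycle c bumps at this vertex:
    '0' = bump i, '1' = bump j, '2' = bump k.
    """
    i, j, k = v
    s = (i + j + k) % m
    if s == 0:
        return "012" if j == m - 1 else "210"
    if s == m - 1:
        return "120" if i > 0 else "210"
    return "201" if i == m - 1 else "102"

print(direction((0, 2, 1), 3))  # s = 0 and j = m-1, so "012"
print(direction((0, 0, 0), 3))  # s = 0 and j != m-1, so "210"
```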

Knuth's C Program

/* Knuth's verification program for Claude's construction */
for (c = 0; c < 3; c++) {
  i = j = k = 0;
  for (t = 0; ; t++) {
    visit(i, j, k);              /* record this vertex */
    if (t == m * m * m) break;
    s = (i + j + k) % m;
    if (s == 0)          d = (j == m - 1) ? "012" : "210";
    else if (s == m - 1) d = (i > 0)      ? "120" : "210";
    else                 d = (i == m - 1) ? "201" : "102";
    switch (d[c]) {
      case '0': i = (i + 1) % m; break;
      case '1': j = (j + 1) % m; break;
      case '2': k = (k + 1) % m; break;
    }
  }
}

Cycle Explorer

Step through each Hamiltonian cycle to see how the bump rules trace a path visiting every vertex exactly once.

[Interactive cycle explorer: displays the step count, current vertex, fiber s, the rule applied, direction d, the bump taken, and the next vertex. Drag to rotate; scroll to zoom. The traced path appears as you step or autoplay.]

Verification

Generate all 3 cycles and verify: (1) each visits exactly m³ vertices (Hamiltonian), (2) all 3m³ arcs are used exactly once (perfect decomposition).
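The same check can be run offline. A self-contained sketch that regenerates the three cycles from the bump rules and verifies both properties (the function names are my own; Knuth's C program in the section above does the same job):

```python
def direction(i, j, k, m):
    """Claude's bump rules: the direction string at vertex (i, j, k)."""
    s = (i + j + k) % m
    if s == 0:
        return "012" if j == m - 1 else "210"
    if s == m - 1:
        return "120" if i > 0 else "210"
    return "201" if i == m - 1 else "102"

def cycle(c, m):
    """Trace cycle c from (0, 0, 0), applying this cycle's bump at each step."""
    path, v = [], (0, 0, 0)
    for _ in range(m ** 3):
        path.append(v)
        i, j, k = v
        d = direction(i, j, k, m)[c]
        if d == '0':
            v = ((i + 1) % m, j, k)
        elif d == '1':
            v = (i, (j + 1) % m, k)
        else:
            v = (i, j, (k + 1) % m)
    return path, v  # v is the vertex reached after the last bump

def verify(m):
    """Check that the three cycles are Hamiltonian and arc-disjoint."""
    arcs = set()
    for c in range(3):
        path, end = cycle(c, m)
        assert end == (0, 0, 0)           # the cycle closes
        assert len(set(path)) == m ** 3   # every vertex visited exactly once
        arcs |= {(path[t], path[(t + 1) % m ** 3]) for t in range(m ** 3)}
    return len(arcs) == 3 * m ** 3        # every arc used exactly once

print(all(verify(m) for m in (3, 5, 7)))  # -> True
```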

The Story

In February 2026, Donald Knuth challenged Claude (Anthropic's AI model) with an open problem from combinatorial algorithms. The question: for every odd m > 1, can the arcs of a specific digraph on m³ vertices be decomposed into exactly three directed Hamiltonian cycles?

Over the course of 31 explorations, Claude developed the solution through a remarkable sequence of insights:

Explorations 1–8: Claude discovered the serpentine (snake-like) traversal pattern and the importance of the fiber structure s = (i+j+k) mod m.

Explorations 9–15: Claude identified that fiber-based rules could potentially govern all three cycles simultaneously, leading to the idea of direction strings.

Explorations 16–24: Claude experimented with simulated annealing and computational search, finding solutions for small m values and extracting patterns.

Explorations 25–31: Claude crystallized the final construction — a simple set of rules based on fiber index, yielding three Hamiltonian cycles that provably decompose all arcs for every odd m > 1.

The result is striking in its simplicity: just three conditional rules determine the entire decomposition. Knuth verified the construction by computer for all odd m up to 99 and wrote a paper documenting the discovery.

References
[1] D. E. Knuth, “Claude's Cycles,” preprint, February 2026.
[2] D. E. Knuth, The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1, Addison-Wesley, 2011.
[3] D. E. Knuth, “Hamiltonian Paths and Cycles,” TAOCP Pre-Fascicle 8a, 2024.
[4] J. Aubert and B. Schneider, “Décomposition de la somme cartésienne d'un cycle et de l'union de deux cycles hamiltoniens,” Discrete Math., 38:7–16, 1982. (Proves the m = 2 case impossible.)

RL Exercises: Can Reinforcement Learning Find Claude's Cycles?

Why RL on a solved problem? Claude's Cycles has a clean algebraic solution — RL is overkill here. But that is precisely what makes it a valuable teaching example. When we train a PPO agent on tiny m=3 (27 vertices) and transfer to m=19 (6,859 vertices), the agent achieves 100% success from episode 1. This perfect transfer is itself a signal: it tells us the local action structure is scale-invariant, pointing straight back to the algebraic rules. If this problem were unsolved, such transfer results would be a powerful hint to a mathematician that a compositional, scale-independent solution exists. For students, this connection — RL success revealing algebraic structure — is more instructive than the RL training itself. See full experiment results.

MDP (Markov Decision Process) Formulation

An MDP is a mathematical framework for sequential decision-making in which outcomes depend only on the current state and the chosen action, not on prior history.

State: the current vertex (i, j, k), the set of already-visited vertices, and the step count.

Action: choose one of 3 bumps (increment i, j, or k mod m).

Reward: +1 for each new vertex visited, +100 for completing a full Hamiltonian cycle (returning to the start after visiting all m³ vertices), −10 for revisiting a vertex (the episode then ends).

The agent must learn to visit all m³ vertices exactly once and return home.
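The formulation above can be sketched as a small environment class. This is my own illustrative code, loosely following the reset/step convention of Gym-style environments; the provided rl_hamiltonian.py may be organized differently:

```python
class BumpEnv:
    """Tiny environment for the MDP above: state is (current vertex, visited
    set, step count); actions 0/1/2 bump coordinate i/j/k mod m."""

    def __init__(self, m):
        self.m = m

    def reset(self):
        self.v = (0, 0, 0)
        self.visited = {self.v}
        self.t = 0
        return self.v

    def step(self, action):
        i, j, k = self.v
        bumps = [((i + 1) % self.m, j, k),
                 (i, (j + 1) % self.m, k),
                 (i, j, (k + 1) % self.m)]
        self.v = bumps[action]
        self.t += 1
        if self.v == (0, 0, 0) and self.t == self.m ** 3:
            return self.v, 100.0, True   # closed Hamiltonian cycle: bonus
        if self.v in self.visited:
            return self.v, -10.0, True   # revisit: penalty, episode ends
        self.visited.add(self.v)
        return self.v, 1.0, False        # reward for each new vertex

env = BumpEnv(3)
env.reset()
print(env.step(0))  # -> ((1, 0, 0), 1.0, False)
```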

Recommended Approaches

Baseline — REINFORCE: a simple Monte Carlo policy-gradient method that scales the gradient by the episode return, using an MLP (multi-layer perceptron) policy that takes the current coordinates and step count as input. Good for m=3 (27 vertices).

Scaling up — GNN + PPO: for larger m, represent the digraph as a graph and train a Graph Neural Network policy (which passes messages between neighboring nodes) with Proximal Policy Optimization, a policy-gradient algorithm that clips its objective to prevent destructively large updates. The GNN can exploit the regular structure of the m-ary cube. The key challenge is that the state space grows as 2^(m³) because of visited-set tracking, so curriculum learning (train on m=3, then m=5, …) is recommended.
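As a concrete illustration of the baseline, here is a minimal pure-NumPy REINFORCE sketch for this task, entirely my own illustrative code with a linear softmax policy (the provided rl_hamiltonian.py uses a 2-layer MLP and is more complete):

```python
import numpy as np

def features(v, t, m):
    """One-hot encode the three coordinates plus a normalized step count."""
    x = np.zeros(3 * m + 1)
    for axis, c in enumerate(v):
        x[axis * m + c] = 1.0
    x[-1] = t / m ** 3
    return x

def episode(W, m, rng):
    """Roll out one episode under the softmax policy; return (x, action, probs, reward) steps."""
    v, visited, traj = (0, 0, 0), {(0, 0, 0)}, []
    for t in range(m ** 3):
        x = features(v, t, m)
        logits = W @ x
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(3, p=p)
        i, j, k = v
        v = [((i + 1) % m, j, k), (i, (j + 1) % m, k), (i, j, (k + 1) % m)][a]
        if v == (0, 0, 0) and t == m ** 3 - 1:
            r, done = 100.0, True        # closed Hamiltonian cycle
        elif v in visited:
            r, done = -10.0, True        # revisit ends the episode
        else:
            r, done = 1.0, False
            visited.add(v)
        traj.append((x, a, p, r))
        if done:
            break
    return traj

def train(m=3, episodes=200, lr=0.01, seed=0):
    """REINFORCE: ascend G * grad log pi(a|x) summed over the episode."""
    rng = np.random.default_rng(seed)
    W = np.zeros((3, 3 * m + 1))
    for _ in range(episodes):
        traj = episode(W, m, rng)
        G = sum(r for _, _, _, r in traj)   # undiscounted episode return
        for x, a, p, _ in traj:
            grad = -np.outer(p, x)
            grad[a] += x                    # d log pi(a|x) / dW
            W += lr * G * grad
    return W

W = train(episodes=100)
print(W.shape)  # -> (3, 10)
```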

Exercises

  1. Run the provided REINFORCE script for m=3. How many episodes does the agent need to find its first Hamiltonian cycle? How does this compare to random search?
  2. Modify the reward function: try giving a bonus proportional to the fiber index s = (i+j+k) mod m of each visited vertex. Does encoding domain knowledge in the reward speed up learning?
  3. After the agent finds one Hamiltonian cycle, can you train it to find a second cycle that uses completely different arcs? (Hint: mask out arcs already used by the first cycle.)
  4. For m=5 (125 vertices), REINFORCE alone will struggle. Implement a GNN-based policy or use Monte Carlo Tree Search (MCTS, a search algorithm that uses random simulations to evaluate moves in a decision tree) to handle the larger state space. Compare sample efficiency.

Training Script

A self-contained Python training script is provided: rl_hamiltonian.py. It implements the REINFORCE algorithm with a 2-layer MLP policy network (PyTorch) and includes a --no-torch mode for pure NumPy. Run python rl_hamiltonian.py --help for options.