Creating a reinforcement learning agent for OpenFrontIO
High-level functionality of the agent
The agent has to be able to see WHAT it is interacting with, WHO, and WHERE. As a player, it is important for me to see:
- my own nation statistics (cities, factories, ports, defense posts, missile silos, SAM launchers)
- buyables such as warships, atom bombs, hydrogen bombs, and MIRVs
- troop count
- max troops
- troop gain
- gold amount
- owned land in percentage of the total land mass
For combat:
- an adjustable attack ratio (the share of the current troop count to commit)
- outgoing attacks
- incoming attacks from others
- I want to be able to attack both neutral land and a neighbor
- I want to be able to send transport ships to a specific tile
Spatially:
- I want to know the Manhattan distance to every other nation, measured from the closest border tile of my nation to the closest border tile of theirs
- I want to be able to see neutral land that lies across a crossing or river, and to send troops there
Diplomacy:
- I want the capability to send alliance requests
- cancel alliances
- see the betrayal status for me and others, as betrayal gives a temporary debuff
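The border-to-border distance in the spatial wish list can be sketched in a few lines. This is a minimal brute-force version, assuming each nation's border is available as a list of `(x, y)` tiles; the `Tile` type and the `border_distance` name are made up for illustration, and subtracting one so that touching borders score 0 is my own convention, not something the game defines.

```python
from typing import Iterable, Tuple

Tile = Tuple[int, int]  # (x, y) grid coordinate; a hypothetical representation


def border_distance(mine: Iterable[Tile], theirs: Iterable[Tile]) -> int:
    """Gap between the closest pair of border tiles, in Manhattan steps.

    Returns 0 when the two borders touch (orthogonally adjacent tiles).
    Brute force: O(len(mine) * len(theirs)).
    """
    mine, theirs = list(mine), list(theirs)
    if not mine or not theirs:
        raise ValueError("both nations need at least one border tile")
    closest = min(
        abs(ax - bx) + abs(ay - by)
        for ax, ay in mine
        for bx, by in theirs
    )
    # Adjacent tiles are 1 step apart; subtract 1 so neighbors score 0.
    return max(closest - 1, 0)


my_border = [(0, 0), (1, 0), (2, 0)]
neighbor_border = [(3, 0), (3, 1)]
island_border = [(2, 5), (3, 6)]

print(border_distance(my_border, neighbor_border))  # 0
print(border_distance(my_border, island_border))    # 4
```

For large borders the quadratic scan gets expensive, so a real pipeline would probably want a multi-source breadth-first search or a precomputed distance map instead.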
===
Now I have a complete picture. Let me provide a refined analysis based on the actual game mechanics.
Refined Analysis Based on OpenFront Docs
The Game's Core Decision Loop
Based on the documentation, a skilled player needs to reason about:
- GROWTH OPTIMIZATION
  - Am I at ~42% of max troops? (optimal growth rate)
  - Should I build cities to increase max pop?
  - How much can I commit to attacks without crippling growth?
- TARGET SELECTION
  - Who is my NEIGHBOR? (border distance = 0)
  - Who is WEAK? (low troops/tile = low shield)
  - Who is THREATENING? (silos + attacking me)
  - Who can I ANNEX? (surrounded territory)
- COMBAT COMMITMENT
  - Attack ratio: 50% early game, 20% late game
  - Committed troops don't defend OR grow
- DIPLOMACY
  - Alliance = 5 min protection + trade benefits
  - Betrayal = 30s GLOBAL debuff (everyone attacks you easier)
  - Can exploit enemy betrayals
- ECONOMY
  - Gold from: passive (1000/s), conquest, trade, factories
  - Trade = f(port distance), benefits both parties
  - Build order: ports → cities → defense → silos
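The growth and commitment questions above reduce to a little arithmetic. The sketch below takes the ~42% figure from the list at face value and does not model the game's actual growth curve; `spare_troops` and `max_attack_ratio` are hypothetical helper names, and treating "everything above 42%" as safely committable is a heuristic assumption rather than a game rule.

```python
OPTIMAL_FILL = 0.42  # from the list above: growth peaks near 42% of max troops


def growth_efficiency(troops: float, max_troops: float) -> float:
    """Fraction of the population cap currently filled (0.0 to 1.0)."""
    if max_troops <= 0:
        return 0.0
    return min(troops / max_troops, 1.0)


def spare_troops(troops: float, max_troops: float) -> float:
    """Troops above the optimal fill level (heuristic, not a game rule)."""
    return max(troops - OPTIMAL_FILL * max_troops, 0.0)


def max_attack_ratio(troops: float, max_troops: float) -> float:
    """Largest attack ratio that leaves the remaining troops at the optimum."""
    if troops <= 0:
        return 0.0
    return spare_troops(troops, max_troops) / troops


print(growth_efficiency(60_000, 100_000))            # 0.6
print(round(spare_troops(60_000, 100_000)))          # 18000
print(round(max_attack_ratio(60_000, 100_000), 2))   # 0.3
```

With 60,000 of 100,000 troops, committing up to about 30% keeps the remainder at the growth optimum; below 42% fill the helper returns 0, meaning any attack costs growth.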
Critical Missing Observations
| Feature | Why Critical | Status |
|---|---|---|
| border_distance | "Attack nearby weak enemies" is fundamental strategy | ❌ Missing |
| max_troops | Growth optimization at 42% requires knowing max | ❌ Missing |
| troop_gain_rate | Know if attacking will cripple growth | ❌ Missing |
| alliance_state | Per-enemy: allied/neutral/enemy | ❌ Missing |
| betrayal_timer | 30s window to exploit or avoid | ❌ Missing |
| entity_type | Bot (weak) vs Nation (builds) vs Human | ❌ Missing |
| troops_committed | Attack ratio currently in use | ❌ Missing |
| shield_value | troops/tiles = defense density | ❌ Missing |
| can_afford_X | Know what's purchasable | ❌ Missing |
| ports/factories | Enemy economy indicators | ❌ Missing |
Why CNN Is Wrong For This Game
From the docs, the key strategic questions are:
| Question | Requires | CNN Provides |
|---|---|---|
| "Who should I attack?" | Entity relationships, distance | Pixel patterns |
| "Is this enemy weak?" | troops/tiles ratio | Nothing |
| "Can I annex them?" | Surrounded + no water escape | Edge detection? |
| "Should I ally?" | Threat assessment, shared borders | Nothing |
| "Am I growing optimally?" | troops/max_troops ratio | Nothing |
The spatial grid (64×64×3) shows:
- Terrain (water/shore/land)
- Ownership (mine/enemy/neutral) - but not WHICH enemy
- Fallout
It does NOT show:
- Which tiles belong to which specific enemy
- Distance between my border and enemy borders
- Annexation opportunities
- Attack/transport paths
State-of-the-Art Architecture for OpenFront
Based on the game mechanics, here's what the model actually needs:
ENTITY-RELATIONSHIP ARCHITECTURE

PER-ENTITY FEATURES (for each player i, including self):
- Core Stats:
  - tiles, troops, gold, max_troops, troop_gain_rate
- Units (strategic indicators):
  - cities, ports, factories, silos, sams, defense_posts
  - warships, transports_in_flight
- Entity Type:
  - is_bot, is_nation, is_human (one-hot)
- RELATIONSHIP TO ME (this is the key!):
  - border_distance (manhattan, 0 = neighbor)
  - is_neighbor (shares border)
  - alliance_state (allied/neutral/enemy)
  - betrayal_active (they have traitor debuff)
  - my_betrayal_timer (if I betrayed them)
  - my_attacks_to_them (troop count committed)
  - their_attacks_to_me (troop count incoming)
  - incoming_nuke_type (0/atom/hydrogen/mirv)
- Derived Strategic:
  - shield = troops / tiles (defense density)
  - threat = silos / (distance + 1)
  - vulnerability = 1 / shield

↓

Linear → 64-dim Entity Embedding

↓

MULTI-HEAD SELF-ATTENTION (2-4 heads)
- Each entity attends to all others
- Learns: "who threatens whom", "alliance clusters"
- Output: Contextualized entity embeddings

↓

SELF FEATURES (special - my own state):
- growth_efficiency = troops / max_troops (optimal: 0.42)
- attack_ratio_committed (% of troops in attacks)
- can_afford: [warship, atom, hydrogen, mirv, city, port, factory, defense_post, silo, sam]
- win_progress = my_tiles / total_land (win at 0.8)

↓

GLOBAL FEATURES:
- game_tick, game_phase, fallout_ratio
- num_alive_players, num_alive_bots, num_alive_nations

↓

Combined: [self_embed, global, attention_pooled]

↓

ACTION HEADS:
- TYPE HEAD: MLP → 9 logits (NoOp, Attack, Boat, Build, AllyRequest, AllyAccept, LaunchNuke, BreakAlliance, CancelAttack)
- TARGET HEAD: Query-Key attention over entity embeddings → Score for each entity as target (Principled! Directly attends to WHO to interact with)
- INTENSITY HEAD: MLP → 4 logits (attack ratio)
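To make the TARGET HEAD idea concrete, here is a dependency-free sketch of query-key scoring over entities. It assumes the query would come from the combined state vector and the keys from the contextualized entity embeddings, both through learned projections that are omitted here; plain lists stand in for tensors, and the `valid` mask (dead players, illegal targets) is an addition of mine that the diagram does not mention.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; -inf entries come out as exactly 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def target_probs(
    query: Sequence[float],
    entity_keys: Sequence[Sequence[float]],
    valid: Sequence[bool],
) -> List[float]:
    """Scaled dot-product score of one query against every entity key.

    Entities flagged invalid are masked with -inf before the softmax,
    so they can never be picked as a target.
    """
    if not any(valid):
        raise ValueError("at least one entity must be a valid target")
    scale = math.sqrt(len(query))
    logits = []
    for key, ok in zip(entity_keys, valid):
        dot = sum(q * k for q, k in zip(query, key))
        logits.append(dot / scale if ok else -math.inf)
    return softmax(logits)


query = [1.0, 0.0]                            # stand-in for the projected state
keys = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]   # stand-ins for entity embeddings
probs = target_probs(query, keys, valid=[True, True, False])
print([round(p, 3) for p in probs])
```

The first entity aligns with the query and gets most of the probability mass, the second gets the rest, and the masked third entity gets none even though its key is identical to the first.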
What About Spatial Features?
For OpenFront, spatial features are secondary. The key decisions are entity-level. However, some spatial info is useful:
Option 1: Aggregate Spatial Features (Recommended)
terrain_composition:
- pct_plains, pct_highlands, pct_mountains (in my territory)
- pct_contested_border (border tiles under attack)
expansion_opportunities:
- num_annexable_territories (surrounded enemies)
- nearest_neutral_land_distance
- water_crossing_available (can boat somewhere)
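As an illustration of Option 1, the terrain shares could be computed from two per-tile grids. The single-letter terrain codes, the integer owner ids, and the `terrain_composition` function name are all assumptions for this sketch; the real observation encoding may look quite different.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical terrain codes; the real observation encoding may differ.
PLAINS, HIGHLANDS, MOUNTAINS, WATER = "P", "H", "M", "W"


def terrain_composition(
    terrain: List[List[str]], owner: List[List[int]], me: int
) -> Dict[str, float]:
    """Share of each land type among the tiles owned by player `me`."""
    counts = Counter(
        terrain[y][x]
        for y in range(len(terrain))
        for x in range(len(terrain[y]))
        if owner[y][x] == me
    )
    total = sum(counts.values())
    if total == 0:  # eliminated or not yet spawned
        return {"pct_plains": 0.0, "pct_highlands": 0.0, "pct_mountains": 0.0}
    return {
        "pct_plains": counts[PLAINS] / total,
        "pct_highlands": counts[HIGHLANDS] / total,
        "pct_mountains": counts[MOUNTAINS] / total,
    }


terrain = [["P", "P", "W"],
           ["H", "M", "P"]]
owner = [[1, 1, 0],
         [1, 1, 2]]
print(terrain_composition(terrain, owner, me=1))
# {'pct_plains': 0.5, 'pct_highlands': 0.25, 'pct_mountains': 0.25}
```

Three floats replace an entire terrain channel, which is the point of the aggregate approach: the policy sees what kind of land it holds without having to learn it from pixels.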
Option 2: Remove Spatial Entirely
- Entity features + distances may be sufficient
- Simplest, fastest, smallest model
Option 3: Tiny Spatial (8×8)
- Very coarse grid just for general map awareness
- Process with small CNN or just flatten
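Option 3 amounts to average-pooling each channel of the existing 64×64 grid down to 8×8. A plain-Python sketch, assuming a square grid whose side is divisible by the output size (the `coarse_grid` name is hypothetical):

```python
from typing import List


def coarse_grid(grid: List[List[float]], out_size: int = 8) -> List[List[float]]:
    """Average-pool a square grid down to out_size x out_size."""
    n = len(grid)
    if n % out_size != 0 or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a side divisible by out_size")
    block = n // out_size
    pooled = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            cells = [
                grid[y][x]
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            ]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Ownership mask: 1.0 where I own the tile. Here I own the left half of the map.
mask = [[1.0 if x < 32 else 0.0 for x in range(64)] for _ in range(64)]
small = coarse_grid(mask)
print(len(small), len(small[0]))  # 8 8
print(small[0])  # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Each output cell becomes the fraction of its 8×8 block that matches the mask, so a flattened 8×8 ownership channel is only 64 numbers.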
Concrete Implementation Plan
Phase 1: Fix Observations (High Impact)
Add to the JS/Rust observation pipeline:
// Per-enemy (add to existing):
border_distance      // Manhattan distance between closest borders
is_neighbor          // Boolean: shares border
alliance_state       // 0=enemy, 0.5=neutral, 1=allied
betrayal_active      // Boolean: has traitor debuff
ports                // Port count
factories            // Factory count

// Self (add):
max_troops           // Population cap
troop_gain_rate      // Troops per tick at current state
troops_committed     // Troops in active attacks
can_afford_warship   // Boolean
can_afford_atom      // Boolean
can_afford_hydrogen  // Boolean
can_afford_mirv      // Boolean
can_afford_city      // Boolean
can_afford_silo      // Boolean
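On the training side, the new per-enemy fields might be consumed roughly as follows. This is a sketch under assumptions: the `EnemyObs` class, the `AllianceState` enum, and the `can_afford` helper are invented names, the prices in the example are placeholders rather than the game's real costs, and deriving `is_neighbor` from `border_distance == 0` is one possible design, not necessarily how the pipeline does it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class AllianceState(Enum):
    ENEMY = 0.0
    NEUTRAL = 0.5
    ALLIED = 1.0


@dataclass
class EnemyObs:
    border_distance: int
    alliance_state: AllianceState
    betrayal_active: bool
    ports: int
    factories: int

    @property
    def is_neighbor(self) -> bool:
        # Derived rather than stored, so it cannot disagree with the distance.
        return self.border_distance == 0

    def to_vector(self) -> List[float]:
        """Flatten to floats in the same order as the field list above."""
        return [
            float(self.border_distance),
            float(self.is_neighbor),
            self.alliance_state.value,
            float(self.betrayal_active),
            float(self.ports),
            float(self.factories),
        ]


def can_afford(gold: int, costs: Dict[str, int]) -> Dict[str, bool]:
    """One boolean per buyable, named like the can_afford_* fields."""
    return {f"can_afford_{name}": gold >= cost for name, cost in costs.items()}


enemy = EnemyObs(0, AllianceState.NEUTRAL, True, ports=2, factories=1)
print(enemy.to_vector())  # [0.0, 1.0, 0.5, 1.0, 2.0, 1.0]

placeholder_costs = {"warship": 100, "atom": 500}  # NOT real game prices
print(can_afford(250, placeholder_costs))
# {'can_afford_warship': True, 'can_afford_atom': False}
```

Raw counts and distances would likely need normalizing before they reach the network; that step is left out here.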
Phase 2: Entity Architecture (Medium Effort)
Replace CNN with attention-based entity encoder:
- ~30 features per entity × 10-20 entities
- 2-layer transformer encoder
- Target head uses attention over entity embeddings
- Total params: ~200-400K (vs 1.5M current)
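A quick way to sanity-check the parameter budget is to count weights by hand. The sketch assumes a standard transformer encoder layer (Q, K, V and output projections with biases, a feed-forward block four times the model width, two layer norms) fed by a single linear embedding of ~30 features; the action heads and the self/global feature layers are not counted, and the function names are made up.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    attention = 4 * linear_params(d_model, d_model)  # Q, K, V, output projections
    feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    layer_norms = 2 * 2 * d_model                    # two norms, scale + shift each
    return attention + feed_forward + layer_norms


def encoder_params(n_features: int, d_model: int, n_layers: int, ff_mult: int = 4) -> int:
    embed = linear_params(n_features, d_model)
    return embed + n_layers * encoder_layer_params(d_model, ff_mult * d_model)


print(encoder_params(30, 64, 2))   # 101952
print(encoder_params(30, 128, 2))  # 400512
```

Under these assumptions the 64-dim encoder alone comes to roughly 100K parameters, so where the total lands in the 200-400K range depends on the heads and on whether the width grows toward 128. The number of attention heads does not change the count, and neither does the number of entities, since the same weights are shared across all of them.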
Phase 3: Simplify Spatial (Optional)
Either remove or reduce to aggregate features.
Summary
The CNN architecture fundamentally misunderstands the game. OpenFront is a diplomacy and resource management game where the key questions are:
- WHO to interact with (requires entity relationships, distances)
- WHAT to do (requires self-state, affordability)
- HOW MUCH to commit (requires growth optimization knowledge)
The spatial grid tells you terrain and ownership but not which enemy owns what or how far away they are. An entity-based architecture with explicit relationship features would be both more effective and more parameter-efficient.
Want me to start implementing Phase 1 (adding the missing observation features)?
===
Notes
This is my first blog post. An interesting finding is that performance varies with the number of real actions taken in the environment:
If the agent spawns in and only takes NOOPs, throughput is around 27,000 samples per second (SPS). If it instead takes real actions (building, combat, and the like), SPS drops to about 300. Self-play, on the other hand, performs better than a single agent playing against bots, with SPS around 1,700, because the game engine no longer has to compute the bot AI. The takeaway seems to be that, for performance, the fewer FFI calls into V8 the better, and the goal should be to shift as much work as possible to Rust, or at the very least to reduce computation inside the game engine.