We ran the standard VideoGameBench Doom II agent and changed only where the model was served. With the same Gemma 4 26B-A4B weights, Overshoot averaged 1.7 s per decision while OpenRouter averaged 11.5 s. The same setup also compares Overshoot-hosted models against GPT, Gemini, and Claude APIs.
Two 2:15 clips, recorded straight from the harness browser. The HUD is live: per-turn latency, rolling average, turn counter. Nothing is sped up. The game simply does not wait.
Every arm plays the same game with the same agent. Overshoot-hosted open models anchor the fast corner. The same weights elsewhere and the frontier APIs pay seconds per decision.
The cleanest experiment in the suite: identical open models, identical agent, identical game. The only change is who serves the tokens.
Gemma 4 26B-A4B answers in 1.7 s on Overshoot and 11.5 s via OpenRouter. Same brain, nearly seven times the reflexes.
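For readers who want to check the arithmetic, here is a minimal sketch using only the two averages quoted above. The helper names are illustrative and not part of the harness, and the decision count assumes back-to-back decisions with no other per-turn overhead, which is a simplification.

```python
# Average seconds per decision, as quoted in the text.
OVERSHOOT_S = 1.7
OPENROUTER_S = 11.5
CLIP_SECONDS = 2 * 60 + 15  # length of the 2:15 clips


def speedup(slow_s: float, fast_s: float) -> float:
    """Ratio of the slower average latency to the faster one."""
    return slow_s / fast_s


def decisions_in(window_s: float, latency_s: float) -> int:
    """Whole decisions that fit in a window.

    Assumption: decisions run back to back at the average latency.
    """
    return int(window_s // latency_s)


print(round(speedup(OPENROUTER_S, OVERSHOOT_S), 1))  # 6.8
print(decisions_in(CLIP_SECONDS, OVERSHOOT_S))       # 79
print(decisions_in(CLIP_SECONDS, OPENROUTER_S))      # 11
```

Under that simplification, one 2:15 clip leaves room for roughly 79 decisions at 1.7 s each and about 11 at 11.5 s each.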
Holo3 takes a full ReAct decision every second. In a game that never pauses, cadence is survival.
The full 3v3: three Overshoot-hosted open models vs GPT-5.4 mini, Gemini 3 Flash, and Claude Sonnet 4.6.
Identical VideoGameBench ReAct agent, identical Doom II campaign, identical action parser; only the model serving changes. Aggregated over three benchmark runs.
| # | Model | Serving | Latency avg | p95 | Turns/min | Accuracy | $ / turn | vs fastest |
|---|---|---|---|---|---|---|---|---|
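
The latency columns can be derived from a list of per-turn timings. The sketch below shows one plausible way to do it and is not the harness's actual code. It rests on three assumptions: p95 uses the nearest-rank method, Turns/min is 60 divided by the average latency, and "vs fastest" is each arm's average divided by the lowest average. All function names and sample values are hypothetical.

```python
import math
from statistics import mean


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile (assumed method)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_s: list[float]) -> dict[str, float]:
    """Average, p95, and implied turns per minute for one arm."""
    avg = mean(latencies_s)
    return {
        "avg_s": avg,
        "p95_s": p95(latencies_s),
        # Assumption: turns are issued back to back.
        "turns_per_min": 60.0 / avg,
    }


def vs_fastest(avg_by_arm: dict[str, float]) -> dict[str, float]:
    """Each arm's average latency as a multiple of the fastest arm's."""
    fastest = min(avg_by_arm.values())
    return {arm: avg / fastest for arm, avg in avg_by_arm.items()}


# Hypothetical per-turn latencies in seconds, not benchmark data.
sample = [1.0, 2.0, 3.0, 4.0]
print(summarize(sample))
print(vs_fastest({"arm_a": 2.0, "arm_b": 8.0}))
```

Nearest-rank is chosen here because it always returns an observed latency; an interpolating percentile would give slightly different p95 values on small samples.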
Sub-second decisions turn a spectator into a player. Stream once, reference any frame, and let the same open weights answer nearly seven times faster.