Xiaomi open-sources MiMo Code V0.1.0, claiming 200+ step wins vs Claude Code
The terminal coding agent is built around cross-session memory, and Xiaomi says the architecture boosts long-horizon accuracy.

Xiaomi’s MiMo AI team open-sourced MiMo Code V0.1.0, a terminal-native coding harness they say beats Anthropic’s Claude Code on agentic coding benchmarks, especially beyond 200 execution steps. For decision-makers, the consequence is simple: agent performance is shifting from raw model quality to the harness and memory layer.
The release, dated June 10, 2026, makes a very specific bet: for long-horizon coding tasks, a terminal agent’s memory and state management matter more than the model alone. Xiaomi says its approach drives a win rate above 65% after 200 execution steps in an internal double-blind human A/B evaluation involving 576 developers and 474 private repositories.
That is the crux of the headline claim, and it is the one that should make anyone building or budgeting for agentic coding pay attention. Xiaomi also says MiMo Code paired with MiMo-V2.5-Pro outperforms Claude Code paired with Claude Sonnet 4.6 on all three benchmarks it tested: SWE-bench Verified (82% vs. 79%), SWE-bench Pro (62% vs. 55%), and Terminal Bench 2 (73% vs. 69%). For skeptical operators, it adds a more pointed claim: with the same underlying MiMo model in both setups, the harness alone accounts for roughly five percentage points on SWE-bench Pro and Terminal Bench 2, a gain attributable to the agent system rather than the model.
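For readers who want the head-to-head gaps spelled out, the short Python sketch below computes them from the percentages quoted above. The scores are Xiaomi's reported figures, not independently verified results, and the `reported` and `margins` names are illustrative only.

```python
# Scores as reported by Xiaomi, in percent: (MiMo Code, Claude Code).
# These are vendor-reported numbers, not independently verified.
reported = {
    "SWE-bench Verified": (82, 79),
    "SWE-bench Pro": (62, 55),
    "Terminal Bench 2": (73, 69),
}


def margins(scores):
    """Return the MiMo Code minus Claude Code gap, in percentage points."""
    return {name: mimo - claude for name, (mimo, claude) in scores.items()}


gaps = margins(reported)
for name, gap in gaps.items():
    print(f"{name}: +{gap} points")
```

The reported gaps are 3, 7, and 4 percentage points respectively. These compare the full MiMo stack against the full Claude stack, which is a different comparison from the roughly five-point harness-only figure measured with the model held constant.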
So what exactly is Xiaomi shipping? MiMo Code is a terminal-native AI coding assistant released under an MIT license and available on GitHub now. The install path is simple: curl -fsSL https://mimo.xiaomi.com/install | bash on macOS and Linux, or npm install -g @mimo-ai/cli on Windows. In a post on X from the official @XiaomiMiMo account, Xiaomi positions it as “more than an AI coding assistant in your terminal - it's the smartest coding partner you'll ever work with.”
Under the hood, MiMo Code is a fork of the open-source OpenCode agent, extended with Xiaomi’s own memory architecture, workflow modes, and model harness. The industry problem it targets is familiar to anyone who has watched an agent get progressively dumber the longer the session runs. As context fills, earlier decisions and task state can get compacted away or lost, forcing developers to restate the same situation repeatedly. Xiaomi’s argument is not that “better compression” fixes this. Instead, it says you need an explicit storage-and-retrieval mechanism that decides what information should be persisted and when it should be recalled.
Xiaomi’s answer is a cross-session memory system spanning four layers: a persistent MEMORY.md file for project memory, session checkpoints, scratch notes, and per-task progress logs. The note-taking design is also a key difference. Xiaomi describes deploying an independent “checkpoint-writer” subagent so the primary coding agent does not have to pause to write notes while it is working. The checkpoint-writer updates the “blueprints” in real time as the main agent progresses, and when the context window approaches its limits, the system can rebuild the relevant environment from structured checkpoints with the right context. The point is to avoid the agent “getting lost in the half-built mansion” by letting it consult structured state instead of hoping everything still fits in the prompt.
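The general pattern of persisting structured state and rebuilding context from it can be illustrated with a minimal Python sketch. This is not Xiaomi's implementation: the `Checkpoint` fields, the JSON file format, the `near_limit` threshold, and every function name here are assumptions made for illustration, and the sketch collapses the separate checkpoint-writer subagent into an ordinary function call.

```python
import json
import tempfile
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class Checkpoint:
    """Hypothetical structured snapshot of task state (not Xiaomi's schema)."""
    goal: str
    decisions: list = field(default_factory=list)
    open_items: list = field(default_factory=list)


def write_checkpoint(path: Path, cp: Checkpoint) -> None:
    """Persist state outside the prompt so it survives context compaction."""
    path.write_text(json.dumps(asdict(cp), indent=2), encoding="utf-8")


def rebuild_context(path: Path) -> str:
    """Turn the stored checkpoint into a compact briefing for a fresh context."""
    data = json.loads(path.read_text(encoding="utf-8"))
    lines = [f"Goal: {data['goal']}"]
    lines += [f"Decided: {d}" for d in data["decisions"]]
    lines += [f"Open: {o}" for o in data["open_items"]]
    return "\n".join(lines)


def near_limit(tokens_used: int, window: int, threshold: float = 0.9) -> bool:
    """Assumed trigger: rebuild once usage crosses a fraction of the window."""
    return tokens_used >= threshold * window


with tempfile.TemporaryDirectory() as tmp:
    cp_path = Path(tmp) / "checkpoint.json"
    cp = Checkpoint(
        goal="Migrate auth module",
        decisions=["Use JWT"],
        open_items=["Update tests"],
    )
    write_checkpoint(cp_path, cp)
    # Rebuild only when the (assumed) token budget is nearly exhausted.
    briefing = rebuild_context(cp_path) if near_limit(950_000, 1_000_000) else ""
    print(briefing)
```

The design point the sketch captures is that the briefing is reconstructed from explicit fields on disk, so earlier decisions do not depend on still fitting in the prompt.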
Two additional self-improvement mechanisms round out the harness. Xiaomi includes a /dream command that periodically, roughly every seven days, reviews historical sessions, deduplicates them, and compresses them into long-term memory. It also includes a distill function that mines past sessions for repeated workflows that can be automated, taking inspiration from a similar pattern recently pursued by OpenAI and Anthropic with their models.
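One simple way to picture the idea of mining sessions for repeated workflows is counting command sequences that recur across sessions. The Python sketch below does that under stated assumptions: the session format (a list of command strings), the `repeated_workflows` function, and its `length` and `min_sessions` parameters are all hypothetical, and nothing here reflects how Xiaomi's distill function actually works.

```python
from collections import Counter


def repeated_workflows(sessions, length=2, min_sessions=2):
    """Find command sequences of `length` seen in at least `min_sessions` sessions."""
    seen = Counter()
    for commands in sessions:
        # A set makes each sequence count once per session, however often it recurs.
        ngrams = {
            tuple(commands[i:i + length])
            for i in range(len(commands) - length + 1)
        }
        seen.update(ngrams)
    return sorted(seq for seq, n in seen.items() if n >= min_sessions)


# Hypothetical session histories, each a list of shell commands in order.
sessions = [
    ["git pull", "npm test", "git push"],
    ["git pull", "npm test", "npm run build"],
    ["npm run lint", "npm test", "git push"],
]
candidates = repeated_workflows(sessions)
print(candidates)
```

Here the sequences "git pull" then "npm test" and "npm test" then "git push" each appear in two sessions, so they surface as candidates for automation.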
On integration and day-to-day usefulness, MiMo Code aims to plug into existing developer routines rather than force teams into a new IDE-shaped workflow. It operates directly in the terminal, reading and writing files, running commands, and managing Git. Xiaomi also says it requires no configuration to get started, connecting automatically to “MiMo Auto,” a free-for-a-limited-time channel powered by MiMo V2.5 with a million-token context window, with no registration required. For teams migrating from Claude Code, Xiaomi claims MiMo Code can automatically import MCP servers, custom skills, and API configurations.
There are also user-facing workflow upgrades. In Compose mode, developers press Tab to switch the agent into a specification-driven flow: the developer describes a high-level goal and the system autonomously executes the full development cycle, including design, planning, coding, testing, and review. Xiaomi describes the strategy as “heavy planning upfront, stable verification later.” For hands-free operation, Xiaomi adds voice control built on MiMo-ASR with TenVAD voice activity detection, including spoken commands like “send” and “execute,” available for logged-in users.
Finally, there are performance and market-context wrinkles worth noting. Xiaomi did not publish comparisons against OpenAI’s Codex or Google’s Gemini CLI, and Claude Code is the only named competitor in its materials. Independent reference points complicate the narrative in a useful way: on the official Terminal-Bench 2.0 leaderboard at tbench.ai, OpenAI’s Codex CLI running GPT-5.5 scores 82.2%, around nine points above MiMo Code’s reported 73%. On SWE-bench Pro, however, the picture flips: OpenAI reports GPT-5.5 at 58.6%, below MiMo Code’s claimed 62%. Xiaomi also concedes that standard benchmarks still measure “one-shot problem-solving ability” and do not capture the tool’s multi-session design goals.
Still, the direction of travel is hard to ignore. Xiaomi’s internal double-blind A/B evaluation suggests the memory and state management architecture pays off specifically on long-horizon work, where after 200 steps MiMo Code’s win rate rises above 65%. For executives and boards, the second-order implication is that agentic coding is becoming a harness-and-memory product category, not just a model race. If teams buy tools without a robust state layer, they may get impressive demos that quietly degrade under real delivery timelines.