Brian Armstrong says Coinbase aims to keep token costs roughly flat via model routing

Coinbase CEO Brian Armstrong spells out how routing prompts to cheaper models may cap costs while token usage grows fast.

By Yousef Al-Zahrani, Technology Correspondent, The Executives Brief

about 2 months ago·4 min read

Executive summary

Coinbase CEO Brian Armstrong wrote on X that Coinbase is working on routing prompts to cheaper models, in some cases keeping costs roughly flat while token usage grows exponentially. For decision-makers, the implication is clear: budgeting for AI spend may increasingly depend on infrastructure constraints and allocation strategy, not just chasing the newest flagship models.

Coinbase CEO Brian Armstrong just dropped a cost-control blueprint on X, and it is basically the opposite of token flexing. He said Coinbase is working on “routing prompts to cheaper models where appropriate,” and in some cases has been able to keep costs “roughly flat” while token usage continues to grow exponentially.

The punchline lands fast because Armstrong also spelled out what he expects to actually limit AI progress: “the limiting factor will be energy and compute, not better models.” In other words, the next wave of efficiency might not come from a miracle model upgrade. It might come from deciding, prompt by prompt, which model earns the right to burn compute.

This matters for anyone budgeting AI. When people talk about tokens, they often default to a simplistic narrative: bigger models generate better outputs, so more tokens and more usage are a badge of honor. Armstrong is challenging that assumption directly. Even if the latest systems promise cutting-edge performance, they can also devour more tokens. The source's example is telling: the most advanced models, such as Opus 4.8 or GPT-5.5, are described as potentially expensive at the token level even “before you turn on Fast mode.”

Armstrong also provided a timeline and a boundary condition. He anticipated that “80% of workloads will be running on 99% cheaper models within 12-18 months.” That is a huge operational claim, and it points to a practical pattern: high-volume work does not need “IQ maxing.” He said users should reach for the latest models only when they need to be “IQ maxing,” including for “scientific breakthroughs or agent orchestration.” This is not just a pricing strategy. It is an architecture strategy. It implies a system that can classify tasks, route them, and still deliver acceptable outcomes.
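To make the classify-and-route idea concrete, here is a minimal Python sketch. It is an illustration only, not a description of Coinbase's system: the tier names, the relative prices, the task labels, and the `route` function are all hypothetical, and a real router would need a far better classifier than a fixed set of task types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """A hypothetical model tier with a cost relative to the frontier tier."""
    name: str
    relative_cost_per_token: float


# Assumed tiers: the budget tier is priced at 1% of the frontier tier,
# mirroring the "99% cheaper" figure in Armstrong's post.
FRONTIER = ModelTier("frontier", 1.0)
BUDGET = ModelTier("budget", 0.01)

# Assumed labels for the work Armstrong called "IQ maxing".
FRONTIER_TASKS = {"scientific_research", "agent_orchestration"}


def route(task_type: str) -> ModelTier:
    """Send a task to the frontier tier only when its type is flagged."""
    return FRONTIER if task_type in FRONTIER_TASKS else BUDGET


if __name__ == "__main__":
    for task in ("agent_orchestration", "summarize_ticket", "classify_email"):
        print(f"{task} -> {route(task).name}")
```

The design choice worth noting is the default: anything not explicitly flagged falls through to the cheap tier, which is what makes high-volume work inexpensive by construction. Whether the cheap tier still delivers acceptable outcomes is a separate question that this sketch does not measure.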

And routing is not a weird fringe concept anymore, at least not according to the reaction in the source. Venture capitalist Marc Andreessen called Armstrong’s comment “interesting.” Hugging Face cofounder Julien Chaumond wrote that “model routing is growing a lot these days.” Box CEO Aaron Levie said Armstrong’s numbers were “a bit extreme,” but still predicted AI use would stratify: “high end” work would be done by leading models, while “high volume” work would be relegated to cheap models. Harvey cofounder Winston Weinberg added that “Intelligence allocation is going to be extremely important.” Even if executives disagree on the exact percentage, the direction is hard to miss: capacity and cost discipline will move from the back office to the product design conversation.

Armstrong’s post also stirred debate because the market has recently trained itself to equate more tokens with more progress. The source notes that “tokenmaxxing” was once the dominant mindset, and tech leaders would post high token bills or flex usage of the latest models. In the startup world, the advice wasn’t always subtle either. The source references Y Combinator CEO Garry Tan advising founders to “let it rip” with tokens. Another cited data point: Lance Yan, a YC-backed startup founder, told Business Insider in April that rationing tokens was “stupid.”

So what changed? The article frames it as a tide turning toward efficiency as the economics get real. Some of that is technical reality. For example, it references complaints when Anthropic launched Opus 4.7, with many users saying they were quickly hitting rate limits. Under constraints like rate limits, compute, and energy, “better models” can become a luxury. If you run at scale, you stop asking “Can this model do it?” and start asking “Can we afford to do it this way, every time?” Armstrong’s energy and compute point reinforces that, tying cost behavior to the physical limits of compute rather than to model branding.

There is also a second-order market effect hiding in plain sight: the AI labs now have to compete not only on raw model quality, but on their placement in a routing stack. If 80% of workloads move to 99% cheaper models within 12-18 months, the revenue model for frontier systems can change from broad default usage to narrower, specialist usage. That is consistent with Levie’s “high end” versus “high volume” stratification view. It also changes how boards should ask management questions. Instead of “Which model are we using?” boards may need to ask “How are we allocating intelligence across models, and how does that allocation keep costs stable as usage grows?”
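A rough back-of-the-envelope calculation shows what the 80% and 99% figures would mean for a budget. The sketch below assumes, purely for illustration, that every workload starts on a frontier model, that "80% of workloads" translates to 80% of token volume, and that per-token price is the only cost. None of those assumptions come from Armstrong's post, and the function names are hypothetical.

```python
def blended_cost_fraction(cheap_share: float, discount: float) -> float:
    """Spend after routing, as a fraction of an all-frontier baseline.

    cheap_share: fraction of token volume moved to cheaper models (assumed).
    discount: how much cheaper those models are, e.g. 0.99 for "99% cheaper".
    """
    frontier_part = (1 - cheap_share) * 1.0
    cheap_part = cheap_share * (1 - discount)
    return frontier_part + cheap_part


def growth_headroom(cheap_share: float, discount: float) -> float:
    """How many times usage can grow before spend returns to the baseline."""
    return 1 / blended_cost_fraction(cheap_share, discount)


if __name__ == "__main__":
    fraction = blended_cost_fraction(0.80, 0.99)  # 0.2 * 1 + 0.8 * 0.01
    print(f"spend vs. baseline: {fraction:.3f}")            # about 0.208
    print(f"usage growth absorbed: {growth_headroom(0.80, 0.99):.2f}x")  # about 4.81x
```

Under these assumptions, spend falls to about 21% of the baseline, which absorbs a bit under a fivefold increase in usage at flat cost. Most of the remaining bill is the 20% still on frontier models, so keeping costs flat under continued exponential growth would take more than a one-time shift, such as falling prices or a shrinking frontier share.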

Finally, the broader stakes are regulatory and governance-adjacent, even though the source does not mention regulators directly. Public companies, payment-adjacent platforms, and AI-heavy products typically face scrutiny around operational risk and cost control, especially when costs can scale with usage. Armstrong’s central thesis, “roughly flat” costs while token usage grows exponentially, is an operational stability promise. If it holds, it can reshape the confidence investors and operators place in AI unit economics. If it does not, it becomes a warning sign that token growth can outrun budget discipline.

For executives watching the space, Armstrong’s post is a reset button on how to think about AI spend: treat token consumption like a capacity management problem. Measure which workloads truly need the newest models. Route the rest. And accept that, as Armstrong argued, the limiting factors may be energy and compute, not better models. That is a strategy shift with real implications across product teams, infrastructure planners, and capital allocators trying to model what AI costs will do next.