Brian Armstrong says Coinbase will cut AI spend using cheaper Chinese defaults, not usage caps
Five concrete tactics aim to keep engineers experimenting with tokens while making scaling AI “sustainable.”

Coinbase CEO Brian Armstrong outlined five strategies to keep AI spending low while encouraging engineers to keep using tokens. The approach trades blanket token limits for infrastructure choices, model routing, and tighter cost visibility.
Coinbase CEO Brian Armstrong is trying to solve a problem that keeps showing up across the tech industry: AI costs are rising, but telling engineers to stop using AI is a dead end. In a Friday X post, Armstrong laid out five strategies for keeping AI spending low without limiting tokens, arguing that the goal is not to suppress usage but to build the infrastructure that makes exponential growth sustainable.
Armstrong’s first lever is also the most blunt. He said Coinbase is experimenting with defaulting to open-weight Chinese large language models through its LLM gateway, specifically GLM 5.2 and Kimi 2.7, which he described as significantly cheaper than models from frontier American AI labs like Anthropic and OpenAI. The pitch is simple: let engineers keep their speed and freedom to experiment, but shift the default so the average request costs less before any other optimization is applied.
This matters for decision-makers because “AI spend control” has turned into two competing philosophies. One is the old-school response: impose usage caps or enforce stricter controls to curb token consumption. The other is an engineering-led response: keep usage high where it drives value, but reduce cost per output through better systems. Armstrong explicitly rejected the first philosophy. Instead, he positioned Coinbase’s approach as cost engineering, not behavior suppression.
The second strategy extends that cost-per-request idea. Armstrong said Coinbase will route prompts to the most appropriate models based on their difficulty. His example was the classic one: you might use a frontier model for planning, but not for execution, where a smaller model can be sufficient. Importantly, he added that humans should not be choosing models. “AI can automate this task,” he wrote, implying that the optimization can happen inside the workflow rather than through extra managerial approvals.
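To make the routing idea concrete, here is a minimal sketch, not Coinbase's implementation. The tier names, prices, keyword list, and threshold are all hypothetical, and the keyword heuristic stands in for whatever automated classifier a real gateway might use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # assumed illustrative figure, not real pricing


# Hypothetical tiers; the post does not describe Coinbase's actual setup.
CHEAP = ModelTier("small-open-weight", 0.10)
FRONTIER = ModelTier("frontier", 1.00)

# Assumed keywords that hint a prompt is planning work rather than execution.
PLANNING_HINTS = ("plan", "design", "architecture", "trade-off")


def estimate_difficulty(prompt: str) -> float:
    """Return a crude difficulty score in [0, 1] from keywords and length."""
    text = prompt.lower()
    score = 0.0
    if any(hint in text for hint in PLANNING_HINTS):
        score += 0.6
    if len(text.split()) > 200:
        score += 0.4
    return min(score, 1.0)


def route(prompt: str, threshold: float = 0.5) -> ModelTier:
    """Send hard prompts to the frontier tier and everything else to the cheap tier."""
    return FRONTIER if estimate_difficulty(prompt) >= threshold else CHEAP


if __name__ == "__main__":
    for p in ("Plan the architecture for a new ledger service",
              "Rename this variable to snake_case"):
        print(p, "->", route(p).name)
```

The point of the sketch is that the choice is made by code in the request path, so no engineer has to pick a model by hand.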
His third and fourth strategies are about squeezing waste out of inference. The third is better caching, which reduces inference costs. If a system keeps recomputing the same intermediate work, caching prevents that. The fourth is keeping context lean, meaning starting new sessions when switching between tasks. In plain English, you reduce the amount of text the model has to “carry around” for each new job, which usually lowers both latency and token burn.
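The post does not say which kind of caching Coinbase uses. As one illustration, the sketch below shows exact-match response caching, where a repeated (model, prompt) pair is answered from memory instead of triggering a second inference call. The `ResponseCache` class and the `fake_model` stand-in are hypothetical names for this example only.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache placed in front of a model call (illustrative only)."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together so different models never share entries.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(model, prompt)
        self._store[key] = result
        return result


def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a paid inference call; a real gateway would call an API here."""
    return f"[{model}] {prompt.upper()}"


if __name__ == "__main__":
    cache = ResponseCache(fake_model)
    cache.complete("small-open-weight", "summarize the changelog")
    cache.complete("small-open-weight", "summarize the changelog")
    print(f"hits={cache.hits} misses={cache.misses}")
```

Only the first request reaches the model; the repeat is served from the cache, which is where the saving comes from.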
Then there is the fifth strategy, which gets overlooked because it is less glamorous than model selection but more powerful than it sounds: visibility. Armstrong said Coinbase will improve visibility into AI spending across the company. The policy, per his post, is that engineers can use as many tokens as they want, but they can see their usage. The graph he attached tracks token usage and AI spend over time, though he did not specify the exact timeline. The key point, as described in the post: token usage has reached one of the highest levels in Coinbase history, while AI spending has fallen significantly, to nearly half its peak level.
That combination, high token use with lower spend, is the core promise of the approach. Armstrong framed it as a goal rather than a compromise: “The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.” There is also a timing story here. His post comes less than two months after Coinbase laid off 14% of its staff, partly due to AI changing how people work. Armstrong also previously said that engineers using AI can ship in days what used to take teams weeks. Taken together, the message reads as: AI is reshaping productivity, budgets and headcount, so the company has to make AI cheaper per unit of value without blunting the gains.
For broader industry context, Armstrong’s post arrives amid a shift noted in the source reporting: the market has moved on from the short-lived tokenmaxxing trend, in which people pushed token usage as high as possible, toward usage caps meant to curb rampant consumption. Coinbase appears to be aiming for a middle path that keeps engineers unblocked while still controlling costs through system design. For executives sitting on boards or running engineering finance, this is the kind of tradeoff that decides whether AI becomes a competitive advantage or a budget leak.
Second-order implications are straightforward but serious. If you can sustain higher AI usage without a proportional rise in spend, you can justify more experimentation, faster deployment cycles, and more automation, rather than constantly tightening limits. But it also means the finance story depends on operational discipline: caching, context management, model routing, and cost dashboards have to work consistently, otherwise “no caps” quickly turns into “no brakes.” The strategic stakes for peers like CFOs, CTOs, and AI platform leaders are simple: Armstrong is betting that infrastructure and observability can beat blunt quotas. If it works, it unlocks exponential usage safely; if it fails, you end up back at caps anyway, only after you let costs run hot.
