Bold Metrics’ Morgan Linton uses cheaper models strategically, not tokenmaxxing
The model-switching playbook is replacing “use AI nonstop” as rising bills and usage caps force smarter routing.

Morgan Linton, CTO of Bold Metrics, tells his 16 engineers twice a week which AI models to use and when, from Claude Fable on “low” to GPT-5.5 on “high.” For decision-makers, the shift matters because controlling model spend now counts for more than maximizing usage.
Twice a week, Morgan Linton does something that would have sounded heretical during the tokenmaxxing craze: he assigns models deliberately, the way a coach sets a lineup, instead of leaving it to chance. Linton, the Lake Tahoe-based chief technology officer of AI startup Bold Metrics, prepares for his team’s standup 50 minutes beforehand, lining up which model goes where. One team gets Claude Fable on “low,” another gets GPT-5.5 on “high.” A third uses Cursor with Composer 2.5 and, Linton says, is getting “totally perfect results.”
The point is not just picking the “best” model. It is deciding when you actually need the expensive one, so that you do not have to manage the team with hard token caps. Being specific about model use, Linton explains, means his team can “use the best stuff, but they're using it a lot more efficiently.” That philosophy is increasingly replacing tokenmaxxing, the term the AI community used in the first half of 2026 for companies urging employees to use AI as much as possible.
Tokenmaxxing made sense on the surface: if employees use AI more, they ship faster. Then the invoices landed. After reviewing the AI bills their employees were racking up, companies from Uber to Microsoft have taken a more considered approach. As budgets tighten and usage caps appear, one cost-saving tactic is moving from “nice-to-have” to “operational necessity”: model switching. Instead of running every task on a single premium model, teams route difficult, intellectually demanding work to pricier frontier models and offload easier, repetitive tasks to older, cheaper ones.
There are also good reasons to use the newest models when you genuinely need them. OpenAI’s Kaylin Voss wrote on LinkedIn that better models “reduce retries, supervision, and wasted effort.” That is the clean logic: if a model gets it right the first time, you may spend more per call but waste less overall. Coinbase CEO Brian Armstrong laid out the business case even more bluntly in an X post on June 7, predicting “80% of workloads will be running on 99% cheaper models within 12-18 months,” while the remaining 20% runs on the latest models where “IQ maxxing is important.”
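Voss’s argument is an expected-value calculation, and Armstrong’s 80/20 split implies a blended bill. The sketch below is a minimal illustration under stated assumptions: each attempt succeeds independently with a fixed probability, human review is charged on every attempt, and every workload cost the same on the premium model before switching. All prices, success rates, and the review cost are hypothetical, not figures from any vendor or from the people quoted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_call: float  # hypothetical dollars per attempt
    success_rate: float   # hypothetical probability an attempt is usable


def expected_cost_per_success(model: ModelProfile, review_cost: float = 0.0) -> float:
    """Expected spend to obtain one usable result.

    Assumes attempts are independent, so the number of attempts until the
    first success is geometric with mean 1 / success_rate. Every attempt
    pays the API price plus any human review (supervision) cost.
    """
    if not 0.0 < model.success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return (model.cost_per_call + review_cost) / model.success_rate


def blended_cost_ratio(cheap_share: float, cheap_discount: float) -> float:
    """Share of an all-premium bill that remains after moving `cheap_share`
    of workloads to models that are `cheap_discount` cheaper.

    Assumes every workload cost the same on the premium model beforehand.
    """
    return (1.0 - cheap_share) + cheap_share * (1.0 - cheap_discount)


# Hypothetical profiles for illustration only.
frontier = ModelProfile("frontier", cost_per_call=0.50, success_rate=0.9)
cheap = ModelProfile("cheap", cost_per_call=0.05, success_rate=0.3)

# API cost alone: the cheap model looks better.
print(round(expected_cost_per_success(frontier), 2))  # 0.56
print(round(expected_cost_per_success(cheap), 2))     # 0.17

# Add a hypothetical $2.00 of human review per attempt: the frontier model wins.
print(round(expected_cost_per_success(frontier, review_cost=2.0), 2))  # 2.78
print(round(expected_cost_per_success(cheap, review_cost=2.0), 2))     # 6.83

# Armstrong's scenario: 80% of workloads on models 99% cheaper.
print(round(blended_cost_ratio(0.8, 0.99), 3))  # 0.208
```

With these made-up numbers, the cheap model wins on raw API cost, but once review time is charged to every retry, the frontier model becomes cheaper per usable result, which is the trade-off Voss describes. Under the equal-cost assumption, Armstrong’s scenario would leave roughly 21% of the original bill.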
So if the incentives are clear, why did tokenmaxxing happen at all? Partly because hype cycles made “more usage” feel like progress. Chris Maconi, Huntsville-based cofounder of AI startup Hechura, was never a fan. He describes his company’s attitude as “human-in-the-loop” and says he is not setting up overnight bots to keep coding. For Maconi, model choice is part of that anti-tokenmaxxing outlook. He remembers the OpenClaw hype cycle, including how its broad autonomy and 24/7 use could burn through tokens fast. When he set up OpenClaw, Maconi started with cheaper Gemini models before switching to Anthropic’s Haiku, and he says he is “not afraid to go and try some of these lower-end models to see if they can provide the intelligence that we need.”
Others arrived at the same conclusion through painful trial and error. Tanvi Pisal, a 29-year-old user-experience designer at a Big Tech company, uses tools such as Figma, ChatGPT, and Claude to brainstorm and write product requirement documents. Early on, she tried using Claude to brainstorm UX from scratch, “wasted months of tokens,” and still did not finish. Her workaround is practical and specific: design everything in Figma first, then feed screenshots into Claude with instructions to keep the UI as-is while building out the full functionality and flow. She adds that brainstorming in ChatGPT can be free for her thanks to her enterprise plan, after which the refined ideas go to Claude for polished documents. Alejandra Thomas, a software engineer and tech content creator in New York City, tests every new model release to see what each one is good at, and she says she avoids the most expensive model for simple tasks.
Even for those who accept the need to switch, doing it manually can be exhausting. That is why model-routing startups are surging. These companies sell software that assigns tasks to specific models, sometimes open-source ones, based on complexity. They have been venture hits, with startups like OpenRouter being “showered with cash.” David Gilmore runs Rayline, whose tool intercepts requests and decides whether they can go to cheaper, often open-source, models. Gilmore says many clients get hit by the “FOMO moment,” then realize they need to scale back after seeing their API bills. Adoption is accelerating: Ramp’s lead economist, Ara Kharazian, told Business Insider that around 1% of firms used a model router last year, compared with 5% this year. BlockSpaceForce, a San Francisco investment firm, uses OpenRouter, Fireworks, and Together AI. Its managing partner, Spencer Yang, also advocates asking a cheaper model first whether a more expensive one is needed, arguing that “the models themselves are actually getting really good at assessing their own complexity.”
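Yang’s “ask the cheap model first” approach can be sketched as a two-tier cascade. This is a minimal, hypothetical illustration, not how OpenRouter, Rayline, or any other product actually works: the two “models” are plain Python functions standing in for real API calls, and the YES/NO triage prompt and the keyword rule inside the stub are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# A "model" is any function from prompt text to response text.
# These are stand-ins for real provider calls, not an actual SDK.
ModelFn = Callable[[str], str]

# Hypothetical triage prompt; real routers use their own classifiers.
TRIAGE_PROMPT = "Answer YES or NO only. Does this task need a more capable model?\n"


@dataclass
class CascadeRouter:
    cheap: ModelFn
    premium: ModelFn
    log: list[str] = field(default_factory=list)  # which tier served each task

    def needs_premium(self, task: str) -> bool:
        # Ask the cheap model to judge the task's complexity first.
        verdict = self.cheap(TRIAGE_PROMPT + task)
        return verdict.strip().upper().startswith("YES")

    def run(self, task: str) -> str:
        # Escalate only when the cheap model says the task needs it.
        tier = "premium" if self.needs_premium(task) else "cheap"
        self.log.append(tier)
        model = self.premium if tier == "premium" else self.cheap
        return model(task)


# Hypothetical stubs: the "cheap model" flags anything about architecture as hard.
def fake_cheap(prompt: str) -> str:
    if prompt.startswith(TRIAGE_PROMPT):
        return "YES" if "architecture" in prompt.lower() else "NO"
    return "cheap answer: " + prompt


def fake_premium(prompt: str) -> str:
    return "premium answer: " + prompt


router = CascadeRouter(cheap=fake_cheap, premium=fake_premium)
print(router.run("Rename this variable across the file"))
print(router.run("Propose a new service architecture for checkout"))
print(router.log)  # ['cheap', 'premium']
```

The sketch defaults to the cheap tier unless it explicitly asks for escalation. Because a model’s self-assessment can be wrong, a production router would plausibly also escalate when a cheap answer fails some quality check.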
Second-order implications are where the board-level conversation starts. If routing and switching become standard, companies can reduce spend without gutting output: routine work moves to cheaper models, while premium models are reserved for the tasks where, as Voss argues, they cut retries and supervision. It also changes how teams measure productivity: not just “how much AI did you use,” but “how smartly did you allocate it.” The strategic risk for executives is simple: defaulting to the newest, priciest models can become a costly form of laziness. Hechura cofounder Maconi puts it the same way, saying people “don't want to do the hard work of understanding which models are good at which things” and “just want to ride the hype train.” In a world of tightening budgets, that hype train is no longer just inefficient. It is a margin problem.