Shopify’s LLM proxy seamlessly fails over when models like Claude Fable 5 disappear
Farhan Thawar says the proxy prevents engineer workflows from breaking when providers change, update, or go down.

Shopify built an LLM proxy that routes engineers to multiple AI providers and automatically transfers usage when a model shuts down, changes, or disappears. For decision-makers, it turns model volatility into a managed risk, and it reframes spending, accuracy, and governance around portability.
Shopify’s head of engineering, Farhan Thawar, describes an internal LLM proxy that keeps engineers productive even when specific models vanish. His example is blunt: when Claude Fable 5 shut down, Shopify did not push engineers into panic or manual reroutes. The proxy automatically shifted them to other options like Claude Opus or GPT 5.5 “without interrupting their workflows.”
That is the real business point. In a world where model availability can change overnight due to provider decisions, updates, or outages, Shopify is treating “provider volatility” as an engineering problem with a productized answer. Thawar says the proxy lets Shopify “spray across the different providers,” so when a model comes and goes, or when something changes in a seemingly innocuous way, engineers keep moving instead of rebuilding prompts, reconnecting tools, or re-allocating token budgets mid-task.
Under the hood, Shopify buys tokens in bulk, and engineers and other internal users connect to models through the proxy. Thawar frames this as giving Shopify two operational capabilities at once: reporting visibility across providers and automatic failover when a provider has an availability issue. The failover part matters because it shifts the failure mode. Instead of “your agent stops,” the likely outcome becomes “your agent keeps running, just on a different model behind the proxy.” Thawar also emphasizes that enterprises should plan for disruption the same way, including having a backup plan and avoiding being “super tied” to a single provider.
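To make the failover idea concrete, here is a minimal sketch of a proxy that tries models in priority order and records which one served each request. It is an illustration under stated assumptions, not Shopify's implementation: the class `FailoverProxy`, the exception `ModelUnavailable`, and the placeholder model names are all hypothetical, and the backends are stand-in functions rather than real provider clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ModelUnavailable(Exception):
    """Hypothetical error a backend raises when its model is shut down or unreachable."""


@dataclass
class ProxyResult:
    model: str  # which model actually served the request
    text: str   # the completion returned to the caller


class FailoverProxy:
    """Illustrative proxy: try models in priority order, count usage per model."""

    def __init__(self, backends: Dict[str, Callable[[str], str]], order: List[str]):
        self.backends = backends
        self.order = order
        # Per-model request counts, standing in for cross-provider reporting.
        self.usage: Dict[str, int] = {name: 0 for name in order}

    def complete(self, prompt: str) -> ProxyResult:
        for name in self.order:
            try:
                text = self.backends[name](prompt)
            except ModelUnavailable:
                continue  # fall through to the next model in the list
            self.usage[name] += 1
            return ProxyResult(model=name, text=text)
        raise ModelUnavailable("no configured model could serve the request")


# Stand-in backends (assumptions for the demo, not real provider SDK calls).
def retired_model(prompt: str) -> str:
    raise ModelUnavailable("model has been shut down")


def healthy_model(prompt: str) -> str:
    return f"echo: {prompt}"


proxy = FailoverProxy(
    backends={"primary-model": retired_model, "fallback-model": healthy_model},
    order=["primary-model", "fallback-model"],
)
result = proxy.complete("summarize this diff")
print(result.model, proxy.usage)
```

The caller never names a provider; it only sees that the request succeeded, which is the shift in failure mode described above. The usage counter hints at how the same chokepoint could feed reporting.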
This is where the story gets bigger than uptime. Shopify’s approach is built for a reality executives often underestimate: even when a model keeps the same product name, updates can change its behavior enough to affect outputs. Thawar points out that a proxy can respond not only to a shutdown like Claude Fable 5, but also to smaller changes that are “innocuous” on paper. For boards and operators, that means model strategy becomes less about betting the company’s workflow on one frontier option and more about designing optionality. You can switch providers, switch model families, and maintain continuity.
Shopify’s second lever is distillation, a strategy aimed at improving cost and speed without losing too much accuracy. With distillation, a “student model” learns from a “teacher model” and typically becomes specialized in a narrower task. Thawar connects this directly to product economics and internal delivery. Shopify’s flagship AI assistant, Sidekick, performs specialized subtasks for merchants so they can “remove toil” from day-to-day work. In those settings, smaller distilled models can be faster and cheaper than generalized, off-the-shelf models. Thawar gives examples: in some cases they have proven to be 2x cheaper and faster, and in more extreme cases 30x cheaper and faster.
But he also draws a line executives should notice: “it isn’t just about cost and latency, which are big; it’s about accuracy.” That is why Shopify’s engineering process includes a pipeline where engineers feed the UDP their teacher model, training data, evals, and a target model; one example is “Opus 4.8 distilling down to Qwen 3.5.” The pipeline runs for about a day, then returns an evaluation showing what the fine-tuned model achieved on speed, cost, and accuracy for that subtask. If the tradeoff looks good, Thawar says there is no approval process required for deployment. Internally, Shopify’s platform, Tangle, lets anyone visualize the pipeline as it runs.
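The speed, cost, and accuracy comparison the pipeline returns can be pictured with a small sketch. Everything here is an assumption made for illustration: the `EvalReport` fields, the `compare` function, the 2-point accuracy tolerance, and the numbers themselves are invented and are not Shopify's results or decision criteria.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    accuracy: float            # fraction of eval cases passed on the subtask
    cost_per_1k_tasks: float   # dollars, hypothetical unit
    latency_ms: float          # typical latency per task


def compare(teacher: EvalReport, student: EvalReport,
            max_accuracy_drop: float = 0.02) -> dict:
    """Summarize the tradeoff between a teacher and its distilled student.

    max_accuracy_drop is an assumed tolerance; the source gives no threshold.
    """
    cost_ratio = teacher.cost_per_1k_tasks / student.cost_per_1k_tasks
    speedup = teacher.latency_ms / student.latency_ms
    accuracy_drop = teacher.accuracy - student.accuracy
    return {
        "cost_ratio": cost_ratio,
        "speedup": speedup,
        "accuracy_drop": accuracy_drop,
        # Cheaper and faster is not enough; accuracy must hold up too.
        "worth_deploying": (accuracy_drop <= max_accuracy_drop
                            and cost_ratio > 1 and speedup > 1),
    }


# Made-up numbers for the demo only.
teacher = EvalReport(accuracy=0.94, cost_per_1k_tasks=60.0, latency_ms=1800.0)
student = EvalReport(accuracy=0.93, cost_per_1k_tasks=30.0, latency_ms=900.0)
summary = compare(teacher, student)
print(summary)
```

Putting accuracy in the same gate as cost and latency mirrors Thawar's point that the decision is not only about being cheaper and faster.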
Then comes the next escalation, and it is basically the automation execs love: Thawar’s “dream” is to eventually not give the distillation pipeline a target model at all. Instead, users would provide the teacher model, data, and evals, plus a directive to evaluate over time and recommend the right distillation target across different model sizes and types. The pipeline’s output could be a smaller model that even runs on a phone, or it could return the unglamorous answer that there is no better distilled version than what exists at the frontier. Either way, the system is meant to convert experimentation into an operational loop.
Finally, Shopify’s proxy story ties into usage governance and “AI leverage,” not just tooling. Thawar says Shopify exposes users to different harnesses so they can test what works in their workflow, including Claude Code, Codex, Cursor, and GitHub Copilot for VS Code. The company also implemented a usage dashboard to answer questions beyond token spend, such as who is using the most expensive tokens, who is spending more time on reasoning, and what types of models are being used across disciplines and levels. On the “tokenmaxxing” risk, Thawar describes circuit breakers: if a model runs a long time, like 10 hours, and consumes a lot of tokens, the user is pinged. Sometimes they respond “Oh, absolutely.” Other times they say they forgot it was running and would rather stop it.
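The circuit-breaker check can be sketched in a few lines. The 10-hour figure comes from Thawar's example; the token threshold, the `Session` type, and the `should_ping` function are hypothetical names and values chosen for illustration, since the source does not say what counts as “a lot of tokens.”

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    user: str
    hours_running: float
    tokens_used: int


def should_ping(session: Session,
                max_hours: float = 10.0,          # from the example in the text
                max_tokens: int = 5_000_000) -> bool:  # assumed threshold
    """Flag a session that is both long-running and token-heavy."""
    return (session.hours_running >= max_hours
            and session.tokens_used >= max_tokens)


sessions: List[Session] = [
    Session("a", hours_running=11.0, tokens_used=8_000_000),  # long and heavy
    Session("b", hours_running=2.0, tokens_used=9_000_000),   # heavy but short
    Session("c", hours_running=12.0, tokens_used=10_000),     # long but light
]
to_ping = [s.user for s in sessions if should_ping(s)]
print(to_ping)
```

Note that the check only prompts the user. Whether the run continues is left to the person, which matches the two responses Thawar describes.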
The strategic stakes for other enterprises are obvious and urgent: AI systems do not live in a stable environment. Providers change, models shut down, and costs can balloon silently. Shopify’s proxy and distillation approach converts those risks into measurable controls: failover for continuity, portability across providers, and data-backed tradeoffs for accuracy versus efficiency. Thawar calls the goal moving from “AI reflexivity” to “AI leverage,” pushing teams to think deeply about where AI delivers the most value in their workflows. Meanwhile, Shopify keeps building infrastructure before features. As he puts it: “We’ve always built more infra. We will continue to always build more infra.”