MassMutual CIO Sears Merritt caps AI vendor contracts to keep model “optionality”
A 12-month approach, a 30% productivity uptick, and a “trust score” tell enterprises how to avoid AI lock-in.

MassMutual CIO Sears Merritt says the insurer is running AI with capped, time-bound vendor relationships so it can swap models as the market shifts. He also points to measured outcomes: about 30% higher developer productivity and faster, cheaper AI-assisted contact center workflows.
MassMutual CIO Sears Merritt is betting that AI will keep changing faster than most enterprise procurement cycles. His strategy, outlined in an episode of VB's Beyond the Pilot podcast, starts from a blunt assessment of the core problem: “The world of AI today is extremely dynamic.” So MassMutual avoids long-term bets, structuring its AI vendor relationships so they do not trap the business when better models arrive.
The headline stake is simple and boardroom-friendly: MassMutual is measuring results now while keeping the right to change what sits underneath. The company reports a roughly 30% increase in developer productivity. In customer service, AI-powered contact center workflows reduced resolution times from 10 minutes to one and cut costs from dollars to cents. The message is not “AI works.” It is “AI works, and we can still pivot without starting over.”
This matters for enterprise leaders because AI introduces a new form of technical debt. Commit to a model stack for too long and you can end up paying for yesterday’s performance while competitors quietly adopt today’s improvements. Merritt’s answer is to cap vendor relationships so MassMutual keeps the option to adopt best-of-breed tools as they mature, at least until things “settle down and stabilize.” In plain English: do not sign away your future decision-making while the market is still shaking out.
The same “optionality” principle shapes how MassMutual weighs model choices, including open source. Merritt says his team is “100%” looking at open-source tools and expects them to play a big role in how MassMutual (and similar companies) use AI. The logic follows the same thread: enterprises will need frontier models and leading-edge capabilities for use cases that make no sense today but will become possible tomorrow. Meanwhile, they still need a workable near-term system that does not cost a fortune just to keep the lights on.
MassMutual then ties the strategy to measurement in a way that reduces the risk of “pilot purgatory.” Its AI efforts fall into two broad categories. The first is enablement: productivity tools such as Copilot and virtual assistants made available to all employees. The second is what Merritt calls “deepen and focus” initiatives, in which teams target a specific workflow or business process chosen for its strong impact on advisors, policyholders, or employees. Crucially, these projects do not start with vague goals or hope-based metrics. Merritt describes upfront success criteria: “Everything we do is measured,” and a success metric is defined before anyone decides whether a project scales.
This measurement mindset pairs with deliberate experimentation. Employees get access to a range of best-in-class models, token-intensive workflows, and other capabilities so they can weigh the benefits against simpler, lower-cost large language models (LLMs). At the same time, MassMutual collects granular analytics on usage patterns, developer workflows, model performance, and costs. The endgame is operational intelligence: reducing spend while building the ability to route each workload to the right model based on cost, response quality, and user experience. Merritt expects those analytics to inform later optimization decisions around model routing, prompt selection, response times, and infrastructure design.
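MassMutual has not published how its routing will work. As a purely illustrative sketch, assuming per-model analytics of the kind described above (a cost figure, a quality signal, and latency), a router could pick the cheapest model that clears each workload's quality and latency thresholds. The `ModelProfile` type, the `route` function, the model names, the workload names, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model analytics aggregated from usage logs."""
    name: str
    cost_per_1k_tokens: float  # USD, illustrative only
    quality: float             # 0..1, e.g. share of responses rated helpful
    p50_latency_s: float       # median response time in seconds


def route(models, min_quality, max_latency_s):
    """Pick the cheapest model that meets a workload's quality and latency bars.

    If no model qualifies, fall back to the highest-quality one.
    """
    eligible = [
        m for m in models
        if m.quality >= min_quality and m.p50_latency_s <= max_latency_s
    ]
    if eligible:
        return min(eligible, key=lambda m: m.cost_per_1k_tokens)
    return max(models, key=lambda m: m.quality)


# Hypothetical catalog; names and figures are invented for illustration.
catalog = [
    ModelProfile("small-fast", 0.0005, 0.72, 0.8),
    ModelProfile("mid-tier", 0.003, 0.85, 1.9),
    ModelProfile("frontier", 0.015, 0.93, 3.5),
]

# Hypothetical per-workload requirements: (min_quality, max_latency_s).
workloads = {
    "code-suggestion": (0.80, 2.5),
    "policy-question": (0.90, 5.0),
}

for workload, (q, lat) in workloads.items():
    print(workload, "->", route(catalog, q, lat).name)
```

With these invented numbers, the code-suggestion workload lands on `mid-tier` and the policy question on `frontier`. Because callers ask for requirements rather than a named model, refreshing the catalog as analytics change can swap the underlying model without touching the workflows that use it, which is the kind of optionality the article describes.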
One of the most revealing parts of the podcast is how MassMutual defines “good.” Instead of relying only on benchmarks or token cost, it uses a “trust score” framework that combines user feedback with operational metrics. The goal is to understand both how employees perceive AI responses and whether those responses actually improve outcomes. The contact center rebuild shows how that approach can overturn instinct. During development, employees were given access to two different LLMs. One generated responses in near real time but with less consistent quality. The other, a more expensive option, took a couple of seconds longer but consistently delivered higher-quality answers. Speed-first reasoning would predict that users choose the faster model.
Instead, users overwhelmingly chose quality. Merritt says the same feedback came up again and again: users wanted the more expensive model, were willing to wait, and found the quality difference so large that the two extra seconds were worth it. MassMutual factored that experience into its decision, concluded that on a relative basis the cost difference was immaterial, and deployed the more expensive model. This is the kind of evidence that matters to executives because it connects model selection to real work outcomes, not just lab performance.
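The article does not disclose how MassMutual computes its trust score. A minimal sketch, assuming a weighted blend of side-by-side user preference, an outcome metric such as resolution rate, and a latency penalty measured against a budget, shows how such a score can favor a slower model when users and outcomes both prefer it. The `Evaluation` type, the `trust_score` function, the weights, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical side-by-side results for one model in a pilot."""
    name: str
    preference_rate: float  # share of users who chose this model's answer (0..1)
    resolution_rate: float  # share of interactions resolved without escalation (0..1)
    latency_s: float        # average response time in seconds
    cost_per_call: float    # USD per interaction, illustrative


def trust_score(e, w_pref=0.5, w_outcome=0.4, w_latency=0.1, latency_budget_s=10.0):
    """Blend user feedback and operational metrics into a 0..100 score.

    The weights are hypothetical and sum to 1. Latency is scored against a
    budget, so a couple of extra seconds costs only a few points.
    """
    latency_term = max(0.0, 1.0 - e.latency_s / latency_budget_s)
    score = (w_pref * e.preference_rate
             + w_outcome * e.resolution_rate
             + w_latency * latency_term)
    return round(100 * score, 1)


# Invented numbers shaped like the contact-center comparison described above.
fast = Evaluation("fast-model", preference_rate=0.2, resolution_rate=0.70,
                  latency_s=1.0, cost_per_call=0.004)
careful = Evaluation("careful-model", preference_rate=0.8, resolution_rate=0.88,
                     latency_s=3.0, cost_per_call=0.02)

for e in (fast, careful):
    print(e.name, trust_score(e), f"${e.cost_per_call:.3f}/call")

winner = max((fast, careful), key=trust_score)
print("deploy:", winner.name)
```

Here the slower, pricier model scores higher (about 82 versus 47). Cost is deliberately kept out of the score and printed alongside it, so it can be judged separately, mirroring the article's point that MassMutual weighed the cost difference on a relative basis and found it immaterial.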
There is a second layer here that is easy to miss if you only look at the headline numbers. By designing capped relationships, running experiments, and building analytics-driven routing, MassMutual is effectively turning model choice into an operating capability. That can matter just as much as the initial productivity gains when leadership has to defend budgets, negotiate with vendors, or update governance as regulations and security expectations evolve. The direction is consistent: avoid lock-in, measure what matters, and make decisions you can explain. For other IT leaders and decision-makers, the lesson is that AI strategy is not only about getting to “best model.” It is about making sure you can keep improving without resetting the entire system every time the market moves.
