Enterprises aren’t failing models: 77% burn engineering time on runtime plumbing

VentureBeat’s Pulse Research says AI agents break in production because they are stateless, fragile, and hard to observe.

By Hessa Al-Faleh, Business Desk, The Executives Brief

Executive summary

VentureBeat’s Pulse Research has identified a “Governance Mirage” and, more importantly, a “runtime problem” for enterprise AI agents, the latter from a May 2026 survey. The upshot is that most teams are spending real engineering capacity on durability work, not on agent intelligence.

In Q1 2026, VentureBeat’s Pulse Research surfaced the “Governance Mirage”: the gap between the governance org charts enterprises drew and the control layers they actually built. Forty-three percent said a central team owned AI governance. Twenty-three percent couldn’t agree on who owned it at all. And 31% named vendor opacity as the biggest obstacle.

But the next question this research raises is the one that actually decides whether an AI program lives or dies: once you admit governance is messy, what breaks first when you try to fix it? The answer from respondents in the May 2026 wave is unambiguous. It’s not mainly the model. It’s the runtime. The headline number makes the case: 77% of respondents said meaningful engineering time each week goes to building and maintaining custom “plumbing,” such as manual retries, state persistence, and checkpointing, rather than agentic logic.

Here’s why that distinction matters. The enterprise agent conversation often gets stuck in the “spine vs. brain” debate. The brain is the model’s reasoning capability. The spine is the runtime infrastructure that manages state, survives failures, and coordinates execution. In this survey, respondents explicitly framed the issue as infrastructure durability and operational control, not just model smarts. Still, 17% said the brain is the primary failure mode, so the story is not one-sided: models are not yet reliable enough for the edge cases some workflows generate. But the dominant pattern is that stateless infrastructure cannot survive production reality for long-running, multi-step agentic processes.

These runtime failures are the kind that turn pilots into “why did this break on Tuesday?” tragedies. Container restarts can erase context. Token costs can overrun business cases. Hallucinations early in a workflow can compound into catastrophic failures by later steps. Often the most expensive part is the least glamorous: keeping an agent steady in the real world means engineering time spent on retries and on debugging “ghost failures” rather than on differentiated logic. According to the survey, the market is stuck in a dangerous middle: 77% are paying a DIY tax, and only 23% sit in camps where frameworks or managed approaches handle reliability well enough to avoid that overhead.
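To make the “plumbing” concrete, here is a minimal Python sketch of the kind of checkpointing teams end up hand-writing. It assumes a workflow modeled as an ordered list of step functions and a local JSON file as the store; the names `run_with_checkpoints`, `Step`, and `checkpoint_path` are hypothetical and do not come from the survey or any particular framework, and a production system would more likely persist to a database or a durable-execution service. After every completed step the sketch records its progress, so a container restart resumes at the next unfinished step instead of losing context.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

# Hypothetical step shape: receives the accumulated state, returns new keys.
Step = Callable[[Dict], Dict]


def run_with_checkpoints(steps: List[Step], checkpoint_path: str) -> Dict:
    """Run steps in order, saving progress after each so a restart can resume."""
    # Reload progress if a checkpoint survived an earlier crash or restart.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
    else:
        saved = {"next_step": 0, "state": {}}

    state = saved["state"]
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    for i in range(saved["next_step"], len(steps)):
        state.update(steps[i](state))
        # Write to a temp file, then atomically swap it in, so a crash
        # mid-write never leaves a half-written checkpoint.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"next_step": i + 1, "state": state}, f)
        os.replace(tmp_path, checkpoint_path)
    return state
```

If the process dies during the third step, rerunning the same call reloads the checkpoint and skips the first two. A step interrupted mid-run executes again on resume, so steps that touch external systems should be safe to repeat, and the state must stay JSON-serializable for this particular sketch.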

That “two camps plus a middle” picture has second-order consequences for budgets and roadmaps. For teams in the Crisis Zone, every engineering hour spent writing retry logic or chasing a silent API timeout is an hour not spent building the agent intelligence that was supposed to justify the investment. Efficiency Zone teams, meanwhile, may be using managed platforms that abstract away some durability concerns, or they may simply not have hit the scale at which stateless architectures start failing. The survey describes the Complexity Trap as the point where the “efficiency” story ends. In other words, you can get something working at small scale, then run into production-grade state management, cost predictability, and failure recovery problems the moment the workload becomes real.
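The retry logic in question often resembles this minimal Python sketch, which assumes transient failures surface as `TimeoutError` or `ConnectionError`. The function name `call_with_retries` and its defaults (four attempts, a 0.5-second base delay) are illustrative assumptions, not recommendations from the survey. It retries with exponential backoff plus random jitter and re-raises the last error once attempts run out, so a timeout is surfaced rather than swallowed.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error instead of hiding it.
            # Delay doubles per attempt; jitter keeps many clients from retrying in lockstep.
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Errors outside `retry_on` propagate immediately, which keeps genuine bugs from being retried into silence. Injecting `sleep` as a parameter lets tests run without real waiting; in practice the wrapped callable would be a model or tool request.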

Zoom out one layer and governance becomes part of the same operational story, not a separate checkbox. VentureBeat’s Pulse Research had already pointed to vendor opacity as a top governance obstacle. It surfaced again when respondents were asked about observability costs, specifically which vendor ecosystem demands the most custom telemetry, manual instrumentation, and logging glue to see inside agentic failures. Microsoft landed at the top of that ranking, and VentureBeat frames the result as structural, not noise. The implication is sharp for teams evaluating orchestration architecture: if observability requires heavy custom work in a specific vendor ecosystem, you are effectively paying rent for visibility you do not fully control.
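As a rough illustration of the “logging glue” respondents describe, this Python sketch wraps each agent step in a decorator that emits one JSON record per call, with a step name, a random span ID, a status, and a duration, including when the step raises. It uses only the standard `logging` module; `traced_step` and the record fields are hypothetical, and a real deployment would more likely export spans through a tracing standard such as OpenTelemetry than write plain log lines.

```python
import functools
import json
import logging
import time
import uuid
from typing import Any, Callable

logger = logging.getLogger("agent.telemetry")


def traced_step(step_name: str) -> Callable:
    """Decorator: emit one structured JSON record per agent step, success or failure."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record = {"step": step_name, "span_id": uuid.uuid4().hex}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                record["status"] = "ok"
                return result
            except Exception as err:
                # Record the failure before re-raising so it is never silent.
                record["status"] = "error"
                record["error"] = repr(err)
                raise
            finally:
                # Runs on both paths, so every call leaves a trace.
                record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
                logger.info(json.dumps(record))
        return wrapper
    return decorator
```

The design choice is to log in a `finally` block: a step that fails, or one that succeeds suspiciously fast or slow, still leaves a record, which is exactly the visibility that “ghost failures” deny.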

The survey also identifies a shift in the technical bottleneck: the number one obstacle when AI agents fail to reach production or scale has changed. Cost and hallucination now rank ahead of state failures, with 24% of respondents citing hallucination propagation, errors that compound silently over multiple steps. Ghost failures were cited by 20%, which matters because problems that are “invisible by definition” tend to be underestimated. The survey doesn’t treat this as a pure model failure, either. Instead, it paints a production reality in which state management and failure observability are the hinge points.
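A back-of-the-envelope calculation shows why early hallucinations matter so much. Assume, purely for illustration, that each step in a workflow is correct with the same independent probability p and that any single error spoils the final result. The chance of a clean run over n steps is then p raised to the power n. The function name `chain_success_rate` and the 95% per-step figure are hypothetical inputs, not numbers from the survey.

```python
def chain_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that all steps are correct, assuming independent per-step errors."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# Hypothetical 95% per-step accuracy across workflows of increasing length.
for n in (1, 5, 10, 20):
    print(f"{n:>2} steps: {chain_success_rate(0.95, n):.1%} clean runs")
```

Under these assumptions, a 10-step chain comes out clean only about 60% of the time and a 20-step chain about 36% of the time. Real errors are not independent, so this is a simplification, but it shows why long-running, multi-step processes are where failures pile up.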

The findings come from a survey VentureBeat conducted in May 2026 as part of its Pulse Research series on agentic AI adoption in the enterprise. Respondents were filtered to organizations with 100 or more employees, producing a final qualified sample of 132 verified technology leaders. Roles included Directors of AI/Analytics, Directors of Engineering/IT, VPs across Data and Engineering, CIOs/CTOs/CISOs, Product and Program Managers, consultants, software and ML engineers, enterprise architects, and others. Company sizes skewed toward mid-size and large enterprises: 48% had 500 to 9,999 employees and 35% had 10,000 or more.

For executives, founders, and investors tracking enterprise AI readiness, then, the message is less about which model wins and more about which teams build a runtime that can actually hold the workload together. The strategic stakes are straightforward: the organizations that survive the “Agentic Reckoning” are the ones that treat runtime durability as first-class engineering, not an afterthought patched with retries and prompting. Those that do not risk repeating the RPA graveyard pattern, in which impressive prototypes fail the moment Day Two arrives.
