AI boosts PR merges 16.2%, but incidents-to-PR ratio jumps 242.7%: software factory reckoning

LLM-driven throughput is rising, yet delivery stability is sliding unless teams build a real factory with guardrails and traceability.

By Lama Al-Rashid, Technology Correspondent, The Executives Brief

2 days ago·4 min read

Executive summary

VentureBeat describes how AI and the “software factory” approach can increase task throughput and PR merge rate, citing Faros AI and Google’s DORA research. The consequence: executives may scale output faster while also scaling defects and instability unless quality is engineered into the system, not bolted on.

AI is making software teams faster, but the data VentureBeat highlights is a little brutal: Faros AI reports task throughput per developer up 33.7% and PR merge rate up 16.2%, while the incidents-to-PR ratio rises 242.7% and bugs per developer are up 54%. In other words, the factory-like shift is not automatically producing a healthier supply chain for code. It is producing more code, more quickly. And without a stronger production system, that speed turns into a reliability problem.
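To make the size of that gap concrete, here is a rough back-of-the-envelope sketch. It converts the reported percentage changes into multipliers and, as an illustration only, assumes the number of merged PRs grew by the same 16.2% as the merge rate. That assumption is not stated in the source, and the function names are hypothetical.

```python
def pct_to_factor(pct_change):
    """Convert a percentage change (e.g. 16.2) into a multiplier (1.162)."""
    return 1 + pct_change / 100


def implied_incident_factor(ratio_pct, pr_pct):
    # incidents = (incidents per PR) * (number of PRs), so growth factors multiply.
    # ASSUMPTION: pr_pct stands in for growth in merged PR volume.
    return pct_to_factor(ratio_pct) * pct_to_factor(pr_pct)


ratio_factor = pct_to_factor(242.7)
incident_factor = implied_incident_factor(ratio_pct=242.7, pr_pct=16.2)

print(f"incidents-per-PR multiplier: {ratio_factor:.3f}")
print(f"implied incident multiplier: {incident_factor:.2f}")
```

Under that assumption, a 242.7% rise means each merged PR carries roughly 3.4 times as many incidents as before, and total incidents would come out at close to four times the earlier level, against a 16.2% gain in merges.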

That mismatch is exactly why “software factory” thinking is moving from a buzzword to a management problem. The article argues that the standard software development lifecycle plus CI/CD practices “won’t hold up under that pressure.” The bottleneck is no longer purely “How fast can someone write this?” It becomes, “Should this be written?” and, more importantly, “Can we actually create end products that are durable and reliable, and not just build tech debt?” If your org treats AI-assisted shipping as a throughput game, you effectively manufacture more mistakes faster, even when smaller teams suddenly create codebases with tech-company scale in months.

So what changed? The article points to multiple forces hitting at the same time. LLMs have lowered the barrier to writing code, and that is the part everyone focuses on. Code creation is easier, even if it is not always cheaper or better, a point the piece links to high-profile companies fretting over their “high AI bills.” More importantly, one engineer can generate more code than they could a few years ago. When more code can be produced by fewer humans, the classic bottleneck shifts toward decision-making and governance. That includes standards, review quality, testing depth, and the ability to stop a bad change before it becomes an incident.

This is where the “software factory” concept gets real, and where it also gets easy to misunderstand. The article says the concept can mean everything from coding agents and skills files to faster CI/CD, better review systems, or more automation around delivery. But the stronger frame is to treat it less like a tool category and more like a set of principles. A software factory cannot be “a loose collection of prompts, agents, and plugins.” Like physical factories, it needs a platform that defines how work moves through the system, including how code is generated, reviewed, tested, traced, deployed, and improved when something goes wrong. Otherwise, you do not have a factory. You have a pile of one-off machines in an empty room.

The risks of treating software delivery like a speed-only factory are showing up in outcomes. Besides Faros AI’s incident and bug deltas, the article references Google’s DORA research, saying that more AI adoption was associated with worse delivery stability. It also describes two projects where AI-generated data infrastructure gradually morphed over time. Multiple engineers moving quickly and a lack of standards made the systems unruly. Codebases that previously evolved more slowly developed five to six different styles within months. Layer by layer, engineers stopped understanding what was happening, because the LLMs created mutations as those styles blended. The piece compares the pattern to what happened a decade ago with self-service tooling: early productivity gains that masked downstream complexity.

On the “what to build” side, the article lays out principles executives can translate into governance requirements.

First: platform over tools. Teams adding an agent at the edges are not building a factory; they are integrating isolated help. A platform unifies the foundation so tools share data and work as one system, with standards, processes, and the work itself connected.

Second: rerunability and traceability. A real platform must let you go back into any run, identify what went wrong, and rerun it. The article emphasizes tracing via a serial ID and the ability to understand each step that led to output. It even notes that state machines can fit AI workflows better than loops because they make reruns and step-level understanding easier.

Third: safety and guardrails, including pushing testing and quality control earlier in the process to catch bugs at the lowest possible stage and reduce the blast radius.

Fourth: standardization. Without it, layering assistants on top produces an amalgamation of styles.

Fifth: quality control. In older manufacturing models, QC happened at the end; defects were found and fixed later. Toyota-style production pushed quality into the process itself, including the idea that workers should stop the line when something is wrong. The article argues the same logic applies to software factory design, with QC baked into everything, including how the spec is written, static code analysis for obvious errors, and templates that tell LLMs the structure the code should follow.
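The traceability, state-machine, and stop-the-line ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design the article or any vendor describes: the step names, the `Run` record, and the stand-in `generate` function (which fakes an LLM call) are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    GENERATE = "generate"
    STATIC_CHECK = "static_check"
    UNIT_TEST = "unit_test"
    DONE = "done"
    HALTED = "halted"


@dataclass
class Run:
    run_id: str                     # serial ID that ties every step to one run
    spec: str
    state: Step = Step.GENERATE
    code: str = ""
    trace: list = field(default_factory=list)  # (step name, outcome) per step


def generate(run):
    # Hypothetical stand-in for an LLM call; a real system would log prompt and model.
    run.code = f"def handler():\n    return {run.spec!r}\n"
    return Step.STATIC_CHECK, "generated"


def static_check(run):
    # Cheap early gate: the generated text must at least compile as Python.
    try:
        compile(run.code, "<generated>", "exec")
    except SyntaxError as exc:
        return Step.HALTED, f"syntax error: {exc.msg}"
    return Step.UNIT_TEST, "ok"


def unit_test(run):
    namespace = {}
    exec(run.code, namespace)
    if namespace["handler"]() != run.spec:
        return Step.HALTED, "handler output does not match spec"
    return Step.DONE, "ok"


HANDLERS = {
    Step.GENERATE: generate,
    Step.STATIC_CHECK: static_check,
    Step.UNIT_TEST: unit_test,
}


def advance(run):
    """Drive the run until it finishes or a gate stops the line."""
    while run.state in HANDLERS:
        step = run.state
        run.state, outcome = HANDLERS[step](run)
        run.trace.append((step.value, outcome))
    return run


def rerun_from(run, step):
    """Rerun an existing run from a chosen step, keeping its run_id and trace."""
    run.state = step
    return advance(run)


def new_run(spec):
    return Run(run_id=uuid.uuid4().hex, spec=spec)


run = advance(new_run("hello"))
print(run.run_id, run.state.value, run.trace)
```

Because each step is an explicit state rather than an iteration of an open-ended loop, a halted run records where it stopped and why, and `rerun_from` can resume at any step under the same ID. The static check sits before the tests so that obvious errors are caught at the cheapest stage.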

Finally, the payoff executives care about: speed is not productivity if downstream issues are unmanaged. The article draws the line between shipping more and building durable outputs. A company is not productive because it produces “millions of cars” that all fail within 100 miles. It is not productive by generating endless proofs-of-concept that never reach production. Real productivity in this framework is turning ephemeral tokens into durable outcomes, measured by the fewest defects downstream, not the most lines of code. For boards and leaders, the second-order implication is straightforward: if you approve AI-driven throughput targets without insisting on platform-level traceability, guardrails, and standardized quality control, you may be optimizing the metric that looks good in the dashboard while quietly inflating incidents, instability, and engineering drag.

That is the software factory reckoning: LLMs can industrialize the act of writing code, but factories are only factories when quality management is engineered into the system. Otherwise, your organization is not scaling engineering. It is scaling risk.
