Nace’s 90/10 agent split relies on hypernetwork adapters generated on demand, not context or retraining
The autonomy pitch finally gets a mechanism: generate task models from policy at inference time and keep humans only for validation.

Nace.AI, a Palo Alto company that raised a $21.5 million seed round in May, uses a hypernetwork generator called a MetaModel to produce parameter adaptations from a company’s policies at inference time. For enterprise agent teams, this shifts the autonomy bottleneck from “more context” to “smaller, calibrated, grounded specialist models,” changing what decision-makers must demand for production.
Enterprise agent demos usually fail the same way: the assistant runs for a while, then hits a wall and needs a human to top up its context and check its output. The promised efficiency drains into supervision. On the other side of that wall sits the ratio teams want to believe in: an agent that runs the bulk of a workflow while humans validate only the last 10%. Nace markets exactly that split under the 90/10 label, with its agents handling the bulk of the workflow and human experts validating the result.
What matters for decision-makers is not the marketing shorthand. It’s the mechanism Nace claims to use to avoid the two familiar failure modes that keep agents from scaling: fine-tuning’s forgetting and retrieval’s “context rot.” Fine-tuning bakes business knowledge into the model’s weights, but it remains subject to catastrophic forgetting, where training on new material degrades what the model learned before, a problem identified in the 1980s and still unresolved in 2026. Teams work around it by isolating each task in its own fine-tuned model or adapter, which produces a sprawling estate of models and raises cost and governance overhead. And a fine-tuned model is a snapshot: it goes stale the day a policy changes, and the slow, expensive retraining cycle starts over.
Retrieval-augmented generation (RAG) takes the other route: it skips retraining by putting relevant policies into the prompt at run time. This is where context rot bites. Retrieval narrows what goes into the prompt, but an answer built on a retrieval miss looks just as confident as one built on the right policy, and both cost and latency climb with every token added. In other words, whether the system used last quarter’s policy (fine-tuning) or lost a detail in the middle of a long prompt (in-context learning), the output can look equally assured. The result is predictable: the human never gets to leave. Some teams run both at once, fine-tuning stable knowledge and retrieving the rest. That softens each failure but removes neither, so teams keep checking output because they still cannot be sure the model is both current and working from the right context.
A third path is moving from research into early product: generate the specialist model on demand. Instead of retraining one model or stuffing a giant prompt, a generator builds a small, task-specific model from your policies at inference time. The generator is a hypernetwork, a network whose output is the weights of another network. The idea was named in 2016, and applying it to produce specialist language models from text or documents is a recent, active research area. Sakana AI’s Text-to-LoRA, presented at ICML 2025, generates a model adapter from a plain-language description in a single forward pass. A 2026 system called SHINE describes hypernetwork adaptation as a promising new frontier, precisely because it sidesteps both the retraining cost of fine-tuning and the context limits of prompting.
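The core idea fits in a few lines: a hypernetwork is a function whose output is another network’s weights. The sketch below is a toy, not Text-to-LoRA’s or Nace’s architecture. A single hypothetical linear layer maps a task embedding (assumed to come from encoding a policy document) to the two low-rank factors of a LoRA adapter. The sizes, names, and random initialization are illustrative assumptions; a real hypernetwork would be trained.

```python
import random

D_IN, D_OUT, RANK, EMB = 4, 3, 2, 5  # toy sizes (assumptions)

def make_hypernetwork(seed: int = 0) -> list[list[float]]:
    """One linear layer: EMB inputs -> enough outputs to fill A (RANK x D_IN) and B (D_OUT x RANK)."""
    rng = random.Random(seed)
    n_out = RANK * D_IN + D_OUT * RANK
    return [[rng.gauss(0.0, 0.1) for _ in range(EMB)] for _ in range(n_out)]

def generate_lora(hyper: list[list[float]], task_emb: list[float]):
    """Single forward pass: task embedding in, LoRA factors A and B out."""
    flat = [sum(w * x for w, x in zip(row, task_emb)) for row in hyper]
    a_flat, b_flat = flat[: RANK * D_IN], flat[RANK * D_IN :]
    # Reshape the flat output vector into the two adapter matrices.
    A = [a_flat[i * D_IN : (i + 1) * D_IN] for i in range(RANK)]
    B = [b_flat[i * RANK : (i + 1) * RANK] for i in range(D_OUT)]
    return A, B
```

Two different task embeddings produce two different adapters from the same hypernetwork, and nothing has to be trained or stored per task.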
Nace’s commercial wedge is to generate adapters rather than train and store them. The point of generating task-specific adapters is to collapse a sprawling library of per-task LoRA (low-rank adaptation) adapters into one network that can produce them on demand, including for tasks it has not seen. The elegant part is that this targets the same pain fine-tuning and RAG each try to solve separately: the per-task adapters teams hand-build to dodge catastrophic forgetting are the same object a hypernetwork produces automatically. In plain terms for enterprise governance teams, the “model zoo” can stop being a governance headache if it becomes an on-demand generated output rather than a library of separately trained artifacts.
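Whether trained by hand or generated, a LoRA adapter is the same kind of object: a low-rank update added to a frozen base weight matrix, computed as y = W x + (alpha / r) B A x. The sketch below applies one to hypothetical toy matrices. The base matrix W is never modified, so switching tasks means switching the small A and B factors rather than retraining or storing a full model.

```python
def matvec(M: list[list[float]], x: list[float]) -> list[float]:
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha: float = 1.0) -> list[float]:
    """Frozen base output plus scaled low-rank update: W x + (alpha / r) * B (A x)."""
    r = len(A)  # LoRA rank = number of rows in A
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

# Hypothetical frozen base layer (2 outputs, 3 inputs) and one rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A_refunds = [[1.0, 1.0, 1.0]]  # 1 x 3
B_refunds = [[0.5], [0.0]]     # 2 x 1
```

For input `[1.0, 2.0, 3.0]` the base layer alone gives `[1.0, 2.0]`, and with the adapter attached the output becomes `[4.0, 2.0]`; `W` itself is unchanged either way.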
This matters strategically now because agent workflows are not just “smarter models”; they are orchestration. Routing, durable execution, and observability all assume each agent is already competent enough to coordinate in the first place. The deeper question is how long an agent can run before a human has to step in, and that comes down to where company knowledge lives relative to the model. A hypernetwork-built model tries to raise that autonomy ceiling by shrinking the surface area where things can go wrong. The argument runs like this: a narrow, current, small model makes fewer errors, and the errors it does make are confined to a known domain, so fewer outputs need to be escalated to a person. In this framing, a number like 90/10 is not a dial set in advance but an outcome of how little the system needs to hand back.
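The claim that 90/10 is an outcome rather than a setting can be made concrete. In the sketch below, each output carries a confidence score, assumed to be calibrated (a point the article returns to below); anything under a hypothetical threshold goes to a person, and the automation share is simply measured afterward. The scores and the 0.8 threshold are invented for illustration, not Nace figures.

```python
from dataclasses import dataclass

@dataclass
class Output:
    task_id: str
    confidence: float  # model's own estimate that the output is correct (assumed calibrated)

def route(outputs: list[Output], threshold: float = 0.8) -> tuple[list[Output], list[Output]]:
    """Split outputs into auto-approved and escalated-to-human lists."""
    auto = [o for o in outputs if o.confidence >= threshold]
    escalated = [o for o in outputs if o.confidence < threshold]
    return auto, escalated

def automation_share(outputs: list[Output], threshold: float = 0.8) -> float:
    """Fraction of outputs that never reach a human: measured, not preset."""
    auto, _ = route(outputs, threshold)
    return len(auto) / len(outputs) if outputs else 0.0
```

With nine confident outputs and one uncertain one, the share comes out at 0.9. A model that is more often unsure pushes that number down without anyone touching the threshold, which is why the ratio only means something if the confidence scores are trustworthy.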
Two design choices decide whether that autonomy is trustworthy or merely fast. The first is grounding: tying every output to its source so a reviewer can verify the work rather than redo it. Research models built for this, such as HalluGuard, label each claim as supported or unsupported and cite the passage they relied on. Nace ships its agents with grounding models and reasoning traces for the same reason, since a 10% review only means something if the human can confirm provenance in seconds. The second is the feedback loop: when experts validate the output, whose model improves, and where does it live? Ownership of the improving asset is where enterprise risk becomes real, and arrangements differ. Nace, for instance, uses an external network of certified experts for some engagements; in direct enterprise deployments, it uses the customer’s own staff and keeps the resulting model inside the customer’s cloud. Each arrangement routes learning to a different place, which matters for both compliance and operational control.
Still, the third path is early, and a few open questions will decide how far it goes. Calibration is the linchpin, because the value depends on the model knowing when it is unsure. The source notes that recent work found generated adapters do not automatically improve calibration over ordinary fine-tuning; gains appeared only under specific constraints. A generated model is also only as good as the policy data it is built from, which puts a premium on data curation. Scale remains an open research frontier, since the hypernetworks shown in published work so far have been small. For boards and executive teams, this translates into a simple diligence checklist, because autonomy metrics alone are not enough. Ask how grounding works, how calibration is measured under real policy churn, what happens on retrieval misses and long-horizon runs, and exactly where the learning and model updates reside when humans validate.
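Calibration can be measured rather than asserted. One standard metric is expected calibration error (ECE): group predictions into confidence bins, then average the gap between each bin’s mean confidence and its actual accuracy, weighted by how many predictions fall in the bin. The sketch assumes equal-width bins and made-up data; re-running it on fresh labeled samples after each policy change is one way a team might check calibration under churn.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """ECE with equal-width bins: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # Clamp so a confidence of exactly 1.0 lands in the top bin instead of overflowing.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    n = len(confidences)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece
```

A model that reports 90% confidence and is right nine times in ten scores near zero; the same confidence with only half the answers correct scores about 0.4, which is exactly the overconfidence that would let a wrong answer slip past a 10% review.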
The strategic stakes are blunt. If you are the person responsible for turning an agent pilot into a production system, the bottleneck is not that agents are too dumb. It is that the architecture keeps hiding which part of your business knowledge is missing, stale, or misunderstood. Nace’s approach tries to make that failure mode structurally rarer by moving from prompts and retraining cycles to hypernetwork-generated specialist adapters anchored to policy, with grounding and a feedback loop designed for ownership. Whether it delivers consistently will come down to calibration and the quality of the policy inputs. But the direction is clear: the autonomy ceiling is an engineering problem, not just an orchestration ambition.