MRAgent trims long-horizon prompts to 118K tokens, versus 3.26M in LangMem

Researchers at NUS replace passive “retrieve-then-reason” with active memory reconstruction that cuts cost and runtime.

By Maha Al-Juhani, Entertainment Correspondent, The Executives Brief

about 5 hours ago·5 min read


Executive summary

NUS researchers introduced MRAgent (Memory Reasoning Architecture for LLM Agents), a framework that dynamically rebuilds an agent’s memory during reasoning instead of relying on static retrieval. For decision-makers, it slashes prompt tokens to 118K per sample in LongMemEval, down from 3.26M in LangMem, and reduces runtime versus A-Mem.

If you have ever watched an AI agent “think” and then quietly drown in its own context window, this new framework is the antidote. On LongMemEval, MRAgent used just 118K prompt tokens per sample. In the same test setup, LangMem burned through 3.26M tokens per sample. That is not a minor optimization. It is a different economic model for long-horizon tasks.

The reason is baked into how MRAgent works. Researchers at the National University of Singapore built it to abandon the static retrieve-then-reason approach. Instead of fetching a pile of documents up front and hoping the model can extract signal from noise, it performs multi-step memory reconstruction inside the LLM’s reasoning process. In the reported results, that active approach also cuts runtime compared to A-Mem, from 1,122 seconds to 586 seconds.

Here’s the core problem MRAgent is attacking. Classic retrieval pipelines typically retrieve documents using vector search or graph traversal, then pass the results to an LLM for reasoning. That “passive read-out” pattern creates three bottlenecks. First, the agent cannot revise its retrieval strategy mid-reasoning. If it fetches something and realizes a crucial cue is missing, it has no clean way to issue a new query based on that new realization. Second, fixed similarity scores and predefined graph expansions can flood the LLM’s context window with irrelevant noise, which degrades reasoning. Third, systems often rely on pre-constructed structures like top-k results and static relevance functions, which limits flexibility when interactions become unpredictable over dozens of sessions and hundreds of turns.

MRAgent’s bet is that long-horizon agents need memory access that behaves less like a search engine tab and more like an adaptive exploration. The researchers argue developers should shift toward an “active and associative reconstruction process,” inspired by cognitive neuroscience. In this paradigm, memory recall unfolds sequentially rather than operating as a passive read-out of a static database. The system starts with small, specific triggers from the user prompt, such as a person’s name, an action, or a place. Those initial hints point to connecting concepts or categories, not a massive dump of text. It then gathers evidence piece by piece. Each new piece updates what the agent should look for next, until it can assemble an accurate story.

Technically, MRAgent treats memory as an interactive environment, not a static store. When processing a complex query, the agent uses the backbone LLM’s reasoning ability to explore multiple candidate retrieval paths across a structured memory graph. At each step, it evaluates intermediate evidence it has gathered, uses that evidence to iteratively optimize search, infers new search constraints, pursues the paths with the best information, and prunes irrelevant branches. The claimed payoff is that it can piece together deeply buried information without filling the LLM’s context with noise.
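The explore, evaluate, and prune loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy `MEMORY` list, the `reconstruct` function, and the two predicates `is_relevant` and `is_enough` are all hypothetical, and the predicates stand in for judgments that MRAgent delegates to the backbone LLM.

```python
# Toy memory (invented for illustration): each entry lists the cues that
# reach it and the new cues it reveals once it has been read.
MEMORY = [
    {"text": "Nate entered a video game tournament.",
     "cues": {"nate"}, "reveals": {"tournament"}},
    {"text": "Nate won the tournament and received prize money.",
     "cues": {"tournament"}, "reveals": {"prize money"}},
    {"text": "Nate saved the prize money.",
     "cues": {"prize money"}, "reveals": set()},
    {"text": "Nate adopted a dog.",
     "cues": {"nate"}, "reveals": {"dog"}},
]


def reconstruct(start_cues, is_relevant, is_enough, max_steps=5):
    """Gather evidence step by step, letting each find update the search."""
    cues, evidence, seen = set(start_cues), [], set()
    for _ in range(max_steps):
        # Retrieve unseen memories reachable from the current cues.
        hits = [i for i, m in enumerate(MEMORY)
                if i not in seen and m["cues"] & cues]
        if not hits:
            break
        for i in hits:
            seen.add(i)
            if not is_relevant(MEMORY[i]["text"]):
                continue  # prune: do not follow this branch's cues
            evidence.append(MEMORY[i]["text"])
            cues |= MEMORY[i]["reveals"]  # evidence updates what to look for next
        if is_enough(evidence):
            break  # stop once the gathered evidence answers the query
    return evidence


# Hypothetical stand-ins for the LLM's relevance and sufficiency judgments.
evidence = reconstruct(
    {"nate"},
    is_relevant=lambda t: "tournament" in t or "prize" in t,
    is_enough=lambda ev: any("saved" in t for t in ev),
)
print(evidence)
```

In this toy run the “dog” branch is pruned at the first step, so its cue never enters the search, and the loop exits as soon as the evidence contains the answer rather than exhausting `max_steps`.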

To make this exploration efficient, MRAgent organizes the database using a “Cue-Tag-Content” mechanism, described as a multi-layer associative graph with three node types. Cues are fine-grained keywords extracted from user interactions, like entities or contextual attributes. Content is the stored memory itself, divided into multi-granular layers such as episodic memory for concrete events and semantic memory for stable facts and user preferences. Tags are semantic bridges that summarize relational associations between specific Cues and Content.
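One way to picture the three node types is as a small data structure. The sketch below is an assumption-laden illustration, not the schema from the paper or the released code: the class names, field names, and the `add_memory` and `tags_for` helpers are hypothetical, and the sample memories are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """A stored memory; `layer` is 'episodic' or 'semantic'."""
    text: str
    layer: str


@dataclass
class Tag:
    """A short summary bridging cues to the content it describes."""
    summary: str
    content_ids: list = field(default_factory=list)


@dataclass
class MemoryGraph:
    contents: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    # cue keyword -> indices of the tags linked to that cue
    cue_index: dict = field(default_factory=dict)

    def add_memory(self, text, layer, tag_summary, cues):
        """Store one memory, attach it to a tag, and link the tag to its cues."""
        self.contents.append(Content(text, layer))
        content_id = len(self.contents) - 1
        # Reuse an existing tag with the same summary, else create one.
        for tag_id, tag in enumerate(self.tags):
            if tag.summary == tag_summary:
                break
        else:
            self.tags.append(Tag(tag_summary))
            tag_id = len(self.tags) - 1
        self.tags[tag_id].content_ids.append(content_id)
        for cue in cues:
            self.cue_index.setdefault(cue.lower(), set()).add(tag_id)
        return content_id

    def tags_for(self, cue):
        """Return the tags reachable from a cue, in insertion order."""
        return [self.tags[i] for i in sorted(self.cue_index.get(cue.lower(), ()))]


graph = MemoryGraph()
graph.add_memory("Nate won his third video game tournament.", "episodic",
                 "Tournament Victory", ["Nate", "video game tournament", "win"])
graph.add_memory("Nate signed up for a local tournament.", "episodic",
                 "Tournament Participation", ["Nate", "video game tournament"])
print([t.summary for t in graph.tags_for("Nate")])
```

The point of the layout is that a lookup from a cue returns only short tag summaries, so the heavier `Content` text stays out of the prompt until a tag has been chosen.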

This structure enables a two-stage retrieval process. First, the LLM navigates from Cues to candidate Tags. Because Tags expose semantic relationships and structural associations in compact form, the agent can judge relevance using short summaries. Then it identifies promising traversal paths and discards irrelevant branches before spending compute and prompt tokens on the heavier, detailed Content retrieval. The paper’s example makes the loop feel tangible: for a query like “How did Nate use the prize money when he won his third video game tournament?” the agent extracts starting cues such as “Nate,” “video game tournament,” and “win.” It then maps those cues to associative Tags and drops a tag like “Tournament Participation” to pursue a “Tournament Victory” path. It retrieves three episodic memories where Nate won a tournament, selects one as relevant, discards the other two, updates its cues, and repeats until it can answer something like “Nate saved the money.” The key idea for operators is that the system stops exploring when it knows it has enough, rather than blindly collecting redundant context.
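The two-stage idea, judging cheap tag summaries first and reading full content only under the survivors, can be sketched as follows. Everything here is hypothetical: the toy tables, the `two_stage_retrieve` function, and especially `judge`, a crude word-overlap score that stands in for the relevance judgment MRAgent obtains from the LLM.

```python
# Toy data, invented for illustration.
CUE_TO_TAGS = {
    "nate": ["Tournament Victory", "Tournament Participation"],
    "win": ["Tournament Victory"],
}
TAG_TO_CONTENT = {
    "Tournament Victory": [
        "Nate won his first tournament.",
        "Nate won his second tournament.",
        "Nate won his third tournament and saved the prize money.",
    ],
    "Tournament Participation": ["Nate signed up for a local tournament."],
}


def judge(text, query_terms):
    """Stand-in for the LLM's relevance judgment: count shared words."""
    words = set(text.lower().replace(".", "").split())
    return len(words & query_terms)


def two_stage_retrieve(cues, query_terms, keep_tags=1, keep_content=1):
    # Stage 1: score only the compact tag summaries and prune the rest.
    candidates = {t for c in cues for t in CUE_TO_TAGS.get(c, [])}
    ranked_tags = sorted(candidates, key=lambda t: (-judge(t, query_terms), t))
    kept = ranked_tags[:keep_tags]
    # Stage 2: read full content only under the surviving tags.
    contents = [m for t in kept for m in TAG_TO_CONTENT[t]]
    ranked = sorted(contents, key=lambda m: -judge(m, query_terms))
    return kept, ranked[:keep_content]


terms = {"third", "tournament", "prize", "money", "won", "victory"}
kept_tags, memories = two_stage_retrieve(["nate", "win"], terms)
print(kept_tags, memories)
```

With these toy inputs, “Tournament Participation” is dropped at stage 1, so its content is never read, which is the token-saving behavior the paragraph above describes.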

On benchmarks, MRAgent is reported to outperform multiple baselines across models and question types. The system was tested on the LoCoMo and LongMemEval industry benchmarks, which evaluate how agents resolve queries over long-horizon tasks and conversations spanning dozens of sessions and hundreds of dialogue turns. Backbone models included Gemini 2.5 Flash and Claude Sonnet 4.5. MRAgent was compared against standard RAG, A-Mem, MemoryOS, LangMem, and Mem0, and it “consistently outperformed every baseline across both models and all question types by a significant margin.” The most enterprise-relevant metric in the results is computational cost: on LongMemEval, MRAgent reduced prompt token consumption to 118K per sample, versus 632K for A-Mem and 3.26M for LangMem. Runtime is also reported to improve: MRAgent nearly halved runtime versus A-Mem, going from 1,122 seconds to 586 seconds.
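For readers who want the ratios behind those figures, the arithmetic is short. The snippet below only restates the numbers reported above; it assumes “K” and “M” mean thousands and millions and that the published figures are rounded, and the helper names are illustrative.

```python
# Reported LongMemEval figures (prompt tokens per sample, runtime in seconds).
TOKENS = {"MRAgent": 118_000, "A-Mem": 632_000, "LangMem": 3_260_000}
RUNTIME_S = {"MRAgent": 586, "A-Mem": 1_122}


def reduction_factor(baseline, new):
    """How many times smaller `new` is than `baseline`."""
    return baseline / new


def percent_saved(baseline, new):
    """Share of the baseline that was eliminated, in percent."""
    return 100 * (1 - new / baseline)


vs_langmem = reduction_factor(TOKENS["LangMem"], TOKENS["MRAgent"])
vs_amem = reduction_factor(TOKENS["A-Mem"], TOKENS["MRAgent"])
runtime_cut = percent_saved(RUNTIME_S["A-Mem"], RUNTIME_S["MRAgent"])

print(f"Tokens vs LangMem: {vs_langmem:.1f}x fewer")   # 27.6x
print(f"Tokens vs A-Mem:   {vs_amem:.1f}x fewer")      # 5.4x
print(f"Runtime vs A-Mem:  {runtime_cut:.1f}% lower")  # 47.8%
```

So the token gap is roughly 28x against LangMem but closer to 5x against A-Mem, and the runtime cut against A-Mem is just under half.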

Finally, there is the implementation detail that matters when you are trying to ship this: the Cue-Tag-Content structure must be prepared before querying. Developers need to architect the underlying memory database so the LLM can efficiently navigate associative items and prune irrelevant paths without compute blowups. The good news, per the authors, is that you do not have to manually label or structure the data. They designed an automated distillation pipeline that uses LLMs to process raw interaction histories and populate the memory graph. The developer’s job becomes implementing and orchestrating an ingestion pipeline that passes raw user interactions through prompt templates to extract metadata before storing it in the graph database. The authors emphasize that this construction phase is lightweight and keeps ingestion simple, and they have released the code on GitHub.
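In outline, such an ingestion step might look like the sketch below. It is not taken from the released code: the prompt wording, the JSON fields, the dictionary-based graph, and the `fake_llm` stub are all assumptions made for illustration, and a real deployment would swap the stub for an actual model call and a graph database.

```python
import json

# Hypothetical prompt template; the authors' actual templates may differ.
PROMPT_TEMPLATE = (
    "Extract memory metadata from the interaction below.\n"
    'Reply with JSON: {{"cues": [...], "tag": "...", "layer": "episodic|semantic"}}\n'
    "Interaction: {interaction}"
)


def ingest(interactions, llm, graph):
    """Run each raw interaction through the template and store the result."""
    for text in interactions:
        reply = llm(PROMPT_TEMPLATE.format(interaction=text))
        meta = json.loads(reply)
        content_id = len(graph["content"])
        graph["content"].append({"text": text, "layer": meta["layer"]})
        # Link the tag to the new content, then each cue to the tag.
        graph["tags"].setdefault(meta["tag"], []).append(content_id)
        for cue in meta["cues"]:
            graph["cues"].setdefault(cue.lower(), set()).add(meta["tag"])
    return graph


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call; returns canned JSON."""
    return json.dumps(
        {"cues": ["Nate", "tournament"], "tag": "Tournament Victory", "layer": "episodic"}
    )


graph = ingest(
    ["Nate won his third tournament."],
    fake_llm,
    {"content": [], "tags": {}, "cues": {}},
)
print(graph["tags"])
```

Passing the model in as a plain callable keeps the pipeline testable offline, since the extraction step can be replaced by a stub like `fake_llm` without touching the storage logic.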

Second-order implication: if long-horizon memory cost drops by an order of magnitude, the “limits” that forced product teams to shorten conversations, reduce retrieval depth, or cap agent memory become less binding. That shifts board-level discussions from “can we afford long context” toward “how do we responsibly build memory, and how do we know when the agent has enough evidence to stop.” In a world where compute costs and latency directly affect margin, MRAgent’s active reconstruction could be the difference between agents that feel helpful and agents that become expensive background noise.


