Liquid AI’s 230M-parameter LFM2.5-230M beats 4x larger models at data extraction
A small LLM built for on-device agent workflows targets AI ETL and edge deployment without massive memory overhead.

Liquid AI, founded by former MIT computer scientists, just released its smallest AI language model yet, LFM2.5-230M, designed for on-device agentic workflows. For enterprises, the consequence is clear: lower-cost, lower-latency “AI ETL” and tool-calling can move from cloud APIs to smartphones, laptops, and robotics.
The model packs 230 million parameters yet outperforms models roughly 3.5x to 4x its size on selected benchmarks. Specifically, Liquid says it does better at data extraction than Alibaba's 800-million-parameter Qwen3.5-0.8B (Instruct) and Google's 1-billion-parameter Gemma 3 1B.
That matters because LFM2.5-230M is not positioned as a general “frontier” model. Liquid explicitly designed it for on-device agentic workflows where small size makes it possible to run nearly “anywhere,” including smartphones, laptops, and robotics. In other words, this is a model built to be useful under constraints: limited memory, limited compute, and the need to run locally rather than relying on constant cloud calls.
So what is Liquid selling, beyond the headline parameter count? The model targets developers and engineers building lightweight data extraction pipelines and autonomous edge systems. Liquid argues that the traditional approach to moving enterprise data is brittle: organizations have often depended on rigid, rule-based Extract, Transform, Load (ETL) scripts, and a document layout change or schema update can break the whole pipeline. The industry is shifting toward “AI ETL,” where machine learning infers mappings, detects schema drift, and adapts automatically.
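To make "schema drift" concrete, here is a minimal sketch of the kind of check an AI ETL pipeline might run on each extracted record. The invoice schema, the field names, and the `detect_schema_drift` helper are all hypothetical illustrations, not anything Liquid has published.

```python
from typing import Any

# Hypothetical expected schema for an extracted invoice: field name -> Python type.
EXPECTED_SCHEMA: dict[str, type] = {
    "invoice_id": str,
    "total": float,
    "currency": str,
}


def detect_schema_drift(
    record: dict[str, Any],
    schema: dict[str, type] = EXPECTED_SCHEMA,
) -> dict[str, list[str]]:
    """Compare one extracted record against the expected schema.

    Returns the fields that are missing, the fields that were not expected,
    and the fields whose value has the wrong type.
    """
    missing = sorted(name for name in schema if name not in record)
    unexpected = sorted(name for name in record if name not in schema)
    wrong_type = sorted(
        name
        for name, expected in schema.items()
        if name in record and not isinstance(record[name], expected)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


# A record in which "currency" disappeared, "vendor" appeared,
# and "total" arrived as a string instead of a number.
drifted = {"invoice_id": "A-1", "total": "12.50", "vendor": "Acme"}
report = detect_schema_drift(drifted)
print(report)
```

A rule-based script would typically fail outright on a record like this one; a drift report gives the pipeline something to act on, such as re-prompting the model or flagging the document for review.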
In that setup, a smaller language model has a practical edge: it can translate unstructured inputs, like PDFs, emails, and web forms, into structured outputs such as JSON without hardcoded rules. Liquid also frames the economics: running a massive flagship model on routine extraction and formatting tasks is economically unviable. The example given is Claude Opus 4.6 at $5.00 per million input tokens, which underscores why enterprises need an alternative for repetitive, extraction-heavy workflows.
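The pricing argument is easy to check with back-of-the-envelope arithmetic. The sketch below uses the $5.00 per million input tokens figure quoted above; the workload of 10,000 documents a day at 2,000 tokens each is an assumed example, and output-token charges are left out because the article does not give them.

```python
def input_cost_usd(input_tokens: int, usd_per_million: float = 5.00) -> float:
    """Input-side cost at a flat per-million-token price (output tokens excluded)."""
    return input_tokens / 1_000_000 * usd_per_million


# Assumed workload, for illustration only.
docs_per_day = 10_000
tokens_per_doc = 2_000

daily_tokens = docs_per_day * tokens_per_doc  # 20,000,000 input tokens
daily_cost = input_cost_usd(daily_tokens)     # 20 * $5.00
print(f"{daily_tokens:,} input tokens/day -> ${daily_cost:,.2f}/day")
```

Under those assumptions the input side alone comes to $100 a day, before any output tokens are billed. A model that runs locally replaces that recurring per-token charge with the fixed cost of the hardware it runs on.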
Technically, Liquid says it achieves high inference speed without the massive memory overhead typical of parameter-heavy transformers. The company ties this to the LFM2 architecture, which diverges from standard transformer-only designs by interleaving gated short-range convolutions with grouped-query attention to process information efficiently. The stated result is a smaller memory footprint and competitive speed. Liquid reports a memory footprint under 400MB, prefill and decode speeds that outpace comparable models like Gemma 3 1B IT and Granite 4.0-H-350M, and specific device benchmarks.
On a Samsung Galaxy S25 Ultra with a Qualcomm Snapdragon Gen4 CPU, Liquid reports a decode speed of 213 tokens per second. Even on a Raspberry Pi 5, it reports 42 tokens per second. Liquid also says its internal benchmarking shows the GPU inference stack delivering lower end-to-end latency than competing small models across all concurrency levels. Add that to the model's 32K context window, and the intended story becomes coherent: extraction and tool-calling often involve long documents or continuous streams like robotic telemetry, and the architecture aims to handle that without the quadratic memory blowups of pure attention mechanisms.
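Those decode rates translate directly into wait time. As a rough illustration, the sketch below converts the two reported speeds into seconds for a 500-token structured output; the 500-token length is an assumption, and prefill time (reading the input document) is ignored.

```python
# Decode speeds as reported by Liquid, in tokens per second.
REPORTED_DECODE_TPS = {
    "Samsung Galaxy S25 Ultra": 213.0,
    "Raspberry Pi 5": 42.0,
}


def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate output_tokens at a steady decode rate (prefill excluded)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


output_tokens = 500  # assumed size of one extracted JSON payload
for device, tps in REPORTED_DECODE_TPS.items():
    print(f"{device}: {decode_seconds(output_tokens, tps):.1f} s")
```

At the reported rates, that payload takes roughly 2.3 seconds on the phone and about 11.9 seconds on the Raspberry Pi.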
Liquid also anchors “on-device” with a concrete robotics demo. The company says LFM2.5-230M was deployed on a Unitree G1 humanoid robot running entirely on-device via the robot’s onboard NVIDIA Jetson Orin compute module. Liquid describes a free-form instruction turning into a structured multi-step plan: “Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters.” The model then calls on pre-trained low-level skills from NVIDIA’s SONIC framework. The point for execs is not just the demo. It is that “agentic workflows” at the edge need a bridge from natural language to structured steps, and Liquid is positioning this model as that bridge, a skill-selection layer focused on tool calling.
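What a "structured multi-step plan" might look like is sketched below. The JSON layout, the skill names, and the argument names are hypothetical; the article does not describe the actual format exchanged between the model and NVIDIA's SONIC skills. The sketch validates a plan against a small skill registry and adds up how long the quoted instruction would take.

```python
import json

# Hypothetical skill registry: skill name -> required argument names.
SKILLS: dict[str, set[str]] = {
    "hold_still": {"duration_s"},
    "walk": {"direction", "speed_mps", "distance_m"},
    "kneel": {"duration_s"},
}

# Hypothetical model output for the instruction quoted above.
PLAN_JSON = """
[
  {"skill": "hold_still", "args": {"duration_s": 2}},
  {"skill": "walk", "args": {"direction": "forward", "speed_mps": 1.0, "distance_m": 3}},
  {"skill": "kneel", "args": {"duration_s": 5}},
  {"skill": "walk", "args": {"direction": "backward", "speed_mps": 0.5, "distance_m": 3}}
]
"""


def validate_plan(text: str) -> list[dict]:
    """Parse a JSON plan and reject unknown skills or mismatched arguments."""
    steps = json.loads(text)
    for index, step in enumerate(steps):
        skill = step.get("skill")
        if skill not in SKILLS:
            raise ValueError(f"step {index}: unknown skill {skill!r}")
        if set(step.get("args", {})) != SKILLS[skill]:
            raise ValueError(f"step {index}: wrong arguments for {skill!r}")
    return steps


def total_duration_s(steps: list[dict]) -> float:
    """Walking steps take distance / speed; timed steps take their duration."""
    total = 0.0
    for step in steps:
        args = step["args"]
        if step["skill"] == "walk":
            total += args["distance_m"] / args["speed_mps"]
        else:
            total += args["duration_s"]
    return total


plan = validate_plan(PLAN_JSON)
print(len(plan), "steps,", total_duration_s(plan), "seconds")
```

Checking the plan before execution is the design point: a small model acting as a skill-selection layer only needs to emit well-formed calls, and a validator like this one can reject anything outside the registry before the robot moves.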
Benchmark claims reinforce that positioning. Liquid reports LFM2.5-230M scoring 43.26 on the BFCLv3 tool-use benchmark, ahead of IBM’s Granite 4.0-350M (39.58) and far above Google’s Gemma 3 1B IT (16.61). On CaseReportBench for data extraction, it scores 22.51, outperforming Qwen3.5-0.8B (Instruct). Liquid acknowledges a boundary condition: it is not meant to compete on reasoning-heavy workloads like advanced math, coding, or creative writing.
Finally, the release comes with deployment and licensing details that matter for procurement and legal review. LFM2.5-230M ships under the LFM Open License v1.0. Despite the word "open," the license is reportedly not OSI-compliant and functions as a restricted, dual-use commercial framework. For independent developers, researchers, and early-stage startups, it works "identically to open-source software," granting a perpetual, worldwide, royalty-free right to reproduce, modify, and distribute, as long as users retain original copyright notices and prominently state modifications. But the license includes a strict "Commercial Use Limitation": the model remains free for individuals and for companies generating less than $10 million in annual revenue, while larger corporations need a paid enterprise agreement.
If you run data workflows, edge products, or internal developer platforms, this launch raises a simple strategic question: will your next "agent" project be a cloud cost center, or can it run locally with tool-calling performance that improves extraction reliability? Liquid's bet is that architectural efficiency beats brute-force scaling for specific enterprise jobs. The second-order implication is that boards and CFOs should treat small-model roadmaps as more than R&D experiments, because the economics and latency of extraction and tool calling can shift quickly when models are built to "run nearly anywhere."