Meituan opens LongCat-2.0: a 1.6T agent model trained on Chinese ASICs

The Owl Alpha engine is now on GitHub and Hugging Face, with a 1M-token context window and sharp pricing for developers.

By Omar Al-Balawi, Technology Correspondent, The Executives Brief

about 16 hours ago·4 min read


Executive summary

Meituan has released LongCat-2.0 on GitHub, Hugging Face, and its native platform, identifying it as the computational engine behind “Owl Alpha.” The move matters because it pressures the global AI infrastructure status quo at the exact moment U.S. controls are tightening access to frontier model APIs.

Meituan has unmasked LongCat-2.0 as the computational engine behind “Owl Alpha,” and the release is already reverberating across the market. LongCat-2.0 is a 1.6-trillion-parameter Mixture-of-Experts (MoE) model with a native 1-million-token context window, trained entirely on a cluster of more than 50,000 domestic Chinese Application-Specific Integrated Circuits (ASICs). The GitHub and Hugging Face pages are live, but Meituan has not yet posted the full weights. Both model pages say, “Model weights coming soon - stay tuned!”

The practical details start with licensing and price. The architecture is open for commercial use under the permissive MIT license, and Meituan’s API pricing is built around context-cache economics: context-cache hits are processed free of charge. Uncached usage is billed through a standard pay-as-you-go API at $0.75 per million input tokens and $2.95 per million output tokens, alongside a time-limited “Token Pack” flash sale that cuts those rates to $0.30 and $1.20. Summing the input and output rates, as the release’s pricing table does, gives a combined $1.50 for one million input tokens plus one million output tokens under the promo, and $3.70 at the standard tier.
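Because cache hits are reported as free, the bill for an agent that repeatedly re-sends the same long context depends mostly on uncached input and output. The sketch below estimates cost from the rates quoted above; the token counts in the usage lines (a 200,000-token context re-sent over 10 turns, 2,000 output tokens per turn, only the first turn missing the cache) are hypothetical, and real billing may include rules the article does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """USD per million tokens, as reported in the article."""
    input_per_m: float
    output_per_m: float


STANDARD = Tier(input_per_m=0.75, output_per_m=2.95)
PROMO = Tier(input_per_m=0.30, output_per_m=1.20)  # time-limited "Token Pack" rate
CACHE_HIT_PER_M = 0.0  # context-cache hits are reported as free


def estimate_cost(tier: Tier, uncached_input: int, cached_input: int, output: int) -> float:
    """Estimated USD cost of a workload, given token counts per billing bucket."""
    cost = (
        uncached_input * tier.input_per_m
        + cached_input * CACHE_HIT_PER_M
        + output * tier.output_per_m
    ) / 1_000_000
    return round(cost, 6)


# One million input plus one million output tokens, no cache hits.
print(estimate_cost(STANDARD, 1_000_000, 0, 1_000_000))  # 3.7
print(estimate_cost(PROMO, 1_000_000, 0, 1_000_000))  # 1.5

# Hypothetical agent loop: 200k-token context re-sent over 10 turns.
turns, context, out_per_turn = 10, 200_000, 2_000
with_cache = estimate_cost(STANDARD, context, context * (turns - 1), out_per_turn * turns)
no_cache = estimate_cost(STANDARD, context * turns, 0, out_per_turn * turns)
print(with_cache, no_cache)  # 0.209 1.559
```

Under these assumptions, caching cuts the standard-tier bill for the loop by roughly 87%, which is why the free-cache policy matters more for long-context agents than the headline per-token rates.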

What makes this a real inflection point is not just “cool model, big numbers.” It is operational independence. Meituan says LongCat-2.0 was trained entirely on more than 50,000 domestic Chinese ASICs, which it presents as proof that near-frontier AI models can be scaled without the Nvidia GPUs that have powered much of frontier generative AI training. If Chinese conglomerates can iterate on trillion-parameter architectures using homegrown silicon, that erodes the supply leverage Nvidia has enjoyed in the AI stack. The release also lands while the U.S. government is pressuring top American labs.

The timing is the political part of the plot. According to the source, Washington has been pressuring American labs to restrict access to their latest models. It says that, following a U.S. government request, OpenAI was forced to limit access to its new GPT-5.6 models, and that Anthropic had earlier been ordered by the U.S. to restrict access to its latest Claude Fable 5 / Mythos 5 models, which it took entirely offline in response. The source argues that this approach has backfired: defensive restrictions can unintentionally widen the window for developers seeking affordable, high-performance alternatives, including Chinese open-source models like LongCat-2.0.

That window is not theoretical. The source ties developer enthusiasm to concrete token volume and top rankings during “Owl Alpha’s” unbranded residency on OpenRouter. It says Owl Alpha accounted for approximately 10.1 trillion monthly tokens, averaging 559 billion tokens per day. It also reports a 242% month-over-month surge in volume and says the model climbed into the platform’s global top three. By the time Meituan claimed the model, the source says, it already held the top ranking on the Hermes Agent workspace, second place on Claude Code deployments, and third place across international OpenClaw environments.
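The source’s volume figures can be cross-checked with simple arithmetic. Taken literally, 559 billion tokens per day reaches 10.1 trillion in about 18 days rather than a full month, so the two numbers may describe different measurement windows; the source does not say. The sketch also assumes that “242% month-over-month” means an increase of 242% (a new total of 3.42 times the prior month), which is one reading among possible others.

```python
# Figures as reported by the source.
MONTHLY_TOKENS = 10.1e12  # "approximately 10.1 trillion monthly tokens"
DAILY_AVG_TOKENS = 559e9  # "averaging 559 billion tokens per day"
MOM_GROWTH_PCT = 242      # assumed to mean +242% month over month

# How many days at the daily average add up to the monthly total?
implied_days = MONTHLY_TOKENS / DAILY_AVG_TOKENS

# What a full 30-day month at the daily average would total.
thirty_day_total = DAILY_AVG_TOKENS * 30

# Prior month's volume if the new total is (1 + 2.42) times the old one.
prior_month = MONTHLY_TOKENS / (1 + MOM_GROWTH_PCT / 100)

print(f"implied window: {implied_days:.1f} days")             # implied window: 18.1 days
print(f"30-day total: {thirty_day_total / 1e12:.1f}T tokens")  # 30-day total: 16.8T tokens
print(f"prior month: {prior_month / 1e12:.2f}T tokens")        # prior month: 2.95T tokens
```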

Under the hood, LongCat-2.0 is built to make a massive context window practical. The source describes an aggressive MoE sparsity strategy: total parameters reach 1.6 trillion, but active computation averages 48 billion parameters per token, with dynamic activation ranging from 33 billion to 56 billion parameters depending on query complexity. It also describes a “Zero-Compute Experts” framework in which routine execution elements are routed through lighter subnetworks, so simple tokens avoid the full per-token compute cost that ultra-dense models pay on every token.
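How per-token activation can vary is easiest to see with a toy top-k router. This is a minimal sketch, not Meituan’s implementation: the expert counts, the top-k value, the parameter units, and the choice to model a zero-compute expert as an identity function (the lightest possible “lighter subnetwork,” returning the hidden state unchanged) are all assumptions. Every token selects the same number of experts, but when some selections are zero-compute experts, fewer real experts run, so active parameters vary per token. That is one way a 33 to 56 billion range around a 48-billion average (3% of 1.6 trillion) can arise.

```python
import math

# Hypothetical toy sizes; LongCat-2.0's real expert configuration is not given here.
NUM_REAL = 8             # experts that run a feed-forward computation
NUM_ZERO = 4             # zero-compute experts (identity, no FLOPs)
TOP_K = 4                # experts selected per token
SHARED_PARAMS = 2.0      # always-active parameters (attention etc.), arbitrary units
PARAMS_PER_EXPERT = 1.0  # parameters per real expert, arbitrary units


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_real_expert(weight):
    # Stand-in for an FFN expert: it transforms the input and costs compute.
    return lambda v: math.tanh(weight * v)


def zero_expert(v):
    # Zero-compute expert: passes the input through unchanged.
    return v


EXPERTS = [make_real_expert(0.5 + 0.1 * i) for i in range(NUM_REAL)] + [zero_expert] * NUM_ZERO


def moe_forward(x, router_logits):
    """Route one token (a scalar hidden state here) to its TOP_K experts.

    Returns the gated output and the number of real experts that ran.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:TOP_K]
    norm = sum(probs[i] for i in top)
    out = sum(probs[i] / norm * EXPERTS[i](x) for i in top)
    real_used = sum(1 for i in top if i < NUM_REAL)
    return out, real_used


def active_params(real_used):
    # Active parameters for one token: shared weights plus the real experts it ran.
    return SHARED_PARAMS + real_used * PARAMS_PER_EXPERT


# An "easy" token whose router prefers the zero-compute experts...
easy_out, easy_real = moe_forward(0.8, [0.0] * NUM_REAL + [5.0] * NUM_ZERO)
# ...and a "hard" token that sends everything to real experts.
hard_out, hard_real = moe_forward(0.8, [5.0] * NUM_REAL + [0.0] * NUM_ZERO)
print(easy_real, active_params(easy_real))  # 0 2.0
print(hard_real, active_params(hard_real))  # 4 6.0
```

In a trained model the router learns which tokens are “easy”; here the logits are set by hand purely to show the two extremes.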

To sustain a 1-million-token context without running into the quadratic cost of scoring every token against every other, or into memory fragmentation, Meituan introduces LongCat Sparse Attention (LSA). The source frames LSA as an evolution of DeepSeek Sparse Attention built from three orthogonal mechanisms. Streaming-aware Indexing (SI) restructures the token-selection pipeline by blending hardware-aligned contiguous reads with dynamic random selection, turning fragmented memory access into predictable sequential blocks so that High Bandwidth Memory (HBM) reads can be coalesced. Cross-Layer Indexing (CLI) exploits the stability of attention saliency across adjacent hidden layers: one indexing pass guides several consecutive layers during inference, an approach reinforced by cross-layer distillation during training. Hierarchical Indexing (HI) uses a coarse-to-fine, two-stage scoring scheme: approximate block-level recall filters the candidates, then fine-grained token selection runs on the survivors.
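The coarse-to-fine idea behind Hierarchical Indexing, plus the layer reuse behind Cross-Layer Indexing, can be sketched in plain Python. This is an illustration under stated assumptions, not LSA itself: summarizing a block by its mean key, dot-product scoring, the block and top-k sizes, and a fixed reuse interval are all hypothetical choices, and a production kernel would operate on batched tensors on the accelerator. The point is the cost structure: the query is scored against one summary per block and then only against tokens in the surviving blocks, rather than against every key, and layers between indexer runs skip scoring entirely.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hierarchical_select(query, keys, block_size, top_blocks, top_tokens):
    """Pick the key positions one query attends to (illustrative HI sketch).

    Stage 1 (coarse): summarize each contiguous block of keys by its mean key,
    score blocks against the query, and keep the best `top_blocks`.
    Stage 2 (fine): score individual tokens inside the surviving blocks and
    keep the best `top_tokens`.
    """
    dim = len(query)
    block_scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        block_scores.append((dot(query, mean_key), start))
    block_scores.sort(reverse=True)

    candidates = []
    for _, start in block_scores[:top_blocks]:
        candidates.extend(range(start, min(start + block_size, len(keys))))
    candidates.sort(key=lambda i: dot(query, keys[i]), reverse=True)
    # Return positions in ascending order so downstream reads are sequential.
    return sorted(candidates[:top_tokens])


def indexer_layers(num_layers, layers_per_index):
    """Cross-layer reuse (hypothetical schedule): run the indexer on every
    `layers_per_index`-th layer; layers in between reuse the last selection."""
    return [layer for layer in range(num_layers) if layer % layers_per_index == 0]


keys = [[float(i)] for i in range(16)]  # 16 toy one-dimensional keys
print(hierarchical_select([1.0], keys, block_size=4, top_blocks=2, top_tokens=3))  # [13, 14, 15]
print(indexer_layers(8, layers_per_index=2))  # [0, 2, 4, 6]
```

Returning selected positions grouped by block and in ascending order loosely mirrors SI’s goal of sequential memory reads, though the sketch makes no attempt to model HBM behavior.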

The architecture also includes an N-gram Embedding module inherited from Meituan’s lighter model lines. The source says the module adds parameters along sparse dimensions orthogonal to the MoE expert layout, allocating 135 billion parameters to a 5-gram token-combination framework. That enlarges the core embedding space roughly 100-fold, with the aim of capturing dense local token relationships and easing memory I/O bottlenecks to accelerate large-batch inference.
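An n-gram embedding adds capacity as table lookups rather than matrix multiplies. The sketch below is a generic hashed n-gram embedding, assumed for illustration: the source does not describe Meituan’s hashing scheme, table layout, or how n-gram vectors are combined with the token embedding, so the rolling hash, the toy table sizes, and combining by summation are all hypothetical. Each position adds one looked-up row per 2- to 5-gram ending at it, so parameters grow with the table size while per-token compute stays a handful of reads and additions.

```python
import random

# Toy sizes; the real module reportedly holds 135B parameters.
VOCAB_SIZE = 100
DIM = 4
MAX_N = 5                # combine token windows up to 5-grams
NGRAM_TABLE_SIZE = 1024  # hypothetical hash-table size

rng = random.Random(0)
token_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
ngram_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(NGRAM_TABLE_SIZE)]


def ngram_slot(gram):
    """Hash an n-gram of token ids into a row of the shared n-gram table."""
    h = len(gram)  # seed with n so n-grams of different lengths hash differently
    for tok in gram:
        h = (h * 1_000_003 + tok + 1) % NGRAM_TABLE_SIZE
    return h


def embed(token_ids):
    """Token embedding plus one looked-up vector per n-gram ending at each position."""
    out = []
    for pos, tok in enumerate(token_ids):
        vec = list(token_table[tok])
        for n in range(2, MAX_N + 1):
            if pos + 1 >= n:  # enough history for an n-gram ending here
                gram = token_ids[pos + 1 - n : pos + 1]
                vec = [a + b for a, b in zip(vec, ngram_table[ngram_slot(gram)])]
        out.append(vec)
    return out


vectors = embed([5, 17, 42, 17, 5, 99])
print(len(vectors), len(vectors[0]))  # 6 4
```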

Finally, LongCat-2.0 is positioned as agentic-first rather than as a general conversational model. On standardized benchmarks, the source reports a score of 59.5 on SWE-bench Pro, ahead of GPT-5.5’s 58.6. It also reports 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual, and 73.2 on FORTE, which it describes as a general corporate workflow simulator.

For executives, the strategic stake is not only whether the model can write code. It is who controls the cost structure and compute pathways for agentic software engineering. Meituan’s combination of an open, MIT-licensed architecture, a 1-million-token context window, and pricing that bills cache hits at zero creates a tangible alternative path for developers building tooling now. And it arrives while U.S. restrictions are reshaping what is accessible from American frontier labs, strengthening the incentive for boards and operators to diversify both their model sources and their infrastructure assumptions.
