Meituan just open-sourced the 1.6T LongCat-2.0, the 1M-token agent model topping OpenRouter charts

A near-frontier coding model trained on 50,000+ Chinese ASICs ships with MIT-licensed weights and aggressive cached-token pricing.

By Omar Al-Balawi, Technology Correspondent, The Executives Brief



Executive summary

Meituan unveiled LongCat-2.0 on GitHub, Hugging Face, and its own platform, identifying it as the engine behind "Owl Alpha," the anonymous model that has been dominating global developer charts on OpenRouter. The release is a credibility and cost shock for decision-makers building AI infrastructure at a time when US access controls on top closed models are tightening.

Meituan has now put LongCat-2.0 in public hands, and the numbers attached to its earlier anonymous run are the part that should make every AI infrastructure lead sit up. The company released the 1.6-trillion-parameter Mixture-of-Experts (MoE) model with a native 1-million-token context window, and it now powers the anonymous "Owl Alpha" agent that spent roughly the past two months leading developer charts on OpenRouter.

That prior run was not subtle. During its unbranded residency on OpenRouter, Owl Alpha accounted for approximately 10.1 trillion monthly tokens, averaging 559 billion tokens per day, a 242% month-over-month surge in volume that propelled it into the platform's global top three. Once Meituan stepped forward to claim the model, it also showed up near the top elsewhere: first on the Hermes Agent workspace, second on Claude Code deployments, and third across international OpenClaw environments. For operators, that combination of huge token volume and cross-platform rankings amounts to a production stress test the model has already passed.

Now add the part that could change how the market allocates future compute: LongCat-2.0 was trained entirely on a cluster of more than 50,000 domestic Chinese application-specific integrated circuits (ASICs). The source frames this as near-frontier scaling achieved without the usual reliance on the US-made Nvidia GPUs that have powered much of the world's frontier generative AI training. In practical terms, that is not just a technical brag. It is a signal about supply chain independence, iteration speed, and whether trillion-parameter progress can be repeated without waiting on the same bottlenecked silicon ecosystem.

Meituan's release also comes with an unusual mix of openness and monetization. The public weights ship under the MIT license, which is highly permissive and allows commercial use, but Meituan also introduces aggressive pricing for commercial API access. Context-cache hits are processed completely free of charge. For cache misses, a pay-as-you-go API charges $0.75 per million input tokens and $2.95 per million output tokens. On top of that, a limited-time "Token Pack" flash-sale promotion cuts costs to $0.30 per million uncached input tokens and $1.20 per million output tokens. The source sets out the model's token economics in a comparative table, where LongCat-2.0's limited-time promo total of $1.50 per million tokens (the $0.30 input rate plus the $1.20 output rate) sits toward the cheaper end of top-performing models globally.
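To make the effect of free cache hits concrete, here is a minimal cost estimator built only from the per-million-token rates quoted above. The names (`Rates`, `workload_cost`) and the sample workload are hypothetical, and it assumes rates apply linearly per token with no minimums or rounding; Meituan's actual billing rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_m: float   # USD per million uncached input tokens
    output_per_m: float  # USD per million output tokens


# Rates as quoted in the article; billing granularity and rounding are assumptions.
STANDARD = Rates(input_per_m=0.75, output_per_m=2.95)
PROMO = Rates(input_per_m=0.30, output_per_m=1.20)  # limited-time "Token Pack" offer


def workload_cost(rates: Rates, cached_input: int, uncached_input: int, output: int) -> float:
    """USD cost of a workload under the quoted pricing."""
    # cached_input is accepted but contributes $0: context-cache hits are free.
    billed = uncached_input * rates.input_per_m + output * rates.output_per_m
    return billed / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent turn: a 1M-token context of which 900k tokens hit the cache.
    for name, rates in (("standard", STANDARD), ("promo", PROMO)):
        cost = workload_cost(rates, cached_input=900_000, uncached_input=100_000, output=20_000)
        print(f"{name}: ${cost:.4f}")
    # The same turn if none of the context were cached, at standard rates.
    print(f"uncached: ${workload_cost(STANDARD, 0, 1_000_000, 20_000):.4f}")
```

Under these assumptions, the hypothetical turn costs about $0.134 at standard rates and $0.054 at promo rates, versus about $0.809 if the whole context were billed as uncached input. For long-context agents that resend the same repository or transcript each step, the cache policy, not the headline rate, dominates the bill.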

Why does that matter right now? Because the release arrives while Washington is pressuring top-tier American labs to restrict access to their newest models. Following a request from the US government, OpenAI was forced to limit access to its new GPT-5.6 models, while Anthropic had earlier been ordered to restrict access to its latest Claude Fable 5 / Mythos 5 models, which it took entirely offline in response. The source argues that these defensive regulatory moves have backfired: by locking down Western closed-source models and driving up API costs, they have opened a wide operational window for developers seeking affordable, high-performance alternatives, including Chinese open-source offerings such as Meituan's LongCat-2.0.

So what exactly is LongCat-2.0, beyond the headline numbers? Under the hood, the model relies on aggressive MoE sparsity: total parameters scale to 1.6 trillion, while active computation averages only 48 billion parameters per token, or about 3% of the weights. Depending on query structure, the number of activated parameters varies dynamically between 33 billion and 56 billion. The source also describes a "Zero-Compute Experts" framework aimed at eliminating the idle computational overhead that typically penalizes ultra-dense models.
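The source does not explain how Zero-Compute Experts work. One plausible reading, offered here only as an assumption, is a top-k MoE router whose candidate pool includes "experts" that simply return their input unchanged: a token routed to them spends no feed-forward compute, so the number of active parameters varies from token to token. The toy below sketches that idea; every size, name (`route`, `active_params_b`, `moe_forward`), and the scalar hidden state are hypothetical and are not LongCat-2.0's real configuration.

```python
import math
import random
from collections.abc import Callable

# All sizes are hypothetical, picked only so per-token activation varies over a range;
# they are not LongCat-2.0's real configuration.
NUM_REAL_EXPERTS = 64      # FFN experts that cost compute
NUM_ZERO_EXPERTS = 16      # zero-compute experts: return their input unchanged
TOP_K = 8                  # experts chosen per token
BASE_ACTIVE_B = 32.0       # always-on parameters (attention, embeddings), billions
PARAMS_PER_EXPERT_B = 3.0  # parameters per real expert, billions


def route(scores: list[float], k: int = TOP_K) -> list[int]:
    """Indices of the k highest router scores for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def active_params_b(chosen: list[int]) -> float:
    """Only real experts add parameters; zero-compute slots add none."""
    real = sum(1 for i in chosen if i < NUM_REAL_EXPERTS)
    return BASE_ACTIVE_B + real * PARAMS_PER_EXPERT_B


def moe_forward(x: float, scores: list[float], experts: list[Callable[[float], float]]) -> float:
    """Gate-weighted mix of the chosen experts (scalar 'hidden state' for brevity)."""
    chosen = route(scores)
    top = max(scores[i] for i in chosen)
    weights = {i: math.exp(scores[i] - top) for i in chosen}  # softmax over chosen experts
    total = sum(weights.values())
    out = 0.0
    for i in chosen:
        y = experts[i](x) if i < NUM_REAL_EXPERTS else x  # identity: no FFN call at all
        out += weights[i] / total * y
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n = NUM_REAL_EXPERTS + NUM_ZERO_EXPERTS
    samples = [active_params_b(route([rng.gauss(0.0, 1.0) for _ in range(n)])) for _ in range(1_000)]
    print(f"active per token: min {min(samples):.0f}B, "
          f"mean {sum(samples) / len(samples):.1f}B, max {max(samples):.0f}B")
```

Because the identity slots compete in the same top-k as real experts, the router can spend fewer real experts on easy tokens. Whether LongCat-2.0 routes exactly this way is not stated in the source.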

To support the 1-million-token context window without quadratic attention becoming a hardware bottleneck, Meituan uses LongCat Sparse Attention (LSA), described as an evolution of DeepSeek Sparse Attention. LSA tackles the quadratic scoring costs and memory fragmentation that often plague fine-grained sparse attention through three orthogonal techniques. First is Streaming-aware Indexing (SI), which restructures token selection by combining hardware-aligned contiguous reads with dynamic random selection to improve effective bandwidth. Second is Cross-Layer Indexing (CLI), which amortizes one indexing pass across several consecutive layers, on the premise that attention saliency stays stable across adjacent hidden layers; cross-layer distillation supports this reuse. Third is Hierarchical Indexing (HI), a coarse-to-fine, two-stage scoring scheme that quickly narrows candidates with approximate block-level recall before fine-grained token selection.
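The coarse-to-fine idea behind HI, and the index reuse behind CLI, can be sketched in a few dozen lines. This is a toy under stated assumptions, not LSA itself: block relevance is approximated by the query's dot product with each block's mean key, the layer-sharing interval is arbitrary, SI's memory-layout tricks are not modeled, and all function names are hypothetical.

```python
import random

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: list[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hierarchical_select(query: Vector, keys: list[Vector], block_size: int,
                        top_blocks: int, top_tokens: int) -> tuple[list[int], int]:
    """Coarse-to-fine selection. Returns (selected token indices, query-key scores computed)."""
    blocks = [list(range(s, min(s + block_size, len(keys)))) for s in range(0, len(keys), block_size)]
    # Block summaries (mean keys) would normally be cached when keys are written;
    # they are recomputed here for brevity and not counted as query-time work.
    summaries = [mean([keys[i] for i in b]) for b in blocks]
    # Stage 1 (coarse): one approximate score per block.
    block_scores = [dot(query, s) for s in summaries]
    keep = sorted(range(len(blocks)), key=lambda j: block_scores[j], reverse=True)[:top_blocks]
    # Stage 2 (fine): exact token scores, only inside the surviving blocks.
    candidates = [i for j in keep for i in blocks[j]]
    ranked = sorted(candidates, key=lambda i: dot(query, keys[i]), reverse=True)
    return sorted(ranked[:top_tokens]), len(blocks) + len(candidates)


def select_per_layer(query: Vector, keys: list[Vector], num_layers: int, share_every: int,
                     **select_kwargs) -> tuple[list[list[int]], int]:
    """Cross-layer reuse: recompute the index every `share_every` layers, reuse it in between."""
    per_layer, passes, index = [], 0, []
    for layer in range(num_layers):
        if layer % share_every == 0:
            index, _ = hierarchical_select(query, keys, **select_kwargs)
            passes += 1
        per_layer.append(index)  # attention in this layer would read only keys[index]
    return per_layer, passes


if __name__ == "__main__":
    rng = random.Random(0)
    keys = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4_096)]
    query = [rng.gauss(0.0, 1.0) for _ in range(16)]
    picked, work = hierarchical_select(query, keys, block_size=64, top_blocks=8, top_tokens=128)
    print(f"{len(picked)} tokens selected with {work} scores vs {len(keys)} for dense scoring")
    # One query/key set stands in for every layer here; real layers have their own.
    _, passes = select_per_layer(query, keys, num_layers=8, share_every=4,
                                 block_size=64, top_blocks=8, top_tokens=128)
    print(f"index passes for 8 layers: {passes}")
```

With 4,096 keys in blocks of 64, the sketch scores 64 block summaries plus 512 tokens in surviving blocks, rather than all 4,096 tokens, and sharing the index every four layers halves the number of indexing passes again. The real savings in LSA depend on details the source does not give.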

The model also incorporates an N-gram Embedding module inherited from its lighter model lines, which adds 135 billion parameters in a 5-gram token combination framework. The source claims this expands the embedding space roughly 100-fold, better capturing dense local relationships between tokens and reducing memory I/O bottlenecks, which matters when scaling to very large contexts and high batch throughput.
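The source gives only the "5-gram" framing and the 135-billion-parameter size, not the mechanism. A common way to build n-gram embeddings, used here purely as an assumption, is to hash each short run of preceding token ids into a large lookup table and add the retrieved rows to the ordinary token embedding, so the same token gets a different vector in a different local context. All sizes and names below (`ngram_row`, `embed`, the table dimensions) are hypothetical toys.

```python
import hashlib
import random

# Hypothetical sizes for illustration only (the article gives 5-gram and 135B params).
N = 5                 # longest n-gram, matching the "5-gram" in the text
VOCAB = 1_000         # toy vocabulary
TABLE_ROWS = 1 << 12  # toy n-gram table; hashing maps unbounded n-grams into it
DIM = 8               # toy embedding width

_rng = random.Random(0)
TOKEN_TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB)]
NGRAM_TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(TABLE_ROWS)]


def ngram_row(ngram: tuple[int, ...]) -> int:
    """Deterministically hash an n-gram of token ids to a table row."""
    digest = hashlib.blake2b(repr(ngram).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % TABLE_ROWS


def embed(tokens: list[int]) -> list[list[float]]:
    """Token embedding plus the hashed embeddings of every 2..N-gram ending at each position."""
    out = []
    for pos, tok in enumerate(tokens):
        vec = list(TOKEN_TABLE[tok])
        for n in range(2, N + 1):
            if pos + 1 >= n:  # enough left context for an n-gram ending here
                row = NGRAM_TABLE[ngram_row(tuple(tokens[pos + 1 - n: pos + 1]))]
                vec = [a + b for a, b in zip(vec, row)]
        out.append(vec)
    return out


if __name__ == "__main__":
    # The same final token gets different vectors in different local contexts
    # (barring a hash collision in this toy table).
    a = embed([5, 17, 42])[-1]
    b = embed([9, 17, 42])[-1]
    print(sum(abs(x - y) for x, y in zip(a, b)))
```

Lookups like these are cheap table reads rather than matrix multiplies, which is one reason embedding-side parameters can grow much faster than per-token compute.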

Finally, LongCat-2.0 is tuned for agentic engineering rather than just fluent conversation. On standardized benchmarks, it scores 59.5 on SWE-bench Pro, above GPT-5.5's 58.6, and posts 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual, and 73.2 on FORTE. The source attributes these agent-specialized behaviors to a structural post-training layer called "Multi-Teacher Optimization via Mixture of Specialized..." and then breaks off mid-phrase, but the core takeaway is consistent: the model is optimized for multi-step engineering tasks, tool integration, and automated repository manipulation.

For executives, the strategic stakes are straightforward. LongCat-2.0 combines four ingredients that tend to move markets: near-frontier scaling claims, a 1-million-token context promise, a pricing model designed to reduce marginal costs for repeated context through free cache hits, and a developer adoption footprint measured in trillions of tokens. If open, cheaper, locally trained agentic models keep rising while access to top closed models tightens, board-level AI infrastructure plans will have to treat “who controls compute supply” and “who controls marginal token cost” as the same problem.
