GLM-5.2 tops GPT-5.5 on long-horizon coding benchmarks at about 1/6th the API cost
Z.ai’s 753B open-weights model lands on Hugging Face with MIT-licensed core weights, lower compute bills, and benchmark wins.

Chinese AI startup Z.ai (formerly Zhipu AI) just released GLM-5.2, a 753-billion-parameter open-weights LLM aimed at long-horizon autonomous coding. It outperforms or nears GPT-5.5 across multiple long-task benchmarks while Z.ai positions its API pricing as about 1/6th the cost for similar workloads.
On the work that matters when a software task runs for hours rather than minutes, GLM-5.2 lands ahead of GPT-5.5 on multiple third-party benchmarks. On SWE-bench Pro it scored 62.1 versus GPT-5.5’s 58.6. On FrontierSWE (Dominance) it hit 74.4% versus GPT-5.5’s 72.6%. On PostTrainBench it posted 34.3% versus GPT-5.5’s 25.0%. For teams paying by the token, the pitch is simple: better long-run coding results with a lower bill.
The other part of the hook is pricing. Z.ai’s API costs $1.40 per million input tokens and $4.40 per million output tokens. The long-context cost angle gets sharper still with a cached input rate of just $0.26 per million tokens (plus a limited-time offer of free cached input storage). In VentureBeat’s pricing snapshot, GLM-5.2 totals $5.80 for 1 million input tokens plus 1 million output tokens, while GPT-5.5 totals $35.00. Dividing $35.00 by $5.80 gives roughly 6, which is where the “about 1/6th” framing comes from.
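As a rough check on that arithmetic, here is a minimal Python sketch using only the rates quoted above. The helper name `workload_cost` and the example token counts are illustrative assumptions, not anything Z.ai publishes. The article gives only a combined total for GPT-5.5, so no per-direction GPT-5.5 rates are assumed.

```python
from decimal import Decimal

# Per-million-token prices as quoted in the article (USD).
GLM_INPUT = Decimal("1.40")
GLM_OUTPUT = Decimal("4.40")
GLM_CACHED_INPUT = Decimal("0.26")
# The article gives only a combined input + output total for GPT-5.5.
GPT_COMBINED = Decimal("35.00")

MILLION = Decimal(1_000_000)


def workload_cost(fresh_input: int, cached_input: int, output: int) -> Decimal:
    """Estimated GLM-5.2 API cost in USD for one workload (hypothetical helper)."""
    return (
        Decimal(fresh_input) / MILLION * GLM_INPUT
        + Decimal(cached_input) / MILLION * GLM_CACHED_INPUT
        + Decimal(output) / MILLION * GLM_OUTPUT
    )


# Combined "1M input + 1M output" figure and the ratio behind "about 1/6th".
glm_combined = GLM_INPUT + GLM_OUTPUT
price_ratio = GPT_COMBINED / glm_combined

# Illustrative workload: token counts are assumed, not taken from the article.
example_cost = workload_cost(
    fresh_input=2_000_000, cached_input=8_000_000, output=500_000
)

print(f"GLM-5.2 combined rate: ${glm_combined}")
print(f"GPT-5.5 / GLM-5.2 ratio: {price_ratio:.2f}x")
print(f"Example workload cost: ${example_cost:.2f}")
```

With these inputs the combined rate comes to $5.80, the ratio to about 6.03, and the assumed workload to $7.08. Most of that workload’s input is billed at the cached rate, which is why long-context jobs that reuse a prefix benefit most.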
But the bigger business story is not just the leaderboard. Z.ai is releasing the core weights under the permissive MIT open-source license, available immediately on Hugging Face, the Z.ai API, and more than 20 third-party coding environments. For enterprise technical decision-makers, that means the model can be downloaded freely, customized or fine-tuned, and run locally or on virtual machines, with compute and electricity as the only ongoing costs. In other words, you can reduce dependency on any single vendor’s API and your exposure to its outages or price increases.
This matters more right now because the source describes regulatory and export-control headwinds for certain Western frontier models. It cites a Trump Administration export-control directive from last week prohibiting foreign nationals from using Anthropic’s new Claude Fable 5 model, and notes that Anthropic responded by taking the models in question entirely offline for all users. For a CTO, a VP of engineering, or a procurement lead, this is the kind of uncertainty that prompts contingency planning: “Can we keep shipping if a model gets geographically fenced or policy-limited?” Open weights are the most direct answer.
Under the hood, GLM-5.2 is built to attack the specific compute pain of long documents. It introduces “IndexShare,” described as reusing the same indexer across every four sparse attention layers. At the maximum 1-million-token context length, that approach alone is said to cut per-token compute (FLOPs) by a factor of 2.9. It also upgrades Multi-Token Prediction (MTP) for speculative decoding, claiming up to a 20% increase in accepted token length during inference. Then there are selectable “Thinking Modes,” with “Max” prioritizing reasoning and “High” balancing performance with latency and token efficiency.
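To make the IndexShare description concrete, here is a toy Python sketch of one possible reading: an indexer picks which context positions a sparse attention layer should attend to, and that selection is reused by the next three layers instead of being recomputed. This is an illustration under stated assumptions, not Z.ai’s implementation. The function names, the top-k selection rule, and the toy scoring function are all hypothetical, and the sketch does not attempt to reproduce the reported 2.9x figure.

```python
from typing import Callable, List, Tuple


def top_k(scores: List[float], k: int) -> List[int]:
    """Return the positions of the k highest scores, in ascending position order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


def select_per_layer(
    num_layers: int,
    share_every: int,
    k: int,
    indexer: Callable[[int], List[float]],
) -> Tuple[List[List[int]], int]:
    """Pick attended positions per layer, rerunning the indexer once per group.

    Returns the per-layer selections and how many times the indexer ran.
    """
    selections: List[List[int]] = []
    indexer_calls = 0
    current: List[int] = []
    for layer in range(num_layers):
        if layer % share_every == 0:
            # First layer of a group: run the indexer and keep its selection.
            current = top_k(indexer(layer), k)
            indexer_calls += 1
        # Other layers in the group reuse the same selection.
        selections.append(current)
    return selections, indexer_calls


def toy_indexer(layer: int) -> List[float]:
    """Hypothetical relevance scores over 8 context positions."""
    return [((pos * 7 + layer * 3) % 11) / 10 for pos in range(8)]


shared, shared_calls = select_per_layer(8, 4, 3, toy_indexer)
unshared, unshared_calls = select_per_layer(8, 1, 3, toy_indexer)

print(f"indexer runs with sharing: {shared_calls}, without: {unshared_calls}")
```

In this toy setup, eight layers need two indexer passes with sharing and eight without. The real savings depend on how expensive the indexer is relative to the rest of each layer, which the article does not specify.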
Those “Thinking Modes” are where cost control turns from theory into something you can operate. Under “Max,” GLM-5.2 uses nearly 85k output tokens per task. Switching to “High” is described as sacrificing only a few points in performance while effectively halving required token output. For workloads like agentic tool use and long-horizon engineering, that trade can be the difference between a model that’s impressive in a demo and one that stays viable in production.
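A back-of-the-envelope sketch of that trade, in Python, using the article’s $4.40 per million output tokens. It assumes exactly 85,000 output tokens per task for “Max” (the article says “nearly 85k”) and exactly half that for “High” (the article says “effectively halving”). Input-token costs are ignored, and `output_cost` is a hypothetical helper name.

```python
from decimal import Decimal

OUTPUT_RATE = Decimal("4.40")  # USD per million output tokens, as quoted in the article
MILLION = Decimal(1_000_000)

# Assumed per-task output budgets: "Max" rounded to 85k, "High" taken as half of that.
TOKENS_PER_TASK = {"Max": 85_000, "High": 42_500}


def output_cost(mode: str, tasks: int = 1) -> Decimal:
    """Estimated output-token spend in USD for a number of tasks in a given mode."""
    tokens = TOKENS_PER_TASK[mode] * tasks
    return Decimal(tokens) / MILLION * OUTPUT_RATE


for mode in TOKENS_PER_TASK:
    per_task = output_cost(mode)
    per_thousand = output_cost(mode, tasks=1000)
    print(f"{mode}: ${per_task:.3f} per task, ${per_thousand:.2f} per 1,000 tasks")
```

Under these assumptions, output spend is about $0.374 per task in “Max” and $0.187 in “High”, or $374 versus $187 per 1,000 tasks. Whether the saving is worth the “few points” of performance depends on how often the cheaper mode still completes the task.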
On performance, GLM-5.2 is positioned as strong even against closed models in tool-heavy and long-running tasks. On MCP-Atlas (tool usage) it scored 77.0 versus GPT-5.5’s 75.3, and it’s close to Claude Opus 4.8 at 77.8. On Humanity’s Last Exam (w/ Tools) it reached 54.7 ahead of GPT-5.5’s 52.2, and within range of Claude Opus 4.8 at 57.9. It also tops extended multi-hour engineering workloads: PostTrainBench (34.3% vs 25.0%) and SWE-Marathon (13.0% vs 12.0%). It does trail Claude Opus 4.8 and slightly trail GPT-5.5 on raw Terminal-Bench 2.1 scores, with 81.0 versus 85.0 and 84.0 respectively, while still outperforming Google’s Gemini 3.1 Pro at 74.0.
Z.ai also launched the GLM Coding Plan to operationalize the model for developer workflows, not just chat. It offers out-of-the-box support for third-party U.S. and global agentic coding harnesses and tools including Claude Code, OpenClaw, Cline, Kilo Code, Crush, and Factory. Pricing tiers (billed annually) are Lite at $12.60 per month ($151.20 per year starting in the 2nd year), Pro at $50.40 per month with 5x the Lite usage allowance, and Max at $112.00 per month with 20x the Lite usage and dedicated resources during peak hours. For enterprises integrating the raw model into their own apps, Z.ai’s API pricing is positioned as mid-priced globally and undercutting Western rivals while matching the exact rates of the previous GLM-5.1 generation.
The second-order implication for executives is blunt but useful: the market’s “frontier” advantage is increasingly not just about raw capability. It is also about cost per successful outcome over long horizons, plus the ability to keep systems running if policy, licensing, or vendors shift. If GLM-5.2’s blend of 1-million-token context, IndexShare compute efficiency, and MIT-licensed weights holds up in real engineering workflows, it gives boards and CFOs a clearer story: you can pursue frontier-class coding without locking the entire company into one closed pipeline.