ExoBrain

GLM-5.2 democratises coding power

Z.ai's GLM-5.2, an open-weight 744B model under an MIT licence, lands within a few points of Claude Opus on coding and runs on a high-memory workstation. With Fable 5 blocked, self-hosting a near-frontier model becomes a continuity requirement.

Joel Miller

This week's chart shows Z.ai's GLM-5.2 landing within a few points of Claude Opus 4.8 on long-horizon coding tasks. It is a 744-billion-parameter mixture-of-experts model with a 1-million-token context window, and the industry response has been very positive: many engineers now rate it as effectively on a par with Opus 4.7 and 4.8 and ahead of GPT-5.5.

GLM-5.2 ships under an MIT licence, so the weights can be downloaded, modified, and run inside an organisation's own boundary, with no provider able to revoke access. As Article 1 this week sets out, the US government's decision to disable and block Fable 5 has made that property concrete. When a frontier model can be switched off by a foreign government, the ability to host a near-equivalent model yourself stops being a preference and becomes a continuity requirement.

So how would you actually run it? The most accessible route uses Unsloth's dynamic quantisation, which shrinks the full model to 239GB at 2-bit, small enough to fit on a 256GB unified-memory Mac Studio. That path suits one or two users and has been shown writing complete, working software on the first attempt. For a team, the economics change shape. A single EU-hosted three-GPU Blackwell box, with around 540GB of VRAM, can serve roughly eight heavy agentic engineers at near-frontier capability for about £400 to £540 per engineer each month, running vLLM with an NVFP4 checkpoint.
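The figures in this section can be sanity-checked with a little arithmetic. The sketch below is a rough estimate rather than a sizing tool: it assumes decimal gigabytes (1GB = 10^9 bytes), counts weights only and ignores KV cache and runtime overhead, and its function names are illustrative rather than part of any library.

```python
PARAMS = 744e9  # parameter count quoted in the article
GB = 1e9        # assumption: decimal gigabytes


def effective_bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a checkpoint of this size."""
    return size_gb * GB * 8 / params


def headroom_gb(size_gb: float, memory_gb: float) -> float:
    """Memory left for KV cache, activations and the OS once weights are loaded."""
    return memory_gb - size_gb


def box_cost_per_month(per_engineer_gbp: float, engineers: int = 8) -> float:
    """Total monthly cost implied by a per-engineer figure and a head count."""
    return per_engineer_gbp * engineers


if __name__ == "__main__":
    bits = effective_bits_per_weight(239)
    print(f"239GB checkpoint: about {bits:.2f} bits per weight")
    print(f"Headroom on a 256GB machine: {headroom_gb(239, 256):.0f}GB")
    low, high = box_cost_per_month(400), box_cost_per_month(540)
    print(f"Implied box cost: £{low:,.0f} to £{high:,.0f} per month")
```

On these numbers the "2-bit" checkpoint averages roughly 2.57 bits per weight, which is consistent with a dynamic scheme keeping some layers at higher precision. The 17GB of headroom on a 256GB machine is tight, which fits the one-or-two-user framing, and the per-engineer range implies a total of about £3,200 to £4,320 a month for the shared box.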

The constraint has not disappeared. You still need fast memory measured in hundreds of gigabytes, the cheapest quantisations trade away accuracy on harder work, and a self-hosted model running at a near-100% duty cycle needs real operational care. This is a workstation and server story, not a laptop one. But the option now exists where it did not before.

Takeaways: the open-weight question has shifted from capability to control. GLM-5.2 shows a frontier-class model can be self-hosted today, under a licence no government can revoke, at a cost that competes with metered seats once usage is heavy. With Fable 5 blocked, that is no longer theoretical. The practical step for any organisation handling sensitive work is to pilot a self-hosted model now, learn what local inference really costs in memory and latency, and stop assuming that access to frontier capability is something only a vendor can grant.
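One way to test the "competes with metered seats once usage is heavy" claim in a pilot is to work out the break-even token volume. The sketch below is a minimal illustration: the function name is hypothetical, and the metered price in the example is a placeholder chosen for round numbers, not a quoted vendor rate, so substitute your own contract price.

```python
def break_even_mtokens(self_hosted_gbp_per_month: float,
                       metered_gbp_per_mtoken: float) -> float:
    """Millions of tokens per month at which metered spend equals the self-hosted cost."""
    if metered_gbp_per_mtoken <= 0:
        raise ValueError("metered price must be positive")
    return self_hosted_gbp_per_month / metered_gbp_per_mtoken


if __name__ == "__main__":
    # Hypothetical blended price in £ per million tokens; NOT a real vendor figure.
    PLACEHOLDER_PRICE = 10.0
    for cost in (400, 540):  # per-engineer monthly range quoted in the article
        volume = break_even_mtokens(cost, PLACEHOLDER_PRICE)
        print(f"£{cost}/month breaks even at {volume:.0f}M tokens per month")
```

Below the break-even volume the metered seat is cheaper; above it, the self-hosted share of the box wins on cost alone, before any continuity benefit is counted.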
