Benchmark | June 17, 2026 | 3 min read

Zhipu AI's GLM-5.2 Closes Gap with Industry Leaders in Coding Marathon Benchmarks

Zhipu AI's latest model, GLM-5.2, has achieved impressive results in coding marathon benchmarks, trailing industry leaders by just a few percentage points. This milestone marks a significant improvement over its predecessor, GLM-5.1, and solidifies GLM-5.2's position as a top open-source model.

GLM-5.2 is built for complex, hours-long coding tasks and offers a stable 1-million-token context. On these coding marathon benchmarks it has narrowed the gap with closed-source leaders, including Anthropic's Opus models, to a few percentage points. On the FrontierSWE benchmark, which evaluates open engineering projects, GLM-5.2 scored 74.4 percent, one point behind Anthropic's Claude Opus 4.8 and slightly ahead of OpenAI's GPT-5.5.

GLM-5.2 also improves markedly on its predecessor, GLM-5.1, particularly on standard coding tasks. On Terminal-Bench 2.1, GLM-5.2 scored 81, up from 63.5 for GLM-5.1, a gain of 17.5 points that places it within striking distance of Claude Opus 4.8. On SWE-bench Pro the gain was smaller: the score rose from 58.4 to 62.1, or 3.7 points. Together, these results make GLM-5.2 an attractive option for developers and businesses seeking a reliable open-source model.
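To put the two generation-over-generation gains on the same footing, the short Python sketch below computes the absolute gain (in points) and the relative gain (in percent of the GLM-5.1 score) from the figures quoted above. The helper name `gains` and the dictionary layout are illustrative choices, not part of any benchmark tooling.

```python
# Scores quoted in the article, as (GLM-5.1, GLM-5.2) pairs.
scores = {
    "Terminal-Bench 2.1": (63.5, 81.0),
    "SWE-bench Pro": (58.4, 62.1),
}


def gains(old, new):
    """Return (absolute gain in points, relative gain in percent of old)."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)


# 'gains' is a hypothetical helper written for this illustration.
results = {name: gains(old, new) for name, (old, new) in scores.items()}

for name, (abs_gain, rel_gain) in results.items():
    print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

Run as written, this reports a 17.5-point (27.6% relative) gain on Terminal-Bench 2.1 and a 3.7-point (6.3% relative) gain on SWE-bench Pro, which shows how much larger the first jump is than the second.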

A key feature of GLM-5.2 is adjustable thinking effort, which lets users dial the amount of compute the model spends on a problem up or down. At a similar token budget, GLM-5.2 delivers much stronger coding results than GLM-5.1. The highest setting, Max, provides extra compute for the most challenging problems, but it gains barely any extra points for the additional tokens spent. The High effort level extracts nearly full performance, making it the more practical choice for most users.

Despite these coding gains, GLM-5.2 still lags behind closed-source rivals in reasoning. On Humanity's Last Exam, it fell behind Claude Opus 4.8 and Gemini 3.1 Pro, a reminder that open-source models have yet to match their closed-source counterparts across the board. Nevertheless, GLM-5.2 remains the strongest open-source model, offering a viable, cost-effective alternative for developers and businesses with coding workloads.

The release of GLM-5.2 shows how quickly open-source AI models are progressing. As the gap between open-source and closed-source models narrows, developers and businesses can expect more capable and affordable models that speed up coding work, improve productivity, and reduce costs. Open-source models like GLM-5.2 are likely to grow in importance as an alternative to closed-source offerings.
