Ziwei Chai

I worked at Zhipu AI (Feb. 2025 - Feb. 2026), where I co-led the pre-training data pipeline specifically for high-quality math and science data. My work involved end-to-end data processing—including cleaning, parsing, synthesis, and evaluation—for the GLM model family (spanning GLM-4.5 through GLM-5). I also led the On-Policy Cross-Stage Distillation of GLM-5, where the GLM series achieved open-source SOTA performance.

Before joining Zhipu AI, I was a Research Intern at ByteDance Seed (Nov. 2023 - Feb. 2025) working on multi-agent LLM systems. I am currently a final-year Ph.D. student at Zhejiang University advised by Prof. Yang Yang, where I also received my B.E. degree in Software Engineering in 2021.

~~I am currently open to industrial opportunities in LLM teams. If you have suitable positions, please feel free to contact me via email or WeChat.~~

"...Die Bedeutung eines Wortes ist sein Gebrauch in der Sprache."

"...The meaning of a word is its use in the language."

— Ludwig Wittgenstein, Philosophische Untersuchungen, §43

News

Feb 2026	👋 I have left Zhipu AI after one amazing year. Huge thanks to the GLM O team — it was a privilege building something great together. Wishing everyone the very best ahead!
Feb 2026	🚀 GLM-5 is released! Blog Technical Report
Dec 2025	🚀 GLM-4.7 is released! Blog
Sep 2025	🚀 GLM-4.6 is released! Blog
Jul 2025	🚀 GLM-4.5 is released! Blog Technical Report
Feb 2025	💼 Joined Zhipu AI (GLM Team) as a Technical Intern
May 2024	🎉 Two papers accepted: ETR (Multi-Agent Collaboration) to ACL 2024 and InfiAgent (Agent Evaluation) to ICML 2024.
Jan 2024	🚀 Released InfiAgent-DABench, a comprehensive benchmark for Agentic Data Analysis.
Nov 2023	💼 Joined Bytedance Seed as a Research Intern

Selected Publications

Full Paper List | Citation: 820

Technical Report

GLM-5: from Vibe Coding to Agentic Engineering

Contributor to the GLM Team

arXiv preprint arXiv: 2602.15763, 2026

Abs HTML PDF

We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. To advance model alignment and autonomy, we implement a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Furthermore, we propose novel asynchronous agent RL algorithms that further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. Through these innovations, GLM-5 achieves state-of-the-art performance on major open benchmarks. Most critically, GLM-5 demonstrates unprecedented capability in real-world coding tasks, surpassing previous baselines in handling end-to-end software engineering challenges. Code, models, and more information are available at https://github.com/zai-org/GLM-5
Technical Report

Glm-4.5: Agentic, reasoning, and coding (arc) foundation models

Contributor to the GLM Team

arXiv preprint arXiv:2508.06471, 2025

Abs HTML PDF

We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With much fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. Code, models, and more information are available at https://github.com/zai-org/GLM-4.5.
NeurIPS

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

Hanchen Su, Xuyuan Li, Yan Zhou, Ziwei Chai, Haozheng Wang, Chen Zhang, and 1 more author

In Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

Abs HTML PDF

The increasing utilization of graph databases across various fields stems from their capacity to represent intricate interconnections. Nonetheless, exploiting the full capabilities of graph databases continues to be a significant hurdle, largely because of the inherent difficulty in translating natural language into Cypher. Recognizing the critical role of schema selection in database query generation and drawing inspiration from recent progress in reasoning-augmented approaches trained through reinforcement learning to enhance inference capabilities and generalization, we introduce Cypher-RI, a specialized framework for the Text-to-Cypher task. Distinct from conventional approaches, our methodology seamlessly integrates schema selection within the Cypher generation pipeline, conceptualizing it as a critical element in the reasoning process. The schema selection mechanism is guided by textual context, with its outcomes recursively shaping subsequent inference processes. Impressively, our 7B-parameter model, trained through this RL paradigm, demonstrates superior performance compared to baselines, exhibiting a 9.41% accuracy improvement over GPT-4o on CypherBench. These results underscore the effectiveness of our proposed reinforcement learning framework, which integrates schema selection to enhance both the accuracy and reasoning capabilities in Text-to-Cypher tasks.
ACL

An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, and 5 more authors

In Annual Meeting of the Association for Computational Linguistics, 2024

Abs HTML PDF

We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM like generating new tokens. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction dataset but also allows for dynamic extension of new expert LLMs in a plug-and-play manner. It also conceals the detailed collaboration process from the user’s perspective, facilitating interaction as though it were a singular LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building generalist LLM system via synergizing multiple expert LLMs.
WWW

Can GNN be Good Adapter for LLMs?

Xuanwen Huang, Kaiqiao Han, Yang Yang, Dezheng Bao, Quanjin Tao, Ziwei Chai, and 1 more author

In Proceedings of the ACM Web Conference, 2024

Abs HTML PDF

Recently, large language models (LLMs) have demonstrated superior capabilities in understanding and zero-shot learning on textual data, promising significant advances for many text-related domains. In the graph domain, various real-world scenarios also involve textual data, where tasks and node features can be described by text. These text-attributed graphs (TAGs) have broad applications in social media, recommendation systems, etc. Thus, this paper explores how to utilize LLMs to model TAGs. Previous methods for TAG modeling are based on million-scale LMs. When scaled up to billion-scale LLMs, they face huge challenges in computational costs. Additionally, they also ignore the zero-shot inference capabilities of LLMs. Therefore, we propose GraphAdapter, which uses a graph neural network (GNN) as an efficient adapter in collaboration with LLMs to tackle TAGs. In terms of efficiency, the GNN adapter introduces only a few trainable parameters and can be trained with low computation costs. The entire framework is trained using auto-regression on node text (next token prediction). Once trained, GraphAdapter can be seamlessly fine-tuned with task-specific prompts for various downstream tasks. Through extensive experiments across multiple real-world TAGs, GraphAdapter based on Llama 2 gains an average improvement of approximately 5% in terms of node classification. Furthermore, GraphAdapter can also adapt to other language models, including RoBERTa, GPT-2. The promising results demonstrate that GNNs can serve as effective adapters for LLMs in TAG modeling.
ICML

Infiagent-dabench: Evaluating agents on data analysis tasks

Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Guoyin Wang, Xuwu Wang, and 9 more authors

In International Conference on Machine Learning, 2024

Abs HTML PDF

In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. These tasks require agents to end-to-end solving complex tasks by interacting with an execution environment. This benchmark contains DAEval, a dataset consisting of 257 data analysis questions derived from 52 CSV files, and an agent framework which incorporates LLMs to serve as data analysis agents for both serving and evaluation. Since data analysis questions are often open-ended and hard to evaluate without human supervision, we adopt a format-prompting technique to convert each question into a closed-form format so that they can be automatically evaluated. Our extensive benchmarking of 34 LLMs uncovers the current challenges encountered in data analysis tasks. In addition, building on top of our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench. Evaluation datasets and toolkits for InfiAgent-DABench are released at https://github.com/InfiAgent/InfiAgent .
TBD

Graphllm: Boosting graph reasoning ability of large language model

Ziwei Chai, Tianjie Zhang, Liang Wu, Kaiqiao Han, Xiaohai Hu, Xuanwen Huang, and 1 more author

IEEE Transactions on Big Data, 2025

Abs HTML PDF

The advancement of Large Language Models (LLMs) has remarkably pushed the boundaries towards artificial general intelligence (AGI), with their exceptional ability on understanding diverse types of information, including but not limited to images and audio. Despite this progress, a critical gap remains in empowering LLMs to proficiently understand and reason on graph data. Recent studies underscore LLMs’ underwhelming performance on fundamental graph reasoning tasks. In this paper, we endeavor to unearth the obstacles that impede LLMs in graph reasoning, pinpointing the common practice of converting graphs into natural language descriptions (Graph2Text) as a fundamental bottleneck. To overcome this impediment, we introduce GraphLLM, a pioneering end-to-end approach that synergistically integrates graph learning models with LLMs. This synergy equips LLMs with the ability to proficiently interpret and reason on graph data, harnessing the superior expressive power of graph learning models. Our empirical evaluations across four fundamental graph reasoning tasks validate the effectiveness of GraphLLM. The results exhibit a substantial average accuracy enhancement of 54.44%, alongside a noteworthy context reduction of 96.45% across various graph reasoning tasks.
AAAI

Towards Learning to Discover Money Laundering Sub-network in Massive Transaction Network

Ziwei Chai, Yang Yang, Jiawang Dan, Sheng Tian, Changhua Meng, Weiqiang Wang, and 1 more author

In Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Abs HTML PDF

Anti-money laundering (AML) systems play a critical role in safeguarding global economy. As money laundering is considered as one of the top group crimes, there is a crucial need to discover money laundering sub-network behind a particular money laundering transaction for a robust AML system. However, existing rule-based methods for money laundering sub-network discovery is heavily based on domain knowledge and may lag behind the modus operandi of launderers. Therefore, in this work, we first address the money laundering sub-network discovery problem with a neural network based approach, and propose an AML framework AMAP equipped with an adaptive sub-network proposer. In particular, we design an adaptive sub-network proposer guided by a supervised contrastive loss to discriminate money laundering transactions from massive benign transactions. We conduct extensive experiments on real-word datasets in AliPay of Ant Group. The result demonstrates the effectiveness of our AMAP in both money laundering transaction detection and money laundering sub-network discovering. The learned framework which yields money laundering sub-network from massive transaction network leads to a more comprehensive risk coverage and a deeper insight to money laundering strategies.
IJCAI

Can abnormality be detected by graph neural networks?

Ziwei Chai, Siqi You, Yang Yang, Shiliang Pu, Jiarong Xu, Haoyang Cai, and 1 more author

In International Joint Conferences on Artificial Intelligence, 2022

Abs HTML PDF

Anomaly detection in graphs has attracted considerable interests in both academia and industry due to its wide applications in numerous domains ranging from finance to biology. Meanwhile, graph neural networks (GNNs) is emerging as a powerful tool for modeling graph data. A natural and fundamental question that arises here is: can abnormality be detected by graph neural networks? In this paper, we aim to answer this question, which is nontrivial. As many existing works have explored, graph neural networks can be seen as filters for graph signals, with the favor of low frequency in graphs. In other words, GNN will smooth the signals of adjacent nodes. However, abnormality in a graph intuitively has the characteristic that it tends to be dissimilar to its neighbors, which are mostly normal samples. It thereby conflicts with the general assumption with traditional GNNs. To solve this, we propose a novel Adaptive Multi-frequency Graph Neural Network (AMNet), aiming to capture both low-frequency and high-frequency signals, and adaptively combine signals of different frequencies. Experimental results on real-world datasets demonstrate that our model achieves a significant improvement comparing with several state-of-the-art baseline methods.