Haoyang Zou

THINK, ACT, EVOLVE

Haoyang Zou 邹皓阳

Ph.D. student at Shanghai Jiao Tong University.

I study how strong reasoning becomes reliable agent behavior, and I like building the full loop from training to real-world systems.

About Me

I work on models that can think, act, and hold up once they leave controlled environments and enter real work.

I am a first-year Ph.D. student at Shanghai Jiao Tong University, advised by Prof. Pengfei Liu, and a research intern at ByteDance Seed, where I work closely with Yujia Qin. My research centers on reasoning, reinforcement learning, and agents, with a focus on training foundation models for capable and reliable agent behavior. Before that, I received my B.S. in Artificial Intelligence from Fudan University.

Research Interests

  • Reasoning and Reinforcement Learning for Agents: Studying how agents can plan, adapt, and remain reliable in complex environments.
  • Coding Agents in Practice: Designing coding agents that optimize for real-world usefulness rather than benchmark performance.
  • Self-Improving Agent Systems: Exploring systems whose capability grows through use and interaction.

Selected Work

Seed 2.0
ByteDance Seed Team (Core Contributor)
Product Launch, Feb 2026
As a core contributor, I am responsible for the coding agent ability of Seed 2.0, a multimodal foundation model series built for strong agentic performance.

Seed 1.8
ByteDance Seed Team (Core Contributor)
Product Launch, Dec 2025
As a core contributor, I am responsible for the coding agent ability of Seed 1.8, a generalized agentic model unifying search, code, and GUI abilities in one system.

UI-TARS-2: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Haoming Wang, Haoyang Zou, Huatong Song, Jiazhan Feng, Junjie Fang, et al. (alphabetical order)
Tech Report, Sep 2025
A GUI-centered agent trained with multi-turn RL and a data flywheel, reaching SOTA on OSWorld, Mind2Web, and AndroidWorld, and supporting the Doubao Mobile Assistant.

ToRL: Scaling Tool-Integrated RL
Xuefeng Li*, Haoyang Zou*, Pengfei Liu
Preprint, Mar 2025
A tool-integrated RL framework that lets base models acquire tool use through training rather than prompting alone.

LIMR: Less is More for RL Scaling
Xuefeng Li*, Haoyang Zou*, Pengfei Liu
Preprint, Feb 2025
A data selection method showing that better-chosen trajectories can match larger RL runs at a fraction of the scale.

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Zhen Huang*, Haoyang Zou*, Xuefeng Li*, Yixiu Liu*, Yuxiang Zheng*, Ethan Chern*, Shijie Xia*, Yiwei Qin, Weizhe Yuan, Pengfei Liu
Preprint, Nov 2024
A distillation study showing that simple pipelines can push open models past o1-preview on targeted reasoning evaluations.

O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin*, Xuefeng Li*, Haoyang Zou*, Yixiu Liu*, Shijie Xia*, Zhen Huang, Yixin Ye, Weizhe Yuan, Hector Liu, Yuanzhi Li, Pengfei Liu
Preprint, Oct 2024
An early progress report documenting what did and did not work while replicating o1-style reasoning behavior.

Generative AI for Math: Abel
Ethan Chern*, Haoyang Zou*, Xuefeng Li*, Jiewen Hu*, Kehua Feng, Junlong Li, Pengfei Liu
Project, 2024
A mathematical reasoning model that reached state-of-the-art results on GSM8K and MATH at the time.

Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu
Preprint, Apr 2025
A survey framing test-time scaling as an emerging engineering discipline for reasoning systems.

O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
Zhongzhen Huang*, Gui Geng*, Shengyi Hua*, Zhen Huang*, Haoyang Zou*, Shaoting Zhang, Pengfei Liu, Xiaofan Zhang
Preprint, Jan 2025
An inference-time scaling study for medical reasoning, focused on where extra computation actually improves clinical problem solving.

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
Yanheng He*, Jiahe Jin*, Shijie Xia, Jiadi Su, Runze Fan, Haoyang Zou, Xiangkun Hu, Pengfei Liu
Preprint, Dec 2024
A GUI agent project inspired by human cognitive workflows for completing computer tasks end to end.

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, ..., Pengfei Liu
NeurIPS 2024 Datasets & Benchmarks
A multimodal benchmark designed to test cross-discipline reasoning at olympiad difficulty.

Reformatted Alignment
Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai He, Ethan Chern, Jiewen Hu, Pengfei Liu
Preprint, Feb 2024
A data-centric alignment method that improves model behavior by reformatting responses rather than only scaling data or compute.

Experiences