Yihang Yao yihangya[at]andrew.cmu.edu

Hi, welcome to my website! I am a 4th-year Ph.D. candidate in the Safe AI Lab at Carnegie Mellon University, advised by Prof. Ding Zhao. I received my Bachelor's degree from Shanghai Jiao Tong University in 2022 and spent a wonderful time as a visiting student in the Intelligent Control Lab at CMU, working with Prof. Changliu Liu. I previously interned at the MIT-IBM Watson AI Lab at IBM Research, and I have been collaborating with Google DeepMind since 2023.

My research focuses on LLM reasoning and reinforcement learning (RL). Specifically, I am interested in:

  • Scaling RL with high sampling efficiency;
  • Building agentic pipelines for real-world applications.

News

Selected Works
(* indicates equal contribution)
Tailored Primitive Initialization is the Secret Key to Reinforcement Learning

Yihang Yao, Guangtao Zeng, Raina Wu, Yang Zhang, Ding Zhao, Zhang-Wei Hong, Chuang Gan

2025

TL;DR: Tailor is a mid-training data pipeline that automatically discovers and curates diverse, high-quality reasoning primitives to warm-start LLMs, leading to more stable and sample-efficient RL with higher downstream performance.

Paper

Behavior Injection: Preparing Language Models for Reinforcement Learning

Zhepeng Cen*, Yihang Yao*, William Han, Zuxin Liu, Ding Zhao

NeurIPS 2025

TL;DR: This paper proposes behavior injection, a mid-training approach that augments demonstration data with exploratory and exploitative behaviors to improve RL data co-influence and sampling efficiency.

Paper / Website / Code

Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training

Yihang Yao, Zhepeng Cen, Miao Li, William Han, Yuyou Zhang, Emerson Liu, Zuxin Liu, Chuang Gan, Ding Zhao

ACL 2025 Findings

TL;DR: MEND enhances LLM robustness by improving query symmetry awareness, boosting reasoning performance through structured dataset curation.

Paper

OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning

Yihang Yao*, Zhepeng Cen*, Wenhao Ding, Haohong Lin, Shiqi Liu, Tingnan Zhang, Wenhao Yu, Ding Zhao

NeurIPS 2024

TL;DR: We investigate offline RL from a data-centric perspective and propose a diffusion model-based data generator to curate training datasets aligned with user preferences.

Paper / Website / Code

Feasibility Consistent Representation Learning for Safe Reinforcement Learning

Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao

ICML 2024

TL;DR: We introduce FCSRL, a framework that improves safety constraint estimation in RL through representation learning and self-supervised techniques.

Paper / Website / Code

Gradient Shaping for Multi-Constraint Safe Reinforcement Learning

Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao

L4DC 2024

TL;DR: We introduce GradS, a gradient-based method that improves training efficiency in multi-constraint safe RL by reshaping gradients to optimize both reward and constraint satisfaction.

Paper / Website / Code

Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning

Yihang Yao*, Zuxin Liu*, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao

NeurIPS 2023

TL;DR: We introduce CCPO, a framework for versatile/adaptive safe RL that enables efficient training and zero-shot adaptation to varying safety constraints.

Paper

CrashAgent: Crash Scenario Generation via Multi-modal Reasoning

Miao Li, Wenhao Ding, Haohong Lin, Yiqi Lyu, Yihang Yao, Yuyou Zhang, Ding Zhao

2025

TL;DR: CrashAgent is a multi-agent framework that leverages multi-modal large language models to convert real-world crash reports into diverse, executable simulation scenarios for training and evaluating autonomous driving systems in safety-critical situations.

Paper

Learning from Sparse Offline Datasets via Conservative Density Estimation

Zhepeng Cen, Zuxin Liu, Zitong Wang, Yihang Yao, Henry Lam, Ding Zhao

ICLR 2024

TL;DR: We introduce CDE, a DICE-based method that addresses out-of-distribution (OOD) errors in offline RL, achieving SOTA results on the D4RL benchmark, particularly in sparse-reward and low-data scenarios.

Paper / Code

Datasets and Benchmarks for Offline Safe Reinforcement Learning

Zuxin Liu*, Zijian Guo*, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao

Journal of Data-centric Machine Learning Research (DMLR); RSS 2023 Safe Autonomy Workshop (Spotlight)

TL;DR: We present a comprehensive benchmarking suite for offline safe RL, featuring expertly crafted safe policies, diverse datasets, and baseline implementations across 38 tasks, designed to accelerate the development and evaluation of safe RL algorithms in both training and deployment phases.

Paper / Website / Code (OSRL) / Code (DSRL) / Code (FSRL)

Constrained Decision Transformer for Offline Safe Reinforcement Learning

Zuxin Liu*, Zijian Guo*, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, Ding Zhao

ICML 2023

TL;DR: We propose CDT for offline safe RL, which leverages multi-objective optimization to balance safety and task performance, yielding adaptable, robust, and high-reward policies with zero-shot adaptation capabilities.

Paper / Code (DSRL)

Services
Talks