Yihang Yao yihangya[at]andrew.cmu.edu
Hi, welcome to my website! I am a 4th-year Ph.D. candidate in the Safe AI Lab at Carnegie Mellon University, advised by Prof. Ding Zhao. I received my Bachelor's degree from Shanghai Jiao Tong University in 2022 and spent a wonderful time as a visiting student in the Intelligent Control Lab at CMU, working with Prof. Changliu Liu. I previously interned at the MIT-IBM Watson AI Lab, IBM Research, and have been working with Google DeepMind since 2023. My research focuses on reinforcement learning and large language models, with a current emphasis on training proactive agents for user-centric applications.
Yihang Yao, Zhepeng Cen, Haohong Lin, Shiqi Liu, Zuxin Liu, Jiacheng Zhu, Zhang-Wei Hong, Laixi Shi, Ding Zhao
ICLR 2026 LLA Workshop, Preprint
TL;DR: BAO is an agentic RL framework that advances the Pareto frontier between task performance and user engagement in proactive agent training.
Yihang Yao, Guangtao Zeng, Raina Wu, Yang Zhang, Ding Zhao, Zhang-Wei Hong, Chuang Gan
2025, Preprint
TL;DR: Tailor is a mid-training data pipeline that automatically discovers and curates diverse, high-quality reasoning primitives to warm-start LLMs, leading to more stable and sample-efficient RL with higher downstream performance.
Zhepeng Cen*, Yihang Yao*, William Han, Zuxin Liu, Ding Zhao
NeurIPS 2025
TL;DR: This paper proposes behavior injection, a mid-training approach that augments demonstration data with exploratory and exploitative behaviors to improve RL data co-influence and sampling efficiency.
Yihang Yao, Zhepeng Cen, Miao Li, William Han, Yuyou Zhang, Emerson Liu, Zuxin Liu, Chuang Gan, Ding Zhao
ACL 2025 Findings
TL;DR: MEND enhances LLM robustness by improving query symmetry awareness, boosting reasoning performance through structured dataset curation.
Yihang Yao*, Zhepeng Cen*, Wenhao Ding, Haohong Lin, Shiqi Liu, Tingnan Zhang, Wenhao Yu, Ding Zhao
NeurIPS 2024
TL;DR: We investigate offline RL from a data-centric perspective and propose a diffusion model-based data generator to curate training datasets aligned with user preferences.
Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao
ICML 2024
TL;DR: We introduce FCSRL, a framework that improves safety constraint estimation in RL through representation learning and self-supervised techniques.
Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao
L4DC 2024
TL;DR: We introduce GradS, a gradient-based method that improves training efficiency in multi-constraint RL by manipulating gradients to optimize both reward and constraint satisfaction.
Yihang Yao*, Zuxin Liu*, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao
NeurIPS 2023
TL;DR: We introduce CCPO, a framework for versatile/adaptive safe RL that enables efficient training and zero-shot adaptation to varying safety constraints.
Zuxin Liu*, Zijian Guo*, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, Ding Zhao
ICML 2023
TL;DR: We propose CDT for offline safe RL, which uses a multi-objective optimization approach to balance safety and task performance, achieving superior adaptability, robustness, and high-reward policies with zero-shot adaptation to varying safety constraints.