Yihang Yao yihangya[at]andrew.cmu.edu
Hi, welcome to my website! I am a 4th-year Ph.D. candidate in the Safe AI Lab at Carnegie Mellon University, advised by Prof. Ding Zhao. I received my Bachelor's degree from Shanghai Jiao Tong University in 2022 and spent a wonderful time as a visiting student in the Intelligent Control Lab at CMU, working with Prof. Changliu Liu. I previously interned at the MIT-IBM Watson AI Lab, IBM Research, and have been working with Google DeepMind since 2023. My research focuses on LLM reasoning and reinforcement learning (RL).
Yihang Yao, Guangtao Zeng, Raina Wu, Yang Zhang, Ding Zhao, Zhang-Wei Hong, Chuang Gan
2025
TL;DR: Tailor is a mid-training data pipeline that automatically discovers and curates diverse, high-quality reasoning primitives to warm-start LLMs, leading to more stable and sample-efficient RL with higher downstream performance.
Zhepeng Cen*, Yihang Yao*, William Han, Zuxin Liu, Ding Zhao
NeurIPS 2025
TL;DR: This paper proposes behavior injection, a mid-training approach that augments demonstration data with exploratory and exploitative behaviors to improve RL data co-influence and sample efficiency.
Yihang Yao, Zhepeng Cen, Miao Li, William Han, Yuyou Zhang, Emerson Liu, Zuxin Liu, Chuang Gan, Ding Zhao
ACL 2025 Findings
TL;DR: MEND enhances LLM robustness by improving query symmetry awareness, boosting reasoning performance through structured dataset curation.
Yihang Yao*, Zhepeng Cen*, Wenhao Ding, Haohong Lin, Shiqi Liu, Tingnan Zhang, Wenhao Yu, Ding Zhao
NeurIPS 2024
TL;DR: We investigate offline RL from a data-centric perspective and propose a diffusion model-based data generator to curate training datasets aligned with user preferences.
Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao
ICML 2024
TL;DR: We introduce FCSRL, a framework that improves safety constraint estimation in RL through representation learning and self-supervised techniques.
Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao
L4DC 2024
TL;DR: We introduce GradS, a gradient-shaping method that improves training efficiency in multi-constraint RL by manipulating reward and constraint gradients to optimize both reward and constraint satisfaction.
Yihang Yao*, Zuxin Liu*, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao
NeurIPS 2023
TL;DR: We introduce CCPO, a framework for versatile safe RL that enables efficient training and zero-shot adaptation to varying safety constraints.
Miao Li, Wenhao Ding, Haohong Lin, Yiqi Lyu, Yihang Yao, Yuyou Zhang, Ding Zhao
2025
TL;DR: We introduce CrashAgent, a multi-agent framework that leverages multi-modal large language models to convert real-world crash reports into diverse, executable simulation scenarios for training and evaluating autonomous driving systems in safety-critical situations.
Zhepeng Cen, Zuxin Liu, Zitong Wang, Yihang Yao, Henry Lam, Ding Zhao
ICLR 2024
TL;DR: We introduce CDE, a DICE-based method that addresses out-of-distribution (OOD) errors in offline RL, achieving SOTA results on the D4RL benchmark, particularly in sparse-reward and low-data scenarios.
Zuxin Liu*, Zijian Guo*, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
Journal of Data-centric Machine Learning Research (DMLR); RSS 2023 Safe Autonomy Workshop (Spotlight)
TL;DR: We present a comprehensive benchmarking suite for offline safe RL, featuring expertly crafted safe policies, diverse datasets, and baseline implementations across 38 tasks, designed to accelerate the development and evaluation of safe RL algorithms in both training and deployment phases.
Paper / Website / Code (OSRL) / Code (DSRL) / Code (FSRL)
Zuxin Liu*, Zijian Guo*, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, Ding Zhao
ICML 2023
TL;DR: We propose CDT for offline safe RL, which leverages a multi-objective optimization approach to balance safety and task performance, achieving superior adaptability, robustness, and high-reward policies with zero-shot adaptation capabilities.