Publications
* denotes equal contribution; α-β order denotes alphabetical author ordering
Conference Publications
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong*, Zikang Shan*, Guhao Feng*, Wei Xiong*, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
International Conference on Machine Learning (ICML) 2025
ICML 2024 Workshop on Models of Human Feedback for AI Alignment
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang
International Conference on Machine Learning (ICML) 2025
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu*, Miao Lu*, Wei Xiong*, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
Conference on Neural Information Processing Systems (NeurIPS) 2023
Journal Publications
Preprints