Publications

* denotes equal contribution; α-β order denotes alphabetical author ordering.

Conference Publications

  • DPO Meets PPO: Reinforced Token Optimization for RLHF
    Han Zhong*, Zikang Shan*, Guhao Feng*, Wei Xiong*, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
    International Conference on Machine Learning (ICML) 2025
    ICML 2024 Workshop on Models of Human Feedback for AI Alignment

Journal Publications

Preprints