Han Zhong (钟涵)

Ph.D. Student
Peking University
Email: hanzhong@stu.pku.edu.cn
Google Scholar / Twitter / WeChat

About Me

I am a Ph.D. student at Peking University, where I am fortunate to be advised by Professor Liwei Wang. Before that, I obtained a bachelor's degree in Mathematics from the University of Science and Technology of China (USTC). I have also had the privilege of conducting research at the Hong Kong University of Science and Technology (HKUST) with Professor Tong Zhang, at Microsoft Research Asia (MSRA) with Dr. Wei Chen, and at Northwestern University with Professor Zhaoran Wang.

I work on machine learning. The primary goal of my research is to design provably efficient and practical machine learning algorithms, particularly for interactive decision-making problems. To this end, my recent research focuses on reinforcement learning theory and its connections with operations research, statistics, and optimization. Currently, I am also interested in the role of reinforcement learning in foundation models, particularly for aligning large language models with human preferences and enhancing their reasoning capabilities. If we share research interests and you would like to explore a collaboration or simply have a discussion, feel free to contact me.

Selected Publications

Theoretical Foundations of Interactive Decision Making: We propose a unified framework, the generalized eluder coefficient (GEC), to study the statistical complexity of interactive decision making. We also reveal a potential representation complexity hierarchy among reinforcement learning paradigms, including model-based RL, policy-based RL, and value-based RL. Beyond classical RL, we provide the first proof that quantum computing enables a quadratic speedup in online exploration.

Reinforcement Learning from Human Feedback: We provide the first theoretical results for RLHF with function approximation. We also initiate the study of KL-constrained RLHF, under both sentence-level bandit and token-level MDP formulations. Our theoretical insights lead to iterative learning schemes for RLHF and to Reinforced Token Optimization (RTO).
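As a point of reference, the KL-constrained objective in this line of work takes the standard KL-regularized form (the notation here is illustrative, not taken from the papers):

$$\max_{\pi}\ \mathbb{E}_{x \sim d,\ y \sim \pi(\cdot\mid x)}\big[r(x,y)\big] - \beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big),$$

where $r$ is the learned reward model, $\pi_{\mathrm{ref}}$ is the reference policy, and $\beta > 0$ controls the strength of the KL penalty.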

Multi-Agent Reinforcement Learning: We develop the first line of efficient equilibrium-finding algorithms for offline Markov games and Stackelberg Markov games.

Robust Machine Learning: We provide a comprehensive study of distributionally robust RL, exploring its role in reducing the sim-to-real gap and investigating sample-efficient learning in both online and offline settings.

Policy Optimization: We provide theoretical guarantees for policy optimization algorithms, in particular for optimistic variants of proximal policy optimization (PPO).
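For context, vanilla PPO maximizes the clipped surrogate objective below (Schulman et al., 2017); the optimistic variants analyzed in this line of work augment such updates with exploration bonuses. The notation is the standard one, not taken from the papers:

$$\mathcal{L}^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},$$

where $\hat{A}_t$ is an advantage estimate and $\epsilon$ is the clipping parameter.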