Han Zhong (钟涵)

Ph.D. Student
Peking University
Email: hanzhong@stu.pku.edu.cn
Google Scholar / Twitter / WeChat

About Me

I am a Ph.D. student at Peking University, where I am fortunate to be advised by Professor Liwei Wang. Before that, I obtained a bachelor's degree in Mathematics from the University of Science and Technology of China (USTC). I have also had the privilege of conducting research at the Hong Kong University of Science and Technology (HKUST) with Professor Tong Zhang, at Microsoft Research Asia (MSRA) with Dr. Wei Chen, and at Northwestern University with Professor Zhaoran Wang.

I work on machine learning. The primary goal of my research is to design provably efficient and practical machine learning algorithms, particularly for interactive decision-making problems. To this end, my recent research focuses on reinforcement learning theory and its connections with operations research, statistics, and optimization. Currently, I am also interested in the role of reinforcement learning in foundation models, particularly for aligning large language models with human preferences and enhancing their reasoning capabilities. If we share research interests and you would like to explore a collaboration or simply have a discussion, feel free to contact me.

Selected Publications

Theoretical Foundations of Interactive Decision Making: We propose a unified framework, the generalized eluder coefficient (GEC), to study the statistical complexity of interactive decision making. We also reveal a potential representation complexity hierarchy among reinforcement learning paradigms, including model-based RL, policy-based RL, and value-based RL. Beyond classical RL, we provide the first proof that quantum computing enables a quadratic speedup in online exploration.

Reinforcement Learning from Human Feedback: We provide the first theoretical results for RLHF with function approximation. We also initiate the study of KL-constrained RLHF, under both sentence-level bandit and token-level MDP formulations. Our theoretical insights lead to iterative learning schemes for RLHF and to Reinforced Token Optimization (RTO).
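As a point of reference, the KL-constrained objective in this line of work takes the standard KL-regularized form (the notation here is illustrative, not taken from the papers):

$$\max_{\pi}\ \mathbb{E}_{x \sim d,\ y \sim \pi(\cdot\mid x)}\big[r(x,y)\big] - \beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big),$$

where $r$ is the learned reward model, $\pi_{\mathrm{ref}}$ is the reference policy, and $\beta > 0$ controls the strength of the KL penalty.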

Multi-Agent Reinforcement Learning: We develop the first line of efficient equilibrium-finding algorithms for offline Markov games and Stackelberg Markov games.

Robust Machine Learning: We provide a comprehensive study of distributionally robust RL, exploring its role in reducing the sim-to-real gap and investigating sample-efficient learning in both online and offline settings.

Policy Optimization: We provide theoretical guarantees for policy optimization algorithms, in particular for optimistic variants of proximal policy optimization (PPO).
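For context, vanilla PPO maximizes the clipped surrogate objective below (Schulman et al., 2017); the optimistic variants analyzed in this line of work augment such updates with exploration bonuses. The notation is the standard one, not taken from the papers:

$$\mathcal{L}^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},$$

where $\hat{A}_t$ is an advantage estimate and $\epsilon$ is the clipping parameter.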