Han Zhong (钟涵)
About Me

I am a Ph.D. student at Peking University, where I am fortunate to be advised by Professor Liwei Wang. Before that, I obtained a bachelor's degree in Mathematics from the University of Science and Technology of China (USTC). I have also had the privilege of conducting research at the Hong Kong University of Science and Technology (HKUST), where I collaborated with Professor Tong Zhang, and at Microsoft Research Asia (MSRA), where I worked with Dr. Wei Chen. This fall, I am a visiting student at Northwestern University, hosted by Professor Zhaoran Wang.

I work on machine learning. The primary goal of my research is to design provably efficient and practical machine learning algorithms, particularly for interactive decision-making problems. To this end, my recent research focuses on reinforcement learning theory. I am also interested in the role of reinforcement learning in foundation models, such as aligning large language models with RLHF. If you share similar interests and would like to explore a collaboration or simply have a discussion, feel free to contact me.

Selected Publications

Theoretical Foundation of Interactive Decision Making: We propose a unified framework, GEC, to study the statistical complexity of interactive decision making. We also reveal a potential representation complexity hierarchy among different reinforcement learning paradigms, including model-based RL, policy-based RL, and value-based RL.
Reinforcement Learning from Human Feedback: We provide the first theoretical result for RLHF with function approximation. We also initiate the study of RLHF with a KL constraint, under both sentence-wise bandit and token-wise MDP formulations. Our theoretical insights lead to the development of iterative learning in RLHF and Reinforced Token Optimization (RTO).
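For context, the KL-constrained objective underlying this line of work is typically of the following form (a standard formulation with illustrative notation, not the exact objective of any specific paper; here $r$ denotes a reward model, $\pi_{\mathrm{ref}}$ a fixed reference policy, and $\beta$ the regularization strength):
\[
\max_{\pi}\ \mathbb{E}_{x\sim\mathcal{D}}\Big[\,\mathbb{E}_{y\sim\pi(\cdot\mid x)}\big[r(x,y)\big]\;-\;\beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)\Big].
\]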
Multi-Agent Reinforcement Learning: We develop the first line of efficient equilibrium-finding algorithms for offline Markov games and Stackelberg Markov games.
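As a rough illustration of the solution concept involved (standard notation, not tied to a particular paper): a policy profile $\pi^\star = (\pi_1^\star, \ldots, \pi_n^\star)$ is a Nash equilibrium of a Markov game if no player can improve its own value by unilaterally deviating,
\[
V_i^{\pi_i^\star,\,\pi_{-i}^\star}(s)\;\ge\;V_i^{\pi_i,\,\pi_{-i}^\star}(s)\qquad\text{for all players } i,\ \text{policies } \pi_i,\ \text{and states } s.
\]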
Robust Machine Learning: We provide a comprehensive study of distributionally robust RL, exploring its role in reducing sim-to-real gaps and investigating sample-efficient learning in online and offline settings.
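The distributionally robust formulation can be summarized by the robust Bellman equation below (a standard version with an $(s,a)$-rectangular uncertainty set $\mathcal{U}(s,a)$ of transition kernels around a nominal model; notation is illustrative):
\[
V^{\pi}_{\mathrm{rob}}(s)\;=\;\mathbb{E}_{a\sim\pi(\cdot\mid s)}\Big[r(s,a)\;+\;\gamma\,\inf_{P\in\mathcal{U}(s,a)}\mathbb{E}_{s'\sim P}\big[V^{\pi}_{\mathrm{rob}}(s')\big]\Big].
\]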
Policy Optimization: We provide theoretical guarantees for policy optimization algorithms, especially optimistic proximal policy optimization (PPO).
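The policy optimization schemes analyzed in this line of work typically take a KL-regularized, mirror-descent-style update of roughly the following form, where $Q^{k}$ is an optimistic estimate of the action-value function (e.g., a regression estimate plus an exploration bonus) and $\alpha$ is a step size; this is a generic sketch rather than the exact update of any particular paper:
\[
\pi^{k+1}(\cdot\mid s)\;\propto\;\pi^{k}(\cdot\mid s)\,\exp\!\big(\alpha\,Q^{k}(s,\cdot)\big).
\]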