Professor at the School of Information Science and Technology, Peking University. His research focuses on machine learning theory, and he has published over one hundred papers in leading international machine learning journals and conferences. He has served as an area chair for the flagship machine learning conferences NIPS and IJCAI. In 2011 he was named to "AI's 10 to Watch" by the international journal IEEE Intelligent Systems, becoming the first Chinese scholar to receive this honor since the award was established. In 2012 he received one of the inaugural Excellent Young Scientists Fund grants from the National Natural Science Foundation of China.
Talk abstract: Deep learning has achieved great success in many applications. However, deep learning remains a mystery from a learning theory point of view. In all typical deep learning tasks, the number of free parameters of the networks is at least an order of magnitude larger than the number of training data points. This rules out the possibility of using any model complexity-based learning theory (VC dimension, Rademacher complexity, etc.) to explain the good generalization ability of deep learning. Indeed, the best paper of ICLR 2017, "Understanding Deep Learning Requires Rethinking Generalization", conducted a series of carefully designed experiments and concluded that all previously well-known learning theories fail to explain the phenomenon of deep learning.
In this talk, I will present two theories characterizing the generalization ability of Stochastic Gradient Langevin Dynamics (SGLD), a variant of the Stochastic Gradient Descent (SGD) algorithm commonly used in deep learning. Building upon tools from stochastic differential equations and partial differential equations, I will show that SGLD has strong generalization power. The theory also explains several phenomena observed in deep learning experiments.
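For readers unfamiliar with SGLD, the core idea is a gradient step perturbed by Gaussian noise whose variance is tied to the step size. A minimal illustrative sketch follows; the step size, noise scaling, and toy quadratic loss are illustrative assumptions, not details from the talk:

```python
import numpy as np

def sgld_step(theta, grad, step_size, rng):
    """One SGLD update: half a gradient step plus Gaussian noise
    with variance equal to the step size (standard SGLD scaling)."""
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta - 0.5 * step_size * grad(theta) + noise

# Toy example (assumed for illustration): loss L(theta) = ||theta||^2 / 2,
# so SGLD samples from the Gibbs distribution exp(-L), i.e. a standard Gaussian.
rng = np.random.default_rng(0)
grad = lambda th: th  # gradient of the quadratic loss
theta = np.array([5.0, -3.0])
for _ in range(2000):
    theta = sgld_step(theta, grad, step_size=0.01, rng=rng)
```

Unlike plain SGD, the injected noise keeps the iterates stochastic even at a minimum; it is this randomization that the stability-based generalization analyses of SGLD exploit.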