Talk Title: Towards Understanding Deep Learning: Two Theories of Stochastic Gradient Langevin Dynamics
Abstract: Deep learning has achieved great success in many applications. However, from a learning-theory point of view, deep learning remains a mystery. In all typical deep learning tasks, the number of free parameters of the network is at least an order of magnitude larger than the number of training examples. This rules out any explanation of deep learning's good generalization ability based on classical model-complexity measures (VC dimension, Rademacher complexity, etc.). Indeed, the ICLR 2017 best paper, "Understanding Deep Learning Requires Rethinking Generalization", presented a series of carefully designed experiments and concluded that all previously well-known learning theories fail to explain the phenomenon of deep learning.
In this talk, I will give two theories characterizing the generalization ability of Stochastic Gradient Langevin Dynamics (SGLD), a variant of the Stochastic Gradient Descent (SGD) algorithm commonly used in deep learning. Building upon tools from stochastic differential equations and partial differential equations, I show that SGLD has strong generalization power. The theories also explain several phenomena observed in deep learning experiments.
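For readers unfamiliar with SGLD, below is a minimal sketch of one update step in a standard formulation (an SGD step plus isotropic Gaussian noise whose scale is set by the step size and an inverse temperature); the parameter names, the noise scaling, and the toy loss are illustrative assumptions, not the speaker's exact formulation.

```python
import numpy as np

def sgld_step(theta, grad_fn, batch, lr=1e-2, beta=1e3, rng=None):
    """One SGLD update: an SGD step plus injected Gaussian noise.

    theta:   current parameter vector
    grad_fn: callable returning the mini-batch gradient of the loss at theta
    lr:      step size (eta)
    beta:    inverse temperature controlling the noise magnitude
    """
    rng = rng or np.random.default_rng()
    grad = grad_fn(theta, batch)               # stochastic gradient, as in plain SGD
    noise = rng.normal(size=theta.shape)       # isotropic Gaussian noise
    # Noise scale sqrt(2 * lr / beta) is one common convention for SGLD.
    return theta - lr * grad + np.sqrt(2.0 * lr / beta) * noise

# Toy usage on the quadratic loss L(theta) = 0.5 * ||theta||^2,
# whose gradient is simply theta (no real mini-batches needed here).
theta = np.ones(5)
for _ in range(1000):
    theta = sgld_step(theta, lambda t, b: t, batch=None)
```

As beta grows, the injected noise vanishes and the update reduces to plain SGD; the added randomness is what makes the dynamics amenable to analysis via stochastic differential equations.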