1:30 pm VIA ZOOM
The catapult phase of deep learning.
Bio: Guy Gur-Ari is a staff research scientist at Google, where he leads a research team focused on taking a scientific approach to advancing the field of deep learning. His recent research applies ideas from theoretical physics to understand the training dynamics of large neural networks. Before becoming interested in the science of deep learning, Guy worked on high-energy theoretical physics as a postdoctoral fellow at the Institute for Advanced Study in Princeton and at Stanford University. He obtained his PhD in physics from the Weizmann Institute of Science in 2014.
Abstract: Why do large learning rates often produce better results? Why do “infinitely wide” networks trained using kernel methods tend to underperform ordinary networks? In this talk I will argue that these questions are related. Existing kernel-based theory can explain the dynamics of networks trained with small learning rates. However, optimal performance is often achieved at large learning rates, where we find qualitatively different dynamics that converge to flat minima. The distinction between the small and large learning rate phases becomes sharp at infinite width, and is reminiscent of nonperturbative phase transitions that appear in physical systems.
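The two regimes described in the abstract can be illustrated in a toy model (this sketch is not from the talk; the model, numbers, and threshold below are illustrative assumptions). Consider gradient descent on a two-parameter "deep linear" model f = a*b with squared loss: the local curvature scales like a^2 + b^2, a small learning rate keeps the weights near their (sharp) initial region as linearized, kernel-style theory predicts, while a learning rate above the linearized stability edge 2/(a0^2 + b0^2) still converges, but to a flatter minimum:

```python
# Toy sketch (illustrative, not from the talk): gradient descent on f = a * b
# fit to a single target y = 1, with loss L = (a*b - y)^2 / 2.
# The local curvature (Hessian scale) of L is ~ a^2 + b^2.

def train(eta, a=2.0, b=1.0, y=1.0, steps=500):
    """Run plain gradient descent with learning rate eta; return final (a, b)."""
    for _ in range(steps):
        r = a * b - y                      # residual
        a, b = a - eta * b * r, b - eta * a * r   # simultaneous update
    return a, b

a0, b0 = 2.0, 1.0
threshold = 2.0 / (a0**2 + b0**2)          # stability edge of the linearized dynamics

a_s, b_s = train(eta=0.1)                  # small learning rate: stays near init
a_l, b_l = train(eta=0.5)                  # above the threshold (0.4)

print(f"small eta: curvature ~ {a_s**2 + b_s**2:.2f}")
print(f"large eta: curvature ~ {a_l**2 + b_l**2:.2f}")   # flatter minimum
```

Both runs fit the target, but the large-learning-rate run ends at a minimum with visibly smaller curvature, a cartoon of the qualitative distinction the abstract describes; at infinite width this distinction becomes a sharp phase boundary.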