If you, like me, entered the field of deep learning with a background in traditional machine learning, you may often ponder this question: since a typical deep neural network has so many parameters and its training error can easily be driven to zero, it should surely suffer from substantial overfitting. How could it ever generalize to out-of-sample data points?
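To see just how easy "perfect training error" is, here is a minimal sketch (assuming PyTorch; the dataset and model sizes are purely illustrative). An over-parameterized MLP with ~20k weights is fit to only 100 points carrying *random* labels, echoing the well-known memorization experiments of Zhang et al. (2017): there is nothing to learn, yet the network can still fit the training set perfectly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 100 random inputs with random binary labels -- no signal to "learn".
X = torch.randn(100, 20)
y = torch.randint(0, 2, (100,))

# Far more parameters (~22k) than training points (100).
model = nn.Sequential(
    nn.Linear(20, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_acc = (model(X).argmax(dim=1) == y).float().mean()
print(f"final loss {loss.item():.4f}, train accuracy {train_acc:.2%}")
# Typically reaches 100% train accuracy: a perfect fit to pure noise.
```

That such a model can memorize noise, yet the same architectures generalize well when trained on real data, is exactly the puzzle this post explores.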