I wrote these notes while studying Francis Bach’s Learning Theory from First Principles under the guidance of Prof. Michael Riis Andersen. The material is graduate-level and covers the mathematical foundations of why and how machine learning algorithms generalize.
The notes follow Bach’s text chapter by chapter. Each chapter includes preliminary definitions, worked derivations, standalone proofs, and a brief discussion of the significance of the results. I also drew on a range of outside sources to supplement proofs and build intuition.
Topics covered include concentration inequalities (Hoeffding, McDiarmid), PAC learning, empirical risk minimization, Rademacher complexity, and optimization for machine learning.