Probability in high dimensions

Instructor: Yaniv Plan
Office: 1219 Math Annex
Email:  yaniv (at) math (dot) ubc (dot) ca

Lectures: TuTh, 11:00 am to 12:30 pm, in MATX 1102.

Office hours: Mo 1:00 pm to 2:00 pm, MATX 1219 and by request.

Prerequisites: The course will assume basic knowledge of probability, linear algebra, and functional analysis.  For example, I will assume you have familiarity with stochastic processes, norms, singular values, and Lipschitz functions.

Overview:  See here.  Given student interest, we will delve into machine learning theory in the later part of the class (VC dimension and Rademacher complexity).  We will apply this theory to deep learning, discuss why it is not explaining the success of deep learning, and pursue alternate routes.

Detailed course outline: See here.  (We probably won’t cover all of those topics.)

Textbook:  This course is based on a similar course from Roman Vershynin.  There is no required textbook.  The following references cover some of the material, and they are available online:

  1. R. Vershynin, High-dimensional probability.  This book has the most overlap with our course.
  2. T. Tao, Topics in random matrix theory.
  3. D. Chafai, O. Guedon, G. Lecue, A. Pajor, Interactions between compressed sensing, random matrices and high dimensional geometry, preprint.
  4. R. Vershynin, Lectures in geometric funcitonal analysis.
  5. R. Adler, J. Taylor, Random fields and geometry.
  6. S. Foucart, H. Rauhut, A mathematical introduction to compressive sensing.
  7. J. Lee, notes on generic chaining and majorizing measures theorem.
  8. Theory of deep learning class at UdeM.
  9. Berkeley 2-month program on theory of deep learning, summer 2019.
  10. Earlier version of this course, which contains a series of notes.  For the beginning of the course, we will roughly follow the same notes.

Grading: 50% bi-weekly homework, 50% project.  The project may be a literature review or a mini-research problem.

Homework 1.  Due Tuesday, February 5.  Also see Nick Harvey’s notes on a stochastic gradient descent question.