Instructor: Yaniv Plan
Office: 1219 Math Annex
Email: yaniv (at) math (dot) ubc (dot) ca

Lectures: TuTh, 11:00 am to 12:30 pm, in MATX 1102.

Office hours: Mo 1:00 pm to 2:00 pm, MATX 1219 and by request.

Prerequisites: The course will assume basic knowledge of probability, linear algebra, and functional analysis. For example, I will assume familiarity with stochastic processes, norms, singular values, and Lipschitz functions.

Overview: See here. Given student interest, we will delve into machine learning theory in the later part of the class (VC dimension and Rademacher complexity). We will apply this theory to deep learning, discuss why it does not explain the success of deep learning, and pursue alternative routes.

Detailed course outline: See here. (We probably won’t cover all of those topics.)

Textbook: This course is based on a similar course from Roman Vershynin. There is no required textbook. The following references cover some of the material, and they are available online:

Homework 2. Due Thursday, March 14. Please feel free to come talk to me for more perspective on the open questions.

Project: First, please read over this paper. Let’s concentrate on ReLU activation functions as in Theorem 1.1. For simplicity, let’s first assume \epsilon = \eta = 0. Do you think that the requirement m > k d log(n) is tight? What would be the right bound if the activation function were the identity?
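One way to build intuition for the identity-activation question is a small numerical sanity check. The sketch below is only an illustration under stated assumptions, not part of the paper: it assumes Gaussian weight matrices and a Gaussian measurement matrix A, and the helper names (linear_generator, recover, recovery_error) are hypothetical. With identity activations the network is a single n x k matrix, so its range is a subspace of dimension at most k, and the experiment compares noiseless recovery with m = k and m = k - 1 measurements. It says nothing about the ReLU case.

```python
import numpy as np


def linear_generator(k, n, depth, rng):
    """Return the n x k matrix W_depth ... W_1 of a net with identity activations.

    Assumption for illustration: every layer has i.i.d. Gaussian weights.
    """
    widths = [k] + [n] * depth
    G = np.eye(k)
    for i in range(depth):
        W = rng.standard_normal((widths[i + 1], widths[i])) / np.sqrt(widths[i])
        G = W @ G
    return G


def recover(A, G, y):
    """Least-squares estimate of the latent code z from noiseless y = A G z."""
    z_hat, *_ = np.linalg.lstsq(A @ G, y, rcond=None)
    return z_hat


rng = np.random.default_rng(0)
k, n, depth = 5, 200, 3          # latent dimension, signal dimension, layers
G = linear_generator(k, n, depth, rng)
z = rng.standard_normal(k)       # ground-truth latent code


def recovery_error(m):
    """Relative error in signal space when using m Gaussian measurements."""
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    y = A @ G @ z                # epsilon = eta = 0: no noise, exact minimizer
    z_hat = recover(A, G, y)
    return np.linalg.norm(G @ z_hat - G @ z) / np.linalg.norm(G @ z)


err_enough = recovery_error(k)       # m = k measurements
err_too_few = recovery_error(k - 1)  # m = k - 1 measurements
print("m = k:    ", err_enough)
print("m = k - 1:", err_too_few)
```

In this linear setting the number of measurements needed does not depend on the depth d or on n, which is a useful baseline to keep in mind when asking whether the d and log(n) factors are needed for ReLU.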

Fast Convex Pruning of Deep Neural Networks. This one could be useful if you decide to add sparsity as an extra structure to the neural net. If the weight matrices are sparsified, does this affect the Gaussian mean width of the range?

Probability in high dimensions


Grading: 50% bi-weekly homework, 50% project. The project may be a literature review or a mini-research problem.

Homework 1. Due Tuesday, February 5. Also see Nick Harvey’s notes on a stochastic gradient descent question.


Potential related papers to present: