PKU

A Mathematical Introduction to Data Science (数据分析的数学导论)
Spring 2016


Course Information

Synopsis (摘要)

This course is open to graduate students and senior undergraduates in applied mathematics and statistics who are interested in learning from data. Students from other backgrounds, such as engineering and biology, are also welcome, provided they have sufficient mathematical maturity. It starts from two curses of dimensionality, Stein's phenomenon and random matrix theory in PCA, then covers fundamental topics in high-dimensional statistics, manifold learning, diffusion geometry, random walks on graphs, concentration of measure, random matrix theory, and geometric and topological methods.
Prerequisites: linear algebra, basic probability and multivariate statistics, basic stochastic processes (Markov chains); familiarity with Matlab or R.
Note: the website went down after a recent crash of the math.pku.edu.cn server and is still being restored.
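
As a small illustration of the Stein's phenomenon mentioned in the synopsis, the following sketch compares the squared-error risk of the maximum likelihood estimate with that of the James-Stein estimator for a Gaussian mean in dimension p >= 3. It is a minimal simulation under stated assumptions, not course material: the true mean `mu`, the dimension `p`, the number of trials, and the function names are all hypothetical choices made for this example.

```python
import numpy as np


def james_stein(x, sigma2=1.0):
    """James-Stein shrinkage toward the origin.

    Each row of x is one observation x ~ N(mu, sigma2 * I_p) with p >= 3.
    """
    p = x.shape[-1]
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True)
    shrink = 1.0 - (p - 2) * sigma2 / sq_norm
    return shrink * x


def compare_risk(p=10, trials=20000, seed=0):
    """Monte Carlo estimate of E||estimate - mu||^2 for the MLE and James-Stein."""
    rng = np.random.default_rng(seed)
    mu = np.ones(p)  # hypothetical true mean, chosen only for illustration
    x = mu + rng.standard_normal((trials, p))  # one noisy observation per trial
    mle_risk = np.mean(np.sum((x - mu) ** 2, axis=1))  # the MLE is x itself
    js_risk = np.mean(np.sum((james_stein(x) - mu) ** 2, axis=1))
    return mle_risk, js_risk


mle_risk, js_risk = compare_risk()
print(f"MLE risk:         {mle_risk:.3f}")
print(f"James-Stein risk: {js_risk:.3f}")
```

In this setup the MLE risk should be close to p, while the James-Stein risk should come out smaller; the size of the gap depends on how far `mu` is from the shrinkage target.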

Lecture Notes (constantly updated)

[pdf download]

Time and Place:

Tuesday 3:10-6:00pm;
Science Lecture Hall (理教) Rm 408
eBanshu classroom

Homework and Projects:

We aim for weekly homework with mini-projects and one major final project. There is no final exam. Scribers will get bonus credit for their work!

Teaching Assistant (助教):

Yuan, Huizhuo (袁会卓) Email: datascience_hw (add "AT 126 DOT com" afterwards)

Schedule (时间表)

Date Topic Instructor Scriber
02/23/2016, Tue Lecture 00: Introduction to Course Syllabus [pdf]
Yuan Yao
03/01/2016, Tue Lecture 01: Maximum Likelihood Estimate and Stein's Phenomenon [pdf]
    [Homework 1]:
  • Homework 1 [pdf]. Deadline: 3/08/2016, Tuesday. Write your name and student ID at the top of your homework.
Yuan Yao
03/08/2016, Tue Lecture 02: Random Matrix Theory and Phase Transitions in PCA [pdf]
    [Homework 2]:
  • Homework 2 [pdf]. Deadline: 3/15/2016, Tuesday. Write your name and student ID at the top of your homework.
Yuan Yao
03/15/2016, Tue Lecture 03: Geometry of PCA and MDS [pdf]
    [Homework 3]:
  • Homework 3 [pdf]. Deadline: 3/22/2016, Tuesday. Write your name and student ID at the top of your homework.
Yuan Yao
03/22/2016, Tue Lecture 04: MDS and Random Projections [pdf]
    [Homework 4]:
  • Homework 4 [pdf]. Deadline: 3/29/2016, Tuesday. Write your name and student ID at the top of your homework.
Yuan Yao Zhu, Weizhi
03/29/2016, Tue Lecture 05: Introduction to Compressed Sensing and High Dimensional Statistics: OMP, BP, LASSO, and ISS [pdf]
    [Homework]: Enjoy the holiday break!
Yuan Yao Zhan, Ruohan
Zhu, Weizhi
04/05/2016, Tue Lecture 06: Robust PCA and Sparse PCA: SDP approach [pdf]
    [Homework 5]:
  • Homework 5 [pdf]. Deadline: 4/12/2016, Tuesday. Write your name and student ID at the top of your homework.
Yuan Yao
04/12/2016, Tue Lecture 07: Generalized MDS (Sensor Network Localization): SDP approach [pdf]
    [Homework 6]:
  • Homework 6 [pdf]. Deadline: 4/19/2016, Tuesday. Write your name and student ID at the top of your homework.
Yuan Yao
04/19/2016, Tue Lecture 08: Manifold Learning I: ISOMAP and LLE [pdf]
    [Homework 7]:
  • Homework 7 [pdf]. Deadline: 4/26/2016, Tuesday. Write your name and student ID at the top of your homework.
Yuan Yao
04/26/2016, Tue Lecture 09: Manifold Learning II: generalized LLE -- Laplacian, Hessian, Diffusion, LTSA, and VDM [pdf]
    [Homework 8]:
  • Homework 8 [pdf]. Deadline: 5/3/2016, Tuesday. Write your name and student ID at the top of your homework.
    [Project 1]:
  • Mini Project 1 [pdf]. Deadline: 5/17/2016, Tuesday. Write your name and student ID at the top of your report.
Yuan Yao
05/03/2016, Tue Lecture 10: Perron-Frobenius Theory vs. PageRank and Fiedler/Cheeger Cut vs. Spectral Bi-partition
Yuan Yao
05/10/2016, Tue Lecture 11: Lumpable Markov Chains vs. Multiple Spectral Clustering and Transition Path Theory vs. Semi-supervised Learning
Yuan Yao
05/17/2016, Tue Lecture 12: An Introduction to Topological Data Analysis [pdf]
    [Seminar]:
  • Speaker: Dr. Ke Ye, Department of Statistics, University of Chicago
  • Title: The distance between linear subspaces of different dimensions
  • Abstract: The distance between linear subspaces of the same dimension is well known and can be easily computed by singular value decomposition (SVD). In this talk, we first review classical results and then discuss how to generalize the notion of distance to linear subspaces of different dimensions, from a geometric point of view. Our results are based on the observation that for linear subspaces A, B of different dimensions, there are two natural candidates for the distance between A and B. It turns out that the two candidates actually coincide. With this observation, we are able to derive distances on the Sato Grassmannian, which is defined as the union of all Grassmannians. Such a distance can also be easily computed by SVD. If time permits, we will also explain how to generalize our results to affine linear subspaces of different dimensions. This is joint work with Lek-Heng Lim.
  • Reference: Ke Ye and Lek-Heng Lim, "Schubert varieties and distances between subspaces of different dimensions."
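
The classical same-dimension case mentioned in the abstract can be sketched as follows: the principal angles between two subspaces are the arccosines of the singular values of Qa^T Qb, where Qa and Qb are orthonormal bases, and the geodesic (Grassmann) distance is the Euclidean norm of the vector of angles. This is a minimal sketch for subspaces of equal dimension only; the function names `principal_angles` and `grassmann_distance` are hypothetical, and the generalization to different dimensions discussed in the talk is not implemented here.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles between the column spans of A and B.

    A and B are n x k matrices with full column rank (same k assumed).
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def grassmann_distance(A, B):
    """Geodesic distance on the Grassmannian: 2-norm of the principal angles."""
    return np.linalg.norm(principal_angles(A, B))


# Two planes in R^3 sharing the e1 axis: span{e1, e2} and span{e1, e3}.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
d_ab = grassmann_distance(A, B)
d_aa = grassmann_distance(A, A)
print(d_ab, d_aa)
```

For the two planes above, one principal angle is 0 (the shared axis) and the other is pi/2, so the distance is pi/2; the distance from a subspace to itself is 0 up to round-off.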
Yuan Yao
05/18/2016, Wed Seminar: The Cohomology of the Cryo-EM problem
  • Speaker: Dr. Ke Ye, Department of Statistics, University of Chicago
  • Abstract: Cryo-electron microscopy (Cryo-EM) is a technique for studying the 3D structure of molecules. For a given type of molecule, we freeze samples rapidly in a thin layer of ice (without crystals). We take tomographic images of each sample to obtain a large collection of 2D images. The goal of the Cryo-EM problem is to reconstruct the 3D structure of the molecule from those 2D images. In this talk, we first present the Hadani-Singer model for the Cryo-EM problem and then review the necessary results from algebraic topology. Lastly, we relate the data obtained from 2D projection images to the cohomology of a two-dimensional simplicial complex with coefficients in SO(2). This is joint work with Lek-Heng Lim.
Yuan Yao
05/24/2016, Tue Lecture 13: Applied Hodge Theory [pdf]
Yuan Yao
05/31/2016, Tue Lecture 14: Final Project [pdf]
Yuan Yao

by YAO, Yuan.