# A Mathematical Introduction to Data Science (数据分析的数学导论), Spring 2016

## Course Information

### Synopsis (摘要)

This course is open to graduate students and senior undergraduates in applied mathematics and statistics who are interested in learning from data. Students from other backgrounds, such as engineering and biology, are also welcome, provided they have a certain level of mathematical maturity. The course starts from two curses of dimensionality: Stein's phenomenon in high-dimensional estimation and random matrix theory in PCA. It then covers fundamental topics in high-dimensional statistics, manifold learning, diffusion geometry, random walks on graphs, concentration of measure, random matrix theory, and geometric and topological methods.
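Stein's phenomenon mentioned above can be seen in a few lines of code: for dimension d ≥ 3, the James-Stein estimator, which shrinks the observation toward the origin, has strictly smaller risk than the maximum likelihood estimator under squared error loss. A minimal Monte Carlo sketch (the dimension, seed, and number of trials are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                       # Stein's phenomenon requires d >= 3
theta = rng.normal(size=d)   # unknown true mean vector
n_trials = 2000

mse_mle, mse_js = 0.0, 0.0
for _ in range(n_trials):
    x = theta + rng.normal(size=d)   # one observation, identity covariance
    # The MLE of theta is x itself; James-Stein shrinks x toward 0.
    js = (1 - (d - 2) / np.sum(x**2)) * x
    mse_mle += np.sum((x - theta)**2)
    mse_js += np.sum((js - theta)**2)

print(mse_mle / n_trials)  # average squared error of the MLE, close to d
print(mse_js / n_trials)   # strictly smaller, despite shrinking toward an arbitrary point
```

The striking part is that the shrinkage target (here the origin) is arbitrary, yet shrinkage still dominates the MLE whenever d ≥ 3; see [Stein56] and [EfronMorris74] in the references below.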
Prerequisites: linear algebra, basic probability and multivariate statistics, basic stochastic processes (Markov chains), and familiarity with Matlab or R.
Note: the website was broken due to a recent crash of the math.pku.edu.cn server and is still being restored...

### Time and Place:

Tuesday 3:10-6:00pm;
Science Lecture Hall (理教) Rm 408
eBanshu classroom

### Homework and Projects:

We plan weekly homework assignments with mini-projects, plus a final major project. There is no final exam. Scribes will receive bonus credit for their notes!

### Teaching Assistant (助教):

Yuan, Huizhuo (袁会卓) Email: datascience_hw (add "AT 126 DOT com" afterwards)

### Reference

• Books
• Papers
• [Achlioptas01] Achlioptas, Dimitris (2001) "Database-friendly Random Projections". Proc 20th ACM Symp Principles of Database Systems, Santa Barbara, CA, 2001, 274-281. [pdf].

• [Arun87] Arun, K. S., Huang, T. S., and Blostein, S. D. (1987) Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9 (5), pp. 698-700. [pdf].

• [Bavaud10] Bavaud, Francois (2010) "On the Schoenberg Transformations in Data Analysis: Theory and Illustrations".

• [Laplacian] Belkin, M. and P. Niyogi (2003) "Laplacian eigenmaps for dimensionality reduction and data representation.". Neural Computation 15:1373-1396. [pdf].

• [Belkin_Niyogi_NIPS2002] Belkin, M. and P. Niyogi (2002) "Using Manifold Structure for Partially Labelled Classification". NIPS 2002 [pdf].

• [Ye06] P. Biswas, T.-C. Liang, K.-C. Toh, T.-C. Wang, and Y. Ye (2006) "Semidefinite programming approaches for sensor network localization with noisy distance measurements". IEEE Transactions on Automation Science and Engineering, 3 (2006), pp. 360--371. [pdf].

• [BrinPage98] Sergey Brin, Larry Page (1998) "The Anatomy of a Large-Scale Hypertextual Web Search Engine". Proceedings of the 7th international conference on World Wide Web (WWW). Brisbane, Australia. pp. 107-117. [pdf].

• [RPCA] E. J. Candes, X. Li, Y. Ma, and J. Wright (2009) "Robust Principal Component Analysis?". Journal of the ACM, 58(1), 1-37. [pdf].

• [Parrilo_SIAM09] V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A. Willsky (2009) "Rank-Sparsity Incoherence for Matrix Decomposition". http://arxiv.org/pdf/0906.2220 . [pdf].

• [Chang08] Chang, Kung Ching, Kelly Pearson, and Tan Zhang (2008) "Perron-Frobenius theorem for nonnegative tensors". Commun. Math. Sci. Volume 6, Number 2 (2008), 507-520. [pdf].

• [Chung07] Chung, Fan R.K. (2007) "Four proofs for the Cheeger inequality and graph partition algorithms". ICCM 2007. [pdf].

• [Coifman05] Coifman, Lafon, Lee, Maggioni, Nadler, Warner, and Zucker (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps I; Multiscale Methods II. PNAS [I.pdf][II.pdf].

• [CoifmanLafon06] Coifman and Lafon (2006) Diffusion maps. Applied and Computational Harmonic Analysis. [pdf].

• [Dasgupta99] Dasgupta and Gupta (1999) "An Elementary Proof of the Johnson-Lindenstrauss Lemma".

• [SPCA_SDP] A. d'Aspremont, L. El Ghaoui, M. Jordan, and G. Lanckriet (2006) "A Direct Formulation of Sparse PCA using Semidefinite Programming". Preprint arxiv.org/pdf/cs/0406021. Published in SIAM Review, vol. 49, no. 3, 2007.

• [Hessian] Donoho, D.L. and C. Grimes (2003) "Hessian Eigenmaps: New Locally Linear Embedding Techniques for High Dimensional Data".

• [EfronMorris74] Efron, Bradley and Carl Morris (1974). Data Analysis using Stein's Estimator and Its Generalizations.

• [FanHoffman55] Fan, K. and Hoffman, A. J. (1955) Some Metric Inequalities in the Space of Matrices. Proceedings of the American Mathematical Society, 6 (1), pp. 111-116. [pdf].

• [Fouss07-CommuteDistance] Fouss, Francois, Alain Pirotte, Jean-michel Renders, and Marco Saerens (2007) Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3), pp. 355-369. [pdf].

• [GobelJagers74] Gobel, F. and A. Jagers (1974). "Random Walks on Graphs". Stochastic Processes and Their Applications, 2: 311-336. [pdf].

• [Hein05] Hein, M., J. Audibert, and U. von Luxburg (2005) From graphs to manifolds: weak and strong pointwise consistency of graph Laplacians, COLT, 2005. [pdf].

• [Hochbaum10] Hochbaum, Dorit (2010) "Polynomial Time Algorithms for Ratio Regions and a Variant of Normalized Cut". IEEE Trans. Pattern Analysis and Machine Intelligence, 32, 2010. [pdf].

• [Hunter06] Hunter, J.J. (2006) "Variances of first passage times in a Markov chain with applications to mixing times". Res. Lett. Inf. Math. Sci., 10:17-48, 2006. [pdf].

• [Indyk98] Indyk, P. and R. Motwani (1998) "Approximate nearest neighbors: Towards removing the curse of dimensionality". Proc 30th Annu ACM Symp Theory of Computing, Dallas, TX, 1998, pp. 604-613. [pdf].

• [Johnstone06] Johnstone, I (2006) High Dimensional Statistical Inference and Random Matrices.

• [Jones11] Peter Wilcox Jones, Andrei Osipov, and Vladimir Rokhlin (2011) Randomized Approximate Nearest Neighbors Algorithm. PNAS, 2011 [pdf].

• [Keller75] Keller, J. B. (1975) Closest Unitary, Orthogonal and Hermitian Operators to a Given Operator. Mathematics Magazine, 48 (4), pp. 192-197. [pdf].

• [Kleinberg99] Kleinberg, Jon (1999). "Authoritative sources in a hyperlinked environment". Journal of the ACM 46 (5): 604-632. [pdf].

• [KleinRandic93] Klein, D.J. and M. Randic (1993). "Resistance Distance". J. Math. Chemistry 12: 81-95. [pdf].

• [Li2008] Li J.Z., et al. (2008). "Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation".

• [Luxburg07] Ulrike von Luxburg (2007). A tutorial on spectral clustering.

• [Luxburg08] Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet (2008). Consistency of Spectral Clustering. Ann. Stat. 36(2): 555-586. [pdf]

• [MeilaShi01] Meila and Shi (2001). "A random walk view of spectral segmentation". AISTAT'01 [pdf, 7.7 MB].

• [Nadler_Srebro_NIPS2009] Nadler, Boaz, Nathan Srebro, and Xueyuan Zhou (2009) "Semi-Supervised Learning with the Graph Laplacian: The Limit of Infinite Unlabelled Data". NIPS 2009 [pdf].

• [Nadakuditi10] Nadakuditi, R. R. and F. Benaych-Georges (2010) The breakdown point of signal subspace estimation. IEEE Sensor Array and Multichannel Signal Processing Workshop (October 2010), pg. 177-180 [pdf].

• [QiuHancock07] Qiu, Huaijun, and E.R. Hancock (2007) "Clustering and Embedding Using Commute Times", IEEE Trans. Pattern Analysis and Machine Intelligence, 29(11): 1873-1890. [pdf].

• [RadLuxHei09] Radl, Agnes, Ulrike von Luxburg, and Matthias Hein (2009) The Resistance Distance is Meaningless for Large Random Geometric Graphs.

• [LLE] Roweis, Sam T. and Lawrence K. Saul (2000) Locally Linear Embedding. Science, 290:2323-2326. [LLE Website].

• [ShiMalik00] Shi, Jianbo and Jitendra Malik (2000). "Normalized Cuts and Image Segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence,22(8): 888-905. [pdf].

• [Singer06] Singer, Amit (2006) From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis. [pdf].

• [Stein56] Stein, Charles (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1, pp. 197-206. [pdf]

• [ISOMAP] Tenenbaum, J.B., V. de Silva and J. C. Langford (2000). A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290:2319-2323. [ISOMAP Website]

• [MDS'2014] Tsang, Jeffrey and Rajesh Pereira (2014). Taking all positive eigenvectors is suboptimal in classical multidimensional scaling. arXiv:1402.2703v1

• [MVU] Weinberger, Kilian Q. and Lawrence K. Saul (2006). "Unsupervised Learning of Image Manifolds by Semidefinite Programming". International Journal of Computer Vision 70(1), 77-90, 2006 [pdf]

• [ZhaZha09] Hongyuan Zha and Zhenyue Zhang (2009). "Spectral properties of the alignment matrices in manifold learning". SIAM Review. [pdf]

• [LTSA] Zhenyue Zhang and Hongyuan Zha (2005). "Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment". SIAM Journal on Scientific Computing 26(1) [pdf]

• [ZhuLaf_ICML2003] Xiaojin Zhu, Zoubin Ghahramani and John Lafferty (2003). "Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions". ICML 2003 [pdf]

• Datasets
• Latex Template for Lecture Notes

by YAO, Yuan.