Dates and Titles 
Topics 
Lecture Slides 
Suggested Further Readings 
Lecture 1
Introduction to machine learning 

Supervised learning

Unsupervised learning

Probability primer



Lecture 2
Density estimation 

Maximum likelihood estimation

MAP estimation

Bayesian estimation



Lecture 3
Clustering I 
 k-means clustering
 Mixture of Gaussians (MoG)


 Chapter 9.1 and 9.2 in Bishop's PRML.

Lecture 4
Expectation Maximization 
 Jensen's inequality
 Information theory preliminaries
 EM optimization
 Generalized EM
 Incremental EM
 EM for exponential families


 Chapter 9.3 and 9.4 in Bishop's PRML.

A. P. Dempster, N. M. Laird, and D. B. Rubin (1977)
"Maximum likelihood from incomplete data via the EM algorithm
(with discussion),"
Journal of the Royal Statistical Society B,
vol. 39, pp. 1-38, 1977.

R. M. Neal and G. E. Hinton (1999)
"A view of the EM algorithm that
justifies incremental, sparse, and other variants,"
Learning in Graphical Models (edited by M. Jordan),
pp. 355-368, 1999.

Lecture 5
Latent variable models 
 Maximum likelihood factor analysis
 Probabilistic PCA
 Mixture of factor analyzers
 Mixture of probabilistic principal component analyzers
 SVD
 Probabilistic latent semantic analysis (PLSA)


 Chapter 12.1 and 12.2 in Bishop's PRML.
 C. M. Bishop (1999),
"Latent variable models,"
In M. I. Jordan (Ed.), Learning in Graphical Models, 1999.
 M. E. Tipping and C. M. Bishop (1999),
"Probabilistic principal component analysis,"
Journal of the Royal Statistical Society, Series B,
vol. 61, pp. 611-622, 1999.
 M. E. Tipping and C. M. Bishop (1999),
"Mixtures of probabilistic principal component analyzers,"
Neural Computation,
vol. 11, pp. 443-482, 1999.
 Z. Ghahramani and G. E. Hinton (1996),
"The EM algorithm for mixtures of factor analyzers,"
University of Toronto Technical Report CRG-TR-96-1.
 S. Roweis (1997),
"EM algorithms for PCA and SPCA,"
NIPS 1997.
 T. Hofmann (1999),
"Probabilistic latent semantic analysis,"
UAI 1999.

Lecture 6
Clustering II 
 Nonnegative matrix factorization
 Spectral clustering


 D. D. Lee and H. S. Seung (1999),
"Learning the Parts of Objects by Nonnegative Matrix Factorization",
Nature,
vol. 401, pp. 788-791, 1999.
 C. Ding, T. Li, W. Peng, and H. Park (2006),
"Orthogonal nonnegative matrix tri-factorizations for clustering,"
KDD 2006.
 A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi (2008),
"Nonnegative matrix factorization with alpha-divergence,"
Pattern Recognition Letters,
vol. 29, no. 9, pp. 1433-1440, July 2008.
 J. Shi and J. Malik (2000),
"Normalized Cuts and Image Segmentation",
IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 22, no. 8, pp. 888-905, 2000.
 U. von Luxburg (2007),
"A tutorial on spectral clustering,"
Statistics and Computing,
vol. 17, no. 4, pp. 395-416, 2007.
 See my note on
extremal properties of eigenvalues.

Lecture 7
Regression 
 Regression
 Linear models for regression
 Least squares and RLS
 Bias-variance dilemma
 Bayesian linear regression
 Gaussian process regression



Lecture 8
Linear models for classification 
 Bayes decision theory
 Fisher's linear discriminant analysis
 Logistic regression
 Perceptron
 Support vector machine



Lecture 9
Neural networks 
 Adaline
 Perceptron
 Multilayer perceptron (MLP)
 Radial basis function (RBF) network


 Chapter 5 in Bishop's PRML.

Lecture 10
Mixture of experts 


 Chapter 14.5 in Bishop's PRML.

Lecture 11
Kernel methods 


 Chapter 12.3 in Bishop's PRML.

Lecture 12
Hidden Markov models 
 Hidden Markov models (HMMs)


 Chapter 13.2 in Bishop's PRML.
