Course schedule: dates and titles, topics, lecture slides, and suggested further readings.
Lecture 1
Introduction to machine learning 

Supervised learning

Unsupervised learning

Probability primer



Lecture 2
Density estimation 

Maximum likelihood estimation

MAP estimation

Bayesian estimation
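As a quick illustration of the maximum likelihood idea (a minimal sketch added for reference, with synthetic data and names of my own choosing, not course code): for i.i.d. Gaussian samples, the ML estimates are the sample mean and the biased (1/N) sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic i.i.d. Gaussian data with known mean 2.0 and std 1.5.
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

# ML estimates for a univariate Gaussian: the sample mean and the
# biased (1/N) sample variance maximize the log-likelihood.
mu_ml = x.mean()
var_ml = ((x - mu_ml) ** 2).mean()
```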



Lecture 3
Clustering I 
 k-means clustering
 Mixture of Gaussians (MoG)


 Sections 9.1 and 9.2 of Bishop's Pattern Recognition and Machine Learning
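A minimal sketch of Lloyd's algorithm for k-means on synthetic data (my own illustration; empty-cluster handling and restarts are omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs in the plane.
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(5.0, 0.5, (100, 2))])

K = 2
centers = X[rng.choice(len(X), K, replace=False)]  # init from data points
for _ in range(20):
    # Assignment step: each point goes to its nearest centre.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(1)
    # Update step: each centre becomes the mean of its points.
    centers = np.array([X[labels == k].mean(0) for k in range(K)])
```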

Lecture 4
Expectation Maximization 
 Jensen's inequality
 Information theory preliminaries
 EM optimization
 Generalized EM
 Incremental EM
 EM for exponential families


 Sections 9.3 and 9.4 of Bishop's Pattern Recognition and Machine Learning

A. P. Dempster, N. M. Laird, and D. B. Rubin (1977),
"Maximum likelihood from incomplete data via the EM algorithm (with discussion),"
Journal of the Royal Statistical Society B,
vol. 39, no. 1, pp. 1–38, 1977.

R. M. Neal and G. E. Hinton (1999),
"A view of the EM algorithm that justifies incremental, sparse, and other variants,"
Learning in Graphical Models (edited by M. I. Jordan),
pp. 355–368, 1999.
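The EM updates for a two-component univariate mixture of Gaussians can be sketched as follows (an illustrative toy implementation of my own, not the course's reference code):

```python
import numpy as np

rng = np.random.default_rng(2)
# 1-D data drawn from two Gaussians at -3 and +3.
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

pi = np.array([0.5, 0.5])            # mixing proportions
mu = np.array([-1.0, 1.0])           # component means (crude init)
var = np.array([1.0, 1.0])           # component variances
for _ in range(50):
    # E-step: responsibilities r[n, k] = p(z_n = k | x_n).
    r = pi * gauss(x[:, None], mu, var)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted ML updates.
    nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(x)
```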

Lecture 5
Clustering II 
 Spectral clustering
 Nonnegative matrix factorization


 J. Shi and J. Malik (2000),
"Normalized Cuts and Image Segmentation",
IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 22, no. 8, pp. 888–905, 2000.
 U. von Luxburg (2007),
"A tutorial on spectral clustering,"
Statistics and Computing,
vol. 17, no. 4, pp. 395–416, 2007.
 See my note on
extremal properties of eigenvalues.
 D. D. Lee and H. S. Seung (1999),
"Learning the Parts of Objects by Nonnegative Matrix Factorization",
Nature,
vol. 401, pp. 788–791, 1999.
 C. Ding, T. Li, W. Peng, and H. Park (2006),
"Orthogonal nonnegative matrix trifactorizations for clustering,"
KDD2006.
 A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi (2008),
"Nonnegative matrix factorization with alpha-divergence,"
Pattern Recognition Letters, 2008 (in press).
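Lee and Seung's multiplicative updates for the squared-error NMF objective are short enough to sketch directly (my own toy code; dimensions and iteration counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
V = rng.random((30, 20))             # nonnegative data matrix
r = 5                                # factorization rank
W = rng.random((30, r)) + 0.1
H = rng.random((r, 20)) + 0.1

eps = 1e-9                           # guards against division by zero
for _ in range(200):
    # Multiplicative updates: stay nonnegative, never increase ||V - WH||^2.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
```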

Lecture 5
Latent variable models 
 SVD and PCA
 Maximum likelihood factor analysis
 Probabilistic PCA
 Mixture of factor analyzers
 Mixture of probabilistic principal component analyzers


 Sections 12.1 and 12.2 of Bishop's Pattern Recognition and Machine Learning
 C. M. Bishop (1999),
"Latent variable models,"
In M. I. Jordan (Ed.), Learning in Graphical Models, 1999.
 M. E. Tipping and C. M. Bishop (1999),
"Probabilistic principal component analysis,"
Journal of the Royal Statistical Society, Series B,
vol. 61, pp. 611–622, 1999.
 M. E. Tipping and C. M. Bishop (1999),
"Mixtures of probabilistic principal component analyzers,"
Neural Computation,
vol. 11, pp. 443–482, 1999.
 Z. Ghahramani and G. E. Hinton (1996),
"The EM algorithm for mixtures of factor analyzers,"
University of Toronto Technical Report CRG-TR-96-1.
 S. Roweis (1997),
"EM algorithms for PCA and SPCA,"
NIPS-1997.
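The SVD route to PCA can be sketched in a few lines (an illustration of my own, using synthetic data lying near a low-dimensional subspace):

```python
import numpy as np

rng = np.random.default_rng(4)
# 200 points in R^5 lying, up to small noise, in a 2-D subspace.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X += 0.01 * rng.normal(size=X.shape)

Xc = X - X.mean(axis=0)              # centre the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                  # top-2 principal directions
Z = Xc @ components.T                # projected coordinates (scores)
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
```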

Lecture 6
Regression 
 Regression
 Linear models for regression
 Least squares and regularized least squares (RLS)
 Bias-variance dilemma
 Bayesian linear regression
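Regularized least squares reduces to a single linear solve; a minimal sketch (my own synthetic example, with an arbitrary regularization weight):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 1e-2                           # regularization strength (arbitrary)
# Regularized least squares: w = (X^T X + lam I)^{-1} X^T y.
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```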



Lecture 7
Linear models for classification 
 Bayes decision theory
 Fisher's linear discriminant analysis
 Logistic regression
 Perceptron
 Support vector machine
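The perceptron's mistake-driven update is one line; a toy sketch of my own on linearly separable synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
# Linearly separable 2-D data with labels in {-1, +1}.
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)),
               rng.normal(2.0, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w, b = np.zeros(2), 0.0
for _ in range(100):                 # epochs (converges long before this)
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # misclassified: nudge the hyperplane
            w += yi * xi
            b += yi
```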



Lecture 8
Neural networks 
 Multilayer perceptron (MLP)
 Radial basis function (RBF) network
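With fixed centres and widths, fitting an RBF network's output weights is just linear least squares; a small sketch (my own choices of centres, width, and data):

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(-3, 3, 100)
y = np.sin(x) + 0.05 * rng.normal(size=100)

centers = np.linspace(-3, 3, 10)     # fixed RBF centres
width = 0.5
# Design matrix of Gaussian basis responses.
Phi = np.exp(-(x[:, None] - centers) ** 2 / (2 * width ** 2))
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear output weights
yhat = Phi @ w
```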



Lecture 9
Mixture of experts 



Lecture 10
Kernel methods 
 Kernel PCA (KPCA)
 Kernel Fisher discriminant analysis (KFDA)
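Kernel PCA amounts to centring and eigendecomposing the kernel matrix; a minimal RBF-kernel sketch (my own, with an arbitrary bandwidth):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 2))

# RBF kernel matrix (bandwidth chosen arbitrarily).
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

# Centre in feature space: Kc = HKH with H = I - (1/n) 11^T.
n = len(X)
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

vals, vecs = np.linalg.eigh(Kc)              # ascending order
vals, vecs = vals[::-1], vecs[:, ::-1]       # sort descending
alphas = vecs[:, :2] / np.sqrt(vals[:2])     # normalized coefficients
Z = Kc @ alphas                              # top-2 kernel PCA projections
```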



Lecture 11
Hidden Markov models 
 Hidden Markov models (HMMs)
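The forward recursion for computing an HMM's observation likelihood fits in a few lines (a toy discrete HMM of my own construction):

```python
import numpy as np

# A tiny discrete HMM: 2 hidden states, 3 observation symbols.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])               # transition probabilities
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])          # emission probabilities
pi0 = np.array([0.6, 0.4])               # initial state distribution
obs = [0, 1, 2, 2, 0]                    # observation sequence

# Forward recursion: alpha[i] = p(o_1..o_t, s_t = i).
alpha = pi0 * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
likelihood = alpha.sum()                 # p(o_1..o_T)
```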



Lecture 12
Gaussian process regression 
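GP regression's predictive mean and covariance are two linear solves; a minimal RBF-kernel sketch on a toy 1-D problem (hyperparameters are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
Xtr = np.linspace(0, 5, 20)[:, None]      # training inputs
ytr = np.sin(Xtr[:, 0]) + 0.05 * rng.normal(size=20)
Xte = np.linspace(0, 5, 50)[:, None]      # test inputs

def rbf(A, B, ell=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * ell ** 2))

sigma2 = 0.05 ** 2                        # observation noise variance
K = rbf(Xtr, Xtr) + sigma2 * np.eye(20)   # training covariance
Ks = rbf(Xte, Xtr)                        # test-train covariance
mean = Ks @ np.linalg.solve(K, ytr)       # predictive mean
cov = rbf(Xte, Xte) - Ks @ np.linalg.solve(K, Ks.T)  # predictive covariance
```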


