# Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012. **If you're taking 10-605 now, you're probably looking for the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013.**

## January

- Tues Jan 17. Overview of course, cost of various operations, asymptotic analysis.
- Thurs Jan 19. Review of probabilities.
- Tues Jan 24. Streaming algorithms and Naive Bayes.
*New Assignment: streaming Naive Bayes 1 (with feature counts in memory)*. PDF Handout

- Thurs Jan 26. The stream-and-sort design pattern; Naive Bayes revisited.
- Tues Jan 31. Messages and records 1; Phrase finding.
**Assignment due: streaming Naive Bayes 1 (with feature counts in memory)**. *New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort*. PDF Handout

## February

- Thurs Feb 2. More on streaming algorithms: Rocchio, and theory of on-line learning.
- Tues Feb 7. More on streaming algorithms: parallelized voted perceptrons.
**Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort**. *New Assignment: phrase finding with stream-and-sort*. PDF Handout

- Thurs Feb 9. Map-reduce and Hadoop 1 (Alona lecture).
- Tues Feb 14. Map-reduce and Hadoop 2. (Alona lecture, William is closer).
**Assignment due 2/15: phrase finding with stream-and-sort**. *New Assignment: Naive Bayes with Hadoop & Phrase-finding with Hadoop*. PDF Handout

- Thurs Feb 16. Hadoop helpers and Scalable SGD.
- Tues Feb 21. Scalable SGD and Hash Kernels.
- Thurs Feb 23. *Guest lecture*: Ron Bekkerman, LinkedIn, Scaling up Machine Learning.
- Tues Feb 28. Background on randomized algorithms; Graph computations 1.

## March

- Thurs Mar 1. *Guest Lecture*: Ben van Durme, JHU, Randomized Algorithms for Large-Scale Learning.
- Tues Mar 6. Learning on graphs 2.
**Hadoop assignments due**. *New Assignment: memory-efficient SGD*. PDF writeup. *New Assignment: initial project proposals*. PDF writeup

- Thurs Mar 8. *Guest Lecture*: Joey Gonzalez, CMU, GraphLab and Dynamic Asynchronous Computation. PPT slides
- Tues Mar 13. *no class - spring break.*
- Thurs Mar 15. *no class - spring break.*
- Tues Mar 20. Subsampling a graph with RWR.
**Assignment due: initial mini-project proposals.** **Assignment due: memory-efficient SGD.** *New Assignment: Subsampling and visualizing a graph.* PDF writeup

- Thurs Mar 22. Semi-supervised learning via label propagation on graphs.
- Tues Mar 27. Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs.
**Assignment due: Subsampling and visualizing a graph.** *New Assignment: mini-project proposals (final version).*

- Thurs Mar 29. Understanding spectral clustering techniques.
**Assignment due: mini-project proposals (final version).**

## April

- Tues Apr 3. LDA-like models for text and graphs; guest lecture from Partha Talukdar
- Thurs Apr 5. Tentative: Guest lecture by U Kang, CMU.
- Tues Apr 10. Speeding up LDA-like models: sampling and parallelization
- Thurs Apr 12. Fast KNN and similarity joins 1.
- Tues Apr 17. Fast KNN and similarity joins 2.
- Thurs Apr 19. *no class - Carnival*
- Tues Apr 24. SGD for matrix factorization and online LDA.
- Thurs Apr 26. Scaling up decision tree learning.

## May

- Tues May 1. Project reports.
- Thurs May 3. Project reports.
- Fri May 4.
**Project writeups due at 5:00pm**. Submit a paper to Blackboard as a PDF in the ICML 2012 format (up to 8 pages, double column), except, of course, do not submit it anonymously.