Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014
From Cohen Courses
Revision as of 17:09, 2 June 2014 by Wcohen
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014.
- The assignments are from 2013, and will be modified over the course of the semester - some may be changed substantially.
- Lecture notes will be posted around the time of the lectures.
- Mon Jan 13. Overview of course, cost of various operations, asymptotic analysis.
- Wed Jan 15. Review of probabilities, joint distributions, and Naive Bayes.
- Mon Jan 20. No class - Martin Luther King Day.
- Wed Jan 22. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
- New Assignment: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Mon Jan 27. Messages and records 1; Phrase finding.
- Assignment due: streaming Naive Bayes 1 (with feature counts in memory).
- Wed Jan 29. Phrase Finding and Rocchio
- New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- Thu Jan 30. Scheduled down-time for the wiki host. (Obviously, it's up again now!)
- Mon Feb 3. Rocchio and Parallel Perceptrons
- Wed Feb 5. Perceptrons/Map-reduce and Hadoop.
- Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
- New Assignment: phrase finding with stream-and-sort. PDF Handout
- Mon Feb 10. Parallel Perceptrons.
- Wed Feb 12. Guest lecture: Matt Hurst, Microsoft/Bing: Local Search at Bing. One-on-one meetings with Matt can be scheduled for Thursday 2/13: morning meetings 9-12 in Gates-Hillman 6501, afternoon meetings 12:30-1:30pm in Gates-Hillman 6002.
- Mon Feb 17. Scalable SGD and Hash Kernels
- Wed Feb 19. Matrix Factorization and SGD, plus another Hadoop demo
- Fri Feb 21. Nothing due - the streaming run for Naive Bayes, 4(a), has been postponed till Monday.
- Mon Feb 24. SGD for Matrix Factorization, and Randomized Algorithms 1 (Bloom Filters)
- Streaming run on Hadoop of Naive Bayes due
- Wed Feb 26. Randomized Algorithms
- Fri Feb 28.
- Non-streaming run on Hadoop of Naive Bayes due.
- Mon Mar 3. Guest lecture: Garth Gibson, Cloud Computing and Programming Paradigms
- Wed Mar 5. Guest lecture: Alex Beutel, SGD on Hadoop
- Fri Mar 7.
- Hadoop assignment (phrase-finding) due
- Mon Mar 10. No class - spring break.
- Wed Mar 12. No class - spring break.
- Mon Mar 17. Scalable PageRank
- New Assignment: memory-efficient SGD PDF handout
- Wed Mar 19. Subsampling a graph with RWR
- Mon Mar 24. Subsampling continued and SSL on Graphs
- Wed Mar 26. Scalable spectral clustering techniques.
- Assignment due: memory-efficient SGD (delayed to Mon 3/31)
- Mon Mar 31. Sparse sampling and parallelization for LDA
- Assignment due: memory-efficient SGD
- New Assignment: Subsampling and visualizing a graph. PDF handout
April and May
- Wed Apr 2. Speeding up LDA-like models: All-reduce and online LDA
- Mon Apr 7. Workflows in PIG
- Wed Apr 9. Fast KNN and similarity joins
- Mon Apr 14. Parallel/Scalable Similarity Joins
- Assignment due: Subsampling and visualizing a graph.
- New Assignment: Workflows with Pig PDF handout
- Wed Apr 16. First-order logics
- Mon Apr 21. Scalable First-order logics
- Wed Apr 23. Graph models for large-scale ML
- Assignment due: Workflows with Pig
- Mon Apr 28. Exam review session.
- Wed Apr 30. In-class exam.