Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014"
From Cohen Courses
Revision as of 11:25, 10 February 2014
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014.
Notes:
- The assignments are from 2013 and will be modified over the course of the semester; some may be changed substantially.
- Lecture notes will be posted around the time of the lectures.
January
- Mon Jan 13. Overview of course, cost of various operations, asymptotic analysis.
- Wed Jan 15. Review of probabilities, joint distributions and naive Bayes
- Mon Jan 20. No class - Martin Luther King Day.
- Wed Jan 22. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
- New Assignment: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Mon Jan 27. Messages and records 1; Phrase finding.
- Assignment due: streaming Naive Bayes 1 (with feature counts in memory).
- Wed Jan 29. Phrase Finding and Rocchio
- New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- Thursday Jan 30. Scheduled down-time for the wiki host. (Obviously, it's up again now!)
February
- Mon Feb 3. Rocchio and Parallel Perceptrons
- Wed Feb 5. Perceptrons/Map-reduce and Hadoop.
- Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
- New Assignment: phrase finding with stream-and-sort. PDF Handout
- Mon Feb 10. Parallel Perceptrons.
- Wed Feb 12. Guest lecture: Matt Hurst, Microsoft/Bing: Local Search at Bing
- New Assignments: Naive Bayes with Hadoop & Phrase-finding with Hadoop. PDF Handout
- Mon Feb 17. Scalable SGD and Hash Kernels
- Assignment due: phrase finding with stream-and-sort
- Wed Feb 19. Matrix Factorization and SGD
- Checkpoint due: streaming run of Naive Bayes on Hadoop
- Mon Feb 24. Background on randomized algorithms; Graph computations 1.
- Wed Feb 26. Graph computations 2
- Hadoop assignment (Naive Bayes) due
March
- Mon Mar 3. Guest lecture: Garth Gibson, topic TBA
- Wed Mar 5. Guest lecture: Alex Beutel, SGD on Hadoop
- Hadoop assignment (phrase-finding) due
- New Assignment: memory-efficient SGD. PDF writeup
- Mon Mar 10. No class - spring break.
- Wed Mar 12. No class - spring break.
- Mon Mar 17. Subsampling a graph with RWR
- Wed Mar 19. Semi-supervised learning via label propagation on graphs
- Assignment due: memory-efficient SGD
- New Assignment: Subsampling and visualizing a graph. PDF writeup
- Mon Mar 24. Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs
- Wed Mar 26. Understanding spectral clustering techniques.
- Mon Mar 31. Sparse sampling and parallelization for LDA
April and May
- Wed Apr 2. Speeding up LDA-like models: All-reduce and online LDA
- Assignment due: Subsampling and visualizing a graph.
- New Assignment: K-Means on MapReduce. PDF writeup
- Mon Apr 7. Fast KNN and similarity joins 1.
- Wed Apr 9. Fast KNN and similarity joins 2.
- Mon Apr 14. Scaling up decision tree learning
- Wed Apr 16. Gradient boosting with trees
- Assignment due: K-Means on MapReduce.
- New Assignment: Multi-class image classification or scalable classification using a linear classifier.
- Mon Apr 21. TBD
- Wed Apr 23. TBD
- Mon Apr 28. TBD
- Assignment due: Multi-class image classification or scalable classification.
- Wed Apr 30. In-class exam.