Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014
From Cohen Courses
Revision as of 18:07, 8 January 2014
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014.
January
- Mon Jan 13. Overview of course, cost of various operations, asymptotic analysis.
- Wed Jan 15. Review of probabilities.
- Mon Jan 20. No class for Martin Luther King Day.
- Wed Jan 22. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
- New Assignment: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Mon Jan 27. Messages and records 1; Phrase finding.
- Assignment due: streaming Naive Bayes 1 (with feature counts in memory).
- New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- Wed Jan 29. More on streaming algorithms: Rocchio, and the theory of online learning.
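The stream-and-sort pattern behind the Jan 22 and Jan 27 lectures and the streaming Naive Bayes assignments can be sketched in a few lines. This is a hypothetical single-machine illustration, not the course's reference code: a scan pass emits (key, count) messages, sorting makes equal keys adjacent, and one streaming reduce pass sums each run of identical keys.

```python
# Hypothetical sketch of stream-and-sort counting for streaming Naive Bayes.
# Key format and example data are illustrative assumptions, not course code.

def emit_messages(examples):
    """Scan labeled documents once, emitting one counter-increment per feature."""
    for label, words in examples:
        for w in words:
            yield (f"{label}^{w}", 1)        # per-(label, word) count
        yield (f"{label}^ANY", len(words))   # per-label token total

def sort_and_reduce(messages):
    """Simulate `sort | reduce`: sorting makes equal keys adjacent, then sum runs."""
    counts = {}
    prev, total = None, 0
    for key, n in sorted(messages):
        if key != prev and prev is not None:
            counts[prev] = total
            total = 0
        prev = key
        total += n
    if prev is not None:
        counts[prev] = total
    return counts

examples = [("pos", ["good", "good", "fun"]), ("neg", ["bad"])]
counts = sort_and_reduce(emit_messages(examples))
# counts["pos^good"] == 2, counts["pos^ANY"] == 3
```

In the real assignment the sort is done externally (e.g. Unix `sort`), so the reducer only ever holds one key's running total in memory.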
February
- Mon Feb 3. More on streaming algorithms: parallelized voted perceptrons.
- Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
- New Assignment: phrase finding with stream-and-sort. PDF Handout
- Wed Feb 5. Map-reduce and Hadoop 1.
- Mon Feb 10. Map-reduce and Hadoop 2.
- Wed Feb 12. Hadoop helpers and Scalable SGD
- Assignment due: phrase finding with stream-and-sort
- New Assignments: Naive Bayes with Hadoop & Phrase-finding with Hadoop. PDF Handout
- Mon Feb 17. Scalable SGD and Hash Kernels
- Wed Feb 19. Matrix Factorization and SGD
- Streaming run of Naive Bayes on Hadoop due (checkpoint)
- Mon Feb 24. Background on randomized algorithms; Graph computations 1.
- Wed Feb 26. Graph computations 2
- Hadoop assignment (Naive Bayes) due
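The hash-kernel idea from the Feb 17 lecture can be illustrated with a small sketch (hypothetical, not from the course handouts): features are hashed into a fixed-width weight vector, so the learner's memory stays bounded no matter how large the vocabulary grows.

```python
# Hypothetical illustration of the hashing trick ("hash kernels").
# Dimension, hash choice, and the perceptron update are illustrative assumptions.
import zlib

DIM = 16  # hashed feature space; real systems use 2**18 bins or more

def hashed_features(words, dim=DIM):
    """Map sparse word features into a fixed-size vector by hashing."""
    x = [0.0] * dim
    for w in words:
        h = zlib.crc32(w.encode())            # any cheap deterministic hash
        sign = 1.0 if (h >> 1) & 1 else -1.0  # signed hashing reduces collision bias
        x[h % dim] += sign
    return x

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# one perceptron-style SGD update in the hashed space
w = [0.0] * DIM
x = hashed_features(["good", "fun", "good"])
y = 1.0
if y * dot(w, x) <= 0:                        # mistake-driven update
    w = [wi + y * xi for wi, xi in zip(w, x)]
```

The point is that the weight vector is `DIM` floats regardless of vocabulary size; colliding features share a weight, which in practice costs surprisingly little accuracy.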
March
- Mon Mar 3. Guest lecture: Garth Gibson, topic TBA
- Wed Mar 5. Guest lecture: Alex Beutel, SGD on Hadoop
- Hadoop assignment (phrase-finding) due
- New Assignment: memory-efficient SGD PDF writeup
- New assignment: initial project proposals. PDF writeup
- Mon Mar 10. No class: spring break.
- Wed Mar 12. No class: spring break.
- Mon Mar 17. Subsampling a graph with RWR
- Wed Mar 19. Semi-supervised learning via label propagation on graphs
- Assignment due: memory-efficient SGD
- New Assignment: Subsampling and visualizing a graph. PDF writeup
- Mon Mar 24. Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs
- Wed Mar 26. Understanding spectral clustering techniques.
- Mon Mar 31. Sparse sampling and parallelization for LDA
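Random walk with restart (RWR), the technique behind the Mar 17 subsampling lecture and the graph-subsampling assignment, has a compact power-iteration form: repeatedly mix a step of the walk with a jump back to the seed, then keep the highest-scoring nodes. The graph, parameter values, and function names below are illustrative assumptions, not course code.

```python
# Hypothetical RWR sketch for subsampling a graph around a seed node.

def rwr(adj, seed, alpha=0.15, iters=50):
    """adj: {node: [out-neighbors]}; returns approximate RWR scores from seed."""
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(iters):
        # restart mass goes to the seed; the rest follows the walk
        nxt = {v: (alpha if v == seed else 0.0) for v in nodes}
        for u in nodes:
            share = (1 - alpha) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p

adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["c"]}
p = rwr(adj, seed="a")
top = sorted(p, key=p.get, reverse=True)  # subsample = keep the top-k nodes
```

Nodes near the seed (here "a", "b", "c") get most of the probability mass, while a node with no path from the seed (here "d") scores zero, which is exactly why thresholding RWR scores yields a local subgraph.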
April and May
- Wed Apr 2. Speeding up LDA-like models: All-reduce and online LDA
- Assignment due: Subsampling and visualizing a graph.
- New Assignment: K-Means on MapReduce. PDF writeup: http://www.cs.cmu.edu/~wcohen/10-605/assignments/kmeans.pdf
- Mon Apr 7. Fast KNN and similarity joins 1.
- Wed Apr 9. Fast KNN and similarity joins 2.
- Mon Apr 14. Scaling up decision tree learning
- Project progress report due
- Wed Apr 16. Gradient boosting with trees, and SGD for matrix factorization
- Assignment due: K-Means on MapReduce.
- New Assignment: Multi-class image classification or scalable classification using a linear classifier. Both of these count as one assignment toward your six. PDF writeups: http://www.cs.cmu.edu/~wcohen/10-605/assignments/image.pdf (image classification) and http://www.cs.cmu.edu/~wcohen/10-605/assignments/big-classifier.pdf (scalable classification).
- Mon Apr 21. TBD
- Wed Apr 23. TBD
- Mon Apr 28. TBD
- Wed Apr 30. In-class exam.
- Assignment due: Multi-class image classification or scalable classification.
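In the spirit of the K-Means-on-MapReduce assignment above, one Lloyd iteration splits naturally into a map step (nearest-centroid assignment) and a reduce step (per-cluster averaging). This is a hypothetical single-process sketch, not the assignment's reference implementation.

```python
# Hypothetical map/reduce phrasing of one K-Means iteration.
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (centroid id, point) keyed by the nearest centroid."""
    for p in points:
        cid = min(range(len(centroids)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        yield cid, p

def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid id."""
    sums = defaultdict(lambda: [0.0] * len(centroids[0]))
    counts = defaultdict(int)
    for cid, p in pairs:
        for j, x in enumerate(p):
            sums[cid][j] += x
        counts[cid] += 1
    # empty clusters keep their old centroid
    return [[s / counts[c] for s in sums[c]] if counts[c] else centroids[c]
            for c in range(len(centroids))]

points = [(0.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
centroids = [(1.0, 1.0), (8.0, 8.0)]
pairs = list(map_phase(points, centroids))
centroids = reduce_phase(pairs, centroids)
# centroids -> [[0.0, 0.5], [9.5, 9.5]]
```

On a real cluster the shuffle groups pairs by centroid id, so each reducer only sees one cluster's points; a combiner emitting partial (sum, count) pairs keeps the shuffle small.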
May
- 9am, Tuesday, May 7. Project writeups due. Submit a paper to Blackboard in PDF, in the ICML 2013 format (minimum 5 pages, maximum 8 pages, double column); do not, of course, submit it anonymously.
- Note: this is extended from the previous deadline of Fri May 3, but I can't give any further extensions! Your project report should discuss:
- The problem you're trying to solve, and why it's important and/or interesting.
- Related work, especially any related work that you're building on.
- The data that you're working with.
- The methods that you're using (in some detail - even if these are off-the-shelf methods, I want to know that you understand them)
- The experiments you did, the metrics you used to evaluate them, and the results.
- What was learned from the experiments (the conclusions).
- You should think of this as an exercise in writing a conference-style paper, so try to write in that style. (Of course, your work doesn't need to advance the state of the art in machine learning, or be highly novel, but it should be well described.)