Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013"
Revision as of 14:29, 4 January 2013
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013.
January
- Mon Jan 14. Overview of course, cost of various operations, asymptotic analysis.
- Wed Jan 16. Review of probabilities.
- Mon Jan 21. Streaming algorithms and Naive Bayes.
- New Assignment: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Wed Jan 23. The stream-and-sort design pattern; Naive Bayes revisited.
- Mon Jan 28. Messages and records 1; Phrase finding.
- Assignment due: streaming Naive Bayes 1 (with feature counts in memory).
- New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- Wed Jan 30. More on streaming algorithms: Rocchio and the theory of on-line learning.
February
- Mon Feb 4. More on streaming algorithms: parallelized voted perceptrons.
- Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
- New Assignment: phrase finding with stream-and-sort. PDF Handout
- Wed Feb 6. Map-reduce and Hadoop 1.
- Mon Feb 11. Map-reduce and Hadoop 2.
- Assignment due 2/15: phrase finding with stream-and-sort
- New Assignments: Naive Bayes with Hadoop & Phrase-finding with Hadoop PDF Handout
- Wed Feb 13. Hadoop helpers and Scalable SGD
- Mon Feb 18. Scalable SGD and Hash Kernels
- Wed Feb 20. Guest lecture: Chris Dyer. Scalable feature selection with Map-Reduce.
- Mon Feb 25. Background on randomized algorithms; Graph computations 1.
- Hadoop assignment (Naive Bayes) due
- Wed Feb 27. Tentative: GraphLab?
March
- Mon Mar 4. Learning on graphs 2.
- Hadoop assignment (phrase-finding) due
- New Assignment: memory-efficient SGD PDF writeup
- New assignment: initial project proposals. PDF writeup
- Wed Mar 6. Guest lecture: John Wong (Google)
- Mon Mar 11. no class - spring break.
- Wed Mar 13. no class - spring break.
- Mon Mar 18. Subsampling a graph with RWR
- Assignment due: initial mini-project proposals.
- Assignment due: memory-efficient SGD
- New Assignment: Subsampling and visualizing a graph. PDF writeup
- Wed Mar 20. Semi-supervised learning via label propagation on graphs
- Mon Mar 25. Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs
- Assignment due: Subsampling and visualizing a graph.
- New Assignment: mini-project proposals (final version)
- Wed Mar 27. Understanding spectral clustering techniques.
- Assignment due: mini-project proposals (final version).
April and May
- Mon Apr 1. LDA-like models for text and graphs
- Wed Apr 3. To be decided
- Mon Apr 8. Speeding up LDA-like models: sampling and parallelization
- Wed Apr 10. Fast KNN and similarity joins 1.
- Mon Apr 15. Fast KNN and similarity joins 2.
- Wed Apr 17. Scaling up decision tree learning
- Mon Apr 22. SGD for matrix factorization and online LDA
- Wed Apr 24. Guest lecture, Evangelos Papalexakis, on Scalable Tensor Methods.
- Mon Apr 29. Project reports.
- Wed May 1. Project reports.
May
- Fri May 3.
- Project writeups due at 5:00pm. Submit your paper to Blackboard as a PDF in the ICML 2013 format (up to 8 pages, double column), except that it should not be anonymized.