Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012
From Cohen Courses
Latest revision as of 09:48, 28 March 2013
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012. If you're taking 10-605 now, you're probably looking for the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013.
January
- Tues Jan 17. Overview of course, cost of various operations, asymptotic analysis.
- Thurs Jan 19. Review of probabilities.
- Tues Jan 24. Streaming algorithms and Naive Bayes.
  - New Assignment: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout: http://www.cs.cmu.edu/~wcohen/10-605/assignments/hashtable-nb.pdf
- Thurs Jan 26. The stream-and-sort design pattern; Naive Bayes revisited.
- Tues Jan 31. Messages and records 1; Phrase finding.
  - Assignment due: streaming Naive Bayes 1 (with feature counts in memory).
  - New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout: http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf
February
- Thurs Feb 2. More on streaming algorithms: Rocchio, and theory of on-line learning
- Tues Feb 7. More on streaming algorithms: parallelized voted perceptrons.
  - Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
  - New Assignment: phrase finding with stream-and-sort. PDF Handout: http://www.cs.cmu.edu/~wcohen/10-605/assignments/phrases.pdf
- Thurs Feb 9. Map-reduce and Hadoop 1 (Alona lecture).
- Tues Feb 14. Map-reduce and Hadoop 2. (Alona lecture, William is closer).
  - Assignment due 2/15: phrase finding with stream-and-sort
  - New Assignment: Naive Bayes with Hadoop & Phrase-finding with Hadoop. PDF Handout: http://www.cs.cmu.edu/~afyshe/Assignment4.pdf
- Thurs Feb 16. Hadoop helpers and Scalable SGD
- Tues Feb 21. Scalable SGD and Hash Kernels
- Thurs Feb 23. Guest lecture: Ron Bekkerman (http://www.cs.umass.edu/~ronb/), LinkedIn, Scaling up Machine Learning
  - Ron's slides in Powerpoint: http://www.cs.cmu.edu/~wcohen/10-605/2012-02-23-bekkerman.pptx
  - Ron's slides in PDF: http://www.cs.cmu.edu/~wcohen/10-605/2012-02-23-bekkerman.pdf
- Tues Feb 28. Background on randomized algorithms; Graph computations 1.
March
- Thurs Mar 1. Guest Lecture: Ben van Durme, JHU, Randomized Algorithms for Large-Scale Learning
- Tues Mar 6. Learning on graphs 2.
  - Hadoop assignments due
  - New Assignment: memory-efficient SGD. PDF writeup: http://www.cs.cmu.edu/~wcohen/10-605/assignments/sgd.pdf
  - New Assignment: initial project proposals. PDF writeup: http://www.cs.cmu.edu/~wcohen/10-605/assignments/initial-project-proposal.pdf
- Thurs Mar 8. Guest Lecture: Joey Gonzalez, CMU, GraphLab and Dynamic Asynchronous Computation. PPT slides: http://www.cs.cmu.edu/~jegonzal/talks/biglearning_with_graphs.pptx
- Tues Mar 13. no class - spring break.
- Thurs Mar 15. no class - spring break.
- Tues Mar 20. Subsampling a graph with RWR
  - Assignment due: initial mini-project proposals.
  - Assignment due: memory-efficient SGD
  - New Assignment: Subsampling and visualizing a graph. PDF writeup: http://www.cs.cmu.edu/~wcohen/10-605/assignments/snowball.pdf
- Thurs Mar 22. Semi-supervised learning via label propagation on graphs
- Tues Mar 27. Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs
  - Assignment due: Subsampling and visualizing a graph.
  - New Assignment: mini-project proposals (final version)
- Thurs Mar 29. Understanding spectral clustering techniques.
  - Assignment due: mini-project proposals (final version).
April
- Tues Apr 3. LDA-like models for text and graphs; guest lecture from Partha Talukdar
- Thurs Apr 5. Tentative: Guest lecture by U Kang, CMU.
- Tues Apr 10. Speeding up LDA-like models: sampling and parallelization
- Thurs Apr 12. Fast KNN and similarity joins 1.
- Tues Apr 17. Fast KNN and similarity joins 2.
- Thurs Apr 19. no class - Carnival
- Tues Apr 24. SGD for matrix factorization and online LDA
- Thurs Apr 26. Scaling up decision tree learning
May
- Tues May 1. Project reports.
- Thurs May 3. Project reports.
- Fri May 4.
  - Project writeups due at 5:00pm. Submit a paper to Blackbook in PDF in the ICML 2012 format (http://icml.cc/2012/author-instructions/), up to 8 pages, double column, except, of course, do not submit it anonymously.