Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014"
From Cohen Courses
(84 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | This is the syllabus for [[Machine Learning with Large Datasets 10-605 in Spring 2014]]. | + | This is the syllabus for [[Machine Learning with Large Datasets 10-605 in Spring 2014]]. |
+ | |||
+ | Notes: | ||
+ | * The assignments are from 2013, and will be modified over the course of the semester - some may be changed substantially. | ||
+ | * Lecture notes will be posted around the time of the lectures. | ||
== January == | == January == | ||
* Mon Jan 13. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]] | * Mon Jan 13. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]] | ||
− | * Wed Jan 15. [[Class meeting for 10-605 Probability Review|Review of probabilities | + | * Wed Jan 15. [[Class meeting for 10-605 Probability Review|Review of probabilities, joint distributions and naive Bayes]] |
− | * Mon Jan 20. | + | * Mon Jan 20. ''No class - Martin Luther King Day.'' |
* Wed Jan 22. [[Class meeting for 10-605 Streaming Naive Bayes|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]] | * Wed Jan 22. [[Class meeting for 10-605 Streaming Naive Bayes|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]] | ||
− | ** ''New Assignment: streaming Naive Bayes 1 (with feature counts in memory)''. [http:// | + | ** ''New Assignment: streaming Naive Bayes 1 (with feature counts in memory)''. [http://curtis.ml.cmu.edu/w/courses/images/6/6d/Hashtable-nb.pdf PDF Handout] |
* Mon Jan 27. [[Class meeting for 10-605 Phase Finding|Messages and records 1; Phrase finding.]] | * Mon Jan 27. [[Class meeting for 10-605 Phase Finding|Messages and records 1; Phrase finding.]] | ||
** '''Assignment due: streaming Naive Bayes 1 (with feature counts in memory)'''. | ** '''Assignment due: streaming Naive Bayes 1 (with feature counts in memory)'''. | ||
− | ** ''New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort''. [http:// | + | * Wed Jan 29. [[Class meeting for 10-605 Rocchio and On-line Learning|Phrase Finding and Rocchio]] |
− | * | + | ** ''New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort''. [http://curtis.ml.cmu.edu/w/courses/images/0/0d/Stream-nb.pdf PDF Handout] |
+ | * Thursday Jan 30. Scheduled '''down-time for the wiki host'''. (Obviously, it's up again now!) | ||
== February == | == February == | ||
− | * Mon Feb 3. [[Class meeting for 10-605 Parallel Perceptrons| | + | * Mon Feb 3. [[Class meeting for 10-605 Parallel Perceptrons|Rocchio and Parallel Perceptrons]] |
+ | * Wed Feb 5. [[Class meeting for 10-605 Hadoop 1|Perceptrons/Map-reduce and Hadoop]]. | ||
** '''Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort''' | ** '''Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort''' | ||
− | ** ''New Assignment: phrase finding with stream-and-sort''. [http:// | + | ** ''New Assignment: phrase finding with stream-and-sort''. [http://curtis.ml.cmu.edu/w/courses/images/5/5e/Phrases.pdf PDF Handout] |
− | * | + | * Mon Feb 10. [[Class meeting for 10-605 Parallel Perceptrons 2|Parallel Perceptrons]]. |
− | * | + | * Wed Feb 12. ''Guest lecture: Matt Hurst, Microsoft/Bing: Local Search at Bing''. One-on-one meetings with Matt can be scheduled for Thursday 2/13 between 9-12 in Gates-Hillman 6501, afternoon meetings 12:30-1:30pm in '''Gates-Hillman 6002'''. |
− | * | + | * Mon Feb 17. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]] |
** '''Assignment due: phrase finding with stream-and-sort''' | ** '''Assignment due: phrase finding with stream-and-sort''' | ||
− | ** ''New Assignments: Naive Bayes with Hadoop & Phrase-finding with Hadoop''. [http:// | + | ** ''New Assignments: Naive Bayes with Streaming Hadoop, Naive Bayes with Hadoop & Phrase-finding with Hadoop''. [http://curtis.ml.cmu.edu/w/courses/images/c/c0/Homework4a.pdf PDF Handout (4a)][http://curtis.ml.cmu.edu/w/courses/images/a/a2/Homework4b.pdf PDF Handout (4b)][http://curtis.ml.cmu.edu/w/courses/images/3/30/Homework4c.pdf PDF Handout (4c)] |
− | * | + | * Wed Feb 19. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD, plus another Hadoop demo]] |
− | * | + | * Fri Feb 21. ''Nothing due - the streaming run for Naive Bayes, 4(a), has been postponed till Monday.'' |
− | ** '''Streaming run on Hadoop of Naive Bayes due''' | + | * Mon Feb 24. [[Class meeting for 10-605 SGD for MF 2 and Randomized Algorithms|SGD for Matrix Factorization, and Randomized Algorithms 1 (Bloom Filters)]] |
− | * | + | ** '''Streaming run on Hadoop of Naive Bayes due''' |
− | * | + | * Wed Feb 26. [[Class meeting for 10-605 Graphs 2|Randomized Algorithms]] |
− | ** '''Hadoop | + | * Fri Feb 28. |
+ | ** '''Non-streaming run on Hadoop of Naive Bayes due.''' | ||
== March == | == March == | ||
− | * Mon Mar 3. | + | * Mon Mar 3. ''Guest Lecture: Garth Gibson, Cloud Computing and Programming Paradigms'' |
− | * Wed Mar | + | ** Slides: [http://www.cs.cmu.edu/~wcohen/10-605/garth-Intro.pptx Intro], [http://www.cs.cmu.edu/~wcohen/10-605/garth-MapReduce_majd.pdf Mapreduce], [http://www.cs.cmu.edu/~wcohen/10-605/garth-Programming.pptx Programming], [http://www.cs.cmu.edu/~wcohen/10-605/garth-UseCases.pptx Use Cases] |
+ | * Wed Mar 5. ''Guest lecture: Alex Beutel, SGD on Hadoop'' | ||
+ | ** [http://www.cs.cmu.edu/~wcohen/10-605/alex-beutel.pptx Slides] | ||
+ | * Fri Mar 7. | ||
** '''Hadoop assignment (phrase-finding) due''' | ** '''Hadoop assignment (phrase-finding) due''' | ||
* Mon Mar 10. ''no class - spring break.'' | * Mon Mar 10. ''no class - spring break.'' | ||
* Wed Mar 12. ''no class - spring break.'' | * Wed Mar 12. ''no class - spring break.'' | ||
− | * Mon Mar 17. [[Class meeting for 10-605 Subsample A Graph|Subsampling a graph with RWR]] | + | * Mon Mar 17. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]] |
− | * Wed Mar | + | ** ''New Assignment: memory-efficient SGD'' [http://curtis.ml.cmu.edu/w/courses/images/0/08/Sgd.pdf PDF handout] |
+ | * Wed Mar 19. [[Class meeting for 10-605 Subsampling Graphs|Subsampling a graph with RWR]] | ||
+ | * Mon Mar 24. [[Class meeting for 10-605 SSL on Graphs|Subsampling continued and SSL on Graphs]] | ||
+ | * Wed Mar 26. [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]] | ||
+ | ** <strike>Assignment due: memory-efficient SGD</strike> delayed to Mon 3/31 | ||
+ | * Mon Mar 31. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]] | ||
** '''Assignment due: memory-efficient SGD''' | ** '''Assignment due: memory-efficient SGD''' | ||
− | ** ''New Assignment: Subsampling and visualizing a graph.'' [http:// | + | ** ''New Assignment: Subsampling and visualizing a graph.'' [http://curtis.ml.cmu.edu/w/courses/images/e/eb/ApproxPageRank.pdf PDF handout] |
== April and May == | == April and May == | ||
− | * | + | * Wed Apr 2. [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and online LDA]] |
− | * Wed Apr | + | * Mon Apr 7. [[Class meeting for 10-605 PIG|Workflows in PIG]] |
+ | * Wed Apr 9. [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]] | ||
+ | * Mon Apr 14. [[Class meeting for 10-605 Parallel Similarity Joins|Parallel/Scalable Similarity Joins]] | ||
** '''Assignment due: Subsampling and visualizing a graph.''' | ** '''Assignment due: Subsampling and visualizing a graph.''' | ||
− | ** ''New Assignment: | + | ** ''New Assignment: Workflows with Pig'' [http://curtis.ml.cmu.edu/w/courses/images/4/46/Nb_pig.pdf PDF handout] |
− | * | + | * Wed Apr 16. [[Class meeting for 10-605 First-Order Logics|First-order logics]] |
− | + | * Mon Apr 21. [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]] | |
− | * Mon Apr | + | * Wed Apr 23. [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]] |
− | + | ** '''Assignment due: Workflows with Pig''' | |
− | * Wed Apr | + | * Mon Apr 28. Exam review session. |
− | ** '''Assignment due: | + | ** [http://curtis.ml.cmu.edu/w/courses/images/0/0a/Practice_questions.pdf PDF practice questions] |
− | * | + | ** [http://www.cs.cmu.edu/~wcohen/10-605/exam-review.pptx Review session slides] |
− | + | * Wed Apr 30. In-class exam. | |
Latest revision as of 17:09, 2 June 2014
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014.
Notes:
- The assignments are from 2013, and will be modified over the course of the semester - some may be changed substantially.
- Lecture notes will be posted around the time of the lectures.
January
- Mon Jan 13. Overview of course, cost of various operations, asymptotic analysis.
- Wed Jan 15. Review of probabilities, joint distributions and naive Bayes
- Mon Jan 20. No class - Martin Luther King Day.
- Wed Jan 22. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
  - New Assignment: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Mon Jan 27. Messages and records 1; Phrase finding.
  - Assignment due: streaming Naive Bayes 1 (with feature counts in memory).
- Wed Jan 29. Phrase Finding and Rocchio
  - New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- Thursday Jan 30. Scheduled down-time for the wiki host. (Obviously, it's up again now!)
February
- Mon Feb 3. Rocchio and Parallel Perceptrons
- Wed Feb 5. Perceptrons/Map-reduce and Hadoop.
  - Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
  - New Assignment: phrase finding with stream-and-sort. PDF Handout
- Mon Feb 10. Parallel Perceptrons.
- Wed Feb 12. Guest lecture: Matt Hurst, Microsoft/Bing: Local Search at Bing. One-on-one meetings with Matt can be scheduled for Thursday 2/13 between 9-12 in Gates-Hillman 6501, afternoon meetings 12:30-1:30pm in Gates-Hillman 6002.
- Mon Feb 17. Scalable SGD and Hash Kernels
  - Assignment due: phrase finding with stream-and-sort
  - New Assignments: Naive Bayes with Streaming Hadoop, Naive Bayes with Hadoop & Phrase-finding with Hadoop. PDF Handout (4a), PDF Handout (4b), PDF Handout (4c)
- Wed Feb 19. Matrix Factorization and SGD, plus another Hadoop demo
- Fri Feb 21. Nothing due - the streaming run for Naive Bayes, 4(a), has been postponed till Monday.
- Mon Feb 24. SGD for Matrix Factorization, and Randomized Algorithms 1 (Bloom Filters)
  - Streaming run on Hadoop of Naive Bayes due
- Wed Feb 26. Randomized Algorithms
- Fri Feb 28.
  - Non-streaming run on Hadoop of Naive Bayes due.
March
- Mon Mar 3. Guest Lecture: Garth Gibson, Cloud Computing and Programming Paradigms
  - Slides: Intro, Mapreduce, Programming, Use Cases
- Wed Mar 5. Guest lecture: Alex Beutel, SGD on Hadoop
  - Slides
- Fri Mar 7.
  - Hadoop assignment (phrase-finding) due
- Mon Mar 10. no class - spring break.
- Wed Mar 12. no class - spring break.
- Mon Mar 17. Scalable PageRank
  - New Assignment: memory-efficient SGD. PDF handout
- Wed Mar 19. Subsampling a graph with RWR
- Mon Mar 24. Subsampling continued and SSL on Graphs
- Wed Mar 26. Scalable spectral clustering techniques.
  - Assignment due: memory-efficient SGD (delayed to Mon 3/31)
- Mon Mar 31. Sparse sampling and parallelization for LDA
  - Assignment due: memory-efficient SGD
  - New Assignment: Subsampling and visualizing a graph. PDF handout
April and May
- Wed Apr 2. Speeding up LDA-like models: All-reduce and online LDA
- Mon Apr 7. Workflows in PIG
- Wed Apr 9. Fast KNN and similarity joins
- Mon Apr 14. Parallel/Scalable Similarity Joins
  - Assignment due: Subsampling and visualizing a graph.
  - New Assignment: Workflows with Pig. PDF handout
- Wed Apr 16. First-order logics
- Mon Apr 21. Scalable First-order logics
- Wed Apr 23. Graph models for large-scale ML
  - Assignment due: Workflows with Pig
- Mon Apr 28. Exam review session.
  - PDF practice questions
  - Review session slides
- Wed Apr 30. In-class exam.