Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015"
From Cohen Courses
Revision as of 16:48, 5 January 2015
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015.
Notes:
- The assignments are from 2014, and will be modified over the course of the semester - some may be changed substantially.
- Lecture notes and/or slides will be posted around the time of the lectures.
January
- Tues Jan 13. Overview of course, cost of various operations, asymptotic analysis.
- Thurs Jan 15. Review of probabilities, joint distributions and Naive Bayes
  - HW1A: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Tues Jan 20. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
  - HW1B: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- Thurs Jan 22. Messages and records 1; Phrase finding.
- Tues Jan 27. Phrase Finding and Rocchio
  - HW1A and HW1B due.
  - HW2: phrase finding with stream-and-sort. PDF Handout
- Thurs Jan 29. Rocchio and Parallel Perceptrons
February
- Tues Feb 3. Perceptrons/Map-reduce and Hadoop.
- Thurs Feb 5. Parallel Perceptrons.
- Tues Feb 10. student presentations
  - Assignment due: phrase finding with stream-and-sort
  - HW3,4: Naive Bayes with Streaming Hadoop, Naive Bayes with Hadoop & Phrase-finding with Hadoop. PDF Handout (4a) HW4 - warmup; PDF Handout (4b) HW4; PDF Handout (4c) HW5
- Thurs Feb 12. student presentations
- Tues Feb 17. Scalable SGD and Hash Kernels
  - HW3 due: Naive Bayes with Hadoop
- Thurs Feb 19. Matrix Factorization and SGD, plus another Hadoop demo
- Tues Feb 24. SGD for Matrix Factorization, and Randomized Algorithms 1 (Bloom Filters)
- Thurs Feb 26. Randomized Algorithms
March
- Tues Mar 3. student presentations
- Thurs Mar 5. student presentations
  - HW4 due: Phrase-finding with Hadoop
  - HW5: memory-efficient SGD. PDF handout
- Tues Mar 10. no class - spring break.
- Thurs Mar 12. no class - spring break.
- Tues Mar 17. Scalable PageRank
  - HW5 due: memory-efficient SGD
  - HW6: Subsampling and visualizing a graph. PDF handout
- Thurs Mar 19. Subsampling a graph with RWR
- Tues Mar 24. Subsampling continued and SSL on Graphs (AAAI Spring Symposium week)
- Thurs Mar 26. Scalable spectral clustering techniques. (AAAI Spring Symposium week)
- Tues Mar 31. Sparse sampling and parallelization for LDA
  - HW6 due: Subsampling and visualizing a graph.
April
- Thurs Apr 2. Speeding up LDA-like models: All-reduce and online LDA
- Tues Apr 7. Workflows in PIG
- Thurs Apr 9. Fast KNN and similarity joins
- Tues Apr 14. Parallel/Scalable Similarity Joins
  - New Assignment: Workflows with Pig. PDF handout
- Thurs Apr 16. no class - carnival.
- Tues Apr 21. Graph models for large-scale ML
  - Assignment due: Workflows with Pig
- Thurs Apr 23.
- Tues Apr 28. Exam review session.
- Thus Apr 30. In-class exam.