Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014"

From Cohen Courses
 
(31 intermediate revisions by 2 users not shown)
Line 28: Line 28:
 
* Mon Feb 17. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
 
* Mon Feb 17. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
 
** '''Assignment due: phrase finding with stream-and-sort'''
 
** '''Assignment due: phrase finding with stream-and-sort'''
** ''New Assignments: Naive Bayes with Streaming Hadoop,  Naive Bayes with Streaming Hadoop & Phrase-finding with Hadoop''. [http://curtis.ml.cmu.edu/w/courses/images/c/c0/Homework4a.pdf PDF Handout (4a)][http://curtis.ml.cmu.edu/w/courses/images/a/a2/Homework4b.pdf PDF Handout (4b)][http://curtis.ml.cmu.edu/w/courses/images/3/30/Homework4c.pdf PDF Handout (4c)]
+
** ''New Assignments: Naive Bayes with Streaming Hadoop,  Naive Bayes with Hadoop & Phrase-finding with Hadoop''. [http://curtis.ml.cmu.edu/w/courses/images/c/c0/Homework4a.pdf PDF Handout (4a)][http://curtis.ml.cmu.edu/w/courses/images/a/a2/Homework4b.pdf PDF Handout (4b)][http://curtis.ml.cmu.edu/w/courses/images/3/30/Homework4c.pdf PDF Handout (4c)]
 
* Wed Feb 19. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD, plus another Hadoop demo]]
 
* Wed Feb 19. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD, plus another Hadoop demo]]
 
* Fri Feb 21. ''Nothing due - the streaming run for Naive Bayes, 4(a), has been postponed till Monday.''
 
* Fri Feb 21. ''Nothing due - the streaming run for Naive Bayes, 4(a), has been postponed till Monday.''
Line 39: Line 39:
 
== March  ==
 
== March  ==
  
* Mon Mar 3. ''Guest Lecture: Garth Gibson, Cloud Computing and Programming Paradigms''
+
* Mon Mar 3. ''Guest Lecture: Garth Gibson, Cloud Computing and Programming Paradigms''  
* Wed Mar 5. ''Guest lecture: Alex Beutel, SGD on Hadoop''
+
** Slides: [http://www.cs.cmu.edu/~wcohen/10-605/garth-Intro.pptx Intro], [http://www.cs.cmu.edu/~wcohen/10-605/garth-MapReduce_majd.pdf MapReduce], [http://www.cs.cmu.edu/~wcohen/10-605/garth-Programming.pptx Programming], [http://www.cs.cmu.edu/~wcohen/10-605/garth-UseCases.pptx Use Cases]
 +
* Wed Mar 5. ''Guest lecture: Alex Beutel, SGD on Hadoop''  
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/alex-beutel.pptx Slides]
 
* Fri Mar 7.  
 
* Fri Mar 7.  
 
** '''Hadoop assignment (phrase-finding) due'''
 
** '''Hadoop assignment (phrase-finding) due'''
Line 46: Line 48:
 
* Wed Mar 12. ''no class - spring break.''
 
* Wed Mar 12. ''no class - spring break.''
 
* Mon Mar 17. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]]
 
* Mon Mar 17. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]]
** ''New Assignment: memory-efficient SGD'' [http://www.cs.cmu.edu/~wcohen/10-605/assignments/sgd.pdf PDF writeup (draft)]
+
** ''New Assignment: memory-efficient SGD'' [http://curtis.ml.cmu.edu/w/courses/images/0/08/Sgd.pdf PDF handout]
 
* Wed Mar 19. [[Class meeting for 10-605 Subsampling Graphs|Subsampling a graph with RWR]]
 
* Wed Mar 19. [[Class meeting for 10-605 Subsampling Graphs|Subsampling a graph with RWR]]
* Mon Mar 24. [[Class meeting for 10-605 SSL LP 2|Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs]]
+
* Mon Mar 24. [[Class meeting for 10-605 SSL on Graphs|Subsampling continued and SSL on Graphs]]
* Wed Mar 26. [[Class meeting for 10-605 Spectral Clustering|Understanding spectral clustering techniques.]]
+
* Wed Mar 26. [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]]
 +
** <strike>Assignment due: memory-efficient SGD</strike> delayed to Mon 3/31
 +
* Mon Mar 31. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
 
** '''Assignment due: memory-efficient SGD'''
 
** '''Assignment due: memory-efficient SGD'''
** ''New Assignment: Subsampling and visualizing a graph.'' [http://www.cs.cmu.edu/~wcohen/10-605/assignments/snowball.pdf PDF writeup]
+
** ''New Assignment: Subsampling and visualizing a graph.'' [http://curtis.ml.cmu.edu/w/courses/images/e/eb/ApproxPageRank.pdf PDF handout]
* Mon Mar 31. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
 
  
 
== April and May ==
 
== April and May ==
  
 
* Wed Apr 2. [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and online LDA]]
 
* Wed Apr 2. [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and online LDA]]
 +
* Mon Apr 7. [[Class meeting for 10-605 PIG|Workflows in PIG]]
 +
* Wed Apr 9. [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
 +
* Mon Apr 14.  [[Class meeting for 10-605 Parallel Similarity Joins|Parallel/Scalable Similarity Joins]]
 
** '''Assignment due: Subsampling and visualizing a graph.'''
 
** '''Assignment due: Subsampling and visualizing a graph.'''
** ''New Assignment: TBA''
+
** ''New Assignment: Workflows with Pig'' [http://curtis.ml.cmu.edu/w/courses/images/4/46/Nb_pig.pdf PDF handout]
* Mon Apr 7. [[Class meeting for 10-605 Fast KNN 1|Fast KNN and similarity joins 1.]]
+
* Wed Apr 16. [[Class meeting for 10-605 First-Order Logics|First-order logics]]
* Wed Apr 9. [[Class meeting for 10-605 Fast KNN 2|Fast KNN and similarity joins 2.]]
+
* Mon Apr 21. [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]]
* Mon Apr 14. [[Class meeting for 10-605 Decision Trees|Scaling up decision tree learning]]
+
* Wed Apr 23.   [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
* Wed Apr 16. [[Class meeting for 10-605 Gradient Boosting|Gradient boosting with trees]]
+
** '''Assignment due: Workflows with Pig'''
** '''Assignment due: TBA'''
+
* Mon Apr 28. Exam review session.  
** ''New Assignment: TBA''
+
** [http://curtis.ml.cmu.edu/w/courses/images/0/0a/Practice_questions.pdf PDF practice questions]
* Mon Apr 21. TBA
+
** [http://www.cs.cmu.edu/~wcohen/10-605/exam-review.pptx Review session slides]
* Wed Apr 23. TBA
 
* Mon Apr 28. Exam review session.
 
** '''Assignment due: TBA'''
 
 
* Wed Apr 30. In-class exam.
 
* Wed Apr 30. In-class exam.

Latest revision as of 17:09, 2 June 2014

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014.

Notes:

  • The assignments are from 2013, and will be modified over the course of the semester - some may be changed substantially.
  • Lecture notes will be posted around the time of the lectures.

January

February

March

April and May