Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015"

From Cohen Courses
 
(20 intermediate revisions by 3 users not shown)
Line 5: Line 5:
 
* Lecture notes and/or slides will be (re)posted around the time of the lectures.
 
* Lecture notes and/or slides will be (re)posted around the time of the lectures.
  
 +
Schedule:
 
* Tues Sep 1. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]]
 
* Tues Sep 1. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]]
 
* Thurs Sep 3. [[Class meeting for 10-605 Probability Review|Review of probabilities, joint distributions and naive Bayes]]
 
* Thurs Sep 3. [[Class meeting for 10-605 Probability Review|Review of probabilities, joint distributions and naive Bayes]]
Line 10: Line 11:
 
** HW1 out: streaming naive Bayes in Java. [https://s3.amazonaws.com/vincy/10605-15Fall/HW1_StreamingNB.pdf PDF Handout]
 
** HW1 out: streaming naive Bayes in Java. [https://s3.amazonaws.com/vincy/10605-15Fall/HW1_StreamingNB.pdf PDF Handout]
 
* Thurs Sep 10. [[Class meeting for 10-605 Phrase Finding|Phrase Finding]]
 
* Thurs Sep 10. [[Class meeting for 10-605 Phrase Finding|Phrase Finding]]
* Tues Sep 15. [[Class meeting for 10-605 Phrases_with_Stream_and_Sort|Implementing Phrase Finding and Large-Data Testing for Naive Bayes with Stream-and-Sort]]
+
* Tues Sep 15. [[Class meeting for 10-605 Phrases_with_Stream_and_Sort|Implementing Phrase Finding and Large-Data Testing for Naive Bayes with Stream-and-Sort]].
 +
** Lecture also discusses: map-reduce abstractions/dataflow
 
** Also: Guest lecture from Manik Varma, MSR.
 
** Also: Guest lecture from Manik Varma, MSR.
 
* Thurs Sep 17. [[Class_meeting_for_10-605_Hadoop_Overview|Hadoop Overview]]
 
* Thurs Sep 17. [[Class_meeting_for_10-605_Hadoop_Overview|Hadoop Overview]]
 
** HW2 out: naive Bayes training on Hadoop in Java. [https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view PDF Handout]
 
** HW2 out: naive Bayes training on Hadoop in Java. [https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view PDF Handout]
 
* Tues Sep 22 - Thurs Sep 24. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
 
* Tues Sep 22 - Thurs Sep 24. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
 +
** Lecture also discusses: hadoop streaming, mrjob, cascading, pipes, scaling, hive, pig, spark, flink
  
 
----
 
----
Line 30: Line 33:
 
* Tues Oct 20. Exam review tips ([http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pptx ppt], [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf pdf]) and guest lecture from '''Mark Torrance of RocketFuel'''
 
* Tues Oct 20. Exam review tips ([http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pptx ppt], [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf pdf]) and guest lecture from '''Mark Torrance of RocketFuel'''
 
* Thurs Oct 22. ''midterm exam''
 
* Thurs Oct 22. ''midterm exam''
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/f2015-midterm.pdf practice questions for midterm - v1].  This document also references the relevant questions from two previous review sheets:
+
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/f2015-midterm.pdf practice questions for midterm - from 2015].  This document also identifies relevant questions from two previous review sheets:
 
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
 
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
Line 39: Line 42:
 
* Tues Oct 27. [[Class meeting for 10-605 Randomized|Randomized Algorithms 1]]
 
* Tues Oct 27. [[Class meeting for 10-605 Randomized|Randomized Algorithms 1]]
 
* Thurs Oct 29. [[Class meeting for 10-605 Randomized|Randomized Algorithms 2]]
 
* Thurs Oct 29. [[Class meeting for 10-605 Randomized|Randomized Algorithms 2]]
** HW5 out: (tentatively) dSGD for modeling text
+
** HW5 out: dSGD for modeling text ([https://drive.google.com/file/d/0BzQQ-spWKjhUYUM1LUVZakx0ZlE/view])
* Tues Nov 3. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]]
+
* Tues Nov 3. Finish up with randomized algorithms.
* Thurs Nov 5. [[Class meeting for 10-605 Subsampling Graphs|Subsampling a graph with RWR]]
+
* Thurs Nov 5. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]]
 
* Tues Nov 10. [[Class_meeting_for_10-605_SSL_on_Graphs|SSL on Graphs]]
 
* Tues Nov 10. [[Class_meeting_for_10-605_SSL_on_Graphs|SSL on Graphs]]
** HW6 out: (tentatively) dSGD for collaborative filtering
+
* Thurs Nov 12. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
* Thurs Nov 12. [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
+
** HW6 out: approximate pagerank for sampling a graph ([https://goo.gl/ThtRc6])
* Tues Nov 17.  ''Guest lecture, Chris Dyer.'' Learning with GPUs.
+
* Tues Nov 17.  ''Guest lecture, Chris Dyer.'' [http://demo.clab.cs.cmu.edu/cdyer/bigdata-cuda.pdf Learning with GPUs].
* Thurs Nov 19. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
+
* Thurs Nov 19. ''Guest lecture: Aurick Qiao'', parameter servers [http://curtis.ml.cmu.edu/w/courses/images/8/85/Aurick_release.pptx ppt slides].
 
* Tues Nov 24.  [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and other tricks]]
 
* Tues Nov 24.  [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and other tricks]]
** HW7 out: LDA with a param server
+
** HW7 out: LDA with a param server ([http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf PDF handout])
 
* Thurs Nov 26. ''Happy Thanksgiving!''
 
* Thurs Nov 26. ''Happy Thanksgiving!''
  
 
----
 
----
  
* Tues Dec 1. [[Class meeting for 10-605 First-Order Logics|First-order logics]]
+
* Tues Dec 1, Thurs Dec 3.  [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
* Thurs Dec 3.  [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]]
+
* Tues Dec 8. Review and project presentations (15 min each):
* Tues Dec 8.   [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]]
+
** Schedule:
 +
*** Bhuwan Dingra/Yun Fu
 +
*** Rohit Girdhar
 +
*** Siddha Ganju/Sravya Popuri/Srikant Avasarala
 +
*** Jingkun Gao/Yiming Gu
 
** HW7 due
 
** HW7 due
 
* Thurs Dec 10.  In-class final exam.
 
* Thurs Dec 10.  In-class final exam.
 +
* Tues Dec 15.  Writeups for 10-805 projects are due (at 11:59pm).
  
 
== Topics covered in previous years but not in 2015 ==
 
== Topics covered in previous years but not in 2015 ==
  
 +
*  [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]]
 
* [[Class meeting for 10-605 PIG|Workflows in PIG]]
 
* [[Class meeting for 10-605 PIG|Workflows in PIG]]
 
* [[Class meeting for 10-605 Phase Finding|Phrase Finding]]
 
* [[Class meeting for 10-605 Phase Finding|Phrase Finding]]
Line 67: Line 76:
 
* [[Class meeting for 10-605 Rocchio and On-line Learning|Messages, records and workflows; Rocchio]]
 
* [[Class meeting for 10-605 Rocchio and On-line Learning|Messages, records and workflows; Rocchio]]
 
* [http://www.cs.cmu.edu/~wcohen/10-605/schimmy.pptx Scalable pagerank - The Schimmy Pattern]
 
* [http://www.cs.cmu.edu/~wcohen/10-605/schimmy.pptx Scalable pagerank - The Schimmy Pattern]
 +
* [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]]

Latest revision as of 10:07, 11 October 2016

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

Schedule:




  • Tues Dec 1, Thurs Dec 3. Graph models for large-scale ML
  • Tues Dec 8. Review and project presentations (15 min each):
    • Schedule:
      • Bhuwan Dingra/Yun Fu
      • Rohit Girdhar
      • Siddha Ganju/Sravya Popuri/Srikant Avasarala
      • Jingkun Gao/Yiming Gu
    • HW7 due
  • Thurs Dec 10. In-class final exam.
  • Tues Dec 15. Writeups for 10-805 projects are due (at 11:59pm).

Topics covered in previous years but not in 2015