Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
 
* Thurs Sep 22 - Thurs Sep 24. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages, Rocchio, and TFIDF]]
** Lecture also discusses: Hadoop streaming, mrjob, Cascading, Pipes, scaling, Hive, Pig, Spark, Flink
* Tues Sep 27. [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
** HW3 out: Naive Bayes in GuineaPig. [https://drive.google.com/file/d/0B-p8_eIVeEHFM1JOSGFWNFFJcU0/view PDF Handout]
* Thurs Sep 29. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
** For 805 students: an initial project proposal is due '''via email to wcohen+805@gmail.com'''. You will get feedback on it from the instructors. It will also be posted to the class, mainly for 605 students who are interested in collaborating but also for general interest. Please state your proposal clearly; I'm expecting approximately one page. You should discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner, whether you are willing to work with or mentor one or more 605 students, and, if so, how you anticipate them contributing to the project.

----

== October ==
* Tues Oct 4. '''No class - Rosh Hashana.'''
* Thurs Oct 6. [[Class meeting for 10-605 Parallel Perceptrons 1|Parallel Perceptrons 1]].
* Thurs Oct 13. [[Class meeting for 10-605 Advanced topics for SGD|More on parallel and streaming ML]]: Adaptive gradients, AllReduce, and Parameter Servers
** HW4 out: streaming logistic regression classifier [http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf PDF Handout]
* Tues Oct 18. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD]]
** Also, some exam review tips ([http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pptx ppt])
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/f2015-midterm.pdf practice questions for midterm - v1].  This document also references the relevant questions from two previous review sheets:
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
*** [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf Some review tips - modified from last year's exam review session]
** For 805 students: the final project proposal is due.
* Thurs Oct 20. ''midterm exam''
* Tues Oct 25. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD]]
* Thurs Oct 27. [[Class meeting for 10-605 Randomized|Randomized Algorithms 1]]

----

== November ==
* Tues Nov 1. [[Class meeting for 10-605 Randomized|Randomized Algorithms 2]]
** HW5 out: dSGD for modeling text ([https://drive.google.com/file/d/0BzQQ-spWKjhUYUM1LUVZakx0ZlE/view])
* Thurs Nov 3. Finish up with randomized algorithms.
* Tues Nov 8. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]]
* Thurs Nov 10. [[Class_meeting_for_10-605_SSL_on_Graphs|SSL on Graphs]]
* Tues Nov 15. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
** HW6 out: approximate PageRank for sampling a graph ([https://goo.gl/ThtRc6])
* Thurs Nov 17. TBD
* Tues Nov 22. TBD
* Thurs Nov 24. ''No class - happy Thanksgiving!''
* Tues Nov 29. [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and other tricks]]
** HW7 out: LDA with a parameter server ([http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf PDF handout])

----
== December ==
  
* Thurs Dec 1. [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
* Tues Dec 6. Review and project presentations (15 min each):
** HW7 due
* Thurs Dec 8. In-class exam.
* Tues Dec 15. Writeups for 10-805 projects are due (at 11:59pm).
  

Revision as of 17:24, 18 July 2016

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

Note: this is under construction.


Topics covered in previous years but not in 2015