Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
(Created page with "This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016. Notes: * Homeworks, unless otherwise posted, will be due when the next HW comes out....")
 
Line 8: Line 8:
  
 
Schedule:
 
* Tues Sep 1. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]]
+
* Thurs Sep 1. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]]
* Thurs Sep 3. [[Class meeting for 10-605 Probability Review|Review of probabilities, joint distributions and naive Bayes]]
+
* Tues Sep 6. [[Class meeting for 10-605 Probability Review|Review of probabilities, joint distributions and naive Bayes]]
* Tues Sep 8.  [[Class meeting for 10-605 Streaming Naive Bayes|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]]
+
* Thurs Sep 8.  [[Class meeting for 10-605 Streaming Naive Bayes|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]]
 
** HW1 out: streaming naive Bayes in Java. [https://s3.amazonaws.com/vincy/10605-15Fall/HW1_StreamingNB.pdf PDF Handout]
 
* Thurs Sep 10. [[Class meeting for 10-605 Phrase Finding|Phrase Finding]]
+
* Tues Sep 13. [[Class meeting for 10-605 Phrase Finding|Phrase Finding]]
* Tues Sep 15. [[Class meeting for 10-605 Phrases_with_Stream_and_Sort|Implementing Phrase Finding and Large-Data Testing for Naive Bayes with Stream-and-Sort]].
+
* Thurs Sep 15. [[Class meeting for 10-605 Phrases_with_Stream_and_Sort|Implementing Phrase Finding and Large-Data Testing for Naive Bayes with Stream-and-Sort]].
 
** Lecture also discusses: map-reduce abstractions/dataflow
 
** Also: Guest lecture from Manik Varma, MSR.
+
* Tues Sep 20. [[Class_meeting_for_10-605_Hadoop_Overview|Hadoop Overview]]
* Thurs Sep 17. [[Class_meeting_for_10-605_Hadoop_Overview|Hadoop Overview]]
 
 
** HW2 out: naive Bayes training on Hadoop in Java. [https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view PDF Handout]
 
* Tues Sep 22 - Thurs Sep 24. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
+
* Thurs Sep 22 - Thurs Sep 24. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
 
** Lecture also discusses: hadoop streaming, mrjob, cascading, pipes, scaling, hive, pig, spark, flink
 
  
 
----
 
  
* Tues Sep 29.  [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
+
* Tues Sep 27.  [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
 
** HW3 out: Naive Bayes in GuineaPig. [https://drive.google.com/file/d/0B-p8_eIVeEHFM1JOSGFWNFFJcU0/view PDF Handout]
 
* Thurs Oct 1. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
+
* Thurs Sep 29. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
 
** For 805 students: an initial project proposal is due '''via email to wcohen+805@gmail.com'''. You will get feedback on it from the instructors, and it will also be posted to the class, mainly for 605 students who are interested in collaborating, but also for general interest. Please be clear about your proposal; I'm expecting approximately one page. You should discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner, whether you are willing to work with or mentor one or more 605 students, and, if so, how you anticipate them contributing to the project.
 
* Tues Oct 6. [[Class meeting for 10-605 Parallel Perceptrons 1|Parallel Perceptrons 1]].
+
* Tues Oct 4. '''No class - Rosh Hashana.'''
* Thurs Oct 8. [[Class meeting for 10-605 Parallel Perceptrons 2|Parallel Perceptrons 2]].
+
* Thurs Oct 6. [[Class meeting for 10-605 Parallel Perceptrons 1|Parallel Perceptrons 1]].
* Tues Oct 13. [[Class meeting for 10-605 Advanced topics for SGD|More on parallel and streaming ML]]: Adaptive gradients, AllReduce, and Parameter Servers
+
* Tues Oct 11. [[Class meeting for 10-605 Parallel Perceptrons 2|Parallel Perceptrons 2]].
 +
* Thurs Oct 13. [[Class meeting for 10-605 Advanced topics for SGD|More on parallel and streaming ML]]: Adaptive gradients, AllReduce, and Parameter Servers
 
** HW4 out: streaming logistic regression classifier [http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf PDF Handout]
 
 +
 +
stopped here
 +
 
* Thurs Oct 15. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD]]
 
 
** For 805 students: the final project proposal is due.
 

Revision as of 17:15, 18 July 2016

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

Note: this page is under construction.

Schedule:


  • Tues Sep 27. Fast KNN and similarity joins
  • Thurs Sep 29. Scalable SGD and Hash Kernels
    • For 805 students: an initial project proposal is due via email to wcohen+805@gmail.com. You will get feedback on it from the instructors, and it will also be posted to the class, mainly for 605 students who are interested in collaborating, but also for general interest. Please be clear about your proposal; I'm expecting approximately one page. You should discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner, whether you are willing to work with or mentor one or more 605 students, and, if so, how you anticipate them contributing to the project.
  • Tues Oct 4. No class - Rosh Hashana.
  • Thurs Oct 6. Parallel Perceptrons 1.
  • Tues Oct 11. Parallel Perceptrons 2.
  • Thurs Oct 13. More on parallel and streaming ML: Adaptive gradients, AllReduce, and Parameter Servers
    • HW4 out: streaming logistic regression classifier PDF Handout

stopped here



  • Tues Dec 1, Thurs Dec 3. Graph models for large-scale ML
  • Tues Dec 8. Review and project presentations (15 min each):
    • Schedule:
      • Bhuwan Dingra/Yun Fu
      • Rohit Girdhar
      • Siddha Ganju/Sravya Popuri/Srikant Avasarala
      • Jingkun Gao/Yiming Gu
    • HW7 due
  • Thurs Dec 10. In-class final exam.
  • Tues Dec 15. Writeups for 10-805 projects are due (at 11:59pm).

Topics covered in previous years but not in 2015