Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
Line 20:
 
** Lecture also discusses: map-reduce abstractions/dataflow
 
* Tues Sep 20, Thurs Sep 22. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
 
 +
** HW3 out: Using workflow languages.
 
** Lecture also discusses: hadoop streaming, mrjob, cascading, pipes, scaling, hive, pig, spark, flink
 
* Tues Sep 27.  [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
 
** HW3 out: Using workflow languages.
 
 
* Thurs Sep 29. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
 
 +
** HW4 out: streaming logistic regression classifier [http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf PDF Handout]
 
** For 805 students: an initial project proposal is due '''via email to wcohen+805@gmail.com'''. You will get feedback on it from the instructors, and it will also be posted to the class, mainly for 605 students who are interested in collaborating, but also for general interest. Please be clear about your proposal; I'm expecting approximately one page. You should discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner, whether you are willing to work with or mentor one or more 605 students, and, if so, how you anticipate them contributing to the project.
 

Revision as of 16:47, 25 July 2016

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

Note: this page is under construction.

September

October

November

December

  • Thurs Dec 1. Graph models for large-scale ML
  • Tues Dec 6. Review and project presentations (15 min each):
    • HW7 due
  • Thurs Dec 8. In-class exam.
  • Tues Dec 15. Writeups for 10-805 projects are due (at 11:59pm).

Topics covered in previous years but not in 2015