Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015"

From Cohen Courses
Line 15: Line 15:
 
** HW2 out: naive Bayes training on Hadoop in Java. [https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view PDF Handout]
 
 
* Tues Sep 22 - Thus Sep 24. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
 
* Thus Sep 24.  [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
 
  
 
----
 
Line 61: Line 60:
 
* [[Class meeting for 10-605 Phase Finding|Phrase Finding]]
 
 
* [[Class meeting for 10-605 Parallel Similarity Joins|Scalable Similarity Joins]]
 
 +
* [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
 
* [[Class meeting for 10-605 Rocchio and On-line Learning|Messages, records and workflows; Rocchio]]
 
 
* [http://www.cs.cmu.edu/~wcohen/10-605/schimmy.pptx Scalable pagerank - The Schimmy Pattern]
 

Revision as of 13:28, 28 September 2015

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

  • Tues Sep 29. Scalable SGD and Hash Kernels
    • HW3 out: applying a large linear classifier to a large test set in Hadoop.
  • Thurs Oct 1. TBA
    • For 805 students: an initial project proposal is due via email to wcohen+805@gmail.com. You will get feedback on it from the instructors, and it will also be posted to the class, mainly for 605 students who are interested in collaborating, but also for general interest. Please be clear about your proposal; I'm expecting approximately one page. Discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner, whether you are willing to work with or mentor one or more 605 students, and, if so, how you anticipate them contributing to the project.
  • Tues Oct 6. Parallel Perceptrons 1.
  • Thurs Oct 8. Parallel Perceptrons 2.
  • Tues Oct 13. Parameter servers and AllReduce
    • HW4 out: streaming logistic regression classifier
  • Thurs Oct 15. Matrix Factorization and SGD
    • For 805 students: the final project proposal is due.
  • Tues Oct 20. Guest lecture from Mark Torrance of RocketFuel
  • Thurs Oct 22. Midterm exam


Topics covered in previous years but not in 2015