Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
Line 8:
 
Schedule:
 
  
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Overview|Overview]].  Grading policies etc., History of Big Data, Complexity theory and cost of important operations
+
* Tues Aug 30, 2016 [[Class meeting for 10-605 Overview|Overview]].  Grading policies etc., History of Big Data, Complexity theory and cost of important operations
* Tues Sep 6, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
+
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
* Thurs Sep 8, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable Naive Bayes, Local counting in stream and sort
 
 
** '''Start work on''' Assignment 1a: Streaming NB. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/hashtable-nb.pdf
 
* Tues Sep 13, 2016 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
+
* Tues Sep 6, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable Naive Bayes, Local counting in stream and sort
 +
* Thurs Sep 8, 2016 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
 
** '''Start work on'''  Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view
 
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
+
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
* Tues Sep 20, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  Similarity joins, Similarity joins with TFIDF, Parallel simjoins, PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
+
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  Similarity joins, Similarity joins with TFIDF, Parallel simjoins
 
** '''Start work on''' Assignment 2: Naive Bayes testing in Guinea Pig, draft at https://drive.google.com/file/d/0B-p8_eIVeEHFM1JOSGFWNFFJcU0/view
 
 +
* Tues Sep 20, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 3]].  PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
 
* Thurs Sep 22, 2016 [[Class meeting for 10-605 Phrase Finding|Phrase Finding]].  Phrase-finding in Pig, Other work with phrases
 
 
* Tues Sep 27, 2016 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
 

Revision as of 10:00, 26 August 2016

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • No classes will be held on Oct 4 (Rosh Hashana) or Nov 24 (Thanksgiving).

Schedule: