Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017"

From Cohen Courses
Line 23: Line 23:
  
 
'''Tentative''' schedule for lectures and 605 assignments:
 
 +
 
* Tues Aug 29, 2017 [[Class meeting for 10-605 Overview|Overview]].  Grading policies etc., History of Big Data, Complexity theory and cost of important operations
 
* Thurs Aug 31, 2017 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
 
** '''Start work on''' Assignment 1a: Streaming NB. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1a-naivebayes-hashtab.pdf
+
** '''Start work on''' Assignment 1a: Streaming NB. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-a.pdf
 
* Tues Sep 5, 2017 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable Naive Bayes, Local counting in stream and sort
 
* Thurs Sep 7, 2017 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
 
** '''Start work on''' Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1b-naivebayes-hadoop.pdf
+
** '''Start work on''' Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-b.pdf
 
* Tues Sep 12, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstractions for map-reduce algorithms, Joins in Hadoop
 
 
* Thurs Sep 14, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins
 

Revision as of 11:25, 2 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Class is cancelled on Sept 21 (Rosh Hashana)
  • No classes will be held on Nov 23 (Thanksgiving)

Schedule for 805 projects:



Tentative schedule for lectures and 605 assignments: