Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
 
(21 intermediate revisions by 4 users not shown)
Line 6: Line 6:
 
* Homeworks, unless otherwise posted, will be due when the next HW comes out.
 
 
* Lecture notes and/or slides will be (re)posted around the time of the lectures.
 
* '''Classes are cancelled for Oct 27'''
+
* Classes are cancelled for Oct 27
 
* '''No classes will be held on Nov 24 (Thanksgiving)'''
 
  
Line 16: Line 16:
 
** This is a revised writeup that will address any comments William raises from the initial proposal.
 
 
* 11:59pm Sun 11/13: [[Midterm 805 project report]] due.
 
 +
* '''1:30-2:50pm Tues 12/6: Project presentations''' (in class).  One presentation per group, 12 minutes per presentation.  Please send your slide deck to William by '''10am 12/6''' (PDF is best).
 
* 11:59pm Sun 12/11: [[Machine_Learning_with_Large_Datasets_10-605_in_Fall_2016#Project_Info|Final 805 project writeup]] due.
 
If class time permits there will also be a short presentation in late Nov early Dec.
+
 
  
 
----
 
Line 23: Line 24:
 
Schedule for lectures and 605 assignments:
 
  
* Tues Aug 30, 2016 [[Class meeting for 10-605 Overview|Overview]].  Grading policies, etc., History of Big Data, Complexity theory and cost of important operations
+
* Tues Aug 30, 2016 [[Class meeting for 10-605 in Fall 2016 Overview|Overview]].  Grading policies, etc., History of Big Data, Complexity theory and cost of important operations
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
+
* Thurs Sep 1, 2016 [[Class meeting for 10-605 in Fall 2016 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
 
** '''Start work on''' Assignment 1a: Streaming NB. [http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hashtable-nb.pdf Writeup].
 
* Tues Sep 6, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable naive bayes, Local counting in stream and sort
+
* Tues Sep 6, 2016 [[Class meeting for 10-605 in Fall 2016 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable naive bayes, Local counting in stream and sort
* Thurs Sep 8, 2016 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
+
* Thurs Sep 8, 2016 [[Class meeting for 10-605 in Fall 2016 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
 
** '''Start work on'''  Assignment 1b: Streaming NB on Hadoop. Draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw1bhadoopnaivebayes/writeup
 
 
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
 
Line 58: Line 59:
 
* Tues Nov 15, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
 
 
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
 
** '''Start work on''' Assignment 6:  Phrase-finding with Spark.
+
** '''Start work on''' Assignment 6:  Phrase-finding with Spark. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw6phrasefindingwithspark/writeup
* Tues Nov 22, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].
+
* Tues Nov 22, 2016 [[Class meeting for 10-605 LDA|LDA 1]]. DGMs for naive Bayes, Gibbs sampling for LDA
* Tues Nov 29, 2016 [[Class meeting for 10-605 LDA|LDA 1]]. DGMs for naive Bayes, Gibbs sampling for LDA
+
* Tues Nov 29, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].
** '''Start work on''' Assignment 7: LDA with a Parameter Server, draft http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf
+
** '''Start work on''' Assignment 7: LDA with a Parameter Server. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw7lda/attachments/677
 
* Thurs Dec 1, 2016 [[Class meeting for 10-605 LDA|LDA 2]].  Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
 
* Tues Dec 6, 2016 [[Class meeting for 10-605 Review session for final|Review session for final]].
+
* Tues Dec 6, 2016 [[Class meeting for 10-605 Project Reports|Project Reports]].
 
** '''Last assignment due'''
 
* Thurs Dec 8, 2016 [[Class meeting for 10-605 Final Exam|Final Exam]].
+
* Thurs Dec 8, 2016 [[Class meeting for 10-605 Final Exam|Final Exam]].  Note that we've posted:
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
 +
** Comments:
 +
*** Most of the exam (approximately 80%) covers material from after the midterm.
 +
*** You may bring in '''two''' 8 1/2 by 11 sheets of paper with notes.

Latest revision as of 11:54, 11 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled for Oct 27
  • No classes will be held on Nov 24 (Thanksgiving)

Schedule for 805 projects:

  • 11:59pm Sun 10/2: Initial 805 project proposal due.
  • 11:59pm Sun 10/16: Final 805 project proposal due.
    • This is a revised writeup that will address any comments William raises from the initial proposal.
  • 11:59pm Sun 11/13: Midterm 805 project report due.
  • 1:30-2:50pm Tues 12/6: Project presentations (in class). One presentation per group, 12 minutes per presentation. Please send your slide deck to William by 10am 12/6 (PDF is best).
  • 11:59pm Sun 12/11: Final 805 project writeup due.



Schedule for lectures and 605 assignments: