Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
 
(25 intermediate revisions by 5 users not shown)
Line 6: Line 6:
 
* Homeworks, unless otherwise posted, will be due when the next HW comes out.
 
 
* Lecture notes and/or slides will be (re)posted around the time of the lectures.
 
* '''Classes are cancelled for Oct 27'''
+
* Classes are cancelled for Oct 27
 
* '''No classes will be held on Nov 24 (Thanksgiving)'''
 
  
Line 16: Line 16:
 
** This is a revised writeup that will address any comments William raises from the initial proposal.
 
 
* 11:59pm Sun 11/13: [[Midterm 805 project report]] due.
 
 +
* '''1:30-2:50pm Tues 12/6: Project presentations''' (in class).  One presentation per group, 12 minutes per presentation.  Please send your slide deck to William by '''10am 12/6''' (PDF is best).
 
* 11:59pm Sun 12/11: [[Machine_Learning_with_Large_Datasets_10-605_in_Fall_2016#Project_Info|Final 805 project writeup]] due.
 
If class time permits there will also be a short presentation in late Nov early Dec.
+
 
  
 
----
 
Line 23: Line 24:
 
Schedule for lectures and 605 assignments:
 
  
* Tues Aug 30, 2016 [[Class meeting for 10-605 Overview|Overview]].  Grading policies and etc, History of Big Data, Complexity theory and cost of important operations
+
* Tues Aug 30, 2016 [[Class meeting for 10-605 in Fall 2016 Overview|Overview]].  Grading policies and etc, History of Big Data, Complexity theory and cost of important operations
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
+
* Thurs Sep 1, 2016 [[Class meeting for 10-605 in Fall 2016 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
 
** '''Start work on''' Assignment 1a: Streaming NB. [http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hashtable-nb.pdf Writeup].
 
* Tues Sep 6, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable naive bayes, Local counting in stream and sort
+
* Tues Sep 6, 2016 [[Class meeting for 10-605 in Fall 2016 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable naive bayes, Local counting in stream and sort
* Thurs Sep 8, 2016 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
+
* Thurs Sep 8, 2016 [[Class meeting for 10-605 in Fall 2016 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
 
** '''Start work on'''  Assignment 1b: Streaming NB on Hadoop. Draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw1bhadoopnaivebayes/writeup
 
 
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
 
Line 48: Line 49:
 
* Tues Oct 18, 2016 [[Class meeting for 10-605 Midterm|Midterm]].   
 
 
* Thurs Oct 20, 2016 [[Class meeting for 10-605 Subsampling a Graph|Subsampling a Graph]].  Sampling a graph, Local partitioning
 
** '''Start work on''' Assignment 4: Subsampling a Graph with Approximate PageRank, draft at https://drive.google.com/file/d/0BzQQ-spWKjhUaWoyOFZHV21uUlU/view
+
** '''Start work on''' Assignment 4: Subsampling a Graph with Approximate PageRank, draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw4approximatepagerank/writeup
* Tues Oct 25, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]].  Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
+
* Tues Oct 25, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]].  Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
 
 
* Thurs Oct 27, 2016. '''No class.'''
 
* Tues Nov 1, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]].  Reverse-mode differentiation, Recursive ANNs, Word2vec
+
* Tues Nov 1, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]].  Reverse-mode differentiation, Recursive ANNs, Word2vec
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
+
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
** '''Start work on''' Assignment 5: Autodiff with IPM.  This is a new assignment for Fall 2016.
+
** '''Start work on''' Assignment 5: Autodiff with IPM.  This is a new assignment for Fall 2016. View writeup at https://github.com/KarandeepJohar/10605-f16-hw5/blob/master/automatic-reverse-mode.pdf
* Tues Nov 8, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 2]].  Locality sensitive hashing
+
* Tues Nov 8, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 2]].  Locality sensitive hashing
* Thurs Nov 10, 2016 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
+
* Thurs Nov 10, 2016 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
* Tues Nov 15, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
+
* Tues Nov 15, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
+
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
+
** '''Start work on''' Assignment 6: Phrase-finding with Spark. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw6phrasefindingwithspark/writeup
+
* Tues Nov 22, 2016 [[Class meeting for 10-605 LDA|LDA 1]]. DGMs for naive Bayes, Gibbs sampling for LDA
+
* Tues Nov 29, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].
+
** '''Start work on''' Assignment 7: LDA with a Parameter Server, Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw7lda/attachments/677
+
* Thurs Dec 1, 2016 [[Class meeting for 10-605 LDA|LDA 2]].  Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
+
* Tues Dec 6, 2016 [[Class meeting for 10-605 Project Reports|Project Reports]].
 
** '''Start work on''' Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
 
* Tues Nov 22, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].
 
* Tues Nov 29, 2016 [[Class meeting for 10-605 LDA|LDA 1]]. DGMs for naive Bayes, Gibbs sampling for LDA
 
** '''Start work on''' Assignment 7: LDA with a Parameter Server, draft http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf
 
* Thurs Dec 1, 2016 [[Class meeting for 10-605 LDA|LDA 2]].  Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
 
* Tues Dec 6, 2016 [[Class meeting for 10-605 Review session for final|Review session for final]].
 
 
** '''Last assignment due'''
 
* Thurs Dec 8, 2016 [[Class meeting for 10-605 Final Exam|Final Exam]].
+
* Thurs Dec 8, 2016 [[Class meeting for 10-605 Final Exam|Final Exam]].  Note that we've posted:
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
 +
** Comments:
 +
*** Most of the exam (approximately 80%) covers material from after the midterm.
 +
*** You may bring in '''two''' 8 1/2 by 11 sheets of paper with notes.

Latest revision as of 11:54, 11 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled for Oct 27
  • No classes will be held on Nov 24 (Thanksgiving)

Schedule for 805 projects:

  • 11:59pm Sun 10/2: Initial 805 project proposal due.
  • 11:59pm Sun 10/16: Final 805 project proposal due.
    • This is a revised writeup that will address any comments William raises from the initial proposal.
  • 11:59pm Sun 11/13: Midterm 805 project report due.
  • 1:30-2:50pm Tues 12/6: Project presentations (in class). One presentation per group, 12 minutes per presentation. Please send your slide deck to William by 10am 12/6 (PDF is best).
  • 11:59pm Sun 12/11: Final 805 project writeup due.



Schedule for lectures and 605 assignments: