Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017"

From Cohen Courses
Line 29: Line 29:
 
* Tues Sep 6, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable naive bayes, Local counting in stream and sort
 
* Thurs Sep 8, 2016 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
** '''Start work on'''  Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view
+
** '''Start work on'''  Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf
 
 
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map reduce algorithms, Joins in Hadoop
 
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins,  
Line 37: Line 36:
 
* Tues Sep 20, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 3]].  PageRank in Pig, K-means in Pig, Spark
 
* Thurs Sep 22, 2016 [[Class meeting for 10-605 Phrase Finding|Phrase Finding]].  Systems built on top of Hadoop, Phrase-finding in Pig, Other work with phrases
* Tues Sep 27, 2016 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
+
* Tues Sep 27, 2016 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
 
 
* Thurs Sep 29, 2016 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 1]].  Debugging ML algorithms
 
** '''Start work on''' Assignment 3: scalable SGD Draft at http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf
Line 55: Line 53:
 
* Tues Nov 1, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
 
** '''Start work on''' Assignment 5: Autodiff with IPM.  This is a new assignment for Fall 2016.
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms 2 - someday, redo the count-min stuff|Randomized Algorithms 2 - someday, redo the count-min stuff]].  Review of Bloom filters, Locality sensitive hashing
+
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms 2 - someday, redo the count-min stuff|Randomized Algorithms 2 ]].  Review of Bloom filters, Locality sensitive hashing
 
 
* Tues Nov 8, 2016 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab,
 
  PowerGraph, GraphChi, GraphX
* Thurs Nov 10, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
+
* Thurs Nov 10, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
 
 
* Tues Nov 15, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
 
** '''Start work on''' Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.

Revision as of 10:53, 2 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled for Sept 21 (Rosh Hashana)
  • No classes will be held on Nov 23 (Thanksgiving)

Schedule for 805 projects:



Tentative schedule for lectures and 605 assignments:

Similarity joins with TFIDF, Parallel simjoins

and GPUs, Exploding and vanishing gradients, Modern deep learning models
  • Thurs Oct 27, 2016 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs

  • Tues Nov 1, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
    • Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016.
  • Thurs Nov 3, 2016 Randomized Algorithms 2. Review of Bloom filters, Locality sensitive hashing
  • Tues Nov 8, 2016 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
  • Thurs Nov 10, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
  • Tues Nov 15, 2016 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
    • Start work on Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
  • Thurs Nov 17, 2016 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
  • Tues Nov 22, 2016 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
  • Thurs Nov 24, 2016 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
  • Tues Nov 29, 2016 Review session for final.
    • Last assignment due
  • Thurs Dec 1, 2016 Final Exam.