Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017"

From Cohen Courses
(Created page with "This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016. ---- Notes: * Homeworks, unless otherwise posted, will be due when the next HW come...")
 
Line 32: Line 32:
** '''Start work on''' Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]]. Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]]. TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins
** '''Start work on''' Assignment 2: Naive Bayes testing in Guinea Pig, draft at https://drive.google.com/file/d/0B-p8_eIVeEHFM1JOSGFWNFFJcU0/view
Line 51: Line 50:
* Thurs Oct 20, 2016 [[Class meeting for 10-605 Subsampling a Graph|Subsampling a Graph]]. Sampling a graph, Local partitioning
** '''Start work on''' Assignment 4: Subsampling a Graph with Approximate PageRank, draft at https://drive.google.com/file/d/0BzQQ-spWKjhUaWoyOFZHV21uUlU/view
* Tues Oct 25, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]]. Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
* Thurs Oct 27, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]]. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs
* Tues Nov 1, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]]. Bloom filters, The countmin sketch
Line 59: Line 58:
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms 2 - someday, redo the count-min stuff|Randomized Algorithms 2 - someday, redo the count-min stuff]]. Review of Bloom filters, Locality sensitive hashing
* Tues Nov 8, 2016 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]]. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
* Thurs Nov 10, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]]. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
* Tues Nov 15, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]]. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
** '''Start work on''' Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]]. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS

Revision as of 10:51, 2 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled for Sept 21 (Rosh Hashana)
  • No classes will be held on Nov 23 (Thanksgiving)

Schedule for 805 projects:



Tentative schedule for lectures and 605 assignments:

  • Tues Aug 30, 2016 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
  • Thurs Sep 1, 2016 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF

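    • Start work on Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view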

  • Tues Sep 13, 2016 Workflows For Hadoop 1. Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop
  • Thurs Sep 15, 2016 Workflows For Hadoop 2. TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins,

Similarity joins with TFIDF, Parallel simjoins

ash kernels for logistic regression

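  • Tues Oct 25, 2016 Deep Learning 1. Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models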
  • Thurs Oct 27, 2016 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs

  • Tues Nov 1, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
    • Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016.
  • Thurs Nov 3, 2016 Randomized Algorithms 2 - someday, redo the count-min stuff. Review of Bloom filters, Locality sensitive hashing

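  • Tues Nov 8, 2016 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX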
  • Thurs Nov 10, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches

  • Tues Nov 15, 2016 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
    • Start work on Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
  • Thurs Nov 17, 2016 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS