Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017"

From Cohen Courses
Line 25: Line 25:
  
 
* Tues Aug 30, 2016 [[Class meeting for 10-605 Overview|Overview]].  Grading policies and etc, History of Big Data, Complexity theory and cost of important operations
 
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio a\
nd TFIDF
+
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
 
 
** '''Start work on''' Assignment 1a: Streaming NB.    Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/hashtable-nb.pdf
 
 
* Tues Sep 6, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable naive bayes, Local counting in stream and sort
 
Line 32: Line 31:
 
** '''Start work on'''  Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop
+
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map reduce algorithms, Joins in Hadoop
 
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins
Line 64: Line 63:
 
* Tues Nov 15, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
 
 
** '''Start work on''' Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
 
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].  Parameter servers, PS vs Hadoop, State Synchronous Parallel (SSP) model, Manage\
d Communication in PS, LDA Sampler with PS
+
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].  Parameter servers, PS vs Hadoop, State Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
 
 
* Tues Nov 22, 2016 [[Class meeting for 10-605 LDA|LDA 1]].  DGMs for naive Bayes, Gibbs sampling for LDA
 
 
** '''Start work on''' Assignment 7: LDA with a Parameter Server, draft http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf
 

Revision as of 10:51, 2 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled for Sept 21 (Rosh Hashana)
  • No classes will be held on Nov 23 (Thanksgiving)

Schedule for 805 projects:



Tentative schedule for lectures and 605 assignments:

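    • Start work on Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view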

  • Tues Sep 13, 2016 Workflows For Hadoop 1. Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map reduce algorithms, Joins in Hadoop
  • Thurs Sep 15, 2016 Workflows For Hadoop 2. TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins

ash kernels for logistic regression

and GPUs, Exploding and vanishing gradients, Modern deep learning models
  • Thurs Oct 27, 2016 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs

  • Tues Nov 1, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
    • Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016.
  • Thurs Nov 3, 2016 Randomized Algorithms 2 - someday, redo the count-min stuff. Review of Bloom filters, Locality sensitive hashing

PowerGraph, GraphChi, GraphX
  • Thurs Nov 10, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches

  • Tues Nov 15, 2016 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
    • Start work on Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
  • Thurs Nov 17, 2016 Parameter Servers. Parameter servers, PS vs Hadoop, State Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
  • Tues Nov 22, 2016 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
  • Thurs Nov 24, 2016 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
  • Tues Nov 29, 2016 Review session for final.
    • Last assignment due
  • Thurs Dec 1, 2016 Final Exam.