Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017"

From Cohen Courses
Line 27: Line 27:
 
* Thurs Aug 31, 2017 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
 
* Thurs Aug 31, 2017 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
 
** '''Start work on''' Assignment 1a: Streaming NB; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-a.pdf
 
** '''Start work on''' Assignment 1a: Streaming NB; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-a.pdf
* Tues Sep 5, 2017 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable Naive Bayes, Local counting in stream and sort
+
* Tues Sep 5, 2017 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable Naive Bayes, Alternatives to stream and sort, Local counting in stream and sort, Stream and sort examples
* Thurs Sep 7, 2017 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
+
* Thurs Sep 7, 2017 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming, Debugging Hadoop, Combiners
 
** '''Start work on''' Assignment 1b: Streaming NB on Hadoop; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-b.pdf
 
** '''Start work on''' Assignment 1b: Streaming NB on Hadoop; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-b.pdf
* Tues Sep 12, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Abstracts for map-reduce algorithms
+
* Tues Sep 12, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Abstracts for map-reduce algorithms, Joins in Hadoop
 
* Thurs Sep 14, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  Guinea Pig intro, Similarity joins, Similarity joins with TFIDF
 
* Thurs Sep 14, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  Guinea Pig intro, Similarity joins, Similarity joins with TFIDF
 
** '''Start work on''' Assignment 2: Naive Bayes testing in Guinea Pig; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-2-naivebayes-gpig/main.pdf
 
** '''Start work on''' Assignment 2: Naive Bayes testing in Guinea Pig; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-2-naivebayes-gpig/main.pdf
* Tues Sep 19, 2017 [[Class meeting for 10-605 Workflows For Hadoop 3: PageRank and Phrases|Workflows For Hadoop 3: PageRank and Phrases]].  Spark
+
* Tues Sep 19, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 3]].  PageRank, Spark, Phrase finding
* Tues Sep 26, 2017 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
+
* Tues Sep 26, 2017 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Efficient regularized SGD, Hash kernels for logistic regression
* Thurs Sep 28, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 1]].  Debugging ML algorithms
+
* Thurs Sep 28, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 1]].  The "delta trick", Averaged perceptrons, Debugging ML algorithms
 
** '''Start work on''' Assignment 3: scalable SGD; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-3-sga-logreg/main.pdf
 
** '''Start work on''' Assignment 3: scalable SGD; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-3-sga-logreg/main.pdf
* Tues Oct 3, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 2]].   
+
* Tues Oct 3, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 2]].  Hash kernels, Ranking perceptrons
 
* Thurs Oct 5, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 3]].  Structured perceptrons, Iterative parameter mixing paper
 
* Thurs Oct 5, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 3]].  Structured perceptrons, Iterative parameter mixing paper
 
* Tues Oct 10, 2017 [[Class meeting for 10-605 SGD for MF|SGD for MF]].  Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
 
* Tues Oct 10, 2017 [[Class meeting for 10-605 SGD for MF|SGD for MF]].  Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
* Thurs Oct 12, 2017 [[Class meeting for 10-605 Midterm review and catchup|Midterm review and catchup]].   
+
* Thurs Oct 12, 2017 [[Class meeting for 10-605 Midterm review and catchup|Midterm review and catchup]].  Midterm review
 
** '''Last assignment due'''
 
** '''Last assignment due'''
 
* Tues Oct 17, 2017 [[Class meeting for 10-605 Midterm|Midterm]].   
 
* Tues Oct 17, 2017 [[Class meeting for 10-605 Midterm|Midterm]].   
* Thurs Oct 19, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]].  Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
+
* Thurs Oct 19, 2017 [[Class meeting for 10-605 Computing with GPUs|Computing with GPUs]]. 
* Tues Oct 24, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]].  Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py
+
* Tues Oct 24, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]].  Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
 +
* Thurs Oct 26, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]].  Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py
 
** '''Start work on''' Assignment 4: Autodiff with IPM part 1/2; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-5-autodiff/main.pdf
 
** '''Start work on''' Assignment 4: Autodiff with IPM part 1/2; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-5-autodiff/main.pdf
* Thurs Oct 26, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 3]].  Recursive ANNs, Convolutional ANNs
+
* Tues Oct 31, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 3]].  Recursive ANNs, Convolutional ANNs
* Tues Oct 31, 2017 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
+
* Thurs Nov 2, 2017 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
* Thurs Nov 2, 2017 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 2]].  Review of Bloom filters, Locality sensitive hashing
+
* Tues Nov 7, 2017 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 2]].  Review of Bloom filters, Locality sensitive hashing, Online LSH
 
** '''Start work on''' Assignment 5: Autodiff with IPM part 2/2
 
** '''Start work on''' Assignment 5: Autodiff with IPM part 2/2
* Tues Nov 7, 2017 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
+
* Thurs Nov 9, 2017 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
* Thurs Nov 9, 2017 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
+
* Tues Nov 14, 2017 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
* Tues Nov 14, 2017 [[Class meeting for 10-605 Subsampling a Graph|Subsampling a Graph]].  Sampling a graph, Local partitioning
 
 
** '''Start work on''' Assignment 6: SSL on a graph in Spark (possibly using NELL data)
 
** '''Start work on''' Assignment 6: SSL on a graph in Spark (possibly using NELL data)
 
* Thurs Nov 16, 2017 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].  Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
 
* Thurs Nov 16, 2017 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].  Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS

Revision as of 13:31, 10 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • No class will be held on Sept 21 (Rosh Hashana).
  • No class will be held on Nov 23 (Thanksgiving).

Schedule for 805 projects:



Tentative schedule for lectures and 605 assignments: