Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
(Undo revision 18884 by Wcohen (talk))
m (Reverted edits by Wcohen (talk) to last revision by Yulanh)
Line 24: Line 24:
 
Schedule for lectures and 605 assignments:
 
  
* Tues Aug 29, 2017 [[Class meeting for 10-605 Overview|Overview]].  Grading policies etc., History of Big Data, Complexity theory and cost of important operations
+
* Tues Aug 30, 2016 [[Class meeting for 10-605 Overview|Overview]].  Grading policies etc., History of Big Data, Complexity theory and cost of important operations
* Thurs Aug 31, 2017 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
+
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]].  Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
** '''Start work on''' Assignment 1a: Streaming NB. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1a-naivebayes-streaming/main-a.pdf
+
** '''Start work on''' Assignment 1a: Streaming NB. [http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hashtable-nb.pdf Writeup].
* Tues Sep 5, 2017 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable Naive Bayes, Local counting in stream and sort
+
* Tues Sep 6, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]].  Notes on scalable Naive Bayes, Local counting in stream and sort
* Thurs Sep 7, 2017 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
+
* Thurs Sep 8, 2016 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]].  Intro to Hadoop, Hadoop Streaming
** '''Start work on''' Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1a-naivebayes-streaming/main-b.pdf
+
** '''Start work on''' Assignment 1b: Streaming NB on Hadoop. Draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw1bhadoopnaivebayes/writeup
* Tues Sep 12, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop
+
* Tues Sep 13, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 1]].  Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
* Thurs Sep 14, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins
+
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 2]].  Similarity joins, Similarity joins with TFIDF, Parallel simjoins
** '''Start work on''' Assignment 2: Naive Bayes testing in Guinea Pig, draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-2-naivebayes-gpig/main.pdf
+
** '''Start work on''' Assignment 2: Naive Bayes testing in Guinea Pig, draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw2nbwithguineapig/writeup
* Tues Sep 19, 2017 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 3]].  PageRank in Pig, K-means in Pig, Spark
+
* Tues Sep 20, 2016 [[Class meeting for 10-605 Workflows For Hadoop|Workflows For Hadoop 3]].  PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
* Tues Sep 26, 2017 [[Class meeting for 10-605 Phrase Finding|Phrase Finding]].  Systems built on top of Hadoop, Phrase-finding in Pig, Other work with phrases
+
* Thurs Sep 22, 2016 [[Class meeting for 10-605 Phrase Finding|Phrase Finding]].  Phrase-finding in Pig, Other work with phrases
* Thurs Sep 28, 2017 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
+
* Tues Sep 27, 2016 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]].  Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
* Tues Oct 3, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 1]].  Debugging ML algorithms
+
* Thurs Sep 29, 2016 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 1]].  Also wrapup for SGD, debugging ML algorithms
** '''Start work on''' Assignment 3: scalable SGD. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-3-sga-logreg/main.pdf
+
** '''Start work on''' Assignment 3: scalable SGD at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw3sgd/writeup
* Thurs Oct 5, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 2]].   
+
* Tues Oct 4, 2016 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 2]].   
* Tues Oct 10, 2017 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 3]].  Structured perceptrons, Iterative parameter mixing paper
+
* Thurs Oct 6, 2016 [[Class meeting for 10-605 Parallel Perceptrons|Parallel Perceptrons 3]].  Structured perceptrons, Iterative parameter mixing paper
* Thurs Oct 12, 2017 [[Class meeting for 10-605 SGD for MF|SGD for MF]].  Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
+
* Tues Oct 11, 2016 [[Class meeting for 10-605 SGD for MF|SGD for MF]].  Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
* Tues Oct 17, 2017 [[Class meeting for 10-605 Midterm review|Midterm review]].   
+
* Thurs Oct 13, 2016 [[Class meeting for 10-605 Midterm review|Midterm review]].
 +
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/f2015-midterm.pdf practice questions for midterm from 2015].  This document also references the relevant questions from two previous review sheets:
 +
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
 +
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
 +
*** [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf Some review tips - modified from last year's exam review session]
 
** '''Last assignment due'''
 
* Thurs Oct 19, 2017 [[Class meeting for 10-605 Midterm|Midterm]].   
+
* Tues Oct 18, 2016 [[Class meeting for 10-605 Midterm|Midterm]].   
* Tues Oct 24, 2017 [[Class meeting for 10-605 Subsampling a Graph|Subsampling a Graph]].  Sampling a graph, Local partitioning
+
* Thurs Oct 20, 2016 [[Class meeting for 10-605 Subsampling a Graph|Subsampling a Graph]].  Sampling a graph, Local partitioning
** '''Start work on''' Assignment 4: Subsampling a Graph with Approximate PageRank, draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-4-apr/main.pdf
+
** '''Start work on''' Assignment 4: Subsampling a Graph with Approximate PageRank, draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw4approximatepagerank/writeup
* Thurs Oct 26, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]].  Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
+
* Tues Oct 25, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 1]].  Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
* Tues Oct 31, 2017 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]].  Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs
+
* Thurs Oct 27, 2016. '''No class.'''
* Thurs Nov 2, 2017 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
+
* Tues Nov 1, 2016 [[Class meeting for 10-605 Deep Learning|Deep Learning 2]].  Reverse-mode differentiation, Recursive ANNs, Word2vec
** '''Start work on''' Assignment 5: Autodiff with IPM. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-5-autodiff/main.pdf
+
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 1]].  Bloom filters, The countmin sketch
* Tues Nov 7, 2017 [[Class meeting for 10-605 Randomized Algorithms 2 someday, redo the count-min stuff|Randomized Algorithms 2 someday, redo the count-min stuff]].  Review of Bloom filters, Locality sensitive hashing
+
** '''Start work on''' Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016. View writeup at https://github.com/KarandeepJohar/10605-f16-hw5/blob/master/automatic-reverse-mode.pdf
* Thurs Nov 9, 2017 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
+
* Tues Nov 8, 2016 [[Class meeting for 10-605 Randomized Algorithms|Randomized Algorithms 2]].  Locality sensitive hashing
* Tues Nov 14, 2017 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
+
* Thurs Nov 10, 2016 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]].  Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
* Thurs Nov 16, 2017 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
+
* Tues Nov 15, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]].  Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
** '''Start work on''' Assignment 6: Phrase-finding in Spark. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-6-spark-phrases/main.pdf
+
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]].  Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
* Tues Nov 21, 2017 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].  Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
+
** '''Start work on''' Assignment 6: Phrase-finding with Spark. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw6phrasefindingwithspark/writeup
* Tues Nov 28, 2017 [[Class meeting for 10-605 LDA|LDA 1]]. DGMs for naive Bayes, Gibbs sampling for LDA
+
* Tues Nov 22, 2016 [[Class meeting for 10-605 LDA|LDA 1]].  DGMs for naive Bayes, Gibbs sampling for LDA
** '''Start work on''' Assignment 7: LDA with a Parameter Server, draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-7-lda-ps/main.pdf
+
* Tues Nov 29, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]].
* Thurs Nov 30, 2017 [[Class meeting for 10-605 LDA|LDA 2]].  Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
+
** '''Start work on''' Assignment 7: LDA with a Parameter Server, Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw7lda/attachments/677
* Tues Dec 5, 2017 [[Class meeting for 10-605 Review session for final|Review session for final]].
+
* Thurs Dec 1, 2016 [[Class meeting for 10-605 LDA|LDA 2]].  Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
 +
* Tues Dec 6, 2016 [[Class meeting for 10-605 Project Reports|Project Reports]].
 
** '''Last assignment due'''
 
* Thurs Dec 7, 2017 [[Class meeting for 10-605 Final Exam|Final Exam]].
+
* Thurs Dec 8, 2016 [[Class meeting for 10-605 Final Exam|Final Exam]].

Revision as of 11:52, 11 August 2017

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Class is cancelled on Oct 27
  • No classes will be held on Nov 24 (Thanksgiving)

Schedule for 805 projects:

  • 11:59pm Sun 10/2: Initial 805 project proposal due.
  • 11:59pm Sun 10/16: Final 805 project proposal due.
    • This is a revised writeup that addresses any comments William raises on the initial proposal.
  • 11:59pm Sun 11/13: Midterm 805 project report due.
  • 1:30-2:50pm Tues 12/6: Project presentations (in class). One presentation per group, 12 minutes per presentation. Please send your slide deck to William by 10am 12/6 (PDF is best).
  • 11:59pm Sun 12/11: Final 805 project writeup due.



Schedule for lectures and 605 assignments: