Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"

From Cohen Courses
 
''note: this is under construction''
 
  
== September ==
+
* Thurs Sep 1, 2016 [[Class meeting for 10-605 Overview|Overview]] Grading policies etc., History of Big Data, Complexity theory and cost of important operations
 
+
* Tues Sep 6, 2016 [[Class meeting for 10-605 Probability Review|Probability Review]] Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
* Thurs Sep 1. [[Class meeting for 10-605 Overview|Overview of course, cost of various operations, asymptotic analysis.]]
+
* Thurs Sep 8, 2016 [[Class meeting for 10-605 Streaming Naive Bayes|Streaming Naive Bayes]] Notes on scalable naive bayes, Local counting in stream and sort
* Tues Sep 6. [[Class meeting for 10-605 Probability Review|Review of probabilities, joint distributions and naive Bayes]]
+
** '''Start work on''' assignment 1a: streaming NB
** HW1A out: streaming naive Bayes. [https://s3.amazonaws.com/vincy/10605-15Fall/HW1_StreamingNB.pdf draft Handout]
+
* Tues Sep 13, 2016 [[Class meeting for 10-605 Hadoop Overview|Hadoop Overview]] Intro to Hadoop, Hadoop Streaming
* Thurs Sep 8. [[Class meeting for 10-605 Streaming Naive Bayes|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]]
+
** '''Start work on'''  assignment 1b: streaming NB on streaming hadoop
* Tues Sep 13. [[Class meeting for 10-605 Phrase Finding and Hadoop|Phrase Finding and Hadoop]]
+
* Thurs Sep 15, 2016 [[Class meeting for 10-605 Workflows For Hadoop 1|Workflows For Hadoop 1]] Scalably using out-of-memory-scale classifiers, Abstractions for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins
** HW1B out: naive Bayes training on Hadoop. [https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view draft Handout]
+
* Tues Sep 20, 2016 [[Class meeting for 10-605 Workflows For Hadoop 2|Workflows For Hadoop 2]] Similarity joins with TFIDF, Parallel simjoins, PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
**  [[Class_meeting_for_10-605_Hadoop_Overview|Hadoop Overview]]
+
** '''Start work on''' assignment 2: naive bayes testing in guinea pig
** [[Class meeting for 10-605 Phrase Finding|Phrase Finding]]
+
* Thurs Sep 22, 2016 [[Class meeting for 10-605 Phrase Finding|Phrase Finding]] Phrase-finding in Pig, Other work with phrases
* Thurs Sep 15. [[Class meeting for 10-605 Phrases_with_Stream_and_Sort|Implementing Phrase Finding and Large-Data Testing for Naive Bayes with Stream-and-Sort]].
+
* Tues Sep 27, 2016 [[Class meeting for 10-605 SGD and Hash Kernels|SGD and Hash Kernels]] Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
** Lecture also discusses: map-reduce abstractions/dataflow
+
* Thurs Sep 29, 2016 [[Class meeting for 10-605 Parallel Perceptrons 1|Parallel Perceptrons 1]] Debugging ML algorithms
* Tues Sep 20. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]]
+
** '''Start work on''' assignment 3: scalable sgd system
** HW3 out: Using workflow languages.
+
* Thurs Oct 6, 2016 [[Class meeting for 10-605 Parallel Perceptrons 2|Parallel Perceptrons 2]] Structured perceptrons, Iterative parameter mixing paper
* Thurs Sep 22. [[Class_meeting_for_10-605_Rocchio_and_Hadoop_Workflows|Hadoop Workflow Languages and Rocchio and TFIDF]] continued
+
* Tues Oct 11, 2016 [[Class meeting for 10-605 SGD for MF|SGD for MF]] Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
** Lecture also discusses: hadoop streaming, mrjob, cascading, pipes, scaling, hive, pig, spark, flink
+
* Thurs Oct 13, 2016 [[Class meeting for 10-605 Midterm review|Midterm review]]  
* Tues Sep 27.  [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
+
** '''Last assignment due'''
* Thurs Sep 29. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
+
* Tues Oct 18, 2016 [[Class meeting for 10-605 Midterm|Midterm]]  
** HW4 out: streaming logistic regression classifier [http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf PDF Handout]
+
* Thurs Oct 20, 2016 [[Class meeting for 10-605 Subsampling a Graph|Subsampling a Graph]] Sampling a graph, Local partitioning
** For 805 students: an initial project proposal is due '''via email to wcohen+805@gmail.com'''. You will get feedback on it from the instructors, and it will also be posted to the class, mainly for 605 students who are interested in collaborating, but also for general interest. Please be clear about your proposal; I'm expecting approximately one page. You should discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner, whether you are willing to work with or mentor one or more 605 students, and if so, how you anticipate them contributing to the project.
+
** '''Start work on''' assignment 4: graph subsampling
 
+
* Tues Oct 25, 2016 [[Class meeting for 10-605 Deep Learning 1|Deep Learning 1]] Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
== October ==
+
* Thurs Oct 27, 2016 [[Class meeting for 10-605 Deep Learning 2|Deep Learning 2]] Reverse-mode differentiation, Recursive ANNs, Word2vec
 
+
* Tues Nov 1, 2016 [[Class meeting for 10-605 Randomized Algorithms 1|Randomized Algorithms 1]] Bloom filters, The countmin sketch
* Tues Oct 4. '''No class - Rosh Hashana.'''
+
** '''Start work on''' assignment 5: autodiff with IPM
* Thurs Oct 6. [[Class meeting for 10-605 Parallel Perceptrons 1|Parallel Perceptrons 1]].
+
* Thurs Nov 3, 2016 [[Class meeting for 10-605 Randomized Algorithms 2|Randomized Algorithms 2]] Locality sensitive hashing
* Tues Oct 11. [[Class meeting for 10-605 Parallel Perceptrons 2|Parallel Perceptrons 2]].
+
* Tues Nov 8, 2016 [[Class meeting for 10-605 Graph Architectures for ML|Graph Architectures for ML]] Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
* Thurs Oct 13. [[Class meeting for 10-605 Advanced topics for SGD|More on parallel and streaming ML]]: Adaptive gradients, AllReduce, and Parameter Servers
+
* Thurs Nov 10, 2016 [[Class meeting for 10-605 SSL on Graphs|SSL on Graphs]] Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
** Also, some exam review tips ([http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pptx ppt])
+
* Tues Nov 15, 2016 [[Class meeting for 10-605 Unsupervised Learning On Graphs|Unsupervised Learning On Graphs]] Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/f2015-midterm.pdf practice questions for midterm - v1].  This document also references the relevant questions from two previous review sheets:
+
** '''Start work on''' assignment 6: graphX for SSL
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf practice questions from final, 2014]
+
* Thurs Nov 17, 2016 [[Class meeting for 10-605 Parameter Servers|Parameter Servers]]  
*** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf practice questions for final, 2015]
+
* Tues Nov 22, 2016 [[Class meeting for 10-605 LDA 1|LDA 1]] DGMs for naive Bayes, Gibbs sampling for LDA
*** [http://www.cs.cmu.edu/~wcohen/10-605/midterm-review.pdf Some review tips - modified from last year's exam review session]
+
** '''Start work on''' assignment 7: LDA with parameter servers
** ''William's note - revised, and discuss param servers more later on''
+
* Tues Nov 29, 2016 [[Class meeting for 10-605 LDA 2|LDA 2]] Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
* Tues Oct 18. ''midterm exam''
+
* Thurs Dec 1, 2016 [[Class meeting for 10-605 Scalable Probabilistic Logics|Scalable Probabilistic Logics]]  
* Thurs Oct 20. [[Class meeting for 10-605 Reverse-mode differentiation and Deep Learning 1]]
+
* Tues Dec 6, 2016 [[Class meeting for 10-605 Review session for final|Review session for final]]  
** HW4 out: Implementing autograd light
+
** '''Last assignment due'''
* Tues Oct 25. [[Class meeting for 10-605 Reverse-mode differentiation and Deep Learning 2]]
+
* Thurs Dec 8, 2016 [[Class meeting for 10-605 Final Exam|Final Exam]]
** ''William's note: will include some material from'' [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD]]
 
** For 805 students: the final project proposal is due.
 
 
 
== November ==
 
 
 
* Tues Nov 1. [[Class meeting for 10-605 Subsample A Graph|Scalable PageRank]]  
 
* Thurs Nov 3. [[Class_meeting_for_10-605_SSL_on_Graphs|SSL on Graphs]]
 
** HW5 out: SSL on Spark
 
* Tues Nov 8. [[Class meeting for 10-605 Randomized|Randomized Algorithms 1]]
 
* Thurs Nov 10. [[Class meeting for 10-605 Randomized|Randomized Algorithms 2]]
 
* Tues Nov 15. [[Class meeting for 10-605 Randomized|Randomized Algorithms 3]]
 
** HW6 out: parallel deep learning in Spark
 
* Thurs Nov 17.  [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
 
* Tues Nov 22.  [[LDA 2]]
 
* Thurs Nov 24. ''No class - happy Thanksgiving!''
 
* Tues Nov 29.  [[Parameter servers]]
 
** HW7 out: LDA with a param server ([http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf draft handout])
 
 
 
== December ==
 
 
 
* Thurs Dec 1. [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
 
* Tues Dec 6.  [[Review and project presentations (15 min each)]]
 
** HW7 due
 
* Thurs Dec 8.  In-class exam.
 
* Tues Dec 15.  Writeups for 10-805 projects are due (at 11:59pm).
 
 
 
== Topics covered in previous years but not in 2015 ==
 
 
 
* [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]]
 
* [[Class meeting for 10-605 PIG|Workflows in PIG]]
 
* [[Class meeting for 10-605 Phase Finding|Phrase Finding]]
 
* [[Class meeting for 10-605 Parallel Similarity Joins|Scalable Similarity Joins]]
 
* [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
 
* [[Class meeting for 10-605 Rocchio and On-line Learning|Messages, records and workflows; Rocchio]]
 
* [http://www.cs.cmu.edu/~wcohen/10-605/schimmy.pptx Scalable pagerank - The Schimmy Pattern]
 
* [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]]
 

Revision as of 13:49, 11 August 2016

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

note: this is under construction

  • Thurs Sep 1, 2016 Overview Grading policies etc., History of Big Data, Complexity theory and cost of important operations
  • Tues Sep 6, 2016 Probability Review Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
  • Thurs Sep 8, 2016 Streaming Naive Bayes Notes on scalable naive bayes, Local counting in stream and sort
    • Start work on assignment 1a: streaming NB
  • Tues Sep 13, 2016 Hadoop Overview Intro to Hadoop, Hadoop Streaming
    • Start work on assignment 1b: streaming NB on streaming hadoop
  • Thurs Sep 15, 2016 Workflows For Hadoop 1 Scalably using out-of-memory-scale classifiers, Abstractions for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins
  • Tues Sep 20, 2016 Workflows For Hadoop 2 Similarity joins with TFIDF, Parallel simjoins, PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
    • Start work on assignment 2: naive bayes testing in guinea pig
  • Thurs Sep 22, 2016 Phrase Finding Phrase-finding in Pig, Other work with phrases
  • Tues Sep 27, 2016 SGD and Hash Kernels Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
  • Thurs Sep 29, 2016 Parallel Perceptrons 1 Debugging ML algorithms
    • Start work on assignment 3: scalable sgd system
  • Thurs Oct 6, 2016 Parallel Perceptrons 2 Structured perceptrons, Iterative parameter mixing paper
  • Tues Oct 11, 2016 SGD for MF Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
  • Thurs Oct 13, 2016 Midterm review
    • Last assignment due
  • Tues Oct 18, 2016 Midterm
  • Thurs Oct 20, 2016 Subsampling a Graph Sampling a graph, Local partitioning
    • Start work on assignment 4: graph subsampling
  • Tues Oct 25, 2016 Deep Learning 1 Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
  • Thurs Oct 27, 2016 Deep Learning 2 Reverse-mode differentiation, Recursive ANNs, Word2vec
  • Tues Nov 1, 2016 Randomized Algorithms 1 Bloom filters, The countmin sketch
    • Start work on assignment 5: autodiff with IPM
  • Thurs Nov 3, 2016 Randomized Algorithms 2 Locality sensitive hashing
  • Tues Nov 8, 2016 Graph Architectures for ML Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
  • Thurs Nov 10, 2016 SSL on Graphs Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
  • Tues Nov 15, 2016 Unsupervised Learning On Graphs Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
    • Start work on assignment 6: graphX for SSL
  • Thurs Nov 17, 2016 Parameter Servers
  • Tues Nov 22, 2016 LDA 1 DGMs for naive Bayes, Gibbs sampling for LDA
    • Start work on assignment 7: LDA with parameter servers
  • Tues Nov 29, 2016 LDA 2 Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
  • Thurs Dec 1, 2016 Scalable Probabilistic Logics
  • Tues Dec 6, 2016 Review session for final
    • Last assignment due
  • Thurs Dec 8, 2016 Final Exam