Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017"
From Cohen Courses
Revision as of 13:31, 10 August 2017
This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017.
Notes:
- Homeworks, unless otherwise posted, will be due when the next HW comes out.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
- Classes are cancelled for Sept 21 (Rosh Hashana)
- No classes will be held on Nov 23 (Thanksgiving)
Schedule for 805 projects:
- 11:59pm Sun 10/1: Initial 805 project proposal due.
- 11:59pm Sun 10/15: Final 805 project proposal due.
  - This is a revised writeup that addresses any comments William raises on the initial proposal.
- 11:59pm Sun 11/12: Midterm 805 project report due.
- 1:30-2:50pm Tues 12/5: Project presentations (in class).
- 11:59pm Sun 12/10: Final 805 project writeup due.
Tentative schedule for lectures and 605 assignments:
- Tues Aug 29, 2017 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
- Thurs Aug 31, 2017 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
  - Start work on Assignment 1a: Streaming NB; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-a.pdf
- Tues Sep 5, 2017 Streaming Naive Bayes. Notes on scalable Naive Bayes, Alternatives to stream and sort, Local counting in stream and sort, Stream and sort examples
- Thurs Sep 7, 2017 Hadoop Overview. Intro to Hadoop, Hadoop Streaming, Debugging Hadoop, Combiners
  - Start work on Assignment 1b: Streaming NB on Hadoop; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-1-naivebayes-streaming/main-b.pdf
- Tues Sep 12, 2017 Workflows For Hadoop 1. Scalable classification, Abstracts for map-reduce algorithms, Joins in Hadoop
- Thurs Sep 14, 2017 Workflows For Hadoop 2. Guinea Pig intro, Similarity joins, Similarity joins with TFIDF
  - Start work on Assignment 2: Naive Bayes testing in Guinea Pig; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-2-naivebayes-gpig/main.pdf
- Tues Sep 19, 2017 Workflows For Hadoop 3. PageRank, Spark, Phrase finding
- Tues Sep 26, 2017 SGD and Hash Kernels. Learning as optimization, Logistic regression with SGD, Regularized SGD, Efficient regularized SGD, Hash kernels for logistic regression
- Thurs Sep 28, 2017 Parallel Perceptrons 1. The "delta trick", Averaged perceptrons, Debugging ML algorithms
  - Start work on Assignment 3: scalable SGD; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-3-sga-logreg/main.pdf
- Tues Oct 3, 2017 Parallel Perceptrons 2. Hash kernels, Ranking perceptrons
- Thurs Oct 5, 2017 Parallel Perceptrons 3. Structured perceptrons, Iterative parameter mixing paper
- Tues Oct 10, 2017 SGD for MF. Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
- Thurs Oct 12, 2017 Midterm review and catchup. Midterm review
  - Last assignment due
- Tues Oct 17, 2017 Midterm.
- Thurs Oct 19, 2017 Computing with GPUs.
- Tues Oct 24, 2017 Deep Learning 1. Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
- Thurs Oct 26, 2017 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py
  - Start work on Assignment 4: Autodiff with IPM part 1/2; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-5-autodiff/main.pdf
- Tues Oct 31, 2017 Deep Learning 3. Recursive ANNs, Convolutional ANNs
- Thurs Nov 2, 2017 Randomized Algorithms 1. Bloom filters, The countmin sketch
- Tues Nov 7, 2017 Randomized Algorithms 2. Review of Bloom filters, Locality sensitive hashing, Online LSH
  - Start work on Assignment 5: Autodiff with IPM part 2/2
- Thurs Nov 9, 2017 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
- Tues Nov 14, 2017 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
  - Start work on Assignment 6: SSL on a graph in Spark, possibly using NELL data
- Thurs Nov 16, 2017 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
- Tues Nov 21, 2017 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
- Tues Nov 28, 2017 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
  - Start work on Assignment 7: LDA with a Parameter Server; Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hw-7-lda-ps/main.pdf
- Thurs Nov 30, 2017 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
- Tues Dec 5, 2017 Review session for final.
  - Last assignment due
- Thurs Dec 7, 2017 Final Exam.