Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"
From Cohen Courses
Revision as of 17:29, 11 August 2016
This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.
Notes:
- Homeworks, unless otherwise posted, will be due when the next HW comes out.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
- No classes will be held on Oct 4 (Rosh Hashana) or Nov 24 (Thanksgiving).
Schedule:
- Thurs Sep 1, 2016 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
- Tues Sep 6, 2016 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
- Thurs Sep 8, 2016 Streaming Naive Bayes. Notes on scalable Naive Bayes, Local counting in stream and sort
  - Start work on Assignment 1a: Streaming NB. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/hashtable-nb.pdf
- Tues Sep 13, 2016 Hadoop Overview. Intro to Hadoop, Hadoop Streaming
  - Start work on Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view
- Thurs Sep 15, 2016 Workflows For Hadoop 1. Scalable classification, Scalable Rocchio and TFIDF, Abstractions for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
- Tues Sep 20, 2016 Workflows For Hadoop 2. Similarity joins, Similarity joins with TFIDF, Parallel simjoins, PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
  - Start work on Assignment 2: Naive Bayes testing in Guinea Pig. Draft at https://drive.google.com/file/d/0B-p8_eIVeEHFM1JOSGFWNFFJcU0/view
- Thurs Sep 22, 2016 Phrase Finding. Phrase-finding in Pig, Other work with phrases
- Tues Sep 27, 2016 SGD and Hash Kernels. Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
- Thurs Sep 29, 2016 Parallel Perceptrons 1. Debugging ML algorithms
  - Start work on Assignment 3: Scalable SGD. Draft at http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf
- Thurs Oct 6, 2016 Parallel Perceptrons 2. Structured perceptrons, Iterative parameter mixing paper
- Tues Oct 11, 2016 SGD for MF. Matrix factorization, Matrix factorization with SGD, Distributed matrix factorization with SGD
- Thurs Oct 13, 2016 Midterm review.
  - Last assignment due
- Tues Oct 18, 2016 Midterm.
- Thurs Oct 20, 2016 Subsampling a Graph. Sampling a graph, Local partitioning
  - Start work on Assignment 4: Subsampling a Graph with Approximate PageRank. Draft at https://drive.google.com/file/d/0BzQQ-spWKjhUaWoyOFZHV21uUlU/view
- Tues Oct 25, 2016 Deep Learning 1. Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
- Thurs Oct 27, 2016 Deep Learning 2. Reverse-mode differentiation, Recursive ANNs, Word2vec
- Tues Nov 1, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
  - Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016.
- Thurs Nov 3, 2016 Randomized Algorithms 2. Locality sensitive hashing
- Tues Nov 8, 2016 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
- Thurs Nov 10, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
- Tues Nov 15, 2016 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
  - Start work on Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
- Thurs Nov 17, 2016 Parameter Servers.
- Tues Nov 22, 2016 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
  - Start work on Assignment 7: LDA with a Parameter Server. Draft at http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf
- Tues Nov 29, 2016 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
- Thurs Dec 1, 2016 Scalable Probabilistic Logics.
- Tues Dec 6, 2016 Review session for final.
  - Last assignment due
- Thurs Dec 8, 2016 Final Exam.