Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017
Revision as of 10:49, 2 August 2017 by Wcohen
This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017.
Notes:
- Homeworks, unless otherwise posted, will be due when the next HW comes out.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
- Classes are cancelled for Sept 21 (Rosh Hashana)
- No classes will be held on Nov 23 (Thanksgiving)
Schedule for 805 projects:
- 11:59pm Sun 10/1: Initial 805 project proposal due.
- 11:59pm Sun 10/15: Final 805 project proposal due.
- This is a revised writeup that addresses any comments William raises on the initial proposal.
- 11:59pm Sun 11/12: Midterm 805 project report due.
- 1:30-2:50pm Tues 12/5: Project presentations (in class).
- 11:59pm Sun 12/10: Final 805 project writeup due.
Tentative schedule for lectures and 605 assignments:
- Tues Aug 30, 2016 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
- Thurs Sep 1, 2016 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
- Start work on Assignment 1a: Streaming NB. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/hashtable-nb.pdf
- Tues Sep 6, 2016 Streaming Naive Bayes. Notes on scalable naive Bayes, Local counting in stream and sort
- Thurs Sep 8, 2016 Hadoop Overview. Intro to Hadoop, Hadoop Streaming
- Start work on Assignment 1b: Streaming NB on Hadoop. Draft at http://www.cs.cmu.edu/~wcohen/10-605/assignments/stream-nb.pdf, https://drive.google.com/file/d/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view
- Tues Sep 13, 2016 Workflows For Hadoop 1. Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop
- Thurs Sep 15, 2016 Workflows For Hadoop 2. TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins
- Start work on Assignment 2: Naive Bayes testing in Guinea Pig, draft at https://drive.google.com/file/d/0B-p8_eIVeEHFM1JOSGFWNFFJcU0/view
- Tues Sep 20, 2016 Workflows For Hadoop 3. PageRank in Pig, K-means in Pig, Spark
- Thurs Sep 22, 2016 Phrase Finding. Systems built on top of Hadoop, Phrase-finding in Pig, Other work with phrases
- Tues Sep 27, 2016 SGD and Hash Kernels. Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
- Thurs Sep 29, 2016 Parallel Perceptrons 1. Debugging ML algorithms
- Start work on Assignment 3: scalable SGD. Draft at http://curtis.ml.cmu.edu/w/courses/images/8/86/Sgd_fall15.pdf
- Tues Oct 4, 2016 Parallel Perceptrons 2.
- Thurs Oct 6, 2016 Parallel Perceptrons 3. Structured perceptrons, Iterative parameter mixing paper
- Tues Oct 11, 2016 SGD for MF. Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
- Thurs Oct 13, 2016 Midterm review.
- Last assignment due
- Tues Oct 18, 2016 Midterm.
- Thurs Oct 20, 2016 Subsampling a Graph. Sampling a graph, Local partitioning
- Start work on Assignment 4: Subsampling a Graph with Approximate PageRank, draft at https://drive.google.com/file/d/0BzQQ-spWKjhUaWoyOFZHV21uUlU/view
- Tues Oct 25, 2016 Deep Learning 1. Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
- Thurs Oct 27, 2016 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs
- Tues Nov 1, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
- Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016.
- Thurs Nov 3, 2016 Randomized Algorithms 2 - someday, redo the count-min stuff. Review of Bloom filters, Locality sensitive hashing
- Tues Nov 8, 2016 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
- Thurs Nov 10, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
- Tues Nov 15, 2016 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
- Start work on Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
- Thurs Nov 17, 2016 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
- Tues Nov 22, 2016 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
- Start work on Assignment 7: LDA with a Parameter Server, draft http://curtis.ml.cmu.edu/w/courses/images/1/16/Hw7-lda-ps.pdf
- Thurs Nov 24, 2016 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
- Tues Nov 29, 2016 Review session for final.
- Last assignment due
- Thurs Dec 1, 2016 Final Exam.