Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016"
From Cohen Courses
(13 intermediate revisions by 2 users not shown)
Latest revision as of 11:54, 11 August 2017
This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.
Notes:
- Homeworks, unless otherwise posted, will be due when the next HW comes out.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
- Class is cancelled on Oct 27
- No classes will be held on Nov 24 (Thanksgiving)
Schedule for 805 projects:
- 11:59pm Sun 10/2: Initial 805 project proposal due.
- 11:59pm Sun 10/16: Final 805 project proposal due.
  - This is a revised writeup that will address any comments William raises from the initial proposal.
- 11:59pm Sun 11/13: Midterm 805 project report due.
- 1:30-2:50pm Tues 12/6: Project presentations (in class). One presentation per group, 12 minutes per presentation. Please send your slide deck to William by 10am 12/6 (PDF is best).
- 11:59pm Sun 12/11: Final 805 project writeup due.
Schedule for lectures and 605 assignments:
- Tues Aug 30, 2016 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
- Thurs Sep 1, 2016 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
  - Start work on Assignment 1a: Streaming NB. Writeup at http://www.cs.cmu.edu/~wcohen/10-605/assignments/2016-fall/hashtable-nb.pdf
- Tues Sep 6, 2016 Streaming Naive Bayes. Notes on scalable Naive Bayes, Local counting in stream and sort
- Thurs Sep 8, 2016 Hadoop Overview. Intro to Hadoop, Hadoop Streaming
  - Start work on Assignment 1b: Streaming NB on Hadoop. Draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw1bhadoopnaivebayes/writeup
- Tues Sep 13, 2016 Workflows For Hadoop 1. Scalable classification, Scalable Rocchio and TFIDF, Abstractions for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig
- Thurs Sep 15, 2016 Workflows For Hadoop 2. Similarity joins, Similarity joins with TFIDF, Parallel simjoins
  - Start work on Assignment 2: Naive Bayes testing in Guinea Pig. Draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw2nbwithguineapig/writeup
- Tues Sep 20, 2016 Workflows For Hadoop 3. PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
- Thurs Sep 22, 2016 Phrase Finding. Phrase-finding in Pig, Other work with phrases
- Tues Sep 27, 2016 SGD and Hash Kernels. Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
- Thurs Sep 29, 2016 Parallel Perceptrons 1. Also wrapup for SGD, debugging ML algorithms
  - Start work on Assignment 3: Scalable SGD. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw3sgd/writeup
- Tues Oct 4, 2016 Parallel Perceptrons 2.
- Thurs Oct 6, 2016 Parallel Perceptrons 3. Structured perceptrons, Iterative parameter mixing paper
- Tues Oct 11, 2016 SGD for MF. Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
- Thurs Oct 13, 2016 Midterm review.
  - Practice questions for midterm from 2015. This document also references the relevant questions from two previous review sheets.
  - Last assignment due
- Tues Oct 18, 2016 Midterm.
- Thurs Oct 20, 2016 Subsampling a Graph. Sampling a graph, Local partitioning
  - Start work on Assignment 4: Subsampling a Graph with Approximate PageRank. Draft at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw4approximatepagerank/writeup
- Tues Oct 25, 2016 Deep Learning 1. Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
- Thurs Oct 27, 2016. No class.
- Tues Nov 1, 2016 Deep Learning 2. Reverse-mode differentiation, Recursive ANNs, Word2vec
- Thurs Nov 3, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
  - Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016. Writeup at https://github.com/KarandeepJohar/10605-f16-hw5/blob/master/automatic-reverse-mode.pdf
- Tues Nov 8, 2016 Randomized Algorithms 2. Locality sensitive hashing
- Thurs Nov 10, 2016 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
- Tues Nov 15, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
- Thurs Nov 17, 2016 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
  - Start work on Assignment 6: Phrase-finding with Spark. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw6phrasefindingwithspark/writeup
- Tues Nov 22, 2016 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
- Tues Nov 29, 2016 Parameter Servers.
  - Start work on Assignment 7: LDA with a Parameter Server. Writeup at https://autolab.andrew.cmu.edu/courses/10605-f16/assessments/hw7lda/attachments/677
- Thurs Dec 1, 2016 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
- Tues Dec 6, 2016 Project Reports.
  - Last assignment due
- Thurs Dec 8, 2016 Final Exam. Note that we've posted:
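  - Practice questions from final, 2014: http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf
  - Practice questions for final, 2015: http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf
  - Comments:
    - Most of the exam (approximately 80%) covers material from after the midterm.
    - You may bring in two 8 1/2 by 11 sheets of paper with notes.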