Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-405 in Spring 2018"
From Cohen Courses
VivekShankar (talk | contribs) (→Notes) |
Revision as of 11:29, 23 January 2018
This is the syllabus for Machine Learning with Large Datasets 10-405 in Spring 2018.
Notes
- Homeworks, unless otherwise posted, will be due when the next HW comes out.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
- Wed Jan 17, 2018 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
- Mon Jan 22, 2018 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
- Start work on Assignment 1a: Streaming NB; Draft at https://autolab.andrew.cmu.edu/courses/10405-s18/assessments/hw1astreamingnaivebayes/writeup
- Wed Jan 24, 2018 Streaming Naive Bayes. Notes on scalable Naive Bayes, Alternatives to stream and sort, Local counting in stream and sort, Stream and sort examples
- Mon Jan 29, 2018 Hadoop Overview. Intro to Hadoop, Hadoop Streaming, Debugging Hadoop, Combiners
- Start work on Assignment 1b: Streaming NB on Hadoop; Draft at http://www.cs.cmu.edu/~wcohen/10-405/assignments/2016-fall/hw-1-naivebayes-streaming/main-b.pdf
- Wed Jan 31, 2018 Workflows For Hadoop 1. Scalable classification, Abstracts for map-reduce algorithms, Joins in Hadoop
- Mon Feb 5, 2018 Workflows For Hadoop 2. Guinea Pig intro, Similarity joins, Similarity joins with TFIDF, Parallel simjoins
- Start work on Assignment 2: Naive Bayes testing in Guinea Pig; Draft at http://www.cs.cmu.edu/~wcohen/10-405/assignments/2016-fall/hw-2-naivebayes-gpig/main.pdf
- Wed Feb 7, 2018 Workflows For Hadoop 3. PageRank, PageRank in Pig and Guinea Pig, K-means in Pig, Spark, Systems built on top of Hadoop
- Mon Feb 12, 2018 SGD and Hash Kernels. Learning as optimization, Logistic regression with SGD, Regularized SGD, Efficient regularized SGD, Hash kernels for logistic regression
- Wed Feb 14, 2018 Parallel Perceptrons 1. The "delta trick", Averaged perceptrons, Debugging ML algorithms
- Start work on Assignment 3: scalable SGD; Draft at http://www.cs.cmu.edu/~wcohen/10-405/assignments/2016-fall/hw-3-sga-logreg/main.pdf
- Mon Feb 19, 2018 Parallel Perceptrons 2. Hash kernels, Ranking perceptrons, Structured perceptrons
- Wed Feb 21, 2018 Parallel Perceptrons 3. Iterative parameter mixing paper, Parallel SGD via Param Mixing
- Mon Feb 26, 2018 SGD for MF. Matrix factorization, Matrix factorization with SGD, Distributed matrix factorization with SGD
- Wed Feb 28, 2018 Guest lecture - tentative.
- Last assignment due
- Mon Mar 5, 2018 Midterm review and catchup. Midterm review
- Wed Mar 7, 2018 Midterm.
- Mon Mar 19, 2018 Computing with GPUs. Introduction to GPUs, CUDA, Vectorization
- Wed Mar 21, 2018 Deep Learning 1. Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
- Mon Mar 26, 2018 Deep Learning 2. Reverse-mode differentiation (autodiff), Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py
- Start work on Assignment 4: Autodiff with IPM part 1/2; Draft at http://www.cs.cmu.edu/~wcohen/10-405/assignments/2016-fall/hw-5-autodiff/main.pdf
- Wed Mar 28, 2018 Deep Learning 3. Inputs, parameters, updates, Word2vec and GloVe, Recursive ANNs, Convolutional ANNs, Architectures using RNNs
- Mon Apr 2, 2018 Randomized Algorithms 1. Bloom filters, The count-min sketch, CM Sketches in Deep Learning
- Wed Apr 4, 2018 Randomized Algorithms 2. Review of Bloom filters, Locality sensitive hashing, Online LSH
- Start work on Assignment 5: Autodiff with IPM part 2/2
- Mon Apr 9, 2018 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
- Wed Apr 11, 2018 SSL on Graphs. Semi-supervised learning intro, MultiRankWalk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with count-min sketches
- Mon Apr 16, 2018 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
- Wed Apr 18, 2018 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
- Mon Apr 23, 2018 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
- Wed Apr 25, 2018 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
- Mon Apr 30, 2018 Review session for final.
- Wed May 2, 2018 Final Exam.