Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016

From Cohen Courses
Revision as of 13:52, 11 August 2016 by Wcohen (talk | contribs)

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016.

Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

Schedule:

  • Thurs Sep 1, 2016 Overview Grading policies, etc., History of Big Data, Complexity theory and cost of important operations
  • Tues Sep 6, 2016 Probability Review Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
  • Thurs Sep 8, 2016 Streaming Naive Bayes Notes on scalable Naive Bayes, Local counting in stream and sort
    • Start work on assignment 1a: streaming NB
  • Tues Sep 13, 2016 Hadoop Overview Intro to Hadoop, Hadoop Streaming
    • Start work on assignment 1b: streaming NB on Hadoop Streaming
  • Thurs Sep 15, 2016 Workflows For Hadoop 1 Scalably using out-of-memory-scale classifiers, Abstracts for map-reduce algorithms, Joins in Hadoop, TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins
  • Tues Sep 20, 2016 Workflows For Hadoop 2 Similarity joins with TFIDF, Parallel simjoins, PageRank in Pig, K-means in Pig, Spark, Systems built on top of Hadoop
    • Start work on assignment 2: Naive Bayes testing in Guinea Pig
  • Thurs Sep 22, 2016 Phrase Finding Phrase-finding in Pig, Other work with phrases
  • Tues Sep 27, 2016 SGD and Hash Kernels Learning as optimization, Logistic regression with SGD, Regularized SGD, Hash kernels for logistic regression
  • Thurs Sep 29, 2016 Parallel Perceptrons 1 Debugging ML algorithms
    • Start work on assignment 3: scalable SGD system
  • Thurs Oct 6, 2016 Parallel Perceptrons 2 Structured perceptrons, Iterative parameter mixing paper
  • Tues Oct 11, 2016 SGD for MF Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
  • Thurs Oct 13, 2016 Midterm review
    • Last assignment due
  • Tues Oct 18, 2016 Midterm
  • Thurs Oct 20, 2016 Subsampling a Graph Sampling a graph, Local partitioning
    • Start work on assignment 4: graph subsampling
  • Tues Oct 25, 2016 Deep Learning 1 Deep learning intro, Deep learning and GPUs, Expressiveness of MLPs, Exploding and vanishing gradients, Modern deep learning models
  • Thurs Oct 27, 2016 Deep Learning 2 Reverse-mode differentiation, Recursive ANNs, Word2vec
  • Tues Nov 1, 2016 Randomized Algorithms 1 Bloom filters, The countmin sketch
    • Start work on assignment 5: autodiff with IPM
  • Thurs Nov 3, 2016 Randomized Algorithms 2 Locality sensitive hashing
  • Tues Nov 8, 2016 Graph Architectures for ML Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
  • Thurs Nov 10, 2016 SSL on Graphs Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
  • Tues Nov 15, 2016 Unsupervised Learning On Graphs Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
    • Start work on assignment 6: GraphX for SSL
  • Thurs Nov 17, 2016 Parameter Servers
  • Tues Nov 22, 2016 LDA 1 DGMs for naive Bayes, Gibbs sampling for LDA
    • Start work on assignment 7: LDA with parameter servers
  • Tues Nov 29, 2016 LDA 2 Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
  • Thurs Dec 1, 2016 Scalable Probabilistic Logics
  • Tues Dec 6, 2016 Review session for final
    • Last assignment due
  • Thurs Dec 8, 2016 Final Exam