Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017

From Cohen Courses
Revision as of 17:50, 4 October 2017 by Wcohen (talk | contribs)

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled for Sept 21 (Rosh Hashana)
  • No classes will be held on Nov 23 (Thanksgiving)

805 Project Schedule

Schedule for 805 projects:


Schedule for lectures and 605 assignments

  • Tues Aug 29, 2017 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
  • Thurs Aug 31, 2017 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF
    • Start work on Assignment 1a: Streaming NB; writeup here
  • Tues Sep 5, 2017 Streaming Naive Bayes. Notes on scalable naive bayes, Alternatives to stream and sort, Local counting in stream and sort, Stream and sort examples
  • Thurs Sep 7, 2017 Hadoop Overview. Intro to Hadoop, Hadoop Streaming, Debugging Hadoop, Combiners
    • Start work on Assignment 1b: Streaming NB on Hadoop; writeup here
  • Tues Sep 12, 2017 Workflows For Hadoop 1. Scalable classification, Abstracts for map-reduce algorithms, Joins in Hadoop
  • Thurs Sep 14, 2017 Workflows For Hadoop 2. Guinea Pig intro, Similarity joins, Similarity joins with TFIDF
    • Start work on Assignment 2: Naive Bayes testing in Guinea Pig; writeup here (Log in to Autolab before following the link.)
  • Tues Sep 19, 2017 Workflows For Hadoop 3. PageRank, Spark, Phrase finding
  • Tues Sep 26, 2017 SGD and Hash Kernels. Learning as optimization, Logistic regression with SGD, Regularized SGD, Efficient regularized SGD, Hash kernels for logistic regression
  • Thurs Sep 28, 2017 Parallel Perceptrons 1. The "delta trick", Averaged perceptrons, Debugging ML algorithms
    • Start work on Assignment 3: scalable SGD; writeup here
  • Tues Oct 3, 2017 Parallel Perceptrons 2. Hash kernels, Ranking perceptrons
  • Thurs Oct 5, 2017 Parallel Perceptrons 3. Structured perceptrons, Iterative parameter mixing paper
  • Tues Oct 10, 2017 SGD for MF. Matrix factorization, Matrix factorization with SGD, distributed matrix factorization with SGD
  • Thurs Oct 12, 2017 Midterm review and catchup. Midterm review
    • Last assignment due
  • Tues Oct 17, 2017 Midterm.
  • Thurs Oct 19, 2017 Computing with GPUs.
  • Tues Oct 24, 2017 Deep Learning 1. Deep learning intro, BackProp following Nielsen, Expressiveness of MLPs, Deep learning and GPUs, Exploding and vanishing gradients, Modern deep learning models
  • Thurs Oct 26, 2017 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py
  • Tues Oct 31, 2017 Deep Learning 3. Recursive ANNs, Convolutional ANNs
  • Thurs Nov 2, 2017 Randomized Algorithms 1. Bloom filters, The countmin sketch
  • Tues Nov 7, 2017 Randomized Algorithms 2. Review of Bloom filters, Locality sensitive hashing, Online LSH
    • Start work on Assignment 5: Autodiff with IPM part 2/2
  • Thurs Nov 9, 2017 Graph Architectures for ML. Graph-based ML architectures, Pregel, Signal-collect, GraphLab, PowerGraph, GraphChi, GraphX
  • Tues Nov 14, 2017 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches
    • Start work on Assignment 6: SSL on a graph in Spark maybe using NELL data?
  • Thurs Nov 16, 2017 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
  • Tues Nov 21, 2017 LDA 1. DGMs for naive Bayes, Gibbs sampling for LDA
  • Tues Nov 28, 2017 LDA 2. Parallelizing LDA, Fast sampling for LDA, DGMs for graphs
  • Thurs Nov 30, 2017 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
  • Tues Dec 5, 2017 Review session for final.
    • Last assignment due
  • Thurs Dec 7, 2017 Final Exam.