Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017

From Cohen Courses
Revision as of 10:51, 2 August 2017 by Wcohen (talk | contribs)

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled on Sept 21 (Rosh Hashana)
  • No classes will be held on Nov 23 (Thanksgiving)

Schedule for 805 projects:



Tentative schedule for lectures and 605 assignments:

  • Tues Aug 30, 2016 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
  • Thurs Sep 1, 2016 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF

/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view

  • Tues Sep 13, 2016 Workflows For Hadoop 1. Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop
  • Thurs Sep 15, 2016 Workflows For Hadoop 2. TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins

Hash kernels for logistic regression

and GPUs, Exploding and vanishing gradients, Modern deep learning models
  • Thurs Oct 27, 2016 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs

  • Tues Nov 1, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
    • Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016.
  • Thurs Nov 3, 2016 Randomized Algorithms 2 - someday, redo the count-min stuff. Review of Bloom filters, Locality sensitive hashing

PowerGraph, GraphChi, GraphX
  • Thurs Nov 10, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches

  • Tues Nov 15, 2016 Unsupervised Learning On Graphs. Spectral clustering, Power iteration clustering, Label propagation for clustering non-graph data, Label propagation for SSL on non-graph data
    • Start work on Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
  • Thurs Nov 17, 2016 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS
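
The count-min sketch comes up several times in the schedule above (Randomized Algorithms 1 and 2, and MAD with countmin sketches). As a rough illustration only, and not the course's own code, here is a minimal Python sketch of the data structure. The class name, the default width and depth, and the use of salted MD5 as the per-row hash are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=1000, depth=5):
        self.width = width    # counters per row (illustrative default)
        self.depth = depth    # number of hash rows (illustrative default)
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, simulated by salting the item with the row id.
        digest = hashlib.md5(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


sketch = CountMinSketch()
for word in ["spark", "hadoop", "spark", "pig", "spark"]:
    sketch.add(word)
print(sketch.estimate("spark"))  # at least 3, the true count
```

Memory use is fixed at width × depth counters regardless of how many distinct items are seen, which is the property that makes the structure attractive for large datasets.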