Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017

From Cohen Courses
Revision as of 10:49, 2 August 2017 by Wcohen

This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017.


Notes:

  • Homeworks, unless otherwise posted, will be due when the next HW comes out.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.
  • Classes are cancelled on Sept 21 (Rosh Hashana)
  • No classes will be held on Nov 23 (Thanksgiving)

Schedule for 805 projects:



Tentative schedule for lectures and 605 assignments:

  • Tues Aug 30, 2016 Overview. Grading policies etc., History of Big Data, Complexity theory and cost of important operations
  • Thurs Sep 1, 2016 Probability Review. Counting for big data and density estimation, streaming Naive Bayes, Rocchio and TFIDF

/0BzQQ-spWKjhUd0NXSTB6TW82LWM/view

  • Tues Sep 13, 2016 Workflows For Hadoop 1. Scalable classification, Scalable Rocchio and TFIDF, Abstracts for map-reduce algorithms, Joins in Hadoop

  • Thurs Sep 15, 2016 Workflows For Hadoop 2. TFIDF in Pig, Guinea Pig intro, TFIDF in Guinea Pig, Similarity joins, Similarity joins with TFIDF, Parallel simjoins

ash kernels for logistic regression

and GPUs, Exploding and vanishing gradients, Modern deep learning models
  • Thurs Oct 27, 2016 Deep Learning 2. Reverse-mode differentiation, Some systems using autodiff, Details on Wengert lists, Breakdown of xman.py, Recursive ANNs, Convolutional ANNs

  • Tues Nov 1, 2016 Randomized Algorithms 1. Bloom filters, The countmin sketch
    • Start work on Assignment 5: Autodiff with IPM. This is a new assignment for Fall 2016.
  • Thurs Nov 3, 2016 Randomized Algorithms 2 - someday, redo the count-min stuff. Review of Bloom filters, Locality sensitive hashing

PowerGraph, GraphChi, GraphX
  • Thurs Nov 10, 2016 SSL on Graphs. Semi-supervised learning intro, Multirank-walk SSL method, Harmonic fields, Modified Adsorption SSL method, MAD with countmin sketches

el propagation for clustering non-graph data, Label propagation for SSL on non-graph data

    • Start work on Assignment 6: To be decided, possibly using Spark/GraphX to do PIC or MRW.
  • Thurs Nov 17, 2016 Parameter Servers. Parameter servers, PS vs Hadoop, Stale Synchronous Parallel (SSP) model, Managed Communication in PS, LDA Sampler with PS