Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012

From Cohen Courses

Revision as of 17:37, 20 October 2011

Schedule

  • Overviews [1 week]
    • Lecture: Overview of course, cost of various operations, asymptotic analysis
    • Lecture: Review of probabilities
  • Streaming learning algorithms [2 weeks]
    • Lecture: Naive Bayes, and a streaming implementation of it (features in memory).
      • Assignment: streaming Naive Bayes w/ features in memory
    • Lecture: Naive Bayes and logistic regression.
    • Lecture: SGD implementation of LogReg, with lazy regularization
      • Assignment: streaming LogReg w/ features in memory
    • Lecture: other streaming methods - the perceptron algorithm and Rocchio's algorithm.
  • Stream-and-sort [1.5 weeks]
    • Lecture: Naive Bayes when data's not in memory.
      • Assignment: stream-and-sort Naive Bayes (Twitter emoticon data?)
    • Lecture: finding informative phrases (with vocab counts in memory).
    • Lecture: messages and records; revisit finding informative phrases.
      • Assignment: finding informative phrases (Google books data)
  • Map-reduce and Hadoop [1 week]
    • Lecture: Alona, Map-reduce
    • Lecture: Alona, Hadoop and map-reduce
      • Assignment: finding informative phrases
  • Reducing memory usage with randomized methods [1.5 weeks]
    • Lecture: Locality-sensitive hashing.
    • Lecture: Bloom filters for counting events.
      • Assignment: LSH transformation of datasets
    • Lecture: Vowpal Wabbit and the hashing trick.
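
The stream-and-sort pattern that runs through the schedule above can be sketched in a few lines. This is a hypothetical minimal example (the function names and toy data are illustrative, not from the course materials): a mapper streams out one (key, value) message per observation, an external sort groups identical keys together, and a reducer sums each run, so no counter table ever has to fit in memory.

```python
from itertools import groupby
from operator import itemgetter

def mapper(docs):
    """Stream out one (label/word, 1) message per token; nothing is stored."""
    for label, text in docs:
        for word in text.split():
            yield (f"{label}/{word}", 1)

def reducer(sorted_messages):
    """Sum each run of identical keys; only one counter is live at a time."""
    for key, group in groupby(sorted_messages, key=itemgetter(0)):
        yield (key, sum(v for _, v in group))

docs = [("pos", "good good movie"), ("neg", "bad movie")]
# sorted() stands in for an external sort (e.g. Unix `sort`) on real data.
counts = dict(reducer(sorted(mapper(docs))))
# counts["pos/good"] == 2, counts["neg/bad"] == 1
```

On disk-sized data the same mapper and reducer would be connected by a pipe through an external sort utility rather than Python's in-memory `sorted()`.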

Planned Topics

  • Week 6-7. Nearest-neighbor finding and bulk classification.
    • Using a search engine to find approximate nearest neighbors.
    • Using inverted indices to find approximate nearest neighbors or to perform bulk linear classification.
    • Implementing soft joins using map-reduce and nearest-neighbor methods.
    • The local k-NN graph for a dataset.
    • Assignment: Tool for building an approximate k-NN graph for a large dataset.
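
As a toy illustration of the inverted-index idea above (a sketch under simplifying assumptions: token overlap as the similarity measure, tiny in-memory data, hypothetical function names): instead of scanning all pairs, score only the documents that share at least one token with the query.

```python
from collections import defaultdict, Counter

def build_index(docs):
    """Inverted index: token -> ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in set(text.split()):
            index[token].add(doc_id)
    return index

def approx_neighbors(query, index, k=2):
    """Score only documents sharing a token with the query, never all pairs."""
    scores = Counter()
    for token in set(query.split()):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1          # shared-token count as crude similarity
    return [doc_id for doc_id, _ in scores.most_common(k)]

docs = {1: "big data course", 2: "machine learning course", 3: "cooking recipes"}
index = build_index(docs)
nbrs = approx_neighbors("large data learning course", index)
# nbrs contains docs 1 and 2 (the course documents), not doc 3
```

Replacing the overlap count with a TF-IDF-weighted score gives the search-engine variant; running the same candidate generation for every document at once is the soft-join / bulk-classification version.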
  • Week 8-10. Working with large graphs.
    • PageRank and RWR/PPR.
    • Special issues involved with iterative processing on graphs in Map-Reduce: the schimmy pattern.
      • Formalisms/environments for iterative processing on graphs: GraphLab, Spark, Pregel.
    • Extracting small graphs from a large one:
      • LocalSpectral - finding the meaningful neighborhood of a query node in a large graph.
      • Visualizing graphs.
    • Semi-supervised classification on graphs.
    • Clustering and community-finding in graphs.
    • Assignment: Snowball sampling a graph with LocalSpectral and visualizing the results.
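
The core of the PageRank material above is power iteration, which a small in-memory sketch can show (toy graph and function name are illustrative; on a large graph each iteration becomes one map-reduce pass over the edge list, which is where the schimmy pattern applies):

```python
def pagerank(edges, n, d=0.85, iters=50):
    """Power iteration for PageRank on an adjacency list {node: [out-neighbors]}.
    Assumes every node has at least one out-link (no dangling-node handling)."""
    rank = {u: 1.0 / n for u in range(n)}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in range(n)}
        for u, outs in edges.items():
            share = d * rank[u] / len(outs)
            for v in outs:                 # "map": each node sends rank to neighbors
                new[v] += share            # "reduce": shares are summed per target
        rank = new
    return rank

# Tiny 3-node cycle: by symmetry every rank converges to 1/3.
r = pagerank({0: [1], 1: [2], 2: [0]}, n=3)
```

RWR/PPR is the same iteration with the `(1 - d) / n` teleport mass concentrated on the query node instead of spread uniformly.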
  • Week 11. Stochastic gradient descent and other streaming learning algorithms.
    • SGD for logistic regression.
    • SGD with large feature sets: delayed regularization updates; projection onto the L1 ball; truncated gradients.
    • Assignment: Proposal for a one-month project.
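
The delayed-regularization idea above can be sketched as follows (a minimal illustration with hypothetical names, using L2 regularization and tiny toy data): instead of decaying every weight at every step, decay a weight only when its feature next appears, applying all the skipped decay steps at once, so each sparse example costs time proportional to its active features.

```python
import math
from collections import defaultdict

def sgd_logreg(examples, epochs=20, lr=0.1, lam=0.01):
    """SGD for L2-regularized logistic regression with lazy regularization."""
    w = defaultdict(float)
    last = defaultdict(int)       # step at which each weight was last touched
    t = 0
    for _ in range(epochs):
        for features, y in examples:     # y in {0, 1}; features: active names
            t += 1
            for f in features:           # catch up on the skipped decay steps
                w[f] *= (1 - lr * lam) ** (t - last[f] - 1)
                last[f] = t
            p = 1.0 / (1.0 + math.exp(-sum(w[f] for f in features)))
            for f in features:           # this step's decay plus the gradient
                w[f] = w[f] * (1 - lr * lam) + lr * (y - p)
    return w

data = [(["good"], 1), (["bad"], 0), (["good", "fun"], 1)]
w = sgd_logreg(data)
# w["good"] ends up positive and w["bad"] negative
```

A complete implementation would also flush the pending decay on all weights once at the end of training; the sketch omits that final pass.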
  • Weeks 12-15. Additional topics.
    • Scalable k-means clustering.
    • Gibbs sampling and streaming LDA.
    • Stacking and cascaded learning approaches.
    • Decision tree learning for large datasets.
    • Assignment: Writeup of project results.
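
The scalable k-means topic above fits the same map-reduce shape used throughout the course; here is a toy sketch of one Lloyd iteration in that shape (1-D points and all names are illustrative assumptions): map each point to its nearest centroid's id, then reduce each group to its mean.

```python
from itertools import groupby
from operator import itemgetter

def kmeans_step(points, centroids):
    """One Lloyd iteration in map-reduce shape."""
    # map: emit (nearest_centroid_id, point) pairs
    pairs = [(min(range(len(centroids)),
                  key=lambda i: abs(p - centroids[i])), p) for p in points]
    # shuffle/sort + reduce: average the points assigned to each centroid
    pairs.sort(key=itemgetter(0))
    new = list(centroids)
    for cid, group in groupby(pairs, key=itemgetter(0)):
        pts = [p for _, p in group]
        new[cid] = sum(pts) / len(pts)
    return new

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
cents = [0.0, 10.0]
for _ in range(5):
    cents = kmeans_step(points, cents)
# converges to roughly the two cluster means, 1.0 and 9.0
```

On a large dataset the map and reduce phases become Hadoop jobs, with the (small) centroid list broadcast to every mapper and the driver looping until the centroids stop moving.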