Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012

From Cohen Courses
Revision as of 16:58, 20 October 2011

Schedule

  • Overviews [1 week]
    • Lecture: Overview of course, cost of various operations, asymptotic analysis
    • Lecture: Review of probabilities
  • Streaming Learning algorithms [1.5 weeks]
    • Lecture: Naive Bayes, and a streaming implementation of it (features in memory).
      • Assignment: streaming Naive Bayes w/ features in memory
    • Lecture: Naive Bayes and logistic regression.
    • Lecture: SGD implementation of LogReg, with lazy regularization
      • Assignment: streaming LogReg w/ features in memory
  • Stream-and-sort [1.5 weeks]
    • Lecture: Naive Bayes when the data doesn't fit in memory.
    • Lecture: finding informative phrases (with vocab counts in memory).
      • Assignment: stream-and-sort Naive Bayes
    • Lecture: messages and records; revisit finding informative phrases.
      • Assignment: finding informative phrases
  • Map-reduce and Hadoop [1 week]
    • Lecture: Alona, using Hadoop
    • Lecture: Alona, programming tips
  • Reducing memory usage with randomized methods.
    • Lecture: Locality-sensitive hashing.
    • Lecture: Bloom filters for counting events.
    • Lecture: Feature hashing and Vowpal Wabbit.

Planned Topics

Draft - subject to change!

  • Week 1. Overview of course, and overview lecture on probabilities.
  • Week 2. Streaming learning algorithms.
    • Naive Bayes for discrete data.
    • A streaming-data implementation of Naive Bayes.
    • A streaming-data implementation of Naive Bayes assuming a larger-than-memory feature set, by using the 'stream and sort' pattern.
    • Discussion of other streaming learning methods.
      • Rocchio
      • Perceptron-style algorithms
      • Streaming regression?
    • Assignment: two implementations of Naive Bayes, one with feature-weights in memory, one purely streaming.
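The first of these two implementations might be sketched roughly as follows: a one-pass multinomial Naive Bayes that keeps all counts in memory (class and method names here are illustrative, not part of the assignment):

```python
from collections import defaultdict
import math

class StreamingNB:
    """Multinomial Naive Bayes trained in one pass, all counts in memory."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                           # Laplace smoothing
        self.word_counts = defaultdict(lambda: defaultdict(float))   # label -> word -> count
        self.label_totals = defaultdict(float)                       # label -> total word count
        self.label_docs = defaultdict(float)                         # label -> document count
        self.vocab = set()
        self.n_docs = 0

    def update(self, label, words):
        """Consume one (label, token list) example from the stream."""
        self.n_docs += 1
        self.label_docs[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.label_totals[label] += 1
            self.vocab.add(w)

    def predict(self, words):
        """Return the label maximizing the smoothed log-posterior."""
        V = len(self.vocab)
        best, best_lp = None, float("-inf")
        for y in self.label_docs:
            lp = math.log(self.label_docs[y] / self.n_docs)
            for w in words:
                num = self.word_counts[y].get(w, 0.0) + self.alpha
                den = self.label_totals[y] + self.alpha * V
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = y, lp
        return best
```

The purely streaming variant replaces the in-memory count dictionaries with the stream-and-sort pattern covered in Week 2.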
  • Week 3. Examples of more complex programs using stream-and-sort.
    • Lecture topics:
      • Finding informative phrases in a corpus, and finding polar phrases in a corpus.
      • Using records and messages to manage a complex dataflow.
    • Assignment: phrase-finding and sentiment classification
  • Week 4. The map-reduce paradigm and Hadoop.
    • Assignment: Hadoop re-implementation of assignments 1/2.
  • Week 5. Reducing memory usage with randomized methods.
    • Feature hashing and Vowpal Wabbit.
    • Bloom filters for counting events.
    • Locality-sensitive hashing.
    • Assignment: memory-efficient Naive Bayes.
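The feature-hashing idea behind Vowpal Wabbit can be sketched roughly as below: an unbounded vocabulary is folded into a fixed-size count vector, so memory no longer grows with the number of distinct features (the bucket count, hash function, and sign trick here are illustrative choices, not a specific tool's implementation):

```python
import hashlib

def hashed_features(tokens, n_buckets=2**10):
    """Hashing trick: map tokens into a fixed-size vector of signed counts."""
    vec = [0.0] * n_buckets
    for t in tokens:
        h = int(hashlib.md5(t.encode()).hexdigest(), 16)
        idx = h % n_buckets                           # which bucket the token lands in
        sign = 1.0 if (h >> 64) & 1 == 0 else -1.0    # signed hashing reduces collision bias
        vec[idx] += sign
    return vec
```

Collisions introduce noise, but with enough buckets the effect on a linear classifier is typically small, which is what makes the trick usable for the memory-efficient Naive Bayes assignment above.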
  • Week 6-7. Nearest-neighbor finding and bulk classification.
    • Using a search engine to find approximate nearest neighbors.
    • Using inverted indices to find approximate nearest neighbors or to perform bulk linear classification.
    • Implementing soft joins using map-reduce and nearest-neighbor methods.
    • The local k-NN graph for a dataset.
    • Assignment: Tool for approximate k-NN graph for a large dataset.
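The inverted-index approach to approximate nearest neighbors can be sketched as follows: only documents sharing at least one term with the query are ever scored, which is the same "soft join" idea used for bulk classification (the cosine scoring and data layout here are illustrative assumptions):

```python
from collections import defaultdict
import math

def build_index(docs):
    """Build an inverted index (term -> [(doc_id, tf)]) plus per-doc norms."""
    index = defaultdict(list)
    norms = {}
    for doc_id, tokens in docs.items():
        tf = defaultdict(int)
        for t in tokens:
            tf[t] += 1
        norms[doc_id] = math.sqrt(sum(c * c for c in tf.values()))
        for t, c in tf.items():
            index[t].append((doc_id, c))
    return index, norms

def approx_neighbors(query_tokens, index, norms, k=3):
    """Score only docs that share a term with the query; rank by cosine."""
    qtf = defaultdict(int)
    for t in query_tokens:
        qtf[t] += 1
    qnorm = math.sqrt(sum(c * c for c in qtf.values()))
    scores = defaultdict(float)
    for t, qc in qtf.items():
        for doc_id, dc in index.get(t, []):
            scores[doc_id] += qc * dc                 # accumulate dot product
    ranked = sorted(((s / (qnorm * norms[d]), d) for d, s in scores.items()),
                    reverse=True)
    return ranked[:k]
```

Building the k-NN graph for a whole dataset amounts to running the query step once per document, which parallelizes naturally under map-reduce.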
  • Week 8-10. Working with large graphs.
    • PageRank and RWR/PPR.
    • Special issues involved with iterative processing on graphs in Map-Reduce: the schimmy pattern.
      • Formalisms/environments for iterative processing on graphs: GraphLab, Spark, Pregel.
    • Extracting small graphs from a large one:
      • LocalSpectral - finding the meaningful neighborhood of a query node in a large graph.
      • Visualizing graphs.
    • Semi-supervised classification on graphs.
    • Clustering and community-finding in graphs.
    • Assignment: snowball sampling a graph with LocalSpectral and visualizing the results.
  • Week 11. Stochastic gradient descent and other streaming learning algorithms.
    • SGD for logistic regression.
    • SGD with large feature sets: delayed regularization updates; projection onto the L1 ball; truncated gradients.
    • Assignment: Proposal for a one-month project.
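The delayed-regularization idea for SGD can be sketched as follows: instead of shrinking every weight at every step, the accumulated L2 shrink is applied to a weight only when its feature is next touched, so each update costs time proportional to the active features of one example (the function name, learning-rate schedule, and data format are illustrative assumptions):

```python
import math
from collections import defaultdict

def sgd_logreg_lazy(stream, lr=0.1, lam=0.01):
    """SGD for L2-regularized logistic regression with lazy regularization.

    stream yields (y, x) pairs with y in {0, 1} and x a sparse dict
    mapping feature -> value. The shrink w_j *= (1 - lr*lam) is applied
    just-in-time, only when feature j appears in the current example.
    """
    w = defaultdict(float)
    last_seen = defaultdict(int)   # step at which each weight was last updated
    t = 0
    for y, x in stream:
        t += 1
        # Catch up on the shrinks this example's features have missed.
        for j in x:
            w[j] *= (1 - lr * lam) ** (t - 1 - last_seen[j])
        z = sum(w[j] * v for j, v in x.items())
        p = 1.0 / (1.0 + math.exp(-z))
        for j, v in x.items():
            w[j] = w[j] * (1 - lr * lam) + lr * (y - p) * v
            last_seen[j] = t
    # Final catch-up so every weight reflects the full regularization.
    for j in w:
        w[j] *= (1 - lr * lam) ** (t - last_seen[j])
    return dict(w)
```

Truncated gradients and L1 projection replace the multiplicative shrink with a clipping step, but the lazy bookkeeping is the same.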
  • Weeks 12-15. Additional topics.
    • Scalable k-means clustering.
    • Gibbs sampling and streaming LDA.
    • Stacking and cascaded learning approaches.
    • Decision tree learning for large datasets.
    • Assignment: Writeup of project results.