Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012"

From Cohen Courses

Revision as of 13:33, 17 January 2012

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012.

January

  • Tues Jan 17. Overview of course, cost of various operations, asymptotic analysis.
  • Thurs Jan 19. Review of probabilities.
  • Tues Jan 24. Streaming algorithms and Naive Bayes.
    • New Assignment: streaming Naive Bayes 1 (with feature counts in memory)
  • Thurs Jan 26. The stream-and-sort design pattern; Naive Bayes revisited.
  • Tues Jan 31. Messages and records 1; Phrase finding.
    • Assignment due: streaming Naive Bayes 1 (with feature counts in memory)
    • New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort

February

  • Thurs Feb 2. Messages and records 2; Phrase finding.
  • Tues Feb 7. Other streaming algorithms: voted perceptron, Rocchio; averaging.
    • Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
    • New Assignment: phrase finding with stream-and-sort
  • Thurs Feb 9. Map-reduce and Hadoop 1 (Alona lecture).
  • Tues Feb 14. Map-reduce and Hadoop 2 (Alona lecture).
    • Assignment due: phrase finding with stream-and-sort
    • New Assignment: Naive Bayes with Hadoop
  • Thurs Feb 16. Naive Bayes and Logistic regression.
  • Tues Feb 21. Logistic regression with stochastic gradient descent.
    • New Assignment: Phrase-finding with Hadoop
  • Thurs Feb 23. Other SGD algorithms; parallelizing SGD.
  • Tues Feb 28. Bloom Filters and Locality sensitive hashing 1.
    • Hadoop assignments due
    • New Assignment: memory-efficient SGD

March

  • Thurs Mar 1. Bloom Filters and Locality sensitive hashing 2.
  • Tues Mar 6. Learning on graphs. PageRank, Harmonic field, RWR.
    • Assignment due: memory-efficient SGD
    • New Assignment: mini-project proposals (first draft).
  • Thurs Mar 8. Tools and design patterns for graphs (Pregel, GraphLab, Schimmy, ...)
  • Tues Mar 13. no class - spring break.
  • Thurs Mar 15. no class - spring break.
  • Tues Mar 20. Spectral clustering and PIC.
    • Assignment due: mini-project proposals (first draft).
    • New Assignment: Subsampling and visualizing a graph.
  • Thurs Mar 22. Gibbs sampling and LDA 1.
  • Tues Mar 27. Gibbs sampling and LDA 2.
    • Assignment due: Subsampling and visualizing a graph.
    • New Assignment: mini-project proposals (final version)
  • Thurs Mar 29. KNN classification and inverted indices.
    • Assignment due: mini-project proposals (final version).

April

  • Tues Apr 3. Decision trees and random forests 1.
  • Thurs Apr 5. Decision trees and random forests 2.
  • Tues Apr 10. Soft joins with KNN and inverted indices 1.
  • Thurs Apr 12. Soft joins with KNN and inverted indices 2.
  • Tues Apr 17. Structured prediction 1.
  • Thurs Apr 19. no class - Carnival
  • Tues Apr 24. Structured prediction 2.
  • Thurs Apr 26. Additional topics.

May

  • Tues May 1. Project reports.
  • Thurs May 3. Project reports.