Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012"

From Cohen Courses
 
(26 intermediate revisions by 2 users not shown)
Line 1: Line 1:
This is the syllabus for [[Machine Learning with Large Datasets 10-605 in Spring 2012]].
+
This is the syllabus for [[Machine Learning with Large Datasets 10-605 in Spring 2012]]. '''If you're taking 10-605 now, you're probably looking for the syllabus for  [[Machine Learning with Large Datasets 10-605 in Spring 2013]].'''
  
 
== January ==
 
== January ==
Line 21: Line 21:
 
* Tues Feb 14.  [[Class meeting for 10-605 2012 02 14|Map-reduce and Hadoop 2. (Alona lecture, William is closer)]].
 
* Tues Feb 14.  [[Class meeting for 10-605 2012 02 14|Map-reduce and Hadoop 2. (Alona lecture, William is closer)]].
 
** '''Assignment due 2/15: phrase finding with stream-and-sort'''
 
** '''Assignment due 2/15: phrase finding with stream-and-sort'''
** ''New Assignment: Naive Bayes with Hadoop''
+
** ''New Assignment: Naive Bayes with Hadoop & Phrase-finding with Hadoop'' [http://www.cs.cmu.edu/~afyshe/Assignment4.pdf PDF Handout]
** ''New Assignment: Phrase-finding with Hadoop''
+
* Thus Feb 16. [[Class meeting for 10-605 2012 02 16|Hadoop helpers and Scalable SGD]]
* Thus Feb 16. [[Class meeting for 10-605 2012 02 18|Hadoop helpers and Scalable SGD 1]]
+
* Tues Feb 21. [[Class meeting for 10-605 2012 02 21|Scalable SGD and Hash Kernels]]
* Tues Feb 21. Scalable SGD 2
+
* Thus Feb 23. ''Guest lecture'': [http://www.cs.umass.edu/~ronb/ Ron Bekkerman], LinkedIn, Scaling up Machine Learning
* Thus Feb 23. ''Guest lecture'': Ron Bekkerman, LinkedIn, Scaling up Machine Learning
+
** [http://www.cs.cmu.edu/~wcohen/10-605/2012-02-23-bekkerman.pptx Ron's slides in Powerpoint]
* Tues Feb 28. Bloom Filters and Locality sensitive hashing 1.
+
** [http://www.cs.cmu.edu/~wcohen/10-605/2012-02-23-bekkerman.pdf Ron's slides in PDF]
** '''Hadoop assignments due'''
+
* Tues Feb 28. [[Class meeting for 10-605 2012 02 28|Background on randomized algorithms; Graph computations 1.]]
** ''New Assignment: memory-efficient SGD''
 
  
 
== March ==
 
== March ==
  
 
* Thus Mar 1.  ''Guest Lecture'': Ben van Durme, JHU, Randomized Algorithms for Large-Scale Learning
 
* Thus Mar 1.  ''Guest Lecture'': Ben van Durme, JHU, Randomized Algorithms for Large-Scale Learning
* Tues Mar 6. Learning on graphs. PageRank, Harmonic field, RWR; tools and design patterns for graphs (Pregel, GraphLab, Schimmy, ...)
+
* Tues Mar 6. [[Class meeting for 10-605 2012 03 06|Learning on graphs 2]].  
** '''Assignment due: memory-efficient SGD'''
+
** '''Hadoop assignments due'''
** ''New assignment: mini-project proposals (first draft).''
+
** ''New Assignment: memory-efficient SGD'' [http://www.cs.cmu.edu/~wcohen/10-605/assignments/sgd.pdf PDF writeup]
* Thus Mar 8. ''Guest Lecture'': Joey Gonzalez, CMU, GraphLab and Dynamic Asynchronous Computation
+
** ''New assignment: initial project proposals.'' [http://www.cs.cmu.edu/~wcohen/10-605/assignments/initial-project-proposal.pdf PDF writeup]
 +
* Thus Mar 8. ''Guest Lecture'': Joey Gonzalez, CMU, GraphLab and Dynamic Asynchronous Computation [http://www.cs.cmu.edu/~jegonzal/talks/biglearning_with_graphs.pptx PPT slides]
 
* Tues Mar 13. ''no class - spring break.''
 
* Tues Mar 13. ''no class - spring break.''
 
* Thus Mar 15. ''no class - spring break.''
 
* Thus Mar 15. ''no class - spring break.''
* Tues Mar 20. Spectral clustering and PIC.
+
* Tues Mar 20. [[Class meeting for 10-605 2012 03 20|Subsampling a graph with RWR]]
** '''Assignment due: mini-project proposals (first draft).'''
+
** '''Assignment due: initial mini-project proposals.'''
** ''New Assignment: Subsampling and visualizing a graph.''
+
** '''Assignment due: memory-efficient SGD'''
* Thus Mar 22. Tentative: Guest lecture by U Kang, CMU.
+
** ''New Assignment: Subsampling and visualizing a graph.'' [http://www.cs.cmu.edu/~wcohen/10-605/assignments/snowball.pdf PDF writeup]
* Tues Mar 27. Gibbs sampling and LDA.
+
* Thus Mar 22. [[Class meeting for 10-605 2012 03 22|Semi-supervised learning via label propagation on graphs]]
 +
* Tues Mar 27. [[Class meeting for 10-605 2012 03 27|Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs]]
 
** '''Assignment due: Subsampling and visualizing a graph.'''
 
** '''Assignment due: Subsampling and visualizing a graph.'''
 
** ''New Assignment: mini-project proposals (final version)''
 
** ''New Assignment: mini-project proposals (final version)''
* Thus Mar 29. KNN classification and inverted indices.
+
* Thus Mar 29. [[Class meeting for 10-605 2012 03 29|Understanding spectral clustering techniques.]]
 
** '''Assignment due: mini-project proposals (final version).'''
 
** '''Assignment due: mini-project proposals (final version).'''
  
 
== April ==
 
== April ==
  
* Tues Apr 3. Decision trees and random forests 1.
+
* Tues Apr 3. [[Class meeting for 10-605 2012 04 03|LDA-like models for text and graphs]]; guest lecture from Partha Talukdar
* Thus Apr 5. Decision trees and random forests 2.
+
* Thus Apr 5. Tentative: Guest lecture by U Kang, CMU.
* Tues Apr 10. Soft joins with KNN and inverted indices 1.
+
* Tues Apr 10. [[Class meeting for 10-605 2012 04 10|Speeding up LDA-like models: sampling and parallelization]]
* Thus Apr 12. Soft joins with KNN and inverted indices 1.
+
* Thus Apr 12. [[Class meeting for 10-605 2012 04 12|Fast KNN and similarity joins 1.]]
* Tues Apr 17. Structured prediction 1.
+
* Tues Apr 17. [[Class meeting for 10-605 2012 04 17|Fast KNN and similarity joins 2.]]
 
* Thus Apr 19. ''no class - Carnival''
 
* Thus Apr 19. ''no class - Carnival''
* Tues Apr 24. Structured prediction 2.
+
* Tues Apr 24. [[Class meeting for 10-605 2012 04 14|SGD for matrix factorization and online LDA]]
* Thus Apr 26. Additional topics.
+
* Thus Apr 26. [[Class meeting for 10-605 2012 04 16|Scaling up decision tree learning]]
  
 
== May ==
 
== May ==
Line 64: Line 65:
 
* Tues May 1. Project reports.
 
* Tues May 1. Project reports.
 
* Thus May 3. Project reports.
 
* Thus May 3. Project reports.
 +
* Fri May 4.
 +
** '''Project writeups due at 5:00pm'''.  Submit your paper to Blackboard as a PDF in the [http://icml.cc/2012/author-instructions/ ICML 2012 format] (up to 8 pages, double column), except, of course, do not submit it anonymously.

Latest revision as of 09:48, 28 March 2013

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2012. If you're taking 10-605 now, you're probably looking for the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013.

January

February

March

April

May

  • Tues May 1. Project reports.
  • Thus May 3. Project reports.
  • Fri May 4.
    • Project writeups due at 5:00pm. Submit your paper to Blackboard as a PDF in the ICML 2012 format (up to 8 pages, double column), except, of course, do not submit it anonymously.