Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015"

From Cohen Courses
 
(16 intermediate revisions by 3 users not shown)
* Sun Mar 1.
** '''HW3 due: Naive Bayes with Hadoop MapReduce'''
** HW4: [http://www.andrew.cmu.edu/user/amaurya/docs/10605/homework4.pdf PDF writeup]
* Tues Mar 3. ''student presentations''
** Adams Wei Yu (weiyu at andrew): fast PPR on Map-Reduce [http://www.cs.cmu.edu/~wcohen/10-605/2015-guest-lecture/ppr_mapreduce.pdf]
* Thurs Mar 19. [[Class meeting for 10-605 Subsampling Graphs|Subsampling a graph with RWR]]
** '''HW5 due: memory-efficient SGD'''
** ''HW6: Subsampling and visualizing a graph.'' [http://bit.ly/605_hw6 PDF handout]
* Tues Mar 24.
** Student presentation: Rohan Ramanath, Bayesian Optimization
** Guest lecture: Dai Wei, CMU, Parameter servers. ('''Note''': This will be very relevant for one of the later HWs) [https://dl.dropboxusercontent.com/u/65353654/daiwei01_release.pdf PDF] and [https://dl.dropboxusercontent.com/u/65353654/daiwei01_release.pptx ppt].
* Thurs Mar 26. Guest lecture: D. Sculley, Google, TBA
* Tues Mar 31. [[Class meeting for 10-605 LDA 1|Sparse sampling and parallelization for LDA]]
* Wed Apr 1.
** '''HW6 due: Subsampling and visualizing a graph.'''
** ''HW7: Matrix Factorization in Spark'' [http://www.andrew.cmu.edu/user/amaurya/docs/10605/homework7.pdf HW7 PDF Handout] [http://www.cs.cmu.edu/~yipeiw/TA605/hw7/eval2.pyc Evaluation Script] [http://www.cs.cmu.edu/~yipeiw/TA605/hw7/eval_acc.py Validation Script]
* Thurs Apr 2. [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and other tricks]]
* Tues Apr 7. Guest lecture: Alex Beutel, SGD for Tensors
** [http://www.cs.cmu.edu/~wcohen/10-605/2015-guest-lecture/beutel.pptx Alex's slides]
** William's [http://www.cs.cmu.edu/~wcohen/10-605/HintsForMF.pptx hints for HW7 in PPT], [http://www.cs.cmu.edu/~wcohen/10-605/HintsForMF.pdf hints for HW7 in PDF]
* Thurs Apr 9. Guest lecture: Alex Smola, [http://www.cs.cmu.edu/~wcohen/10-605/2015-guest-lecture/smola-param-serve.pdf Scalable parameter servers]
** Additionally, each '''project lead''' (i.e., each 10/11-805 student who has any 10-605 students working with them) should add a list of who's working on their project, and one line indicating whether they're making good progress so far.
* Tues Apr 14. [[Class_meeting_for_10-605_SSL_on_Graphs|SSL on Graphs]]
* Thurs Apr 16. ''no class: carnival''
** '''HW7 due'''
** ''HW8: [http://bit.ly/605_hw8_ps Matrix factorization on parameter server]''
* Tues Apr 21. [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
* Thurs Apr 23. ''Presentations for 10/11-805 projects''
* Tues Apr 28. Exam review session.
** '''HW8 due'''
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf PDF practice questions from 2014]
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf PDF practice questions for 2015]
** [http://www.cs.cmu.edu/~wcohen/10-605/exam-review.pptx Review session slides], [http://www.cs.cmu.edu/~wcohen/10-605/exam-review.pdf PDF]
* Thurs Apr 30. In-class exam.
* [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]]
* [[Class meeting for 10-605 Parallel Similarity Joins|Scalable Similarity Joins]]
* [[Class meeting for 10-605 Rocchio and On-line Learning|Messages, records and workflows; Rocchio]]
* [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]]
* [http://www.cs.cmu.edu/~wcohen/10-605/schimmy.pptx Scalable pagerank - The Schimmy Pattern]

Latest revision as of 14:50, 14 October 2015

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015.

Notes:

  • The assignments posted are drafts based on the assignments from 2014 and will be modified over the course of the semester; some may be changed substantially.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

January

February

March

  • Sun Mar 1.
    • HW3 due: Naive Bayes with Hadoop MapReduce
    • HW4: PDF writeup
  • Tues Mar 3. student presentations
    • Adams Wei Yu (weiyu at andrew): fast PPR on Map-Reduce [1]
    • Jakub Pachocki: factorization machines (and hash kernels?) [2]
    • Wanli Ma (wanlim at andrew): coresets for k-segmentation of streams
  • Thurs Mar 5. student presentations
    • Quiz: [3]
    • Matt Gardner (mg1 at cs): Large-scale extensions of the path ranking algorithm [4]
    • Jesse Dodge (jessed at andrew): large-scale lasso regularization [5]
    • Ishan Misra (imisra at andrew): LSH for object detection [6]
    • HW5: memory-efficient SGD PDF handout
    • For 10/11-805 students: project proposal is due. This must contain a complete description of the data you will use.
  • Sat Mar 7 (extended from Friday):
    • HW4 due: Phrase-finding with Hadoop
  • Tues Mar 10. no class - spring break.
  • Thurs Mar 12. no class - spring break.
  • Tues Mar 17. Scalable PageRank PDF handout
  • Thurs Mar 19. Subsampling a graph with RWR
    • HW5 due: memory-efficient SGD
    • HW6: Subsampling and visualizing a graph. PDF handout
  • Tues Mar 24.
    • Student presentation: Rohan Ramanath, Bayesian Optimization
    • Guest lecture: Dai Wei, CMU, Parameter servers. (Note: This will be very relevant for one of the later HWs) PDF and ppt.
  • Thurs Mar 26. Guest lecture: D. Sculley, Google, TBA
  • Tues Mar 31. Sparse sampling and parallelization for LDA

April and May

  • Tues May 5.
    • For 10/11-805 students: project reports are due

Topics covered in previous years but not in 2015