Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015

From Cohen Courses
 
== April and May ==
* Wed Apr 1.
** '''HW6 due: Subsampling and visualizing a graph.'''
** ''HW7: Matrix Factorization in Spark'' [http://www.andrew.cmu.edu/user/amaurya/docs/10605/homework7.pdf HW7 PDF Handout] [http://www.cs.cmu.edu/~yipeiw/TA605/hw7/eval2.pyc Evaluation Script] [http://www.cs.cmu.edu/~yipeiw/TA605/hw7/eval_acc.py Validation Script]
* Thurs Apr 2. [[Class meeting for 10-605 2013 LDA 2|Speeding up LDA-like models: All-reduce and other tricks]]
 
* Tues Apr 7. Guest lecture - Alex Beutel, SGD for Tensors
** [http://www.cs.cmu.edu/~wcohen/10-605/2015-guest-lecture/beutel.pptx Alex's slides]
** William's [http://www.cs.cmu.edu/~wcohen/10-605/HintsForMF.pptx hints for HW7 in PPT], [http://www.cs.cmu.edu/~wcohen/10-605/HintsForMF.pdf hints for HW7 in PDF]
* Thurs Apr 9. Guest lecture - Alex Smola, [http://www.cs.cmu.edu/~wcohen/10-605/2015-guest-lecture/smola-param-serve.pdf Scalable parameter servers]
** If you don't like the MediaTech video, a [http://youtu.be/bFnUeYDBtbk YouTube video] of Alex's talk is also available.
* Mon Apr 13. '''Informal update due for students working on project teams.'''
** Each '''student working on a project''' should send an update of between 1/2 page and 1 page to wcohen+805@gmail.com. The update should say what concrete tasks you've accomplished to date, how these tasks fit into the overall project (if you're not the only member), and what you plan to do between 4/13 and the presentation on 4/23.
** Additionally, each '''project lead''' (i.e., each 805 student who has any 10-605 students working with them) should add a list of who's working on their project, and one line indicating whether they're making good progress so far.
* Tues Apr 14. [[Class_meeting_for_10-605_SSL_on_Graphs|SSL on Graphs]]
* Thurs Apr 16. ''no class: carnival''
** '''HW7 due'''
** ''HW8: [http://bit.ly/605_hw8_ps Matrix factorization on parameter server]''
* Tues Apr 21. [[Class meeting for 10-605 GraphLab|Graph models for large-scale ML]]
* Thurs Apr 23. ''Presentation for 10/11-805 projects''
* Tues Apr 28. Exam review session.
** '''HW8 due'''
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf PDF practice questions from 2014]
** [http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf PDF practice questions for 2015]
** [http://www.cs.cmu.edu/~wcohen/10-605/exam-review.pptx Review session slides], [http://www.cs.cmu.edu/~wcohen/10-605/exam-review.pdf PDF]
* Thurs Apr 30. In-class exam.
== Topics covered in previous years but not in 2015 ==
* [[Class meeting for 10-605 Scalable FOL|Scalable First-order logics]]
* [[Class meeting for 10-605 Parallel Similarity Joins|Scalable Similarity Joins]]
* [[Class meeting for 10-605 Rocchio and On-line Learning|Messages, records and workflows; Rocchio]]
* [[Class meeting for 10-605 Spectral Clustering|Scalable spectral clustering techniques.]]
* [http://www.cs.cmu.edu/~wcohen/10-605/schimmy.pptx Scalable pagerank - The Schimmy Pattern]

Latest revision as of 14:50, 14 October 2015

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015.

Notes:

  • The assignments posted are drafts based on the assignments from 2014 and will be modified over the course of the semester; some may be changed substantially.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

January

February

March

  • Sun Mar 1.
    • HW3 due: Naive Bayes with Hadoop MapReduce
    • HW4: PDF writeup
  • Tues Mar 3. student presentations
    • Adams Wei Yu (weiyu at andrew): fast PPR on Map-Reduce [1]
    • Jakub Pachocki: factorization machines (and hash kernels?) [2]
    • Wanli Ma (wanlim at andrew): coresets for k-segmentation of streams
  • Thurs Mar 5. student presentations
    • Quiz: [3]
    • Matt Gardner (mg1 at cs): Large-scale extensions of the path ranking algorithm [4]
    • Jesse Dodge (jessed at andrew): large-scale lasso regularization [5]
    • Ishan Misra (imisra at andrew): LSH for object detection [6]
    • HW5: memory-efficient SGD PDF handout
    • For 10/11-805 students: project proposal is due. This must contain a complete description of the data you will use.
  • Sat Mar 7 (extended from Friday):
    • HW4 due: Phrase-finding with Hadoop
  • Tues Mar 10. no class - spring break.
  • Thurs Mar 12. no class - spring break.
  • Tues Mar 17. Scalable PageRank PDF handout
  • Thurs Mar 19. Subsampling a graph with RWR
    • HW5 due: memory-efficient SGD
    • HW6: Subsampling and visualizing a graph. PDF handout
  • Tues Mar 24.
    • Student presentation: Rohan Ramanath, Bayesian Optimization
    • Guest lecture: Dai Wei, CMU, Parameter servers. (Note: this will be very relevant for one of the later HWs.) PDF and PPT.
  • Thurs Mar 26. Guest lecture: D. Sculley, Google, TBA
  • Tues Mar 31. Sparse sampling and parallelization for LDA

April and May

  • Tues May 5.
    • For 10/11-805 students: project reports are due

Topics covered in previous years but not in 2015