Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015"

From Cohen Courses
Line 28:
  * Thurs Feb 5.  [[Class meeting for 10-605 Similarity Joins|Fast KNN and similarity joins]]
  * Tues Feb 10. [[Class meeting for 10-605 Parallel Perceptrons 1|Parallel Perceptrons 1]].
- ** ''HW3,4: Naive Bayes with Streaming Hadoop,  Naive Bayes with Hadoop & Phrase-finding with Hadoop''.  PDF Handouts: [http://curtis.ml.cmu.edu/w/courses/images/c/c0/Homework4a.pdf  HW3 - warmup],[http://curtis.ml.cmu.edu/w/courses/images/a/a2/Homework4b.pdf HW3],[http://curtis.ml.cmu.edu/w/courses/images/3/30/Homework4c.pdf HW4].
  * Thurs Feb 12. [[Class meeting for 10-605 Parallel Perceptrons 2|Parallel Perceptrons 2]].
  ** '''HW2 due: phrase finding with stream-and-sort'''
  * Tues Feb 17. [[Class meeting for 10-605 SGD and Hash Kernels|Scalable SGD and Hash Kernels]]
+ ** ''HW3: Naive Bayes with Hadoop MapReduce''.  PDF Handouts: [http://www.andrew.cmu.edu/user/amaurya/docs/10605/homework3.pdf  HW3].
  ** ''For 10/11-805 students:'' '''initial draft of project proposal is due.'''  I will give you feedback on this, so please be clear about your proposal.  I'm expecting approximately one page.  You should discuss what dataset you plan to use, what results you hope to obtain, what baseline technique you will build on and/or compare to.  Also include a section saying if you have a partner; and if you are willing to work with/mentor one or more 605 students, and if so, how you anticipate them contributing to the project.
  * Thurs Feb 19. [[Class meeting for 10-605 SGD for MF|Matrix Factorization and SGD]]

Revision as of 21:26, 17 February 2015

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015.

Notes:

  • The assignments posted are drafts based on the assignments from 2014, and will be modified over the course of the semester - some may be changed substantially.
  • Lecture notes and/or slides will be (re)posted around the time of the lectures.

January

February

March

  • Tues Mar 3. student presentations
    • Adams Wei Yu (weiyu at andrew): fast PPR on Map-Reduce
    • Jakub Pachocki: factorization machines (and hash kernels?)
    • Wanli Ma (wanlim at andrew): coresets for k-segmentation of streams
  • Thurs Mar 5. student presentations
    • Matt Gardner (mg1 at cs): Large-scale extensions of the path ranking algorithm
    • Jesse Dodge (jessed at andrew): large-scale lasso regularization
    • Ishan Misra (imisra at andrew): LSH for object detection
    • HW4 due: Phrase-finding with Hadoop
    • HW5: memory-efficient SGD PDF handout
    • For 10/11-805 students: project proposal is due. This must contain a complete description of the data you will use.
  • Tues Mar 10. no class - spring break.
  • Thurs Mar 12. no class - spring break.
  • Tues Mar 17. Scalable PageRank
    • HW5 due: memory-efficient SGD
    • HW6: Subsampling and visualizing a graph. PDF handout
  • Thurs Mar 19. Subsampling a graph with RWR
  • Tues Mar 24. Subsampling continued and SSL on Graphs (AAAI Spring Symposium week)
  • Thurs Mar 26. Guest lecture: D. Sculley, Google, TBA
  • Tues Mar 31. Sparse sampling and parallelization for LDA
    • HW6 due: Subsampling and visualizing a graph.
    • HW7: TBA

April and May

  • Tues May 5.
    • For 10/11-805 students: project reports are due

Topics covered in previous years but not in 2015