Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015"
From Cohen Courses
Revision as of 10:02, 16 March 2015
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015.
Notes:
- The assignments posted are drafts based on the assignments from 2014, and will be modified over the course of the semester - some may be changed substantially.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
January
- Tues Jan 13. Overview of course, cost of various operations, asymptotic analysis.
- Thus Jan 15. Review of probabilities, joint distributions, and Naive Bayes.
- HW1A: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Tues Jan 20. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
- HW1B: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- For 10/11-805 students: a one-paragraph summary of a recent research result you'd like to present is due. If you're planning/hoping to transfer from 605, but haven't yet transferred, then also submit this assignment. Email to wcohen+805 AT gmail.com with the subject "Presentation" and include, in addition to your summary:
- Your name and andrew id
- A link to the paper
- Your best guess as to what lectures should precede the presentation
- Due by 11:59:59pm EST Tuesday.
- Thus Jan 22. Messages, records and workflows; Phrase finding.
- Tues Jan 27. Hadoop and Map-Reduce
- Thus Jan 29. PIG and Other Workflow Systems for Hadoop
- HW1A and HW1B due.
- HW2: phrase finding with stream-and-sort. PDF Handout, Stopword List
February
- Tues Feb 3. Rocchio and TFIDF
- Thus Feb 5. Fast KNN and similarity joins
- Tues Feb 10. Parallel Perceptrons 1.
- Thus Feb 12. Parallel Perceptrons 2.
- HW2 due: phrase finding with stream-and-sort
- Tues Feb 17. Scalable SGD and Hash Kernels
- HW3: Naive Bayes with Hadoop MapReduce. PDF Handouts: HW3.
- For 10/11-805 students: initial draft of project proposal is due. I will give you feedback on this, so please be clear about your proposal. I'm expecting approximately one page. You should discuss what dataset you plan to use, what results you hope to obtain, and what baseline technique you will build on and/or compare to. Also include a section saying whether you have a partner and whether you are willing to work with or mentor one or more 605 students and, if so, how you anticipate them contributing to the project.
- Thus Feb 19. Randomized Algorithms 1
- Tues Feb 24. Randomized Algorithms 2
- Thus Feb 26. Matrix Factorization and SGD
March
- Sun Mar 1.
- HW3 due: Naive Bayes with Hadoop MapReduce
- Tues Mar 3. student presentations
- Thus Mar 5. student presentations
- Quiz: [3]
- Matt Gardner (mg1 at cs): Large-scale extensions of the path ranking algorithm [4]
- Jesse Dodge (jessed at andrew): large-scale lasso regularization [5]
- Ishan Misra (imisra at andrew): LSH for object detection [6]
- HW5: memory-efficient SGD PDF handout
- For 10/11-805 students: project proposal is due. This must contain a complete description of the data you will use.
- Sat Mar 7 (extended from Friday):
- HW4 due: Phrase-finding with Hadoop
- Tues Mar 10. no class - spring break.
- Thus Mar 12. no class - spring break.
- Tues Mar 17. Scalable PageRank PDF handout
- Thus Mar 19. Subsampling a graph with RWR
- HW5 due: memory-efficient SGD
- HW6: Subsampling and visualizing a graph.
- Tues Mar 24. Guest lecture: Dai Wei, CMU, Parameter servers. (This will be very relevant for one of the later HWs).
- Thus Mar 26. Guest lecture: D. Sculley, Google, TBA
- Tues Mar 31. Sparse sampling and parallelization for LDA
- HW6 due: Subsampling and visualizing a graph.
- HW7: TBA
April and May
- Thus Apr 2. Speeding up LDA-like models: All-reduce and online LDA
- Tues Apr 7. Guest lecture - Alex Beutel, SGD for Tensors
- Thus Apr 9. Guest lecture - Alex Smola, TBD
- Tues Apr 14. Overview of parallel ML approaches
- HW7 due
- HW8: TBA
- Thus Apr 16. no class - carnival.
- Tues Apr 21. Graph models for large-scale ML
- Thus Apr 23. Poster session for 10/11-805 projects
- Tues Apr 28. Exam review session.
- Thus Apr 30. In-class exam.
- Tues May 5.
- For 10/11-805 students: project reports are due
Topics covered in previous years but not in 2015
- Workflows in PIG
- First-order logics
- Scalable First-order logics
- Scalable Similarity Joins
- Subsampling continued and SSL on Graphs
- Tues Jan 27. Messages, records and workflows; Rocchio
- Thus Mar 26. Scalable spectral clustering techniques.