Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015
From Cohen Courses
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015.
Notes:
- The assignments posted are drafts based on the assignments from 2014, and will be modified over the course of the semester - some may be changed substantially.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
January
- Tues Jan 13. Overview of course, cost of various operations, asymptotic analysis.
- Thurs Jan 15. Review of probabilities, joint distributions and Naive Bayes
- HW1A: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Tues Jan 20. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
- HW1B: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- For 10/11-805 students: a one-paragraph summary of a recent research result you'd like to present is due. If you're planning/hoping to transfer from 605, but haven't yet transferred, then also submit this assignment. Email to wcohen+805 AT gmail.com with the subject "Presentation" and include, in addition to your summary:
- Your name and andrew id
- A link to the paper
- Your best guess as to what lectures should precede the presentation
- Due by 11:59:59pm EST Tuesday.
- Thurs Jan 22. Messages, records and workflows; Phrase finding.
- Tues Jan 27. Hadoop and Map-Reduce
- Thurs Jan 29. PIG and Other Workflow Systems for Hadoop
- HW1A and HW1B due.
- HW2: phrase finding with stream-and-sort. PDF Handout (DRAFT)
February
- Tues Feb 3. Rocchio and TFIDF
- Thurs Feb 5. Fast KNN and similarity joins
- Tues Feb 10. Parallel Perceptrons 1.
- HW2 due: phrase finding with stream-and-sort
- HW3, HW4: Naive Bayes with Streaming Hadoop, Naive Bayes with Hadoop & Phrase-finding with Hadoop. PDF Handouts: HW3 warmup, HW3, HW4.
- Thurs Feb 12. Parallel Perceptrons 2.
- Tues Feb 17. Scalable SGD and Hash Kernels
- For 10/11-805 students: initial draft of project proposal is due.
- Thurs Feb 19. Matrix Factorization and SGD
- Tues Feb 24. SGD for Matrix Factorization, and Randomized Algorithms 1 (Bloom Filters)
- HW3 due: Naive Bayes with Hadoop
- Thurs Feb 26. Randomized Algorithms
March
- Tues Mar 3. Student presentations
- Adams Wei Yu (weiyu at andrew): fast PPR on Map-Reduce
- Jesse Dodge (jessed at andrew): large-scale lasso regularization
- Wanli Ma (wanlim at andrew): coresets for k-segmentation of streams
- Thurs Mar 5. Student presentations
- Matt Gardner (mg1 at cs): Large-scale extensions of the path ranking algorithm
- Jakub Pachocki: factorization machines (and hash kernels?)
- Ishan Misra (imisra at andrew): LSH for object detection
- HW4 due: Phrase-finding with Hadoop
- HW5: memory-efficient SGD PDF handout
- For 10/11-805 students: project proposal is due. This must contain a complete description of the data you will use.
- Tues Mar 10. no class - spring break.
- Thurs Mar 12. no class - spring break.
- Tues Mar 17. Scalable PageRank
- HW5 due: memory-efficient SGD
- HW6: Subsampling and visualizing a graph. PDF handout
- Thurs Mar 19. Subsampling a graph with RWR
- Tues Mar 24. Subsampling continued and SSL on Graphs. AAAI Spring Symposium week
- Thurs Mar 26. Scalable spectral clustering techniques. AAAI Spring Symposium week
- Tues Mar 31. Sparse sampling and parallelization for LDA
- HW6 due: Subsampling and visualizing a graph.
- HW7: TBA
April and May
- Thurs Apr 2. Speeding up LDA-like models: All-reduce and online LDA
- Tues Apr 7. Student presentations
- Thurs Apr 9. Student presentations
- Tues Apr 14. Scalable Similarity Joins
- HW7 due
- HW8: TBA
- Thurs Apr 16. no class - carnival.
- Tues Apr 21. Graph models for large-scale ML
- Thurs Apr 23. Poster session for 10/11-805 projects
- Tues Apr 28. Exam review session.
- Thurs Apr 30. In-class exam.
- Tues May 5.
- For 10/11-805 students: project reports are due
Topics covered in previous years but not in 2015
- Tues Jan 27. Messages, records and workflows; Rocchio