Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015
This is the syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015.
Notes:
- The assignments posted are drafts based on the assignments from spring 2015, and will be modified over the course of the semester - some may be changed substantially.
- Lecture notes and/or slides will be (re)posted around the time of the lectures.
September
- Tues Sep 1. Overview of course, cost of various operations, asymptotic analysis.
- Thurs Sep 3. Review of probabilities, joint distributions and Naive Bayes
- Tues Sep 8. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
- Thurs Sep 10. Messages, records and workflows; Phrase finding.
- Tues Sep 15. Hadoop and Map-Reduce
- Thurs Sep 17. PIG and Other Workflow Systems for Hadoop
- Tues Sep 22. Rocchio and TFIDF
- Thurs Sep 24. Fast KNN and similarity joins
- Tues Sep 29. Parallel Perceptrons 1.
- Thurs Sep 30. Parallel Perceptrons 2.
need to revise
October
- Tues Oct 6. Scalable SGD and Hash Kernels
- Thurs Oct 8. Randomized Algorithms 1
- Tues Oct 13. Randomized Algorithms 2
- Thurs Oct 15. Matrix Factorization and SGD
- Tues Oct 20. TBA
- Thurs Oct 22. TBA
- Tues Oct 27. TBA
- Thurs Oct 29. TBA
November
- Tues Nov 3. Scalable PageRank (PDF handout)
- Thurs Nov 5. Subsampling a graph with RWR
- Tues Nov 10. TBA
- Thurs Nov 12. TBA
- Tues Nov 17. Sparse sampling and parallelization for LDA
April and May
- Wed April 1
- HW6 due: Subsampling and visualizing a graph.
- HW7: Matrix Factorization in Spark (HW7 PDF handout, evaluation script, validation script)
- Thurs Apr 2. Speeding up LDA-like models: All-reduce and other tricks
- Tues Apr 7. Guest lecture - Alex Beutel, SGD for Tensors
- Thurs Apr 9. Guest lecture - Alex Smola, Scalable parameter servers
- If you don't like the MediaTech one, a YouTube video of Alex's talk is also available.
- Mon Apr 13. Informal update due for students working on project teams.
- Each student working on a project should send to wcohen+805@gmail.com an update, between 1/2 page and 1 page long, saying what concrete tasks you've accomplished to date, how these tasks are part of the overall project (if you're not the only member), and what you plan to do between 4/13 and the presentation on 4/23.
- Additionally, each project lead (i.e., each 805 student who has any 10-605 student working with them) should add a list of who's working on their project, and one line indicating whether they're making good progress so far.
- Tues Apr 14. SSL on Graphs
- Thurs Apr 16. No class: Carnival
- HW7 due
- HW8: Matrix factorization on parameter server
- Tues Apr 21. Graph models for large-scale ML
- Thurs Apr 23. Presentations for 10/11-805 projects
- Tues Apr 28. Exam review session.
- Thurs Apr 30. In-class exam.
- Tues May 5.
- For 10/11-805 students: project reports are due