Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013
This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013.
January
- Mon Jan 14. Overview of course, cost of various operations, asymptotic analysis.
- Wed Jan 16. Review of probabilities, joint distributions, and Naive Bayes.
- Mon Jan 21. No class - Martin Luther King Day
- Wed Jan 23. Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.
- New Assignment: streaming Naive Bayes 1 (with feature counts in memory). PDF Handout
- Mon Jan 28. Messages and records 1; Phrase finding.
- Assignment due: streaming Naive Bayes 1 (with feature counts in memory).
- New Assignment: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort. PDF Handout
- Wed Jan 30. More on streaming algorithms: Rocchio, and theory of on-line learning
February
- Mon Feb 4. More on streaming algorithms: parallelized voted perceptrons.
- Assignment due: streaming Naive Bayes 2 (with feature counts on disk) with stream-and-sort
- New Assignment: phrase finding with stream-and-sort. PDF Handout
- Wed Feb 6. Map-reduce and Hadoop 1.
- Mon Feb 11. Map-reduce and Hadoop 2.
- Wed Feb 13. Hadoop helpers and Scalable SGD
- Assignment due: phrase finding with stream-and-sort
- New Assignments: Naive Bayes with Hadoop & Phrase-finding with Hadoop. PDF Handout
- Mon Feb 18. Scalable SGD and Hash Kernels
- Wed Feb 20. Guest lecture: Chris Dyer. Scalable feature selection with Map-Reduce
- Streaming run on Hadoop of Naive Bayes due - checkpoint
- Mon Feb 25. Background on randomized algorithms; Graph computations 1.
- Wed Feb 27. Guest Lecture: Aapo Kyrola - GraphLab and GraphChi
- Hadoop assignment (Naive Bayes) due
March
- Mon Mar 4. Learning on graphs 2.
- Wed Mar 6. Guest lecture: John Wong (Google): Machine Learning with Large Datasets in Google Shopping
- Hadoop assignment (phrase-finding) due
- New Assignment: memory-efficient SGD. PDF writeup
- New Assignment: initial project proposals. PDF writeup
- Mon Mar 11. No class - spring break.
- Wed Mar 13. No class - spring break.
- Mon Mar 18. Subsampling a graph with RWR
- Wed Mar 20. Semi-supervised learning via label propagation on graphs
- Assignment due: initial mini-project proposals.
- Assignment due: memory-efficient SGD
- New Assignment: Subsampling and visualizing a graph. PDF writeup
- Mon Mar 25. Label propagation 2: Unsupervised label propagation, label propagation as optimization, bipartite graphs
- Wed Mar 27. Understanding spectral clustering techniques.
- Assignment due: mini-project proposals (final version).
April and May
- Mon Apr 1. Speeding up LDA-like models: sparse sampling and parallelization
- Wed Apr 3. Speeding up LDA-like models: All-reduce and online LDA
- Assignment due: Subsampling and visualizing a graph.
- New Assignment: K-Means on MapReduce. PDF writeup
- Mon Apr 8. Fast KNN and similarity joins 1.
- Wed Apr 10. Fast KNN and similarity joins 2.
- Mon Apr 15. Scaling up decision tree learning
- Project progress report due
- Wed Apr 17. Gradient boosting with trees, and SGD for matrix factorization
- Assignment due: K-Means on MapReduce.
- New Assignment: Multi-class image classification or scalable classification using a linear classifier. Both of these count as one assignment toward your six.
- Mon Apr 22. Guest lecture, Evangelos Papalexakis, on Scalable Tensor Methods.
Project reports: Please upload your slides to Blackboard before the class, by *1:00pm*
- Wed Apr 24. Project reports.
- Team1: Namit Shetty, Namit Katariya
- Team2: Jieru Shi, Luzheng Sheng
- Team3: Edward Zhang, Weihua Cao, Yue Ma
- Team4: Yibin Lin, Yu Gong
- Team5: Sukhada Palkar
- Team6: Han Yang, Qiangjian Xi
- Team7: Russell Cullen, Jonathan Hsu
- Mon Apr 29. Project reports.
- Team8: Andrea Klein, Dipan Pal
- Team9: Zeyuan Li, Pengqi Liu, Fei Xie
- Team10: Yiwen Chen, Zhiqi Li, Yuliang Yin
- Team11: Ye Zhang, Hao Chen, Qi Wang
- Team12: Chunlei Liu, Zhen Tang
- Team13: Zaid Sheikh, Shourabh Rawat, Sushant Kumar
- Team14: Huanchen Zhang, Mengwei Ding
- Wed May 1. Project reports.
- Team15: Shu-Hao Yu, Guanyu Wang, Mayank Mohta
- Team16: Li Lu, Chun Chen, Yuchen Tian
- Team17: Shannon Quinn
- Team18: Avesh Singh, Adam Mihalcin
- Team19: Yubin Kim, Juan Manuel Caicedo Carvajal
- Team20: Yue Yu, Jie Dai, Mayank Ketkari
- Team21: Varuni Gang, Alkeshkumar Patel
- Assignment due: Multi-class image classification or scalable classification.
May
- 9am, Tuesday, May 7. Project writeups due. Submit a paper to Blackboard in PDF in the ICML 2013 format (minimum 5 pp, up to 8 pp double column), except, of course, do not submit it anonymously.
- Note: this is extended from the previous deadline of Fri May 3, but I can't give any further extensions! Your project report should discuss:
- The problem you're trying to solve, and why it's important and/or interesting.
- Related work, especially any related work that you're building on.
- The data that you're working with.
- The methods that you're using (in some detail - even if these are off-the-shelf methods, I want to know that you understand them)
- The experiments you did, the metrics you used to evaluate them, and the results.
- What was learned from the experiments (the conclusions).
- You should think of this as an exercise in writing a conference-style paper, so try to write in that style. (Of course, your work doesn't need to advance the state-of-the-art in machine learning, or be highly novel, but it should be well-described.)