Difference between revisions of "Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013"

From Cohen Courses
* Mon Jan 14. [[Class meeting for 10-605 2013 01 14|Overview of course, cost of various operations, asymptotic analysis.]]
* Wed Jan 16. [[Class meeting for 10-605 2013 01 16|Review of probabilities, joint distributions, and Naive Bayes]]
* Mon Jan 21. ''No class - Martin Luther King Day''
* Wed Jan 23. [[Class meeting for 10-605 2013 01 23|Streaming algorithms and Naive Bayes; The stream-and-sort design pattern; Naive Bayes for large feature sets.]]
** ''New Assignment: streaming Naive Bayes 1 (with feature counts in memory)''. [http://www.cs.cmu.edu/~wcohen/10-605/assignments/hashtable-nb.pdf PDF Handout]

* Mon Mar 4. [[Class meeting for 10-605 2013 03 04|Learning on graphs 2]].
* Wed Mar 6. ''Guest lecture: John Wong (Google): Machine Learning with Large Datasets in Google Shopping''
** '''Hadoop assignment (phrase-finding) due'''
** ''New Assignment: memory-efficient SGD'' [http://www.cs.cmu.edu/~wcohen/10-605/assignments/sgd.pdf PDF writeup]

* Mon Apr 15. [[Class meeting for 10-605 2013 04 15|Scaling up decision tree learning]]
** '''Project progress report due'''
* Wed Apr 17. [[Class meeting for 10-605 2013 04 17|Gradient boosting with trees, and SGD for matrix factorization]]
** '''Assignment due: K-Means on MapReduce.'''
** ''New Assignment: Multi-class image classification or scalable classification using a linear classifier.'' Either option counts as one assignment toward your six.
*** [http://www.cs.cmu.edu/~wcohen/10-605/assignments/image.pdf PDF writeup of image-classification assignment]
*** [http://www.cs.cmu.edu/~wcohen/10-605/assignments/big-classifier.pdf PDF writeup of scalable classification]
* Mon Apr 22. ''Guest lecture, Evangelos Papalexakis, on Scalable Tensor Methods.''

Latest revision as of 17:20, 10 January 2014

This is the syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013.

January

February

March

April and May

Project reports: Please upload your slides to Blackboard before class, by 1:00pm.

  • Wed Apr 24. Project reports.
    • Team1: Namit Shetty, Namit Katariya
    • Team2: Jieru Shi, Luzheng Sheng
    • Team3: Edward Zhang, Weihua Cao, Yue Ma
    • Team4: Yibin Lin, Yu Gong
    • Team5: Sukhada Palkar
    • Team6: Han Yang, Qiangjian Xi
    • Team7: Russell Cullen, Jonathan Hsu
  • Mon Apr 29. Project reports.
    • Team8: Andrea Klein, Dipan Pal
    • Team9: Zeyuan Li, Pengqi Liu, Fei Xie
    • Team10: Yiwen Chen, Zhiqi Li, Yuliang Yin
    • Team11: Ye Zhang, Hao Chen, Qi Wang
    • Team12: Chunlei Liu, Zhen Tang
    • Team13: Zaid Sheikh, Shourabh Rawat, Sushant Kumar
    • Team14: Huanchen Zhang, Mengwei Ding
  • Wed May 1. Project reports.
    • Team15: Shu-Hao Yu, Guanyu Wang, Mayank Mohta
    • Team16: Li Lu, Chun Chen, Yuchen Tian
    • Team17: Shannon Quinn
    • Team18: Avesh Singh, Adam Mihalcin
    • Team19: Yubin Kim, Juan Manuel Caicedo Carvajal
    • Team20: Yue Yu, Jie Dai, Mayank Ketkari
    • Team21: Varuni Gang, Alkeshkumar Patel
    • Assignment due: Multi-class image classification or scalable classification.

May

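  • 9am, Tuesday, May 7. Project writeups due. Submit your paper to Blackboard as a PDF in the ICML 2013 format (style files: http://icml.cc/2013/wp-content/uploads/2012/12/icml2013stylefiles.tar.gz), at least 5 and at most 8 double-column pages, except that, of course, you should not submit it anonymously.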
    • Note: this deadline has been extended from the previous deadline of Fri May 3, but I can't give any further extensions! Your project report should discuss:
      • The problem you're trying to solve, and why it's important and/or interesting.
      • Related work, especially any related work that you're building on.
      • The data that you're working with.
      • The methods that you're using, in some detail. Even if these are off-the-shelf methods, I want to know that you understand them.
      • The experiments you did, the metrics you used to evaluate them, and the results.
      • What was learned from the experiments (the conclusions).
    • You should think of this as an exercise in writing a conference-style paper, so try to write in that style. (Of course, your work doesn't need to advance the state of the art in machine learning or be highly novel, but it should be well described.)