Difference between revisions of "Class meeting for 10-605 Workflows For Hadoop"

From Cohen Courses
Line 1: Line 1:
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2016]].
+
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2017]].
  
 
=== Slides ===
 
=== Slides ===
  
* First lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-1.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-1.pdf in PDF].
+
* First lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/workflows-1.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/workflows-1.pdf in PDF].
 +
 
 +
To be updated:
 
* Second lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pdf in PDF].
 
* Second lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pdf in PDF].
 
* Third lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pdf in PDF].
 
* Third lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pdf in PDF].
Line 9: Line 11:
 
=== Quiz ===
 
=== Quiz ===
  
 +
* [https://qna.cs.cmu.edu/#/pages/view/170 quiz for first lecture]
 
* [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgICwmqv_Cww Quiz] for first lecture.
 
* [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgICwmqv_Cww Quiz] for first lecture.
 
* [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQt_CyCQw Quiz] for second lecture.
 
* [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQt_CyCQw Quiz] for second lecture.

Revision as of 10:47, 12 September 2017

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2017.

Slides

To be updated:

Quiz

Readings

Also discussed

Things to Remember

  • The TFIDF representation for documents.
  • The Rocchio algorithm.
  • Why Rocchio is easy to parallelize.
  • Definition of a similarity join/soft join.
  • Why inverted indices make TFIDF representations useful for similarity joins.
    • E.g., whether high-IDF words have shorter or longer index entries (posting lists), and whether they have more or less impact on a similarity measure.
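
The items above can be tied together in a small, single-machine sketch. It is illustrative only and is not taken from the lecture slides: the TFIDF variant (log-scaled term frequency, log(N/DF) for IDF, unit-length normalization), the Rocchio weight beta=0.25, the threshold 0.2, the toy data, and every function name are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Turn tokenized docs into unit-length TFIDF vectors (dict: word -> weight).
    Assumed variant: log(1 + TF) * log(N / DF)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency per word
    vecs = []
    for d in docs:
        v = {w: math.log(1 + c) * math.log(n / df[w]) for w, c in Counter(d).items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vecs.append({w: x / norm for w, x in v.items()} if norm > 0 else {})
    return vecs

def dot(u, v):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())

def rocchio_prototype(vecs, labels, target, beta=0.25):
    """Mean of the target-class vectors minus beta times the mean of the rest."""
    pos = [v for v, y in zip(vecs, labels) if y == target]
    neg = [v for v, y in zip(vecs, labels) if y != target]
    proto = defaultdict(float)
    for v in pos:
        for w, x in v.items():
            proto[w] += x / len(pos)
    for v in neg:
        for w, x in v.items():
            proto[w] -= beta * x / len(neg)
    return dict(proto)

def build_inverted_index(vecs):
    """Map each word to its posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, v in enumerate(vecs):
        for w, x in v.items():
            index[w].append((doc_id, x))
    return index

def similarity_join(vecs_a, vecs_b, threshold):
    """Return (i, j, score) for pairs whose dot product is at least threshold.
    Only pairs that share a word are ever scored, via the index on vecs_b."""
    index = build_inverted_index(vecs_b)
    pairs = []
    for i, u in enumerate(vecs_a):
        scores = defaultdict(float)
        for w, x in u.items():
            for j, y in index.get(w, []):
                scores[j] += x * y
        pairs.extend((i, j, s) for j, s in sorted(scores.items()) if s >= threshold)
    return pairs

# Toy data (hypothetical): two small lists of tokenized names to soft-join.
names_a = [["apple", "inc"], ["william", "cohen"]]
names_b = [["apple", "computer", "inc"], ["w", "cohen"], ["apple", "pie"]]
vecs = tfidf_vectors(names_a + names_b)  # IDF computed over both lists
vecs_a, vecs_b = vecs[:2], vecs[2:]
matches = similarity_join(vecs_a, vecs_b, threshold=0.2)
index_b = build_inverted_index(vecs_b)

# Rocchio on the same vectors, with made-up labels.
labels = ["company", "person", "company", "person", "company"]
protos = {c: rocchio_prototype(vecs, labels, c) for c in set(labels)}
predicted = max(protos, key=lambda c: dot(vecs[0], protos[c]))
```

In this toy run, the word "computer" occurs in one name of the second list and "apple" in two, so index_b["computer"] is the shorter posting list while "computer" also carries the larger TFIDF weight. A Rocchio prototype is a weighted sum of per-document vectors, so partial sums computed on separate shards of the data can be added together in any order, which is one way to see why it parallelizes easily.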