Difference between revisions of "Class meeting for 10-605 Workflows For Hadoop"

Revision as of 11:17, 20 September 2016

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.

Slides

Quiz

  • Quiz for first lecture: https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgICwmqv_Cww
  • Quiz for second lecture: https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQt_CyCQw
  • Quiz for third lecture: https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgICw9bvTCQw

Readings

Also discussed

Things to Remember

  • The TFIDF representation for documents.
  • The Rocchio algorithm.
  • Why Rocchio is easy to parallelize.
  • Definition of a similarity join/soft join.
  • Why inverted indices make TFIDF representations useful for similarity joins.
    • e.g., whether high-IDF words have shorter or longer index entries (posting lists), and whether they have more or less impact on a similarity measure.
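
The items above can be tied together in a small sketch. It is only an illustration under stated assumptions, not the formulation used in the lecture: the weighting log(1 + tf) * log(N / df) is one common TFIDF variant, the Rocchio prototype shown is the simplest centroid-only form (no negative-example term), and all function names are hypothetical. Because each class prototype is just a sum of document vectors, partial sums computed on separate shards can be added together, which is what makes the method easy to parallelize. In the join, a high-IDF term appears in few documents, so its posting list is short while its weight in the dot product is large.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """docs: dict of name -> list of tokens.
    Returns dict of name -> {term: weight}, each scaled to unit length.
    Assumed weighting (one common variant): log(1 + tf) * log(N / df)."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vec = {t: math.log(1 + c) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Terms that occur in every document get weight 0 and are dropped.
        vectors[name] = (
            {t: w / norm for t, w in vec.items() if w > 0} if norm > 0 else {}
        )
    return vectors


def rocchio_centroids(vectors, labels):
    """Centroid-only Rocchio: average the TFIDF vectors of each class.
    The per-class sums are associative, so they can be computed per shard
    and merged afterwards."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for name, vec in vectors.items():
        y = labels[name]
        counts[y] += 1
        for t, w in vec.items():
            sums[y][t] += w
    return {y: {t: w / counts[y] for t, w in s.items()} for y, s in sums.items()}


def similarity_join(left, right, threshold):
    """Soft join: return (left_name, right_name, score) for every pair whose
    dot product of unit-length TFIDF vectors is at least `threshold`.
    Only pairs that share a term are ever scored, via an inverted index."""
    index = defaultdict(list)  # term -> posting list of (right_name, weight)
    for name, vec in right.items():
        for t, w in vec.items():
            index[t].append((name, w))
    pairs = []
    for lname, lvec in left.items():
        scores = defaultdict(float)
        for t, w in lvec.items():
            for rname, rw in index.get(t, ()):
                scores[rname] += w * rw
        pairs.extend((lname, r, s) for r, s in scores.items() if s >= threshold)
    return pairs


# Hypothetical toy data: two small lists of company names to be joined.
docs = {
    "a1": "apple inc cupertino".split(),
    "a2": "microsoft corp redmond".split(),
    "b1": "apple computer inc".split(),
    "b2": "microsoft corporation".split(),
}
vectors = tfidf_vectors(docs)  # IDF is computed over both lists together
left = {k: v for k, v in vectors.items() if k.startswith("a")}
right = {k: v for k, v in vectors.items() if k.startswith("b")}
matches = similarity_join(left, right, threshold=0.3)
centroids = rocchio_centroids(vectors, {"a1": "A", "a2": "A", "b1": "B", "b2": "B"})
```

With this toy data, only the pair ("a1", "b1") shares enough weighted terms to clear the threshold; "a2" and "b2" share only the lower-weight term "microsoft" and fall below it.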