Class meeting for 10-605 Similarity Joins

From Cohen Courses
Revision as of 10:03, 16 October 2015 by Wcohen (talk | contribs) (→‎Things to Remember)

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

Slides

Readings

  • None required.


Things to Remember

  • Definition of a similarity join/soft join.
  • Why inverted indices make TFIDF representations useful for similarity joins.
    • e.g., whether high-IDF words have shorter or longer postings lists in the index, and whether they have more or less impact on a similarity measure.
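
As a rough illustration of the points above, here is a minimal Python sketch of a similarity join over two small lists of strings, using TFIDF-weighted cosine similarity and an inverted index. It is not taken from the course slides. The function names (tfidf_vectors, build_index, similarity_join), the weighting log(1 + tf) * log(N / df), the choice to compute IDF over both lists together, the 0.5 threshold, and the example strings are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one unit-length TFIDF vector (dict: word -> weight) per document.

    Assumed weighting: log(1 + tf) * log(N / df), then L2-normalized.
    """
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for d in docs for w in set(d.split()))
    vecs = []
    for d in docs:
        tf = Counter(d.split())
        v = {w: math.log(1 + c) * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        # Drop zero-weight words (those appearing in every document).
        vecs.append({w: x / norm for w, x in v.items() if x > 0} if norm > 0 else {})
    return vecs


def build_index(vecs):
    """Inverted index: word -> postings list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, v in enumerate(vecs):
        for w, x in v.items():
            index[w].append((doc_id, x))
    return index


def similarity_join(left, right, threshold=0.5):
    """Return (pairs, index), where pairs holds (i, j, score) with cosine >= threshold."""
    # Assumption: IDF is computed over the union of both lists.
    vecs = tfidf_vectors(left + right)
    lvecs, rvecs = vecs[:len(left)], vecs[len(left):]
    index = build_index(rvecs)
    pairs = []
    for i, v in enumerate(lvecs):
        scores = defaultdict(float)
        # Only right-hand records sharing at least one word are ever scored.
        for w, x in v.items():
            for j, y in index.get(w, ()):
                scores[j] += x * y
        pairs.extend((i, j, s) for j, s in scores.items() if s >= threshold)
    return pairs, index


# Hypothetical example data.
left = ["carnegie mellon university", "university of pittsburgh"]
right = ["carnegie mellon univ", "pittsburgh university",
         "stanford university", "university of washington"]

pairs, index = similarity_join(left, right, threshold=0.5)
for i, j, score in sorted(pairs):
    print(f"{left[i]!r} ~ {right[j]!r}  score={score:.2f}")
print("postings for 'university':", len(index["university"]))
print("postings for 'stanford':", len(index["stanford"]))
```

In this sketch a common word such as "university" has low IDF, so it has a long postings list but contributes little to any score, while a rare, high-IDF word such as "stanford" has a short postings list and a large weight. This is one way to see why the inverted index keeps the join cheap: most of the score comes from words whose postings lists are short.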