Difference between revisions of "Class meeting for 10-605 Similarity Joins"

Latest revision as of 10:03, 16 October 2015

Definition of a similarity join/soft join.
Why inverted indices make TFIDF representations useful for similarity joins.
- e.g., whether high-IDF words have shorter or longer index entries (posting lists), and whether they have more or less impact on a similarity measure
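
One way to explore these questions is with a small sketch of a soft join. It is not taken from the course materials. It assumes whitespace tokenization, a smoothed IDF of log(1 + N/df) computed over both collections together, and cosine similarity on unit-length vectors. The names `tfidf_vectors`, `build_index`, and `similarity_join`, and the threshold of 0.5, are hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return unit-length TFIDF vectors (dict: term -> weight) and the IDF table."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed IDF, so a term found in every document keeps a small weight.
    idf = {term: math.log(1 + n / count) for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors, idf


def build_index(vectors):
    """Inverted index: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def similarity_join(left_docs, right_docs, threshold=0.5):
    """Return (left_id, right_id, cosine) for pairs scoring at least threshold."""
    vectors, _ = tfidf_vectors(left_docs + right_docs)
    left_vecs = vectors[:len(left_docs)]
    right_vecs = vectors[len(left_docs):]
    index = build_index(right_vecs)
    pairs = []
    for i, vec in enumerate(left_vecs):
        scores = defaultdict(float)
        # Only documents sharing at least one term are ever touched.
        for term, w_left in vec.items():
            for j, w_right in index.get(term, ()):
                scores[j] += w_left * w_right
        pairs.extend((i, j, s) for j, s in sorted(scores.items()) if s >= threshold)
    return pairs


if __name__ == "__main__":
    left = ["carnegie mellon university", "university of pittsburgh"]
    right = ["carnegie mellon univ", "univ of pittsburgh", "university of washington"]
    for i, j, score in similarity_join(left, right):
        print(left[i], "<->", right[j], round(score, 3))
```

In this sketch a rare term such as "washington" has a higher IDF and a shorter posting list than a common term such as "university". Walking its postings is cheap and each match contributes a large share of the cosine score, while common terms have long postings that add little.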

@@ Line 1: / Line 1: @@
-This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Spring_2015]].
+This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2015]].
 === Slides ===
@@ Line 9: / Line 9: @@
 * None required.
+=== Things to Remember ===
+* Definition of a similarity join/soft join.
+* Why inverted indices make TFIDF representations useful for similarity joins
+** e.g., whether high-IDF words have shorter or longer indices, and more or less impact in a similarity measure