Difference between revisions of "Class meeting for 10-605 Similarity Joins"

Latest revision as of 10:03, 16 October 2015

Definition of a similarity join/soft join.
Why inverted indices make TFIDF representations useful for similarity joins
- e.g., whether high-IDF words have shorter or longer posting lists in the index, and more or less impact on a similarity measure
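
The points above can be illustrated with a minimal sketch, which is not the method from the course slides. It assumes whitespace-tokenized records, a log(1 + tf) * log(N / df) weighting with unit-length normalization, and cosine similarity as the measure. The names tfidf_vectors, build_index, and similarity_join, the toy records, and the 0.4 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Map each tokenized doc to a unit-length TFIDF vector (dict: term -> weight).

    Assumed weighting: log(1 + tf) * log(N / df), then L2 normalization.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        v = {t: math.log(1 + c) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vecs.append({t: w / norm for t, w in v.items() if w > 0} if norm > 0 else {})
    return vecs

def build_index(vecs):
    """Inverted index: term -> list of (doc id, weight) postings."""
    index = defaultdict(list)
    for i, v in enumerate(vecs):
        for t, w in v.items():
            index[t].append((i, w))
    return index

def similarity_join(left, right, threshold):
    """Return (pairs, index): pairs (i, j, score) with cosine score >= threshold."""
    # Document frequencies are computed over both relations together (an assumption).
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    index = build_index(rv)
    pairs = []
    for i, v in enumerate(lv):
        scores = defaultdict(float)
        # Only records sharing at least one term with record i are ever scored.
        for t, w in v.items():
            for j, w2 in index.get(t, ()):
                scores[j] += w * w2
        pairs.extend((i, j, s) for j, s in scores.items() if s >= threshold)
    return pairs, index

# Toy relations (hypothetical data).
left = [s.split() for s in ["carnegie mellon university",
                            "university of pittsburgh"]]
right = [s.split() for s in ["carnegie mellon univ",
                             "pittsburgh university",
                             "stanford university"]]

pairs, index = similarity_join(left, right, threshold=0.4)
for i, j, s in sorted(pairs):
    print(" ".join(left[i]), "<->", " ".join(right[j]), round(s, 3))
print({t: len(postings) for t, postings in index.items()})
```

In this toy data the common word "university" has the longest posting list but a low IDF, so it adds little to any score. The rarer words "carnegie" and "pittsburgh" have short posting lists and account for most of the similarity of the pairs that pass the threshold. This is why the index keeps the join cheap: the terms that matter most touch the fewest candidate records.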

@@ Line 1: / Line 1: @@
-This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Spring_2014]].
+This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2015]].
 === Slides ===
-* [http://www.cs.cmu.edu/~wcohen/10-605/simjoins.pptx Workflows in PIG]
+* [http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pptx Similarity Joins - PPT], [http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pdf PDF]
 === Readings ===
 * None required.
+=== Things to Remember ===
+* Definition of a similarity join/soft join.
+* Why inverted indices make TFIDF representations useful for similarity joins
+** e.g., whether high-IDF words have shorter or longer indices, and more or less impact in a similarity measure