Class meeting for 10-605 Similarity Joins
From Cohen Courses
Latest revision as of 10:03, 16 October 2015
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.
Slides
- Similarity Joins: PPT (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pdf)
Readings
- None required.
Things to Remember
- Definition of a similarity join/soft join.
- Why inverted indices make TFIDF representations useful for similarity joins
  - e.g., whether high-IDF words have shorter or longer inverted-index (posting) lists, and more or less impact on a similarity measure
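The two points above can be illustrated with a small Python sketch. Here a similarity (soft) join is taken to mean: given two collections of strings, return every pair (a, b) whose similarity is at least a threshold, instead of requiring exact key equality. The sketch rests on several assumptions, and the lecture's exact choices may differ. It uses whitespace tokenization and the weighting log(1 + tf) * log(N / df), which is one common TFIDF variant. It scores pairs by cosine similarity of unit-normalized vectors. The function names `tfidf_vectors`, `build_index`, and `similarity_join` are hypothetical. Scoring through an inverted index means only pairs that share at least one term are ever examined. Because IDF is log(N / df), a high-IDF term has few postings (a short list) but a large weight, while a common term has a long list and a small weight.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map each doc id to a unit-length TFIDF vector {term: weight}.

    Assumed weighting: log(1 + tf) * log(N / df), one common variant.
    Terms that occur in every doc get IDF 0 and are dropped.
    """
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(text.split()))
    vecs = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        vec = {t: math.log(1 + c) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vecs[doc_id] = {t: w / norm for t, w in vec.items() if w > 0} if norm > 0 else {}
    return vecs


def build_index(vecs):
    """Inverted index: term -> list of (doc id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in vecs.items():
        for t, w in vec.items():
            index[t].append((doc_id, w))
    return index


def similarity_join(left, right, threshold):
    """Soft join: all (a, b, score) with a in left, b in right, cosine >= threshold."""
    # IDF is computed over both collections together, so keys are tagged by side.
    docs = {("L", k): v for k, v in left.items()}
    docs.update({("R", k): v for k, v in right.items()})
    vecs = tfidf_vectors(docs)
    right_index = build_index({k: v for k, v in vecs.items() if k[0] == "R"})

    results = []
    for key, vec in vecs.items():
        if key[0] != "L":
            continue
        # Accumulate cosine scores only for right-side docs that share a term
        # with this left-side doc; pairs with no common term are never touched.
        scores = defaultdict(float)
        for t, w in vec.items():
            for b, wb in right_index.get(t, []):
                scores[b] += w * wb
        for b, s in scores.items():
            if s >= threshold:
                results.append((key[1], b[1], s))
    return sorted(results)


left = {"a1": "acme widget corp", "a2": "globex corporation"}
right = {"b1": "acme widget inc", "b2": "globex corp", "b3": "initech corp"}
for a, b, s in similarity_join(left, right, threshold=0.4):
    print(a, b, round(s, 2))
# a1 b1 0.58
# a2 b2 0.43
```

In this toy data the common token "corp" has the lowest IDF and the longest posting list in the right-side index. It contributes little to any score, while rarer tokens such as "acme" and "globex" drive the matches that pass the threshold.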