Difference between revisions of "Class meeting for 10-605 Similarity Joins"
From Cohen Courses
Latest revision as of 10:03, 16 October 2015
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.
Slides
- Similarity Joins: PPT (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pdf)
Readings
- None required.
Things to Remember
- Definition of a similarity join/soft join.
- Why inverted indices make TFIDF representations useful for similarity joins.
  - e.g., whether high-IDF words have shorter or longer index entries (posting lists), and whether they have more or less impact on a similarity measure.
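
The points above can be illustrated with a minimal sketch of a similarity join over two small tables of names. It is not taken from the slides. The weighting log(1 + tf) * log(N / df), whitespace tokenization, the 0.4 threshold, and all function names (tfidf_vectors, build_index, similarity_join) are assumptions chosen for illustration. A rare, high-IDF term occurs in few documents, so its index entry is short and cheap to scan, while its large weight contributes most to the cosine score.

```python
import math
from collections import defaultdict


def tfidf_vectors(docs):
    """Map each doc id to a unit-length {term: weight} TF-IDF vector.

    Assumed weighting: log(1 + tf) * log(N / df); other variants exist.
    """
    n = len(docs)
    df = defaultdict(int)
    for text in docs.values():
        for term in set(text.lower().split()):
            df[term] += 1
    vectors = {}
    for doc_id, text in docs.items():
        tf = defaultdict(int)
        for term in text.lower().split():
            tf[term] += 1
        vec = {t: math.log(1 + c) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Terms that occur in every document get weight 0 and are dropped.
        vectors[doc_id] = (
            {t: w / norm for t, w in vec.items() if w > 0} if norm > 0 else {}
        )
    return vectors


def build_index(vectors):
    """Inverted index: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def similarity_join(left, right, threshold=0.4):
    """Return (left_id, right_id, cosine) for pairs scoring >= threshold."""
    # Document frequencies are computed over both tables together.
    combined = {("L", k): v for k, v in left.items()}
    combined.update({("R", k): v for k, v in right.items()})
    vectors = tfidf_vectors(combined)
    index = build_index({k[1]: v for k, v in vectors.items() if k[0] == "R"})
    results = []
    for (side, left_id), vec in vectors.items():
        if side != "L":
            continue
        scores = defaultdict(float)
        # Only right-hand rows sharing at least one term are ever touched.
        for term, weight in vec.items():
            for right_id, right_weight in index.get(term, ()):
                scores[right_id] += weight * right_weight
        results.extend((left_id, r, s) for r, s in scores.items() if s >= threshold)
    return sorted(results, key=lambda row: -row[2])


# Made-up example data.
left = {1: "carnegie mellon university", 2: "university of pittsburgh"}
right = {
    "a": "carnegie mellon univ",
    "b": "pittsburgh university",
    "c": "stanford university",
}
pairs = similarity_join(left, right)
print([(l, r) for l, r, _ in pairs])
```

With this data the common word "university" matches every right-hand row except "a", but its low IDF keeps those scores far below the threshold. The surviving pairs are driven by the rarer terms "carnegie", "mellon", and "pittsburgh".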