Difference between revisions of "Class meeting for 10-605 Similarity Joins"

From Cohen Courses
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Spring_2014]].
+
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2015]].
  
 
=== Slides ===
 
=== Slides ===
  
  
* [http://www.cs.cmu.edu/~wcohen/10-605/simjoins.pptx Similarity Joins]
+
* [http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pptx Similarity Joins - PPT], [http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pdf PDF]
  
 
=== Readings ===
 
=== Readings ===
  
 
* None required.
 
* None required.
* Background on WHIRL: [http://www.cs.cmu.edu/~wcohen/postscript/aij-whirl-overview.ps]
+
 
 +
 
 +
=== Things to Remember ===
 +
 
 +
* Definition of a similarity join/soft join.
 +
* Why inverted indices make TFIDF representations useful for similarity joins
 +
** e.g., whether high-IDF words have shorter or longer indices, and more or less impact in a similarity measure

Latest revision as of 10:03, 16 October 2015

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

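Slides

  • Similarity Joins - PPT (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pdf)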

Readings

  • None required.


Things to Remember

  • Definition of a similarity join/soft join.
  • Why inverted indices make TFIDF representations useful for similarity joins
    • e.g., whether high-IDF words have shorter or longer indices, and more or less impact in a similarity measure
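
The points above can be illustrated with a minimal sketch of a TFIDF similarity join driven by an inverted index. It is not taken from the slides: the function names, the smoothed IDF formula log(1 + N/df), the log-scaled term frequency, the 0.5 threshold, and the toy strings are all assumptions chosen for illustration. The sketch pairs up records from two lists whose TFIDF cosine similarity passes the threshold, and it only scores pairs that share at least one term by walking postings lists. A rare (high-IDF) term occurs in few records, so its postings list is short and cheap to scan, while its weight in the cosine score is large.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer works)."""
    return text.lower().split()


def idf_table(texts):
    """Smoothed IDF, log(1 + N/df), so no term ever gets weight zero."""
    n_docs = len(texts)
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))
    return {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}


def tfidf_unit_vector(text, idf):
    """Sparse TFIDF vector scaled to unit length, stored as {term: weight}."""
    counts = Counter(tokenize(text))
    vec = {t: (1 + math.log(c)) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_inverted_index(vectors):
    """Map each term to its postings list of (record id, weight) pairs."""
    index = defaultdict(list)
    for rec_id, vec in enumerate(vectors):
        for term, weight in vec.items():
            index[term].append((rec_id, weight))
    return index


def similarity_join(left, right, threshold=0.5):
    """Return (left text, right text, cosine) for pairs scoring >= threshold."""
    idf = idf_table(left + right)
    left_vecs = [tfidf_unit_vector(t, idf) for t in left]
    right_vecs = [tfidf_unit_vector(t, idf) for t in right]
    index = build_inverted_index(right_vecs)

    pairs = []
    for i, vec in enumerate(left_vecs):
        scores = defaultdict(float)
        # Only records sharing a term with left[i] are ever touched.
        for term, weight in vec.items():
            for j, other_weight in index.get(term, ()):
                scores[j] += weight * other_weight
        pairs.extend(
            (left[i], right[j], s) for j, s in scores.items() if s >= threshold
        )
    return sorted(pairs, key=lambda p: -p[2])


# Toy data (hypothetical): two lists naming overlapping institutions.
left = ["Carnegie Mellon University", "University of Pittsburgh"]
right = ["Carnegie Mellon Univ", "Univ of Pittsburgh", "University of Washington"]

for a, b, score in similarity_join(left, right):
    print(f"{a!r} ~ {b!r}  cosine={score:.3f}")
```

In this toy collection "university" and "of" each appear in three of the five strings, so they get longer postings lists and lower weights than "carnegie" or "pittsburgh". Sharing only those common words is therefore not enough for "University of Washington" to reach the threshold against either left-hand record.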