Difference between revisions of "Class meeting for 10-605 Similarity Joins"

From Cohen Courses
(Created page with "This is one of the class meetings on the schedule for the course Machine Learning with Large Data...")
 
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2014|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Spring_2014]].
+
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2015|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2015]].
  
 
=== Slides ===
 
=== Slides ===
  
  
* [http://www.cs.cmu.edu/~wcohen/10-605/simjoins.pptx Workflows in PIG]
+
* [http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pptx Similarity Joins - PPT], [http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pdf PDF]
  
 
=== Readings ===
 
=== Readings ===
  
 
* None required.
 
* None required.
 +
 +
 +
=== Things to Remember ===
 +
 +
* Definition of a similarity join/soft join.
 +
* Why inverted indices make TFIDF representations useful for similarity joins
 +
** e.g., whether high-IDF words have shorter or longer indices, and more or less impact in a similarity measure

Latest revision as of 10:03, 16 October 2015

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

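Slides

  • Similarity Joins - PPT (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pptx), PDF (http://www.cs.cmu.edu/~wcohen/10-605/simjoins-and-tfidf.pdf)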

Readings

  • None required.


Things to Remember

  • Definition of a similarity join/soft join.
  • Why inverted indices make TFIDF representations useful for similarity joins.
    • e.g., whether high-IDF words have shorter or longer index entries (posting lists), and more or less impact on a similarity measure.
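
The points above can be made concrete with a small sketch of a TFIDF similarity join driven by an inverted index. This is an illustration only, not code from the slides: the function names (tfidf_vectors, build_inverted_index, similarity_join), the whitespace tokenizer, the threshold parameter, and the weighting log(1 + tf) * log(N / df) are all assumptions chosen for brevity, and other TFIDF variants would work similarly.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map each doc id to a unit-length TFIDF vector (dict: term -> weight).

    Assumed weighting: log(1 + tf) * log(N / df), one common variant.
    """
    n = len(docs)
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        vec = {t: math.log(1 + c) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Terms that occur in every document get weight 0 and are dropped.
        vectors[doc_id] = (
            {t: w / norm for t, w in vec.items() if w > 0} if norm > 0 else {}
        )
    return vectors


def build_inverted_index(vectors):
    """Map each term to its posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def similarity_join(left, right, threshold=0.5):
    """Return (left_id, right_id, cosine) for pairs scoring >= threshold.

    Only pairs that share at least one term are ever scored, because
    candidates are generated by walking posting lists.
    """
    # Document frequencies are computed over both relations together (assumption).
    tagged = {("L", k): v for k, v in left.items()}
    tagged.update({("R", k): v for k, v in right.items()})
    vecs = tfidf_vectors(tagged)
    right_vecs = {k: v for (side, k), v in vecs.items() if side == "R"}
    index = build_inverted_index(right_vecs)

    results = []
    for (side, left_id), vec in vecs.items():
        if side != "L":
            continue
        scores = defaultdict(float)
        for term, weight in vec.items():
            for right_id, right_weight in index.get(term, ()):
                scores[right_id] += weight * right_weight
        results.extend(
            (left_id, right_id, s) for right_id, s in scores.items() if s >= threshold
        )
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy relations, made up for this example.
    left = {1: "carnegie mellon university", 2: "university of pittsburgh"}
    right = {"a": "carnegie mellon univ", "b": "pittsburgh university",
             "c": "stanford university"}
    for left_id, right_id, score in similarity_join(left, right, threshold=0.4):
        print(left_id, right_id, round(score, 3))
```

In this toy data, a common word such as "university" has a low IDF and a long posting list, while a rare word such as "stanford" has a high IDF and a short one. The rare words therefore contribute most of the cosine score while costing the least to look up, which is the property the second bullet asks about.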