Class meeting for 10-605 Workflows For Hadoop 2
From Cohen Courses
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- TBD
Readings
- None required.
Things to Remember
- Definition of a similarity join/soft join.
- Why inverted indices make TFIDF representations useful for similarity joins
  - e.g., whether high-IDF words have shorter or longer postings lists in the index, and more or less impact on a similarity measure
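The sketch below illustrates these points on a single machine. It is not a Hadoop workflow. It assumes IDF = log(N/df), cosine similarity over unit-length TFIDF vectors, and a similarity threshold; the lecture's exact weighting may differ. The function names (`document_frequencies`, `tfidf_vectors`, `build_inverted_index`, `similarity_join`), the `threshold` value, and the toy company-name data are all hypothetical.

A soft join returns every pair (a, b) from two relations whose similarity is at least the threshold. The inverted index means only pairs that share a nonzero-weight term are ever scored. A term's postings list has one entry per document containing it, so its length equals the term's document frequency. High-IDF (rare) words therefore have short lists and contribute large weights to the score.

```python
import math
from collections import Counter, defaultdict


def document_frequencies(*collections):
    """Count, over all collections, how many documents contain each term."""
    df, n_docs = Counter(), 0
    for docs in collections:
        for text in docs.values():
            df.update(set(text.lower().split()))
            n_docs += 1
    return df, n_docs


def tfidf_vectors(docs, df, n_docs):
    """Return doc_id -> unit-length TFIDF vector (dict: term -> weight)."""
    vectors = {}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        # Assumed IDF = log(N / df); a term in every document gets weight 0 and is dropped.
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        vec = {t: w for t, w in vec.items() if w > 0}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[doc_id] = {t: w / norm for t, w in vec.items()}
    return vectors


def build_inverted_index(vectors):
    """Map term -> postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in vectors.items():
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index


def similarity_join(vecs_a, vecs_b, threshold):
    """Return sorted (a_id, b_id, cosine) triples with cosine >= threshold.

    Candidates come from walking postings lists, so a pair sharing no
    nonzero-weight term is never scored at all.
    """
    index_b = build_inverted_index(vecs_b)
    results = []
    for a_id, vec_a in vecs_a.items():
        scores = defaultdict(float)
        for term, w_a in vec_a.items():
            for b_id, w_b in index_b.get(term, []):
                scores[b_id] += w_a * w_b  # accumulate the dot product term by term
        for b_id, s in scores.items():
            if s >= threshold:
                results.append((a_id, b_id, s))
    return sorted(results)


# Hypothetical toy relations to be soft-joined on company name.
A = {"a1": "acme widget corporation", "a2": "globex systems inc"}
B = {"b1": "acme widget corp", "b2": "globex systems incorporated",
     "b3": "initech systems inc"}

df, n = document_frequencies(A, B)
va, vb = tfidf_vectors(A, df, n), tfidf_vectors(B, df, n)

results = similarity_join(va, vb, threshold=0.3)
for a_id, b_id, s in results:
    print(a_id, b_id, round(s, 3))

# Postings-list length equals df, so shorter lists go with higher IDF.
index_all = build_inverted_index({**va, **vb})
for term in sorted(index_all, key=lambda t: len(index_all[t])):
    print(term, len(index_all[term]), round(math.log(n / df[term]), 3))
```

The second loop prints each term with its postings-list length and IDF. Rare terms such as "initech" have one-entry lists and the highest IDF. The common term "systems" has the longest list and the lowest IDF, so it adds little to any pair's score.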