Class meeting for 10-605 Rocchio and Hadoop Workflows

From Cohen Courses
 
Latest revision as of 16:16, 14 October 2015

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

Slides

Workflows for Hadoop:

Rocchio:

Also:

Readings

  • Pig: none required. A good online resource for Pig is the online version of the O'Reilly book Programming Pig.

Readings for the Class

Also discussed

  • Schapire et al., Boosting and Rocchio applied to text filtering (http://dl.acm.org/citation.cfm?id=290996), SIGIR 98.
  • Littlestone, Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm (http://www.springerlink.com/index/X1022977778L1777.pdf), MLJ 1988. Includes the mistake-bound theory.

Things to Remember

  • The TFIDF representation for documents.
  • The Rocchio algorithm.
  • Why Rocchio is easy to parallelize.
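The three items above can be illustrated together with a minimal Python sketch. It assumes one common TFIDF variant, log(tf + 1) * log(N / df) scaled to unit length, and the Rocchio prototype v(y) = alpha * (mean of in-class vectors) - beta * (mean of out-of-class vectors), with classification by highest cosine similarity. The function names, the alpha/beta defaults, and the toy data are hypothetical and are not taken from the course slides. The "shards" stand in for Hadoop input splits: each shard is processed independently (map), and the partial counts and vector sums are combined by plain addition (reduce). Training needs only sums, and sums are associative, which is why Rocchio parallelizes so easily.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Doc = Tuple[List[str], str]   # (tokens, label)
Vec = Dict[str, float]        # sparse vector: word -> weight


def df_counts(shard: List[Doc]) -> Tuple[Counter, int]:
    """Map, pass 1: document frequencies and document count for one shard."""
    df: Counter = Counter()
    for tokens, _ in shard:
        df.update(set(tokens))
    return df, len(shard)


def tfidf(tokens: List[str], df: Counter, n_docs: int) -> Vec:
    """log(tf + 1) * log(N / df), scaled to unit L2 length; unseen words are dropped."""
    tf = Counter(tokens)
    v = {w: math.log(c + 1) * math.log(n_docs / df[w]) for w, c in tf.items() if df[w] > 0}
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {w: x / norm for w, x in v.items()} if norm > 0 else {}


def class_sums(shard: List[Doc], df: Counter, n_docs: int):
    """Map, pass 2: per-class sums of TFIDF vectors and per-class document counts."""
    sums: Dict[str, Counter] = defaultdict(Counter)
    counts: Counter = Counter()
    for tokens, label in shard:
        sums[label].update(tfidf(tokens, df, n_docs))
        counts[label] += 1
    return sums, counts


def merge(parts: Iterable) -> Tuple[Dict[str, Counter], Counter]:
    """Reduce: partial sums from every shard are simply added together."""
    total_sums: Dict[str, Counter] = defaultdict(Counter)
    total_counts: Counter = Counter()
    for sums, counts in parts:
        for label, vec in sums.items():
            total_sums[label].update(vec)
        total_counts.update(counts)
    return total_sums, total_counts


def rocchio_prototypes(sums, counts, alpha: float = 1.0, beta: float = 0.5) -> Dict[str, Vec]:
    """v(y) = alpha * mean in-class vector - beta * mean out-of-class vector."""
    everything: Counter = Counter()  # sum over all documents, all classes
    for vec in sums.values():
        everything.update(vec)
    n_total = sum(counts.values())
    protos = {}
    for label, vec in sums.items():
        n_in, n_out = counts[label], n_total - counts[label]
        # Out-of-class sum = total sum - in-class sum, so no extra data pass is needed.
        protos[label] = {
            w: alpha * vec[w] / n_in - (beta * (everything[w] - vec[w]) / n_out if n_out else 0.0)
            for w in everything
        }
    return protos


def classify(tokens: List[str], protos: Dict[str, Vec], df: Counter, n_docs: int) -> str:
    """Pick the class whose prototype has the highest cosine similarity to the document."""
    v = tfidf(tokens, df, n_docs)

    def cosine(p: Vec) -> float:
        norm = math.sqrt(sum(x * x for x in p.values()))
        return sum(x * p.get(w, 0.0) for w, x in v.items()) / norm if norm > 0 else 0.0

    return max(protos, key=lambda y: cosine(protos[y]))


# Toy data: two hypothetical "shards" standing in for Hadoop input splits.
shards: List[List[Doc]] = [
    [("cheap pills buy now".split(), "spam"), ("meeting agenda for monday".split(), "ham")],
    [("buy cheap watches now".split(), "spam"), ("monday project meeting notes".split(), "ham")],
]

df_parts = [df_counts(s) for s in shards]                        # map, pass 1
df = sum((part for part, _ in df_parts), Counter())              # reduce, pass 1
n_docs = sum(n for _, n in df_parts)
sums, counts = merge(class_sums(s, df, n_docs) for s in shards)  # map + reduce, pass 2
protos = rocchio_prototypes(sums, counts)
print(classify("buy cheap pills".split(), protos, df, n_docs))   # spam
```

In an actual Hadoop workflow the per-shard Counters would typically be emitted as (word, weight) key-value pairs and summed by reducers rather than merged in memory as here; the arithmetic is the same.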