Difference between revisions of "Class meeting for 10-605 Rocchio and Hadoop Workflows"

From Cohen Courses
Line 2:
  
 
=== Slides ===
+
+ Workflows for Hadoop:
+
+ * [http://www.cs.cmu.edu/~wcohen/10-605/beyond-hadoop.pptx Workflows for Hadoop]
+ * The phrases example:
+ ** [http://www.cs.cmu.edu/~wcohen/10-605/pig-example/phrases.pig PIG source code]
+ ** [http://www.cs.cmu.edu/~wcohen/10-605/pig-example/SmoothedPKL.java Java source code]
+ * Some other examples:
+ ** [http://www.cs.cmu.edu/~wcohen/10-605/pig-example/phirl-naive.pig Naive Similarity Join]
+ ** [http://www.cs.cmu.edu/~wcohen/10-605/pig-example/phirl.pig Optimized Similarity Join]
+
+ Rocchio:
  
 
* [http://www.cs.cmu.edu/~wcohen/10-605/rocchio.pptx Rocchio - Another Fast Streaming Learning Algorithm - PPT], [http://www.cs.cmu.edu/~wcohen/10-605/rocchio.pdf PDF]
+ Also:
+ * [http://www.cs.cmu.edu/~wcohen/10-605/pig-example/tips-for-debugging-pig.txt My comments on debugging PIG.]
+
+ === Readings ===
+
+ * Pig: none required. A nice resource for PIG is the online version of the O'Reilly book [http://chimera.labs.oreilly.com/books/1234000001811/index.html Programming Pig].
  
 
=== Readings for the Class ===

Revision as of 14:17, 17 September 2015

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2015.

Slides

Workflows for Hadoop:

Rocchio:

Also:

Readings

  • Pig: none required. A nice resource for PIG is the online version of the O'Reilly book Programming Pig.

Readings for the Class

Also discussed