Class meeting for 10-605 Workflows For Hadoop
From Cohen Courses
Revision as of 15:54, 11 August 2016 by Wcohen (Wcohen moved page Class meeting for 10-605 Workflows For Hadoop 1 to Class meeting for 10-605 Workflows For Hadoop)
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- TBD
Readings
- Pig: none required. A nice online resource for Pig is the online version of the O'Reilly book Programming Pig.
Readings for the Class
- Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, has a fairly self-contained chapter on the vector space model, including Rocchio's method.
Also discussed
- Joachims, Thorsten, A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. Proceedings of International Conference on Machine Learning (ICML), 1997.
- Rocchio, J. J., Relevance Feedback in Information Retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice Hall, 1971.
- Schapire et al., Boosting and Rocchio Applied to Text Filtering, SIGIR 1998.
Things to Remember
- The TFIDF representation for documents.
- The Rocchio algorithm.
- Why Rocchio is easy to parallelize.
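The three points above can be tied together in a small sketch. It is an illustration under stated assumptions, not the formulation used in the lecture: the term weighting log(1 + tf) * log(N / df) with unit-length normalization is one common TFIDF variant, the prototype rule alpha * mean(positives) - beta * mean(negatives) is one common form of Rocchio, and every function name and the toy corpus are hypothetical. The point to notice is that training only needs per-class vector sums and counts, and sums can be computed independently on each shard of the data and then added together, which is what makes the method easy to parallelize.

```python
import math
from collections import Counter, defaultdict

def tfidf_vector(tokens, idf):
    """Unit-length sparse TFIDF vector; weighting scheme is an assumed variant."""
    tf = Counter(tokens)
    vec = {w: math.log(1 + c) * idf[w] for w, c in tf.items() if w in idf}
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm > 0 else {}

def compute_idf(docs):
    """docs is a list of token lists; returns word -> log(N / df)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {w: math.log(len(docs) / df[w]) for w in df}

def partial_sums(vectors, labels):
    """Work done on one shard: per-class vector sums and document counts."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, y in zip(vectors, labels):
        counts[y] += 1
        for w, x in vec.items():
            sums[y][w] += x
    return sums, counts

def merge(a, b):
    """Combine two shards' results; addition is associative and commutative."""
    sums_a, counts_a = a
    sums_b, counts_b = b
    for y, vec in sums_b.items():
        for w, x in vec.items():
            sums_a[y][w] += x
    counts_a.update(counts_b)  # Counter.update adds counts
    return sums_a, counts_a

def rocchio_prototypes(sums, counts, alpha=1.0, beta=0.25):
    """Prototype per class: alpha * mean(positives) - beta * mean(negatives)."""
    total = defaultdict(float)
    for vec in sums.values():
        for w, x in vec.items():
            total[w] += x
    n = sum(counts.values())
    protos = {}
    for y, vec in sums.items():
        n_neg = n - counts[y]
        protos[y] = {
            w: alpha * vec.get(w, 0.0) / counts[y]
               - (beta * (total[w] - vec.get(w, 0.0)) / n_neg if n_neg else 0.0)
            for w in total
        }
    return protos

def classify(vec, protos):
    """Pick the class whose prototype has the largest inner product with vec."""
    def score(y):
        return sum(x * protos[y].get(w, 0.0) for w, x in vec.items())
    return max(protos, key=score)

# Toy corpus (hypothetical data, for illustration only).
docs = [["pig", "hadoop", "join"], ["soccer", "goal", "match"],
        ["hadoop", "mapreduce", "cluster"], ["match", "goal", "referee"]]
labels = ["tech", "sport", "tech", "sport"]

idf = compute_idf(docs)
vectors = [tfidf_vector(d, idf) for d in docs]

# Two "mappers" each see half of the data; a "reducer" merges their sums.
shard_a = partial_sums(vectors[:2], labels[:2])
shard_b = partial_sums(vectors[2:], labels[2:])
sums, counts = merge(shard_a, shard_b)

protos = rocchio_prototypes(sums, counts)
prediction = classify(tfidf_vector(["hadoop", "cluster", "pig"], idf), protos)
print(prediction)
```

In a Hadoop or Pig workflow, `partial_sums` would correspond roughly to the map (or combiner) side and `merge` to the reduce side; the document frequencies needed for the IDF weights can be gathered the same way, as a word-count-style pass over the corpus.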