Difference between revisions of "Class meeting for 10-605 Rocchio and Hadoop Workflows"
From Cohen Courses
Revision as of 14:18, 29 January 2014
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Spring 2014.
Slides
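- Phrase-Finding (http://www.cs.cmu.edu/~wcohen/10-605/phrases.pptx)
- Performance Details - Sorting and Unix Pipes (http://www.cs.cmu.edu/~wcohen/10-605/details.pptx)
- Another Fast Algorithm Streaming Learning Algorithm (http://www.cs.cmu.edu/~wcohen/10-605/rocchio.pptx)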
Readings for the Class
- Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, has a fairly self-contained chapter on the vector space model, including Rocchio's method.
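For orientation before the reading, here is a minimal sketch of a Rocchio-style (nearest-centroid) text classifier over TF-IDF vectors. It is an illustration under stated assumptions, not the version from the slides or the textbook: the function names (`train_rocchio`, `classify`, `embed`), the smoothed IDF `log(1 + N/df)`, the log-scaled term frequency, and the use of cosine similarity to pick the closest centroid are all choices made for this sketch.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def embed(tokens, idf):
    """Map a token list to a unit-length TF-IDF vector.

    Terms never seen in training are dropped. The weighting scheme
    (1 + log tf) * idf is an assumption of this sketch.
    """
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: (1 + math.log(c)) * idf[t] for t, c in tf.items()})


def train_rocchio(labeled_docs):
    """labeled_docs: list of (label, tokens) pairs.

    Pass 1 counts document frequencies; pass 2 accumulates one
    centroid per class as the mean of its documents' vectors.
    """
    n = len(labeled_docs)
    df = Counter(t for _, tokens in labeled_docs for t in set(tokens))
    idf = {t: math.log(1 + n / d) for t, d in df.items()}  # assumed smoothing

    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for label, tokens in labeled_docs:
        counts[label] += 1
        for t, w in embed(tokens, idf).items():
            sums[label][t] += w

    centroids = {
        label: {t: w / counts[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def classify(tokens, idf, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    x = embed(tokens, idf)

    def cosine(centroid):
        norm = math.sqrt(sum(w * w for w in centroid.values()))
        if norm == 0:
            return 0.0
        # x is already unit length, so only the centroid norm is needed.
        return sum(w * centroid.get(t, 0.0) for t, w in x.items()) / norm

    return max(centroids, key=lambda label: cosine(centroids[label]))
```

Training needs only counts and running sums, so each pass reads the data sequentially. The variant with a prototype built as a weighted mean of positive examples minus a weighted mean of negative examples would change only how `centroids` is assembled.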
Also discussed
- Joachims, Thorsten, A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. Proceedings of International Conference on Machine Learning (ICML), 1997.
- Rocchio, J. J., Relevance Feedback in Information Retrieval. In G. Salton (ed.), The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice Hall, 1971.
- Schapire et al., Boosting and Rocchio applied to text filtering, SIGIR 1998.
- Littlestone, Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, MLJ 1988. Includes the mistake-bound theory.