Difference between revisions of "Class meeting for 10-605 Workflows For Hadoop"
From Cohen Courses
Line 1: | Line 1: | ||
− | This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall | + | This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall_2017]]. |
=== Slides === | === Slides === | ||
− | * First lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/ | + | * First lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/workflows-1.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/workflows-1.pdf in PDF]. |
+ | |||
+ | To be updated: | ||
* Second lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pdf in PDF]. | * Second lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-2.pdf in PDF]. | ||
* Third lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pdf in PDF]. | * Third lecture: Slides [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pptx in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/workflow-3.pdf in PDF]. | ||
Line 9: | Line 11: | ||
=== Quiz === | === Quiz === | ||
+ | * [https://qna.cs.cmu.edu/#/pages/view/170 quiz for first lecture] | ||
* [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgICwmqv_Cww Quiz] for first lecture. | * [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgICwmqv_Cww Quiz] for first lecture. | ||
* [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQt_CyCQw Quiz] for second lecture. | * [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQt_CyCQw Quiz] for second lecture. |
Revision as of 10:47, 12 September 2017
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall_2017.
Slides
- First lecture: Slides in Powerpoint, in PDF.
To be updated:
- Second lecture: Slides in Powerpoint, in PDF.
- Third lecture: Slides in Powerpoint, in PDF.
Quiz
- quiz for first lecture
- Quiz for first lecture.
- Quiz for second lecture.
- Quiz for third lecture.
Readings
- Pig: none required. A nice online resource for Pig is the online version of the O'Reilly book Programming Pig.
- Optional: Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, has a fairly self-contained chapter on the vector space model, including Rocchio's method.
Also discussed
- Joachims, Thorsten, A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. Proceedings of International Conference on Machine Learning (ICML), 1997.
- Rocchio, J. J., Relevance Feedback in Information Retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice Hall Inc., 1971.
- Schapire et al., Boosting and Rocchio applied to text filtering, SIGIR 98.
Things to Remember
- The TFIDF representation for documents.
- The Rocchio algorithm.
- Why Rocchio is easy to parallelize.
- Definition of a similarity join/soft join.
- Why inverted indices make TFIDF representations useful for similarity joins.
- e.g., whether high-IDF words have shorter or longer indices, and more or less impact in a similarity measure.
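
The items in this list can be made concrete with a small Python sketch. It is not taken from the lecture slides: the function names, the toy documents, and the weighting log(1 + tf) * log(N / df) are all assumptions chosen for illustration, and the Rocchio step is simplified to a per-class average of document vectors with no negative-example term.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map each doc id to a unit-length TFIDF vector (dict: term -> weight).

    Assumed weighting: log(1 + tf) * log(N / df). Other variants exist.
    """
    n = len(docs)
    df = Counter(t for words in docs.values() for t in set(words))
    vectors = {}
    for doc_id, words in docs.items():
        tf = Counter(words)
        vec = {t: math.log(1 + c) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Terms that occur in every document get weight 0 and are dropped.
        vectors[doc_id] = (
            {t: w / norm for t, w in vec.items() if w > 0} if norm else {}
        )
    return vectors


def rocchio_prototypes(vectors, labels):
    """Simplified Rocchio: one prototype per class, the mean of its vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels[d] for d in vectors)
    for doc_id, vec in vectors.items():
        for t, w in vec.items():
            sums[labels[doc_id]][t] += w  # additive, so shards can be merged
    return {y: {t: w / counts[y] for t, w in s.items()} for y, s in sums.items()}


def build_index(vectors):
    """Inverted index: term -> posting list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in vectors.items():
        for t, w in vec.items():
            index[t].append((doc_id, w))
    return index


def similarity_join(left, right, threshold):
    """Return sorted (left_id, right_id, cosine) with cosine >= threshold."""
    index = build_index(right)
    pairs = []
    for lid, vec in left.items():
        scores = defaultdict(float)
        for t, w in vec.items():
            # Only documents sharing term t are touched at all.
            for rid, rw in index.get(t, ()):
                scores[rid] += w * rw
        pairs.extend((lid, rid, s) for rid, s in scores.items() if s >= threshold)
    return sorted(pairs)


# Hypothetical toy collection, invented for this example.
docs = {
    "a1": "machine learning with large datasets".split(),
    "a2": "pig latin scripts for hadoop".split(),
    "b1": "large datasets and machine learning".split(),
    "b2": "hadoop workflows in pig".split(),
}
labels = {"a1": "ml", "b1": "ml", "a2": "pig", "b2": "pig"}

vectors = tfidf_vectors(docs)
prototypes = rocchio_prototypes(vectors, labels)
left = {d: v for d, v in vectors.items() if d.startswith("a")}
right = {d: v for d, v in vectors.items() if d.startswith("b")}
pairs = similarity_join(left, right, threshold=0.3)
print(pairs)
```

In this sketch the Rocchio prototypes are built from per-class sums, which is one way to see why the method parallelizes well: partial sums computed on separate shards can simply be added together. The join only walks the posting lists of terms that appear in the query document; under the assumed weighting, a rare (high-IDF) term has a short posting list and a large weight, so it is cheap to process and contributes most to the score.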