Class meeting for 10-605 in Fall 2016 Streaming Naive Bayes
From Cohen Courses
Latest revision as of 11:26, 11 August 2017
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
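- Slides in Powerpoint (http://www.cs.cmu.edu/~wcohen/10-605/2016/stream-and-sort.pptx) and in PDF (http://www.cs.cmu.edu/~wcohen/10-605/2016/stream-and-sort.pdf): the stream-and-sort pattern and large-vocabulary Naive Bayes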
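- Today's quiz (https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQy6a0CAw)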
Readings for the Class
- Required: my notes on streaming and Naive Bayes (http://www.cs.cmu.edu/~wcohen/10-605/notes/scalable-nb-notes.pdf)
- Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.
Things to Remember
- Zipf's law and the prevalence of rare features/words
- Communication complexity
- Stream and sort
  - Complexity of merge sort
  - How pipes implement parallel processing
  - How buffering output before a sort can improve performance
  - How stream-and-sort can implement event-counting for naive Bayes
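As a minimal sketch of how stream-and-sort can implement event counting for naive Bayes, the Python below splits the work into a streaming step that emits one event string per counter increment and a counting step that reads the events in sorted order. The event encoding ("Y=<label>" and "Y=<label>^X=<word>") and the function names emit_events and sum_sorted_events are hypothetical choices for illustration, not necessarily the format used in the notes. Python's in-memory sorted() stands in for the external sort that would sit between the two steps in a real pipeline.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def emit_events(examples: Iterable[Tuple[str, List[str]]]) -> Iterator[str]:
    """Stream step: emit one event string per counter increment.

    Hypothetical encoding: "Y=<label>" counts documents per class and
    "Y=<label>^X=<word>" counts word occurrences per class.
    Nothing is accumulated in memory here.
    """
    for label, words in examples:
        yield f"Y={label}"
        for word in words:
            yield f"Y={label}^X={word}"


def sum_sorted_events(sorted_events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Count step: after sorting, identical events are adjacent, so a single
    running counter (rather than a hash table over the whole vocabulary) suffices."""
    prev: Optional[str] = None
    count = 0
    for event in sorted_events:
        if event == prev:
            count += 1
        else:
            if prev is not None:
                yield prev, count
            prev, count = event, 1
    if prev is not None:
        yield prev, count


if __name__ == "__main__":
    train = [("pos", ["good", "good", "fun"]), ("neg", ["bad"]), ("pos", ["good"])]
    # sorted() is an in-memory stand-in for an external sort such as Unix `sort`.
    for event, n in sum_sorted_events(sorted(emit_events(train))):
        print(f"{event}\t{n}")
```

Run as a script, this prints one tab-separated line per event: Y=neg 1, Y=neg^X=bad 1, Y=pos 2, Y=pos^X=fun 1, Y=pos^X=good 3. In the streaming setting the two steps would run as separate processes connected by pipes through an external sort, so the counting step only ever holds one event and one counter in memory, however large the vocabulary is.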
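One way to read "buffering output before a sort", offered here as an assumption rather than as the exact scheme in the slides, is to pre-aggregate events in a bounded in-memory table and flush (event, partial count) records whenever the table fills up. Because frequent events repeat often, this can shrink the amount of data the sort has to handle, while memory stays bounded by a fixed number of keys. The names buffered_counts, sum_sorted_partial_counts, and the max_keys parameter are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def buffered_counts(events: Iterable[str], max_keys: int = 10_000) -> Iterator[Tuple[str, int]]:
    """Pre-aggregate events in a bounded buffer before they are sorted.

    Repeated events collapse into one (event, partial_count) record per flush,
    so fewer records reach the sort; memory is bounded by max_keys entries.
    """
    if max_keys < 1:
        raise ValueError("max_keys must be at least 1")
    buffer: Dict[str, int] = {}
    for event in events:
        buffer[event] = buffer.get(event, 0) + 1
        if len(buffer) >= max_keys:
            flushed = list(buffer.items())
            buffer.clear()
            yield from flushed
    # Flush whatever is left at the end of the stream.
    yield from list(buffer.items())


def sum_sorted_partial_counts(sorted_pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """After sorting, partial counts for the same event are adjacent; add them up."""
    prev: Optional[str] = None
    total = 0
    for event, n in sorted_pairs:
        if event == prev:
            total += n
        else:
            if prev is not None:
                yield prev, total
            prev, total = event, n
    if prev is not None:
        yield prev, total


if __name__ == "__main__":
    events = ["a", "b", "a", "c", "a", "b"]
    pairs = list(buffered_counts(events, max_keys=3))
    print(len(pairs))  # 5 records reach the sort instead of 6
    print(list(sum_sorted_partial_counts(sorted(pairs))))  # [('a', 3), ('b', 2), ('c', 1)]
```

Note that the downstream step now sums counts instead of counting lines, since the same event may appear in several flushes; the final totals are the same as without buffering.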