Class meeting for 10-605 Streaming Naive Bayes
From Cohen Courses
Latest revision as of 12:03, 5 September 2017
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2017.
Slides
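- Slides in Powerpoint (http://www.cs.cmu.edu/~wcohen/10-605/stream-and-sort.pptx): the stream-and-sort pattern and large-vocabulary Naive Bayes
- Slides in PDF (http://www.cs.cmu.edu/~wcohen/10-605/stream-and-sort.pdf)
Quiz
- Today's quiz (https://qna.cs.cmu.edu/#/pages/view/161)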
Readings for the Class
- Required: my notes on streaming and Naive Bayes
- Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.
Things to Remember
- Zipf's law and the prevalence of rare features/words
- Communication complexity
- Stream and sort
- Complexity of merge sort
- How pipes implement parallel processing
- How buffering output before a sort can improve performance
- How stream-and-sort can implement event-counting for naive Bayes
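The last few points can be tied together in a small sketch of event counting for naive Bayes with stream-and-sort. It assumes training data with one example per line in the form "label<TAB>text". The hypothetical emit_events function plays the role of the first process in a pipeline, emitting one counter-update message per event. An in-memory sorted() call stands in for Unix sort. The hypothetical sum_sorted_counts function plays the role of the process after the sort: because equal keys arrive adjacent, it only needs the running total for the current key. The "Y=.../X=..." key format is an illustrative assumption and may not match the format used in the course notes.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def emit_events(lines: Iterable[str]) -> Iterator[str]:
    """Map stage: turn each 'label<TAB>text' line into 'key<TAB>count' messages.

    The Y=.../X=... key format is a hypothetical choice for illustration.
    """
    for line in lines:
        label, _, text = line.rstrip("\n").partition("\t")
        words = text.split()
        yield "Y=ANY\t1"                          # total number of examples
        yield f"Y={label}\t1"                     # examples with this label
        yield f"Y={label}^X=ANY\t{len(words)}"    # total words under this label
        for word in words:
            yield f"Y={label}^X={word}\t1"        # one occurrence of word under label


def message_key(message: str) -> str:
    """Return the event key (the part before the tab) of a message."""
    return message.split("\t", 1)[0]


def sum_sorted_counts(sorted_messages: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Reduce stage: sum the counts of each run of identical keys.

    Because the input is sorted by key, all messages for one key are adjacent,
    so only the current key's running total is held in memory at any time.
    """
    for key, group in itertools.groupby(sorted_messages, key=message_key):
        yield key, sum(int(message.split("\t", 1)[1]) for message in group)


def count_events(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Run emit -> sort -> sum in one process; sorted() stands in for Unix sort."""
    messages = sorted(emit_events(lines), key=message_key)
    return list(sum_sorted_counts(messages))


# Usage on two toy examples; prints counts such as "Y=pos^X=good 2".
for key, count in count_events(["pos\tgood movie good", "neg\tbad movie"]):
    print(key, count)
```

In a real stream-and-sort job the stages would run as separate processes connected by pipes, roughly "python emit.py < train.tsv | sort | python sum.py" (script names hypothetical). In that setup the external sort handles data larger than memory, and the final stage still keeps only one running total at a time. Buffering messages before the sort, for example by pre-summing repeated keys in a bounded in-memory table, reduces the volume the sort has to process.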