Class meeting for 10-405 Streaming Naive Bayes
From Cohen Courses
Jump to navigationJump to search
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.
Slides
- Slides in Powerpoint - the stream-and-sort pattern, and large-vocabulary Naive Bayes
- Slides in PDF
Quiz
Readings for the Class
- Required: my notes on streaming and Naive Bayes
- Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.
Things to Remember
- Zipf's law and the prevalence of rare features/words
- Communication complexity
- Stream and sort
- Complexity of merge sort
- How pipes implement parallel processing
- How buffering output before a sort can improve performance
- How stream-and-sort can implement event-counting for naive Bayes