Class meeting for 10-605 in Fall 2016: Streaming Naive Bayes
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.
Slides
- Slides in PowerPoint and in PDF: the stream-and-sort pattern and large-vocabulary Naive Bayes
- Today's quiz
Readings for the Class
- Required: my notes on streaming and Naive Bayes
- Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.
Things to Remember
- Zipf's law and the prevalence of rare features/words
- Communication complexity
- Stream and sort
- Complexity of merge sort
- How pipes implement parallel processing
- How buffering output before a sort can improve performance
- How stream-and-sort can implement event-counting for naive Bayes
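The last two points above can be illustrated with a minimal sketch, which is not taken from the course materials. It assumes each training example is a (label, list of words) pair, uses hypothetical event-key strings such as `Y=sports,W=ball`, and lets Python's in-memory `sorted` stand in for an external sort over a stream of messages. The names `emit_events`, `sum_sorted`, and `buffer_size` are illustrative only.

```python
from collections import Counter
from itertools import groupby


def emit_events(docs, buffer_size=1000):
    """Yield (event_key, partial_count) messages for labeled documents.

    Counts are accumulated in a small buffer and flushed once it holds
    at least buffer_size distinct keys, so memory stays bounded while
    fewer messages reach the sort step.
    """
    buffer = Counter()
    for label, words in docs:
        buffer["Y=ANY"] += 1                   # total number of examples
        buffer[f"Y={label}"] += 1              # examples with this label
        for w in words:
            buffer[f"Y={label},W={w}"] += 1    # word occurrences per label
            buffer[f"Y={label},W=ANY"] += 1    # total words per label
        if len(buffer) >= buffer_size:
            yield from buffer.items()
            buffer.clear()
    yield from buffer.items()                  # flush whatever is left


def sum_sorted(messages):
    """Sum counts over runs of identical keys.

    Requires the messages to be sorted by key; only one running total
    is held in memory at a time.
    """
    for key, group in groupby(messages, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def stream_and_sort_counts(docs, buffer_size=1000):
    # sorted() is an in-memory stand-in for an external sort process.
    messages = sorted(emit_events(docs, buffer_size))
    return dict(sum_sorted(messages))


if __name__ == "__main__":
    docs = [
        ("sports", ["ball", "game", "ball"]),
        ("politics", ["vote", "game"]),
    ]
    for key, count in sorted(stream_and_sort_counts(docs).items()):
        print(key, count)
```

In a real pipeline the two functions would more likely be separate programs connected by pipes with a sort between them. The buffer size only changes how many partial-count messages are emitted, not the final totals.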