Class meeting for 10-405 Streaming Naive Bayes

From Cohen Courses


Latest revision as of 11:07, 5 March 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Contents

1 Slides
2 Quiz
3 Readings for the Class
4 Things to Remember

Slides

Slides in Powerpoint - the stream-and-sort pattern, and large-vocabulary Naive Bayes
Slides in PDF

Quiz

Today's quiz.

Readings for the Class

Required: my notes on streaming and Naive Bayes
Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.

Things to Remember

What TFIDF weighting is and how to compute it
- Computing DFs requires extra pass over training set
How it's used in Rocchio
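
As a rough illustration of the two passes mentioned above, here is a minimal Python sketch. It assumes one common TFIDF variant, log(1 + TF) * log(N / DF) followed by L2 normalisation; the course slides may use a different formula. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """First pass: count how many documents each word appears in."""
    df = Counter()
    for words in docs:
        df.update(set(words))  # set() so a word counts once per document
    return df

def tfidf_vector(words, df, n_docs):
    """Second pass: weight each word, then scale the vector to unit length.

    Assumes every word in `words` was seen in the first pass (df[w] > 0).
    """
    tf = Counter(words)
    weights = {w: math.log(1 + c) * math.log(n_docs / df[w])
               for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return weights
    return {w: v / norm for w, v in weights.items()}

# Toy corpus (made up for illustration)
docs = [["naive", "bayes", "bayes"],
        ["naive", "sort"],
        ["naive", "stream", "sort"]]
df = document_frequencies(docs)
vec = tfidf_vector(docs[0], df, len(docs))
```

A word that occurs in every document ("naive" here) gets IDF log(N / N) = 0, so it drops out of the vector. Rocchio would then average such vectors per class to form prototypes.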

Zipf's law and the prevalence of rare features/words

Communication complexity
Stream and sort
- Complexity of merge sort
- How pipes implement parallel processing
- How buffering output before a sort can improve performance
- How stream-and-sort can implement event-counting for naive Bayes
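
The event-counting idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the message format ("Y=label,W=word" followed by a tab and a count) and the function names are made up here, and the in-memory `sorted()` call stands in for an external sort such as the Unix `sort` command placed between two piped processes.

```python
def emit_events(examples):
    """Streaming step 1: emit one 'event<TAB>1' message per counted event."""
    for label, words in examples:
        yield f"Y={label}\t1"
        for w in words:
            yield f"Y={label},W={w}\t1"

def sum_sorted(lines):
    """Streaming step 2: sum counts over adjacent identical keys.

    Because the input is sorted, only the current key and its running
    total are held in memory, regardless of vocabulary size.
    """
    prev_key, total = None, 0
    for line in lines:
        key, count = line.rsplit("\t", 1)
        if key != prev_key:
            if prev_key is not None:
                yield prev_key, total
            prev_key, total = key, 0
        total += int(count)
    if prev_key is not None:
        yield prev_key, total

# Toy training set (made up for illustration)
examples = [("sports", ["ball", "game"]),
            ("sports", ["ball"]),
            ("tech", ["code"])]
counts = dict(sum_sorted(sorted(emit_events(examples))))
```

Buffering would fit into `emit_events`: accumulating partial counts in a small in-memory table and emitting "event<TAB>n" messages when it fills would reduce the number of lines the sort has to handle, and `sum_sorted` already accepts counts larger than 1.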

