Difference between revisions of "Class meeting for 10-405 Streaming Naive Bayes"

Latest revision as of 11:07, 5 March 2018

Required: my notes on streaming and Naive Bayes
Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.

What TFIDF weighting is and how to compute it
- Computing document frequencies (DFs) requires an extra pass over the training set
How TFIDF weighting is used in Rocchio
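Here is a minimal Python sketch of TFIDF weighting and Rocchio. It rests on assumptions that may differ from the course notes. TF is the raw count of a word in a document, IDF is log(N/DF), and vectors are L2-normalized. Rocchio is the simple centroid version: average each class's TFIDF vectors, then predict the class whose centroid has the highest cosine similarity, with no penalty term for other classes' examples. All function names are hypothetical. `document_frequencies` is a full pass over the training data, and it must finish before any TFIDF vector can be computed. That is why DFs cost an extra pass.

```python
import math
from collections import Counter, defaultdict

def document_frequencies(docs):
    """Extra pass over the training set: DF(w) = number of documents containing w."""
    df = Counter()
    for _label, words in docs:
        df.update(set(words))  # set(): a document counts at most once per word
    return df

def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 length; zero vectors become {}."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}

def tfidf_vector(words, df, n_docs):
    """TF = raw count in this document; IDF = log(N / DF). Words with DF 0 are dropped."""
    tf = Counter(words)
    return normalize({w: c * math.log(n_docs / df[w])
                      for w, c in tf.items() if df[w] > 0})

def train_rocchio(docs):
    """Pass 1 computes DFs; pass 2 builds one unit-length centroid per class."""
    df = document_frequencies(docs)
    n_docs = len(docs)
    sums = defaultdict(Counter)
    sizes = Counter()
    for label, words in docs:
        centroid_sum = sums[label]
        for w, v in tfidf_vector(words, df, n_docs).items():
            centroid_sum[w] += v
        sizes[label] += 1
    # Average each class's vectors, then normalize so scoring is a cosine.
    centroids = {y: normalize({w: v / sizes[y] for w, v in s.items()})
                 for y, s in sums.items()}
    return df, n_docs, centroids

def classify(words, df, n_docs, centroids):
    """Predict the class whose centroid has the highest cosine similarity."""
    x = tfidf_vector(words, df, n_docs)
    def cosine(c):
        return sum(v * c.get(w, 0.0) for w, v in x.items())
    return max(centroids, key=lambda y: cosine(centroids[y]))
```

The DFs are fixed at training time and reused for test documents, so words never seen in training get no weight in this sketch.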

Communication complexity
Stream and sort
- Complexity of merge sort
- How pipes implement parallel processing
- How buffering output before a sort can improve performance
- How stream-and-sort can implement event-counting for naive Bayes
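Here is a minimal, in-memory Python simulation of stream-and-sort event counting for naive Bayes. It is not the course's code. `sorted()` stands in for Unix `sort`. The event-key format (`Y=label`, `Y=ANY`, `Y=label^X=word`, `Y=label^X=ANY`) and the `buffer_size` parameter are illustrative assumptions. The map step streams over examples and emits (event, count) pairs. After sorting, equal keys are adjacent, so the reduce step can sum each run in one pass with constant memory. With a buffer, the map step combines duplicate keys in memory before emitting, which sends fewer lines into the sort.

```python
from collections import Counter
from itertools import groupby

def events_for(label, words):
    """Naive Bayes events for one example (hypothetical key format)."""
    yield f"Y={label}", 1                 # examples with this label
    yield "Y=ANY", 1                      # total number of examples
    for w in words:
        yield f"Y={label}^X={w}", 1       # occurrences of word w in this class
    yield f"Y={label}^X=ANY", len(words)  # total word tokens in this class

def stream_events(examples, buffer_size=None):
    """Map step: emit (event, count) pairs, optionally combining them in a buffer."""
    if buffer_size is None:
        for label, words in examples:
            yield from events_for(label, words)
        return
    buffer = Counter()
    for label, words in examples:
        for key, count in events_for(label, words):
            buffer[key] += count
        if len(buffer) > buffer_size:     # flush when too many distinct keys are held
            yield from list(buffer.items())
            buffer.clear()
    yield from list(buffer.items())       # flush whatever remains

def sum_sorted_counts(sorted_pairs):
    """Reduce step: equal keys are adjacent after sorting, so one pass sums each run."""
    for key, run in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _key, count in run)

def stream_and_sort_count(examples, buffer_size=None):
    # sorted() stands in for the external `sort` of a real pipeline.
    pairs = stream_events(examples, buffer_size)
    return dict(sum_sorted_counts(sorted(pairs, key=lambda kv: kv[0])))

if __name__ == "__main__":
    examples = [("sports", ["game", "team", "game"]), ("politics", ["vote"])]
    print(stream_and_sort_count(examples)["Y=sports^X=game"])  # 2
```

In a real stream-and-sort pipeline, the three stages would be separate processes connected by pipes, for example `python emit.py < train.txt | sort | python sum.py` (hypothetical script names). The pipe lets the producer and `sort` run concurrently. Buffering trades a bounded amount of map-side memory for fewer emitted lines. Because sorting n lines costs O(n log n), shrinking n before the pipe reduces the dominant cost.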

@@ Line 8: / Line 8: @@
 === Quiz ===
-* [https://qna.cs.cmu.edu/#/pages/view/161 Today's quiz]
+* [https://qna.cs.cmu.edu/#/pages/view/161 Today's quiz].
 === Readings for the Class ===
@@ Line 16: / Line 16: @@
 === Things to Remember ===
+* What TFIDF weighting is and how to compute it
+** Computing DFs requires extra pass over training set
+* How it's used in Rocchio
 * Zipf's law and the prevalence of rare features/words
 * Communication complexity
 * Stream and sort