Class meeting for 10-605 Streaming Naive Bayes

From Cohen Courses
 
This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2017|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2017]].
 
=== Slides ===
 
* [http://www.cs.cmu.edu/~wcohen/10-605/stream-and-sort.pptx Slides in PowerPoint]: the stream-and-sort pattern and large-vocabulary Naive Bayes
* [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQy6a0CAw Today's quiz]
 
=== Readings for the Class ===
 
* Required: [http://www.cs.cmu.edu/~wcohen/10-605/notes/scalable-nb-notes.pdf my notes on streaming and Naive Bayes]
* Optional: if you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." ''Information Retrieval'' 7.3 (2004): 317-345.
 
=== Things to Remember ===
 
* Zipf's law and the prevalence of rare features/words
* Communication complexity
* Stream and sort
** Complexity of merge sort
** How pipes implement parallel processing
** How buffering output before a sort can improve performance
** How stream-and-sort can implement event-counting for naive Bayes
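The last item, event counting for naive Bayes with stream-and-sort, can be sketched in three phases: a streaming pass that emits (event, count) pairs, a sort that brings identical events next to each other, and a second streaming pass that sums each run of identical events. The Python below is a minimal in-process sketch, not the course's reference code: the event encoding (<code>Y=ANY</code>, <code>Y=pos^W=cheap</code>, and so on), the function names, and the <code>max_keys</code> buffer size are hypothetical choices, and Python's in-memory <code>sorted</code> stands in for an external sort. The <code>buffered</code> step illustrates buffering output before the sort: repeated events are combined in a bounded dictionary, so fewer pairs reach the sort.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

Example = Tuple[str, List[str]]  # (class label, tokenized document)
Pair = Tuple[str, int]           # (event name, count)


def count_events(examples: Iterable[Example]) -> Iterator[Pair]:
    """Stream phase: emit an (event, 1) pair for every event naive Bayes counts.

    The event names ("Y=ANY", "Y=pos", "Y=pos^W=cheap", "Y=pos^W=ANY")
    are a hypothetical encoding chosen for this sketch.
    """
    for label, words in examples:
        yield ("Y=ANY", 1)                 # total number of examples
        yield (f"Y={label}", 1)            # examples with this label
        for w in words:
            yield (f"Y={label}^W={w}", 1)  # word w seen in this class
            yield (f"Y={label}^W=ANY", 1)  # total words seen in this class


def buffered(pairs: Iterable[Pair], max_keys: int = 1000) -> Iterator[Pair]:
    """Combine repeated events in a bounded in-memory buffer before sorting.

    At most max_keys distinct events are held at once; when the buffer is
    full it is flushed, so one event may be emitted several times with
    partial counts. The summing phase adds those partial counts together.
    """
    buf: Dict[str, int] = {}
    for key, n in pairs:
        buf[key] = buf.get(key, 0) + n
        if len(buf) >= max_keys:
            yield from buf.items()
            buf.clear()
    yield from buf.items()  # flush whatever is left at the end of the stream


def sum_sorted(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Sum phase: the input is sorted by event, so equal events are adjacent
    and only the current event and its running total are needed."""
    for key, run in groupby(pairs, key=itemgetter(0)):
        yield key, sum(n for _, n in run)


def train(examples: Iterable[Example], max_keys: int = 1000) -> Dict[str, int]:
    """Stream -> buffer -> sort -> sum. Python's sorted() stands in for an
    external sort such as the Unix sort command."""
    stream = buffered(count_events(examples), max_keys)
    return dict(sum_sorted(sorted(stream, key=itemgetter(0))))


if __name__ == "__main__":
    data = [("pos", ["cheap", "cheap", "meds"]),
            ("neg", ["meeting", "at", "noon"])]
    counts = train(data, max_keys=2)
    print(counts["Y=pos^W=cheap"])  # prints 2
    print(counts["Y=pos^W=ANY"])    # prints 3
```

In an actual stream-and-sort job the phases would be separate processes joined by Unix pipes (a counting script piped into <code>sort</code>, piped into a summing script), so they run concurrently and <code>sort</code> can spill to temporary files when the input is larger than memory. The buffer trades a bounded amount of memory for fewer pairs passing through the sort.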

Revision as of 11:28, 11 August 2017
