Difference between revisions of "Class meeting for 10-605 in Fall 2016 Streaming Naive Bayes"

From Cohen Courses
(Created page with "This is one of the class meetings on the schedule for the course Machine Learning with Large Data...")
 
 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
- This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Spring 2013|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Spring_2013]].
+ This is one of the class meetings on the [[Syllabus for Machine Learning with Large Datasets 10-605 in Fall 2016|schedule]] for the course [[Machine Learning with Large Datasets 10-605 in Fall 2016]].

  === Slides ===

- * [http://www.cs.cmu.edu/~wcohen/10-605/stream-nb.pptx Slides 1 - streaming Naive Bayes]
+ * [http://www.cs.cmu.edu/~wcohen/10-605/2016/stream-and-sort.pptx Slides in Powerpoint], [http://www.cs.cmu.edu/~wcohen/10-605/2016/stream-and-sort.pdf in PDF] - the stream-and-sort pattern, and large-vocabulary Naive Bayes
- * [http://www.cs.cmu.edu/~wcohen/10-605/stream-and-sort.pptx Slides 2 - the stream-and-sort pattern, and large-vocabulary Naive Bayes]
+ * [https://qna-app.appspot.com/edit_new.html#/pages/view/aglzfnFuYS1hcHByGQsSDFF1ZXN0aW9uTGlzdBiAgIDQy6a0CAw Today's quiz]

  === Readings for the Class ===

- * None required.  If you're interested in reading more about smoothing for naive Bayes, I recommend this paper:  Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.
+ * Required: [http://www.cs.cmu.edu/~wcohen/10-605/notes/scalable-nb-notes.pdf my notes on streaming and Naive Bayes]
+ * Optional:  If you're interested in reading more about smoothing for naive Bayes, I recommend this paper:  Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.
+
+ === Things to Remember ===
+
+ * Zipf's law and the prevalence of rare features/words
+ * Communication complexity
+ * Stream and sort
+ ** Complexity of merge sort
+ ** How pipes implement parallel processing
+ ** How buffering output before a sort can improve performance
+ ** How stream-and-sort can implement event-counting for naive Bayes

Latest revision as of 11:26, 11 August 2017

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-605 in Fall 2016.

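Slides

  • Slides in Powerpoint, in PDF - the stream-and-sort pattern, and large-vocabulary Naive Bayes
  • Today's quiz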

Readings for the Class

  • Required: my notes on streaming and Naive Bayes
  • Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.

Things to Remember

  • Zipf's law and the prevalence of rare features/words
  • Communication complexity
  • Stream and sort
    • Complexity of merge sort
    • How pipes implement parallel processing
    • How buffering output before a sort can improve performance
    • How stream-and-sort can implement event-counting for naive Bayes
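
To make the last two points concrete, here is a minimal sketch of stream-and-sort event counting for naive Bayes. It is an illustration under stated assumptions, not the code from the slides or notes: the function names (`emit_events`, `sum_sorted`, `count_events`), the event-key strings such as `Y=sports,W=ball`, and the `buffer_size` parameter are all hypothetical. The in-memory `sorted` call stands in for the Unix `sort` command that would sit between two piped processes.

```python
from collections import Counter
from itertools import groupby


def emit_events(docs, buffer_size=1000):
    """Emit (event, count) messages for each (label, text) training example.

    A small in-memory buffer pre-aggregates counts, so frequent events
    produce fewer messages for the sort to handle.
    """
    buffer = Counter()
    for label, text in docs:
        buffer["Y=ANY"] += 1                      # total number of examples
        buffer[f"Y={label}"] += 1                 # examples with this label
        for word in text.split():
            buffer[f"Y={label},W={word}"] += 1    # word occurrences under this label
            buffer[f"Y={label},W=ANY"] += 1       # total words under this label
        if len(buffer) >= buffer_size:            # flush when the buffer is full
            yield from buffer.items()
            buffer.clear()
    yield from buffer.items()                     # flush whatever is left


def sum_sorted(messages):
    """Sum counts over messages that are already sorted by event.

    Because equal events are adjacent, only one running total is held
    in memory at a time, regardless of vocabulary size.
    """
    for event, group in groupby(messages, key=lambda kv: kv[0]):
        yield event, sum(count for _, count in group)


def count_events(docs, buffer_size=1000):
    """Run the whole pipeline in one process (assumption: data fits in memory)."""
    messages = sorted(emit_events(docs, buffer_size))  # stand-in for Unix `sort`
    return dict(sum_sorted(messages))


if __name__ == "__main__":
    training = [("sports", "ball game ball"), ("politics", "vote ball")]
    for event, count in sorted(count_events(training).items()):
        print(f"{event}\t{count}")
```

The buffer size changes only how many messages reach the sort, not the final counts, which is why buffering before the sort is a pure performance optimization.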