Difference between revisions of "Class meeting for 10-405 Streaming Naive Bayes"

From Cohen Courses
 

Revision as of 14:24, 24 January 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

Quiz

  • Today's quiz - maybe best to do this after I finish the TF-IDF material on Wednesday.

Readings for the Class

  • Required: my notes on streaming and Naive Bayes
  • Optional: If you're interested in reading more about smoothing for naive Bayes, I recommend this paper: Peng, Fuchun, Dale Schuurmans, and Shaojun Wang. "Augmenting naive Bayes classifiers with statistical language models." Information Retrieval 7.3 (2004): 317-345.

Things to Remember

  • Zipf's law and the prevalence of rare features/words
  • Communication complexity
  • Stream and sort
    • Complexity of merge sort
    • How pipes implement parallel processing
    • How buffering output before a sort can improve performance
    • How stream-and-sort can implement event-counting for naive Bayes
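The last two stream-and-sort points can be illustrated with a minimal Python sketch. It is not taken from the course notes: the function names (emit_events, sum_sorted), the event-key format ("Y=label", "Y=label,W=word"), and the buffer_size parameter are all hypothetical, chosen only for illustration. In a real pipeline the two steps would typically run as separate processes connected by pipes with an external sort between them; here the built-in sorted() stands in for that sort.

```python
from collections import Counter
from itertools import groupby


def emit_events(docs, buffer_size=1000):
    """Emit (event, partial_count) messages for each (label, text) document.

    Counts accumulate in a small in-memory buffer that is flushed once it
    holds buffer_size distinct events, so frequent events produce fewer
    messages for the sort to handle.
    """
    buffer = Counter()
    for label, text in docs:
        buffer[f"Y={label}"] += 1
        for word in text.split():
            buffer[f"Y={label},W={word}"] += 1
        if len(buffer) >= buffer_size:
            yield from list(buffer.items())
            buffer.clear()
    # Flush whatever is left at the end of the stream.
    yield from list(buffer.items())


def sum_sorted(messages):
    """Sum partial counts from messages that arrive sorted by event.

    Because equal events are adjacent, only one running total is held
    in memory at a time.
    """
    for event, group in groupby(messages, key=lambda m: m[0]):
        yield event, sum(count for _, count in group)


# Toy training data (made up for this example).
docs = [
    ("sports", "ball game ball"),
    ("politics", "vote game"),
    ("sports", "game"),
]

# sorted() is an in-memory stand-in for an external sort between the two steps.
messages = sorted(emit_events(docs, buffer_size=4))
counts = dict(sum_sorted(messages))

for event, count in sorted(counts.items()):
    print(event, count)
```

The design point is that neither step needs memory proportional to the vocabulary: the emitter holds at most a bounded buffer, and the summing step holds a single running total, with the sort doing the work of bringing identical events together.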