10-601 Big Data

From Cohen Courses

Latest revision as of 15:32, 24 November 2014

This is a lecture used in the Syllabus for Machine Learning 10-601 in Fall 2014.

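Slides

  • Slides in PowerPoint (http://www.cs.cmu.edu/~wcohen/10-601/bigdata-nb.pptx), Slides in PDF (http://www.cs.cmu.edu/~wcohen/10-601/bigdata-nb.pdf)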

Readings

  • None

Summary

You should know:

  • Why locality is important in working with large data
  • What the relative costs of operations are for accessing disk, network, and memory
  • What the Hadoop Distributed File System (HDFS) is
  • What the stages of Map-Reduce are: map, shuffle, and reduce
  • Why combiners are often important in Map-Reduce
  • What sort of tasks Map-Reduce is well-suited for, and what it's not well-suited for
  • In outline, how Naive Bayes, or some other counting task, could be implemented on Map-Reduce
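
As a rough illustration of the last few points, the counting step of Naive Bayes can be sketched as a single-process simulation of map, shuffle, and reduce, with an optional combiner. This is a minimal sketch and not Hadoop code: the function names (map_doc, combine, shuffle, reduce_counts, run_job), the key layout, and the toy documents are all assumptions made for illustration, and each document is treated as the output of one mapper.

```python
from collections import defaultdict
from itertools import groupby


def map_doc(label, text):
    """Map: emit (key, 1) for every event Naive Bayes needs to count."""
    yield ("label", label), 1                  # one training example with this label
    for word in text.lower().split():
        yield ("word", label, word), 1         # one occurrence of word under this label


def combine(pairs):
    """Combiner: pre-sum counts locally so fewer pairs cross the network."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pairs):
    """Shuffle: sort by key and group, so all values for a key reach one reducer."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(ordered, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def reduce_counts(key, values):
    """Reduce: sum the partial counts for one key."""
    return key, sum(values)


def run_job(docs, use_combiner=True):
    """Run the simulated job; return the counts and the number of pairs shuffled."""
    sent = []
    for label, text in docs:                   # hypothetical: one mapper per document
        pairs = list(map_doc(label, text))
        if use_combiner:
            pairs = combine(pairs)
        sent.extend(pairs)
    counts = dict(reduce_counts(k, vs) for k, vs in shuffle(sent))
    return counts, len(sent)


# Toy training set (made up for this sketch).
docs = [
    ("spam", "buy buy buy now"),
    ("ham", "see you now"),
    ("spam", "buy now"),
]

counts_with, sent_with = run_job(docs, use_combiner=True)
counts_without, sent_without = run_job(docs, use_combiner=False)
print(counts_with[("word", "spam", "buy")], sent_with, sent_without)
```

The final counts are the same with or without the combiner; only the number of pairs passed to the shuffle changes, which is why combiners matter when network transfer is the expensive step. Turning the counts into smoothed probability estimates would be a separate step after the job finishes.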