10-601 Big Data

From Cohen Courses
Latest revision as of 16:32, 24 November 2014

This is a lecture used in the Syllabus for Machine Learning 10-601 in Fall 2014.

Slides

  • Slides in PowerPoint: http://www.cs.cmu.edu/~wcohen/10-601/bigdata-nb.pptx
  • Slides in PDF: http://www.cs.cmu.edu/~wcohen/10-601/bigdata-nb.pdf

Readings

  • None

Summary

You should know:

  • Why locality is important in working with large data
  • What the relative costs of operations are for accessing disk, network, and memory
  • What the Hadoop Distributed File System (HDFS) is
  • What the stages of Map-Reduce are: map, shuffle, and reduce
  • Why combiners are often important in Map-Reduce
  • What sort of tasks Map-Reduce is well-suited for, and what it's not well-suited for
  • In outline, how Naive Bayes, or some other counting task, could be implemented on map-reduce
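The Map-Reduce stages listed above, and the role of a combiner, can be illustrated with a minimal single-process word-count sketch. This is only a simulation for intuition: the function names and the in-memory data structures are illustrative, and a real Hadoop job would run the map, shuffle, and reduce stages on separate machines over HDFS data.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def combine(pairs):
    """Combiner: pre-aggregate counts on the mapper side, so far less
    data has to cross the network during the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return local.items()

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data is big", "data is data"]
counts = reduce_phase(shuffle(combine(map_phase(docs))))
# counts == {"big": 2, "data": 3, "is": 2}
```

The combiner matters because network transfer is far slower than local memory access: without it, every single (word, 1) pair would be shuffled; with it, only one pair per distinct word per mapper crosses the network.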
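In the same spirit, the counting phase of Naive Bayes fits map-reduce naturally: mappers emit count events for each label and each (label, word) pair, and reducers sum them. The sketch below is a hedged, in-memory outline under assumed names (`nb_map`, `nb_reduce`); smoothing and the actual classification step are omitted.

```python
from collections import defaultdict

def nb_map(labeled_docs):
    """Map: for each (label, text) example, emit one count event for the
    label and one for every (label, word) pair."""
    for label, text in labeled_docs:
        yield ("Y=" + label, 1)
        for word in text.split():
            yield ("Y=%s,W=%s" % (label, word), 1)

def nb_reduce(events):
    """Shuffle and reduce collapsed into one in-memory aggregation:
    sum the values for each event key."""
    counts = defaultdict(int)
    for key, value in events:
        counts[key] += value
    return dict(counts)

train = [("sports", "ball game"), ("sports", "ball"), ("news", "election")]
counts = nb_reduce(nb_map(train))
# e.g. counts["Y=sports"] == 2 and counts["Y=sports,W=ball"] == 2
```

These summed counts are exactly the sufficient statistics Naive Bayes needs to estimate P(Y) and P(W|Y), which is why counting tasks like this are a canonical good fit for map-reduce, while iterative or low-latency workloads are not.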