10-601 Big Data
From Cohen Courses
Latest revision as of 15:32, 24 November 2014
This is a lecture used in the Syllabus for Machine Learning 10-601 in Fall 2014.
Slides
- Slides in PowerPoint (http://www.cs.cmu.edu/~wcohen/10-601/bigdata-nb.pptx), Slides in PDF (http://www.cs.cmu.edu/~wcohen/10-601/bigdata-nb.pdf)
Readings
- None
Summary
You should know:
- Why locality is important in working with large data
- What the relative costs of operations are for accessing disk, network, and memory
- What the Hadoop Distributed File System (HDFS) is
- What the stages of Map-Reduce are: map, shuffle, and reduce
- Why combiners are often important in Map-Reduce
- What sort of tasks Map-Reduce is well-suited for, and what it's not well-suited for
- In outline, how Naive Bayes, or some other counting task, could be implemented on Map-Reduce
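As a rough illustration of the last few points, here is a minimal single-process sketch of the map, shuffle, and reduce stages applied to the event counts that Naive Bayes needs. It is not Hadoop code: the function names (map_phase, combine, shuffle, reduce_phase, run_job) and the toy SHARDS data are hypothetical, and each inner list of SHARDS merely stands in for the documents held locally by one mapper. The number of pairs entering the shuffle is used as a crude proxy for network traffic.

```python
from collections import defaultdict
from itertools import groupby


def map_phase(doc):
    """Map: emit (event, 1) pairs for one labeled document."""
    label, words = doc
    yield (("label", label), 1)          # one count per document of this class
    for w in words:
        yield (("word", label, w), 1)    # one count per word occurrence in this class


def combine(pairs):
    """Combiner: pre-sum counts on the mapper side, before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pairs):
    """Shuffle: sort by key and group, so all values for a key reach one reducer."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(ordered, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def reduce_phase(key, values):
    """Reduce: sum all partial counts for one key."""
    return key, sum(values)


def run_job(shards, use_combiner=True):
    """Run the three stages; also return how many pairs entered the shuffle."""
    shuffle_input = []
    for shard in shards:  # each shard stands in for one mapper's local data
        emitted = [kv for doc in shard for kv in map_phase(doc)]
        if use_combiner:
            emitted = combine(emitted)
        shuffle_input.extend(emitted)
    counts = dict(reduce_phase(k, vs) for k, vs in shuffle(shuffle_input))
    return counts, len(shuffle_input)


# Toy data (made up for illustration): (label, words) pairs split over two shards.
SHARDS = [
    [("spam", ["cheap", "pills", "cheap"]), ("ham", ["lunch", "today"])],
    [("spam", ["cheap", "offer"]), ("ham", ["lunch", "meeting"])],
]

counts, sent_with = run_job(SHARDS, use_combiner=True)
counts_plain, sent_without = run_job(SHARDS, use_combiner=False)

print(counts[("word", "spam", "cheap")], sent_with, sent_without)
```

The combiner changes only how many pairs cross the shuffle, not the final counts, which is why it is safe for sums. The resulting counts are the sufficient statistics from which Naive Bayes class priors and per-class word probabilities would be estimated in a later step.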