Class meeting for 10-405 Hadoop Overview
From Cohen Courses
Latest revision as of 11:09, 5 March 2018
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.
Slides
Map-reduce overview:
Quiz
Readings for the Class
- There are many online tutorials for Hadoop, and the O'Reilly book is also quite good. You might also look at this annotated log of me interacting with streaming Hadoop.
Things to Remember
- Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
- The primary phases of a map-reduce computation, and what happens in each
  - Map
  - Shuffle/sort
  - Reduce
- Where data might be transmitted across the network
- How data is stored in Hadoop
  - Consequences of large block size for streaming and storage efficiency
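
As a rough illustration of the three phases listed above, here is a toy word count that simulates map, shuffle/sort, and reduce inside a single Python process. It is only a sketch and does not use the Hadoop API: the function names (mapper, combiner, shuffle_sort, reducer, word_count) and the sample shards are hypothetical, chosen for illustration. In a real Hadoop job the shuffle/sort step is where intermediate pairs are sent across the network from mapper machines to reducer machines, and a combiner is one way to shrink that traffic.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Optional combiner: pre-sum counts on the mapper's side, so fewer
    pairs would need to cross the network during the shuffle."""
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return sorted(partial.items())


def shuffle_sort(mapper_outputs):
    """Shuffle/sort phase: merge all mapper outputs, sort them by key,
    and group together the values that share a key."""
    merged = sorted(
        (pair for output in mapper_outputs for pair in output),
        key=itemgetter(0),
    )
    for word, group in groupby(merged, key=itemgetter(0)):
        yield word, [count for _, count in group]


def reducer(word, counts):
    """Reduce phase: collapse all counts for one key into a single total."""
    return word, sum(counts)


def word_count(shards):
    """Run the simulated pipeline; each shard stands in for one mapper's input."""
    mapper_outputs = [
        combiner(pair for line in shard for pair in mapper(line))
        for shard in shards
    ]
    return dict(reducer(word, counts) for word, counts in shuffle_sort(mapper_outputs))


if __name__ == "__main__":
    # Hypothetical input: two shards, as if stored on two different machines.
    shards = [["the cat sat", "the dog sat"], ["the cat ran"]]
    print(word_count(shards))
```

Each inner list in shards plays the role of one shard processed by its own mapper. With the combiner in place, the first shard sends one ("the", 2) pair to the shuffle instead of two ("the", 1) pairs, which is the kind of saving that matters when the shuffle runs over a network.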