Difference between revisions of "Class meeting for 10-405 Hadoop Overview"

Latest revision as of 11:09, 5 March 2018

Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
The primary phases of a map-reduce computation, and what happens in each
- Map
- Shuffle/sort
- Reduce
Where data might be transmitted across the network
How data is stored in Hadoop
- Consequences of large block size for streaming and storage efficiency
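
The three phases listed above can be sketched as a single-process word count. This is only an illustration, not the Hadoop API: the function names (mapper, combiner, shuffle_sort, reducer, word_count) and the sample shards are hypothetical, and everything runs in one Python process. In a real cluster the mappers run on separate machines, and the shuffle/sort step is where intermediate pairs cross the network, which is why a combiner that shrinks each mapper's output before that step can help.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine: pre-aggregate one mapper's output locally, before the shuffle."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def shuffle_sort(all_pairs):
    """Shuffle/sort: group all values that share a key, in sorted key order."""
    ordered = sorted(all_pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [n for _, n in group]


def reducer(key, values):
    """Reduce: collapse the list of partial counts for one key into a total."""
    return key, sum(values)


def word_count(shards):
    """Run map + combine per shard, then shuffle/sort, then reduce."""
    map_output = []
    for shard in shards:  # each shard stands in for one mapper's input
        pairs = [pair for line in shard for pair in mapper(line)]
        map_output.extend(combiner(pairs))
    return dict(reducer(key, values) for key, values in shuffle_sort(map_output))


# Two made-up shards, each a list of text lines.
shards = [["the cat sat", "the cat"], ["the dog sat"]]
result = word_count(shards)
print(result)
```

Without the combiner, the first shard would send five (word, 1) pairs into the shuffle; with it, only three pre-summed pairs are sent, and the final counts are the same.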

@@ Line 19: / Line 19: @@
 * Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
+* The primary phases of a map-reduce computation, and what happens in each
+** Map
+** Shuffle/sort
+** Reduce
+* Where data might be transmitted across the network
+* How data is stored in Hadoop
+** Consequences of large block size for streaming and storage efficiency