Difference between revisions of "Class meeting for 10-405 Hadoop Overview"

Latest revision as of 11:09, 5 March 2018


@@ Line 10: / Line 10: @@
 === Quiz ===
-* There is no quiz today - you should instead spend the review time actually working with Hadoop on stoat.  You might also look at this [http://www.cs.cmu.edu/~wcohen/10-605/annotated-hadoop-log.txt  annotated log of me interacting with streaming Hadoop].
+* [https://qna.cs.cmu.edu/#/pages/view/244 Today's quiz]
 === Readings for the Class ===
-* There are lots of online tutorials for Hadoop.  The [http://shop.oreilly.com/product/0636920010388.do O'Reilly book] is also quite good.
+* There are lots of online tutorials for Hadoop.  The [http://shop.oreilly.com/product/0636920010388.do O'Reilly book] is also quite good. You might also look at this [http://www.cs.cmu.edu/~wcohen/10-605/annotated-hadoop-log.txt annotated log of me interacting with streaming Hadoop].
 === Things to Remember ===
 * Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
+* The primary phases of a map-reduce computation, and what happens in each
+** Map
+** Shuffle/sort
+** Reduce
+* Where data might be transmitted across the network
+* How data is stored in Hadoop
+** Consequences of large block size for streaming and storage efficiency
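
The phases listed above can be sketched as a small single-process simulation. This is only an illustration, not Hadoop code: the function names (`mapper`, `combiner`, `shuffle_sort`, `reducer`, `run_job`), the word-count task, and the toy partitioning rule are all assumptions made for the example. In a real job, the shuffle/sort step is where map output typically crosses the network on its way to the reducers, and a combiner is one way to shrink that traffic.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map: emit one (word, 1) pair per token in an input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine: pre-sum counts on the mapper side, before the shuffle."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return sorted(partial.items())


def shuffle_sort(mapper_outputs, num_reducers):
    """Shuffle/sort: send each key to exactly one partition, then sort by key.

    The partitioning rule (sum of the key's bytes modulo the number of
    reducers) is a hypothetical stand-in for a real partitioner.
    """
    partitions = [[] for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            index = sum(key.encode("utf-8")) % num_reducers
            partitions[index].append((key, value))
    return [sorted(p, key=itemgetter(0)) for p in partitions]


def reducer(sorted_pairs):
    """Reduce: sum the values of each run of identical keys."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def run_job(shards, num_reducers=2):
    """Run map (+ combine), shuffle/sort, and reduce over in-memory shards."""
    mapper_outputs = [
        combiner(pair for line in shard for pair in mapper(line))
        for shard in shards
    ]
    partitions = shuffle_sort(mapper_outputs, num_reducers)
    result = {}
    for partition in partitions:
        result.update(reducer(partition))
    return result


# Two "shards", each standing in for the input of one mapper.
shards = [["the cat sat", "the dog sat"], ["the cat ran"]]
counts = run_job(shards)
print(counts)
```

Because every occurrence of a key lands in the same partition, each reducer sees all the counts for its keys, which is what makes the per-key sums correct regardless of how many reducers are used.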