Class meeting for 10-405 Hadoop Overview
From Cohen Courses
Latest revision as of 11:09, 5 March 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

=== Slides ===

Map-reduce overview:

=== Quiz ===

* [https://qna.cs.cmu.edu/#/pages/view/244 Today's quiz]

=== Readings for the Class ===

* There are lots of on-line tutorials for Hadoop. The [http://shop.oreilly.com/product/0636920010388.do O'Reilly Book] is also quite good. You might also look at this [http://www.cs.cmu.edu/~wcohen/10-605/annotated-hadoop-log.txt annotated log of me interacting with streaming Hadoop].

=== Things to Remember ===

* Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
* The primary phases of a map-reduce computation, and what happens in each
** Map
** Shuffle/sort
** Reduce
* Where data might be transmitted across the network
* How data is stored in Hadoop
** Consequences of large block size for streaming and storage efficiency
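
As a rough illustration of the phases listed above, here is a minimal in-memory sketch of a word-count job in plain Python. It is not Hadoop code and does not use any Hadoop API: the function names (`mapper`, `combiner`, `shuffle_sort`, `reducer`, `run_job`) and the toy shards are assumptions made up for this example. The number of pairs handed to the shuffle stands in for data that a real cluster would send across the network, so the sketch also hints at why a combiner can help.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def combiner(pairs):
    """Combine: pre-aggregate one map task's output before the shuffle."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def shuffle_sort(pairs):
    """Shuffle/sort: group all values for a key together, in key order."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(shards, use_combiner):
    """Run map, (optional) combine, shuffle/sort, and reduce over shards.

    `shards` is a list of shards, each a list of text lines (hypothetical
    stand-ins for input splits). Returns the final counts and the number
    of pairs handed to the shuffle.
    """
    shuffled = []
    for shard in shards:  # one simulated map task per shard
        pairs = [pair for line in shard for pair in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        shuffled.extend(pairs)
    counts = dict(reducer(k, vs) for k, vs in shuffle_sort(shuffled))
    return counts, len(shuffled)


if __name__ == "__main__":
    shards = [["a rose is a rose"], ["a rose is red"]]
    for flag in (False, True):
        counts, sent = run_job(shards, use_combiner=flag)
        print(f"combiner={flag}: {sent} pairs shuffled, counts={counts}")
```

In this toy run the final counts are the same with or without the combiner, but fewer pairs reach the shuffle when the combiner is on, because repeated words within a shard are summed locally first.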