Difference between revisions of "Class meeting for 10-405 Hadoop Overview"
From Cohen Courses
(One intermediate revision by the same user not shown) | |||
Line 9:
  === Quiz ===
−
  * [https://qna.cs.cmu.edu/#/pages/view/244 Today's quiz]
−
  === Readings for the Class ===
− * There are lots of on-line tutorials for Hadoop. The [http://shop.oreilly.com/product/0636920010388.do O'Reilly Book] is also quite good.
+ * There are lots of on-line tutorials for Hadoop. The [http://shop.oreilly.com/product/0636920010388.do O'Reilly Book] is also quite good. You might also look at this [http://www.cs.cmu.edu/~wcohen/10-605/annotated-hadoop-log.txt annotated log of me interacting with streaming Hadoop].
  === Things to Remember ===
  * Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
+ * The primary phases of a map-reduce computation, and what happens in each
+ ** Map
+ ** Shuffle/sort
+ ** Reduce
+ * Where data might be transmitted across the network
+ * How data is stored in Hadoop
+ ** Consequences of large block size for streaming and storage efficiency
Latest revision as of 11:09, 5 March 2018
This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.
Slides
Map-reduce overview:
Quiz
Readings for the Class
- There are lots of on-line tutorials for Hadoop. The O'Reilly Book is also quite good. You might also look at this annotated log of me interacting with streaming Hadoop.
Things to Remember
- Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
- The primary phases of a map-reduce computation, and what happens in each
  - Map
  - Shuffle/sort
  - Reduce
- Where data might be transmitted across the network
- How data is stored in Hadoop
  - Consequences of large block size for streaming and storage efficiency
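
The three phases listed above can be illustrated with a small in-memory word count. This is only a sketch under stated assumptions: it runs in a single Python process and does not use Hadoop or its APIs, and the names `mapper`, `shuffle_sort`, `reducer`, and `run_job` are hypothetical, chosen for illustration. On a real cluster, the shuffle/sort step is where map output would typically travel across the network to the machines running the reducers.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_sort(pairs):
    """Shuffle/sort: sort pairs by key, then group the values of each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases in order on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle_sort(mapped))


# Hypothetical input, standing in for the lines of a sharded file.
counts = run_job(["the quick brown fox", "the lazy dog", "the fox"])
```

Each reducer call sees one key together with all of its values, which is why the sort and grouping must finish before the reduce phase can start.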