Difference between revisions of "Class meeting for 10-405 Hadoop Overview"

From Cohen Courses
 
(One intermediate revision by the same user not shown)
Line 9:
  === Quiz ===

  * [https://qna.cs.cmu.edu/#/pages/view/244 Today's quiz]
- * You might also look at this [http://www.cs.cmu.edu/~wcohen/10-605/annotated-hadoop-log.txt annotated log of me interacting with streaming Hadoop].

  === Readings for the Class ===

- * There are lots of on-line tutorials for Hadoop. The [http://shop.oreilly.com/product/0636920010388.do O'Reilly Book] is also quite good.
+ * There are lots of on-line tutorials for Hadoop. The [http://shop.oreilly.com/product/0636920010388.do O'Reilly Book] is also quite good. You might also look at this [http://www.cs.cmu.edu/~wcohen/10-605/annotated-hadoop-log.txt annotated log of me interacting with streaming Hadoop].

  === Things to Remember ===

  * Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
+ * The primary phases of a map-reduce computation, and what happens in each
+ ** Map
+ ** Shuffle/sort
+ ** Reduce
+ * Where data might be transmitted across the network
+ * How data is stored in Hadoop
+ ** Consequences of large block size for streaming and storage efficiency

Latest revision as of 11:09, 5 March 2018

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

Map-reduce overview:

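Quiz

  • Today's quiz (https://qna.cs.cmu.edu/#/pages/view/244)

Readings for the Class

  • There are lots of on-line tutorials for Hadoop. The O'Reilly Book (http://shop.oreilly.com/product/0636920010388.do) is also quite good. You might also look at this annotated log of me interacting with streaming Hadoop (http://www.cs.cmu.edu/~wcohen/10-605/annotated-hadoop-log.txt).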

Things to Remember

  • Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
  • The primary phases of a map-reduce computation, and what happens in each
    • Map
    • Shuffle/sort
    • Reduce
  • Where data might be transmitted across the network
  • How data is stored in Hadoop
    • Consequences of large block size for streaming and storage efficiency
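
To make the three phases above concrete, here is a minimal sketch that simulates a word-count job in plain Python on one machine. It does not use Hadoop itself. The function names (mapper, combiner, shuffle_sort, reducer, run_job) and the toy shards are assumptions made for illustration, not part of the course materials or of any Hadoop API. The sketch also counts how many key-value pairs leave the map side, since the shuffle/sort phase is typically where most data crosses the network in a real cluster.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def combiner(pairs):
    """Optional local pre-aggregation of one mapper's output (hypothetical helper)."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return sorted(partial.items())


def shuffle_sort(map_outputs):
    """Shuffle/sort phase: merge all mapper outputs, sort by key, group values per key."""
    merged = sorted(
        (pair for output in map_outputs for pair in output),
        key=itemgetter(0),
    )
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(shards, use_combiner=True):
    """Run the simulated job; return the counts and the number of shuffled pairs."""
    map_outputs = []
    for shard in shards:  # in this toy model, each shard is handled by one mapper
        pairs = [pair for line in shard for pair in mapper(line)]
        map_outputs.append(combiner(pairs) if use_combiner else pairs)
    # Pairs leaving the map side: a rough stand-in for network traffic.
    shuffled_pairs = sum(len(output) for output in map_outputs)
    counts = dict(reducer(key, values) for key, values in shuffle_sort(map_outputs))
    return counts, shuffled_pairs


# Toy input: two shards of text lines (made up for this example).
shards = [["the cat sat", "the cat ran"], ["the dog sat"]]
counts, moved = run_job(shards)
print(counts, moved)
```

Calling run_job(shards, use_combiner=False) gives the same counts but moves more pairs through the shuffle, which is the usual reason for adding a combiner when the reduce operation is associative and commutative.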