Class meeting for 10-405 Hadoop Overview

From Cohen Courses
Revision as of 11:09, 5 March 2018 by Wcohen (talk | contribs) (→‎Things to Remember)

This is one of the class meetings on the schedule for the course Machine Learning with Large Datasets 10-405 in Spring 2018.

Slides

Map-reduce overview:

Quiz

Readings for the Class

Things to Remember

  • Hadoop terminology: HDFS, shards, job tracker, combiner, mapper, reducer, ...
  • The primary phases of a map-reduce computation, and what happens in each
    • Map
    • Shuffle/sort
    • Reduce
  • Where data might be transmitted across the network
  • How data is stored in Hadoop
    • Consequences of large block size for streaming and storage efficiency
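
As a study aid for the items above, here is a minimal single-process sketch of the three phases, using word count as the task. It is not Hadoop code and does not use the Hadoop API: the function names (`mapper`, `combiner`, `shuffle_sort`, `reducer`, `run_job`, `blocks_needed`) and the toy input shards are assumptions made up for illustration. The number of pairs handed to the shuffle stands in for data that would cross the network, and the block-size figures are illustrative rather than taken from the course materials.

```python
from collections import Counter
from itertools import groupby


def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield (word.lower(), 1)


def combiner(pairs):
    """Combine: pre-sum counts locally, before anything is shuffled."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffle_sort(mapper_outputs):
    """Shuffle/sort: merge pairs from all mappers, sort them, group by key."""
    all_pairs = sorted(pair for output in mapper_outputs for pair in output)
    return [
        (key, [value for _, value in group])
        for key, group in groupby(all_pairs, key=lambda kv: kv[0])
    ]


def reducer(key, values):
    """Reduce: sum all the counts that arrived for one key."""
    return (key, sum(values))


def run_job(shards, use_combiner):
    """Run map -> (optional combine) -> shuffle/sort -> reduce.

    Returns the final counts and the number of pairs handed to the
    shuffle, a stand-in for the data sent over the network.
    """
    mapper_outputs = []
    for shard in shards:  # in a real cluster, one mapper per shard
        pairs = [pair for line in shard for pair in mapper(line)]
        if use_combiner:
            pairs = combiner(pairs)
        mapper_outputs.append(pairs)
    shuffled_pairs = sum(len(output) for output in mapper_outputs)
    counts = dict(reducer(k, vs) for k, vs in shuffle_sort(mapper_outputs))
    return counts, shuffled_pairs


def blocks_needed(file_bytes, block_bytes):
    """Number of fixed-size blocks needed to hold a file (ceiling division)."""
    return (file_bytes + block_bytes - 1) // block_bytes


# Two toy shards, each a list of lines (hypothetical data).
shards = [
    ["the cat sat", "the cat ran"],
    ["the dog sat"],
]
counts_plain, sent_plain = run_job(shards, use_combiner=False)
counts_comb, sent_comb = run_job(shards, use_combiner=True)

# Illustrative sizes: a 1 GiB file split into 128 MiB versus 4 KiB blocks.
GIB, MIB, KIB = 2**30, 2**20, 2**10
large_blocks = blocks_needed(GIB, 128 * MIB)
small_blocks = blocks_needed(GIB, 4 * KIB)
```

On this toy input both runs produce the same counts, but the combiner cuts the pairs entering the shuffle from 9 to 7, since repeated words within a shard are summed before being sent. The block arithmetic gives 8 blocks at 128 MiB versus 262,144 at 4 KiB: fewer, larger blocks mean less per-block bookkeeping and longer sequential reads, which is the trade-off the last bullet refers to.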