10-601 Big Data

You should know:

Why locality is important in working with large data
What the relative costs of operations are for accessing disk, network, and memory
What the Hadoop Distributed File System (HDFS) is
What the stages of Map-Reduce are: map, shuffle, and reduce
Why combiners are often important in Map-Reduce
What sort of tasks Map-Reduce is well-suited for, and what it's not well-suited for
In outline, how Naive Bayes, or some other counting task, could be implemented in Map-Reduce
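
As a rough illustration of the last few points, here is a minimal single-process sketch of the counting step behind Naive Bayes, written as map, combine, shuffle, and reduce functions. It does not use Hadoop or any real Map-Reduce API. All function names, the `("#docs", label)` key convention, and the toy data format (a list of `(label, text)` pairs) are assumptions made for this example. It only simulates the data flow in memory to show where a combiner reduces the number of pairs that would be sent over the network.

```python
from collections import defaultdict


def map_phase(doc):
    """Map: emit (key, 1) pairs for one labeled document.

    Keys are (label, word) for word counts, plus a hypothetical
    ("#docs", label) key used to count documents per class.
    """
    label, text = doc
    yield (("#docs", label), 1)
    for word in text.lower().split():
        yield ((label, word), 1)


def combine(pairs):
    """Combiner: pre-sum counts locally, before anything is shuffled."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(all_pairs):
    """Shuffle: group every value under its key."""
    groups = defaultdict(list)
    for key, value in all_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the (possibly pre-combined) counts for one key."""
    return key, sum(values)


def run_job(docs, use_combiner=True):
    """Run the simulated job; return the counts and the number of shuffled pairs."""
    shuffled_input = []
    for doc in docs:  # each document stands in for one mapper's input
        pairs = list(map_phase(doc))
        if use_combiner:
            pairs = combine(pairs)
        shuffled_input.extend(pairs)
    groups = shuffle(shuffled_input)
    counts = dict(reduce_phase(k, v) for k, v in groups.items())
    return counts, len(shuffled_input)
```

Calling `run_job` on a small list such as `[("spam", "buy buy now"), ("ham", "see you now")]` returns the per-class word counts and document counts that a Naive Bayes classifier would later turn into probabilities. Running it with `use_combiner=False` gives the same counts but shuffles more pairs whenever a word repeats within a document, which is the saving a combiner provides.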
