Amazon Elastic MapReduce information

From Cohen Courses
Revision as of 12:51, 20 July 2015 by Wcohen

EMR (Elastic MapReduce) is a popular cloud processing service from Amazon that includes Hadoop. Running Guinea Pig on EMR is easy enough, but there are lots of steps. This is a walkthrough.

Setting up your AWS account

1) First you need to get an Amazon AWS account. If you already have an Amazon account, you can use the same login to sign in to AWS at https://console.aws.amazon.com.

Installing and configuring the command-line tool on your local machine

2) Install the tools: You need to establish the credentials for EC2 (the "Elastic Compute Cloud" service that EMR runs its clusters on), and then use them to launch new virtual clusters in EMR. I use a command-line program (aka a "CLI") to do this. So first, install that program, the AWS CLI. The details are in the AWS CLI documentation, but briefly, go to a convenient directory, say ~/code/aws-cli, and type

 % curl https://s3.amazonaws.com/aws-cli/awscli-bundle.zip > awscli-bundle.zip
 % unzip awscli-bundle.zip
 % ./awscli-bundle/install -i `pwd`/install
 % export PATH=$PATH:~/code/aws-cli/install/bin/

To test this, type aws --version at the command prompt.
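
If the version check fails, the usual cause is that the install directory is not on your PATH in the current shell. A minimal sketch of a check, assuming a POSIX shell; the function name have_cmd is made up for illustration and is not part of the AWS CLI:

```shell
# have_cmd is a hypothetical helper, not part of the AWS CLI.
# It succeeds if the named program can be found on the current PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
    aws --version
else
    echo "aws not found; check that ~/code/aws-cli/install/bin is on your PATH" >&2
fi
```

Note that the export command above only affects the shell you typed it in, so a new terminal window will need it again.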

3) Next, you need to get your access key. An "access key" is a pair of codes, a public access key ID and a private secret access key, that the AWS CLI uses to authenticate you to AWS. Follow the directions at https://console.aws.amazon.com/iam/home?#security_credential and save the result somewhere safe and private.
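
One way to keep a saved key private on a Unix-like machine is to make the file readable and writable only by you. This is a sketch under assumptions: the path ~/aws-access-key.txt is a placeholder for wherever you saved the key, and lock_down is a made-up helper name.

```shell
# lock_down is a hypothetical helper: restrict a file to its owner only.
lock_down() {
    chmod 600 "$1"
}

# Placeholder path; use wherever you actually saved your key.
keyfile="$HOME/aws-access-key.txt"
if [ -f "$keyfile" ]; then
    lock_down "$keyfile"
    ls -l "$keyfile"   # the permissions column should start with -rw-------
fi
```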

4) Then you need to tell the AWS CLI about your access codes. The command for this is 'aws configure': you'll be asked for your codes and some other info, and I used these:

 % aws configure
 AWS Access Key ID [None]: ...
 AWS Secret Access Key [None]:  ...
 Default region name [None]: us-east-1
 Default output format [None]: json

The AWS CLI stores this info under the .aws directory in your home directory: the keys go in ~/.aws/credentials, and the region and output format go in ~/.aws/config.
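
To confirm what was saved, you can read the settings back. A minimal sketch, assuming the default file location ~/.aws/config and its INI-style layout (a [default] section with "key = value" lines); ini_get is a made-up helper for illustration, not an AWS command:

```shell
# ini_get is a hypothetical helper, not an AWS command.
# Usage: ini_get FILE SECTION KEY   (prints the value of KEY inside [SECTION])
ini_get() {
    awk -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }   # track the current section header
        in_section && index($0, "=") > 0 {
            name = substr($0, 1, index($0, "=") - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)          # trim spaces around the key
            if (name == key) {
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)     # trim spaces around the value
                print value
                exit
            }
        }
    ' "$1"
}

config="$HOME/.aws/config"
if [ -f "$config" ]; then
    echo "region: $(ini_get "$config" default region)"
    echo "output: $(ini_get "$config" default output)"
fi
```

The CLI can also report a single setting itself with aws configure get region.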