Difference between revisions of "GHC Hadoop cluster information"

From Cohen Courses

Revision as of 12:20, 11 February 2014

Status: all students registered for 10-605 should have accounts on these machines --Wcohen (talk) 11:18, 11 February 2014 (EST)

FQDNs are ghc{01..81}.ghc.andrew.cmu.edu. You can log into any of these.
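As a small illustration of the naming scheme above, the following sketch builds a hostname from a node number. The helper name ghc_host and the use of ssh for logging in are assumptions made for this example, not something the page specifies; ANDREWID is a placeholder for your own Andrew ID.

```shell
# Hypothetical helper: print the FQDN for node number N (1..81).
# Pass a plain integer such as 7, not a zero-padded string such as 07.
ghc_host() {
  printf 'ghc%02d.ghc.andrew.cmu.edu\n' "$1"
}

# Example (assumes ssh access; replace ANDREWID with your own Andrew ID):
# ssh ANDREWID@"$(ghc_host 7)"
```

Any of the 81 machines should work equally well as a login node, so the node number is arbitrary.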

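The NameNode and Map/Reduce admin URLs:

  • http://ghc81.ghc.andrew.cmu.edu:50070/dfshealth.jsp
  • http://ghc81.ghc.andrew.cmu.edu:50030/jobtracker.jsp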

Specs:

  • 25 nodes have 12 Intel Xeon cores @ 3.20GHz, with 12288KB cache and 12GB RAM.
  • 56 nodes have 4 Intel Xeon cores @ 2.67GHz, with 8192KB cache and 12GB RAM.

Anyone can log into these machines with their Andrew account. Registered 10-605 students will also have an HDFS home directory under their Andrew ID (e.g., /user/wcohen). To use Hadoop you need a small amount of setup: below is a working .bashrc file. [Your default shell may or may not be bash - that's just what I use - W].

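# Put the hadoop command on the search path
export PATH=$PATH:/usr/local/hadoop/bin
# Tell Hadoop which Java runtime to use
export JAVA_HOME=/usr/lib/jvm/jre-sun
# Join every jar in /usr/local/hadoop into one colon-separated classpath
export CLASSPATH=`ls -1 /usr/local/hadoop/*.jar|perl -ne 'do{chop;print $sep,$_;$sep=":";}'`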
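The CLASSPATH line above simply joins the Hadoop jar files with colons. For readers who would rather avoid the perl one-liner, here is a minimal pure-shell sketch of the same idea; the function name join_jars is hypothetical and is not part of the original setup.

```shell
# Hypothetical helper: print all .jar files in a directory, joined by ':'.
# The body runs in a subshell, so its variables do not leak into the caller.
join_jars() (
  cp=
  for jar in "$1"/*.jar; do
    [ -e "$jar" ] || continue   # skip the unexpanded pattern when nothing matches
    cp=${cp:+$cp:}$jar          # add a ':' separator only between entries
  done
  printf '%s\n' "$cp"
)

# Possible use in .bashrc, with the directory taken from the setup above:
# export CLASSPATH=$(join_jars /usr/local/hadoop)
```

Either form should give the same list of jars; the shell version only removes the dependency on perl.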

We recommend running a simple job very soon to verify your setup (and to make sure that the permissions and access for your account are set properly). For instance, the following streaming job copies the small rcv1 data from William's HDFS account into yours, splitting it into 10 shards (one per reduce task):

hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.1.jar \
-mapper cat -reducer cat -numReduceTasks 10 -input /user/wcohen/rcv1/small/unsharded -output tmp-output
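Once the job finishes, you may want to check that the output landed in your HDFS home directory. The sketch below assumes the usual Hadoop 1.x convention of one part-NNNNN file per reduce task; the helper expected_parts is hypothetical and only prints the file names you would expect to see.

```shell
# Hypothetical helper: print the output file names expected for N reduce tasks,
# assuming Hadoop's part-NNNNN naming convention.
expected_parts() {
  i=0
  while [ "$i" -lt "$1" ]; do
    printf 'part-%05d\n' "$i"
    i=$((i + 1))
  done
}

# On the cluster, compare "expected_parts 10" against the real listing:
# hadoop fs -ls tmp-output
# hadoop fs -cat tmp-output/part-00000 | head
# Hadoop will not overwrite an existing output directory, so remove it before rerunning:
# hadoop fs -rmr tmp-output
```

If the listing shows the ten part files and the first one contains rcv1 text, your PATH, JAVA_HOME, and HDFS permissions are most likely set up correctly.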