Posts Tagged ‘hadoop pki security’

Hadoop Cluster Setup, SSH Key Authentication

April 20, 2010 5 comments

So you have spent your time in pseudo mode and you have finally started moving to your own cluster?  Perhaps you just jumped right into the cluster setup?  In any case, a distributed Hadoop cluster setup requires your “master” node [name node & job tracker] to be able to SSH (without requiring a password, so key based authentication) to all other “slave” nodes (e.g. data nodes).

The need for SSH Key based authentication is required so that the master node can then login to slave nodes (and the secondary node) to start/stop them, etc.  This is also required to be setup on the secondary name node (which is listed in your masters file) so that [presuming it is running on another machine which is a VERY good idea for a production cluster] will be started from your name node with ./ and job tracker node with ./

Make sure you are the hadoop user for all of these commands.  If you have not yet installed Hadoop and/or created the hadoop user you should do that first.  Depending on your distribution (please follow it’s directions for setup) this will be slightly different (e.g. Cloudera creates the hadoop user for your when going through the rpm install).

First from your “master” node check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P “” -f ~/.ssh/id_dsa
$ cat ~/.ssh/ >> ~/.ssh/authorized_keys

On your master node try to ssh again (as the hadoop user) to your localhost and if you are still getting a password prompt then.

$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys

Now you need to copy (however you want to-do this please go ahead) your public key to all of your “slave” machine (don’t forget your secondary name node).  It is possible (depending on if these are new machines) that the slave’s hadoop user does not have a .ssh directory and if not you should create it ($ mkdir ~/.ssh)

$ scp ~/.ssh/ slave1:~/.ssh/

Now login (as the hadoop user) to your slave machine.  While on your slave machine add your master machine’s hadoop user’s public key to the slave machine’s hadoop authorized key store.

$ cat ~/.ssh/ >> ~/.ssh/authorized_keys

Now, from the master node try to ssh to slave.

$ssh slave1

If you are still prompted for a password (which is most likely) then it is very often just a simple permission issue.  Go back to your slave node again and as the hadoop user run this

$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys

Try again from your master node.

$ssh slave1

And you should be good to go. Repeat for all Hadoop Cluster Nodes.


Joe Stein

Categories: Cluster Tags: