Hadoop Cluster Setup, SSH Key Authentication

So you have spent your time in pseudo-distributed mode and are finally moving to your own cluster? Or perhaps you jumped right into the cluster setup? In any case, a distributed Hadoop cluster requires your “master” node (name node and job tracker) to be able to SSH to all “slave” nodes (e.g. data nodes) without a password, which means key-based authentication.

Key-based SSH authentication is required so that the master node can log in to the slave nodes (and the secondary name node) to start and stop them. It must also be set up for the secondary name node (which is listed in your masters file) so that, presuming it runs on another machine (a VERY good idea for a production cluster), it can be started from your name node with ./start-dfs.sh, while ./start-mapred.sh is run from the job tracker node.

Make sure you are the hadoop user for all of these commands. If you have not yet installed Hadoop and/or created the hadoop user, do that first. The details differ slightly between distributions, so follow your distribution’s setup directions (e.g. Cloudera creates the hadoop user for you during the rpm install).

First, from your “master” node, check that you can ssh to localhost without a passphrase:

$ ssh localhost
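
If you would rather script this check than watch for a prompt, something like the sketch below may help. It assumes OpenSSH, whose BatchMode=yes option makes ssh fail instead of asking for a password. The function name check_passwordless and the SSH_CMD override are illustrative names of my own, not part of Hadoop or OpenSSH.

```shell
# Sketch: report whether key-based login to a host works without any prompt.
# Assumes OpenSSH. check_passwordless and SSH_CMD are hypothetical names.
check_passwordless() {
    host="$1"
    # BatchMode=yes disables password/passphrase prompts, so ssh exits non-zero
    # instead of waiting for input; ConnectTimeout bounds unreachable hosts.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "OK: $host"
    else
        echo "FAIL: $host"
        return 1
    fi
}
```

Running check_passwordless localhost as the hadoop user should print an OK line once the key is in place. A host whose key is not yet in known_hosts may also report FAIL, since ssh cannot ask for confirmation in batch mode.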

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

On your master node, try to ssh to localhost again (as the hadoop user). If you are still getting a password prompt, tighten the permissions on your home directory and key files:

$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys
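
To see which path is the culprit before changing anything, a small check along these lines can be handy. It is only a sketch: check_ssh_perms is a hypothetical helper name, and it assumes the usual sshd StrictModes behavior of rejecting keys when the home directory, .ssh, or authorized_keys is writable by group or others.

```shell
# Sketch: flag group- or world-writable paths that sshd's StrictModes check
# would typically reject. check_ssh_perms is a hypothetical helper name.
check_ssh_perms() {
    home_dir="${1:-$HOME}"
    rc=0
    for p in "$home_dir" "$home_dir/.ssh" "$home_dir/.ssh/authorized_keys"; do
        [ -e "$p" ] || { echo "MISSING: $p"; rc=1; continue; }
        # ls -ld prints a mode string such as drwxr-xr-x; character 6 is the
        # group write bit and character 9 is the write bit for others.
        mode=$(ls -ld "$p" | cut -c1-10)
        case "$mode" in
            ?????w*|????????w*) echo "TOO OPEN: $mode $p"; rc=1 ;;
            *)                  echo "OK: $mode $p" ;;
        esac
    done
    return "$rc"
}
```

Called with no argument it inspects the current user's home directory; any TOO OPEN line points at a path the chmod commands above would fix.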

Now you need to copy your public key to all of your “slave” machines (don’t forget your secondary name node); use whatever method you prefer. If these are new machines, the slave’s hadoop user may not have a .ssh directory yet, in which case you should create it ($ mkdir ~/.ssh).

$ scp ~/.ssh/id_dsa.pub slave1:~/.ssh/master.pub

Now log in (as the hadoop user) to your slave machine and add the master’s hadoop user’s public key to the authorized_keys file of the slave’s hadoop user:

$ cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
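
Running that cat more than once leaves duplicate entries in authorized_keys. If you expect to repeat the step, a guarded append such as the sketch below avoids that. The helper name add_authorized_key is hypothetical, and the sketch assumes the public key file holds a single line, as the files written by ssh-keygen do.

```shell
# Sketch: append a public key to authorized_keys only if that exact line is
# not already present. add_authorized_key is a hypothetical helper name.
add_authorized_key() {
    pub_key_file="$1"
    auth_file="${2:-$HOME/.ssh/authorized_keys}"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x matches whole lines, -F treats the key as a fixed string (not a
    # regex), and -f reads the pattern from the public key file.
    if grep -qxF -f "$pub_key_file" "$auth_file"; then
        echo "already present"
    else
        cat "$pub_key_file" >> "$auth_file"
        echo "added"
    fi
}
```

On the slave, add_authorized_key ~/.ssh/master.pub would then stand in for the plain cat.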

Now, from the master node, try to ssh to the slave:

$ ssh slave1

If you are still prompted for a password (which is most likely), it is very often just a simple permission issue. Go back to your slave node and, as the hadoop user, run:

$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys

Try again from your master node:

$ ssh slave1

And you should be good to go. Repeat for all Hadoop Cluster Nodes.
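
With more than a handful of nodes, repeating this by hand gets tedious. The sketch below loops over a host list and, by default, only prints the command it would run for each host. The names push_key_to_hosts and DRY_RUN are my own, and the hosts file is assumed to hold one hostname per line (the same layout as the masters file mentioned earlier). Each real run will still ask for the hadoop user's password once per host, since the key is not installed yet.

```shell
# Sketch: print (or run) the key-install command for every host in a list.
# push_key_to_hosts and DRY_RUN are hypothetical names; the hosts file is
# assumed to contain one hostname per line.
push_key_to_hosts() {
    hosts_file="$1"
    pub_key="${2:-$HOME/.ssh/id_dsa.pub}"
    # Skip blank lines and comment lines, then read one hostname per line.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$hosts_file" |
    while read -r host; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh $host 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < $pub_key"
        else
            # stdin is redirected from the key file, so ssh does not swallow
            # the rest of the host list feeding the loop.
            ssh "$host" 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < "$pub_key"
        fi
    done
}
```

Review the dry-run output first; setting DRY_RUN=0 runs the commands for real. The permission fixes above may still be needed on individual slaves afterwards.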

Joe Stein

Categories: Cluster
  1. RahulRinayat
    July 6, 2010 at 4:51 am

    Very Very Thanks for this wonderful tutorial. Keep Uploading the this type of documents. Bye Bye

  2. Chris
    July 16, 2010 at 11:41 am

    I’m being asked to setup hadoop in an environment where key-based ssh logins aren’t allowed for application accounts for audit reasons. is there any way around this requirement?

  3. October 9, 2010 at 7:14 am

    thanks nice blog 😉

  4. April 9, 2013 at 6:29 pm

    Wow! Thank you! I always wanted to write on my blog something like that. Can I implement a fragment of your post to my site?
