Home > Cluster, Hadoop > Tips, Tricks And Pointers When Setting Up Your First Hadoop Cluster To Run Map Reduce Jobs

Tips, Tricks And Pointers When Setting Up Your First Hadoop Cluster To Run Map Reduce Jobs

So you are ready now for your first real cluster?  Have you gone through all the stand alone and pseudo setup of Hadoop?  If not you could/should get familiar http://hadoop.apache.org/common/docs/current/quickstart.html.  The Cloudera distribution has a nice walk through http://archive.cloudera.com/docs/cdh.html.  Cloudera also has great training videos online http://www.cloudera.com/resources/?type=Training.  Also, if you have not already read the Hadoop bible you should (that would be Tom White’s Definitive Guide to Hadoop).

So now that you are setting up your cluster and looking to run some MR (Map Reduce) jobs on it there are a few points of interest to note that deviate from default settings and nuances.  Granted you would eventually end up learning these over a little bit of pain, trial & error, etc but I figure to share the knowledge for what is pretty much run of the mill stuff.

The first episode of our Podcast is also on this topic.

SSH Authentication

In a cluster, the namenode needs to communicate with all the other nodes.  I have already blogged on this previously http://wp.me/pTu1i-29 but it is important to getting you going so I wanted to mention it here again.

Namenode Configuration Changes

The image from your name node is the meta data for knowing where all of the blocks for all of your files in HDFS across your entire cluster are stored.  If you lose the image file your data is NOT recoverable (as the nodes will have no idea what block is for what file).  No fear you can just save a copy to an NFS share.  After you have mounted it then update your hdfs-site.xml configuration.  The property dfs.name.dir takes a comma separated list of directories to save your image to, add it.

Max tasks can be setup on the Namenode or overridden (and set final) on data nodes that may have different hardware configurations.  The max tasks are setup for both mappers and reducers.  To calculate this it is based on CPU (cores) and the amount of RAM you have and also the JVM max you setup in mapred.child.java.opts (the default is 200).  The Datanode and Tasktracker each are set to 1GB so for a 8GB machine the mapred.tasktracker.map.tasks.maximum could be set to 7  and the mapred.tasktracker.reduce.tasks.maximum set to 7 with the mapred.child.java.opts set to -400Xmx (assuming 8 cores).  Please note these task maxes are as much done by your CPU if you only have 1 CPU with 1 core then it is time to get new hardware for your data node or set the mask tasks to 1.  If you have 1 CPU with 4 cores then setting map to 3 and reduce to 3 would be good (saving 1 core for the daemon).

By default there is only one reducer and you need to configure mapred.reduce.tasks to be more than one.  This value should be somewhere between .95 and 1.75 times the number of maximum tasks per node times the number of data nodes.  So if you have 3 data nodes and it is setup max tasks of 7 then configure this between 25 and 36.

As of version 0.19 Hadoop is able to reuse the JVM for multiple tasks.  This is kind of important to not waste time starting up and tearing down the JVM itself and is configurable here mapred.job.reuse.jvm.num.tasks.

Another good configuration addition is mapred.compress.map.output.  Setting this value to true should (balance between the time to compress vs transfer) speed up the reducers copy greatly especially when working with large data sets.

Here are some more configuration for real world cluster setup,  the docs are good too http://hadoop.apache.org/common/docs/current/cluster_setup.html

  • Some non-default configuration values used to run sort900, that is 9TB of data sorted on a cluster with 900 nodes:
    Configuration File Parameter Value Notes
    conf/hdfs-site.xml dfs.block.size 134217728 HDFS blocksize of 128MB for large file-systems.
    conf/hdfs-site.xml dfs.namenode.handler.count 40 More NameNode server threads to handle RPCs from large number of DataNodes.
    conf/mapred-site.xml mapred.reduce.parallel.copies 20 Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.
    conf/mapred-site.xml mapred.child.java.opts -Xmx512M Larger heap-size for child jvms of maps/reduces.
    conf/core-site.xml fs.inmemory.size.mb 200 Larger amount of memory allocated for the in-memory file-system used to merge map-outputs at the reduces.
    conf/core-site.xml io.sort.factor 100 More streams merged at once while sorting files.
    conf/core-site.xml io.sort.mb 200 Higher memory-limit while sorting data.
    conf/core-site.xml io.file.buffer.size 131072 Size of read/write buffer used in SequenceFiles
  • Updates to some configuration values to run sort1400 and sort2000, that is 14TB of data sorted on 1400 nodes and 20TB of data sorted on 2000 nodes:
    Configuration File Parameter Value Notes
    conf/mapred-site.xml mapred.job.tracker.handler.count 60 More JobTracker server threads to handle RPCs from large number of TaskTrackers.
    conf/mapred-site.xml mapred.reduce.parallel.copies 50
    conf/mapred-site.xml tasktracker.http.threads 50 More worker threads for the TaskTracker’s http server. The http server is used by reduces to fetch intermediate map-outputs.
    conf/mapred-site.xml mapred.child.java.opts -Xmx1024M Larger heap-size for child jvms of maps/reduces.

Secondary Namenode

You may have read or heard already a dozen times that the Secondary Namenode is NOT a backup of the Namenode.  Well, let me say it again for you.  The Secondary Namenode is not the backup of the Namenode.  The Secondary Namenode should also not run on the same machine as the Namenode.  The Secondary Namenode helps to keep the edit log of the of the Namenode updated in the image file.  All of the changes that happen in HDFS are managed in memory on the Namenode.  It writes the changes to a log file that the secondary name node will compact into a new image file.  This is helpful because on restart the Namenode would have to-do this and also helpful because the logs would just grow and could be normalized.

Now, the way that the Secondary Namenode does this is with GET & POST (yes over http) to the Namenode.  It will GET the logs and POST the image.  In order for the Secondary Namenode to-do this you have to configure dfs.http.address in the hdfs-site.xml on the server that you plan to run the Secondary Namenode on.

<property>
<name>dfs.http.address</name>
<value>namenode:50070</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>

Also, you need to put the Secondary Namenode into the masters file (which is defaulted to localhost).

Datanode Configuration Changes & Hardware

So here I only have to say that you should use JBOD (Just a Bunch Of Disks).  There have been tests showing 10%-30% degradation when HDFS blocks run on RAID.  So, when setting up your data nodes if you have a bunch of disks you can configure each one for that node to write to dfs.data.dir.

If you want you can also set a value that will always leave some space on the disk dfs.datanode.du.reserved.

Map Reduce Code Changes

In your Java code there is a little trick to help the job be “aware” within the cluster of tasks that are not dead but just working hard.  During execution of a task there is no built in reporting that the job is running as expected if it is not writing out.  So this means that if your tasks are taking up a lot of time doing work it is possible the cluster will see that task as failed (based on the mapred.task.tracker.expiry.interval setting).

Have no fear there is a way to tell cluster that your task is doing just fine.  You have 2 ways todo this you can either report the status or increment a counter.  Both of these will cause the task tracker to properly know the task is ok and this will get seen by the jobtracker in turn.  Both of these options are explained in the JavaDoc http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
*/

About these ads
Categories: Cluster, Hadoop Tags:
  1. Doug Balog
    May 3, 2010 at 5:28 pm

    Don’t forget to set dfs.hosts.exclude, then when you have a hardware failure, you can add the dead node to the exclude file and run `hadoop dfsadmin -refreshnodes`
    If you don’t set dfs.hosts.exclude, you will need to restart the namenode to properly remove the dead node.
    See http://wiki.apache.org/hadoop/FAQ#A17

  2. vikash kumar
    December 26, 2010 at 1:59 pm

    I have 1 master and 11 slaves that i am setting up with cloudera hadoop cdh3.
    Now after going through all the required configurations, i am stuck at the part where the namenode web interface shows only 2 nodes online, i read different forums that says by default it servers only 2 datanodes please confirm what and in which file in need to append changes to handle 12 datanodes.

  3. May 19, 2013 at 6:30 am

    I have been browsing on-line greater than 3
    hours today, but I by no means discovered any interesting article like yours.
    It is lovely price enough for me. In my view, if all
    webmasters and bloggers made excellent content as you probably did, the internet will be much
    more helpful than ever before.

  4. May 28, 2013 at 5:39 am

    Hello there, just became alert to your blog through
    Google, and found that it is really informative. I am going to
    watch out for brussels. I will be grateful if you
    continue this in future. Numerous people will be benefited
    from your writing. Cheers!

  5. June 20, 2013 at 10:39 pm

    I have read so many articles or reviews concerning the blogger lovers except this article is really a pleasant piece of writing, keep it up.

  6. June 29, 2013 at 12:57 pm

    I blog frequently and I genuinely appreciate your information.
    Your article has truly peaked my interest. I’m going to take a note of your site and keep checking for new information about once per week. I subscribed to your RSS feed as well.

  1. January 6, 2012 at 8:36 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 53 other followers

%d bloggers like this: