<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>All Things Hadoop</title>
	<atom:link href="http://allthingshadoop.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://allthingshadoop.com</link>
	<description>Scalable &#38; Distributed Computing for noobs, nerds and the elite Hadooper and Hadooperette.</description>
	<lastBuildDate>Wed, 30 Nov 2011 18:24:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='allthingshadoop.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/c6d1ce6389fbc4c5c50fe33c968530fc?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>All Things Hadoop</title>
		<link>http://allthingshadoop.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://allthingshadoop.com/osd.xml" title="All Things Hadoop" />
	<atom:link rel='hub' href='http://allthingshadoop.com/?pushpress=hub'/>
		<item>
		<title>Faster Datanodes with less wait io using df instead of du</title>
		<link>http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/</link>
		<comments>http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/#comments</comments>
		<pubDate>Sat, 21 May 2011 04:37:22 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=446</guid>
		<description><![CDATA[I have noticed often that the check Hadoop uses to calculate usage for the data nodes causes a fair amount of wait io on them driving up load. Every cycle we can get from every spindle we want! So I came up with a nice little hack to use df instead of du. Here is [...]]]></description>
			<content:encoded><![CDATA[<p>I have often noticed that the check Hadoop uses to calculate disk usage on the datanodes causes a fair amount of I/O wait on them, driving up load.</p>
<p>Every cycle we can get from every spindle we want!</p>
<p>So I came up with a nice little hack to use df instead of du.</p>
<p>Here is basically what I did so you can do it too.</p>
<p><code><br />
mv /usr/bin/du /usr/bin/bak_du<br />
vi /usr/bin/du </code></p>
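<p>and save this inside of it:<br />
<code><br />
#!/bin/sh<br />
mydf=$(df -Pk "$2" | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $3 }')<br />
printf '%s\t%s\n' "$mydf" "$2"<br />
</code></p>
<p>Using df -Pk keeps each filesystem on a single line and reports 1K blocks, matching the du -sk that Hadoop calls, and printf avoids relying on echo -e, which some /bin/sh implementations (such as dash) do not support.</p>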
<p>then give it execute permission<br />
<code><br />
chmod a+x /usr/bin/du<br />
</code></p>
<p>Restart your datanode, check the log for errors and make sure it starts back up.</p>
<p>Voila!</p>
<p>Now when Hadoop calls &#8220;du -sk /yourhdfslocation&#8221; it gets its result back quickly instead of waiting for du to walk every block file.</p>
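<p>A quick way to sanity-check the new script before restarting is to parse its output roughly the way Hadoop does. My understanding (an assumption worth checking against your version&#8217;s source) is that Hadoop&#8217;s <code>org.apache.hadoop.fs.DU</code> runs <code>du -sk</code> on the data directory, splits the first output line on a tab and multiplies the first field by 1024. The Python sketch below only imitates that parse; <code>parse_du_output</code> is a name I made up, not a Hadoop API.</p>
<p><pre class="brush: python; gutter: false;">
def parse_du_output(output):
    """Imitate (approximately) how Hadoop reads 'du -sk DIR' output.

    Assumption: the first line is split on a tab and the first field is a
    size in kilobytes, which gets converted to bytes.
    """
    lines = output.splitlines()
    if not lines:
        raise ValueError('expecting a line, got empty output')
    first_field = lines[0].split('\t')[0]
    # int() raises ValueError if the field is not a plain number
    return int(first_field) * 1024
</pre></p>
<p>For example, pass it the output of <code>/usr/bin/du -sk /yourhdfslocation</code>. A <code>ValueError</code> means the first field is not a plain number; one way that can happen is <code>echo -e</code> under a /bin/sh such as dash, whose echo prints the <code>-e</code> literally.</p>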
<p>What&#8217;s wrong with this?</p>
<p>1) I assume you are not storing anything else on your disks, so df is really close to du since almost all of your data is in HDFS.</p>
<p>2) If you have more than one volume holding your HDFS blocks, this is not exactly accurate when one volume&#8217;s result ends up being used for the others, which skews the size of each volume. This is simple to fix: parse your df result so that it uses the path passed in as the second parameter to pick which volume to report. Your first volume is most likely going to be larger anyway, and you should be monitoring disk space another way, so it is not going to be very harmful if you just check and report the first volume&#8217;s size.</p>
<p>3) You might not have your HDFS blocks on your first volume &#8230; see #2: you can just grep for the volume you want to report.</p>
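<p>For #2 and #3, one option is to ask the operating system about the filesystem holding each path Hadoop passes in, instead of parsing df text. The Python 3 sketch below is an alternative to the shell script, not what I run in production; it assumes Hadoop invokes it exactly as <code>du -sk DIR</code>, and, like df, it reports the used space of the whole volume, not just the HDFS blocks. <code>fake_du_line</code> is a hypothetical name.</p>
<p><pre class="brush: python; gutter: false;">
#!/usr/bin/env python3
import shutil
import sys


def fake_du_line(path):
    """Return a 'du -sk'-style line: used 1K blocks, a tab, then the path.

    shutil.disk_usage() reports on the filesystem that contains `path`, so a
    datanode with several data directories gets each volume's own numbers.
    Like df's Used column, this is the whole volume, not just HDFS blocks.
    """
    used_kb = shutil.disk_usage(path).used // 1024
    return '%d\t%s' % (used_kb, path)


# Assumption: Hadoop calls us exactly as "du -sk DIR"; any other argument
# shape is ignored here (the real du was moved to /usr/bin/bak_du).
if __name__ == '__main__' and len(sys.argv) == 3:
    print(fake_du_line(sys.argv[2]))
</pre></p>
<p>Saved as /usr/bin/du with execute permission it would stand in for the shell version; <code>shutil.disk_usage</code> needs Python 3.3 or later.</p>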
<p>/*<br />
Joe Stein<br />
<a href="http://www.linkedin.com/in/charmalloc" target="_blank">http://www.linkedin.com/in/charmalloc</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloudera, Yahoo and the Apache Hadoop Community Security Branch Release Update</title>
		<link>http://allthingshadoop.com/2011/05/05/cloudera-yahoo-and-the-apache-hadoop-community-security-branch-release-update/</link>
		<comments>http://allthingshadoop.com/2011/05/05/cloudera-yahoo-and-the-apache-hadoop-community-security-branch-release-update/#comments</comments>
		<pubDate>Fri, 06 May 2011 02:03:39 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Open Source Projects]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=423</guid>
		<description><![CDATA[In the wake of Yahoo! having announced that they would discontinue their Hadoop distribution and focus their efforts into Apache Hadoop http://yhoo.it/i9Ww8W the landscape has become tumultuous. Yahoo! engineers have spent their time and effort contributing back to the Apache Hadoop security branch (branch of 0.20) and have proposed release candidates. Currently being voted and [...]]]></description>
			<content:encoded><![CDATA[<p>In the wake of Yahoo! having announced that they would discontinue their Hadoop distribution and focus their efforts into Apache Hadoop <a target="_blank" href="http://yhoo.it/i9Ww8W">http://yhoo.it/i9Ww8W</a> the landscape has become tumultuous.</p>
<p>Yahoo! engineers have spent their time and effort contributing back to the Apache Hadoop security branch (branch of 0.20) and have proposed release candidates.  </p>
<p>Currently being voted on and discussed is &#8220;Release candidate 0.20.203.0-rc1&#8221;. If you are following the VOTE and DISCUSSION threads, maybe you are like me: it just cannot be done without a bowl of popcorn before opening the emails. It is getting heated, in a good and constructive kind of way: <a target="_blank" href="http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/thread">http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/thread</a>. There are already more emails in 5 days of May than there were in all of April. woot!</p>
<p>My take? Has it become Cloudera vs Yahoo!, and will Apache Hadoop releases become fragmented because of it? Well, it is kind of like that already. 0.21 is the latest, but can anyone who is not a committer quickly find out the difference between it and the other release branches? It is esoteric <img src='http://s0.wp.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />  0.22, which is a release from trunk, is right around the corner too.</p>
<p>Let&#8217;s take HBase as an example (a Hadoop project). Do you know which HDFS releases can support HBase in production without losing data? If you do, then maybe you don&#8217;t realize that many people still don&#8217;t even know about the append branch. Now that CDH3 is out you can use that (thanks Cloudera!); otherwise it is highly recommended not to run HBase in production unless you use the append branch <a target="_blank" href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/</a> of 0.20, which means missing out on other changes in trunk releases&#8230;</p>
<p>__ eyes crossing inwards and sideways with what branch does what and when the trunk release has everything __</p>
<p>Hadoop is becoming à la carte: either work out which features and fixes you can live without to get what you really need deployed &#8230; or hire a committer &#8230; or hire a bunch of folks who do nothing but Hadoop day in and day out (sounds like Oracle, ahhhhhh)&#8230; or go with the Cloudera Distribution (which is what I do, and I don&#8217;t look back). The barrier to entry feels like it has increased over the last year. However, stepping back from that, the system overall has had a lot of improvements! A lot of great work by a lot of dedicated folks putting in their time and effort towards making Hadoop (in whatever form the elephant stampedes through its data) a reality.</p>
<p>Big shops that have teams of &#8220;Hadoop Engineers&#8221; (Yahoo, Facebook, eBay, LinkedIn, etc) with contributors and/or committers on the team should not feel much impact, because ultimately they can roll their own releases with whatever they need or want in production and just support them themselves. Not all are so endowed.</p>
<p>Now, all of that having been said, I write this because the discussion is REALLY good and has a lot of folks (including those from Yahoo! and Cloudera) bringing up pain points and proposing some great solutions that hopefully will contribute to the continued growth and success of the Apache Hadoop Community <a href="http://hadoop.apache.org/" target="_blank">http://hadoop.apache.org/</a>&#8230;. Still, if you want to run it in your company (and don&#8217;t have a committer on staff), go download CDH3 <a href="http://www.cloudera.com" target="_blank">http://www.cloudera.com</a>; it will get you going with the latest and greatest of all the releases, branches, etc, etc, etc. Great documentation too!</p>
<p>/*<br />
Joe Stein<br />
<a target="_blank" href="http://www.linkedin.com/in/charmalloc">http://www.linkedin.com/in/charmalloc</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2011/05/05/cloudera-yahoo-and-the-apache-hadoop-community-security-branch-release-update/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop Streaming Made Simple using Joins and Keys with Python</title>
		<link>http://allthingshadoop.com/2010/12/16/simple-hadoop-streaming-tutorial-using-joins-and-keys-with-python/</link>
		<comments>http://allthingshadoop.com/2010/12/16/simple-hadoop-streaming-tutorial-using-joins-and-keys-with-python/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 00:20:46 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=355</guid>
		<description><![CDATA[There are a lot of different ways to write MapReduce jobs!!! I find streaming scripts a good way to interrogate data sets (especially when I have not worked with them yet or are creating new ones) and enjoy the lifecycle when the initial elaboration of the data sets lead to the construction of the finalized [...]]]></description>
			<content:encoded><![CDATA[<p>There are a lot of different ways to write MapReduce jobs!!!</p>
<p>I find streaming scripts a good way to interrogate data sets (especially ones I have not worked with yet or am creating) and I enjoy the lifecycle in which the initial elaboration of the data sets leads to the construction of the finalized scripts for an entire job (or series of jobs, as is often the case).</p>
<p>When doing streaming with Hadoop you do have a few library options.  If you are a Ruby programmer then <a target="_blank" href="http://mrflip.github.com/wukong/moreinfo.html">wukong</a> is awesome! For Python programmers you can use <a target="_blank" href="https://github.com/klbostee/dumbo/wiki">dumbo</a> and more recently released <a target="_blank" href="http://engineeringblog.yelp.com/2010/10/mrjob-distributed-computing-for-everybody.html">mrjob</a>.  </p>
<p>I like working under the hood myself, getting down and dirty with the data, and here is how you can too.</p>
<p>Let&#8217;s start by defining two simple sample data sets.</p>
<p>Data set 1:  <strong>countries.dat</strong></p>
<p>name|key</p>
<p><pre class="brush: plain; gutter: false;">
United States|US
Canada|CA
United Kingdom|UK
Italy|IT
</pre></p>
<p>Data set 2: <strong>customers.dat</strong></p>
<p>name|type|country<br />
<pre class="brush: plain; gutter: false;">
Alice Bob|not bad|US
Sam Sneed|valued|CA
Jon Sneed|valued|CA
Arnold Wesise|not so good|UK
Henry Bob|not bad|US
Yo Yo Ma|not so good|CA
Jon York|valued|CA
Alex Ball|valued|UK
Jim Davis|not so bad|JA
</pre></p>
<p><strong>The requirements:</strong> find out, grouped by type of customer, how many customers of each type are in each country, showing the country name from countries.dat in the final result (and not the two-letter country code).</p>
<p><strong>To do this you need to:</strong></p>
<pre>
1) Join the data sets
2) Key on country
3) Count type of customer per country
4) Output the results</pre>
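<p>Before splitting the work between a mapper and a reducer, it can help to have a tiny in-memory version of the same four steps to check the streaming output against. This Python 3 sketch is my own illustration, not part of the job: it loads countries.dat into a dict (fine for four rows, but exactly the assumption you cannot make at Hadoop scale, which is why the real job joins through the sort), and <code>count_by_country_and_type</code> and <code>format_rows</code> are made-up names.</p>
<p><pre class="brush: python; gutter: false;">
from collections import Counter


def count_by_country_and_type(country_lines, customer_lines):
    """Join customers to countries in memory, then count per (country, type).

    country_lines:  'name|code' strings, as in countries.dat
    customer_lines: 'name|type|code' strings, as in customers.dat
    """
    # Steps 1 and 2: a lookup table keyed on the two-letter code.
    names = {}
    for line in country_lines:
        line = line.strip()
        if line:
            name, code = line.split('|')
            names[code] = name

    # Step 3: count customer types per country, keeping unmatched codes.
    counts = Counter()
    for line in customer_lines:
        line = line.strip()
        if not line:
            continue
        _person, ptype, code = line.split('|')
        country = names.get(code, '%s - Unknown Country' % code)
        counts[(country, ptype)] += 1
    return counts


def format_rows(counts):
    # Step 4: tab-separated rows, sorted by country name and then type.
    return ['%s\t%s\t%d' % (c, t, n) for (c, t), n in sorted(counts.items())]
</pre></p>
<p>Fed the two sample files, it should produce the same six counts as the streaming job.</p>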
<p>So first let&#8217;s code up a quick mapper called <strong>smplMapper.py</strong> (you can decide if smpl is short for simple or sample).</p>
<p>The basics of coding a mapper and reducer in Python are explained nicely here <a target="_blank" href="http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/">http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/</a>, but I am going to dive a bit deeper and tackle our example with a few more tactics.</p>
<p><pre class="brush: python; gutter: false;">
#!/usr/bin/env python

import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
	try: #sometimes bad data can cause errors use this how you like to deal with lint and bad data
        
		personName = &quot;-1&quot; #default sorted as first
		personType = &quot;-1&quot; #default sorted as first
		countryName = &quot;-1&quot; #default sorted as first
		country2digit = &quot;-1&quot; #default sorted as first
		
		# remove leading and trailing whitespace
		line = line.strip()
	 	
		splits = line.split(&quot;|&quot;)
		
		if len(splits) == 2: #country data
			countryName = splits[0]
			country2digit = splits[1]
		else: #people data
			personName = splits[0]
			personType = splits[1]
			country2digit = splits[2]			
		
		print('%s^%s^%s^%s' % (country2digit,personType,personName,countryName)) #single-argument print() works in Python 2 and 3
	except Exception: #errors are going to make your job fail which you may or may not want
		pass

</pre></p>
<p><strong>Don&#8217;t forget:</strong></p>
<p><pre class="brush: plain; gutter: false;">
chmod a+x smplMapper.py
</pre></p>
<p>Great! That takes care of #1; now it is time to test and see what is going to the reducer.</p>
<p><strong>From the command line run:</strong></p>
<p><pre class="brush: plain; gutter: false;">
cat customers.dat countries.dat|./smplMapper.py|sort
</pre></p>
<p><strong>Which will result in:</strong></p>
<p><pre class="brush: plain; gutter: false;">
CA^-1^-1^Canada
CA^not so good^Yo Yo Ma^-1
CA^valued^Jon Sneed^-1
CA^valued^Jon York^-1
CA^valued^Sam Sneed^-1
IT^-1^-1^Italy
JA^not so bad^Jim Davis^-1
UK^-1^-1^United Kingdom
UK^not so good^Arnold Wesise^-1
UK^valued^Alex Ball^-1
US^-1^-1^United States
US^not bad^Alice Bob^-1
US^not bad^Henry Bob^-1
</pre></p>
<p>Notice how this is sorted so the country is first and the people in that country after it (so we can grab the correct country name as we loop) and with the type of customer also sorted (but within country) so we can properly count the types within the country. =8^)</p>
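<p>The &#8220;-1&#8221; defaults are what put each country&#8217;s own line first: in a plain byte-wise sort, &#8216;-&#8217; (0x2D) comes before any digit or letter. Python compares strings by code point, which matches that ordering for ASCII data, so this small sketch shows the effect; <code>mapper_line</code> is a helper I made up to mirror the mapper&#8217;s print format. A locale-aware <code>sort</code> on the command line may treat punctuation differently, so running it with <code>LC_ALL=C</code> is the safer way to mimic the byte-wise sort Hadoop does on its keys.</p>
<p><pre class="brush: python; gutter: false;">
def mapper_line(country2digit, person_type='-1', person_name='-1', country_name='-1'):
    """Build one line in the same ^-separated shape smplMapper.py prints."""
    return '%s^%s^%s^%s' % (country2digit, person_type, person_name, country_name)


lines = [
    mapper_line('CA', 'valued', 'Jon York'),
    mapper_line('CA', 'not so good', 'Yo Yo Ma'),
    mapper_line('CA', country_name='Canada'),
]

# str comparison is by code point: '-' sorts before letters, so the
# country mapping line comes first, followed by people sorted by type.
for line in sorted(lines):
    print(line)
</pre></p>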
<p>Let us hold off on #2 for a moment (just hang in there it will all come together soon I promise) and get <strong>smplReducer.py</strong> working first.</p>
<p><pre class="brush: python; gutter: false;">
#!/usr/bin/env python
 
import sys
 
# running state: the current (country name, type) key and how many people it has
foundKey = &quot;&quot;
foundValue = &quot;&quot;
isFirst = 1
currentCount = 0
currentCountry2digit = &quot;-1&quot;
currentCountryName = &quot;-1&quot;
isCountryMappingLine = False

# input comes from STDIN
for line in sys.stdin:
	# remove leading and trailing whitespace
	line = line.strip()
	
	try:
		# parse the input we got from mapper.py
		country2digit,personType,personName,countryName = line.split('^')
		
		#the first line should be a mapping line, otherwise we need to set the currentCountryName to not known
		if personName == &quot;-1&quot;: #this is a new country which may or may not have people in it
			currentCountryName = countryName
			currentCountry2digit = country2digit
			isCountryMappingLine = True
		else:
			isCountryMappingLine = False # this is a person we want to count
		
		if not isCountryMappingLine: #we only want to count people but use the country line to get the right name 

			#first check to see if the 2digit country info matches up, might be unknown country
			if currentCountry2digit != country2digit:
				currentCountry2digit = country2digit
				currentCountryName = '%s - Unkown Country' % currentCountry2digit
			
			currentKey = '%s\t%s' % (currentCountryName,personType) 
			
			if foundKey != currentKey: #new combo of keys to count
				if isFirst == 0:
					print('%s\t%s' % (foundKey,currentCount))
					currentCount = 0 #reset the count
				else:
					isFirst = 0
			
				foundKey = currentKey #make the found key what we see so when we loop again can see if we increment or print out
			
			currentCount += 1 # we increment anything not in the map list
	except:
		pass

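# print the last key, but only if at least one person line was counted
# (a reducer that only received country lines would otherwise print an empty key)
if isFirst == 0:
	print('%s\t%s' % (foundKey,currentCount))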

</pre></p>
<p><strong>Don&#8217;t forget:</strong></p>
<p><pre class="brush: plain; gutter: false;">
chmod a+x smplReducer.py
</pre></p>
<p><strong>And then run:</strong></p>
<p><pre class="brush: plain; gutter: false;">
cat customers.dat countries.dat|./smplMapper.py|sort|./smplReducer.py
</pre></p>
<p>And voila!</p>
<p><pre class="brush: plain; gutter: false;">
Canada	not so good	1
Canada	valued	3
JA - Unkown Country	not so bad	1
United Kingdom	not so good	1
United Kingdom	valued	1
United States	not bad	2
</pre></p>
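<p>If the flag-and-counter bookkeeping in smplReducer.py is hard to follow, the same idea can be expressed with <code>itertools.groupby</code>, since the reducer&#8217;s input arrives sorted. This is an alternative sketch in Python 3, not the reducer used above; <code>reduce_lines</code> is a hypothetical name, and it assumes the same ^-separated, four-field mapper output with &#8220;-1&#8221; placeholders.</p>
<p><pre class="brush: python; gutter: false;">
import itertools


def reduce_lines(lines):
    """Yield 'country name TAB type TAB count' rows from sorted mapper output."""
    # strip() also drops a trailing tab that streaming may add after an empty value
    rows = (line.strip().split('^') for line in lines if line.strip())
    # After the sort, every line for one two-letter code is adjacent.
    for code, group in itertools.groupby(rows, key=lambda r: r[0]):
        country_name = '%s - Unknown Country' % code
        people = []
        for _code, ptype, pname, cname in group:
            if pname == '-1':
                country_name = cname  # the country mapping line
            else:
                people.append(ptype)
        # People are already sorted by type, so groupby counts each run.
        for ptype, same in itertools.groupby(people):
            yield '%s\t%s\t%d' % (country_name, ptype, sum(1 for _ in same))
</pre></p>
<p>Because it reads a whole country group before printing, it would also cope if a mapping line ever sorted after a person line. To use it as a streaming reducer you would loop over <code>reduce_lines(sys.stdin)</code> and print each result.</p>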
<p>So now #3 and #4 are done but what about #2?  </p>
<p><strong>First put the files into Hadoop:</strong></p>
<p><pre class="brush: plain; gutter: false;">
hadoop fs -put ~/mayo/customers.dat .
hadoop fs -put ~/mayo/countries.dat .
</pre></p>
<p><strong>And now run it like this (assuming you are running as hadoop in the bin directory):</strong></p>
<p><pre class="brush: plain; gutter: false;">
hadoop jar ../contrib/streaming/hadoop-0.20.1+169.89-streaming.jar -D mapred.reduce.tasks=4 -file ~/mayo/smplMapper.py -mapper smplMapper.py -file ~/mayo/smplReducer.py -reducer smplReducer.py -input customers.dat -input countries.dat -output mayo -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner -jobconf stream.map.output.field.separator=^ -jobconf stream.num.map.output.key.fields=4 -jobconf map.output.key.field.separator=^ -jobconf num.key.fields.for.partition=1
</pre></p>
<p><strong>Let us look at what we did:</strong></p>
<p><pre class="brush: plain; gutter: false;">
hadoop fs -cat mayo/part*
</pre></p>
<p><strong>Which results in: </strong></p>
<p><pre class="brush: plain; gutter: false;">
Canada	not so good	1
Canada	valued	3
United Kingdom	not so good	1
United Kingdom	valued	1
United States	not bad	2
JA - Unkown Country	not so bad	1
</pre></p>
<p>So #2 is the <strong>partitioner</strong>: KeyFieldBasedPartitioner, explained further in the <a target="_blank" href="http://hadoop.apache.org/common/docs/r0.20.1/streaming.html#A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29">Hadoop Streaming documentation</a>. With the options above, the whole mapper line (all four ^-separated fields) is the <em>key</em> used for sorting, but only the first field (the country code) is used to choose a reducer, so every line for a country, including its mapping line, is sent to the same reducer and arrives there already sorted.</p>
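<p>To see why partitioning on the first field matters, here is a toy model in Python 3. Only the country code is hashed, so a country&#8217;s mapping line and all of its customers land on the same reducer. <code>zlib.crc32</code> is a stand-in chosen for the sketch; Hadoop&#8217;s KeyFieldBasedPartitioner uses its own hash function, so real partition numbers will differ. <code>partition</code> and <code>assign</code> are hypothetical names.</p>
<p><pre class="brush: python; gutter: false;">
import zlib
from collections import defaultdict


def partition(line, num_reducers, sep='^', num_fields=1):
    """Pick a reducer by hashing only the first `num_fields` fields."""
    key_part = sep.join(line.split(sep)[:num_fields])
    return zlib.crc32(key_part.encode('utf-8')) % num_reducers


def assign(lines, num_reducers=4):
    """Route mapper output to reducers, then sort each reducer's input."""
    buckets = defaultdict(list)
    for line in lines:
        buckets[partition(line, num_reducers)].append(line)
    return {r: sorted(ls) for r, ls in buckets.items()}
</pre></p>
<p>Hashing the whole four-field key instead (<code>num_fields=4</code>), which is roughly what happens without the key-field partitioner, would scatter a country&#8217;s lines across reducers and break the join.</p>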
<p>And there you go &#8230; Simple Python Scripting Implementing Streaming in Hadoop.   </p>
<p>Grab the tar <a href="http://www.gencolee.com/smpl.py.stream.tgz">here</a> and give it a spin.</p>
<p>/*<br />
Joe Stein<br />
Twitter: <a target="_blank" href="http://www.twitter.com/allthingshadoop">@allthingshadoop</a><br />
Connect: <a target="_blank" href="http://www.linkedin.com/in/charmalloc">On Linked In</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/12/16/simple-hadoop-streaming-tutorial-using-joins-and-keys-with-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>NoSQL HBase and Hadoop with Todd Lipcon from Cloudera</title>
		<link>http://allthingshadoop.com/2010/09/06/nosql-hbase-hadoop-todd-lipcon-cloudera/</link>
		<comments>http://allthingshadoop.com/2010/09/06/nosql-hbase-hadoop-todd-lipcon-cloudera/#comments</comments>
		<pubDate>Tue, 07 Sep 2010 02:46:47 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=330</guid>
		<description><![CDATA[Episode #6 of the Podcast is a talk with Todd Lipcon from Cloudera discussing HBase. We talked about NoSQL and how it should stand for &#8220;Not Only SQL&#8221; and the tight integration between Hadoop and HBase and how systems like Cassandra (which is eventually consistent and not strongly consistent like HBase) is complementary as these [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://feeds.feedburner.com/allthingshadoop/kjGc" target="_blank">Episode #6</a> of the <a href="http://allthingshadoop/podcast" target="_self">Podcast</a> is a talk with <a href="http://twitter.com/tlipcon" target="_blank">Todd Lipcon</a> from <a href="http://cloudera.com" target="_blank">Cloudera</a> discussing HBase.</p>
<p>We talked about NoSQL and how it should stand for &#8220;Not Only SQL&#8221;, the tight integration between Hadoop and HBase, and how systems like Cassandra (which is eventually consistent, unlike the strongly consistent HBase) are complementary, since each of these systems has its place within the big data ecosystem depending on your use cases.</p>
<p>With the strong consistency of HBase you get features like incrementing counters, and the tight integration with Hadoop means faster loads through HDFS thanks to a new feature in the 0.89 development preview release called &#8220;bulk loads&#8221; (described in the release&#8217;s docs folder).</p>
<p>We covered a lot more unique features, talked about more of what is coming in upcoming releases as well as some tips with HBase so <a href="http://feeds.feedburner.com/allthingshadoop/kjGc" target="_blank">subscribe to the podcast</a> and listen to all of what Todd had to say.</p>
<p>/*<br />
Joe Stein<br />
<a href="http://www.medialets.com" target="_blank">http://www.medialets.com</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/09/06/nosql-hbase-hadoop-todd-lipcon-cloudera/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>Pre-Release from Pentaho &#8211; Hive JDBC Adapter</title>
		<link>http://allthingshadoop.com/2010/08/15/pre-release-from-pentaho-hive-jdbc-adapter/</link>
		<comments>http://allthingshadoop.com/2010/08/15/pre-release-from-pentaho-hive-jdbc-adapter/#comments</comments>
		<pubDate>Sun, 15 Aug 2010 21:34:50 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hive]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=323</guid>
		<description><![CDATA[Pentaho&#8217;s Jordan Ganoff, a software engineer, has open-sourced some Hive JDBC adapters as part of the work they are doing for their BI server http://forums.pentaho.com/showthread.php?77826-Hive-amp-Hadoop I&#8217;m not sure what state they are in, but I will try to check them out this week. To use from Maven: &#60;dependency&#62; &#60;groupId&#62;org.apache.hadoop.hive&#60;/groupId&#62; &#60;artifactId&#62;hive-jdbc&#60;/artifactId&#62; &#60;version&#62;0.5.0-pentaho-SNAPSHOT&#60;/version&#62; &#60;/dependency&#62; You must also add the repository information to [...]]]></description>
			<content:encoded><![CDATA[<p>Pentaho&#8217;s Jordan Ganoff, a software engineer, has open-sourced some Hive JDBC adapters as part of the work they are doing for their BI server:</p>
<p><a href="http://forums.pentaho.com/showthread.php?77826-Hive-amp-Hadoop" target="_blank">http://forums.pentaho.com/showthread.php?77826-Hive-amp-Hadoop</a></p>
<p>I&#8217;m not sure what state they are in, but I will try to check them out this week.</p>
<p><strong>To use from Maven:</strong></p>
<pre>&lt;dependency&gt;
  &lt;groupId&gt;org.apache.hadoop.hive&lt;/groupId&gt;
  &lt;artifactId&gt;hive-jdbc&lt;/artifactId&gt;
  &lt;version&gt;0.5.0-pentaho-SNAPSHOT&lt;/version&gt;
&lt;/dependency&gt;</pre>
<p>You must also add the repository information to either your pom.xml or your local Maven settings:</p>
<pre>&lt;repository&gt;
  &lt;id&gt;pentaho&lt;/id&gt;
  &lt;name&gt;Pentaho External Repository&lt;/name&gt;
  &lt;url&gt;http://repo.pentaho.org/artifactory/repo&lt;/url&gt;
&lt;/repository&gt;</pre>
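<p>For orientation, here is a minimal Java sketch of querying Hive over JDBC once the jar is on the classpath. It assumes the adapter keeps the Hive 0.5-era driver class <code>org.apache.hadoop.hive.jdbc.HiveDriver</code>, the <code>jdbc:hive://host:port/database</code> URL form, and a Hive server listening on localhost port 10000. None of those details come from the Pentaho announcement, and the class name <code>HiveJdbcSketch</code> is hypothetical, so check them against the adapter you actually download.</p>

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical sketch of querying Hive through a JDBC adapter.
 * Assumptions (not confirmed by the Pentaho post): the driver class name,
 * the "jdbc:hive://host:port/db" URL form, and a Hive server on port 10000.
 */
public class HiveJdbcSketch {

    // Assumed Hive 0.5-era driver class; verify against the adapter jar.
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Builds a URL such as jdbc:hive://localhost:10000/default
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Loads the driver; throws ClassNotFoundException if hive-jdbc is not on the classpath.
        Class.forName(DRIVER);
        String url = buildUrl("localhost", 10000, "default");
        // try-with-resources closes the result set, statement and connection in reverse order.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                // Prints the first column of each row (the table name).
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

<p>Only <code>buildUrl</code> works without any setup; running <code>main</code> needs both the hive-jdbc jar and a reachable Hive server.</p>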
<p>/*</p>
<p>Joe Stein<br />
<a href="http://medialets.com" target="_blank">http://medialets.com</a></p>
<p>*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/08/15/pre-release-from-pentaho-hive-jdbc-adapter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop Development Tools By Karmasphere</title>
		<link>http://allthingshadoop.com/2010/06/29/hadoop-development-tools-by-karmasphere/</link>
		<comments>http://allthingshadoop.com/2010/06/29/hadoop-development-tools-by-karmasphere/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 10:07:45 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=297</guid>
		<description><![CDATA[In Episode #5 of the Hadoop Podcast http://allthingshadoop.com/podcast/ I speak with Shevek, the CTO of Karmasphere http://karmasphere.com/.  To subscribe to the Podcast click here. We talk a bit about their existing Community Edition (which supports NetBeans &#38; Eclipse) For developing, debugging and deploying Hadoop Jobs Desktop MapReduce Prototyping GUI to manipulate clusters, file systems and jobs [...]]]></description>
			<content:encoded><![CDATA[<p>In Episode #5 of the Hadoop Podcast <a href="http://allthingshadoop.com/podcast/" target="_blank">http://allthingshadoop.com/podcast/</a> I speak with Shevek, the CTO of Karmasphere <a href="http://karmasphere.com/" target="_blank">http://karmasphere.com/</a>.  To subscribe to the Podcast <a href="http://feeds.feedburner.com/allthingshadoop/kjGc" target="_blank">click here</a>.</p>
<p>We talk a bit about their existing Community Edition (which supports NetBeans &amp; Eclipse):</p>
<ul>
<li>For developing, debugging and deploying Hadoop Jobs</li>
<li>Desktop MapReduce Prototyping</li>
<li>GUI to manipulate clusters, file systems and jobs</li>
<li>Easy deployment to any Hadoop version, any distribution in any cloud</li>
<li>Works through firewalls</li>
</ul>
<p>As well as the new products they have launched:</p>
<h2><strong>Karmasphere Client:</strong></h2>
<p>The <a href="http://karmasphere.com/Products-Information/karmasphere-client.html" target="_blank">Karmasphere Client</a> is a cross-platform library for ensuring MapReduce jobs can work from any desktop environment to any Hadoop cluster in any enterprise data network. By insulating the Big Data professional from the version of Hadoop in use, Karmasphere Client simplifies the process of switching between data centers and the cloud and enables Hadoop jobs to be independent of the version of the underlying cluster.</p>
<p>Unlike the standard Hadoop client, Karmasphere Client works from Microsoft Windows as well as Linux and Mac OS, and works through SSH-based firewalls. Karmasphere Client provides a cloud-independent environment that makes it easy and predictable to maintain a business operation reliant on Hadoop.</p>
<p><a href="http://charmalloc.files.wordpress.com/2010/06/application-framework-3.gif"><img class="aligncenter size-full wp-image-299" title="Application-Framework" src="http://charmalloc.files.wordpress.com/2010/06/application-framework-3.gif?w=600&#038;h=506" alt="" width="600" height="506" /></a></p>
<ul>
<li>Ensures Hadoop distribution and version independence</li>
<li>Works from Windows (unlike Hadoop Client)</li>
<li>Supports any cloud environment: public, private or public cloud service.</li>
<li>Provides:
<ul>
<li>Job portability</li>
<li>Operating system portability</li>
<li>Firewall hopping</li>
<li>Fault tolerant API</li>
<li>Synchronous and Asynchronous API</li>
<li>Clean Object Oriented Design</li>
</ul>
</li>
<li>Makes it easy and predictable to maintain a business operation reliant on Hadoop</li>
</ul>
<h2>Karmasphere Studio Professional Edition</h2>
<p><a href="http://karmasphere.com/Products-Information/karmasphere-studio-professional-edition.html" target="_blank">Karmasphere Studio Professional Edition</a> includes all the functionality  of the Community Edition, plus a range of deeper functionality required  to simplify the developer&#8217;s task of making a MapReduce job robust,  efficient and production-ready.</p>
<p>For a MapReduce job to be robust, its functioning on the cluster has  to be well understood in terms of time, processing, and storage  requirements, as well as in terms of its behavior when implemented  within well-defined &#8220;bounds.&#8221; Karmasphere Studio Professional Edition  incorporates the tools and a predefined set of rules that make it easy  for the developer to understand how his or her job is performing on the  cluster and where there is room for improvement.</p>
<ul>
<li>Enhanced cluster visualization and debugging
<ul>
<li>Execution diagnostics</li>
<li>Job performance timelines</li>
<li>Job charting</li>
<li>Job profiling</li>
</ul>
</li>
<li>Job Export
<ul>
<li>For easy production deployment</li>
</ul>
</li>
<li>Support</li>
</ul>
<h2>Karmasphere Studio Analyst Edition</h2>
<ul>
<li>SQL interface for ad hoc analysis</li>
<li>Karmasphere Application Framework + Hive + GUI =
<ul>
<li>No cluster changes</li>
<li>Works over proxies and firewalls</li>
<li>Integrated Hadoop monitoring</li>
<li>Interactive syntax checking</li>
<li>Detailed diagnostics</li>
<li>Enhanced schema browser</li>
<li>Full JDBC4 compliance</li>
<li>Multi-threaded &amp; concurrent</li>
</ul>
</li>
</ul>
<p>/*<br />
Joe Stein<br />
<a href="http://www.linkedin.com/in/charmalloc" target="_blank">http://www.linkedin.com/in/charmalloc<br />
</a>*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/06/29/hadoop-development-tools-by-karmasphere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>

		<media:content url="http://charmalloc.files.wordpress.com/2010/06/application-framework-3.gif" medium="image">
			<media:title type="html">Application-Framework</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop and Pig with Alan Gates from Yahoo</title>
		<link>http://allthingshadoop.com/2010/05/31/hadoop-pig-yahoo-alan-gates/</link>
		<comments>http://allthingshadoop.com/2010/05/31/hadoop-pig-yahoo-alan-gates/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 03:54:39 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Open Source Projects]]></category>
		<category><![CDATA[Pig]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=284</guid>
		<description><![CDATA[Episode 4 of our Podcast is with Alan Gates, Senior Software Engineer @ Yahoo! and Pig committer. Click here to listen. Hadoop is a really important part of Yahoo&#8217;s infrastructure because processing and analyzing big data is increasingly important for their business. Hadoop allows Yahoo to connect their consumer products with their advertisers and users [...]]]></description>
			<content:encoded><![CDATA[<p>Episode 4 of our <a href="http://allthingshadoop.com/podcast/" target="_self">Podcast</a> is with Alan Gates, Senior Software Engineer @ Yahoo! and Pig committer.  <a href="http://feeds.feedburner.com/allthingshadoop/kjGc" target="_blank">Click here</a> to listen.</p>
<p>Hadoop is a really important part of Yahoo&#8217;s infrastructure because processing and analyzing big data is increasingly important for their business.  Hadoop allows Yahoo to connect their consumer products with their advertisers and users for a better user experience.  They have been involved with Hadoop for many years now and have their own distribution.  Yahoo also sponsors/hosts a user group meeting which has grown to hundreds of attendees every month.</p>
<p>We talked about what Pig is now, the future of Pig, and other projects like Oozie <a href="http://github.com/tucu00/oozie1" target="_blank">http://github.com/tucu00/oozie1</a>, which Yahoo uses (and has open-sourced) to automate workflows of MapReduce jobs and Pig scripts.  We also talked about Zebra <a target="_blank" href="http://wiki.apache.org/pig/zebra">http://wiki.apache.org/pig/zebra</a>, Owl <a target="_blank" href="http://wiki.apache.org/pig/owl">http://wiki.apache.org/pig/owl</a>, and Elephant Bird <a target="_blank" href="http://github.com/kevinweil/elephant-bird">http://github.com/kevinweil/elephant-bird</a>.</p>
<p>/*<br />
Joe Stein<br />
<a href="http://www.linkedin.com/in/charmalloc" target="_blank">http://www.linkedin.com/in/charmalloc</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/05/31/hadoop-pig-yahoo-alan-gates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>Ruby Streaming for Hadoop with Wukong a talk with Flip Kromer from Infochimps</title>
		<link>http://allthingshadoop.com/2010/05/20/ruby-streaming-wukong-hadoop-flip-kromer-infochimps/</link>
		<comments>http://allthingshadoop.com/2010/05/20/ruby-streaming-wukong-hadoop-flip-kromer-infochimps/#comments</comments>
		<pubDate>Thu, 20 May 2010 05:00:29 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Open Source Projects]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=266</guid>
		<description><![CDATA[Another great discussion on our Podcast.  Click here to listen.  For this episode our guest was Flip Kromer from Infochimps http://www.infochimps.org.  Infochimps.org’s mission is to increase the world’s access to structured data.  They have been working since the start of 2008 to build the world’s most interesting data commons, and since the start of 2009 [...]]]></description>
			<content:encoded><![CDATA[<p>Another great discussion on our <a href="http://allthingshadoop.com/podcast/" target="_self">Podcast</a>.  <a href="http://feeds.feedburner.com/allthingshadoop/kjGc" target="_blank">Click here</a> to listen.  For this episode our guest was Flip Kromer from Infochimps <a href="http://www.infochimps.org" target="_blank">http://www.infochimps.org</a>.  Infochimps.org’s mission is to increase the world’s access to structured data.  They have been working since the start of 2008 to build the world’s most interesting data commons, and since the start of 2009 to build the world’s first data marketplace. Their founding team consists of two physicists (Flip Kromer and Dhruv Bansal) and one entrepreneur (Joseph Kelly).</p>
<p>We talked about Ruby streaming with Hadoop and why you might use the open source project <a href="http://mrflip.github.com/wukong/" target="_blank">Wukong</a> to simplify writing Hadoop jobs in Ruby.  There are some great examples at <a href="http://github.com/infochimps/wukong/tree/master/examples" target="_blank">http://github.com/infochimps/wukong/tree/master/examples</a>, such as the web log analysis that reconstructs the paths (chains of pages) users follow during a session.</p>
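<p>The idea behind that paths example can be sketched without Wukong. Below is a hypothetical Java simulation of the map, group-by-key and reduce flow that a Hadoop Streaming job goes through: each log line is assumed to be <code>sessionId</code>, <code>timestamp</code> and <code>url</code> separated by tabs, the map step keys it by session, and the reduce step orders each session&#8217;s page views by timestamp to rebuild the path. The log format and the class name <code>PagePathSketch</code> are illustrative assumptions, and this is not Wukong&#8217;s API. In a real streaming job Hadoop performs the grouping and sorting itself, and the reducer reads the sorted tab-separated lines from stdin.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical local simulation of map -> group by key -> reduce, used to
 * rebuild the chain of pages viewed in each session. Not Wukong's API.
 */
public class PagePathSketch {

    // Map step: "sessionId\ttimestamp\turl" becomes key=sessionId, value="timestamp\turl".
    static String[] map(String logLine) {
        String[] fields = logLine.split("\t");
        return new String[] {fields[0], fields[1] + "\t" + fields[2]};
    }

    // Reduce step: orders one session's page views by timestamp and joins the URLs into a path.
    static String reduce(List<String> values) {
        List<String> ordered = new ArrayList<>(values);
        ordered.sort(Comparator.comparingLong((String v) -> Long.parseLong(v.split("\t")[0])));
        StringBuilder path = new StringBuilder();
        for (String v : ordered) {
            if (path.length() > 0) {
                path.append(" > ");
            }
            path.append(v.split("\t")[1]);
        }
        return path.toString();
    }

    // In-memory stand-in for Hadoop's shuffle: groups mapped values by key (TreeMap keeps keys sorted).
    static Map<String, String> run(List<String> logLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : logLines) {
            String[] kv = map(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        Map<String, String> paths = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            paths.put(entry.getKey(), reduce(entry.getValue()));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "s1\t300\t/checkout",
                "s2\t100\t/home",
                "s1\t100\t/home",
                "s1\t200\t/cart");
        System.out.println(run(log)); // {s1=/home > /cart > /checkout, s2=/home}
    }
}
```

<p>Keying by session is what lets the reduce step see every page view of one visit together, which is the same reason a streaming job would emit the session id as its key.</p>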
<p>It was interesting to learn about some of the new implementations and projects he has going on, such as using Cassandra to help store unique values for social network analysis. One new project is ClusterChef <a href="http://github.com/infochimps/cluster_chef" target="_blank">http://github.com/infochimps/cluster_chef</a>.  ClusterChef will help you create a scalable, efficient compute cluster in the cloud. It has recipes for Hadoop, Cassandra, NFS and more; use as many or as few as you like. For example, you could build:</p>
<ul>
<li>A small 1-5 node cluster for development or just to play around  with Hadoop or Cassandra</li>
<li>A spot-priced, EBS-backed cluster for unattended computing at rock-bottom prices</li>
<li>A large 30+ machine cluster with multiple EBS  volumes per node running Hadoop and Cassandra, with optional NFS for</li>
</ul>
<p>Why Chef and Poolparty?</p>
<ul>
<li>With Chef, you declare a final state for each node, not a procedure to follow. Administration is more efficient, robust and maintainable.</li>
<li>You get a nice central dashboard to manage clients</li>
<li>You can easily roll out configuration changes across all your  machines</li>
<li>Chef is actively developed and has well-written recipes for  webservers, databases, development tools, and a ton of different  software packages.</li>
<li>Poolparty makes creating Amazon cloud machines concise and easy: you can specify spot instances, EBS-backed volumes, disable-api-termination, and more.</li>
</ul>
<p>Recipes include:</p>
<ul>
<li>Hadoop</li>
<li>NFS</li>
<li>Persistent HDFS on EBS  volumes</li>
<li>Zookeeper (<em>in progress</em>)</li>
<li>Cassandra (<em>in progress</em>)</li>
</ul>
<p>Flip also pointed us to a couple of good links on Peter Norvig&#8217;s &#8220;Unreasonable Effectiveness of Data&#8221;: <a href="http://bit.ly/effectofdata" target="_blank">http://bit.ly/effectofdata</a> / <a href="http://bit.ly/norvigtalk" target="_blank">http://bit.ly/norvigtalk</a></p>
<p>/*<br />
Joe Stein<br />
<a href="http://www.linkedin.com/in/charmalloc" target="_blank">http://www.linkedin.com/in/charmalloc</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/05/20/ruby-streaming-wukong-hadoop-flip-kromer-infochimps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop, BigData and Cassandra with Jonathan Ellis</title>
		<link>http://allthingshadoop.com/2010/05/17/hadoop-bigdata-cassandra-a-talk-with-jonathan-ellis/</link>
		<comments>http://allthingshadoop.com/2010/05/17/hadoop-bigdata-cassandra-a-talk-with-jonathan-ellis/#comments</comments>
		<pubDate>Tue, 18 May 2010 03:18:02 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Open Source Projects]]></category>
		<category><![CDATA[hadoop cassandra nosql]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=248</guid>
		<description><![CDATA[Today I spoke with Jonathan Ellis who is the Project Chair of the Apache Cassandra project http://cassandra.apache.org/ and co-founder of Riptano, the source for professional Cassandra support http://riptano.com.  It was a great discussion about Hadoop, BigData, Cassandra and Open Source. We talked about the recent Cassandra 0.6 NoSQL integration and support for Hadoop Map/Reduce against [...]]]></description>
			<content:encoded><![CDATA[<p>Today I spoke with Jonathan Ellis, who is the Project Chair of the Apache Cassandra project <a href="http://cassandra.apache.org/" target="_blank">http://cassandra.apache.org/</a> and co-founder of Riptano, the source for professional Cassandra support <a href="http://riptano.com/" target="_blank">http://riptano.com</a>.  It was a great discussion about Hadoop, BigData, Cassandra and Open Source.</p>
<p>We talked about the Hadoop integration in the recent Cassandra 0.6 release, which supports running Hadoop Map/Reduce jobs against data stored in Cassandra, and some of what is coming up in the 0.7 release.</p>
<p>We touched on how Pig is currently supported, and why there may be little motivation to support Hive integration with Cassandra in the future.</p>
<p>We also got a bit into a discussion of HBase vs Cassandra and some of the benefits &amp; drawbacks of each as they fit into your ecosystem (e.g. HBase is to OLAP as Cassandra is to OLTP).</p>
<p>This was the second <a href="http://allthingshadoop.com/podcast/" target="_self">Podcast</a> and you can <a href="http://feeds.feedburner.com/allthingshadoop/kjGc" target="_blank">click here</a> to listen.</p>
<p>/*<br />
Joe Stein<br />
<a href="http://www.linkedin.com/in/charmalloc/" target="_blank">http://www.linkedin.com/in/charmalloc/</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/05/17/hadoop-bigdata-cassandra-a-talk-with-jonathan-ellis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
		<item>
		<title>Making Hadoop and MapReduce easier with Karmasphere</title>
		<link>http://allthingshadoop.com/2010/05/15/making-hadoop-and-mapreduce-easier-karmasphere/</link>
		<comments>http://allthingshadoop.com/2010/05/15/making-hadoop-and-mapreduce-easier-karmasphere/#comments</comments>
		<pubDate>Sat, 15 May 2010 05:21:32 +0000</pubDate>
		<dc:creator>charmalloc</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://allthingshadoop.com/?p=236</guid>
		<description><![CDATA[Whether you are just getting started or already in the daily trenches of M/R development, Karmasphere was created to help developers and technical professionals make Hadoop MapReduce easier http://www.karmasphere.com/. Karmasphere Studio is a desktop IDE for graphically prototyping MapReduce jobs and deploying, monitoring and debugging them on Hadoop clusters [...]]]></description>
			<content:encoded><![CDATA[<p>For those folks either just getting started or already in the daily trenches of M/R development, Karmasphere has come along to help developers and technical professionals make Hadoop MapReduce easier: <a href="http://www.karmasphere.com/">http://www.karmasphere.com/</a>. Karmasphere Studio is a desktop IDE for graphically prototyping MapReduce jobs and for deploying, monitoring and debugging them on Hadoop clusters in private and public clouds.</p>
<ul>
<li>Runs on Linux, Apple Mac OS and Windows.</li>
<li>Works with all major distributions and versions of Hadoop, including Apache, Yahoo! and Cloudera.</li>
<li>Works with Amazon Elastic MapReduce.</li>
<li>Supports local, networked, HDFS and Amazon S3 file systems.</li>
<li>Supports Cascading.</li>
<li>Enables job submission from all major platforms, including Windows.</li>
<li>Operates with clusters and file systems behind firewalls.</li>
</ul>
<p><strong>So, what can you do with it?</strong></p>
<ul>
<li>Prototype on the desktop: Get going with MapReduce job development quickly. No need for a cluster since Hadoop emulation is included.</li>
<li>Deploy to a private or cloud-based cluster: Whether you’re using a cluster in your own network or in a cloud, deploy your jobs easily.</li>
<li>Debug on the cluster: One of the most challenging areas in MapReduce programming is debugging your job on the cluster. Visual tools deliver real-time insight into your job, including support for viewing and charting Hadoop job and task counters.</li>
<li>Graphically visualize and manipulate: Whether it’s clusters, file systems, job configuration, counters, log files or other debugging information, save time and get better insight by accessing it all in one place.</li>
<li>Monitor and analyze your jobs in real time: Get a real-time workflow view of inputs, outputs and intermediate results, including the map, partition, sort and reduce phases.</li>
</ul>
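<p>The workflow view described above breaks a job into its map, partition, sort and reduce phases. To make those phases concrete, here is a minimal single-process sketch of a word-count job passing through each one. It is an illustration under stated assumptions, not Karmasphere's or Hadoop's API: it uses only the plain JDK (Java 9 or later, no Hadoop jars), the class <code>MiniMapReduce</code> and its methods are hypothetical names, and the partitioner only mimics the hash-and-modulo scheme of Hadoop's default <code>HashPartitioner</code>.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the map, partition, sort and reduce
// phases of a word-count job. Plain JDK code, not Hadoop or Karmasphere API.
public class MiniMapReduce {

    // Mimics the scheme of Hadoop's default HashPartitioner: clear the sign
    // bit of the key's hash, then take the remainder by the reducer count.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Runs word count over the input lines and returns one sorted
    // word -> count map per reducer.
    static List<Map<String, Integer>> run(List<String> lines, int numReducers) {
        // One TreeMap per reducer: keys stay sorted (the sort phase) and
        // every value emitted for a key is grouped under it.
        List<TreeMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            shuffled.add(new TreeMap<>());
        }

        // Map phase: emit (word, 1) and route it to a reducer (partition phase).
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields an empty token
                }
                int r = partition(word, numReducers);
                shuffled.get(r).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce phase: sum the grouped values for each key in each partition.
        List<Map<String, Integer>> output = new ArrayList<>();
        for (TreeMap<String, List<Integer>> bucket : shuffled) {
            Map<String, Integer> reduced = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : bucket.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                reduced.put(e.getKey(), sum);
            }
            output.add(reduced);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The end");
        List<Map<String, Integer>> result = run(input, 2);
        for (int r = 0; r < result.size(); r++) {
            System.out.println("reducer " + r + ": " + result.get(r));
        }
    }
}
```

<p>Running <code>main</code> prints each reducer's sorted word counts. Which reducer a word lands on depends on <code>String.hashCode()</code>, so a real Hadoop job keyed on <code>Text</code> may partition the same words differently.</p>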
<p>Whether you’re new to Hadoop and want an easy way to explore MapReduce programming, want one integrated environment in which to prototype, deploy and manage jobs, or already run Hadoop and need much more insight into the jobs on your cluster, there’s something here for you.</p>
<p>All you need is NetBeans (version 6.7 or 6.8) and Java 1.6 and you’ll be ready to give Karmasphere Studio a whirl.</p>
<p>You do NOT need any kind of Hadoop cluster set up to begin prototyping. But when you are ready to deploy your job on a large data set, you’ll need a virtual or real cluster in your data center or a public cloud such as Amazon Web Services.</p>
<p>An Eclipse version is in progress.</p>
<p>/*<br />
Joe Stein<br />
<a href="http://www.linkedin.com/in/charmalloc" target="_blank">http://www.linkedin.com/in/charmalloc</a><br />
*/</p>
]]></content:encoded>
			<wfw:commentRss>http://allthingshadoop.com/2010/05/15/making-hadoop-and-mapreduce-easier-karmasphere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/c5949edcf9e35a9aeb2584b6d4a58dcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">charmalloc</media:title>
		</media:content>
	</item>
	</channel>
</rss>
