Hadoop as a Service cloud platform with the Mortar Framework and Pig

August 9, 2013

Episode #11 of the podcast is a talk with K Young. It is also available on iTunes.

Mortar is the fastest and easiest way to work with Pig and Python on Hadoop in the Cloud.

Mortar’s platform is for everything from joining and cleansing large data sets to machine learning and building recommender systems.

Mortar makes it easy for developers and data scientists to do powerful work with Hadoop. The main advantages of Mortar are:

  • Zero Setup Time: Mortar takes only minutes to set up (or no time at all on the web), and you can start running Pig jobs immediately. No need for painful installation or configuration.
  • Powerful Tooling: Mortar provides a rich suite of tools to aid in Pig development, including the ability to Illustrate a script before running it, and an extremely fast and free local development mode.
  • Elastic Clusters: We spin up Hadoop clusters as you need them, so you don’t have to predict your needs in advance, and you don’t pay for machines you don’t use.
  • Solid Support: Whether the issue is in your script or in Hadoop, we’ll help you figure out a solution.

We talked about the open source Mortar Framework and Watchtower, their new open source tool for visualizing data while writing Pig scripts.

The Mortar Blog has a great video demo on Watchtower.

There are no two ways around it: Hadoop development iterations are slow. Traditional programmers have always had the benefit of recompiling their app, running it, and seeing the results within seconds. They have near-instant validation that what they're building is actually working. When you're working with Hadoop and dealing with gigabytes of data, your development iteration time is more like hours.

Subscribe to the podcast and listen to what K Young had to say. It is also available on iTunes.

/*********************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
**********************************/

Categories: Hadoop, Pig, Podcast

Hadoop and Pig with Alan Gates from Yahoo

Episode 4 of our podcast is with Alan Gates, Senior Software Engineer at Yahoo! and Pig committer.

Hadoop is a really important part of Yahoo’s infrastructure because processing and analyzing big data is increasingly important for their business. Hadoop allows Yahoo to connect their consumer products with their advertisers and users for a better user experience. They have been involved with Hadoop for many years now and have their own distribution. Yahoo also sponsors/hosts a user group meeting which has grown to hundreds of attendees every month.

We talked about what Pig is now, the future of Pig, and other projects such as Oozie (http://github.com/tucu00/oozie1), an open source tool that Yahoo uses to automate workflows of MapReduce jobs and Pig scripts. We also talked about Zebra (http://wiki.apache.org/pig/zebra), Owl (http://wiki.apache.org/pig/owl), and Elephant Bird (http://github.com/kevinweil/elephant-bird).

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
*/
