Nifty Tool To Export Files From HDFS Into MySQL

April 19, 2010

This is an interesting open source project I recently heard about: HIHO (http://code.google.com/p/hiho/).

What interests me most about this project is its export utility, which takes data from HDFS and loads it into MySQL.

It also has a nice way of querying and importing data from a JDBC database directly into HDFS. It looks much more robust than the out-of-the-box DBInputFormat that Hadoop provides. You can import the data as delimited records, with a choice of delimiter, or save it as Avro records. It supports queries, so you can, say, join two tables. It splits on user-specified column ranges instead of using LIMIT and OFFSET. It does no code generation or ORM mapping.
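To make the column-range idea concrete, here is a minimal sketch in Python of how an import could be divided into range-bounded queries rather than LIMIT/OFFSET pages. This is not HIHO's code: the function names, the table `orders`, and the column `id` are all made up for illustration, and it assumes an integer split column whose minimum and maximum are already known.

```python
def split_ranges(lo, hi, num_splits):
    """Divide the inclusive integer range [lo, hi] into contiguous chunks.

    Returns at most num_splits (start, end) pairs of near-equal size.
    Hypothetical helper for illustration; not part of HIHO or Hadoop.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        return []
    total = hi - lo + 1
    num_splits = min(num_splits, total)  # never create empty ranges
    base, extra = divmod(total, num_splits)
    ranges = []
    start = lo
    for i in range(num_splits):
        # The first `extra` chunks take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_queries(table, column, lo, hi, num_splits):
    """Build one bounded SELECT per split.

    Identifiers are interpolated directly, which is fine for a sketch
    but not for untrusted input.
    """
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_splits)
    ]


if __name__ == "__main__":
    # Assumed example: ids 1..10 spread across 3 map tasks.
    for query in range_queries("orders", "id", 1, 10, 3):
        print(query)
```

Each query touches only its own slice of the column, so with an index on that column the database does not have to scan and discard rows the way a growing OFFSET typically forces it to, and the slices can be handed to separate map tasks.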

There are other ETL tools out there (e.g. Sqoop, http://www.cloudera.com/developers/downloads/sqoop/). In Cloudera’s Distribution for Hadoop Version 3 (CDH3), Sqoop now also supports exporting from HDFS back into MySQL.

I am definitely going to have to give this utility a try. I hear from the HIHO project folks that their next to-do is export support for more databases.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc/
*/