Using Apache Drill for Large Scale, Interactive, Real-Time Analytic Queries
Episode #17 of the podcast is a talk with Jacques Nadeau about Apache Drill (http://incubator.apache.org/drill/), a modern interactive query engine that runs on top of Hadoop. The episode is also available on iTunes.
Jacques talked about how Apache Drill is a modern query engine meant to serve as a query layer on top of all the open source big data systems. Drill is being designed with a pluggable storage engine layer, so that it can act as the interface to any big data storage engine. The first release came out recently to help developers understand the data pipeline.
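To make the "pluggable storage engine" idea concrete, here is a minimal sketch, assuming a toy design: every backend implements one hypothetical `scan(table)` method, and a query layer routes `plugin.table` names to whichever backend is registered. `StoragePlugin`, `InMemoryPlugin` and `QueryLayer` are illustrative names only. This is not Drill's actual (Java) plugin API.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


class StoragePlugin(ABC):
    """Hypothetical plugin contract: a storage engine only has to yield records."""

    @abstractmethod
    def scan(self, table: str) -> Iterator[Record]:
        ...


class InMemoryPlugin(StoragePlugin):
    """Toy backend standing in for a real store such as HBase or MongoDB."""

    def __init__(self, tables: Dict[str, List[Record]]) -> None:
        self._tables = tables

    def scan(self, table: str) -> Iterator[Record]:
        yield from self._tables.get(table, [])


class QueryLayer:
    """Routes 'plugin.table' names to registered plugins; query logic never changes."""

    def __init__(self) -> None:
        self._plugins: Dict[str, StoragePlugin] = {}

    def register(self, name: str, plugin: StoragePlugin) -> None:
        self._plugins[name] = plugin

    def select(self, source: str, columns: List[str],
               where: Callable[[Record], bool] = lambda r: True) -> List[Record]:
        plugin_name, table = source.split(".", 1)
        rows = self._plugins[plugin_name].scan(table)
        # Filter, then project only the requested columns.
        return [{c: r.get(c) for c in columns} for r in rows if where(r)]


layer = QueryLayer()
layer.register("mongo", InMemoryPlugin(
    {"users": [{"name": "ann", "age": 31}, {"name": "bob", "age": 17}]}))
print(layer.select("mongo.users", ["name"], where=lambda r: r["age"] >= 18))
# [{'name': 'ann'}]
```

Adding another backend means writing one more `StoragePlugin` subclass and registering it; `select` stays untouched, which is the point of keeping the storage layer pluggable.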
Leveraging an efficient columnar storage format, an optimistic execution engine and a cache-conscious memory layout, Apache Drill is blazing fast. Coordination, query planning, optimization, scheduling, and execution are all distributed across the nodes of the cluster to maximize parallelization.
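The benefit of a columnar layout can be sketched in a few lines. This is a toy illustration, assuming records that share one flat schema, and it is not Drill's storage format: rows are pivoted into one list per column, so an aggregate over a single field reads only that field's values. The helper names `to_columns` and `column_sum` are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def to_columns(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records (assumed to share one flat schema) into per-column lists."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def column_sum(columns: Dict[str, List[Any]], name: str) -> Any:
    """An aggregate over one column scans only that column's values, never the other fields."""
    return sum(columns[name])


rows = [
    {"user": "ann", "country": "US", "bytes": 120},
    {"user": "bob", "country": "DE", "bytes": 300},
    {"user": "cid", "country": "US", "bytes": 80},
]
columns = to_columns(rows)
print(columns["bytes"])               # [120, 300, 80]
print(column_sum(columns, "bytes"))   # 500
```

In a real engine the per-column values sit in contiguous, typed buffers, which is what makes such scans cache-friendly; Python lists only approximate that idea.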
Drill lets you perform interactive analysis on all of your data, including nested and schema-less data. It supports querying many different schema-less data sources, including HBase, Cassandra and MongoDB. Flat records are naturally handled as a special case of nested data.
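The "flat records are a special case of nested data" idea can be sketched with a path-based field lookup. This is a minimal illustration, assuming dotted paths and a hypothetical `get_path` helper rather than Drill's SQL syntax: a nested field is a multi-step path, a flat field is a one-step path, and a field missing from a schema-less record resolves to `None` instead of raising an error.

```python
from typing import Any, Dict, List, Optional


def get_path(record: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    value: Any = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # schema-less data: absent fields read as null, not as errors
        value = value[key]
    return value


def project(records: List[Dict[str, Any]], paths: List[str]) -> List[Dict[str, Any]]:
    """Select the given paths from every record, whatever its shape."""
    return [{p: get_path(r, p) for p in paths} for r in records]


records = [
    {"name": "ann", "address": {"city": "Austin", "zip": "78701"}},  # nested document
    {"name": "bob", "city": "Berlin"},                               # flat record
    {"name": "cid"},                                                 # field missing entirely
]
print(get_path(records[0], "address.city"))  # Austin
print(get_path(records[1], "city"))          # Berlin (a flat field is a one-step path)
print(project(records, ["name", "address.city"]))
```

The same lookup serves both shapes, so no separate code path is needed for flat data.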
Drill also provides strongly defined tiers and APIs for straightforward integration with a wide array of technologies.
Subscribe to the podcast and listen to what Jacques had to say. The episode is also available on iTunes.
/*********************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
**********************************/