Archive for the ‘Meetups’ Category

Hadoop isn’t dead but you might be doing it wrong!

March 18, 2016

I haven’t blogged (or podcasted, for that matter) in a while. There are lots of different reasons for that, and I am always happy to chat and grab tea if folks are interested, but after attending this year’s HIMSS conference I just couldn’t hold it in anymore.

I went to HIMSS so excited; it was supposed to be the year of Big Data! Everything was about transformation and interoperability and OMGZ the excitement.

The first keynote Monday evening was OFF THE HOOK. The rest of the time, two of my colleagues and I were at the expo. It is basically CES for Healthcare (if you don’t know what CES is then think DEFCON for Healthcare… or something). It’s big.

But where was the Big Data?

Not really anywhere … There were 3 recognizable “big data companies” and one of them was in the booth as a partner for cloud services. It was weird. What happened?

One of the engineers from Cerner has a lightning talk at the Kafka Summit, go Cerner!!

Didn’t everyone get the memo? We need to help reduce costs of patient care!

Here are two ways to help reduce costs of patient care!

  1. (Paraphrasing Michael Dell from his keynote) Innovation funding for Healthcare IT will come from optimizing your data center resources.
  2. (This one is from me but inspired by Bruce Schneier) Through Open Source we can enable better systems by sharing in the R&D costs and also make them more secure.

Totally agree with #1; I have seen it firsthand, with people saving 82% of their data center bill, and not even using spot (or, as they say, “preemptive”) instances yet. Amazing!

As for #2, you have to realize that different people are good at different things. One person can write anything, but sometimes 2 or 3 or 45 of them can write it better… or at least make sure the tests always keep passing and evolving properly, etc., etc., stewardship, etc.

Besides all of that, the conference was great. There were a lot of companies and people I recognized and bumped into and it was great to catch up.

I was also really REALLY excited to see how far physician signatures and form signing have (finally) come in healthcare, removing all that paper. Fax is almost dead, but there are still a couple of companies kicking.

One last thing: the cyber security part of the expo was also disappointing. I know it was during the RSA Conference, but Healthcare needs good solutions too. There was a decent set of solutions, in some cases legit and well known (thanks for showing up!), but the “pavilion” was downstairs in the back left corner. Maybe if HIMSS coincided with Strata it would have been different; hard to say.

There was at least one tweet about it; not sure if there were more.

So, Big Data, Healthcare, Security, OH MY! I am in!

I will be talking in New York City on 03/29/2016 about problems and solutions with using the Open Source, interoperable, XML-based FHIR standard in Healthcare, which removes the need to integrate HL7 systems and make them interoperable, and getting into realtime stream processing on Mesos.

I will also be conducting a training on SMACK Stack 1.0 (Streaming Mesos Analytics Cassandra Kafka), using telephone systems and APIs to start streaming events and the interactions with different systems that result from them. Yes, I bought phones and yes, you get to keep yours.

What has attracted me (for almost 2 years now) to running Hadoop systems and ecosystem components on Mesos is the ease it brings for the developers, systems engineers, data scientists, analysts and the users of the software systems that run (often as a service) those components. There are lots of things to research and read; in those cases I would:

1) scour my blog

2) read this

3) and this

4) your own thing

Hadoop! Mesos!


~ Joestein

P.S. If you have something good to say about Hadoop and want to talk about it, and it is gripping and gets back to the history and continued efforts, let me know. Thanks!




Hadoop NYC Meetup With Yale University and Datameer

April 22, 2010

The NYC Hadoop meetup on April 21st was great.  Many thanks as always to the Cloudera folks and The Winter Wyman Companies for the pizza.  Also thanks to Hiveat55 for use of their office and the Datameer folks for a good time afterwards.

The first part of the meetup was a presentation by Azza Abouzeid and Kamil Bajda-Pawlikowski (Yale University) on HadoopDB. Their vision is to take the best of both worlds: the MapReduce bliss for lots of data that we get from Hadoop, as well as the complex data analysis capabilities of a DBMS.

The basic idea behind HadoopDB is to give Hadoop access to multiple single-node DBMS servers (e.g. PostgreSQL or MySQL) deployed across the cluster. HadoopDB pushes as much data processing as possible into the database engine by issuing SQL queries (usually most of the Map/Combine phase logic is expressible in SQL). This in turn results in a system that resembles a shared-nothing parallel database. Applying techniques taken from the database world leads to a performance boost, especially in more complex data analysis. At the same time, the fact that HadoopDB relies on the MapReduce framework ensures scalability and fault/heterogeneity tolerance similar to Hadoop.
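To make the push-down idea concrete, here is a minimal sketch in Python. It is not HadoopDB's code or API: in-memory SQLite databases stand in for the single-node PostgreSQL/MySQL servers, and the `visits` table, the sample rows and every function name are made up for illustration. The map/combine step runs as a SQL `GROUP BY` inside each "node", and only the small partial results get merged in the reduce step.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (page, visits) rows split across three "nodes".
PARTITIONS = [
    [("home", 3), ("about", 1)],
    [("home", 2), ("blog", 5)],
    [("blog", 1), ("about", 4)],
]


def make_node(rows):
    """Create one single-node database (SQLite standing in for PostgreSQL/MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (page TEXT, n INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn


def map_combine(conn):
    """Push the map/combine logic into the node's database engine as SQL."""
    return conn.execute(
        "SELECT page, SUM(n) FROM visits GROUP BY page"
    ).fetchall()


def reduce_partials(partials):
    """Merge the per-node partial sums into final totals."""
    totals = defaultdict(int)
    for page, subtotal in partials:
        totals[page] += subtotal
    return dict(totals)


def run_job(partitions):
    """Run the whole job: per-node SQL first, then a global merge."""
    partials = []
    for rows in partitions:
        conn = make_node(rows)
        partials.extend(map_combine(conn))
        conn.close()
    return reduce_partials(partials)


if __name__ == "__main__":
    print(run_job(PARTITIONS))
```

The point of the split is that each database engine does the heavy aggregation over its own partition, so the reduce side only sees one row per page per node.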

They have spent a lot of time thinking through, finding and resolving the tradeoffs that occur, and continue to make progress on this end.  They have had 2,200 downloads as of this posting and are actively looking for developers to contribute to their project.   I think it is great to see a University involved at this level for Open Source in general and more specifically doing work related to Hadoop.  The audience was very engaged and it made for a very lively discussion.  Their paper tells all the gory details.

The rest of the meetup was off the hook.  Stefan Groschupf got off to a quick start, throwing down some pretty serious street cred as a long-standing committer for Nutch, Hadoop, Katta, Bixo and more.    He was very engaging, with a good assortment of anecdotes for the question that drives the Hadoop community: “What do you want to do with your data?”  It is always processing it or querying it, and there is no one silver bullet solution.  We were then demoed Datameer’s product (which is one of the best user interface concept solutions I have seen).

In short, the Datameer Analytic Solution (DAS) is a spreadsheet user interface allowing users to take a sample of data and (with 15 existing data connections and over 120 functions), like any good spreadsheet, pull the data into an aggregated format.  Their product then takes that format and pushes it down into Hadoop (much like going through Hive), where it runs as a map/reduce job.
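As a rough illustration of what "define it on a sample, run it as map/reduce" can look like, here is a small Python sketch. It is not Datameer's implementation; the `SPEC` dictionary, the column names and the sample rows are all hypothetical. The spec plays the part of the spreadsheet definition, and the same spec is then applied to every input split as a map step, a shuffle and a reduce step.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical "spreadsheet" definition built against a small sample:
# group rows by one column and aggregate another.
SPEC = {"group_by": "country", "value": "amount", "agg": sum}

# Hypothetical full data set, already divided into input splits.
SPLITS = [
    [{"country": "US", "amount": 10}, {"country": "DE", "amount": 7}],
    [{"country": "US", "amount": 5}],
]


def map_phase(rows, spec):
    """Emit (key, value) pairs for one input split."""
    for row in rows:
        yield row[spec["group_by"]], row[spec["value"]]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, spec):
    """Apply the spreadsheet's aggregate function to each key's values."""
    return {key: spec["agg"](values) for key, values in grouped.items()}


def run(splits, spec):
    """Apply one spec to every split, then aggregate the grouped results."""
    pairs = chain.from_iterable(map_phase(split, spec) for split in splits)
    return reduce_phase(shuffle(pairs), spec)


if __name__ == "__main__":
    print(run(SPLITS, SPEC))
```

Swapping `sum` for `max` or `len` in the spec changes the aggregate without touching the map or reduce code, which is roughly the appeal of letting the spreadsheet drive the job.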

So end to end you can have worthy analytic folks (spreadsheet types) do their job against limitless data.  Wicked.

From their website

With DAS, business users no longer have to rely on intuition or a “best guess” based on what’s happened in the past to make business decisions. DAS makes data assets available as they are needed regardless of format or location so that users have the facts they need to reach the best possible conclusions.

DAS includes a familiar interactive spreadsheet that is easy to use, but also powerful so that business users don’t need to turn to developers for analytics. The spreadsheet is specifically designed for visualization of big data and includes more than 120 built-in functions for exploring and discovering complex relationships. In addition, because DAS is extensible, business analysts can use functions from third-party tools or they can write their own commands.

Drag & drop reporting allow users to quickly create their own personalized dashboard. Users simply select the information they want to view and how to display it on the dashboard – tables, charts, or graphs.

The portfolio of analytical and reporting tools in organizations can be broad. Business users can easily share data in DAS with these tools to either extend their analysis or to give other users access.

After the quick demo Stefan walked us through a solution for using Hadoop to pull the “signal from the noise” in social data and used Twitter as an example. He used a really interesting graph exploration tool, Gephi (going to give it a try myself).  Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.  He then talked a bit about X-RIME, which is Hadoop-based large-scale social network analysis (Open Source).


Joe Stein

Categories: Meetups, Open Source Projects Tags: