Wednesday, September 03, 2014


The rise of Big Data is now prompting engineering teams around the world to grapple with the last mile that comes along with huge data sets: how do you turn them into actionable insights? A picture is worth a thousand numbers, so naturally data visualization is the preferred medium.

If you start separating the tiers required for managing tons of data for visualization, the need for a fast, searchable data store with analytical capabilities emerges. This is where Elasticsearch comes into the picture. It's a Swiss Army knife of a stack: use it as a regular Lucene-backed data store, or call in the power of aggregations (the facet replacement) and you have a decent analytical engine.
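To make that concrete, here is a minimal sketch of what an aggregation request body might look like, assuming a hypothetical "logs" index whose documents carry a "status" field; the terms aggregation below just counts documents per distinct status value:

```json
{
  "size": 0,
  "aggs": {
    "status_counts": {
      "terms": { "field": "status" }
    }
  }
}
```

POST that to the index's `_search` endpoint and the response carries one bucket per status with its document count, which is exactly the kind of pre-chewed summary a visualization layer wants.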

The question is where Hadoop/Teradata map into this in the overall scheme of Big Data. I'd say if your data set is only in the few GBs, a clustered Elasticsearch is all that you need. If your data runs into terabytes, then it's probably worth keeping it in a Hadoop cluster and running your jobs there, either feeding the results back into Elasticsearch or using Elasticsearch as one of the inputs to your favorite Hadoop script.
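For the "feed the results back into Elasticsearch" path, the usual vehicle is the bulk API, which takes newline-delimited JSON: an action line followed by the document itself, one pair per record. A small sketch of building such a payload from job output (the index and type names here are hypothetical, and `_type` reflects the pre-2.x mapping model of the time):

```python
import json

def to_bulk(records, index="logs", doc_type="event"):
    """Serialize records into Elasticsearch bulk-index format:
    one action line plus one source line per record, newline-delimited.
    Index and type names are placeholders for illustration."""
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(rec))
    # The bulk endpoint expects the body to end with a trailing newline.
    return "\n".join(lines) + "\n"

payload = to_bulk([
    {"status": 200, "path": "/"},
    {"status": 404, "path": "/missing"},
])
print(payload)
```

Whatever emits your Hadoop job's output can stream payloads like this at the cluster's `_bulk` endpoint in reasonably sized batches.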

In the end, I can't believe this prompted me to write a three-paragraph blog post!