Friday, 6 June 2014

Cool Solution# Hadoop File Services with EMC Isilon

I'm Sold!

First let me say, by my own admission, I am an infrastructure guy! That said, I have recently been lucky enough to be given the chance to dive into the world of Big Data and PaaS. As a techie, I find the range of technology options in this area very impressive, and as a consequence I have had a lot of fun combined with late nights and heavy learning.

I have always worked on the KISS principle, 'Keep It Simple, Stupid!', and Hadoop fits into that, with the exception of its file services, HDFS. Because HDFS adds a layer of abstraction over the file system, you cannot manage or populate it without tools written specifically for the task.

I know DAS and scale-out through the data nodes is great and builds a big pond for your big data by combining the compute and disk resources of 1000+ server nodes. Yay, all good! But putting my EMC and infrastructure hat on for a moment, what about the following:

  1. How do I back up my HDFS-based data?
  2. How do I use those other cool storage capabilities such as snapshots, auto-tiering, etc.?
  3. How do I get real-time analysis on data without having to move it into HDFS?
  4. How can I share the data from within HDFS?
  5. What if I need more compute resources in my cluster but not storage, or the other way around?

This is where EMC's Isilon comes to the rescue through OneFS. The flexible, open access into its large single namespace is invaluable. All the joys you expect from a true scale-out, enterprise-class storage array are there, as well as the capability of accessing the same file system in multiple ways at the same time. Think about a web site logging to an NFS mount that is part of the HDFS file system, allowing for real-time analytics against it!
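
To make the web-logging idea a bit more concrete, here is a minimal Python sketch of the write side. It is only an illustration under stated assumptions: the mount point `/mnt/isilon`, the endpoint `hdfs://isilon.example.com:8020`, and the helper names `append_log_line` and `hdfs_uri_for` are all hypothetical, and it assumes the NFS export and the HDFS root point at the same directory on the cluster. The web server writes with ordinary file I/O, and the same file is addressed by an HDFS URI on the analytics side.

```python
import posixpath
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical values: where the OneFS export is NFS-mounted on the web server,
# and the HDFS endpoint exposed by the same cluster.
NFS_MOUNT = Path("/mnt/isilon")
HDFS_ENDPOINT = "hdfs://isilon.example.com:8020"


def append_log_line(log_file: Path, message: str) -> None:
    """Append one timestamped line using plain file I/O (no Hadoop tooling)."""
    log_file.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat()
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {message}\n")


def hdfs_uri_for(log_file: Path,
                 nfs_mount: Path = NFS_MOUNT,
                 endpoint: str = HDFS_ENDPOINT) -> str:
    """Map a path under the NFS mount to the URI a Hadoop job would read.

    Assumes the NFS export and the HDFS root are the same directory.
    """
    relative = log_file.relative_to(nfs_mount)
    return endpoint + posixpath.join("/", relative.as_posix())


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        mount = Path(tmp)
        log = mount / "weblogs" / "access.log"
        append_log_line(log, "GET /index.html 200")
        print(hdfs_uri_for(log, nfs_mount=mount))
```

The point of the sketch is that nothing gets copied: a Hadoop job would simply be pointed at the printed URI while the web server keeps appending to the file.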

Want to see this in action? Look at this demo.
Further information can be found at
