Friday, 6 June 2014

Cool Solution: Hadoop File Services with EMC Isilon

I'm Sold!

First let me say, by my own admission, I am an infrastructure guy! That said, I have recently been lucky enough to get the chance to dive into the world of Big Data and PaaS. As a techie, I find the range of technology options in this area very impressive, and as a consequence I have had a lot of fun combined with late nights and heavy learning.

I have always worked on the KISS principle, 'Keep It Simple, Stupid!', and Hadoop fits into that with the exception of its file services, HDFS. Because HDFS adds a layer of abstraction over the file system, you cannot manage or populate it without tools written specifically for the task.

I know DAS and scale-out through the data nodes is great and builds a big pond for your big data by combining the compute and disk resources of 1000+ server nodes. Yay, all good! But putting my EMC and infrastructure hat on for a moment, what about the following:

  1. How do I back up my HDFS-based data?
  2. How do I use those other cool storage capabilities such as snapshots, auto-tiering, etc.?
  3. How do I get real-time analysis on data without having to move it into HDFS?
  4. How do I share the data held within HDFS?
  5. What if I need more compute resources in my cluster but not storage, or the other way around?

This is where EMC's Isilon comes to the rescue through OneFS. The flexible, open access into its large single namespace is invaluable. All the joys you expect from a true scale-out, enterprise-class storage array are there, as well as the capability of accessing the same file system in multiple ways at the same time. Think about a web site logging to an NFS mount that is part of the HDFS file system, allowing for real-time analytics against it!

Want to see this in action? Look at this demo.
Further information can be found at

EMC ESI with Microsoft Applications

EMC's ESI, it's FREE!!!!

A few weeks back I recorded a series of short videos that showcase the ease with which you can extend your applications directly from EMC's FREE Storage Integration Suite (ESI). Each video runs under 90 seconds and focuses on a single feature provided through the ESI console. As part of our application stack value prop, this is a great example of what EMC can provide for a customer and, of course, it is FREE!

ESI is like EMC's glue into the Microsoft ecosystem and includes:

  • Simple user console for storage, replication and application management
  • Support for Windows, Hyper-V, vSphere and XenServer
  • PowerShell library with almost 200 cmdlets
  • System Center Orchestrator Integration Pack
  • System Center Operations Manager Management Pack
  • Hyper-V VSS Provider and associated PowerShell library
  • Support for Exchange (up to 2013) including native and third-party replication (enabled by RecoverPoint)
  • Support for SharePoint and SQL Server (SQL AlwaysOn and FC coming in a few weeks)

Storage Provisioning with ESI

Microsoft Exchange Discovery with ESI

Microsoft Exchange Database Provisioning with ESI

Microsoft Exchange Database Replication with ESI

Microsoft SharePoint Content Provisioning with ESI

Tuesday, 3 June 2014

It's Not Just About Backup!

Why Have Different Levels of Protection?

I like the saying people use when they refer to an old classic car: 'they don't make them like that anymore'. The thing is, it should be followed with 'thank goodness'. Can you imagine buying a car from a dealer today and putting up with the reliability, quality, safety and comfort you would have got in years gone by?

The same goes for technology. I still wake up some nights and shudder when remembering the old tape backup routine. This used to go one of two ways. The first was the nightly scheduled tape shuffle, which thankfully I was not tasked with. The other was the pre-rollout backup we would run before pushing any new service out. This used to consist of us kicking the job off, then trotting down the road to a golf driving range to knock a few buckets of balls into no man's land to kill the time (usually hours).

So looking at that, we had two different use cases but only one method at our disposal: the inglorious, forever unreliable tape backup. Fast forward 15 years (maybe more, but I keep that quiet) and we have many different forms of protection at our disposal, including storage replication, snapshotting and tape backup (but hopefully VTL, not those horrible tapes).

All these options serve a purpose and work together nicely. If we were to classify the different use cases into buckets they could be:

Business Continuity (DR) = Remote Replication, constant protection driven by RPO and RTO requirements

Operational Protection = Local Snapshotting (or continuous protection as is provided with RecoverPoint) typically done by set interval (storage level) or self service (VM level)

Long Term Retention = Traditional Backup style typically run once a day

Each of these has its purpose and specific business requirements that drive its implementation. Interestingly, I see a lot of Business Continuity and Long Term Retention being positioned but little Operational Protection being catered for. Back to my scenario at the start: if I could have had snapshotting at my disposal (or, if I was really lucky, EMC RecoverPoint Continuous Data Protection), my golf practice (aka time wasting) would not have happened, as we could have snapshotted the services we were updating and started the rollout straight away.

Self service snapshotting is easily available these days thanks to two things:

  1. Virtualisation with VM-level snapshotting (checkpointing in a Hyper-V world)
  2. Automation tools for storage (EMC's Storage Integration Suite is a good example)

So that is all cool for those known events, but what about the unknown ones such as data corruption? That is what scheduled snapshots protect you against, providing a much more granular way to protect your systems than the typical 24-hour cycle of Long Term Retention. You may only retain a short set of snapshots, such as 1-7 days' worth, but they can provide good peace of mind for those services deemed worthy.
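To make the short retention window a bit more concrete, here is a minimal Python sketch of a pruning rule. The function name `snapshots_to_prune`, the 6-hour schedule and the 7-day default are all assumptions for illustration; none of them come from a particular array or from ESI.

```python
from datetime import datetime, timedelta


def snapshots_to_prune(snapshot_times, now, retention_days=7):
    """Return the snapshot timestamps that fall outside the retention window.

    snapshot_times: iterable of datetime objects, one per snapshot taken.
    retention_days: hypothetical policy value (the text suggests 1-7 days).
    """
    cutoff = now - timedelta(days=retention_days)
    # Anything strictly older than the cutoff is a candidate for deletion.
    return sorted(t for t in snapshot_times if t < cutoff)


if __name__ == "__main__":
    now = datetime(2014, 6, 3, 12, 0)
    # Made-up schedule: one snapshot every 6 hours, 40 snapshots in total.
    taken = [now - timedelta(hours=6 * i) for i in range(40)]
    expired = snapshots_to_prune(taken, now, retention_days=7)
    print(f"{len(taken)} snapshots taken, {len(expired)} outside the 7-day window")
```

A real scheduler would also handle things like keeping a minimum number of snapshots, but the idea is the same: a short, rolling window that sits alongside the daily backup cycle.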

I do realise that replication can also provide a way of rolling back, but typically it is 'to the last change' or committed IO operation, so a corruption could easily be on the remote site as well as the source. Recovering from a replica also requires the data to come back down the wire, which adds time to the recovery / rollback process.
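The difference between 'to the last change' and a point-in-time copy can be shown with a toy model. This is only a sketch under simple assumptions: the `ProtectedVolume` class is hypothetical, the replica mirrors every write immediately, and a snapshot is just a frozen copy of the primary.

```python
import copy


class ProtectedVolume:
    """Toy volume with a mirrored replica and point-in-time snapshots."""

    def __init__(self):
        self.primary = {}
        self.replica = {}    # follows every committed write
        self.snapshots = []  # frozen copies of the primary

    def write(self, key, value):
        self.primary[key] = value
        self.replica[key] = value  # replication tracks the last change

    def snapshot(self):
        self.snapshots.append(copy.deepcopy(self.primary))

    def rollback(self, index=-1):
        # Restore the primary from a snapshot (the latest one by default).
        self.primary = copy.deepcopy(self.snapshots[index])


if __name__ == "__main__":
    vol = ProtectedVolume()
    vol.write("db", "good data")
    vol.snapshot()
    vol.write("db", "corrupted")  # the corruption is replicated as well
    print("replica holds :", vol.replica["db"])
    vol.rollback()
    print("after rollback:", vol.primary["db"])
```

The replica faithfully carries the bad write, while the snapshot taken beforehand still holds the good state.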

Another benefit of Operational Protection is that it provides an easy, quick way to present copies of datasets, such as those within a database, to an alternate location, for example from production to a test/dev instance.

Anyway, I got that off my chest so I feel better. Operational Protection = Good!

Just as a last note on this, I did not include High Availability (HA) here as I am looking more at where the old functions we used tape for have evolved. There is some really cool stuff that can be done with stretched high availability that spans physical locations, as is supported with vSphere Metro Storage Clusters and Microsoft's Failover Clustering with products such as EMC's VPLEX, but that is a big enough topic on its own.

Thin Provisioning with VAAI and VASA in vSphere Working Together

That Damn Pesky Thin Provisioned Threshold Alarm in vCenter

Have you ever had those alarms go off in vCenter telling you that your thin LUN has exceeded its consumed space threshold? This is an interesting repercussion of VASA (storage awareness reporting) feeding its view of the thin-provisioned LUN back to vCenter, and vCenter then reacting to it.

So first let's have a look at the issue. Thin provisioning in a block storage world tends to start small and then continue to grow until the configured upper limit is reached. That makes sense and is what you would expect: as data matures, the efficiencies of thin provisioning are reduced. In the wonderful world of virtualisation, where VMs tend to be fairly fluid due to provisioning, deletion and migration activities, a LUN can be full one moment and half empty the next. Add vSphere Storage Clusters and SDRS and it does not even need manual intervention.

Anyway, back to the problem at hand: the thin datastore all of a sudden looks full, so you shuffle stuff around and clean up space. All good at the datastore level, but you are still getting alarms off the datastore, specifically the 'Thin-provisioned volume capacity threshold exceeded' alarm. The reason is that once a block has been written to, the array does not know how it is being utilised, so it keeps reporting the block as consumed.

This is what VAAI Unmap is all about: giving the host a way of clearing a previously written-to block in a way that the array can act on. The drawback is that VAAI unmap is still not an automated process, so it needs to be executed manually through the CLI (esxcli storage vmfs unmap -l <datastore>) or via PowerShell.
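To show why the alarm sticks around after a clean-up, here is a small Python toy model of the host and array views. Everything in it is an assumption for illustration: the `ThinLun` and `Datastore` classes, the block counts and the 75% alarm threshold are made up, and a real array tracks allocation very differently.

```python
class ThinLun:
    """Toy model of a thin LUN as the array sees it."""

    def __init__(self, capacity_blocks, alarm_pct=75):
        self.capacity = capacity_blocks
        self.alarm_pct = alarm_pct  # hypothetical threshold
        self.allocated = set()      # blocks backed by real space

    def write(self, block):
        self.allocated.add(block)

    def unmap(self, blocks):
        # The host tells the array these blocks are no longer in use.
        self.allocated -= set(blocks)

    def consumed_pct(self):
        return 100 * len(self.allocated) / self.capacity

    def alarm(self):
        return self.consumed_pct() >= self.alarm_pct


class Datastore:
    """Toy model of the host-side file system sitting on the LUN."""

    def __init__(self, lun):
        self.lun = lun
        self.files = {}  # file name -> set of blocks it occupies

    def create(self, name, blocks):
        self.files[name] = set(blocks)
        for block in self.files[name]:
            self.lun.write(block)

    def delete(self, name):
        # Frees space at the datastore level only; the array is not told.
        del self.files[name]

    def reclaim(self):
        # Stand-in for a manual unmap run: release every block no file uses.
        in_use = set().union(*self.files.values())
        self.lun.unmap(self.lun.allocated - in_use)


if __name__ == "__main__":
    lun = ThinLun(capacity_blocks=100)
    ds = Datastore(lun)
    ds.create("vm1", range(0, 40))
    ds.create("vm2", range(40, 80))
    print("after provisioning:", lun.consumed_pct(), lun.alarm())
    ds.delete("vm2")
    print("after VM delete   :", lun.consumed_pct(), lun.alarm())
    ds.reclaim()
    print("after unmap       :", lun.consumed_pct(), lun.alarm())
```

Deleting the VM changes nothing from the array's point of view; only the reclaim step brings the consumed figure back under the threshold.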

Be aware, though, that this only returns space to the array that has been released at the datastore level, such as what would happen if you deleted a VM. There is also the in-guest space issue: a thin-provisioned VMDK will grow in much the same way that a datastore does. You can also return space that has been deleted within the guest through tools that need to be run inside the guest, such as Microsoft's SDELETE or VMware's own Guest-Reclaim tool.

By getting both your in-guest and ESXi host level space reclaim strategy in place you can ensure that VAAI + VASA = Happy House and you are getting the most efficiency out of your storage (at the space level anyway).

EMC and Microsoft Applications Working Together

EMC Storage Integration Suite

I recently recorded a number of demo videos showing off some of the cool capabilities of EMC's ESI. These videos show simple storage provisioning activities aligned to applications such as Microsoft Exchange and SharePoint.

ESI is a very cool tool that is available from EMC for FREE (yes you read that correctly) and provides a rich set of capabilities including:

  • Simple user console for storage, replication and application management
  • Support for Windows, Hyper-V, vSphere and XenServer
  • PowerShell library with almost 200 cmdlets
  • System Center Orchestrator Integration Pack
  • System Center Operations Manager Management Pack
  • Hyper-V VSS Provider and associated PowerShell library
  • Support for Exchange (up to 2013) including native and third-party replication (enabled by RecoverPoint)
  • Support for SharePoint and SQL Server


Storage provisioning with EMCs Storage Integration Suite

EMC Storage Integration with Exchange using ESI

Exchange Database Provisioning with EMCs ESI

Exchange Database Replication Enabled with EMCs ESI

SharePoint Content Database Provisioning with EMCs ESI

Product Page: