
Tuesday, 24 May 2016

VMware Photon Platform, Notes from the Field - Entry Two

Time to Build The Cluster Services

Is it a bird, a plane, or the Photon Platform? In the early days of the Photon Platform announcements I had it pegged, in my own mind, as a purpose-built alternative to 'vSphere Integrated Containers (VIC)'. Keep in mind that VIC's sole purpose in life is to enable containers in vSphere with the cool concept of a 1:1 alignment of VM to container. This means a container can have all the flexibility inherent in a container with all the control and security of a VM. Alright, come on down Photon Platform: just bigger, better, more focused, right? Well, not really...

So one thing that strikes you with the Photon Platform is the flexibility of choices it provides. From a scheduling cluster manager's perspective you can utilise Swarm, Kubernetes or Mesos out of the box. On the other side, this is ESXi after all (marketing slides that keep alluding to the slimmed-down hypervisor aside), so it will happily run Virtual Machines and Containers side by side (well, containers within VMs anyway).

Being able to support both is important, as not all workloads are created equal in this new age. Let's be honest, not everyone working in this new 'Cloud Native' world has grown a beard, rides a bicycle and wears skinny jeans; there is a variety of personalities that need to be catered for. What is common is that we don't all need the assurance and crutches that come with the more traditional platforms. If I fall over it is ok, I will just get myself back up. In the traditional world I would have crutches, cushions, people watching me, ready to catch me if I stumble, etc. This is because in that traditional world I am seen as needing to be treated like a VIP (i.e. Very Important Process) and require all the care of advanced services to keep me going (and the costs associated with those advanced services). Sometimes a more traditional approach may be preferable to containerising (is that a word?) the workload, so Virtual Machines live on even if they are seen as cattle (I'll let Duncan Epping define that for you).



This mixed capability is emphasised by the fact that all the management VMs are hosted out of the hypervisors that are flagged as Management hosts (you designate whether a host is 'Management', so it participates in the control plane, 'Cloud', which only provides hosting capacity, or both). Anyway, I for one am a big fan, as the ability to host containers and VMs side by side provides that flexible, cost- and scale-focused platform I want without having to look at alternatives such as OpenStack. Add to this the fact that it also includes support for the 3 main container and workload cluster scheduling services out of the box and you're covering a lot of bases for possible consumers of Photon Platform.

My personal choice is to go forward with Mesos at this stage as it gives me the open choices I want when paired with Marathon. I have played with all three on Photon though, as all it takes is importing the 3 images and enabling the 3 cluster types. I am not here to tell you how to do this or how to create a cluster as there are plenty of blogs that do that; I just want to give my view and some lessons I learnt on the way that may help someone out there.

It is important to note that when creating clusters, they align to their traditional model and not some hybrid like in VIC. What I mean by that is that if you create a Docker Swarm cluster, the slave nodes will be running the containers in a shared multi-node format. The nodes use PhotonOS as their Operating System but we don't have the 1:1 Container to VM alignment of VIC. This is important to consider as your sizing of those nodes should reflect your needs. There are default 'Flavors' (remember, Flavors define the resource configuration profiles) aligned to the different clusters but they can be overridden at the API or CLI level when creating a cluster.
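
As a rough sketch of what that looks like, here is a custom VM Flavor being defined with the photon CLI. The flag names and cost syntax are from my own notes against the 0.8 CLI, so treat them as approximate and check 'photon flavor create --help' in your build:

    # hypothetical example - a small VM flavor that cluster nodes can then be created from
    photon flavor create --name "cluster-tiny-vm" --kind "vm" \
           --cost "vm 1 COUNT, vm.cpu 1 COUNT, vm.memory 2 GB"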

This system is built as a multi-tenant service from the ground up, so when you create workloads they need to be placed within a Project assigned to a Tenant. As you would expect, a Tenant gets an allotment of resources (vCPUs, memory, disk capacity, VMs, etc.) which are then subdivided into Resource Tickets (think Gold, Silver, Bronze classes or whatever). A Project is then assigned a Resource Ticket, from which it carves out its own resource reservation/limit. All the way down this logical structure you can add Access Controls via the integration with VMware Lightwave.
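
To make that hierarchy a bit more concrete, the CLI flow looks roughly like the sketch below. The tenant, ticket and project names are made up and the limit strings are from memory, so verify the syntax against 'photon --help' in your release:

    # tenant -> resource ticket -> project, each carving out a slice of the level above
    photon tenant create engineering
    photon resource-ticket create --tenant engineering --name gold \
           --limits "vm.memory 64 GB, vm.cpu 32 COUNT, vm 100 COUNT"
    photon project create --tenant engineering --resource-ticket gold \
           --name cloud-native-apps --limits "vm.memory 16 GB, vm.cpu 8 COUNT, vm 25 COUNT"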



So, importantly, as a tenant I can create my own cluster service to host workloads or just create VMs directly based on virtual appliances or virtual disks imported into the system. I can then control my VM workloads via the CLI and API into Photon Platform's Controller VM, as you normally would via vCenter, or use the cluster manager of choice to control the workloads within the clusters.

As a final statement on the installation of the clusters, what I can say is this. I have built all 3 flavours using the simple sandbox methods as well as going through the full manual processes (more than once, as I like punishment) and there is a highly compelling value proposition here. To be able to create these services as you require them on a production-class platform so simply is huge. There will be challenges with the way it is done now, such as how they stay close to release parity (hence the compelling part of it being Open Source), but this is a new-era platform that brings together traditional on-premise controls and the new cloud native era's service requirements in the one stack. This is great stuff for a version 0.8 release!

Some Notes On Cluster Enablement:


Be Wary Of Resource Stinginess
My nature is to give less rather than more when allocating resources. This sent me on a dance through the requirements of the various clusters as I did not have enough resources aligned to my Project. Just be aware of the sizing needs (they vary of course depending on your own configuration requirements) and if you are resource constrained, use custom 'Flavors'. I prefer to bang my head on the wall, so got very used to the 'photon cluster create' operation being closely followed by the 'photon cluster delete' operation to delete the failure and start again (side note: it didn't like me trying to recreate deleted clusters with the same name; I didn't pin this down but got into the habit of using a unique name for each attempt) :)
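
For what it is worth, the retry loop I kept falling into looked something like this (the cluster ID shown is made up and the flags may differ in your build):

    # find the failed cluster, remove it, then retry under a new unique name with bigger Flavors
    photon cluster list
    photon cluster delete 7a9f...                      # ID of the failed creation attempt
    photon cluster create -n mesos-02 -k MESOS ...     # retry with a unique name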

Cluster Creation IP Address Requirements
Within the instructions it is emphasised that DHCP needs to be disabled / not present in the management network, which is where the clusters are installed. Ironically, if DHCP is disabled the cluster deployment fails with an 'Unable to Obtain an IP Address' error. To progress forward I re-enabled DHCP in the management network (the one that the Photon Controller is installed into). The more accurate requirement description is to ensure the static IP addresses you provide (i.e. for the 'zookeeper' servers in Mesos and the 'etcd' servers in Kubernetes and Swarm) are not also served within the active DHCP scopes.
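
As an illustration, this is the rough shape of a Mesos cluster create with the static addresses supplied. The IPs are examples from my lab and the flag names are from the 0.8 CLI help, so verify them against your own release:

    photon cluster create -n mesos-01 -k MESOS -s 3 \
           --dns 10.0.0.2 --gateway 10.0.0.1 --netmask 255.255.255.0 \
           --zookeeper1 10.0.0.21 --zookeeper2 10.0.0.22 --zookeeper3 10.0.0.23
    # the zookeeper (or etcd) addresses must sit outside any active DHCP scope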

Those Damn Certificates
Maybe you're lucky and your company provides an open network where all traffic is created equal. That is not the case in mine; we trust no-one and inject ourselves into every certificate coming into the network. Any problems with this are mitigated by adding our own certificates into the trusted root, but that is difficult for environments that are created dynamically, as these clusters are.

When you create a cluster, PhotonOS-based VMs are created corresponding to your configuration's requirements (i.e. how many slaves, how many provide the interconnect service) and are then configured by templates contained on the Controller. Certificate trust becomes critical because the services run as containers within those VMs, so the images need to be downloaded as part of the initial execution. The end result is that the cluster creation operations fail with 'Time Exceeded' errors.

Anyway, it's an easy fix! I edited the template files contained in the controller directory '/usr/lib/esxcloud/deployer/scripts/clusters' (they have descriptive names such as 'kubernetes-master-user-data-template') to inject our certificates into the VMs. This at least made that problem go away, and you could of course do this to apply any other customisations that may be required. It would be nice to see some option in the future to inject certificates into the process. I also provided this same feedback for 'vSphere Integrated Containers' as I had the same problem there!
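
For reference, the change itself is nothing more exotic than backing up a template and adding the certificate content to it. This is just how I went about it, not an official procedure:

    # on the Photon Controller VM
    cd /usr/lib/esxcloud/deployer/scripts/clusters
    cp kubernetes-master-user-data-template kubernetes-master-user-data-template.bak
    vi kubernetes-master-user-data-template    # add the corporate root CA so the container image pulls are trusted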




Friday, 25 March 2016

Unbox, Power On, Watch the RackHD Magic Begin!

What am I saying, you are probably asking? I have had great fun over the last week getting down and dirty with one of the headline EMC {code} projects, RackHD. This is a very cool solution for taking care of your low-level activities for bare-metal infrastructure. Think configuration management for all those physical pieces in a rack in the data center. This is a way to manage everything from the firmware through to the personas of these devices; very good stuff, and it is Open Source!

I guess first up I should give some context around what I am referring to with EMC {code}. EMC can be perceived to be a player in the more traditional areas of IT infrastructure; EMC {code} is one example (of many) of how far that perception is from the current-day EMC that I work for. EMC {code} is the landing place for developer enablement and open source projects at EMC.

This is your one-stop shop to find any open source projects supported by EMC, as well as community projects by EMC staff, partners and customers. It also includes training content and projects aligned to helping enable the developer community on a wide spectrum of those new Mode 2, Platform 3, Cloud Native, SDDC technologies (you get the idea). I strongly encourage anyone to have a look and join in. There are some very smart people in this community and they are very accessible via tools such as GitHub and Slack.


Anyway, back to RackHD: what is it really? We know that there are a number of configuration management solutions out there today, such as Puppet, Ansible, Salt, Chef, etc., but these tend to have a common trait: they look after the nodes/hosts/clients through remote agentless access (SSH, WBEM, etc.) or via agents that are installed in the target device. What this of course requires is that the device is ready to accept remote requests or have agents installed; in other words, they take control and configuration management of the platform once it is operational. A few, such as Puppet with Razor, have the ability to control the physical world, but not as an all-inclusive service with multi-action workflow smarts.



Looking at RackHD you have a solution that provides:


  • Bare metal configuration management across the physical infrastructure stack. So not just with the compute but all the bits that go in a rack (hence the name) including:
    • The compute
    • The Network
    • The Storage
    • The enclosures that may contain the nodes
    • The Racks themselves and PDUs (remember, just plug the thing in)
  • A strong but intuitive RESTful API
  • Aligns to the 'Infrastructure as Code' model. Allows node definitions, associated workflows and SKUs to be fed in as JSON files via the API (or UI if that's your preference)
  • Fully self-contained service providing all the mechanisms required to control a physical environment, such as DHCP, PXE, TFTP, HTTP, etc.
  • A scale-out architecture that can grow to your environment's needs
  • Full support for dynamic discovery and physical control of hardware via standards like IPMI, SNMP, BMC, DMI
  • Ongoing low level configuration tracking and management through Pollers
  • Provides that one stop shop for physical telemetry data and alerts
This stuff is cool, and to watch something be discovered and then have a profile assigned and provisioning actions kicked off is very satisfying. It does not matter if it is using zero-touch provisioning for network switches or building out a Docker cluster via Kickstart scripts, Ansible modules and Docker Machine (did I mention that there is a Docker Machine driver?), it is great to watch.
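
Because everything is driven through that RESTful API, poking at a deployment is just a couple of HTTP calls. The host, port, API version and workflow name below are assumptions from my own test setup, so adjust them to match your deployment:

    # list the nodes RackHD has discovered
    curl http://rackhd-server:8080/api/1.1/nodes
    # kick a workflow off against one of those nodes (the graph name is an example only)
    curl -X POST -H "Content-Type: application/json" \
         -d '{"name": "Graph.InstallCentOS"}' \
         http://rackhd-server:8080/api/1.1/nodes/<node-id>/workflows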

The process of discovery and workflow execution

The best way to learn about tools like this is to start playing with them, and luckily the EMC {code} team have made that very easy for all of us with a fully functional Vagrant demo setup that leverages VirtualBox on your laptop. I highly recommend giving this a go as I have definitely had some fun with it. The guys have also written a docker-machine driver that can be tested with RackHD within Vagrant; get it from GitHub now and in under an hour you could have your first workload up and going!
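
If it helps, the rough flow I followed for the Vagrant setup was along these lines (the repository layout may well have changed since I ran it, so treat the paths as approximate):

    git clone https://github.com/RackHD/RackHD.git
    cd RackHD/example
    vagrant up      # brings up the RackHD services in VirtualBox, ready for test nodes to PXE boot against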

If you want to see this in action, Kendrick Coleman did a great demo video on YouTube.




http://bit.ly/rackhd-docker

Wednesday, 17 February 2016

Welcome VxRail, VCE's Entry into the HCIA Market


If you want the ease of a hyperconverged appliance but still need all those advanced capabilities that service enterprise data centres, then the new VCE VxRail is for you! This is not so much an evolution of the VMware EVO:Rail appliances but more a revolution.

Why do I say that? The Achilles heel of EVO:Rail was the lack of flexibility; all you could do was choose between two RAM configurations (no variation in disk or CPU), 128GB or 196GB, with the 196GB model for some reason referred to as the desktop service model... never quite got my head around that one, why wouldn't other services want a larger memory footprint? The other limitation was the fact that you could only grow your appliance service out by adding additional appliances: start with a 4 node 2U appliance and then grow in increments of 4 from there till you hit the 64 node ceiling (aligned to the maximum size for a HA / VSAN cluster).

Now enter the VCE VxRail appliance: here we have flexibility that can meet a huge range of use cases. Your choices now include:


  • Starting point is now 3 nodes (today) and growth can then be in single node increments up to the 64 node maximum
  • CPU model options provide a range of 6 cores (single socket) through to 28 cores (dual socket) per node
  • Memory configurations from 64GB through to 512GB
  • Disk configurations of Hybrid (Flash + Spinning HDD) or all flash ranging from 3.6TB through to 19TB per node 
  • Even networking options will support 1G connections on the low end nodes with the standard being dual 10G 
You can also now mix different model configurations and EVO:Rail appliances under the same management. Phew... all that flexibility but with the ease of use that comes with the VxRail appliance.

Some other cool features of note are things such as:
  • All the coolness of VSAN 6.2 with Dedupe, compression, flash optimisation, erasure coding, snapshots, cross site HA (yes you can link two remote appliances into a vSphere Metro Cluster)
  • VM Level replication with EMC RecoverPoint/VM (yep included) 
  • Cloud based storage tiering EMC Cloud Array (yep also included)
  • VCE Vision software for service health and interoperability compliance monitoring and reporting
  • vRealize Log Insight for log aggregation and analytics
  • vSphere Data Protection (EMC Avamar)
  • Appliance Market Place, your VxRail App Store for easy request / enablement of additional services
And that is not the end of it, too much to list :)

If you want to see it for yourself, have a look at this demo of the VxRail Manager I recorded!




More information is available at http://www.vce.com/products/hyper-converged/vxrail

Thursday, 6 November 2014

Phew, vForum Day One Done!


Well, just a quick one as I thought it was time for a new entry, and having just done day one at vForum at Luna Park it was a good chance for an update. I had the great opportunity to present in the EMC session today with Matt Zwolenski, our ANZ SE leader, on some of our cool software-based solutions.

This 40 minute section covered 4 demos (maybe too many for 40 minutes) that we produced out of the local solution center. I think it could be a week before my sleep catches up from that one. It is interesting just how much time filling a 40 minute segment can take up in preparation. This is amplified by my belief that if I am going to show it, I need to build it and prove it first. All demos I show I also build and produce myself. Maybe that explains the dodgy Camtasia call-outs.


That said we showed:

  • The ViPR Data services
    • This is very relevant for VMware as it is now providing the Object Storage Services and the snapshot repository for the Database-as-a-Service offerings in vCloud Air
  • VVols with the EMC VNXe
    • A demo showing the 4 parts of implementing Virtual Volumes: 1) Protocol Endpoints, 2) Storage Providers, 3) Storage Containers, 4) Storage Policies
  • Recoverpoint for Virtual Machines
    • Very cool replication technology now available at a VM granular level
  • The VMware + EMC Hybrid Cloud
    • Showed the vRealize suite servicing Virtual Machines: catalogues (vRealize Automation), VM mobility (vCloud Connector), cost transparency (vRealize Business), Storage-as-a-Service (vRealize Automation + ViPR)
As a techie I really do enjoy building this stuff, and the technology from both VMware and EMC is very cool. It is hard to put a prep time on the work to produce the demonstrations, but it was stretched over weeks. With 3 of the 4 solutions shown being at release levels still in beta or earlier, it did bring on some interesting challenges.

It does go a long way to show how far these technologies have come. The old days of any service management solution being an endless drag on professional services are definitely starting to move behind us.

Anyway, day one done, one speaking slot and a live vBrownBag podcast completed. Once the vForum roadshow completes in ANZ I will publish the 4 demos to YouTube.
 

Friday, 6 June 2014

Cool Solution: Hadoop File Services with EMC Isilon

I'm Sold!

First let me say, by my own admission, I am an infrastructure guy! That said, I have been lucky enough recently to be given the chance to dive into the world of Big Data and PaaS. As a techie the extensive technology options in this area are very impressive and, as a consequence, I have had a lot of fun combined with late nights and heavy learning.

I have always worked on the KISS principle, 'Keep It Simple Stupid!', and Hadoop fits into that with the exception of its file service, HDFS. With that layer of abstraction over the file system, the ability to manage and populate it is non-existent without specific tools written for the task.

I know DAS and scale-out through the data nodes is great and builds a big pond for your big data by combining the compute and disk resources of 1000+ server nodes. Yay, all good! But putting my EMC and infrastructure hat on for a moment, what about the following:


  1. How do I back up my HDFS-based data?
  2. How do I use those other cool storage capabilities such as snapshots, auto-tiering, etc.?
  3. How do I get real-time analysis on data without having to move it into HDFS?
  4. How do I share the data from within HDFS?
  5. What if I need more compute resources in my cluster but not storage, or the other way around?
This is where EMC's Isilon comes to the rescue through its OneFS. The flexible, open access into its large single namespace is invaluable. All the joys you expect from a true scale-out, enterprise-class storage array are there, as well as the capability of accessing the same file system in multiple ways at the same time. Think about a web site logging to an NFS mount that is part of the HDFS file system, allowing for real-time analytics against it!
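
As a purely hypothetical illustration of that multi-protocol access (the host name, paths and port are made up for the example), the same OneFS directory can be written to over NFS while Hadoop reads it over HDFS:

    # web servers write their logs to an NFS export backed by OneFS
    mount -t nfs isilon.example.com:/ifs/data/weblogs /mnt/weblogs
    # the Hadoop cluster sees the same files through the Isilon HDFS interface
    hdfs dfs -ls hdfs://isilon.example.com:8020/weblogs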

Want to see this in action? Have a look at this demo: http://www.youtube.com/watch?v=Qx9BMzZa8UI
 
Further information can be found at http://www.emc.com/domains/isilon/index.htm

EMC ESI with Microsoft Applications

EMC's ESI, it's FREE!!!!

A few weeks back I recorded a series of short videos that showcase the ease with which you can extend your applications directly from EMC's FREE Storage Integration Suite (ESI). These are short, sub-90-second videos that each focus on one particular feature provided through the ESI console. As part of our application stack value prop this is a great example of what EMC can provide for a customer and, of course, it is FREE!

ESI is like EMC's glue into the Microsoft ecosystem and includes:

  • Simple user console for storage, replication and application management
  • Support for Windows, HyperV, vSphere and XenServer
  • Powershell library with almost 200 cmdlets
  • System Center Orchestrator Integration Pack
  • System Center Operations Manager Management Pack
  • Hyper-V VSS Provider and associated PowerShell library
  • Support for Exchange (up to 2013) including Native and Third party replication (enabled by RecoverPoint)
  • Support for Sharepoint and SQL Server (SQL AlwaysOn and FC coming in a few weeks)
Storage Provisioning with ESI

Microsoft Exchange Discovery with ESI

Microsoft Exchange Database Provisioning with ESI

Microsoft Exchange Database Replication with ESI

Microsoft SharePoint Content Provisioning with ESI


Tuesday, 3 June 2014

It's Not Just About Backup!

Why Have Different Levels of Protection?

I like the saying that comes up when people refer to an old classic car: 'they don't make them like that anymore'. The thing is, the statement should be followed with 'thank goodness'. Can you imagine putting up with the level of reliability, quality, safety and comfort you would have got in years gone by from a car you purchased from a dealer today?

The same goes with technology. I still wake up some nights and shudder when remembering the old tape backup routine. This used to go one of two ways. The first was the nightly scheduled tape shuffle, which thankfully I was not tasked with. The other was the pre-rollout backup we would run before pushing any new service out. This used to comprise us kicking the job off, then trotting down the road to a golf driving range to knock a few buckets of balls into no man's land to kill the time (usually hours).

So looking at that, we had two different use cases but only one method at our disposal, the inglorious, forever unreliable tape backup. Fast forward 15 years (maybe more but I keep that quiet) and we have many different forms of protection at our disposal including storage replication, snapshotting, tape backup (but hopefully VTL, not those horrible tapes).

All these options serve a purpose and work together nicely. If we were to classify the different use cases into buckets they could be:

Business Continuity (DR) = Remote Replication, constant protection driven by RPO and RTO requirements

Operational Protection = Local Snapshotting (or continuous protection as is provided with RecoverPoint) typically done by set interval (storage level) or self service (VM level)

Long Term Retention = Traditional Backup style typically run once a day

Each of these has its purpose and specific business requirements that drive its implementation. Interestingly, I see a lot of the Business Continuity (DR) and Long Term Retention being positioned but little of the Operational Protection being catered for. Back to my scenario at the start: if I could have had snapshotting at my disposal (or, if I was really lucky, EMC RecoverPoint Continuous Data Protection) my golf practice (aka time wasting) would not have happened, as we could have snapshotted the services that we were updating and started the rollout straight away.

Self service snapshotting is easily available these days thanks to two things:


  1. Virtualisation with the VM level snapshotting (Checkpointing in a HyperV world)
  2. Automation tools for storage (EMCs Storage Integration Suite is a good example)


So that is all cool for those known events, but what about the unknowns, such as data corruption? That is what scheduled snapshots protect you against, providing a much more granular way to protect your systems beyond the typical 24 hour cycle of Long Term Retention. You may only retain a short set of snapshots, such as 1-7 days, but they can provide good peace of mind for those services deemed worthy.

I do realise that replication can also provide a way of rolling back, but typically it is 'to the last change' or committed IO operation, so a corruption could easily be on the remote site as well as the source. Also, replication would require that data to come back down the wire, which adds time to the recovery / rollback process.

Another benefit of Operational Protection is that it can provide an easy / quick way for copies of datasets, such as those within a database, to be presented to an alternate location, for example from production to a test/dev instance.

Anyway, I got that off my chest so I feel better. Operational Protection = Good!

Just as a last note on this, I did not include High Availability (HA) as I am more looking at where the old functions we used tape for have evolved. There is some really cool stuff that can be done with stretched high availability that spans physical locations, as is supported with vSphere Metro Storage Clusters and Microsoft's Failover Clustering with products such as EMC's VPLEX, but that is a big enough topic on its own.



Thin Provisioning with VAAI and VASA in vSphere Working Together

That Damn Pesky Thin Provisioned Threshold Alarm in vCenter


Have you ever had those alarms go off in vCenter telling you that your thin LUN has exceeded its consumed space threshold? This is an interesting repercussion of VASA (storage awareness reporting) feeding its view of the thin provisioned LUN back to vCenter and vCenter then reacting to it.

So first let's have a look at the issue. Thin provisioning in a block storage world tends to start small and then continue to grow until the configured upper limit is reached. Makes sense and it is what you would expect; as data matures, the efficiencies of thin provisioning are reduced. In the wonderful world of virtualisation, where VMs tend to be fairly fluid due to provisioning, deletion and moving activities, a LUN can be full one moment and half empty the next. Add vSphere Storage Clusters and SDRS and it does not even need manual intervention.

Anyway, back to the problem at hand: the thin datastore all of a sudden looks full, so you shuffle stuff around and clean up space. All good at the datastore level, but you are still getting alarms off the datastore, specifically the 'Thin-provisioned volume capacity threshold exceeded' alarm. The reason behind this is that once a block has been written to, the array does not know how it is being utilised, so it reports it as consumed.

This is what VAAI Unmap is all about: giving the host a way of clearing a previously written-to block in a way that the array can act on. The drawback is that VAAI unmap is still not an automated process, so it needs to be executed through the CLI (esxcli storage vmfs unmap <datastore>) or via PowerShell as a manual process.
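
For reference, the manual reclaim looks like this from the ESXi shell. The datastore name is an example, and the optional reclaim unit controls how many VMFS blocks are unmapped per iteration:

    # reclaim dead space on the thin LUN backing 'Datastore01'
    esxcli storage vmfs unmap -l Datastore01
    # optionally tune the number of blocks reclaimed per pass
    esxcli storage vmfs unmap -l Datastore01 -n 200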

Be aware though that this only returns space to the array that has been released at the datastore level, such as what would happen if you deleted a VM. There is also the in-guest space issue: a thin provisioned VMDK will also grow in much the same way that a datastore does. You can also return space that has been deleted within the guest through tools that need to be run inside the guest, such as Microsoft's SDELETE (http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx) or VMware's own Guest-Reclaim tool (https://labs.vmware.com/flings/guest-reclaim).
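
As an example of the in-guest side, zeroing free space inside a Windows guest with SDELETE is a one-liner. Just be careful: the meaning of the -z and -c switches has changed between SDELETE releases, so confirm which one zeroes free space in your version before running it:

    # run inside the Windows guest to zero the free space on the C: drive (recent SDELETE versions)
    sdelete.exe -z c: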

By getting both your in-guest and ESXi host level space reclaim strategy in place you can ensure that VAAI + VASA = Happy House and you are getting the most efficiency out of your storage (at the space level anyway).

EMC and Microsoft Applications Working Together

EMC Storage Integration Suite


I recently recorded a number of demo videos showing off some of the cool capabilities of EMC's ESI. These videos show simple storage provisioning activities, but aligned to applications such as Microsoft Exchange and SharePoint.

ESI is a very cool tool that is available from EMC for FREE (yes you read that correctly) and provides a rich set of capabilities including:

  • Simple user console for storage, replication and application management
  • Support for Windows, HyperV, vSphere and XenServer
  • Powershell library with almost 200 cmdlets
  • System Center Orchestrator Integration Pack
  • System Center Operations Manager Management Pack
  • Hyper-V VSS Provider and associated PowerShell library
  • Support for Exchange (up to 2013) including Native and Third party replication (enabled by RecoverPoint)
  • Support for Sharepoint and SQL Server

Demos:

Storage provisioning with EMCs Storage Integration Suite

EMC Storage Integration with Exchange using ESI

Exchange Database Provisioning with EMCs ESI

Exchange Database Replication Enabled with EMCs ESI

SharePoint Content Database Provisioning with EMCs ESI

Product Page:
http://www.emc.com/data-center-management/storage-integrator-for-windows-suite.htm