Virtualised; supported?
August 26th, 2010 Dave Trew Tags: server, The Future!, virtualisation
Posted in Data Science, Firm of the Future, Technology
This post is the second and final part of my Virtualisation mini-series. Please find the first part here.
Is there any option other than virtualisation?
From a technical viewpoint, yes, but it’s unattractive in the modern world; as described in the previous post, putting one operating system instance on a modern data centre server is inflexible and unnecessarily expensive through (maybe hugely) inefficient use of hardware resources.
So let’s assert that economic factors give a powerful urge toward virtualisation.
What is the practical support risk in a virtualised environment?
Most mainstream hypervisors are very well proven and problems are very rare – typically involving performance or the fine detail of resource lifecycles.
The issue is actually about receiving support for any problem, regardless of whether virtualisation is a factor. A vendor can always claim that the problem could be due to virtualisation and decline to help until that is ruled out. So the gate to getting meaningful support for any problem could be a rebuild-as-physical of the server(s) in question.
Oracle at least make some concession here. If your problem has been recognised before in an all-physical context then they’ll tell you the solution. Otherwise…
How do I reproduce the problem on a physical server?
It isn’t too difficult or costly to have extra server blades available. The problem is the time taken to get to a physical deployment. Considering a rebuild from scratch highlights the option of automated server and application build and deployment. This is attractive in concept, but only delivers a return on investment above a certain number of similar servers, and some products practically demand manual installation steps. In the absence of automated rebuild, the issue becomes the quality of the documented build instructions (and the not-insignificant cost of producing these and keeping them correct and up to date). Even so, a server rebuild might take anywhere between half a day and a week, depending on complexity and access to relevant staff.
Does the problem-reproduction environment need to be a scratch build?
Some hypervisor vendors have virtual-to-physical (V2P) conversion tools that can get you to a physical situation quickly. But is a V2P’d server image as valid for testing as one freshly built as physical? Can you prove it? For example, are the drivers used by the operating system for the genericised virtual hardware equivalent to those needed for the specific physical hardware?
HP’s Insight Control server migration provides a good answer here. It performs automatic driver replacement as it migrates P2V from any x86 server to VMware ESX, Microsoft Hyper-V or Citrix XenServer, and V2P from these to one of HP’s ProLiant servers.
Any other options?
The most practical measure is likely to be to obtain urgent support through the professional services arm of most vendor organisations. They should be able to investigate not-strictly-supported configurations and obtain access to 3rd-line support, either pointing toward a resolution or bridging the gap while a rebuild takes place.
So the risk is mainly in time-to-resolve? What’s the impact of that?
Well, the system will (of course) have undergone full functional and representative performance testing in an environment that matches production. So we know it worked at that point.
Hence, problems are more likely to involve service deterioration or temporary disruptions, rather than total failure.
Consequently, it is necessary to assign a “cost” to certain degrees of disruption. Will revenue be lost? Will the parties using the service retry after a failure? Up to what threshold? What is the impact on the client of the cost of the users’ time? Can the users decide not to use the service, and what would be the cumulative result of that? What is the total impact of reputational risk from service instability?
By some means, we need to be able to relate these costs to the savings from virtualisation and, ultimately, to relate the commercial implications to the probable risk.
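As a rough illustration of relating these costs to the savings, the sketch below weighs an expected annual cost of delayed resolution against a virtualisation saving. Everything in it is an assumption made for illustration: the scenario names, the probabilities, the daily cost figures and the saving are invented, and a real evaluation would need figures agreed between client and provider.

```python
from dataclasses import dataclass


@dataclass
class DisruptionScenario:
    """One hypothetical way the service could be disrupted."""
    name: str
    annual_probability: float     # assumed chance of this happening in a year
    extra_days_to_resolve: float  # assumed delay caused by rebuilding as physical
    cost_per_day: float           # assumed lost revenue, user time, reputation


def expected_annual_risk_cost(scenarios):
    """Sum of probability x delay x daily cost over all scenarios."""
    return sum(
        s.annual_probability * s.extra_days_to_resolve * s.cost_per_day
        for s in scenarios
    )


def net_annual_benefit(virtualisation_saving, scenarios):
    """Saving from virtualisation minus the expected cost of the support risk."""
    return virtualisation_saving - expected_annual_risk_cost(scenarios)


# Illustrative figures only; none of these come from real systems.
scenarios = [
    DisruptionScenario("degraded performance", 0.10, 2.0, 5_000.0),
    DisruptionScenario("intermittent outage", 0.02, 5.0, 40_000.0),
]

if __name__ == "__main__":
    print("Expected annual risk cost:", expected_annual_risk_cost(scenarios))
    print("Net annual benefit:", net_annual_benefit(60_000.0, scenarios))
```

A simple expected-value model like this ignores correlation between scenarios and says nothing about appetite for rare but severe outages, so it is a starting point for the commercial discussion rather than an answer.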
How can support for virtualisation be made to work, then?
It boils down to a trusted partnership between an intelligent client and an outsource provider committed to supporting the client’s business. It should be the case that the client will always understand that the cost base of an offered solution depends on use of server virtualisation, and hence involves the risks outlined above.
It shouldn’t involve a situation where all the cost benefits of virtualisation go to the client while all the risks go to the outsourcer (unless the arrangements recompense the outsourcer for taking on that risk). No relationship will last long if either party is in a position to take advantage of the other and then exploits that.
Nevertheless, relationships involve individuals and corporate stances, both of which can change. So everything must come down in the end to the contract, and this isn’t easy.
One option is for relief (relaxation) of Service Level requirements where it can be shown that virtualisation has resulted in a delay in resolving the problem – N.B. not that the problem necessarily relates to virtualisation.
But how much relief is appropriate? Can a client really agree that a severity 1 total service outage is allowed to continue for a week while the outsourcer rebuilds servers?
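One possible way to frame the "how much relief" question is to cap it: the Service Level clock is credited only with the delay shown to be due to virtualisation, up to a contractually agreed maximum. The sketch below assumes that model; the function name, parameters and the example figures are hypothetical and not drawn from any actual contract.

```python
def sla_breached(hours_to_resolve, sla_target_hours,
                 virtualisation_delay_hours, relief_cap_hours):
    """Return True if the incident breaches the SLA after capped relief.

    Relief is the demonstrated virtualisation-related delay, never negative
    and never more than the agreed cap (both assumptions of this sketch).
    """
    relief = min(max(virtualisation_delay_hours, 0.0), relief_cap_hours)
    return (hours_to_resolve - relief) > sla_target_hours


# Illustrative only: a 4-hour target, a 30-hour incident of which 28 hours
# were spent rebuilding as physical, under two different relief caps.
if __name__ == "__main__":
    print(sla_breached(30.0, 4.0, 28.0, relief_cap_hours=8.0))
    print(sla_breached(30.0, 4.0, 28.0, relief_cap_hours=48.0))
```

With a tight cap the outsourcer still carries most of the rebuild risk; with a generous one the client does. Choosing the cap is exactly the commercial decision in question.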
Positioning in this area needs to be the result of a clearly and coldly evaluated, explicit decision on the balance between cost, risk and impact as described above. This is an inherently commercial, rather than technical, matter.



Very interesting blog on virtualisation.
The other major area of interest / risk is in the design of the network supporting the newly virtualised solutions, i.e. moving to a flattened layer 2 structure, use of TRILL (or FabricPath™), “tromboning” of data paths on moving a heavily used virtual server from one DC to another whilst live, even just the fact that virtualisation will force a change from burst traffic on a 1Gb link to possibly sustained utilisation of a 10Gb link. Combine that with IP-based storage running on the same core fabric and suddenly the move to a virtualisation solution can turn into a forklift upgrade of a datacenter with a new network design accessing barely ratified networking features!
As you clearly state, all this and more needs to be taken into account when deciding on positioning.