Saturday, November 10, 2012

Let People Manage People

I had a great conversation this week about SLA enforcement that I thought I'd share here.

The conversation was about private cloud provisioning.  When you use VMware's vCD product, it will select the correct datastore based on the assigned storage profile.  All well and good.  The question we got was:  "What if I have an SLA that says the storage must be located in the USA?  For example, if this is a federal government customer."

Ah, simple, we say.  Just create a specific storage profile that is called "US Only Data" or some such and you're all set.

"Yes, but."  OK, here comes the but.  "I have a contract with the federal government that says that I have to ensure that my data does not leave the USA.  How can your product ensure that doesn't happen?" 

That's the rub, isn't it?  How can we model even a very simple SLA like geolocation of data?  The reality is that we cannot.

As I have said many times before, the movement to cloud is a fundamental business shift that is enabled by technology but is not fundamentally a technology shift.  Or to put it another way, technology does not solve every problem.

In this case, we are really talking about business rules that govern the placement of data.  This business rule is created by humans and must be implemented by humans.  Yes, we can create scripts or other tools to help make this simpler, but no, we cannot ensure that the software will always do the right thing.  The humans are, in the end, responsible for ensuring that the business rules are met.

Oh man, I hear the engineers out there protesting.  You are fuming on the other side of your screens!  There must be a technology answer here!   No, there isn't.

Let's think this one through.  Let's say that we did create a system that allowed us to model SLAs.  For example, we could do something simple like a tag on every VM, set at provision time, that lists the countries it is allowed to run in.  When we provision, this tag gets set to "USA".  Then we tag all assets in the data center with a geo tag stating their location.  Then we write a script that compares these values.  Simple, right?

Wrong.  We are still dependent on the humans to enter the right tag values.  If that doesn't happen, the whole system breaks down.  And this is a relatively simple business rule.  If you start looking at your SLAs closely, you will find that it is almost impossible to model them completely.  And that's only the formal SLAs.  What about the informal ones?  The reality is that human interaction is so complex that it's almost impossible to model.  I'm sure that there are some really smart dudes out there who are trying and they may prove me wrong one day, but I doubt it somehow.
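To make the tagging idea concrete, here is a minimal sketch in Python of the kind of comparison script I'm describing.  Everything in it is hypothetical: the `VM` and `Datastore` classes, the tag fields and the machine names are made up for illustration and are not part of vCD or any other product.  The point is what the script cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    allowed_countries: frozenset  # typed in by a person at provision time


@dataclass(frozen=True)
class Datastore:
    name: str
    country: str  # geo tag typed in by a person; the script cannot verify it


def placement_violations(placements):
    """Return (vm name, datastore name) pairs whose tags disagree."""
    return [
        (vm.name, ds.name)
        for vm, ds in placements
        if ds.country not in vm.allowed_countries
    ]


fed_vm = VM("fed-app-01", frozenset({"USA"}))
placements = [
    (fed_vm, Datastore("ds-virginia", "USA")),
    (fed_vm, Datastore("ds-frankfurt", "DE")),
    # Suppose this array physically sits in Toronto but was tagged "USA" by mistake.
    (fed_vm, Datastore("ds-toronto", "USA")),
]

print(placement_violations(placements))
```

The script flags the Frankfurt placement and happily passes the mislabeled Toronto one.  It compares tags, not reality, so it is only as good as the people who typed the tags in.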

So, what's the message here?  The message is to let the humans manage the humans.  Create scripts and processes to check on things that are easy for computers to understand.  Workload X goes on datastore Y.  Why?  Because we say so.  The computer doesn't really need to know.  Latency on datastore X cannot be more than 12 ms.  Whatever.  These are simple things to model and relatively simple to build into monitoring software.  Focus on these things.
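A rule like the 12 ms latency limit is exactly the kind of thing a few lines of script can check.  Here is a rough sketch, assuming latency samples have already been collected per datastore by whatever monitoring you run; the function name, the dictionary layout and the sample numbers are all made up for illustration.

```python
MAX_LATENCY_MS = 12.0  # the limit from the rule above


def latency_breaches(samples_ms, limit_ms=MAX_LATENCY_MS):
    """Map each datastore to its worst sample, keeping only those over the limit."""
    return {
        name: max(values)
        for name, values in samples_ms.items()
        if values and max(values) > limit_ms
    }


# Hypothetical samples in milliseconds, keyed by datastore name.
samples = {
    "datastore-X": [3.1, 14.5, 9.8],
    "datastore-Y": [2.0, 11.9, 7.4],
}

print(latency_breaches(samples))
```

No judgment is required here.  The number is either over the limit or it isn't, which is why this sort of rule belongs in the monitoring software.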

The converse is also true.  When setting up formal and informal SLAs with your customers, keep in mind the limits of what the system can actually measure.  Saying that performance must be "acceptable" is really tough to gauge.  An SLA around average response time of xx milliseconds can actually be measured and tracked.  Time to first byte, latency, throughput...  These things are all good fodder for SLAs.  Words like "Good," "Acceptable" or "Excellent" should never be included in a formal SLA document because they require human judgement.
