Although 99.999% availability sounds quite impressive, one wonders whether such a high number is really good enough. Let's make a simple calculation and find out.
This simple exercise is based on a critical network component that is expected to provide year-round service without a single glitch. Even the slightest hiccup in service becomes costly, as we shall see.
Every single second that this particular system is down means a revenue loss: 500 messages per second times a net value of ten dollar cents per message, which equals 50 dollars per second.
Further, in a single year there are 60 x 60 x 24 x 365 = 31,536,000 seconds. An availability of 99.999% allows for 0.001% of downtime; take 0.001% of that total and you get roughly 315 seconds.
The damages for this slight glitch are therefore 315 x 50 = $15,750.
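For readers who want to experiment with the numbers, here is a minimal sketch of the same back-of-the-envelope calculation in Python. The traffic rate and per-message revenue are simply the example figures used above, so substitute your own values as needed.

    # Back-of-the-envelope cost of downtime for a "five nines" system,
    # using the example figures above: 500 messages/s at a net $0.10 each.
    availability = 0.99999                  # five nines
    messages_per_second = 500               # example traffic rate (assumption)
    revenue_per_message = 0.10              # net dollars per message (assumption)

    seconds_per_year = 60 * 60 * 24 * 365                          # 31,536,000
    downtime_seconds = (1 - availability) * seconds_per_year       # ~315 seconds
    loss_per_second = messages_per_second * revenue_per_message    # $50 per second
    annual_loss = downtime_seconds * loss_per_second

    print(f"Allowed downtime: {downtime_seconds:.0f} seconds per year")
    print(f"Direct revenue loss: ${annual_loss:,.0f} per year")
    # Rounding the downtime down to 315 seconds, as in the text, gives $15,750.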
Of course this calculation is based merely on an average, but if we have bad luck and the outage hits a multi-node system during a critical traffic peak, or even worse, when these messages carry business transactions worth thousands of dollars per request, then the losses are going to be quite significant.
Direct revenue losses are caused by the inability to process messages, but there are also indirect costs: immediate costs, such as a flood of extra calls to customer support, and long-term costs, such as a seriously blemished brand name resulting from the negative publicity.
Now the same question again: is this acceptable? If not, then what are we going to do about it?
I guess we could raise this availability either by improving the technology or by mobilizing the available resources to offer efficient and immediate support to ease the pain.
Best of all would be a healthy balance of the two depending on the type of organization involved.
Good luck, keep your fingers crossed, and be prepared.