Governments, the media, and other pundits appear to be conspiring to cause confusion about probability, although I suspect that the truth is more likely to be that they just don’t understand what they are talking about.
I live in Cumbria in the UK, and last weekend we suffered from severe flooding, and the news seems to have been dominated by people talking about a “once in a hundred year event” that seems to have occurred several times in the last decade. These comments always seem to lead on to people saying that it can’t be a “once in a hundred year event”, or that it must be caused by something else if it has happened more than once. This displays a total ignorance of probability, and educated people who talk such rubbish should be ashamed of themselves.
A “once in a hundred year event” is actually an event that has a probability of 1% of happening in any one year, just like a coin has a 50% chance of tails when tossed, or a dice has a 1 in 6 chance of turning up as a 6 when rolled. The “once in a hundred year event” can happen twice in successive years, or twice in a decade, and still remain an event that has a probability of 1% of happening in any one year. Similarly, a coin that is tossed 10 times can land as a tail more than 5 times, and a dice that is rolled 12 times can turn up as a six more than twice. Using the phrase “once in a hundred year event” is not only misleading, but wrong, as it implies that something can only happen once in any hundred years.
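To put rough numbers on this, here is a minimal Python sketch. It assumes that each year is independent of the others and that the annual chance stays fixed at 1%, which is a simplification for a single location rather than a model of real flooding. The function name `prob_at_least` is my own invention for the illustration.

```python
from math import comb


def prob_at_least(k, years, annual_prob):
    """Chance of k or more events in `years` independent years,
    each with the same `annual_prob` (binomial model, an assumption)."""
    # Add up the chances of seeing fewer than k events, then subtract from 1.
    p_fewer = sum(
        comb(years, i) * annual_prob**i * (1 - annual_prob) ** (years - i)
        for i in range(k)
    )
    return 1 - p_fewer


# Two or more "1% events" in the same decade: roughly 0.4%
print(f"{prob_at_least(2, 10, 0.01):.2%}")

# At least one in a hundred years: roughly 63%, so far from guaranteed
print(f"{prob_at_least(1, 100, 0.01):.1%}")

# Two or more in a hundred years: roughly 26%, so "once" is not a limit
print(f"{prob_at_least(2, 100, 0.01):.1%}")
```

Under these assumptions, a repeat within a decade at one place is unlikely but perfectly possible, and across the many places that can flood, somebody somewhere will see one fairly often.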
So, this is a plea to all of you out there who comment on the chance of an event happening. Please state your views as “there is a chance of x% that the event will happen in any one year” and not as “x times in a hundred years”. The first is mathematically correct and is not misleading; the second is not correct and is immensely misleading.
Finally, there is real concrete evidence that an organisation’s ability to recover is central to its immediate survival. Not its ability to recover after an incident, but its ability to demonstrate its recovery capability as perceived by others before any incident occurs. Business Continuity is now firmly centre stage.
According to The Times, senior UK government officials “want the Co-operative Bank to be sold to a bigger player that could stabilise its IT system, which is feared to be so precarious that the bank could not cope with a serious problem.” For years I’ve been telling senior executives that not being able to demonstrate the existence of credible and tested Business Continuity arrangements could mean the difference between survival and failure, and now I can point to a real example. Business Continuity is not just for use in response to an incident – it must be demonstrable to interested parties well before any incident takes place.
Apparently, in the risk factors disclosed in its annual report, the Co-operative Bank has stated that “whilst a basic level of resilience to a significant data outage is in place, the bank does not currently have a proven end-to-end disaster recovery capability”. How many organisations can really hand on heart state that they have a proven end-to-end disaster recovery capability? Not that many.
Business Continuity has been practised in the banking industry for more than 25 years, and many of today’s accepted Business Continuity ideas and practices started in banking. Where banking leads in Business Continuity, other industries follow.
How long will it be before organisations in other industries are put at risk because they do not have a proven end-to-end disaster recovery capability?
Resiliency, or rather Business Resilience, seems to be the flavour of the month in the Business Continuity and Risk industries. Apparently, businesses are moving away from having separate silos for Security, Risk, Health & Safety, Business Continuity, etc., and are bringing all these related disciplines under the heading of resiliency and are appointing a Head of Resilience.
This all sounds quite good, and is for once a piece of joined-up thinking, except that the idea of Resiliency goes beyond these operational areas to the idea of ensuring that the business itself is resilient, which takes the discipline into the areas of leadership, reputation, innovation, product development, marketing, etc. In other words, it seems to be about everything that the business does, and the suggestion is that a single manager should be appointed to ensure that the business remains resilient in the changing environment in which it operates.
Now, tell me if I’m wrong, but I thought that this was actually the point of a Board of Directors. One of the prime responsibilities of a Director of a company according to UK law is to “try to make the company a success, using your skills, experience and judgement”. In other words it is the responsibility of every Director of a company to ensure that the company is resilient – it should not be delegated to a manager as Head of Resilience.
The Business Continuity and Risk industries should either start talking about Operational Resilience, or stop talking about Resiliency.
What is your organisation’s Business Continuity planning horizon? By that, I mean what time-scale after an incident that causes disruption do your Business Continuity Plans cover? A day, a week, a month, longer?
Every organisation that I’ve ever come across determines some kind of time limit, which is linked to the level of service that it plans to recover to. Without such a planning horizon, recovery plans would cover the complete resumption of the organisation back to its original state – which would be far too detailed and complex, and assume that nothing would change after the incident.
This planning horizon needs to be agreed at an early stage of the Business Continuity Management (BCM) process, before the Business Impact Analysis (BIA) is undertaken. This is because the BIA needs to concentrate on those activities that need to be recovered within the planning horizon. If this boundary hasn’t been put on the BIA, then a lot of time and effort will be wasted analysing every single activity.
So what? If everyone has a planning horizon then why mention it in a blog? Because it’s something that the Business Continuity industry chooses to keep secret. Try finding it in the ISO standard or the BCI’s Good Practice Guidelines. The idea of concentrating on the urgent activities in the BIA is there, but you won’t find anything about top management deciding on a planning horizon in the BCM Programme management sections. What’s everyone being so coy about?
I don’t think that anyone is going to object to the recommendation that organisations need to document disruptive events so that there is a clear record of what happened and how the organisation handled it. But what if I were to recommend to you that your organisation should document events that haven’t happened?
Sounds crazy? Yes, I agree, but this is the recommendation contained in a book that I’m currently reading on ISO 22301 and Business Continuity. Unfortunately, the book doesn’t elaborate on how this is to be done in terms of which events to include and how much detail should be recorded. Both of these need to be defined if the recommendation is to be followed, otherwise it is a rather pointless recommendation.
If you decide to include every event that might happen then you’ll spend the rest of your life listing them and still not have covered them all. And once you’ve decided on a finite list, do you then go into detail about what the different responses might have been and what might have been the result of each response? In other words, write a novel about each event that hasn’t yet happened?
After a short period of reflection, I’ve decided that this is not a good idea. Is there anyone out there who thinks it is?
I was attending a local Business Continuity Institute (BCI) forum the other day when someone mentioned the fact that there had been a ‘flu pandemic the other year. From a technical world health view this is correct, but from a Business Continuity (BC) perspective in the UK, I believe that this is dangerously misleading. As a consequence, I stated the view that as far as BC professionals are concerned, there was no ‘flu pandemic.
Why do I hold this view? Well, quite simply, the ‘flu pandemic did not cause any more disruption to UK organisations than the ‘flu normally does in any year. In other words, it was a “business as usual” type of disruption, which could be treated by local management as just one of those day to day issues that need to be handled. Yes, I know that lots of organisations, particularly in the public sector, convened weekly meetings of managers to monitor the situation, just in case they needed to invoke their Business Continuity plans (or special ‘Flu Pandemic plans), but the impact of the incident was very small.
It’s a bit like saying that an organisation suffered from a fire just because someone burnt the toast. Yes, technically there was a fire, but it would have been quickly put out, there would have been very little business disruption, and no Business Continuity plans would be invoked. It would be dealt with as a “business as usual” type of disruption.
Does this matter? Well, yes, I think that it does. To talk about ‘flu pandemic in the way that it was being talked about at the BCI meeting implies that there had been a business disruption and that Business Continuity plans had been successfully invoked. There was no significant business disruption, and although ‘flu pandemic teams met, no Business Continuity plans were invoked. In other words, the threat of the ‘flu pandemic was not realised, even though there was, technically, a ‘flu pandemic.
My message is simple. Don’t fool yourself into thinking that your plans dealt with the threat. It didn’t happen.
I do wish that government departments and business continuity professionals would stop describing random events, such as a severe flood, in terms of how many times the event is expected to happen over a period of time, as in it’s a “once in a hundred year event”. Why? Because most people that I meet misunderstand its meaning.
A “once in a hundred year event” means that the event has a chance of 1% of happening within the next year, not that it happens only once in a hundred years. If the event happened last year, then the chance that it will happen this year is still 1%; it doesn’t mean that it won’t happen again for at least another 100 years. Given the fact that random events seem to come in clumps, or groups (like buses), the fact that the event happened last year should make people think that it could well happen again in the next few years.
I would be much happier if government departments and business continuity professionals would start describing random events in terms of the chance of the event happening within the coming year, such as in “there is a 1% chance of a severe flood in the coming year”. This would be far less likely to be misinterpreted.
I’m very pleased that I’ve managed to get my latest client, a small electronics company that actually decided by themselves to implement Business Continuity Management (BCM) rather than being told to, to think about the maximum scale of incident that it wants to plan to survive. Many organisations shy away from this issue, which makes it difficult when advising on safe separation distances for backups and recovery sites, but my client’s management team understands the issues and will be coming up with an answer.
I think that the factor that will determine the answer is the geographic spread of their staff. If there is some kind of natural or man-made disaster that affects the homes and families of most of the staff then it is unlikely that they will want to come to work to help out their employer, particularly if their employer is asking them to work a significant distance from their families who may be evacuated.
If this is the case then we’re probably talking of their surviving an incident that has an effective radius of about 30km. Such an incident would take quite a catastrophic and unlikely event given that the client is nowhere near a nuclear or chemical facility, well away from the coast, and not in an earthquake zone or near an active volcano. The most likely widespread event is a river flood, but that doesn’t usually last more than a few weeks in the UK.
I was in the process of undertaking a risk assessment exercise for a client of mine when RBS suffered their systems failure the other day. By an amazing coincidence, I was working on the risks to the most urgent activities undertaken by their Finance department, and one of those activities is to make payments via BACS. I had identified that this activity was dependent on their bank, RBS, and was confronted with the very real problem of how do I assess the risk of RBS not being able to process payments because of a failure of their systems?
The likelihood of RBS not being able to process payments because of a failure of their systems would normally be rated as being very low, but today it’s a racing certainty! Had I undertaken the risk assessment a week ago I would have rated the risk as very low and not worth taking any mitigating measures. Now the risk is very high (certainty multiplied by the impact on my client), so they should consider mitigating action – such as having an alternative bank.
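To show how sharply the answer swings, here is a minimal Python sketch of the “likelihood multiplied by impact” calculation. The 1 to 5 impact scale, the 1% and 100% likelihood figures, and the name `risk_score` are all made up for the illustration; they are not taken from the actual assessment.

```python
def risk_score(likelihood, impact):
    """Simple risk score: likelihood (0 to 1) times impact (1 to 5).
    Both scales are assumptions made for this illustration."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be between 0 and 1")
    if not 1 <= impact <= 5:
        raise ValueError("impact must be between 1 and 5")
    return likelihood * impact


# Hypothetical rating: failing to make BACS payments is a top-level impact.
IMPACT_OF_MISSED_PAYMENTS = 5

# A week before the outage: likelihood judged to be very low (say 1%).
before = risk_score(0.01, IMPACT_OF_MISSED_PAYMENTS)

# During the outage: the failure is a certainty.
during = risk_score(1.0, IMPACT_OF_MISSED_PAYMENTS)

print(f"before: {before:.2f}, during: {during:.2f}")
```

The impact never changed; only the likelihood estimate did, and the score follows it all the way from negligible to the maximum.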
The question is, what does this say about the value of undertaking such risk assessments?
One of the things that I always look for when helping a client to implement Business Continuity is single points of failure. My latest client has managed to provide me with the best example of one yet, and the name of the single point of failure is Malcolm.
The client will remain nameless, to protect the innocent, but quite by chance it was revealed that one of their most critical and urgent activities is totally dependent on a single person working for an outsource supplier, and his name is Malcolm. If Malcolm is not available to do an activity that is on the critical path to enable one of the client’s most important services to be delivered, the client’s reputation will be destroyed. This service is delivered just once a year, and is vital to thousands of my client’s stakeholders.
I’ve never met Malcolm, but apparently he’s been undertaking this activity for many years, and it’s never failed. I can’t help wondering how old Malcolm is, or whether or not he’s in good health. I know who he works for, but again, I must protect the innocent.
I nearly missed this single point of failure, so from now on I’m going to redouble my efforts to find the Malcolms of this world.