A few days before the recent British Airways (BA) catastrophic IT failure I was in Kuala Lumpur, Malaysia, giving a talk at the second ASEAN Business Continuity Conference entitled “Building a Robust ITDR Plan”.

The main thrust of this talk was that as IT is at the heart of every organisation, ITDR is at the heart of Business Continuity, and that it is up to the organisation’s top management to ensure that its ITDR plans both meet the needs of the organisation and are known to work.

It appears that BA’s ITDR plans did not work, and although we don’t know whether the plans were appropriate for BA, the possibility is that they weren’t. In any event, the failure certainly came as a nasty surprise to BA’s top management.

I was asked to provide a closing thought to my talk on “Building a Robust ITDR Plan”, and I used a quote from Georges Clemenceau, the Prime Minister of France in the First World War, to sum up my ideas. For those of you who aren’t that aware of the catastrophe suffered by France in that war, it lost a generation of young men. Out of 8 million men conscripted, 4 million were wounded and 1 in 6 killed.

Georges Clemenceau said “War is too serious a matter to entrust to military men.”

I said “ITDR is too serious a matter to entrust to technologists.”

BA will have learnt that lesson, as France did, the hard way.

Cyber and terrorist attacks currently appear to dominate Business Continuity (BC) thinking, but over the weekend we had a classic example of a good old-fashioned failure of a critical IT system causing major disruption, followed by poor incident management that compounded the problem. The company involved was British Airways (BA), and I say poor incident management because this is what the public has perceived and what BA customers experienced. No doubt there will be an internal BA investigation into what went wrong, but as a BC professional I’d love to know about three aspects of the incident and BA’s response:

  1. How long did it take from the initial failure of the system for the IT support technicians to realise that they were dealing with a major incident, who did they escalate the incident to (if anyone), were the people designated to handle major incidents contactable, and was the problem compounded by the fact that BA’s IT had been outsourced to India?
  2. The system that failed is so critical to BA’s operations that it must have had a Recovery Time Objective (RTO) of minutes, or at worst, a couple of hours. To achieve this, BA should have put in place a duplicate live version of the system (Active/Active). Either BA did not have such a recovery option in place (I’m guessing that they had a replica – Active/Passive), which implies that they failed to understand the need to have a very short downtime on the system, or it had not been properly tested and failed when required.
  3. Why were the communications with customers (people who were booked on BA flights) handled so badly? BA must have a plan to communicate with passengers, but was this dependent on the very system that failed?

For me, even before the inquest takes place, the major lesson to be learned is that the effectiveness of an organisation’s BC and incident response plans can only be assured by actually using the plans and responding to incidents. If you don’t want to find this out in response to a real incident, then you need to run realistic and regular exercises so that every aspect of your response is tested and the people involved know what to do. It doesn’t matter how good your Business Continuity Management (BCM) process is, how closely aligned to ISO 22301 it is, how good the result of the latest BC audit, or how much documentation you have. It’s your ability to respond effectively and recover in time that matters.

BA have suffered damage to their reputation; how much is yet to be seen. They will have suffered financial damage, and when the London Stock Market opens for trading we’ll see how much it has affected their share price. Maybe BA do run realistic and regular exercises. If they do, they should have identified the issues with the systems and incident response that were encountered over the weekend and acted on the lessons learned.

In a survey about the experience of handling major losses undertaken by Vericlaim and Alarm, more than half of respondents “rated the practical assistance offered by a BCP (Business Continuity Plan) following a major incident as one or two out of a possible score of five”. In other words, the BC Plans of the organisations responding to the survey were found not to be particularly helpful when responding to a major loss!

This finding seems to have been rather under-reported by the BC community, who are usually so forward in explaining the importance of having a BC Plan and extolling the virtues of BC in improving resilience. Personally, I find it a damning indictment of the BC profession.

One of the things that constantly both amuses and horrifies me is how far most BC Plans are from the description given in the Business Continuity Institute’s (BCI’s) Good Practice Guidelines. This states that a BC Plan should be “…focused, specific and easy to use…”, and that the important characteristics for an effective BC Plan are that it is direct, adaptable, concise, and relevant.

Over the years I have had the pleasure of seeing hundreds, if not thousands, of BC Plans from a wide variety of organisations, and I can safely say that more than 90% of these plans do not fit this description. They tend to contain lots of information that is irrelevant to the purpose of responding to a major incident and seem to be written more for the benefit of the organisation’s auditors than for use by people who need to take action to reduce the impact of the incident on the organisation.

As a BC consultant, I keep trying my best to improve BC Plans, but I’m constantly being knocked back by people who tell me that all sorts of things need to be put into their BC Plans, more often than not because of an audit or review undertaken by a third party.

For far too long this situation has been allowed to continue unchallenged. It cannot do so for much longer without the BC profession losing credibility.

Finally, at long last, there appears to be some real evidence that Business Continuity (BC) works. After years of effort trying to debunk the 80% myth (80% of organisations that don’t have a BC plan fail within 18 months of suffering from a major incident – or something similar), I’ve now seen some real research that demonstrates that BC does, in fact, have a beneficial impact.

The research takes the form of a study from IBM Security (conducted by the Ponemon Institute), which analyses the financial impact of data breaches. According to the study, leveraging an incident response team was the single biggest factor associated with reducing the cost of a data breach: saving companies nearly $400,000 on average (or $16 per record). The study also found that the longer it takes to detect and contain a data breach, the more costly it becomes to resolve.

Admittedly, the study covers only cyber security, but at least it’s a start. It confirms the long held assumption in BC circles that being able to quickly and effectively activate a response team to handle an incident is one of the most effective ways of reducing the impact of the incident on the organisation.

Now all we need is for someone to widen the research to cover all disruptive incidents. Anyone want to do a PhD in BC?

The report can be downloaded at http://www-03.ibm.com/security/data-breach/index.html.

I have just attended a very good Business Continuity (BC) conference held in Malaysia by GRC Consulting Services in conjunction with the Business Continuity Institute (BCI), but I couldn’t help being concerned about the fact that the standards industry is producing more and more management systems standards in and around the subject of BC.

Why is this happening? Well, to my mind, there seem to be two drivers behind this trend, neither of which is good for BC.

The first one, which an increasing number of people seem to be talking about, is that the main bodies behind the development of all these standards have discovered a rich source of revenue and are now exploiting this for all that it’s worth. These bodies claim to be “not for profit”, but like many such organisations there are large numbers of people engaged in standards activities that derive considerable profit from the work that they do. The more standards that they produce the more these people profit from the work that they do.

This driver is simply the age-old story of people making a profit when they can, and is not too dangerous as it will eventually come to an end when the people buying and using the standards come to realise what’s going on. The second driver, though, is much more dangerous, as it strikes at the heart of BC and has the capacity to cause enormous damage.

This second driver is the desire to make something that is difficult, complex, and demanding, and which requires considerable skill and experience, simple to implement through a process that can be implemented by a management system. To see what I mean, you need look no further than BS 65000, the recently published Guidance for Organizational Resilience, which, to quote the body that produced it – “This landmark standard provides an overview of resilience, describing the foundations required and explaining how to build resilience.”

Organizational Resilience is something that every company continuously tries to achieve. It is nothing new, and has been an essential goal ever since the first company was founded. Few manage it over the long term, and the life of most companies is very short as the products and services that they produce become outdated and overtaken by new trends, ideas, and inventions. If explaining how to build resilience can be described in a short pamphlet and implemented by anyone with the capability to read and follow a set of procedures, then how come it was missed by so many millions of people involved in the running of the hundreds of thousands of companies that have failed?

The international standard for Organizational Resilience (ISO 22316) is due to be published in 2016, which must be a great relief for all those organisations that are struggling to survive in the ever more competitive markets in which they operate. All they now have to do is implement the standard, be audited for compliance, and get the certificate. So much easier than researching and developing new products, finding new markets, producing the products and services at competitive cost, controlling cash flow, hiring and retaining the right people with the right skills, complying with ever increasing legislation, developing and enhancing reputation, etc.

I know that it’s only January, but this year’s prize for a misquote must go to Fire Security Ltd, who have stated the following on their website: “According to research by economic analysts Mel Gosling and Andrew Hiles, 70 per cent of businesses would fail after a fire – either by not reopening immediately after the blaze, or gradually dwindling in resources and effectiveness to close within three years”.

I have spent many years trying to debunk this myth, and some years ago Andrew and I jointly published our research on the much quoted statistic and its many variations (see http://www.continuitycentral.com/feature0660.html). Now I find that I’m being quoted as not only supporting the myth, but as having undertaken and published research showing it to be true!

How did Fire Security Ltd come to believe this? Is it deliberate, or a genuine mistake? I’m not one to believe in Machiavellian plots – the “cock-up” theory is usually right, but I’m struggling to understand how anyone could read the research that Andrew and I published and reach the conclusion that they did. Maybe someone from Fire Security Ltd had heard about the myth (most people have) and Googled it – only to find that we had done some research, and then just looked at the two-line extract that Google provides and decided to quote us without following the link or reading the article. Is that what passes for intelligent research nowadays?

I could become famous for proving that 80% of businesses that suffer from a major incident and do not have a business continuity plan go out of business within 18 months (not to mention being thought of as an “economic analyst”), which would be terribly ironic.

In case anyone out there has any doubt, I believe this statistic and its many variations to be not only a total myth, but absolute rubbish.

I have been further convinced of the need for the Business Continuity (BC) profession to get back to its fundamentals by the juxtaposition of the publication by the Business Continuity Institute (BCI) of a comprehensive list of legislation, regulations, standards and guidelines in the field of Business Continuity Management (BCM) and the experience of many businesses that were affected by the recent floods in the north-west of England.

Some small businesses, mainly those that operate and serve very local markets, have temporarily closed until their premises can be refurbished, but others are up and running and continuing to trade even though their premises were badly flooded. The businesses that are back up and running had implemented BC, but not in the way envisaged by the BC profession through its standards and guidelines.

These businesses had taken steps to ensure that they could recover from incidents like the recent flooding by doing such things as backing up their data, implementing cloud computing, knowing where they could obtain replacement premises and equipment, being able to redirect their telephones, and having adequate insurance cover. They are also managed by people who know how to respond to incidents, are committed to the continued success of their business, and know what needs to be recovered by when without having to read a plan.

None of these businesses had implemented a formal BCM programme, none of them had followed any guidelines, and none of them had implemented a Business Continuity Management System (BCMS) or been certified to a BCM standard.

The publication by the BCI of a comprehensive list of BCM legislation, regulations, standards and guidelines is very useful, and I’m not decrying it. But, and it is a very big but, the purpose of BC is to enable organisations to be resilient to incidents that affect their ability to operate. The people who own and run businesses in the north-west of England that had taken steps to ensure that they could recover from the recent flooding are practising the fundamentals of BC, and by and large have never even heard of BCM legislation, regulations, standards and guidelines.

Don’t get me wrong, there’s nothing wrong with BCM legislation, regulations, standards and guidelines, but they are not an end in themselves. I sometimes think that BC professionals lose sight of this.

Governments, the media, and other pundits appear to be conspiring to cause confusion about probability, although I suspect that the truth is more likely to be that they just don’t understand what they are talking about.

I live in Cumbria in the UK, where last weekend we suffered severe flooding, and the news seems to have been dominated by people talking about a “once in a hundred year event” that seems to have occurred several times in the last decade. These comments always seem to lead on to people saying that it can’t be a “once in a hundred year event”, or that it must be caused by something else if it has happened more than once. This displays a total ignorance of probability, and educated people who talk such rubbish should be ashamed of themselves.

A “once in a hundred year event” is actually an event that has a probability of 1% of happening in any one year, just like a coin has a 50% chance of landing tails when tossed, or a dice has a 1 in 6 chance of turning up as a 6 when rolled. The “once in a hundred year event” can happen twice in successive years, or twice in a decade, and still remain an event that has a probability of 1% of happening in any one year. Similarly, a coin that is tossed 10 times can land as a tail more than 5 times, and a dice that is rolled 12 times can turn up as a six more than twice. Using the phrase “once in a hundred year event” is not only misleading, but wrong, as it implies that something can only happen once in any hundred years.
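
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes that each year (or toss) is independent of the others and that the annual probability stays fixed at 1%, which is a simplification of my own for illustration and not a claim about how real flood risk behaves. The function name prob_at_least is simply a label I have made up.

```python
from math import comb


def prob_at_least(k, trials, p):
    """Chance of at least k occurrences in `trials` independent trials,
    each with probability p (a plain binomial calculation)."""
    # Add up the chances of seeing 0, 1, ..., k-1 occurrences,
    # then subtract from 1 to get "k or more".
    prob_fewer = sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(k)
    )
    return 1 - prob_fewer


# A 1%-a-year event happening at least once in 100 years
print(round(prob_at_least(1, 100, 0.01), 3))   # 0.634

# The same event happening at least twice in a single decade
print(round(prob_at_least(2, 10, 0.01), 4))    # 0.0043

# More than 5 tails (6 or more) in 10 tosses of a fair coin
print(round(prob_at_least(6, 10, 0.5), 3))     # 0.377
```

Under these assumptions the 1%-a-year event has only about a 63% chance of turning up at all in any given hundred years, and roughly a 0.4% chance of turning up twice or more in a given decade. That is unlikely for any one place, but nowhere near impossible.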

So, this is a plea to all of you out there who comment on the chance of an event happening. Please state your views as “there is a chance of x% that the event will happen in any one year” and not as “x times in a hundred years”. The first is mathematically correct and is not misleading; the second is not correct and is immensely misleading.

As most people are only too well aware, the way that we find and use information is going through a radical and fundamental change, which is being driven by the Internet. What doesn’t seem to have permeated the world of Business Continuity though, is that this change is revolutionising the Business Continuity Plan.

Not too many years ago, in our house, we used to keep a telephone directory and combined bus and train timetable near our front door, close to where we had our telephone. Today, we have neither of those things, and if we want to find a telephone number or the time of a bus or train we’ll simply use the Internet, and rapidly find what we’re looking for without wading through pages and pages of small print trying to decipher how the directory or timetable is organised before getting to the information that we want. We also had the depressing problem of finding out later on that we’d looked up the information in a document that was out of date, and that one of the family had inadvertently thrown away the new version and kept the old one.

Telephone directories and timetables are just two examples of documents that are being used by fewer and fewer people, and most of those are older people who find it hard to change a lifetime’s habits. Using printed documents to find information is becoming a thing of the past, as anyone who mixes with youngsters will confirm. Why then, do we persist with documents in the world of Business Continuity? What’s wrong with just finding the information that we need from the Internet?

The problems of document-based Business Continuity Plans are only too well known. Unfortunately, more often than not, they are difficult to use in a crisis, contain unnecessary information, and are out of date. What we really need is something that is simple to use, delivers exactly what is required, and provides the latest information. That is an App.

An App is short for an Application, and is quite simply a piece of software designed to fulfil a particular purpose, and is downloaded by a user to a computing device from which it can be used. Apps can be used to obtain information, and when designed to provide the information required to respond to an incident, they are an ideal and powerful tool.

Don’t make the mistake of thinking that holding a Business Continuity Plan as a PDF document and making it available on the Internet via an App is the same thing as an App designed to enable someone to respond to an incident. It’s not. You don’t look up the time of a train on the Internet by opening up a PDF document and searching through it, do you?

A Business Continuity App can provide responders with clear, action-orientated, and time-based direction, while allowing quick access to relevant and up-to-date support information. Exactly what we want to achieve.

This revolution has profound consequences for the world of Business Continuity, and if you’d like to find out what these are, then come and listen to me present at the BCI World Conference and Exhibition in November. The Business Continuity Plan, as a document, is dead; long live the Business Continuity App.

There seems to be a growing undercurrent of opinion that is seriously starting to question the current direction of Business Continuity (BC). It is best summarised by three issues that have been identified by David Lindstedt: it isn’t evolving; executives aren’t engaged; and there aren’t any meaningful metrics. To these I would add a fourth issue, which is that the profession seems to have backed itself into a standards corner.

By pure coincidence I’ve just come across a new way forward for BC whilst undertaking research for a paper that I’ll be presenting at this year’s BCI World Conference and Exhibition in London in November. The title of my paper is “The BC Plan is Dead!”, and whilst looking for a practical example of the ideas that I’ll be presenting, I came across a novel and exciting approach to BC that has been implemented by a major UK company. I don’t want to spoil the presentation, so I can’t reveal yet who it is and what I’ll be saying, but a representative from that company will, as part of my presentation, show a new approach that is measurable, adds value to the business, has the active support of the Top Executive, extends the traditional boundaries of BC to include all disruptive incidents, and puts BC in front of the Top Executive on a regular basis.

On the assumption that this new approach “holds water” when publicly presented, I intend to explain and document it after the Conference. I have to admit that it’s not an approach that I’ve developed; I just stumbled across it. However, I’m so impressed by what I’ve seen that I believe that it needs to be properly put in front of Business Continuity professionals.