
Tag Archives: good practice

A few days before the recent British Airways (BA) catastrophic IT failure I was in Kuala Lumpur, Malaysia, giving a talk at the second ASEAN Business Continuity Conference entitled “Building a Robust ITDR Plan”.

The main thrust of this talk was that as IT is at the heart of every organisation, ITDR is at the heart of Business Continuity, and that it is up to the organisation’s top management to ensure that its ITDR plans both meet the needs of the organisation and are known to work.

It appears that BA’s ITDR plans did not work, and although we don’t know whether the plans were appropriate for BA, the possibility is that they weren’t. In any event, the failure certainly came as a nasty surprise to BA’s top management.

I was asked to provide a closing thought to my talk on “Building a Robust ITDR Plan”, and I used a quote from Georges Clemenceau, the Prime Minister of France in the First World War, to sum up my ideas. For those of you who aren’t that aware of the catastrophe suffered by France in that war, it lost a generation of young men. Out of 8 million men conscripted, 4 million were wounded and 1 in 6 killed.

Georges Clemenceau said “War is too serious a matter to entrust to military men.”

I said “ITDR is too serious a matter to entrust to technologists.”

BA will have learnt that lesson, as France did, the hard way.

Cyber and terrorist attacks currently appear to dominate Business Continuity (BC) thinking, but over the weekend we had a classic example of a good old-fashioned failure of a critical IT system causing major disruption, and some resulting poor incident management that compounded the problem. The company involved was British Airways (BA), and I say poor incident management because this is what the public has perceived and what BA customers experienced. No doubt there will be an internal BA investigation into what went wrong, but as a BC professional I’d love to know about three aspects of the incident and BA’s response:

  1. How long did it take from the initial failure of the system for the IT support technicians to realise that they were dealing with a major incident, who did they escalate the incident to (if anyone), were the people designated to handle major incidents contactable, and was the problem compounded by the fact that BA’s IT had been outsourced to India?
  2. The system that failed is so critical to BA’s operations that it must have had a Recovery Time Objective (RTO) of minutes, or at worst, a couple of hours. To achieve this, BA should have put in place a duplicate live version of the system (Active/Active). Either BA did not have such a recovery option in place (I’m guessing that they had a replica – Active/Passive), which implies that they failed to understand the need to have a very short downtime on the system, or it had not been properly tested and failed when required.
  3. Why were the communications with customers (people who were booked on BA flights) handled so badly? BA must have a plan to communicate with passengers, but was this dependent on the very system that failed?

For me, even before the inquest takes place, the major lesson to be learned is that the effectiveness of an organisation’s BC and incident response plans can only be assured by actually using the plans and responding to incidents. If you don’t want to find this out in response to a real incident, then you need to run realistic and regular exercises so that every aspect of your response is tested and the people involved know what to do. It doesn’t matter how good your Business Continuity Management (BCM) process is, how closely aligned to ISO 22301 it is, how good the result of the latest BC audit, or how much documentation you have. It’s your ability to respond effectively and recover in time that matters.

BA have suffered damage to their reputation; how much is yet to be seen. They will have suffered financial damage, and when the London Stock Market opens for trading we’ll see how much it has affected their share price. Maybe BA do run realistic and regular exercises. If they do, they should have identified the issues with the systems and incident response that were encountered over the weekend and acted on the lessons learned.


I have been further convinced of the need for the Business Continuity (BC) profession to get back to its fundamentals by the juxtaposition of the publication by the Business Continuity Institute (BCI) of a comprehensive list of legislation, regulations, standards and guidelines in the field of Business Continuity Management (BCM) and the experience of many businesses that were affected by the recent floods in the north-west of England.

Some small businesses, mainly those that operate and serve very local markets, have temporarily closed until their premises can be refurbished, but others are up and running and continuing to trade even though their premises were badly flooded. The businesses that are back up and running had implemented BC, but not in the way envisaged by the BC profession through its standards and guidelines.

These businesses had taken steps to ensure that they could recover from incidents like the recent flooding by doing such things as backing up their data, implementing cloud computing, knowing where they could obtain replacement premises and equipment, being able to redirect their telephones, and having adequate insurance cover. They are also managed by people who know how to respond to incidents, are committed to the continued success of their business, and know what needs to be recovered by when without having to read a plan.

None of these businesses had implemented a formal BCM programme, none of them had followed any guidelines, and none of them had implemented a Business Continuity Management System (BCMS) or been certified to a BCM standard.

The publication by the BCI of a comprehensive list of BCM legislation, regulations, standards and guidelines is very useful, and I’m not decrying it. But, and it is a very big but, the purpose of BC is to enable organisations to be resilient to incidents that affect their ability to operate. The people who own and run businesses in the north-west of England that had taken steps to ensure that they could recover from the recent flooding are practising the fundamentals of BC, and by and large have never even heard of BCM legislation, regulations, standards and guidelines.

Don’t get me wrong, there’s nothing wrong with BCM legislation, regulations, standards and guidelines, but they are not an end in themselves. I sometimes think that BC professionals lose sight of this.

There seems to be a growing undercurrent of opinion that is seriously starting to question the current direction of Business Continuity (BC). It is best summarised by three issues that have been identified by David Lindstedt: it isn’t evolving; executives aren’t engaged; and there aren’t any meaningful metrics. To these I would add a fourth issue, and this is that the profession seems to have backed itself into a standards corner.

By pure coincidence I’ve just come across a new way forward for BC whilst undertaking research for a paper that I’ll be presenting at this year’s BCI World Conference and Exhibition in London in November. The title of my paper is “The BC Plan is Dead!”, and whilst looking for a practical example of the ideas that I’ll be presenting, I came across a novel and exciting approach to BC that has been implemented by a major UK company. I don’t want to spoil the presentation, so I can’t reveal yet who it is and what I’ll be saying, but a representative from that company will, as part of my presentation, show a new approach that is measurable, adds value to the business, has the active support of the Top Executive, extends the traditional boundaries of BC to include all disruptive incidents, and puts BC in front of the Top Executive on a regular basis.

On the assumption that this new approach “holds water” when publicly presented, I intend to explain and document it after the Conference. I have to admit that it’s not an approach that I’ve developed; I just stumbled across it. However, I’m so impressed by what I’ve seen that I believe that it needs to be properly put in front of Business Continuity professionals.

Just when you thought that the Business Continuity (BC) profession had grown up and stopped quoting bogus statistics about the effects of not having Business Continuity Plans, along comes another report trying to scare management with fairy stories.

This time the story comes from none other than the Business Continuity Institute (the BCI), which has published a paper called “Counting the Cost” as part of Business Continuity Awareness Week, in which the author states that “Figures show that 40%-60% of businesses without a BC plan never reopen after a significant incident, and the response for the first 10 days are critical to survival”. These figures come from something published on a website called visual.ly, and are totally unsubstantiated, as are all such statistics.

The author cautions the reader that “This report aims to be descriptive rather than normative. The figures cited come from surveys conducted by the BCI and other organisations (eg. IBM, Ponemon Institute, etc.), which also acknowledge the same limitations. Hence, statistical inferences cannot be applied to this data.”

If you can’t make statistical inferences about data, then don’t use the data! Pretty simple really.

Maybe, just maybe, some time in the future, the BC profession will grow up and realise that you can’t just go around quoting unsubstantiated statistics about the benefits of BC.

This week I’m in the Cotswolds, in the UK, giving the Business Continuity Institute’s five-day Good Practice Guidelines (GPG) course. As usual, the delegates all complain about the fact that the GPG is a difficult book to read, and that if anything, it’s a great cure for insomnia. This is all a bit embarrassing for me as my name appears in the GPG as a contributor and one of the chief reviewers, but I can explain that it was written by committee and that it’s difficult to make such a document an exciting read.

I went on from this to explain that a new version is being written, and again, I am a contributor. On hearing this, one of the delegates asked me if the new version was going to be any better in terms of being easier to read. A very good question. I would hope so, but I couldn’t give a definitive answer as I’m just one of a number of contributors and the end document will be reviewed by a QA group. However, I will try and make it my business to see that we get a more readable version, if only for the sake of avoiding the inevitable criticism that I’ll be subjected to if it isn’t.