Tag Archives: being prepared

A few days before the recent British Airways (BA) catastrophic IT failure I was in Kuala Lumpur, Malaysia, giving a talk at the second ASEAN Business Continuity Conference entitled “Building a Robust ITDR Plan”.

The main thrust of this talk was that as IT is at the heart of every organisation, ITDR is at the heart of Business Continuity, and that it is up to the organisation’s top management to ensure that its ITDR plans both meet the needs of the organisation and are known to work.

It appears that BA’s ITDR plans did not work, and although we don’t know whether the plans were appropriate for BA, it is quite possible that they weren’t. In any event, the failure certainly came as a nasty surprise to BA’s top management.

I was asked to provide a closing thought to my talk on “Building a Robust ITDR Plan”, and I used a quote from Georges Clemenceau, the Prime Minister of France in the First World War, to sum up my ideas. For those of you who aren’t familiar with the catastrophe France suffered in that war, it lost a generation of young men. Of 8 million men conscripted, 4 million were wounded and 1 in 6 was killed.

Georges Clemenceau said “War is too serious a matter to entrust to military men.”

I said “ITDR is too serious a matter to entrust to technologists.”

BA will have learnt that lesson, as France did, the hard way.

Cyber and terrorist attacks currently appear to dominate Business Continuity (BC) thinking, but over the weekend we had a classic example of a good old-fashioned failure of a critical IT system causing major disruption, followed by poor incident management that compounded the problem. The company involved was British Airways (BA), and I say poor incident management because this is what the public has perceived and what BA customers experienced. No doubt there will be an internal BA investigation into what went wrong, but as a BC professional I’d love to know about three aspects of the incident and BA’s response:

  1. How long did it take from the initial failure of the system for the IT support technicians to realise that they were dealing with a major incident? Who did they escalate the incident to (if anyone)? Were the people designated to handle major incidents contactable? And was the problem compounded by the fact that BA’s IT had been outsourced to India?
  2. The system that failed is so critical to BA’s operations that it must have had a Recovery Time Objective (RTO) of minutes or, at worst, a couple of hours. To achieve this, BA should have put in place a duplicate live version of the system (Active/Active). Either BA did not have such a recovery option in place (I’m guessing that they had a replica – Active/Passive), which implies that they failed to understand how short the downtime on this system needed to be, or the option had not been properly tested and failed when required.
  3. Why were the communications with customers (people who were booked on BA flights) handled so badly? BA must have a plan to communicate with passengers, but was this dependent on the very system that failed?
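
The reasoning in point 2 can be sketched as a simple check: a recovery option only satisfies an RTO if its expected recovery time fits inside the RTO and the failover has actually been tested. The sketch below is illustrative only. The class name, the function name and all of the timings are assumptions made up for the example; they are not figures from BA or from any real system.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryOption:
    """A hypothetical recovery option for one critical system."""
    name: str
    expected_recovery: timedelta  # assumed figure, not measured data
    tested: bool                  # has the failover actually been exercised?


def meets_rto(option: RecoveryOption, rto: timedelta) -> bool:
    """An option counts only if it is fast enough AND known to work."""
    return option.tested and option.expected_recovery <= rto


# Assumed RTO of "a couple of hours" for the critical system.
rto = timedelta(hours=2)

# All timings below are invented for illustration.
options = [
    RecoveryOption("active/active", timedelta(minutes=5), tested=True),
    RecoveryOption("active/passive", timedelta(hours=8), tested=True),
    RecoveryOption("active/active, untested", timedelta(minutes=5), tested=False),
]

results = {option.name: meets_rto(option, rto) for option in options}

for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'misses'} the RTO")
```

Under these assumed numbers only the tested Active/Active option passes. The untested one fails even though it is fast on paper, which is the second half of the point: a recovery option that has not been exercised cannot be relied on.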

For me, even before the inquest takes place, the major lesson to be learned is that the effectiveness of an organisation’s BC and incident response plans can only be assured by actually using the plans and responding to incidents. If you don’t want to find this out in response to a real incident, then you need to run realistic and regular exercises so that every aspect of your response is tested and the people involved know what to do. It doesn’t matter how good your Business Continuity Management (BCM) process is, how closely aligned to ISO 22301 it is, how good the result of the latest BC audit was, or how much documentation you have. It’s your ability to respond effectively and recover in time that matters.

BA have suffered damage to their reputation; how much is yet to be seen. They will have suffered financial damage, and when the London Stock Market opens for trading we’ll see how much it has affected their share price. Maybe BA do run realistic and regular exercises. If they do, they should have identified the issues with the systems and incident response that were encountered over the weekend and acted on the lessons learned.

Finally, at long last, there appears to be some real evidence that Business Continuity (BC) works. After years of effort trying to debunk the 80% myth (80% of organisations that don’t have a BC plan fail within 18 months of suffering from a major incident – or something similar), I’ve now seen some real research that demonstrates that BC does, in fact, have a beneficial impact.

The research takes the form of a study from IBM Security (conducted by the Ponemon Institute), which analyses the financial impact of data breaches. According to the study, leveraging an incident response team was the single biggest factor associated with reducing the cost of a data breach, saving companies nearly $400,000 on average (or $16 per record). The study also found that the longer it takes to detect and contain a data breach, the more costly it becomes to resolve.

Admittedly, the study covers only cyber security, but at least it’s a start. It confirms the long-held assumption in BC circles that being able to quickly and effectively activate a response team to handle an incident is one of the most effective ways of reducing the impact of the incident on the organisation.

Now all we need is for someone to widen the research to cover all disruptive incidents. Anyone want to do a PhD in BC?

The report can be downloaded at http://www-03.ibm.com/security/data-breach/index.html.

Another day, another politician who thinks that contingency plans shouldn’t be developed. This time it’s the head of the European Commission, Jean-Claude Juncker, who has told his officials not to work on contingency plans for Greece’s possible exit from the euro. Why? Apparently it’s because the plans could be leaked and cause turmoil in financial markets.

In other words, Europe’s top politician has effectively told everyone that he believes that Business Continuity planning is a dangerous discipline and that Business Continuity Plans should not be developed just in case they are leaked to the media.

Trying to sell the benefits of investing in Business Continuity is hard at the best of times, but now we have Jean-Claude Juncker and his helpful ideas. It’s not as bad as the person who once told me that he didn’t want to develop a Business Continuity Plan as it was tempting fate, but it’s getting close.

The prevailing view of the Business Continuity (BC) community is that the only benefit of not having a Business Continuity Plan (BCP) is a small saving in time and money, and that this comes with huge downsides if you ever suffer an incident that causes major disruption to your operations. But this view may have to be revised as a result of a fire at a Dogs’ Home in Manchester in the UK last Thursday evening.

The fire, which was tackled by more than 30 firefighters, was a tragic event that killed about 60 animals. Some 150 dogs were saved, and from all the reports it looks as if the staff did not have a pre-prepared BCP. However, the public rallied round after the Dogs’ Home asked for people to provide temporary foster care for the rescued dogs. Large numbers of people turned up to help, volunteers at the site began collecting dog food, bedding and other items donated by the public, and a JustGiving account set up by the Manchester Evening News raised more than £1.2m. In fact so many people tried to turn up to help that the Cheshire Police tweeted: “High Volume of Vehicles at Cheshire Dogs Home to adopt dogs following the recent tragic fire. Avoid area if travelling.”

Volunteers are saying they have been overwhelmed by the response and that they now have rooms full of dog food, blankets, crates and baskets. Although many members of staff say they’re devastated by the fire, there’s a sense of optimism and comradeship as fosterers turn up to take dogs home.

The net result seems to be that the Dogs’ Home is far better off than if they had had a BCP that clicked seamlessly into operation and hadn’t had to ask for help. So, before you decide to spend time and money on developing a BCP, ask yourself if you should just wait until an incident happens and hope that help and assistance will be provided by the public. Maybe this would only happen in the UK and to a Dogs’ Home. I wouldn’t recommend that a bank tries it!

Yesterday I finally got round to doing a job that I’d been putting off for weeks – updating my company’s Business Continuity Plan (BCP). The system that we use to manage Business Continuity, Mataco, had been regularly sending me reminders that it needed to be reviewed, but I’d been ignoring them because it wasn’t my top priority and besides, it’s an extremely boring job.

Now, my role in Merrycon is to provide Business Continuity consultancy, and the need to keep BCPs up to date is one of the things that I keep telling my clients that they need to do. I seem to spend significant amounts of time and effort helping clients set up structures and procedures to ensure that BCP maintenance is carried out in a timely and effective way, and in training client staff in how to update their BCPs. To be fair, I do advise my clients that it’s a task that people don’t like doing, but I regularly find myself in the position of criticising clients for not keeping their BCPs up to date.

So, the question is, how do I make the task of keeping BCPs up to date exciting? How do I make people want to spend time checking through their BCP to see what needs to be updated, then spend time updating the BCP, and then to spend time making sure that everyone has a copy of the new version of the BCP? I need the answer to this question not only for my clients, but for me as well.

The motto of the scouting movement is “Be Prepared”, which could just as well be adopted by the Business Continuity industry. I have, for some time now, been looking for a good “party” definition of Business Continuity, and now I think I’ve found one.

A “party” definition is the answer you give when someone you meet in a social setting asks “What is Business Continuity?” You then have a split second in which to think of an interesting and engaging answer. The best one that I had previously was “Making sure that the product gets to the customer”. Now I have “Being Prepared”.

Has anyone else got any pithy definitions?