On January 17th, 2008, British Airways flight 38 was on its final approach to London Heathrow Airport when one of its engines stopped. Shortly after, the second engine stopped too. At the very moment the pilots needed to increase thrust for the final approach to the runway, the huge flying metal object became a weight dropping from the sky. The plane crashed about 300 metres short of the runway. Miraculously, only 47 people were injured (the worst injury was a broken limb) and there were no fatalities. As in any flight incident, the crash was investigated, and possible causes were eliminated one by one. Pilot error in handling the fuel, fuel shortage, blocked fuel pipes - all were ruled out. Finally the investigators were left with one last hypothesis: while flying over the Arctic, a fairly new route at the time, fuel flow was restricted because water in the fuel had frozen into ice crystals. They had no direct proof, since by the time the wreckage was examined all the ice had melted. But this was their best bet.
Although this is not the typical topic in a software team retrospective, the way airlines and aircraft makers handle incidents is indeed relevant for us.
Just as astonishing as the Flight 38 crash is its aftermath: Boeing came up with a new procedure for 777 pilots. If fuel is suspected of freezing when flying over Arctic areas, reduce thrust to allow the ice crystals to melt, then increase thrust again to resume fuel flow. The procedure worked. We know, because the same condition occurred again, the pilots followed the checklist, and continued flying normally. As if nothing had happened. Hundreds of lives saved in an almost non-event.
Think of this: there are about 1,400 Boeing 777s in service (about 650 in 2008), operated by some 60 different airlines. A new procedure is published, in the form of a checklist, and all 777 pilots and co-pilots in all these airlines are updated, trained, and capable of reacting to an aircraft-threatening situation in real time. And this is just one example!
Can you imagine that in your organisation? A software development team finds a critical bug in a third-party library that affects 1,400 classes. The bug manifests only under rare conditions, but has potentially catastrophic results for your clients. Sixty different teams work on these classes. Following the discovery of the bug, the team comes up with a workaround until the third party provides a fix. Within a week, all teams are familiar with the new workaround and trained on it. Moreover, another team has proof that the workaround has already prevented such a failure - the new procedure works!
Can your organisation do this?
Atul Gawande, author of The Checklist Manifesto, is a surgeon. He was approached by the WHO (World Health Organisation) to help find ways to reduce post-surgery illness and mortality. What he found was groundbreaking. By devising and implementing an operating-theatre checklist in eight pilot hospitals, his team scientifically proved that they could reduce the fatality rate from over 400 cases to about 250. Gawande and his colleagues documented numerous cases where the checklist prevented unnecessary surgical complications and even death. A dead-simple list of items to check at critical points of a surgery literally reduced the number of dead people.
How is this related to your work in an Agile team?
Imagine that your Definition of Done also included a few simple checklists to run through in specific situations. For example:
Checklist for teammate prior to the daily standup:
- Check that my task is on the board
- Check that my tasks on the board are correctly updated with remaining time and column (todo/doing/done)
- List my blockers, if any
- Rehearse my update so I can deliver it in under one minute
Or another example:
Checklist for new bug discovered:
- Reproduce the bug with a teammate. Validate it’s a genuine bug
- Update the PO
- If the bug relates to a story in progress, place a bug sticker on the story
- If not, add it to the backlog
- On the scrum board, add a note: tell all teammates about the new bug
That’s it. A five-item list that you probably don’t follow today, yet one that could increase visibility, teamwork, quality, knowledge sharing and more in your team. And imagine how the same idea could be applied to inter-team collaboration.
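For teams that like to make process explicit, a checklist can even live next to the code. Here is a minimal, purely illustrative Python sketch (none of these names or items come from any real tool; the list mirrors the bug checklist above) that keeps the checklist as data in version control and reports what is still open:

```python
# Hypothetical sketch: a team checklist kept as data, so it can be
# code-reviewed, versioned, and queried like anything else in the repo.

NEW_BUG_CHECKLIST = [
    "Reproduce the bug with a teammate; confirm it is genuine",
    "Update the PO",
    "Story-related: place a bug sticker on the story",
    "Not story-related: add it to the backlog",
    "Add a note on the scrum board: tell all teammates",
]

def unfinished(checklist, done):
    """Return the checklist items that have not yet been marked done."""
    return [item for item in checklist if item not in done]

# Example: two items completed so far, three remain.
done_so_far = {NEW_BUG_CHECKLIST[0], NEW_BUG_CHECKLIST[1]}
remaining = unfinished(NEW_BUG_CHECKLIST, done_so_far)
print(f"{len(remaining)} item(s) left before this bug is fully handled")
```

The point of keeping the list as data rather than prose is that it can be changed through the team's normal review process, and, if desired, wired into tooling such as a pull-request template or a pre-merge reminder.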
Now here’s my educated prediction: even if you agree with this blog post, there’s a 95% chance that you will do nothing with it. Not a single actionable result. Here’s why I can predict this:
- I’ve seen this happen over and over again.
- Gawande himself acknowledges that most surgeons do not change their behaviour, not even when people’s lives depend on it: the “it never happens in my operating theatre” attitude. It takes a strong, compelling event to make checklists stick.
In the aviation industry, that event was the crash of the Boeing Model 299, aka the B-17, aka the Flying Fortress. The aircraft crashed on its second flight because it was simply too complicated to operate from memory. Gawande gives an elaborate description of the aftermath of the crash, and of its before-and-after effect on how airplanes are flown.
Like flying airplanes, operating on patients and building skyscrapers, our software industry is highly complex, and it is impossible to forecast each and every complication. But for the ones we are aware of - especially the critically important and notoriously error-prone ones - there is a simple, effective and insanely cheap tool: a checklist.
Will you give it an honest try?