The Challenger and Holistic Monitoring: a true story

[Image: Challenger explosion]

In January 1986, engineers working on the Space Shuttle program knew that one of the Challenger's critical components was likely to fail: the O-rings in the solid rocket boosters were not reliable in the cold, and sub-zero temperatures (in Celsius) were forecast for the day of the launch. The problem was that they had only eleven hours to convince NASA management to cancel a launch that had already been postponed several times.

Unfortunately, they used the following picture as visual support for their explanation. The launch was not cancelled, and the result was one of the biggest tragedies in space history: the spacecraft was destroyed 73 seconds after liftoff, its crew was lost, and the shuttle fleet was grounded for more than two years.

What is the problem with this picture? Basically, it does not make the relationship between temperature and the likelihood of O-ring failure evident.

The following diagram is more revealing. The damaged O-rings are shown above, in red, and the ones that worked are shown below, in green. You can clearly see that at higher temperatures (to the right) the failure rate of the O-rings is low or zero, while at lower temperatures (to the left) it increases sharply.

[Figure: correlation diagram of O-ring damage versus temperature]

Now it becomes obvious that there was no problem above 75°F (24°C) and that, on the other hand, below 63°F (17°C) almost all O-rings failed. Even worse, the expected temperature for takeoff was 29°F (about -2°C), with ice menacingly covering the side of the rocket.
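The same idea can be sketched in a few lines of Python: group the observations into temperature bands and show the failure rate of each band side by side. This is only a minimal illustration. The observations below are invented for the example and are not the real flight records, and the names `failure_rate_by_band` and `render` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (temperature in °F, damaged?) observations, invented for
# illustration only; these are NOT the real shuttle flight records.
observations = [
    (53, True), (57, True), (58, True), (63, True),
    (66, False), (67, False), (70, True), (70, False),
    (72, False), (75, False), (76, False), (79, False), (81, False),
]


def failure_rate_by_band(samples, band_width=10):
    """Group samples into temperature bands and return {band_start: rate}."""
    counts = defaultdict(lambda: [0, 0])  # band_start -> [damaged, total]
    for temperature, damaged in samples:
        band_start = (temperature // band_width) * band_width
        counts[band_start][0] += int(damaged)
        counts[band_start][1] += 1
    # Sort by band so the coldest band comes first.
    return {band: dmg / total for band, (dmg, total) in sorted(counts.items())}


def render(rates, band_width=10, bar_length=20):
    """Return one text line per band, coldest first, with a bar for the rate."""
    lines = []
    for band, rate in rates.items():
        bar = "#" * round(rate * bar_length)
        lines.append(f"{band}-{band + band_width - 1}°F {rate:4.0%} {bar}")
    return lines


rates = failure_rate_by_band(observations)
print("\n".join(render(rates)))
```

Placing the bands next to each other, coldest first, is what makes the trend visible at a glance; the same numbers scattered across separate reports would hide it.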

What is the Challenger doing in this blog? Well, basically, what we have here is a Visibility problem: without Visibility, making the right decision is much more difficult. Visual correlation makes the problem obvious.

During the last Monitoring Symposium at Temaikén I used this example (taken from the doctoral thesis of my friend Rogelio Adobbati; thanks, Roge) when presenting our concept of Holistic Monitoring.

Our visual perception is powerful. Showing business and activity outcomes (KPIs), user experience, specific controls and risk indicators (KRIs), and the status, availability and impact of the applications and technical infrastructure together, in a single, well-designed panel, makes it much easier to govern a complex service from the correct point of view.

And what is the correct point of view? The view of the service receiver. Always.

Without Visibility, there is no agility.

Another example of the importance of Visibility would be the solution to last week’s problem. But I will, cruelly, save that for another day…
