This blog posting represents the views of the author, David Fosberry. Those opinions may change over time. They do not constitute an expert legal or financial opinion.

If you have comments on this blog posting, please email me.

The Opinion Blog is organised by threads, so each post is identified by a thread number ("Major" index) and a post number ("Minor" index). If you want to view the index of blogs, click here to download it as an Excel spreadsheet.

Click here to see the whole Opinion Blog.

To view, save, share or refer to a particular blog post, use the link in that post (below/right, where it says "Show only this post").

Boeing 737 Max - How Is Aircraft Safety Ensured?

Posted on 14th April 2019


There has been a steady drip-feed of news about the safety of Boeing's 737 Max aircraft since the Ethiopian Airlines crash. This report on the BBC looks at the possible effect of the two crashes on Boeing.

Some readers may not know much about how aircraft designers ensure that their planes are safe. Having worked in the avionics industry, I thought I would explain some of the basic techniques.

Part of the news piece states that "The new anti-stall mechanism on the Max relied on data from one single sensor at the front of the aircraft". This would be against policy and design guidelines. Safety critical systems, including flight control systems, require redundancy: normally three systems or components (such as sensors), so that if one fails or produces an erroneous output, a voting system selects the output of the two that still agree. Reports from other news sources suggest that the Max has multiple angle of attack sensors; the issue seems to be deciding what to do when the sensors disagree, which simply looks like bad design.
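To make the voting idea concrete, here is a minimal sketch (in Python, purely for illustration; real flight control software is built very differently, and the tolerance value is my own assumption) of 2-out-of-3 selection using the median of three redundant readings:

```python
# Illustrative 2-out-of-3 sensor voting. Not any real avionics code;
# the 2.0 degree agreement threshold is an assumed value.

TOLERANCE_DEG = 2.0  # assumed threshold for two readings to "agree"

def vote(a: float, b: float, c: float) -> float:
    """Return the median of three readings.

    With one faulty sensor, the median always lies between the two
    healthy readings, so a single failure cannot steer the output.
    """
    return sorted([a, b, c])[1]

def disagreement(a: float, b: float, c: float) -> list:
    """Return the indices of any sensors far from the voted value."""
    m = vote(a, b, c)
    return [i for i, r in enumerate((a, b, c)) if abs(r - m) > TOLERANCE_DEG]
```

With readings of 5.0, 5.1 and 40.0 degrees, the voter outputs 5.1 and flags the third sensor as suspect, rather than letting the stuck-high reading drive the system.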

Given that design, coding and construction errors will always exist in complex systems, how do aircraft companies avoid crashes? The answer is failure modes analysis: a laborious process in which engineers imagine all the possible things that could go wrong (including combinations of failures) and then analyse how the systems will react to, and cope with, those failures. The technique requires people with good imagination, even paranoia, as well as an understanding of all the systems involved; it cannot be automated, even by AI. It is expensive and complex, and sometimes things get overlooked, which eventually leads to people being injured or killed.
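For illustration, each entry in a failure modes analysis typically records a component, a failure mode, its effect, a severity and a mitigation. The sketch below is a toy with hypothetical entries of my own invention; real worksheets are vastly larger and follow standards such as SAE ARP4761:

```python
# Toy structure for failure modes analysis records. The entries are
# hypothetical examples, not from any actual 737 Max analysis.

from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    mode: str
    effect: str
    severity: int      # 1 (minor) .. 5 (catastrophic), an assumed scale
    mitigation: str

analysis = [
    FailureMode("AoA sensor #1", "reads stuck-high",
                "anti-stall system sees a false stall", 5,
                "compare against redundant sensors; vote"),
    FailureMode("voting logic", "sensors disagree beyond tolerance",
                "no trustworthy majority available", 4,
                "disable automatic trim; alert the crew"),
]

# Reviewers would walk every entry, and combinations of entries,
# asking how the aircraft behaves in each case.
worst = max(analysis, key=lambda fm: fm.severity)
```

The point is not the code but the discipline: every imagined failure gets written down, assessed and answered, and the worst cases get the most scrutiny.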

If a proper failure modes analysis had been done for the Max's anti-stall system, the impact of one or more failed sensors would have been identified, and the necessary redesign would have been performed, thus eliminating the issue. While no failure modes analysis is simple, what would be needed for the anti-stall system is far simpler than many on an aircraft like the 737 Max. The obvious conclusion is that either the analysis was not done, or, more likely, it was done badly.

There are, of course, many other ways that safety is assured in aircraft and other safety critical systems:

- Peer review of requirements specifications.
- The creation of executable requirements specifications.
- Prototyping of the systems: creating a program, independent of the final design that will go into the aircraft, that fulfils some of the requirements of the actual system, albeit not as fast nor as completely as the final system.
- Peer review of designs.
- Peer review of code, of electrical design and of mechanical design.
- Various kinds of testing of system components and of whole systems.
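As a toy illustration of an executable specification, a requirement such as "a single faulty sensor must not corrupt the voted output" can be written as a runnable property check. Everything here, including the voting function and the tolerances, is my own illustrative assumption, not from any real aircraft specification:

```python
# An "executable specification": a requirement expressed as a runnable
# property. Illustrative only; the tolerances are assumed values.

import itertools
import random

def vote(a, b, c):
    """Median-of-three voting, as used in the redundancy discussion."""
    return sorted([a, b, c])[1]

def spec_single_fault_masked(trials=1000, tol=0.5):
    """Requirement: given two healthy sensors within `tol` of the true
    value and one arbitrarily faulty sensor, the voted output must stay
    within `tol` of the true value, regardless of sensor ordering."""
    rng = random.Random(42)  # fixed seed for repeatable checking
    for _ in range(trials):
        truth = rng.uniform(-10.0, 10.0)
        healthy = [truth + rng.uniform(-tol, tol) for _ in range(2)]
        faulty = rng.uniform(-90.0, 90.0)
        for readings in itertools.permutations(healthy + [faulty]):
            if abs(vote(*readings) - truth) > tol:
                return False
    return True
```

Running the check exercises the requirement directly, which is the appeal of executable specifications: the document that states the requirement is also a test of it.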

Many companies have also dabbled in formal methods: the use of mathematically based languages and methods to achieve "right first time" design. I have worked with such methods and languages; they are not yet good enough.

There are two different perspectives at work in the above: validation (did I build the right thing?) and verification (did I build it right?). The inherent flaw in most of the methods listed above is that they depend on people, so things can be, and sometimes are, missed or misinterpreted. This is the reason for the interest in formal methods: to take people out of the equation, to some extent.

For safety critical systems like aircraft, nearly all the quality assurance methods listed above are mandated by certification authorities such as the FAA; formal methods, executable specifications and prototyping are not.

The bottom line is that, despite the huge effort, and therefore cost, applied to making systems safe, there is always a chance that a dangerous error finds its way into a product. Assuring safety normally accounts for the majority of the cost of creating such systems, and even that is not always enough.

The other basic problem is that projects are always delayed and over budget. When that happens, testing and other verification and validation activities get trimmed: less time and fewer resources. The results are inevitable: failures and accidents.