with Martyn Thomas, co-author.
[A couple of weeks ago, Martyn Thomas and I were contacted by a journalist for the German weekly Der Spiegel. He asked me a question which I found hard to answer for non-specialists: what are “formal methods?” Here is the answer which Martyn and I supplied.]
There has long been a view held by prominent computer scientists, such as Turing-Award winner Professor Sir Tony Hoare, that software, a computer program running on a physical computer, is predominantly mathematics. This view opens up the possibility of proving mathematically that a program is free from errors.
For many years, this view has contrasted with the need of computer-programming companies to employ increasing numbers of people with an aptitude for writing “high-level” computer programs to get their large systems written and sold. This traditional method can be summarised as “write-what-you-think-will-work, test, and fix”. Its use often results in systems which exhibit unresolvable problems and are deployed very, very much later than planned, if at all – some estimates suggest that over 50% of SW which involves more than a few million lines of “high-level” computer source code is never put into service. That seems to many software engineers to be a terrible waste of human effort. The traditional method has no way of showing that a program is free from errors; indeed, it is expected that errors will be found, and then fixed, until no more errors show up. But it was estimated by work at IBM two decades ago, which looked at a specific highly-used complex system, that about a third of the errors that manifest themselves would only show up less frequently than once every billion hours or so. That means you see them once and they don’t turn up again – but they are nevertheless a third of the all errors you see, so even if you carefully eliminate all errors you see, including the more frequent ones, that third stubbornly remains.
General estimates of software quality suggest that one can expect about ten errors per one thousand lines of “high-level” computer code. For safety-critical systems, industry-insider estimates suggest a level of between one and three errors per one thousand lines of code, which is much better, but it is still not good enough for systems whose software could possibly induce dangerous behavior, because once a computer encounters and follows a software error, there are no good ways to predict how that computer will behave. This radical unpredictability following a fault is one of the major ways in which software-induced behavior differs from that of purely physical systems, which generally behave in more predictable ways when they fail.
The mathematics of statistical testing show that one can improve confidence that a program really does what is wanted – and, more importantly for safety, doesn’t do what is not wanted – with fault rates up to about one in one hundred thousand hours of operation. But that would be too high a fault rate for many critical systems – consider that the flight control system of the Airbus A320 aircraft, for example, has flown some 50-60 million operational hours, without any accident attributable to a fault in the control system. Given a desired ultra-low fault rate, say of that order, a testing regime to increase confidence that such a fault rate has been attained would have to test for at least the entire planned life of the system. That means, in practical terms, that the system’s operation and deployment would just be one big test – and what happens if it fails? We need to have the required confidence before a system is deployed that is of a certain quality. The mathematics says we can’t get that through testing if we want ultradependable systems, as most safety-critical systems are expected to be.
People who accept the view that computer programs are mathematics have been working on that mathematics, and developing tools to aid with it. This effort has been going on seriously, in avionics for example, for over 30 years – ever since the Californian consulting company SRI International was awarded the first contract by NASA to prove using mathematical logic that the operating system for the first digital-computer flight control system, SIFT, did exactly and only what its careful technical specification said it should do. That effort failed, but the lessons learned have informed progress in the mathematical view of computer software for three decades. Two decades ago, the traditionalist head of the U.S. Department of Defence Advanced Research Projects Agency’s Information Science and Technology Office – the same organisation responsible for the idea and technologies of the Internet – was reputed to have said, of the mathematical approach to software, “formal methods don’t work”. In 1994, a bug was found in Intel’s Pentium processor, which affected the correctness of certain division operations. The analysis that followed, the most significant of it pro bono work by academics and engineers, resulting in Intel replacing the chips, at a reputed cost of some $400 million or more. The company decided they needed better quality assurance in their designs, and started a formal methods group. Today, no complex chip may be effectively designed without using such methods.
The logic of chips is simpler than that of general software, and so it is easier to develop effective formal methods for chip design than it is for general software. However, the approach is the same. We write down in mathematics what we want the program or the chip to do – its requirements specification, as we call it – and if we design the program using design languages that are also a form of mathematics, then we have the possibility to prove that the final program does exactly what was required, no more and no less, and if we make any mistakes the mathematics will quickly point them out. Formal methods consists largely of the science and use of these mathematical languages for specifying, designing and analysing computer programs, and the various methods for proving that a program is free from various kinds of errors, or, with sufficient effort, that it is free from all. Nowadays, companies that use formal methods are able to produce very much higher quality software – software with far fewer, even no, errors in it – than is produced by the majority of companies. Data maintained by companies in the forefront of the use of formal methods show that they can supply code to quality levels of better than one error per twenty-five thousand lines of code, some twenty-five to seventy-five times better quality than by using traditional methods. And, crucially, formal methods can provide assurance that a certain quality of code has been achieved. Traditional methods cannot provide such assurance.
Consider, for example, the 100,000 lines of program code which controls the engines on the Boeing 777 aircraft that crashed at London Heathrow airport in January 2008, when the engines did not respond to commands to increase thrust shortly before the aircraft was due to land on the runway. If that code had been developed using careful, traditional methods of critical-software development, it seems we could expect about 100 errors in it. And what of the millions of lines of code in the flight control systems of a modern Boeing or Airbus computer-controlled aircraft? Would we be content flying with the thousands of errors that we might expect from traditional software code development? It is no wonder that these companies are steadily increasing use of formal methods in their systems development.
Indeed, nobody we know is content with levels of error of one in a thousand lines in critical software, not even those who develop it using traditional methods. If we want to improve the quality of such software, and to be assured of that quality, to the point at which there are no errors, then there is no alternative to using formal methods. These days, formal methods are well supported with computer-based tools that do much of the checking and proving automatically, which means that software written using formal methods can also be much cheaper to produce, as well as much higher quality, than software using traditional methods, because most of the cost of writing software using traditional methods comes from finding and fixing the mistakes that programmers make.