In a message to the York Safety-Critical Systems Mailing List, Tracy White recounted a discussion with someone from the field of “Risk Management” who was taking a course he was giving on system safety. There is apparently a series of international standards, designated ISO 31000, on “Risk Management” (so says Wikipedia). Tracy says

The term ‘risk’ in 31000 is described as the ‘effect of uncertainty on objectives’ where one of the ‘effects’ can be ‘a deviation from the expected’ (4360 describes it more succinctly as: ‘a chance of something happening’). These ‘risk’ definitions differ markedly from…

…the standard definition which has been around for 300 years and 10 months: Abraham de Moivre, De Mensura Sortis, or On the Measurement of Chance, Phil. Trans. Roy. Soc. No. 329, January, February, March 1711, reprinted with a commentary by A. Hald in International Statistical Review 52(3):229-262, 1984, which may be retrieved from JSTOR. The definition given there is, in modern terms, that risk is the expected value of loss. “Expected value” is a technical term from probability. I give the word-for-word de Moivre definition below.

This definition is also that used for “risk” in finance. See Peter L. Bernstein, Against the Gods: The Remarkable Story of Risk, John Wiley & Sons, 1996/1998. The book, as the publisher proudly proclaims on the cover, was a “*Business Week, New York Times Business, and USA Today Bestseller*”, and its cover includes praise from reviews by Galbraith, Heilbroner, the NYT, the WSJ and The Economist. (Indeed, Bernstein is where I got my original lead to de Moivre.)

The meaning of the term in system safety is always close to that of de Moivre, but usually avoids the explicit arithmetic of finance, expected value of loss, by saying “combination of” likelihood and severity. There are good reasons for being somewhat vague, namely that in many cases in system safety the numbers are not there to enable a calculation of expected value. Especially, for example, in a completely new type of system. (An example I am currently working on is the recharging systems for electric road vehicles. There aren’t many around, so in particular there are no reliable numbers on frequencies of untoward things happening.) In response to this common situation, engineers have developed “qualitative” and “semi-quantitative” methods for assessing risk.
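Such a qualitative scheme is often realised as a risk matrix that “combines” likelihood and severity categories without doing any arithmetic on numbers it doesn’t have. Here is a minimal sketch; the category names and the matrix entries are invented for illustration and not drawn from any particular standard.

```python
# A semi-quantitative risk matrix: likelihood and severity are ordinal
# categories, and their "combination" is a table lookup, not a product.
# Categories and entries are illustrative only.

LIKELIHOOD = ["improbable", "remote", "occasional", "probable", "frequent"]
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]

# Risk class indexed by (likelihood row, severity column);
# later letters denote higher risk classes.
MATRIX = [
    # negligible marginal critical catastrophic
    ["A", "A", "A", "B"],  # improbable
    ["A", "A", "B", "C"],  # remote
    ["A", "B", "C", "C"],  # occasional
    ["B", "C", "C", "D"],  # probable
    ["B", "C", "D", "D"],  # frequent
]

def risk_class(likelihood: str, severity: str) -> str:
    """'Combine' likelihood and severity without multiplying numbers."""
    return MATRIX[LIKELIHOOD.index(likelihood)][SEVERITY.index(severity)]

print(risk_class("remote", "catastrophic"))  # -> C
```

The point of the table is exactly the vagueness noted above: it ranks combinations of likelihood and severity by risk without claiming to know an expected value of loss.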

One of the issues then becomes what you take the word to mean in technical contexts. Any definition which is not equivalent to the expected value of loss defines a different concept from that, but the same word, “risk”, is used. For good reason: most definitions are conceptually related and the main issue is to get “close” while not having all the numbers.

So what do you do when some branch of human activity, indeed apparently some standard, takes the same word, “risk”, and uses it to mean something different? (I don’t actually know what “effect of uncertainty on objectives” is supposed to mean. I don’t see how “objectives” can be affected by uncertainty. I can see how your chances of attaining them are.)

Well, maybe you cite de Moivre, the finance industry, and system safety use, and say to your correspondent “*you mean something different. I think that is unhelpful; and indeed our notion has historical precedence, so for the purposes of this conversation let’s use a different word for your new notion.*” Or he or she could say the same to you. In any case, you agree to use two different words.

And for good measure, you write a blog post about it, as here.

This is not a new issue. Here’s a story from six and a half years ago. In the May/June 2005 issue of IEEE Software, Richard Fairley proposed a definition of risk for the Software Engineering Glossary of the IEEE (which is supposed to be canonical, although it turns out that Prof. Fairley doesn’t think so):

(Richard Fairley, proposed IEEE Software Engineering Glossary): The probability of incurring a loss or enduring a negative impact.

So a risk is to be a probability, which means all risks have values between 0 and 1. Tell that to Lehman Brothers. Well, I guess you can’t any more. Try Bear Stearns and Morgan Stanley. But we’re talking software, not money.

In common use, someone talking to his teenager speaking of “the risk of your not catching the bus in time” is likely talking about the chances of that event. Someone talking of “the risk that Lehman Brothers will go under” is likely also meaning the chances. But someone talking of “the risk of Lehman Brothers going under” is likely also thinking of the repercussions as well as just the chances. So much meaning can a relative pronoun versus a copula+gerund carry! As with any other term you wish to be a technical term, you need to decide which meaning (of, here, two) you are going to use. And stick with it. What should be clear is that software engineers working in safety-critical systems need to speak both of likelihoods or chances, and about expected levels of loss. It seems obvious to use “chance” or “likelihood” or “probability” for the former, and some other word for the latter. Since it has been called “risk” for 300 years, why not carry on doing so? And so it is. But some people choose differently. If one is then going to use “risk” to mean “likelihood”, what word does one choose to mean the combination of likelihood and severity? There is not an obvious candidate. But you do need a word for it.

I wrote to the author, Prof. Fairley, Richard Thayer, the person overall responsible for the SW Glossary, and Merlin Dorfman, I believe the IEEE editor responsible for the section, pointing out de Moivre’s definition, the definition from Nancy Leveson’s book Safeware (Addison-Wesley, 1995), and that from the standard for functional safety of E/E/PE systems, IEC 61508, which all cohere modulo the caveats above.

Here is de Moivre:

The Risk of losing any sum is the reverse of Expectation, and the true measure of it is, the product of the Sum adventured multiplied by the Probability of the Loss
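De Moivre’s definition is a one-line calculation; here it is as a minimal sketch, with invented numbers for illustration:

```python
# De Moivre's arithmetic, verbatim: the risk of losing a sum is the
# sum adventured multiplied by the probability of the loss.
def de_moivre_risk(sum_adventured: float, probability_of_loss: float) -> float:
    return sum_adventured * probability_of_loss

# Venturing 1000 (units of currency) with a 3% chance of losing it:
print(de_moivre_risk(1000.0, 0.03))  # -> 30.0
```

Note that the result is in units of the loss (currency, casualties, whatever is at stake), not a dimensionless number between 0 and 1.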

Here is Nancy Leveson:

the hazard level combined with (1) the likelihood of the hazard leading to an accident… and (2) hazard exposure or duration…

[The notion of hazard level is] the combination of severity and likelihood of occurrence.

Here is IEC 61508:

combination of the probability of the occurrence of harm and the severity of that harm

I also copied my note to Fairley in this note to the York Safety-Critical Systems Mailing List.

Dorfman agreed that the definition could be misunderstood, but said that “*I believe the reader is given a fair, complete, and accurate picture of the use of terminology in this area.*” “Accurate”?

What do you do if you are a software engineer working in safety-critical systems? Use the IEEE SE Glossary definition, or use the IEC 61508 definition? Use different definitions for different meetings, depending on who is there? And what happens if you misjudge your audience?

Thayer was dismissive. The entire content of his reply:

The overall title of the glossary is Software Engineering Glossary. This covers it I believe.

In other words, he doesn’t care much for the dilemma of the software engineer working in safety-critical systems. One could well wonder why he is editing this vocabulary if he doesn’t care about such issues.

I responded to Thayer and Dorfman:

The use in finance and in PRA of the notion of risk equates it to the expected value of loss. A partial list of standards that use some version of this notion is

* IEC 61508, the international standard on functional safety of E/E/PE safety-related systems

* IEC 300, the international standard on dependability management, in Part 3, Section 9, “Risk analysis of technological systems”

* IEEE 1228, the standard for software safety plans

* the American Institute of Chemical Engineers guidelines for safe automation of chemical processes

* US DoD MIL STD 882C, System Safety Program Requirements

* USAF Systems Command, Software Risk Abatement

* CENELEC 50129, Railway applications: Safety related electronic systems for signalling (the European norm for railways; derivative from IEC 61508)

* European Space Agency Glossary of Terms

* UK Ministry of Defence Standards 00-56, safety management requirements for defence systems; and Def Stan 00-58, HAZOP studies on systems containing programmable electronics

* German Standards Institute (DIN), DIN-V-VDE 0801, Principles for computers in safety-related systems

In particular, I expressed my concern that the IEEE as an organisation had publicly given two meanings for risk pertaining to software engineering: one in IEEE 1228 on software safety plans, and another in the Glossary proposed by Prof. Fairley. I got no response.

Prof. Fairley responded, inter alia:

Concerning my definition of risk: In most, if not all, situations encountered in software engineering, “risk” is the composite result of numerous factors. In the glossary, I characterize these as “risk factors,” each of which is assigned a probability and an impact (or a range of each). Risk factors are usually interrelated (e.g., an inaccurate size estimate affects schedule, budget, memory usage; an inaccurate schedule estimate affects product quality) so overall risk (i.e., probability of suffering loss) must be calculated using conditional probabilities or Bayesian analysis. It is not possible to characterize a situation by a simplistic pair of numbers, unless one is dealing with a narrow, well-defined situation such as a game of chance. It is dangerous and misleading to attempt to characterize a complex situation in this way.

Given the constraints of a glossary, it was not possible to explain the rationale for my definition or why it differs from the traditional definition; nor was it possible to explain the basis of definition for the other terms in the glossary.

Which to my mind is confused. If risk is “the composite result of numerous factors”, each of which is “assigned a probability and an impact”, why ignore the impact and define risk as a probability? Either it is a probability simpliciter, or it is the composition of a number of items, each of which exhibits a probability and an “impact”. It can’t be both.

That was it. End of story. The section editor thinks the definition is “accurate”; the Glossary editor is unconcerned; the author is confused. No one seems to worry about the IEEE proposing two incompatible definitions of risk in software contexts.

I wrote to some colleagues I thought might be interested: Dave Parnas, John Knight and Bev Littlewood (as well as a couple of German colleagues), explaining my dissatisfaction with this state of affairs.

Dave sympathised with my frustration, which was similar to his. He said he had seen lots of examples, and that he considered trying to write a glossary for SW terms a fool’s errand, and explained why. John thought this situation to be serious, the Fairley definition of risk wrong, and deserving of public correction. He also said that many people are concerned about a lack of precision and took Dave’s comments to reflect that. Bev strongly agreed with both John and Dave. He was particularly concerned about the dismissive response.

Continuing along the same lines, here is the definition of risk from the US National Research Council study Understanding Risk: Informing Decisions in a Democratic Society (National Academies Press, 1996), p215 (you can read this study on-line):

A concept used to give meaning to things, forces or circumstances that pose danger to people or to what they value. Descriptions of risk are typically stated in terms of the likelihood of harm or loss from a hazard and usually include: an identification of what is “at risk” and may be harmed or lost (e.g., health of human beings or of an ecosystem, personal property, quality of life, ability to carry on an economic activity); the hazard that may occasion this loss; and a judgement about the likelihood that harm will occur.

So descriptions include a likelihood of harm and an identification of what may be harmed or lost. Unless you are a software engineer using the IEEE Glossary (but not IEEE 1228), in which case it’s just a number between 0 and 1.

Here is the definition from a standard text, Probabilistic Risk Assessment and Management for Engineers and Scientists, Hiromitsu Kumamoto and Ernest J. Henley, IEEE Press (them again!) 1996, a book “sponsored by the IEEE Reliability Society”, p2:

Primary Definition of Risk: A weather forecast such as “30% chance of rain tomorrow” gives two outcomes together with their likelihoods: (30%, rain) and (70%, no rain). Risk is defined as a collection of such pairs of likelihoods and outcomes:

{(30%,rain), (70%, no rain)}

So they don’t even go for the *combination* of likelihood and outcome, nor do they designate certain outcomes as harmful. But if you do designate certain outcomes as harmful, then you can combine these values to calculate de Moivre risk and system-safety risk from this set.
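That last step can be sketched directly: start from the Kumamoto-Henley collection of (likelihood, outcome) pairs, designate which outcomes are harmful by assigning each a loss, and de Moivre risk falls out as the expected value of loss. The loss figures below are invented for illustration.

```python
# From a Kumamoto-Henley risk (a set of (likelihood, outcome) pairs)
# to a de Moivre risk (expected value of loss).
# The losses assigned to outcomes are illustrative only.

forecast = [(0.30, "rain"), (0.70, "no rain")]
loss = {"rain": 100.0, "no rain": 0.0}  # say rain costs us 100, dry weather nothing

expected_loss = sum(p * loss[outcome] for p, outcome in forecast)
print(expected_loss)  # -> 30.0
```

The Kumamoto-Henley set is the more informative object: the expected loss can be computed from it, but not vice versa.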

The standard textbook Probabilistic Risk Analysis: Foundations and Methods, Tim Bedford and Roger Cooke, Cambridge University Press, 2001 (not the IEEE for a change 🙂 ), discusses the definition of risk over some three pages in Section 1.2. They base their notion on that of S. Kaplan and B.J. Garrick, On the Quantitative Definition of Risk, Risk Analysis 1:11-27, 1981.

A risk analysis tries to answer the questions

(i) What can happen?

(ii) How likely is it to happen?

(iii) Given that it occurs, what are the consequences?

Kaplan and Garrick … define risk to be a series of scenarios s_i, each of which has a probability p_i and a consequence x_i. If the scenarios are ordered in terms of increasing severity of the consequences, then a risk curve can be plotted [of severity against probability of at least that level of severity]. The risk curve illustrates what is the probability of at least a certain number of casualties in a given year. Kaplan and Garrick … further refine the notion of risk in the following way [to talk about frequency of an event instead of probability, and then uncertainty associated with a frequency]
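The Kaplan-Garrick construction can be sketched in a few lines: take scenarios (p_i, x_i) and compute, for each severity level, the probability of a consequence at least that severe. The scenarios and numbers below are invented for illustration.

```python
# A sketch of a Kaplan-Garrick risk curve: scenarios (s_i, p_i, x_i),
# ordered by increasing severity x_i; the curve gives, for each severity
# level, the probability of a consequence at least that severe.
# Scenario probabilities and consequences are illustrative only.

scenarios = [  # (probability p_i, consequence x_i, e.g. casualties)
    (0.90, 0),
    (0.07, 1),
    (0.025, 10),
    (0.005, 100),
]

def exceedance(scenarios, x):
    """Probability of a consequence of severity at least x."""
    return sum(p for p, xi in scenarios if xi >= x)

curve = [(x, exceedance(scenarios, x)) for _, x in scenarios]
print(curve)  # each point: (severity, P(consequence at least that severe))
```

Here too, risk is a curve, a whole profile of likelihood against severity, rather than a single number between 0 and 1.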

Again, this concept is somewhat different from that of a number between 0 and 1.

John suggested I contact the then-editor of IEEE Software, Warren Harrison, which I did. Warren suggested that the appropriate action would be a letter to the editor, allowing the author and the section and glossary editors to respond if they wished.

I never did so. I regret it.

So six and a half years later, here I am writing a blog post on it. I doubt the issue will go away. Neither will this note. I do think the IEEE should work to get its definitional house in order.