Safety and “AI Safety”

Robert Schaefer just pointed me (via a mailing-list note) to a list of features of “AI safety”, via a reference in the blog of Victoria Krakovna at . The features of “AI safety” pointed to, from , are

Short-term: This work involves immediately practical safety risks in deploying machine learning systems. These include data poisoning, training set inference, lack of model interpretability, and undesirable model bias.

Mid-term: This work targets potential safety risks of future AI systems that are more powerful and more broadly deployed than those used today. Relevant problems in this space include scalably specifying and supervising reward-based learning, preventing unwanted side effects, safely generalizing out of domain, and ensuring that systems remain under our control.

Long-term: This theoretical work addresses the risks posed by artificially engineered (super)intelligences. It asks, for example, how we might ensure that a system is aligned with our values, and proposes procedures for conserving this alignment while supporting recursive self-improvement.


Which leads to the list of supposedly “safety” phenomena:

  • data poisoning
  • training set inference
  • lack of model interpretability
  • undesirable model bias
  • scalably specifying and supervising reward-based learning
  • preventing unwanted side effects
  • safely generalising out of domain
  • ensuring that systems remain under our control
  • systems aligned with our values
  • procedures for conserving this alignment

It is unfortunately common for people not working in system safety to use the word “safety” to describe any desirable feature of systems with safety-related aspects, most often their reliability. We have been making the distinction between safety and reliability for decades, to little avail, it seems. Let us therefore go at it again.

Safety concerns avoiding harm to people or the environment, and/or damage to objects. The IEC defines it in a rather convoluted way as “freedom from unacceptable risk” (e.g., IEC 61508-4: subclause 3.1.11, referring to ISO/IEC Guide 51, the guidelines for items which need to be in any safety-related electrotechnical standard). It defines risk as “combination of the probability of occurrence of harm and the severity of that harm” (e.g., IEC 61508-4: subclause 3.1.6, also referring to ISO/IEC Guide 51) and harm as “physical injury or damage to the health of people or damage to property or the environment” (e.g., IEC 61508-4: subclause 3.1.1, also referring to ISO/IEC Guide 51).
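To see how these three definitions chain together, here is a minimal sketch. It rests on assumptions that are not in the IEC text: the definition says only “combination”, so taking the product of probability and severity is one common reading among several, and the function names, the severity scale, and the tolerable-risk threshold are all hypothetical.

```python
def risk(probability_of_harm: float, severity: float) -> float:
    """Combine probability and severity of harm into a single risk figure.

    ASSUMPTION: the "combination" is a simple product. The quoted
    definition does not prescribe how the two are combined.
    """
    if not 0.0 <= probability_of_harm <= 1.0:
        raise ValueError("probability_of_harm must lie in [0, 1]")
    if severity < 0.0:
        raise ValueError("severity must be non-negative")
    return probability_of_harm * severity


def is_safe(probability_of_harm: float, severity: float, tolerable_risk: float) -> bool:
    """'Freedom from unacceptable risk': the risk does not exceed what is tolerated.

    ASSUMPTION: tolerable_risk is a hypothetical threshold, chosen outside
    this sketch, on the same scale as the value returned by risk().
    """
    return risk(probability_of_harm, severity) <= tolerable_risk


if __name__ == "__main__":
    # Same severity and same threshold; only the probability of harm differs.
    print(is_safe(0.001, 100.0, tolerable_risk=0.5))
    print(is_safe(0.01, 100.0, tolerable_risk=0.5))
```

Nothing in this calculation asks whether the system performs its required function. It asks only how likely harm is and how bad that harm would be.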

The items in this list of supposedly “safety” phenomena associated with so-called AI systems (what is really meant here is some form of system involving deep-learning neural networks in real-world situations) are also phenomena associated with the system doing what we want of it. That is reliability. Reliability is the “ability to perform as required, without failure, for a given time interval, under given conditions” according to the IEC (definition 192-01-24 of the International Electrotechnical Vocabulary, available at ).

Reliability and safety of a system are not the same. Indeed, sometimes they pull in opposite directions. Suppose you are tied to a chair, with a gun pointed at your head, attached to a system which will fire the gun when you blink your eyes. The safety of this system concerns how likely you are not to die or be injured. The more reliable the system is, the lower that chance becomes.
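The chair example can be put into numbers with a toy model. This is a sketch under stated assumptions, not a real safety analysis: reliability is taken to be the probability that the system fires when a blink occurs, the person is assumed to blink with probability `p_blink` (1.0 by default, since nobody holds out forever), and the function name is hypothetical.

```python
def chance_of_escaping_harm(reliability: float, p_blink: float = 1.0) -> float:
    """Probability that the person in the chair is not shot.

    ASSUMPTIONS (toy model only):
      reliability: probability the system performs as required,
                   i.e. fires the gun when a blink occurs.
      p_blink:     probability that a blink occurs at all.
    Harm occurs only if a blink occurs AND the system works.
    """
    for name, value in (("reliability", reliability), ("p_blink", p_blink)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return 1.0 - p_blink * reliability


if __name__ == "__main__":
    # As the system gets more reliable, the person gets less safe.
    for r in (0.0, 0.5, 0.9, 0.999):
        print(f"reliability={r:.3f}  chance of escaping harm={chance_of_escaping_harm(r):.3f}")
```

In this model a perfectly reliable system leaves no chance of escaping harm, and a system that never works is perfectly safe. Every improvement in reliability is a loss of safety.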
