Safety and “AI Safety”

Robert Schaefer has just pointed me, in a mailing-list note, to a list of features of “AI safety” referenced in Victoria Krakovna’s blog at https://vkrakovna.wordpress.com/2018/11/01/discussion-on-the-machine-learning-approach-to-ai-safety/ . The features of “AI safety” in question, taken from http://www.foldl.me/2018/conceptual-issues-ai-safety-paradigmatic-gap/ , are

Short-term: This work involves immediately practical safety risks in deploying machine learning systems. These include data poisoning, training set inference, lack of model interpretability, and undesirable model bias.

Mid-term: This work targets potential safety risks of future AI systems that are more powerful and more broadly deployed than those used today. Relevant problems in this space include scalably specifying and supervising reward-based learning, preventing unwanted side effects, safely generalizing out of domain, and ensuring that systems remain under our control.

Long-term: This theoretical work addresses the risks posed by artificially engineered (super)intelligences. It asks, for example, how we might ensure that a system is aligned with our values, and proposes procedures for conserving this alignment while supporting recursive self-improvement.

Which leads to the list of supposedly “safety” phenomena:

data poisoning
training set inference
lack of model interpretability
undesirable model bias
scalably specifying and supervising reward-based learning
preventing unwanted side effects
safely generalising out of domain
ensuring that systems remain under our control
systems aligned with our values
procedures for conserving this alignment

It is unfortunately common for people not working in system safety to use the word “safety” to describe any desirable feature of systems with safety-related aspects, most often their reliability. We have been making the distinction between safety and reliability for decades, to little avail it seems. Let us therefore go at it again.

Safety concerns avoiding harm to people or the environment, and/or damage to objects. The IEC defines it in a rather convoluted way as “freedom from unacceptable risk” (e.g., IEC 61508-4: subclause 3.1.11, referring to ISO/IEC Guide 51, the guidelines for items which need to be in any safety-related electrotechnical standard) and defines risk as “combination of the probability of occurrence of harm and the severity of that harm” (e.g., IEC 61508-4: subclause 3.1.6, also referring to ISO/IEC Guide 51) and harm as “physical injury or damage to the health of people or damage to property or the environment” (e.g., IEC 61508-4: subclause 3.1.1, also referring to ISO/IEC Guide 51).

The items in this list of supposedly “safety” phenomena associated with so-called AI systems (really what is meant here is some form of system involving deep-learning neural networks in real-world situations) are also phenomena associated with the system doing what we want of it. That is reliability. Reliability is the “ability to perform as required, without failure, for a given time interval, under given conditions” according to the IEC (definition 192-01-24 of the International Electrotechnical Vocabulary, available at http://www.electropedia.org/iev/iev.nsf/display?openform&ievref=192-01-24 ).

Reliability and safety of a system are not the same. Indeed, sometimes they pull in opposite directions. Suppose you are tied to a chair, with a gun pointed at your head, attached to a system which will fire the gun when you blink your eyes. The safety of this system concerns how likely you are not to die or be injured. That chance falls as the reliability of the system rises: the more reliably the gun fires on a blink, the less safe you are.
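
The chair example can be made concrete with a toy model. This is only a sketch under assumptions that are not in the example itself: reliability is taken to be the probability that the system fires the gun on any given blink, blinks are treated as independent, and any firing is counted as harm. The function name survival_probability is hypothetical.

```python
def survival_probability(reliability: float, blinks: int) -> float:
    """Chance that the gun never fires over `blinks` blinks.

    Assumptions (illustrative, not from the text): each blink independently
    triggers the gun with probability `reliability`, and any firing is harm.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if blinks < 0:
        raise ValueError("blinks must be non-negative")
    # The subject survives only if the system fails on every single blink.
    return (1.0 - reliability) ** blinks


if __name__ == "__main__":
    # Higher reliability of the firing mechanism means lower safety.
    for r in (0.0, 0.5, 0.9, 1.0):
        p = survival_probability(r, blinks=3)
        print(f"reliability={r:.1f}  chance of no harm over 3 blinks={p:.3f}")
```

Under these assumptions a perfectly reliable system is maximally unsafe for the person in the chair, and a system that never works is perfectly safe, which is the point of the example.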
