1. In 2019, Andrew Odlyzko published a paper in ACM Ubiquity in which he argued that cybersecurity is not as big a deal as some prognosticators have claimed: http://www.dtc.umn.edu/~odlyzko/doc/cyberinsecurity.pdf . There are a number of insights, as well as some questionable arguments, and I find it worth commenting on. Paragraphs are numbered. I abbreviate the author's name to AO.
2. AO claims that there has so far been no big cybersecurity disaster on the level of Fukushima, Hurricane Sandy, 9/11 or the 2008 crash. One might add to that list the Chernobyl accident, Hurricane Maria, which devastated Puerto Rico in 2017, and Hurricane Katrina, which devastated New Orleans and nearby areas in 2005. AO suggests that cybersecurity incidents will continue not to match such events in damage.
So, what is the comparative damage? And could it be that a cybersecurity incident could match it?
The IEC considers harm to be injury or death of persons, damage to property or damage to the environment. Recent "large" incidents are the SolarWinds software compromise and the Colonial Pipeline ransomware attack. The Colonial Pipeline attackers are reported to have elicited a ransom payment of around $5m in cryptocurrency, much of which has since been recovered by US law enforcement, despite the supposed untraceability of such currency. I have found no estimates of the cost of the SolarWinds compromise. We can take ransom payments as a proxy for the value to the organisations of having restored those residual parts of their systems which they could not restore themselves. However, ransom payments are not necessarily an accurate proxy for damage, since the organisations will have put considerable resources into assessing their situation and restoring those parts of their systems which they could, as well as having suffered loss of business during the downtime and various other costs which are generally not made public. To my knowledge, no persons were harmed or died directly as a consequence of either of these attacks. The NotPetya malware has been estimated to have cost organisations "over $10 bn" https://en.wikipedia.org/wiki/Petya_(malware) ; the WannaCry malware has been estimated to have cost from "hundreds of millions" up to $4 bn https://en.wikipedia.org/wiki/WannaCry_ransomware_attack .
Wikipedia estimates the damage of Hurricane Katrina to be just over 1,800 dead and $125 bn https://en.wikipedia.org/wiki/Hurricane_Katrina ; of Hurricane Maria to be just over 3,000 dead and over $91 bn https://en.wikipedia.org/wiki/Hurricane_Maria ; and of Hurricane Sandy to be 268 dead and over $68 bn https://en.wikipedia.org/wiki/Hurricane_Sandy . The Fukushima accident has resulted in one official death so far, and has cost, or will cost, the Japanese government at least $187 bn in cleanup and restoration work https://en.wikipedia.org/wiki/Fukushima_Daiichi_nuclear_disaster#Compensation , but this does not take into account any long-term effects on the marine environment. It is generally accepted that the costs to human society of the 2007-8 financial crash are, in monetary terms, orders of magnitude higher than any of these disasters.
AO seems clearly to be right that cybersecurity “disasters” so far are dwarfed in terms of damage by Hurricanes, nuclear meltdowns and the collapse of trust in finance through the (mis)use of derivatives, especially CDOs, in 2007-8. Dwarfed.
Could that change?
Suppose something like the 2003 North American power outage were caused again, which I take to be quite plausible https://en.wikipedia.org/wiki/Northeast_blackout_of_2003 . The outage generally lasted a few hours, but in some places rather longer. It resulted from people (not) reacting to two misleading pieces of information on computer systems that had not been regarded as critical to operations and had been serviced accordingly (Peter Bernard Ladkin, Understanding the System: Examples, talk given 2015-06-15 to the UK Safety Critical Systems Club, London UK; Peter Bernard Ladkin & Bernd Sieker, Resilience is an Emergent System Property: A Partial Argument, in Mike Parsons and Tom Anderson (eds), Developing Safe Systems, SCSC-131, 2016, available through https://scsc.uk/scsc-131 ). The two "official" reports are: North American Electric Reliability Council, Technical Analysis of the August 14, 2003, Blackout: What Happened, Why, and What Did We Learn?, 2004-07-13, available through https://www.nerc.com/pa/rrm/ea/pages/blackout-august-2003.aspx ; and U.S.-Canada Power Outage System Task Force, Final Report on the August 14, 2003, Blackout in the United States and Canada: Causes and Recommendations, available at https://www.energy.gov/sites/prod/files/oeprod/DocumentsandMedia/BlackoutFinal-Web.pdf . Neither of these documents highlights the criticality of the MISO state estimator, and neither contains an analysis of the full system, including a subsystem criticality analysis.
It is quite possible that an adequate full-system-level analysis of the various parts of the North American electricity grid has still not been performed and the vulnerabilities mitigated. It is thus quite possible that another accident, or a cyberattack, could have similar effects. If recovery were to be actively hindered, so that the outage lasted days or even weeks almost everywhere, then one can conceive that many people could die, for example in hospitals when the UPS batteries run out. Transport would be hindered when no fuel can be pumped. We are a lot less resilient against electric-power outages than we were 50 years ago, as Roger Kemp's observations and the subsequent workshop on the Lancaster outage showed (Roger Kemp, Power cuts – a view from the affected area, December 2015, available at https://rvs-bi.de/publications/Reports/KempLancasterPowerCuts201512V3.pdf ; Royal Academy of Engineering, Living without electricity: One city's experience of coping with loss of power, 2016, available from https://www.raeng.org.uk/publications/reports/living-without-electricity ). I don't see an a priori argument that such an event would not compare in death toll and damage with some of the hurricanes mentioned above.
It could be argued that a large-scale power outage whose recovery was hindered is an act of war justifying retaliation. Well, maybe, but also maybe not. Maybe someone’s experiment just got out of control, as reputedly happened with the Morris Worm in 1988 https://en.wikipedia.org/wiki/Morris_worm .
I thus don’t see good reason to think that cybersecurity events will stay at their current relatively low level of damage compared with some other disasters.
3. AO considers the mantra "engineer our systems from the ground up to be truly secure", nowadays often called "security by design", and indirectly suggests that it derives from Willis Ware's 1970 observations. He suggests that it really can't be done well: "we don't know how to build secure systems of any real complexity [and if we did they would likely be unusable]."
I think we can agree with the latter sentiment, and also that the 1970 thinking was wishful. But that doesn’t entail that much cannot be done.
Many, even most, of the vulnerabilities listed in ICS-CERT Advisories concern a mismatch between actual input data and that expected by the subsystem. The first examples of such vulnerabilities were called buffer-overflow exploits, and their descendants still account for by far the majority of the Advisories. The countermeasure, ensuring that the format of input data matches the format expected by the program, is known as strong data typing. Strong data typing has existed in some programming languages for over half a century. It is absent from many programming languages commonly used today, such as C and C++. It could be there. In that case, most vulnerabilities in OT subsystems would simply go away. Extended strong data typing is not unachievable.
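To make the point concrete, here is a minimal sketch in C (my own illustration, not taken from AO's paper or from any Advisory; the 8-decimal-digit device-ID format is hypothetical). The first function shows the classic unchecked-input pattern behind buffer-overflow exploits; the second checks the length and character class of the input against the expected format before accepting it, which is strong data typing applied by hand in a language that does not enforce it:

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    /* Vulnerable pattern: the length of the input is never checked
       against the size of the destination buffer. */
    void read_id_unchecked(char dest[16]) {
        scanf("%s", dest);   /* writes past dest on input longer than 15 bytes */
    }

    /* Countermeasure: read a bounded amount, then accept the input only
       if it matches the expected format, here a hypothetical device ID
       of exactly 8 decimal digits. */
    bool read_id_checked(char dest[9]) {
        char line[64];
        if (fgets(line, sizeof line, stdin) == NULL)
            return false;
        size_t n = strcspn(line, "\n");       /* length up to the newline */
        if (n != 8)
            return false;                     /* wrong length: reject */
        for (size_t i = 0; i < n; i++)
            if (!isdigit((unsigned char)line[i]))
                return false;                 /* wrong character class: reject */
        memcpy(dest, line, n);
        dest[n] = '\0';
        return true;
    }

    int main(void) {
        char id[9];
        puts(read_id_checked(id) ? "accepted" : "rejected");
        return 0;
    }

A strongly typed language performs the equivalent of the second function automatically at every interface; in C it must be written, and remembered, by the programmer at each input point.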
AO is quite aware that extended strong data typing is achievable. However, he suggests that the facts that many or most security vulnerabilities are violations of data typing, and that 2FA is generally not used, are examples showing that cybersecurity isn't really very important to us.
The corollary of that would surely be: where it is important, we use 2FA and strong typing. But in fact we don't. The observations about ICS-CERT Advisories just above suggest that OT is vulnerable to cyberattack, even to a very significant attack. The lack of attention to strong data typing in OT kit is a phenomenon that requires explanation; it is not a phenomenon from which we can conclude that we really don't care that much. I suggest below that this situation is explained by externalities (in the economic sense).
4. AO suggests we can in fact improve general cybersecurity by learning to love "spaghetti code" and "security through obscurity". By "obscurity" he seems to be thinking mainly of code obfuscation.
The idea is that if it is very hard to figure out what the code does, then it is very hard to change it so that it does something else, and so you put off most hackers. The counterargument is that you generally can't use spaghetti code and obfuscation in contexts in which you have to establish the reliability of your SW to a third party. Coding and documentation standards apply.
For example, IEC 61508, governing the functional safety of programmable electronic devices, requires considerable inspection of the code documentation and does not allow you to qualify code just by running it and seeing what it does. The dispute that culminated in Bookout v. Toyota showed that Toyota did not understand their own code. Toyota claimed, but was unable to show, that their product was not defective in the way asserted by the plaintiffs; it was unable to show this because, as the plaintiffs demonstrated clearly, the product was indeed defective in that way (Michael Barr, An Update on Toyota and Unintended Acceleration, 2013-10-26, https://embeddedgurus.com/barr-code/2013/10/an-update-on-toyota-and-unintended-acceleration/ ; Phil Koopman, A Case Study of Toyota Unintended Acceleration and Software Safety, 2014-09-14, https://users.ece.cmu.edu/~koopman/pubs/koopman14_toyota_ua_slides.pdf ). What has happened since in Toyota product development is unknown to me (and presumably to many users of their products). Free-market capitalism obviously did not ensure that such products are free of dangerous defects. The Toyota code did not fall under the IEC 61508 standard, as far as we know (the automobile industry has its own functional-safety standard for electronic devices, ISO 26262, but as far as we know it became relevant only after the first versions of Toyota's code were written).
Besides, as Martyn Thomas points out (private communication 2012-08-12), you don't need to understand what code does in order to detect the possibility of a buffer overflow and to exploit it. You fuzz to create the failure, which may be all you want to achieve. If you can access the binary and the resulting crash, you can often see how to execute arbitrary code. Obfuscation therefore does not necessarily hinder exploitation. For the most common exploits, namely data-typing violations, you can search even spaghetti code for input statements and see or test whether they are protected.
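To illustrate Thomas's point, here is a minimal fuzzing sketch in C (my own, under stated assumptions: the target parse_record and its 16-byte field are hypothetical, and the harness is POSIX-only). The fuzzer knows nothing about what the target does; it simply feeds it random inputs and watches for crashes, running each trial in a child process so that a crash does not stop the search:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical target: an input routine with an unchecked copy. */
    static void parse_record(const char *input) {
        char field[16];
        strcpy(field, input);            /* overflows on input longer than 15 bytes */
        (void)field;
    }

    int main(void) {
        srand(1);                        /* fixed seed, so runs are reproducible */
        for (int trial = 0; trial < 1000; trial++) {
            char buf[64];
            int len = rand() % 63;
            for (int i = 0; i < len; i++)
                buf[i] = (char)(rand() % 255 + 1);   /* random non-NUL bytes */
            buf[len] = '\0';

            pid_t pid = fork();          /* isolate the trial: a crash in the
                                            child does not kill the fuzzer */
            if (pid == 0) {
                parse_record(buf);
                _exit(0);
            }
            int status;
            waitpid(pid, &status, 0);
            if (WIFSIGNALED(status))
                printf("trial %d: crash (signal %d) on a %d-byte input\n",
                       trial, WTERMSIG(status), len);
        }
        return 0;
    }

Compiled with a sanitizer (e.g. -fsanitize=address), even overflows too small to corrupt the stack visibly are reported; real fuzzers such as AFL or libFuzzer add coverage feedback, but the principle is the same: no understanding of the code is needed to find the failure.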
5. AO remarks that "Linus's Law", that "given enough eyeballs, all bugs are shallow", is fallacious. It is true that there are non-shallow bugs all over the place, including in the Linux kernel, and it also seems to be true that malfeasant actors in Linux kernel maintenance can effectively insert vulnerabilities (some non-malfeasant actors recently tried and succeeded, announcing what they had done before the release went live). AO also mentions the Heartbleed vulnerability as something that had been around in key, well-used SW for a while. I guess the key observation is not that the law is fallacious, but that it equivocates on what is "enough". In modern systems building and use there are rarely if ever "enough eyeballs". And the eyeballs there are may not look hard, or may not recognise a problem.
6. AO thinks we have to learn how to live with certain vulnerabilities, such as Heartbleed and Spectre. I suppose we do, but if so it would help to have suggestions as to how. Except for a quote from Dr. Strangelove at the beginning, consonant with Bobby McFerrin's popular 1988 song "Don't Worry, Be Happy", there are unfortunately no such suggestions.
Martyn Thomas and I disagree with this laissez-faire attitude. There are classes of vulnerability that could be abolished, such as violations of extended strong data typing. If we can do that, why not do so before we concern ourselves with how to live with what's left?
Generally, I am not sure the cost/benefit calculation for cybersecurity gives the right weighting to externalities. If extended strong data typing were enforced (say, by law), then large numbers of vulnerabilities would simply disappear: they would either be fixed, or their perpetrators (the equipment suppliers) would lose business. One of the reasons that such measures don't make it into law is that some suppliers have very effective lobbyists.
The conclusion to draw is surely not that we continue to have vulnerabilities because "it's not worth it" to fix them, but that equipment suppliers have an engineering culture in which such vulnerabilities routinely occur, and that the incentive for them to change that culture is lacking.
So the question arises: how do we deal with this phenomenon? How do we change the culture?
AO seems to me to accept such externalities as given. Externalities in cybersecurity economics are ubiquitous. If you sell a company an email client and say "make sure you're safe", when you can at the same time reasonably assume that some of their staff are vulnerable to phishing, then the costs of successful phishing are, currently, external to your transaction. You can make even more money if you have a Cybersecurity Division which can help such phishing victims, for a price. If there were, for example, SW product liability of the sort that there is for most products that are bought and sold, then such phenomena would no longer be external. You'd either rewrite your code to sandbox email messages firmly, or go out of business.
7. AO refers to a talk given by Dan Geer (henceforth DG) in 1998, reprinted in Risks Forum Digest 20.06 (Dan Geer, Risk Management is Where the Money Is, Risks Forum Digest 20.06, 1998-11-12, available at https://catless.ncl.ac.uk/Risks/20/06 ). Many of DG's pithy observations are dated, as one might expect nearly a quarter-century on. But his key point is that managing trust is the key to security in commerce, and will be so for e-commerce (which wasn't really going strong when he wrote). However, he notes that it is often easier to deal with a phenomenon through its obverse, which in this case is risk management (risk apparently being, for Geer, the obverse of trust). Hence his title.
DG observes that all major financial institutions concentrate on risk management rather than trust management. It is a worthwhile observation, but I am not sure it is universally so: it might be more nearly right in the US than, for example, in Germany. I have lived in Germany for a quarter of a century, with the same bank throughout, which also financed my purchase of the building in which I live. In all of my transactions with the bank that require multiple signatures, physical presence (notarial transactions, required for real estate) and so on, what is being directly managed is trust. I have dealt with a succession of bank people who hand off to one another, so I have a running reputation built up over a quarter of a century, which establishes appropriate trust for my current financial needs. They trust me because they trust their own internal processes. I recently sold part of my building as an apartment to a friend. The process took nearly 10 months from the time the sales contract was signed. (Such a contract includes specific surveying details, because ownership and liens are recorded centrally by the city authorities according to the survey which they hold. There was some negotiation with the city legal department about some surveying anomalies and how they were handled in the contract, so a modified contract had to be written.) The entire process is legally engineered to be secure for all parties at every step of the way: there are many steps, and all are legally and relatively simply reversible. That surely is trust management, not risk management.
So risk management is not always the way to manage trust. Sometimes trust can be managed directly.