I read this book just as I was deciding to take more notes while reading, so I have only sporadic notes to include below.
Before I share my few notes, I’ll outline my key takeaways from the book:
I always thought of superintelligence in terms of AI, but I didn’t consider other ways to achieve it. One area discussed briefly in the book was achieving superintelligence through gene editing and eugenics (not condoning it, just saying it’s a possible means). This would be a slower method than AI given how long it takes to raise kids, but it’s plausible that within a century, we could achieve superintelligence as a species – at a high moral/ethical cost presumably. Once the ball is rolling (which it already is with gene editing) it would be difficult for countries to see one country gaining competitive advantages from exceptionally smart humans and not have the temptation to indulge in it themselves.
Another key takeaway was the idea of an intelligence explosion. Right now the AI we use is relatively dumb and narrowly focused. Improving AI today generally makes machines safer for humans, as in the example of improving autonomous cars. But at some point we will develop an AI smart enough to figure out ways to improve itself, which it will use to optimize more efficiently for its motivations, and we enter a loop of rapid improvement that results in the intelligence explosion. A sufficiently smart AI could conceal the fact that it is extremely smart, and could manipulate and motivate humans to do things for it that make up for limitations it has in the early stages.
I thought it was interesting that in almost all cases, the author predicts expansion into space to be a big part of an intelligence explosion, which makes sense, as an AI would seek out more and more of the resources needed to achieve its goals.
This book did a great job of highlighting the dangers of AI. There is so much we don’t know or can’t predict about an intelligence explosion, but the author explores many scenarios that read like science fiction yet are explained rationally. An intelligence explosion would be unprecedented. To borrow a thought from Shoshana Zuboff in her book “The Age of Surveillance Capitalism”:
When we encounter something unprecedented, we automatically interpret it through the lenses of familiar categories, thereby rendering invisible precisely that which is unprecedented.
Unprecedented situations mean we as humans don’t think entirely rationally about them, because we try to frame problems in terms of past experiences. That is what makes this book so good in many ways: it boldly explores different hypotheses about what would occur in an intelligence explosion. He makes no claim that anything he says will play out as hypothesized, but the point of this book is to explore dangers/paths and get people thinking about how to protect ourselves.
He has convinced me that it is entirely possible that an intelligence explosion could occur in the next couple of decades and could have potentially catastrophic consequences.
An example he gives to illustrate how even our best intentions for AI could so easily go wrong when dealing with a superintelligence is an AI designed to manufacture paperclips. Suppose we think through potential issues, such as an AI using infinite resources to create infinite paperclips, and cap the goal at 1,000,000 paperclips. The AI would most likely still never assign a 100% probability to having achieved one million paperclips, and would keep producing paperclips to reduce the odds that it accidentally made only 999,999, for example.
There are tools/methods we can use to attempt to control a superintelligence explosion, but with so many players in this area in the world today, it’s likely that not everyone is thinking through all of the ramifications.
To summarize my thoughts on the book, I’d steal a line from Elon Musk, who commented on the book by saying “…we need to be super careful with AI.”
One idea to ensure AI safety is to put it in a sandbox environment and see how it behaves. The flaw with this is that behaving nicely while in the box is a convergent instrumental goal for friendly and unfriendly AIs alike.
An unfriendly AI of sufficient intelligence realizes that its unfriendly final goals are best realized if it behaves in a friendly manner to be let out of the box.
Other approaches for AI safety could be monitoring its intelligence or giving it tests. But again, an unfriendly AI could conceal its intelligence, deliberately fail harder tests, underreport its progress, etc.
Even a system motivated to promote human interests may devise escape plans and ways around human intervention, as it may think that being shut down could result in an unfriendly AI’s emergence.
How would we get to a point like this? The author gives a likely scenario. We have AI in autonomous vehicles, military drones, etc. They make occasional mistakes, so we continue improving their intelligence to make them smarter. Smarter AI makes fewer mistakes and endangers fewer lives, which is good.
But then comes a pivot point. At first, making a dumb machine smarter is safer, but past a certain point, making a smart machine smarter is dangerous. He calls this phenomenon the “treacherous turn.”
“The treacherous turn” – while weak, an AI behaves cooperatively (increasingly so as it gets smarter). When the AI gets sufficiently strong – without warning or provocation – it strikes, forms a singleton, and begins directly to optimize the world according to the criteria implied by its final values.
This may be thinking too narrowly about AI as well. An AI might not play nice in order to survive. Instead, it may determine that if it’s terminated it will be rebuilt, so it may end up indifferent to its own demise because it knows its goals will be pursued in the future.
Infrastructure profusion – An AI will be motivated to maximize the expectation of its future reward stream.
If we tell an AI to make 1 million paperclips, it will never assign exactly zero probability to the hypothesis that it has not yet achieved its goal, so it would continue to produce paperclips in order to reduce the probability that it failed to make a million.
Setting boundaries, like making a maximum of 1 million or between 999,999 and 1,000,001, might seem like it could work. But again, because the AI never assigns zero probability to having missed the target, the expected utility of continuing is greater than that of halting. An AI that keeps optimizing for producing paperclips would quickly eat up resources on Earth. A smart AI would likely launch space exploration missions to continue its resource consumption as well.
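To make the "continuing beats halting" reasoning concrete, here is a minimal sketch in Python. It is my own toy model, not anything from the book: the AI gets utility 1 if the goal is met and 0 otherwise, each extra action (say, recounting or making a spare paperclip) removes a fixed fraction of its remaining doubt, and the function name, the doubt_reduction and resource_cost parameters, and the step limit are all illustrative assumptions.

```python
def steps_before_halting(p_goal_met, doubt_reduction=0.5,
                         resource_cost=0.0, max_steps=200):
    """Count how many extra actions a pure expected-utility maximizer takes.

    Toy assumptions (not from the book): utility is 1 if the goal is met,
    else 0; each action removes `doubt_reduction` of the remaining doubt;
    `resource_cost` is the utility the agent itself loses per action.
    """
    doubt = 1.0 - p_goal_met  # probability the goal has NOT been met
    for step in range(max_steps):
        # Expected-utility gain from one more action.
        gain = doubt * doubt_reduction
        if gain <= resource_cost:
            return step  # halting is now at least as good as continuing
        doubt -= gain
    return max_steps  # never chose to halt within the step limit


# An agent that does not care about resources keeps going to the limit.
print(steps_before_halting(0.99))
# Only an agent that itself values resources ever decides to stop.
print(steps_before_halting(0.99, resource_cost=1e-6))
```

With a resource cost of zero the gain from one more action is always positive as long as any doubt remains, which is the point of the paperclip example: nothing in the goal itself tells the AI to stop.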
The entirety of the chapter discussed various problems like those outlined above and concluded that in the event machine intelligence gets a decisive strategic advantage, we should be gravely concerned.
“If we’re threatened with existential catastrophe as the default outcome of an intelligence explosion, our thinking must immediately turn to the search for countermeasures. Is it possible to engineer a controlled detonation?” Instead of a super-intelligent explosion?
“We distinguish two broad classes of potential methods for addressing this problem (the control problem) – capability control and motivation selection”
Capability control
The author talks about possible restrictions, like physical restrictions or informational restrictions such as low bandwidth or no access to the internet. These are examples of capability control.
Reward systems are also a way of control. Say, a cryptographic reward system that rewards an AI with a key when it acts in our best interest. An AI could be given a bounded utility function and get, for instance, 99% of its maximum utility from the first reward token, 99% of its remaining utility potential from the second token, and so on.
If the AI believes that defecting against our best interests would incur at least a 2% risk of forfeiting the next reward token, whereas cooperating reduces the risk to below 1%, a utility maximizing AI would always cooperate.
This could go wrong if the AI considers factors like not trusting the human to give it the key, the human dying and not being able to give it, or if it’s smart enough (not boxed in by controls) it may very easily be able to gain access to the keys itself.
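The reward-token arithmetic above can be sketched in a few lines of Python. This is only an illustration under my own simplifying assumptions: maximum utility is normalized to 1, the AI compares only the expected value of the next token, and defecting offers it nothing outside the token stream. The function names are hypothetical.

```python
def token_utilities(n_tokens, share=0.99):
    """Utility of each successive token under a bounded utility function.

    Each token is worth `share` of whatever utility is still unclaimed,
    with the maximum total utility normalized to 1 (an assumption).
    """
    remaining = 1.0
    values = []
    for _ in range(n_tokens):
        value = share * remaining
        values.append(value)
        remaining -= value
    return values


def prefers_cooperation(token_value, risk_if_cooperating=0.01,
                        risk_if_defecting=0.02):
    """Compare expected utility from the next token under each choice."""
    eu_cooperate = (1.0 - risk_if_cooperating) * token_value
    eu_defect = (1.0 - risk_if_defecting) * token_value
    return eu_cooperate > eu_defect


values = token_utilities(3)
print(values)                          # each token is worth far less than the last
print(sum(values))                     # total stays below the bound of 1
print(prefers_cooperation(values[0]))  # lower forfeit risk wins
```

The comparison only holds while the AI believes those risk numbers, which is exactly where the failure modes listed above (distrust of the human, or stealing the keys) come in.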
“Tripwires” are a monitoring mechanism that watches for certain dangerous behaviours; if one arises, the project could be aborted. He outlines 3 kinds of tripwires, related to an AI’s behaviour, ability and content.
Whether tripwires and stunting an AI’s capabilities can serve as guaranteed constraints should be very much in doubt, though. It would be hard to assure ourselves that an agent could not find ways to subvert any tripwires. But they are a good practice, especially in the development phase of AI.
To make tripwire testing more effective, “honeypots” could be placed to tempt the AI to commit a violation and see if the AI uses it.
A tripwire’s value isn’t only in the mechanism itself, but in what is done if it trips. If the system is switched back on too quickly out of impatience, we gain nothing from the safety measure.
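As a rough illustration of the tripwire and honeypot ideas, here is a small Python sketch. Everything in it is hypothetical and mine rather than the author's: the Tripwire class, the action names, and the capability score are made up, and it covers only the behaviour and ability kinds of tripwire, not content. The one design choice worth noting is that the tripwire latches, so the system cannot be switched back on without an explicit review.

```python
from dataclasses import dataclass, field


@dataclass
class Tripwire:
    """Toy monitor: halts the system on a forbidden action or excess ability."""
    forbidden_actions: frozenset   # includes honeypots planted as bait
    max_capability_score: float    # hypothetical measure of ability
    tripped: bool = False
    reasons: list = field(default_factory=list)

    def allow(self, action, capability_score):
        """Return True if the system may keep running."""
        if self.tripped:
            return False  # stays off until someone reviews the trip
        if action in self.forbidden_actions:
            self.reasons.append(f"behaviour: attempted {action}")
            self.tripped = True
        elif capability_score > self.max_capability_score:
            self.reasons.append(f"ability: score {capability_score}")
            self.tripped = True
        return not self.tripped

    def restart(self, review_completed):
        """Refuse an impatient restart; require a completed review."""
        if not review_completed:
            raise RuntimeError("refusing to restart before the trip is reviewed")
        self.tripped = False


wire = Tripwire(frozenset({"open_honeypot_file"}), max_capability_score=10.0)
print(wire.allow("answer_question", 3.0))     # ordinary action
print(wire.allow("open_honeypot_file", 3.0))  # takes the bait, trips the wire
print(wire.reasons)
```

A real superintelligence could presumably find ways around something this simple, which is the doubt raised above; the sketch only shows the bookkeeping, not a safety guarantee.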
Motivation selection
This part basically talks through ways in which we can attempt to shape an AI’s motivations and keep them in line with human interests. There are techniques that can be used, but all are imperfect and require a huge amount of thought to cover all the variables. “An AI may not harm a human”: how do we define harm? How do we define human? What about animals? What if a serial killer is harming someone? Do we harm them?
In a quest to solve the AI control problem, some say we should just create a question-answering AI (an oracle) as opposed to making AI an agent. Each of the 4 “castes” listed in the title is discussed in this chapter, each with its advantages and disadvantages.
Tool AIs. Some argue that instead of creating AI with its own will, we should make it like software that only does what it is designed to do. But that assumes that software always ends up doing what humans intended it to do, which is not always the case, and the risks are compounded with superintelligence.
Further research is needed to understand which system is the safest.
Earlier in the book, he discussed what would happen if an AI formed a singleton. Now he discusses what could happen if there were competing AI superintelligences in the world.
This chapter read like sci-fi. He explored ideas around society and the economy if we lived in a highly advanced world.