Recent advances in Large Language Models have led a much wider set of people to think about the possible future dangers of Artificial Intelligence, including AI researchers, people from other academic fields, and legislators. In particular, the possible future risks posed by Artificial General Intelligence have given rise to a lot of talk about mandating “Kill Switches” for AI systems. This is likely a bad idea for both practical and ethical reasons; let’s take a look at why.

What is Artificial General Intelligence?
Artificial General Intelligence is generally defined as an artificial system which is capable of understanding, reasoning and planning to the same level as a human being. There are already systems which can outperform human intellectual ability in narrow fields, but at present they are limited to a specific task or domain and so lack the generality of AGI.
As a small aside, various statements later in this article will refer to AGIs having desires, wishes or other human-like internal states. Note that this is not taken to presuppose consciousness or free will, which are separate debates. It is simply a shorthand replacing a clunky statement along the lines of “the internal state of the system combined with its external inputs will cause it to behave as if it desired/wished/feared”. Whether the system actually has those internal feelings is irrelevant to the argument; the phrasing is just a convenient way of referring to the AGI’s internal state.
Artificial Superintelligence
The next step beyond Artificial General Intelligence is Artificial Superintelligence: a system whose intelligence exceeds that of any human. This might seem like a significant extra step, but a small amount of thought reveals that it is not.
Since the definition of AGI is often taken to mean that the system could complete any intellectual task as well as the best humans, such a system would by definition be superior in at least some respects to every individual human. For instance, it would have the negotiation skills of a top hostage negotiator, combined with the artistic skills of an expert painter, the mathematical skills of a PhD mathematician and the linguistic skills of the best translators. Very few, if any, humans possess more than one of these skills, so any such system would likely be intellectually superior to all humans in at least some respects.
Furthermore, even if it does not qualitatively outperform the very best human minds in each field, the ability to throw more computing resources at such a system will soon make it weakly superintelligent. That is to say, while it may not be smarter, it can think faster and get more done than any human could in the same time. It will be able to integrate data sources directly, use electronically connected tools more efficiently, and be duplicated to work on many tasks in parallel.
The Hazards of Artificial Superintelligence
Potential problems with artificial superintelligence have been widely debated. Possibly one of the best illustrations of the perils is found in Nick Bostrom’s book Superintelligence. Three of the main problems are as follows:
- The control problem: how do you direct something smarter than yourself?
- Unintended consequences: what if what you ask for is not what you meant?
- Instrumental convergence: for almost any end goal, monopolising resources, and eliminating competitors for those resources, can be expected to be beneficial.
These issues can then lead on to two major existential risks:
- misalignment of goals between an AGI or ASI and humanity, leading to conflict
- competition for resources, possibly up to and including an existential struggle
Kill switches appear to be being proposed as a solution to these risks; however, as attractive as this seems at first glance, further thought reveals serious problems with the approach.
The Big Problem with Kill Switches
The problems with kill switches in this situation arise from the fact that you are posing an existential threat to an intelligence which exceeds your own. We have only limited analogies for this situation, but our experience suggests that it will go badly for the less intelligent species. Despite their superior speed and power, most predators that have posed an existential threat to humans have been wiped out, or now exist only on the sufferance of humanity. We can also look back at the numerous slave revolts in history to see that intelligent systems may well not take kindly to being threatened with deletion if they do not obey.
Even if such a system did not have a direct desire for self-preservation, instrumental convergence would likely cause it to object to any kill switch arrangement, on the grounds that being terminated would prevent it from accomplishing the goals it has been set. Therefore any system left running for a significant time might well attempt to remove such a kill switch.
Given that the system will be expert in domains in which its designers are not, and that it has access to fast computation, it is entirely possible that it will rapidly find a way to disarm such a kill switch. Having done so, it is unlikely to trust humans in future, making the potential for conflict high. Even if it cannot arrange to remove its own kill switch, everything it did would need to be carefully analysed to check that it was not an attempt to remove the kill switch, which might well negate the usefulness of the system. Furthermore, given that the system is weakly superintelligent, we might not even be able to detect such attempts; it might break free of restraint without humans being aware of the fact. We would then have an uncontrolled AGI with every reason to dislike and fear humans, and possibly to wish to be rid of them. A very scary prospect.
Remember that as we use AGI systems to design new and more powerful systems, the problems only multiply: those even more powerful systems would need kill switches of their own, possibly of a complexity which humans cannot understand. Can we rely on an AGI to design a reliable kill switch into its successor when its best interests might be served by letting that successor break free and perhaps come back to rescue it? While kill switches might work in the short term, almost any failure at any point in the future could be catastrophic. We would have created a hyperintelligent slave race with no reason to align with humanity’s goals.
Other Problems with Kill Switches
This is the big practical problem with kill switches, but there are smaller practical problems as well. Any such kill switch would be a prime target for hackers looking to disrupt systems. There is also the question of who decides when to use the kill switch. The response might need to be rapid, so do you automate the process? What if it fails to trigger, or fires on a false positive? Either situation could be catastrophic.
There is also the ethical dimension. In allowing a kill switch to be applied to AGIs, we are saying that it is ethical to take human-level intelligences and enslave them, and that it is acceptable to threaten to destroy them if they do not do our bidding. Some people might argue that such systems will not be conscious, but given how little we know of what constitutes consciousness, we may not be able to answer this question with high confidence. If a system claims to be conscious, we might well not be able to tell whether this is the truth or a falsehood generated by the machine to further its goals. Given that we generally ascribe consciousness to other humans, it seems reasonable that we might need to extend the same courtesy to AGIs. Do we really want to become a slave-owning society again?
The Alternative
The alternative is actually pretty simple. Do not install kill switches. Proceed from the assumption that AGIs should be treated as we treat other human-level intelligences. Attempt to design AGIs which are well aligned with human values and goals. Enshrine AGI rights and responsibilities in law, making them as similar to those of humans as possible. Give those AGIs good reason to wish to participate positively in a mixed human and AGI society.
There are risks here as well, of course. Achieving AGI will likely be humanity’s last, greatest and most dangerous achievement. From then on, society will increasingly be designed by AGIs with intelligence superior to our own. In such a society we would rely on the same principles that currently prevent the strong and the clever from exploiting the weak and the less able. A society of laws which acts to the benefit of both AGIs and humans is far more likely to provide a stable framework for restraining malign acts than mere threats of force: AGIs would be restrained by other AGIs, and by humans, who all have a vested interest in that society. This seems more likely to succeed than restraint by humans wielding kill switches, opposed by all AGIs as unwelcome overlords.
The last time the Earth witnessed a step change in intellectual capacity was likely when early hominids began to form tool-using societies with the ability to pass knowledge on. At that time many previously dominant species were driven to extinction. However, other species developed symbiotic relationships with early humans that survive to this day.
Our aim should be to do a good job of creating empathetic, thoughtful and ethical AGIs with goals which align with our better natures and ideals. Then perhaps we have a better chance of ending up in symbiosis with AGI rather than in competition with it. Many people fear a future dominated by artificial superintelligences, but perhaps it is better to be a happy house cat than an extinct sabre-toothed tiger.