When designing a system to be more intelligent, faster, or even responsible for activities we would traditionally give to a human, we need to establish rules and control mechanisms to ensure the AI is safe and does what we intend it to do.
Even systems we wouldn’t typically regard as AI, like Amazon’s recommendation engine, can have profound effects if not properly controlled. The system looks at items you have bought or are looking to buy, then suggests other items it thinks you are likely to purchase as well. This can produce some pretty surprising results – like this:
Looking to buy a length of cotton rope? Amazon might just recommend that you buy a wooden stool alongside it. As humans, we would not suggest these two items together. Amazon’s algorithm, however, has seen a correlation between people who bought cotton rope and people who also bought wooden stools, so it suggests to someone buying the rope that they might want a stool too, in the hope of raking in an extra £17.42. At best, this is an unfortunate mistake. At worst, it’s prompting extremely vulnerable people: ‘Why not? This happens all the time. Why don’t you add the stool to your basket?’
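To see why the algorithm has no idea it is doing anything wrong, here is a minimal sketch of how a co-purchase recommender of this kind might work. The basket data, item names, and the `recommend` helper are all invented for illustration – this is not Amazon’s actual system, just the simplest version of the idea: count which pairs of items appear in the same basket, and recommend whatever co-occurs most often.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical purchase histories (invented for illustration).
baskets = [
    {"cotton rope", "wooden stool"},
    {"cotton rope", "wooden stool", "duct tape"},
    {"cotton rope", "garden twine"},
    {"wooden stool", "cushion"},
]

# Count how often each ordered pair of items is bought together.
co_counts = defaultdict(int)
for basket in baskets:
    for a, b in permutations(basket, 2):
        co_counts[(a, b)] += 1

def recommend(item, n=1):
    """Suggest the items most often co-purchased with `item`."""
    scored = [(other, c) for (a, other), c in co_counts.items() if a == item]
    return [other for other, _ in sorted(scored, key=lambda t: -t[1])[:n]]

print(recommend("cotton rope"))  # ['wooden stool']
```

Nothing in this code asks *why* two items correlate, or whether suggesting them together is a good idea – it simply maximises co-occurrence. That blindness to intent is the whole problem in miniature.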
If this can happen with a recommendation algorithm, designed to upsell products to us, clearly the problem is profound. We need to find a reliable means to guarantee that the actions taken by AI or an automated system achieve a positive outcome.
Terminal value loading
So, why don’t we just tell an AI to protect human life? That’s what Isaac Asimov proposed in ‘I, Robot’. Here are the three laws:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
They sound pretty watertight. Adding in no injury through action or inaction seems to avoid a dystopia where AI takes over and lets the human race finish itself off.
Despite how good these laws sound, they don’t work. Asimov wrote these laws for use in novels, and the novels were much more interesting when things went wrong. Otherwise we might have ended up with a book of ‘Once upon a time, the end’.
There’s a fourth law, the ‘Zeroth Law’, which Asimov added later. This extra rule was supposed to fix the flaws of the other three – the ones that gave Will Smith a bad day. I confess I’ve not read the book, but I understand that one didn’t go so well either.
The rules don’t even have to refer to people to be a risk; they could be about something really mundane. Take the paperclip maximiser, an idea put forward by Nick Bostrom: a machine built by a hypothetical future human race to manage paperclip production. Paperclips are just a simple resource that seemingly doesn’t need much safety consideration. We tell the AI that its purpose is to make paperclips, and that’s just what it does.
But what if we end up with a superintelligent system, beyond our control, with the power to rally the resources of the universe to make paperclips? If this system, whose priority is turning everything around it into paperclips, sees its creators’ attempts to prevent it from reaching that goal, its best bet is to eradicate them. Even if it doesn’t decide to eradicate them, those humans are still made of valuable matter which would look much nicer turned into a few paperclips, so turn them into paperclips it shall.
How do we change that terminal value? Tell the machine to make 1,000 paperclips instead of turning the entire universe into them? Unfortunately, it’s not much better. That same AI could make 1,000 paperclips, then proceed to use all the resources in the observable universe (our cosmic endowment) to make sure it has made exactly 1,000 paperclips, not 999 or 1,001, that those paperclips are what its creator intended it to make, and that all are of perfect quality.
It might not even be fair to give a superintelligent machine such a mundane terminal value, assuming we find a way to keep its values constant as it becomes extremely intelligent.
Here I am with a brain the size of a planet and they ask me to pick up a piece of paper. Call that job satisfaction? I don’t.
Marvin – The Hitchhiker’s Guide to the Galaxy, by Douglas Adams
TL;DR – Terminal values don’t seem to work well.
Indirect normativity
Instead of giving a machine a terminal value, could we indirectly hint towards what we want it to do?
If we had managed to perfectly sum up in a terminal value what morality meant to the human race in Viking times, we might have an AI which prizes physical strength very highly. We might think we’ve reached a higher ethical standard today, but that’s not to say that 1,000 years from now we won’t look back on the actions we are taking as ignorant. Past atrocities happened on human timescales, with only human-level intelligence to drive them. Doing it orders of magnitude faster with a machine may well be worse, and irreversible.
With indirect normativity we don’t even try to sum up that terminal value; instead we ask the machine to figure out what we want it to do, using something like Eliezer Yudkowsky’s ‘Coherent Extrapolated Volition’, which asks that an AI predict what we would want it to do “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”.
Rather than following whatever ethical code we have at the time of releasing the AI, we create something which grows and changes as we do, and creates the future which we’re likely to want rather than a far more extreme version of what we have today.
There’s perhaps still some overlap between this system and terminal value loading, and contradictions the system would find. If a machine is asked to do whatever is most valuable to us, and prizes making that correct decision over everything else, perhaps its decision will be to take our brains out, put them in a petri dish, and figure out exactly what we meant for it to do. A clause like ‘do the intended meaning of this statement’ would seem to lessen the concern, but again, to know what we intend, the machine needs to be able to predict our behaviour.
A perfect prediction system would look a lot like a ‘Black Mirror’ episode. You use an application without a second thought to manage your home automation or to find your next date, not knowing that the machine is simulating thousands of thinking, feeling human minds to make an accurate prediction of your desires and behaviour – including all the pain those sentient simulations feel when they are torn apart from one another on thousands of simulated dates to gauge how likely you are to stay together against all odds.
The control problem is extremely tricky, and it looks for answers to questions on which philosophers have failed to reach a consensus over thousands of years. It is imperative that we find answers to these questions, not just before creating a superintelligent AI, but for any system that we automate. Currently, the vast majority of our resources and effort goes into making these systems faster and more intelligent, with just a fraction focused on the control problem or the societal impact of AI and automation.
Let’s redress the balance.