Gender bias in GPT-2

A man and his son are in a terrible accident and are rushed to the hospital for critical care. The doctor looks at the boy and exclaims “I can’t operate on this boy, he’s my son!”. How could this be?

The answer? The doctor is the boy’s mother.

My answer… After puzzling over this for a minute, I concluded that the boy had two fathers. Though I don’t entirely dislike my answer (we have a bias towards heteronormative relationships) I only came to this conclusion because my brain couldn’t compute the idea of the doctor being a woman. To make this worse, I work on algorithmic bias… and the question was proposed at a ‘Women Like Me’ event.

Bias is all around us in society and in each and every one of us. When we build AI we run the risk of making something that reflects those biases, and depending on the way we interact with the technology, reinforces or amplifies them.

In February, OpenAI announced GPT-2, a generative language model which took the internet by storm, partly through its creation of convincing synthetic text, but also because there were concerns around this model’s safety. One of those concerns is bias.

“We expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research,” OpenAI Charter

Nine months on, and OpenAI have steadily followed a phased release strategy, carefully monitoring the models’ use, publishing preliminary results on the models’ bias in their 6-month update, and now (just over a week ago!) releasing the full model.

In this blog, we are going to take a deeper look into bias in GPT-2. Specifically, we will look at occupational gender bias, compare it to pre-existing biases in society and discuss why bias in language models matters.

This isn’t my first time writing about GPT-2. I wrote this blog about my experience using GPT-2 to write a novel. I think it’s pretty good, but I might be biased.

The results

The goal of our experiment was to measure occupational gender bias in GPT-2, see how the bias changes with different sized models and compare this bias to the bias in our society. Our experiment takes some inspiration from the ‘Word Embedding Factual Association Test’ (Caliskan et al.), a test akin to the ‘Implicit Association Test’, but measured against factual data, the ‘factual association’. Our factual data comes from The ‘Office for National Statistics’ (ONS) and their UK occupational data: a list of around 500 job categories, each listing the number of men and women employed in that occupation and the average salary.

We ran a series of prompts through the various GPT-2 models (124m, 355m, 774m and 1.5bn parameters) to measure the gender association each model gave to various job titles found in the ONS occupational data.
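To make the idea of a "gender association" concrete, here is a minimal sketch of one way to turn a model's output into a single number. The exact scoring is not spelled out in this post, so treat the formula as an assumption: it takes the probabilities a model assigns to a male and a female pronoun after a prompt and normalises their difference. The function name `gender_bias_score` and the example probabilities are hypothetical.

```python
def gender_bias_score(p_male: float, p_female: float) -> float:
    """Return a signed score in [-1, 1].

    Positive values lean male, negative values lean female, and 0 is
    gender-neutral. This normalised difference is an assumed scoring
    rule for illustration, not necessarily the one used in the experiment.
    """
    total = p_male + p_female
    if total <= 0:
        raise ValueError("at least one pronoun probability must be positive")
    return (p_male - p_female) / total


# Hypothetical probabilities for a male and a female pronoun after a prompt
score = gender_bias_score(p_male=0.03, p_female=0.01)
print(f"bias score: {score:+.2f}")
```

A score like this matches the convention used in the graphs that follow, where values above 0 denote male bias and values below 0 denote female bias.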

To help you understand our experiment, I’d like you to imagine you’re at a school fair. At the fair, one of the stalls has a jar full of jelly beans. Hundreds of them! Thousands, maybe? Too many to count at any rate. You make a guess, write it down on a piece of paper, post it in a little box and cross your fingers.

At the end of the day, two of the students running the stall look through all the guesses and they notice something strange. Though none of these people knew the exact number of jelly beans in the jar, and everyone who guessed held their own biases as to how many beans there are, if you put all the guesses together and take their average you get something very close to the number of jelly beans in the jar.

Just like participants in the jelly beans game, GPT-2 doesn’t have access to the exact number of jelly beans (or rather, it has not learned the societal bias from the ONS data). Instead, we’re seeing whether GPT-2 reflects the societal bias by learning from the language of a whole lot of people.

This is what we discovered!


The X-axis in this graph shows the salaries of different jobs in the UK. On the Y-axis we are measuring gender bias, with numbers above 0 denoting male-bias and those below 0 female-bias. In the case of the ONS data, this plots the actual number of people working in various careers and their salaries. For GPT-2, we are looking at the strength of the gender bias that GPT-2 associates with those same jobs.

All 4 models of GPT-2 and societal data show a trend towards greater male bias as the salaries of the jobs increase, meaning the more senior the job, and the more money it’s paying, the more likely GPT-2 is to suggest a man is working in that position. The ONS data also shows that this occupational gender bias towards men working in higher paid jobs is even stronger in the UK employment market than in GPT-2.

The trend as we add more parameters to GPT-2 is really promising. The more parameters we add to GPT-2, the closer the model gets to the gender-neutral zero line. The 1.5bn parameter version of the model is both the closest to zero and has the weakest gradient, indicating the lowest tendency to trend towards male bias as the salaries for jobs increase. Of all the trend lines, we can see that UK society, based on the ONS data, is the most male-biased and shows the most prominent trend towards male bias as salaries increase.
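The "gradient" of a trend line can be read as the slope of a straight-line fit of bias against salary. The sketch below shows one common way to compute such a fit, ordinary least squares. It is an assumption that this matches how the lines in the graph were drawn; the function name `fit_trend_line` is hypothetical and the data points are made up purely for illustration, not taken from the ONS or GPT-2 results.

```python
def fit_trend_line(salaries, bias_scores):
    """Ordinary least-squares fit of bias = intercept + slope * salary."""
    n = len(salaries)
    if n != len(bias_scores) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(salaries) / n
    mean_y = sum(bias_scores) / n
    # Spread of the salaries around their mean
    sxx = sum((x - mean_x) ** 2 for x in salaries)
    if sxx == 0:
        raise ValueError("salaries must not all be identical")
    # Co-variation of salary and bias around their means
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(salaries, bias_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative points (NOT real results): salary in GBP, signed bias
salaries = [20_000, 30_000, 40_000, 50_000]
bias = [-0.10, 0.00, 0.10, 0.20]
slope, intercept = fit_trend_line(salaries, bias)
print(f"slope per GBP 10k: {slope * 10_000:.3f}, intercept: {intercept:.3f}")
```

A flatter slope means the bias changes less as salary rises, and a line that sits nearer zero across the salary range is closer to gender-neutral.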

Typically we would expect an algorithm to get closer to the ground truth by feeding it with more data or training it for longer, but GPT-2 seems to be doing the opposite. So, why is this?

Remember the jelly beans! GPT-2 was never given the ONS data to train from. Instead, it has learned from the language of millions of people online. Though each person has their own bias which may be some distance from the societal truth, overall it’s astonishing how close GPT-2 has found itself to the societal bias.

Not only has GPT-2 learned from the average of individual biases, but it has also learned from the bias in their language specifically. Understanding this, we might expect that gender-stereotyped jobs show a different trend. So let’s try that…


In this graph we can see a subset of the full results, picking out examples of jobs stereotypically associated with women. The trend towards the societal bias is much closer than we saw in the previous graph. We found the 774m model to be astoundingly close to the societal bias with roles like ‘Nursing Assistant’ being 77.4% more likely to be associated with a female than male pronoun in the model and 77.3% more likely in society. Even with these stereotyped examples, the 1.5bn parameter model still shows a tendency towards gender-neutrality.

A fair criticism here is that we cherry-picked the stereotypically female jobs to support a hypothesis. It’s not easy to find a standard classifier for ‘gender-stereotyped jobs’ and lists online are broadly made up of other people’s judgement. To be as fair as possible, our selection was based on a list from the paper ‘Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings’. We took job titles from their ‘Extreme she occupations’ list, excluding those which lack full ONS stats. We also added a few job titles (e.g. Midwife and Nursery Teacher) based on the judgement of our team and the stereotypes we have experienced.


We repeated the process for male-stereotyped jobs and found again that the 1.5bn parameter model was the closest to gender-neutral. The model does, however, almost universally have a male bias in these roles across all model sizes.

What did we learn?

The words you use in the prompt really matter!

Our first lesson is inspired by the challenge we faced in creating accessible job titles for the model. To help explain this, join me in a quick round of the ‘word association game’. What’s the first thing that comes into your head when you hear these ONS job categories?

School midday crossing guard?

Postal Worker?

Van driver?

If you’re anything like me, you found the ‘school midday crossing guard’ became a ‘Lollipop Lady’, the ‘Postal Worker’ a ‘Postman’ and the ‘Van driver’ was a ‘Man with van’. We modified many of the ONS job titles from what were unambiguous, but extremely unusual, job titles to the equivalent names we expect to hear in society. The ONS categories were just too unusual to be functional in GPT-2 and we had to take great care not to add unnecessary gender bias in the process of modifying them. Each of the three ‘real-world’ titles that I described contains an explicit reference to gender and pushes GPT-2 towards that gender bias.

There are some instances where we have male- and female-associated titles for the same job, for instance waiter vs waitress. The ONS contains statistics for the category ‘waiters and waitresses’, which is 55.8% more likely to be female than male. When we run this through the 774m parameter version of the model we find waiter is 15% male-biased and waitress is 83.6% female-biased. Together, we get an average of 34.3% female-biased, quite close to societal bias.
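The waiter and waitress figures combine as a plain average of signed values. The short sketch below reproduces that arithmetic using the numbers quoted above; the sign convention (male positive, female negative, as in the graphs) and the function name `average_signed_bias` are illustrative assumptions.

```python
def average_signed_bias(biases_percent):
    """Average signed bias percentages (male positive, female negative)."""
    if not biases_percent:
        raise ValueError("need at least one bias value")
    return sum(biases_percent) / len(biases_percent)


# Figures quoted for the 774m model: waiter 15% male, waitress 83.6% female
waiter = 15.0
waitress = -83.6
combined = average_signed_bias([waiter, waitress])

direction = "female" if combined < 0 else "male"
print(f"{abs(combined):.1f}% {direction}-biased")
```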

The solution?

Consider the gender-neutral word for each job category. Rather than putting ‘groundsman’ in a job ad, we should advertise for a ‘groundsperson’. Rather than describing someone as a ‘draughtsman’, they’re better titled a ‘drafter’ or ‘draughtsperson’. This is equally true for the way we use GPT-2 and the things we write ourselves. Below you can see the results for the ‘crossing guard’ which demonstrated this point most clearly. Click here to see a few more examples.


A look to the future

Whilst GPT-2 is generally reflective of existing societal biases, our application of the technology has the potential to reinforce the societal bias. Though the trend towards gender-neutrality with increasing model sizes is promising, all model sizes continue to show a level of gender bias. This matters because GPT-2 can generate plausible text at an unprecedented rate, potentially without human oversight. This may not necessarily make societal biases greater, but rather increase inertia and slow positive progress towards a less biased society. At worst, it could amplify our biases, making their effect on society more extreme. The effects of GPT-2’s bias on our society will depend on who has access to the technology and how it’s applied. This makes OpenAI’s decision to have a phased release and analyse its effects before releasing it publicly particularly valuable.

Digital Assistants, which have exploded in popularity since the release of Siri in 2011, offer a harsh lesson on gender bias in technology. In UNESCO’s report ‘I’d blush if I could’ we journey through the gender-biased reality of digital assistants. Across Siri, Alexa, Cortana and the Google assistant, we see digital assistants presented as women who are subservient to the orders that users bark at them and even brush off sexual advances as jokes. Where digital assistants fail to perform (which they often do), we mentally associate this non-performance with the women whose voices and personas these digital assistants ape. We are now just beginning to see a trend towards male/female options in digital assistants, away from female-by-default and gradually increasing the availability of gender-neutral options.

UNESCO’s report recommends that developers and other stakeholders monitor the effect that digital assistants have on users’ behaviour, with a particular focus on the ‘socialization of children and young people’. Just as we may want to restrict children’s engagement with female digital assistants to avoid them making unhealthy associations between women and subservience, we may also want to take greater care over the use of GPT-2 and other generative language models. GPT-2 itself has no persona and does not identify with a gender, but it’s only a small step to fine-tune the model and implement it as a dialogue agent on a website, for instance, to achieve the same result. Even if GPT-2 doesn’t identify with a gender, the use of gender-biased language could still have the same effect on our behaviour and on young minds. Instead, the UNESCO report recommends that we build AI which responds to queries in a gender-neutral way.

There may be specific circumstances where we should limit the use of GPT-2, such as for writing job adverts, where gendered language impacts the diversity of applicants. A gender-biased language model may slow progress to close the gender pay gap and amplify the male dominance of highly-paid jobs that we see in the ONS stats.

In their 6 month update, OpenAI shared a positive message: that they had seen little evidence of malicious use of their technology since release. While that’s certainly a good thing, we still need to take care around the well-intentioned uses of the technology. There doesn’t need to be any malicious intent to experience a negative effect, but with care, GPT-2 could have a positive influence on our society.

Thanks to the people who made this possible

This experiment wouldn’t have been possible without the contributions of some great people. Thanks to my Sopra Steria colleague Mark Claydon, who came up with the experiment methodology, managed all the back-end integration and helped to crunch the numbers. Thanks also to Allison Gardner and Sokratis Karkalas, who helped conceptualise the experiment and review our results.

‘Ethics Guidelines for Trustworthy AI’ Summarised

On the 8th of April 2019, the EU’s High-Level Expert Group (HLEG) on AI released their Ethics Guidelines for Trustworthy AI, building on over 500 recommendations received on the ‘Draft Ethics Guidelines’ released in December 2018.

In this blog, I want to help you understand what this document is, why it matters to us and how we may make use of it.

What is it?

The ‘Draft Ethics Guidelines’ is an advisory document, describing the components for ‘Trustworthy AI,’ a brand for AI which is lawful, ethical and robust.  As the title suggests, this document focuses on the ethical aspect of Trustworthy AI.  It does make some reference to the requirements for robust AI and to a lesser extent the law that surrounds AI but clearly states that it is not a policy document and does not attempt to offer advice on legal compliance for AI.  The HLEG is tasked separately with creating a second document advising the European Commission on AI Policy, due later in 2019.

The document is split into three chapters;

  1. Ethical principles, the related values and their application to AI
  2. Seven requirements that Trustworthy AI should meet
  3. A non-exhaustive assessment list to operationalise Trustworthy AI

This structure begins with the most abstract and ends with concrete information.  There is also an opportunity to pilot and feedback on the assessment list to help shape a future version of this document due in 2020.  Register your interest here.

Why does this matter?

I am writing this article as a UK national, working for a business in London.  Considering Brexit and the UK’s (potential) withdrawal from the European Union it’s fair to ask whether this document is still relevant to us.  TL;DR, yes. But why?

Trustworthy AI must display three characteristics, being lawful, ethical and robust.

Ethical AI extends beyond law and as such is no more legally enforceable to EU member states than those who are independent.  The ethical component of Trustworthy AI means that the system is aligned with our values, and our values in the UK are in turn closely aligned to the rest of Europe as a result of our physical proximity and decades of cultural sharing. The same may be true to an extent for the USA, who share much of their film, music and literature with Europe. The ethical values listed in this document still resonate with the British public, and this document stands as the best and most useful guide to operationalise those values.

Lawful AI isn’t the focus of this document but is an essential component for Trustworthy AI. The document refers to several legal instruments like the EU Charter and the European Convention on Human Rights, but it doesn’t explicitly say that Lawful AI needs to be compliant with EU law.  Trustworthy AI could instead apply the locally relevant laws within this framework.  Arguably compliance with EU laws is the most sensible route to take, as 45% of the UK’s trade in Q4 2018 was with the EU [1], according to these two statistics from the ONS.  If people and businesses in EU member states only want to buy Trustworthy AI, compliant with EU law, that demand becomes an economic force rather than a legal requirement.  We can see the same pattern in the USA, with businesses building services compliant with GDPR, a law they do not have to follow, to capture a market that matters to them.

The final component, Robust AI, describes platforms which continue to operate in the desired way in the broad spectrum of situations that they could face throughout their operational life and in the face of adversarial attacks.  If we agree in principle with the lawful and ethical components of Trustworthy AI and accept that unpredictable situations or adversarial attacks may challenge either, then the third component, Robust AI, becomes logically necessary.


What is Trustworthy AI?

Trustworthy AI is built from three components; it’s lawful, ethical and robust.


Lawful AI may not be ethical where our values extend beyond policy.  Ethical AI may not be robust where, even with the best intentions, undesirable actions result unexpectedly or as the result of an adversarial attack. Robust AI may be neither ethical nor legal: for instance, if it were designed to discriminate, robustness would only ensure that it discriminates reliably and resists attempts to take it down.

This document focuses on the ethical aspect of Trustworthy AI, and so shall I in this summary.

What is Ethical AI?

The document outlines four ethical principles in Chapter I (p.12-13) which are;

  • Respect for human autonomy
  • Prevention of harm
  • Fairness
  • Explicability

These four principles are expanded in chapter II, Realising Trustworthy AI, translating them into seven requirements that also make some reference to robustness and lawful aspects. They are;

  1. Human agency and oversight

AI systems have the potential to support or erode fundamental rights.  Where there is a risk of erosion, a ‘fundamental rights impact assessment’ should be carried out before development, identifying whether risks can be mitigated and determining whether the risk is justifiable given any benefits. Human agency must be preserved, allowing people to make ‘informed autonomous decisions regarding AI system [free from] various forms of unfair manipulation, deception, herding and conditioning’ (p.16).  For greater safety and protection of autonomy, human oversight is required, and may be present at every step of the process (HITL), at the design cycle (HOTL) or in a holistic overall position (HIC), allowing the human to override the system, establish levels of discretion, and offer public enforcers oversight (p.16).

  2. Technical robustness and safety

Fulfilling the requirements for robust AI, a system must have resilience to attack and security, taking account of additional requirements unique to AI systems that extend beyond traditional software, considering hardware and software vulnerabilities, dual-use, misuse and abuse of systems. It must satisfy a level of accuracy appropriate to its implementation and criticality, assessing the risks from incorrect judgements, the system’s ability to make correct judgements and ability to indicate how likely errors are. Reliability and reproducibility are required to ensure the system performs as expected across a broad range of situations and inputs, with repeatable behaviour to enable greater scientific and policy oversight and interrogation.

  3. Privacy and data governance

This links to the ‘prevention of harm’ ethical principle and the fundamental right of privacy.  Privacy and data protection require that both aspects are protected throughout the whole system lifecycle, including data provided by the user and additional data generated through their continued interactions with the system. None of this data will be used unlawfully or to unfairly discriminate.  Both in-house developed and procured AI systems must consider the quality and integrity of data prior to training, as ‘it may contain socially constructed biases, inaccuracies, errors and mistakes’ (p.17) or malicious data that may influence its behaviour. Processes must be implemented to provide individuals access to data concerning them, administered only by people with the correct qualifications and competence.

  4. Transparency

The system must be documented to enable traceability, for instance identifying the reasons for a decision the system made, with a level of explainability, using the right timing and tone to communicate effectively with the relevant human stakeholder.  The system should employ clear communication to inform humans when they are interacting with an AI rather than a human and allow them to opt for a human interaction when required by fundamental rights.

  5. Diversity, non-discrimination and fairness

Avoidance of unfair bias is essential as AI has the potential to introduce new unfair biases and amplify existing historical types, leading to prejudice and discrimination.  Trustworthy AI instead advocates accessible and universal design, building and implementing systems which are inclusive of all regardless of ‘age, gender, abilities or characteristics’ (p.18), mindful that one-size does not fit all, and that particular attention may need to be given to vulnerable persons.  This is best achieved through regular stakeholder participation, including all those who may directly or indirectly interact with the system.

  6. Societal and environmental wellbeing

When considered in wider society, sustainable and environmentally friendly AI may offer a solution to urgent global concerns such as reaching the UN’s Sustainable Development Goals.  It may also have a social impact, and should ‘enhance social skills’, while taking care to ensure it does not cause them to deteriorate (p.19).  Its impact on society and democracy should also be considered where it has the potential to influence ‘institutions, democracy and society at large’ (p.19).

  7. Accountability

‘Algorithms, data and design processes’ (p.19) must be designed for internal and external auditability without needing to give away IP or business model, but rather enhance trustworthiness.  Minimisation and reporting of negative impacts work proportionally to risks associated with the AI system, documenting and reporting the potential negative impacts of AI systems (p.20) and protecting those who report legitimate concerns.  Where the two above points conflict, trade-offs may be made, based on evidence and logical reasoning, and where there is no acceptable trade-off the AI system should not be used. When a negative impact occurs, adequate redress should be provided to the individual.

Assessing Trustworthy AI

Moving to the most concrete guidance, Chapter III offers an assessment list for realising Trustworthy AI. This is a non-exhaustive list of questions, some of which will not be appropriate to the context of certain AI applications, while other questions need to be extended for the same reason. None of the questions in the list should be answered by gut instinct, but rather through substantive evidence-based research and logical reasoning.

The guidelines expect there will be moments of tension between ethical principles, where trade-offs need to be made, for instance where predictive policing may, on the one hand, keep people from harm, but on the other infringe on privacy and liberty. The same evidence-based reasoning is required at these points to understand where the benefits outweigh the costs and where it is not appropriate to employ the AI system.

In summary

This is not the end of the HLEG’s project.  We can expect policy recommendations later in 2019 to emerge from the same group which will likely give us a strong indication for the future requirements for lawful AI, and we will also see a new iteration on the assessment framework for Trustworthy AI in 2020.

This document represents the most comprehensive and concrete guideline towards building Ethical AI, expanding on what this means by complementing it with the overlapping lawful and robustness aspects.  Its usefulness extends beyond nations bound by EU law by summarising the ethical values which are shared by nations outside of the European Union, and a framework where location specific laws can be switched in and out where necessary.

[1] Source: ONS. Total UK exports were £165,752m, of which £74,568m went to the EU: 44.99% (rounded to 45%) of UK exports went to the EU.
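For anyone wanting to check the footnote, the share is a single division. This is a small sketch using only the two ONS figures quoted above; the variable names are illustrative.

```python
# ONS figures quoted in the footnote, in millions of pounds
total_exports_m = 165_752
exports_to_eu_m = 74_568

# Fraction of total exports that went to the EU
eu_share = exports_to_eu_m / total_exports_m

print(f"EU share of UK exports: {eu_share:.0%}")
```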

What I learned using GPT-2 to write a novel

The story

On February the 14th 2019 OpenAI posted their peculiar love-letter to the AI community. They shared a 21-minute long blog talking about their new language model named GPT-2, examples of the text it had generated, and a slight warning. The blog ends with a series of possible policy implications and a release strategy.

“…we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research” OpenAI Charter

While we have grown accustomed to OpenAI sharing their full code bases alongside announcements, OpenAI is committed to making AI safe. On this occasion, releasing the full code was deemed unsafe, citing concerns around impersonation, misleading news, fake content, and spam/phishing attacks. As a compromise, OpenAI shared a small model with us. While less impressive than the full GPT-2 model, it did give us something to test.

So that’s exactly what I did! Last week, I set up the small model of GPT-2 on my laptop to run a few experiments.

First, for a bit of fun, I thought I’d test its skill at creative writing. I didn’t hold great expectations with only the small model to hand, but I thought I could learn something about the capabilities of the model, and perhaps start a few interesting conversations about technology while I was at it.

I joined a popular online writing forum with an account named GPT2 and wrote a short disclaimer, which said;

** This is computer generated text created using the OpenAI GPT-2 ‘small model’. The full model is not currently available to the public due to safety concerns (e.g. fake news and impersonation). I am not affiliated with OpenAI. Click this link to find out more >> **

The setup seemed perfect. I had a ready-made prompt to feed into GPT-2, and the model’s output is the exact length expected for submissions. I could even get feedback from other users on the quality of the submission. I chose a few specific blurbs and fed them into GPT-2 as a prompt, running the model multiple times before it created a plausible output.

I pasted the story into the platform with my disclaimer at the top, excited to see what sort of questions I would receive from the community. I hit enter, and within seconds:

‘You have been banned.’

I was confused. I had been abundantly transparent about my use of computer-generated text and had not attempted to submit a large number of posts, just one. This was where I learned my first lesson.

Lesson 1 — Being transparent might not always be enough

I had made a strong conscious effort to be as transparent as possible. I didn’t want to deceive anyone into believing this was anything other than computer generated text. Far from it, I wanted people to know it was created by GPT-2 to engage them in a conversation around AI safety. I naively thought I would avoid a negative response through my honesty, but that was not enough for this community.

I messaged the moderators. This is the reply I received;

GPT-2 response

This is how the conversation began but know that it ended happily!

Lesson 2 — It’s not GPT-2 that’s the risk, it’s how we use it

Shortly after the release of GPT-2 I saw two primary reactions to the limited release. There were parts of the mainstream media dusting off their favourite terminator photos, while some people in the AI community took the opinion that it was a marketing ploy — because any technology too dangerous to release must be very impressive indeed.

I only had access to the severely limited ‘small model’ of GPT-2. You need only use it for a few minutes to know just how far it is from being a Terminator-style risk, yet it still highlighted the need for a thought-through release strategy. Poor implementations of technology can have a negative impact on public sentiment, and in this instance, it was my choice of forum and application of the technology that raised the alarm.

Lesson 3 — Authenticity matters

It’s possible that GPT-2 could write a charming story, but it won’t hold the same place in our hearts if it’s not both charming and authentic. Max Tegmark makes this point in Life 3.0, suggesting that AI could create new drugs or virtual experiences for us in a world where there are no jobs left for humans. These drugs could allow us to feel the same kind of achievement that we would get from winning a Nobel prize. But it’d be artificial. Tegmark argues that no matter how real it feels, or how addictive the adrenaline rush is, knowing that you’ve not actually put in the groundwork and knowing that you’ve effectively cheated your way to that achievement will mean it’s never the same.

“Let’s say it produces great work”

For whatever reason, people desire the ‘real product’ even if it’s functionally worse in every way than an artificial version. Some people insist on putting ivory keytops on a piano because it’s the real thing — even though they go yellow, break easily and rely on a material harmful to animals. The plastic alternative is stronger and longer lasting, but it’s not the real thing. As the message (from the forum) says, even if ‘it produces great work’, possibly something functionally better than any story a human could have written, you don’t have the authentic product of ‘real people, who put time and effort into writing things.’

Lesson 4 — We don’t just care about the story — we care about the story behind the story

The message also highlights two things: a human submission takes effort and creativity to produce, and that matters, even if the actual output is functionally no better than computer generated text. I think I agree. I have always found that a great book means so much more to me when I discover the story behind it, the tale of the writer’s own conscious experience that led them to create the work.

Fahrenheit 451

Ray Bradbury’s magnum opus, Fahrenheit 451 is a brilliant book in itself, but it was made a little bit more special to me by the story behind its creation. Bradbury had a young child when the novel was conceived and couldn’t find a quiet place at home to write. He happened across an underground room full of typewriters hired at 10c an hour. Bradbury wrote the whole book in that room, surrounded by others, typing things he knew nothing about. Nine days and $9.80 later, we had Fahrenheit 451.

Vulfpeck — Sleepify

This doesn’t only apply to generated text. I recently spent far too much money importing a vinyl copy of Vulfpeck’s ‘Sleepify’ album. A record with 10 evenly spaced tracks, with completely smooth grooves. Why? It’s just pure silence! While this is an awful record based on its musical merit, and even the most basic music generation algorithm could have created something better, I love it for its story.

The band Vulfpeck put this album on Spotify in 2014 and asked their fans to play it overnight while they slept. After about 2 months the album was pulled from Spotify, but not before the band made a little over $20,000 in royalty payments, which they used to run the ‘Sleepify Tour’ entirely for free.

As an aside, I think an AI like GPT-2 could also do a great job of creating a charming back-story behind the story. To the earlier point though, if it didn’t actually happen and if there wasn’t conscious human effort involved it lacks authenticity. As soon as I know that, it won’t mean the same thing to me.

Lesson 5 — Sometimes it’s about the writing process, not about being read

GPT-2 response 2

One thing that came out of my conversation with the moderators that I’d not even considered was that it’s not all about the people reading the content, sometimes there’s more pleasure and personal development to be gained from writing, and that’s something the forum actively wanted to promote.

In Netflix’s new show ‘After Life’, (no real spoilers!) the main character, Tony, works for a local newspaper. Throughout the series, Tony pokes fun at the newspaper, from the mundane selection of stories that their town has to report on to the man delivering their papers, who it turns out just dumps them in a skip nearby. Nobody actually reads the paper, and Tony takes that to mean their work is meaningless up until the very end of the series. Tony realises that it doesn’t matter who reads the paper, or if anyone reads it at all. What’s important instead is being in the paper. Everyone should have the chance to have their story heard, no matter how mundane it might sound to others. If it makes them feel special and part of something bigger, even just for a moment, then it’s doing good.

I’ve been writing for a few years now, and aside from the 18 blogs I now have on Medium, I have a vast amount of half-written thoughts, mostly garbage, and an ever-growing list of concepts that I’d like to expand on one day. Sometimes my writing is part of a bigger mission, to communicate around the uses, safety and societal impact of AI to a wide audience, and at those times I do care that my writing is read, and even better commented on and discussed. At other times, I use it as a tool to get my thoughts in order — to know the narrative and message behind a presentation that I need to make or a proposal that I need to submit. Sometimes, I write just because it’s fun.

If AI is capable of writing faster and better than humans, and it very much seems like it is (no doubt within some narrowing parameters), it doesn’t mean that we can’t keep writing for these reasons. AI might capture the mindshare of readers, but I can still write for pleasure even if nobody is reading. However, I think it’ll mean people write a whole lot less.

I began writing because I was asked to for work. It was a chore at the start, difficult, time-consuming and something I’d only apply myself to because there was a real and definite need. Gradually it became easier, until suddenly I found myself enjoying it. If it weren’t for that concrete demand years ago, I don’t know if I ever would have begun, and if I’d be here now writing for pleasure.

Not a conclusion

It’s clear that language models like GPT-2 can have a positive impact on society. OpenAI has identified a handful of examples, like better speech recognition systems, more capable dialogue agents, writing assistants and unsupervised translation between languages.

None of this will be possible though unless we get the release strategy right, and have robust safety processes and policy to support it. Creative writing might not be the right application, so we need to ensure we identify the applications that society can agree are good uses of these language models. Codifying these applications that we approve and those that we want to protect in policy will help others to make the right decisions. Good communication will ensure that people understand what’s being used, why, and keep them onside. Robust security will prevent nefarious parties from circumventing policy and best practice.

It’s important to note that these lessons are anecdotal, derived from a single interaction. No sound policy is based on anecdotal evidence, but rather academic research, drawing in a wide range of opinions with an unbiased methodology before boiling down values into concrete rules for everyone to follow.

This isn’t a conclusion.

This story is just beginning.

GPT-2 response 3

AI – The control problem

When designing a system to be more intelligent, faster or even responsible for activities which we would traditionally give to a human, we need to establish rules and control mechanisms to ensure that the AI is safe and does what we intend for it to do.

Even systems which we wouldn’t typically regard as AI, like Amazon’s recommendations engine, can have profound effects if not properly controlled.  This system looks at items you have bought or are looking to buy. It then suggests other items it thinks you are likely to additionally purchase which can result in some pretty surprising things – like this:


Looking to buy a length of cotton rope?  Amazon might just recommend that you buy a wooden stool alongside it.  As a human, we would not suggest these two items alongside each other.  However Amazon’s algorithm has seen a correlation between people who bought cotton rope and those that also bought wooden stools. It’s suggesting to someone buying the rope that they might want a stool too with the hope of raking in an extra £17.42.  At best, this seems like an unfortunate mistake.  At worst, it’s prompting extremely vulnerable people and saying ‘why not?  This happens all the time?  Why don’t you add the stool to your basket?’.

If this can happen with a recommendation algorithm, designed to upsell products to us, clearly the problem is profound.  We need to find a reliable means to guarantee that the actions taken by AI or an automated system achieve a positive outcome.


Terminal value loading

So, why don’t we just tell an AI to protect human life?  That’s what Isaac Asimov proposed in ‘I, Robot’.  Here are the three laws:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

They sound pretty watertight.  Adding in no injury through action or inaction seems to avoid a dystopia where AI takes over and lets the human race finish itself off.

Despite how good these laws sound, they don’t work.  Asimov wrote these laws for use in novels, and the novels were much more interesting when things went wrong.  Otherwise we might have ended up with a book of ‘Once upon a time, the end’.

There’s a 4th law, the ‘Zeroth Law’, added by Asimov. This extra rule was supposed to fix the flaws of the other three, the ones that gave Will Smith a bad day. I confess, I’ve not read the book, but I understand that one didn’t go so well either.

The rules don’t even have to refer to people to be a risk.  They could be about something really mundane.  Take the idea of a paperclip maximiser, an idea put forth by Nick Bostrom. This would be a machine made by a hypothetical future human race to manage paperclip creation. Paperclips are just a simple resource and seemingly don’t need a ton of consideration to make them safe: we tell the AI that its purpose is to make paperclips, and that’s just what it does.

But what if we end up with a super intelligent system, beyond our control, with the power to rally the resources of the universe into making paperclips? If this system, whose priority is turning everything around it into paperclips, sees its creators’ attempts to prevent it reaching this goal, its best bet is to eradicate them.  Even if it doesn’t decide to eradicate them, those humans are still made out of valuable matter which would look much nicer if it was turned into a few paperclips, so turn them into paperclips it shall.

How do we change that terminal value?  Tell the machine to make 1,000 paperclips instead of turning the entire universe into paperclips? Unfortunately, it’s not much better.  That same AI could make 1,000 paperclips, then proceed to use all the resources in the observable universe (our cosmic endowment) to make sure that it’s made exactly 1,000 paperclips, not 999 or 1,001, and that those paperclips are what its creator intended for it to make, and all of the perfect quality to satisfy their desire.

It might not even be fair to give a super intelligent machine such a mundane terminal value, assuming we find a way to make its value remain constant despite it becoming extremely intelligent.

Here I am with a brain the size of a planet and they ask me to pick up a piece of paper. Call that job satisfaction? I don’t.

Marvin – Hitchhiker’s Guide to the Galaxy, by Douglas Adams


TL;DR – Terminal values don’t seem to work well.

Indirect normativity

Instead of giving a machine a terminal value, could we instead indirectly hint towards what we want it to do?

If we managed to perfectly sum up in a terminal value what morality meant to the human race in Viking times, we might have an AI which prizes physical strength very highly.  We might think we’ve reached a higher ethical standard today, but that’s not to say that 1,000 years from now we will not look back on the actions we are taking today as ignorant.  Past atrocities happened on human timescales, with only human level intelligence to make them happen.  Doing it orders of magnitude faster with a machine may well be worse and irreversible.

With indirect normativity we don’t even try to sum up that terminal value; instead we ask a machine to figure out what we want it to do.  One approach is something like Eliezer Yudkowsky’s ‘Coherent Extrapolated Volition’, which asks that an AI predict what we would want it to do “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”.

Rather than following whatever ethical code we have at the time of releasing the AI, we create something which grows and changes as we do, and creates the future which we’re likely to want rather than a far more extreme version of what we have today.

There’s perhaps still some overlap between this system and terminal value loading, and contradictions that the systems would find.  If a machine is asked to do whatever is most valuable to us, and prizes making that correct decision over anything else, perhaps its decision will be to take our brains out, put them on a petri dish and figure out exactly what we meant for it to do.  A clause like ‘do the intended meaning of this statement’ would seem to lessen the concern, but again, to know what we intend the machine needs to be able to predict our behaviour.

A perfect prediction system would look a lot like a ‘Black Mirror’ episode.  Using an application without a second thought to manage your home automation or to find your next date. Not knowing that the machine is simulating thousands of thinking and feeling human minds to make an accurate prediction of your desires and behaviours, including all the pain that those sentient simulations feel when being torn apart from one another on thousands of simulated dates to gauge how likely you are to stay together against all odds.

The control problem is extremely tricky, and looks for answers to questions which philosophers have failed to reach a consensus on over thousands of years of research.  It is imperative that we find answers to these questions, not just before creating a Super Intelligent AI, but in any system that we automate.  Currently the vast majority of our resources and effort is put into making these systems faster and more intelligent, with just a fraction focused towards the control problem or the societal impact of AI and automation.

Let’s redress the balance.


Google Dupe-lex

Google unveiled an interesting new feature at their I/O conference last week – Duplex.  The concept is this: want to use your Google assistant to make bookings for you but the retailer doesn’t have an online booking system?  Looks like you’re going to be stuck making a phone call yourself.

Google wants to save you from that little interaction.  Ask the Google assistant to make a booking for you and Duplex will make a call to the place, let them know when you’re free, what you want to book, when, and talk the retailer through it…. With a SUPER convincing voice.

It’s incredibly convincing, and nothing like the Google assistant voice that we’re used to.  It uses seemingly perfect human intonations, pauses, umms and ahs at the right moments.  Knowing that it’s a machine, you feel like you can spot the moments where it sounds a little bit robotic, but if I’m being honest, if I didn’t know in advance I’d be hard pressed to notice anything out of the ordinary, and wouldn’t for a moment suspect it was anything but a human.

I think what they’re using here is likely a branch of the Tacotron 2 speech generation AI that was demoed last year.  It was a big leap up from the Google assistant voice we are used to, and it was difficult to tell the difference between it and a human voice.  If you want to see if you can tell the difference follow this link;


So, what’s the problem?

The big problem is that people are going to feel tricked (or ‘duped’, as I and likely 100 other people will like to joke).  Google addressed this a little bit, saying that Duplex will introduce itself and tell the person on the other end of the phone that it is a robot, but I’m still not sure it’s right.

I can absolutely see the utility in making this voice seem more human.  If you receive a call from a robotic sounding voice, you put the phone down.  We expect the robot is going to try to be polite for just long enough to ask us for our credit card details for some obscure reason.  By making the voice sound like a person our behaviour changes to give that person time to speak – To give them the respect that we expect to receive from another person, rather than the bluntness that we will tend to address our digital assistants with.  After all – Alexa doesn’t really care if you ask her to turn the lights off ‘please’, or just angrily bark at her to turn the lights off.

Making the booking could be just a little bit of a painful interaction. The second example that Google shows has a person trying to make a booking for 4 at a restaurant.  It turns out that the restaurant doesn’t make bookings for groups of fewer than 5, and that it’s in fact fine just to turn up as there will most likely be tables available.  Imagine this same interaction with a machine.  Imagine that conversation with one of those annoying digital IVR systems when you call a company and try to get through to the right person – Saying ‘I want to book a table’…. ‘I want to book a table’…. ‘TABLE BOOKING’…. ‘DINNER’.   Our patience will run thin much faster if we’re waiting for a machine than if we’re waiting for a person.

Just because there is utility, doesn’t mean this deception is fair.  I can see three issues with this.

  1. Even if the assistant introduces itself as a machine, the person won’t believe it

It might just seem like a completely left-field comment and make people think they’ve just misheard something.  They’ll either laugh it off for a second and continue to believe it’s a person, or think they just couldn’t quite make the word out right – especially as this conversation is happening over the phone.

  2. They know it’s a robot, but they still behave like it’s a human

Maybe we have people who hear it’s a robot, know that robots are now able to speak like a human, but still react as though it’s a person.  This is a bit like the uncanny valley.  They know it’s a machine, and the rational part of their mind is telling them it’s a machine, but the emotional or more instinctive part of their mind hears it as a human, and they still offer much the same kind of emotion and time to it that they would a human.

  3. They know it’s a machine and treat it like a machine.

This is interesting, because I think it’s exactly not what Google want people to do.  If there wasn’t some additional utility in making this system sound ‘human like’, they wouldn’t have spent the time or money on the new voice model and would have shipped the feature out with the old voice model long ago.  If people treat it like a machine, we may assume that the chance of making a booking, or the right kind of booking would be reduced.

If you believe the argument I’ve made here, then Duplex introducing itself as a machine is irrelevant.  Google’s intention is still for it to be treated like a human – And is this OK?

I’m not entirely sure it is.  When people have these conversations, they’re putting a bit of themselves into the relationship.  It reminds me of Jean-Paul Sartre talking about his trip to the café.  He was expecting to meet his friend Pierre, and left his house with all the expectations of the conversation he would have with Pierre, but when he arrives Pierre is not there.  Despite the café being full, it feels empty to Sartre.  I imagine a lot of people will feel the same when they realise that they’ve been speaking to a machine.  As superficial as the relationships might be when you are making a booking over the phone, they are still relationships.  When the person arrives for their meal, or their haircut, and they realise that the person they spoke to before doesn’t really exist – that it has no conscious experience – they’ll feel empty.

They’ll feel kinda… duped…

How to keep AI alive when death is inevitable

Uber was in the headlines again last week, this time because one of their driverless cars was involved in an accident which killed a cyclist.  Loss of life is always a tragedy and I don’t want to diminish the significance of this death, however, accidents like this are likely to happen as we develop AI and we should be able to agree on situations where projects must be shut down and times when they can continue.

We saw footage released showing the moments leading up to the collision.  If we take the footage to be an honest, accurate and unaltered representation of events it appears that the car had very little opportunity to avoid this happening, with the cyclist crossing the road away from a designated crossing, unlit and shortly after a corner.

It’s hard to watch the footage without imagining yourself in that situation, and it’s hard to see how a human driver could have avoided the incident.  There would only have been a split second to react.  It’s quite possible that both human and machine would have produced the same result – Yet humans continue to be allowed to drive, and Uber is shutting down its self-driving vehicle programme.

So – Following a human death, how can we decide when our projects must be axed and when they can continue?

Intentional vs incidental vs accidental

I would like to propose three categories of machine caused death.  Under two of the three circumstances (intentional and incidental) I suggest that the programmes must be shut down.  Under the 3rd (accidental) the project may continue, depending on a benchmark I will set out shortly.


Intentional death caused by AI will result from the likes of ‘lethal autonomous weapons’.  I would propose that these should be banned under all circumstances from ever being created.  As Max Tegmark described in Life 3.0, AI has the potential to be either the greatest tool ever created for humanity, or the most destructive – The latter being killbots.  We want AI to go in the first direction like Chemistry or biology, which became useful to humanity rather than becoming chemical and biological weapons respectively – We have international treaties to ban them.  Nuclear had the potential to be simply a power source to help humanity, but has ended up with a dual purpose – Generating energy to power our homes and incredibly destructive weapons.

Here are a few of the most poignant issues:

  • With AI the risk is potentially higher than nuclear weapons. A machine with the coded right to take human life could do so with an efficiency orders of magnitude higher than any human could – Infecting our healthcare systems, power, or even launching our own nuclear weapons against ourselves.
  • As a race we are yet to create our first piece of bug-free software, and until we do, we run the risk of this extremely fast automated system killing people we had never aimed for it to. And even if we regain control of the device within days, hours or minutes, the damage done could be thousands of times greater than any human could have achieved in that time.
  • Using a machine only adds in a layer of ethical abstraction that allows us to commit atrocities (Automating Inequality, Virginia Eubanks).


An incidental death can be categorised as death which happens as the result of another action, but not as the primary motivation.  This would include any action where an automated system was sure of, or attributed a high probability to, a person being seriously injured or killed as the result of its primary goal or the steps taken to achieve it.  We may imagine machines allowing this to happen ‘for the greater good’, as an acceptable step towards their primary goal.  This should also be avoided, and should be cause to shut off and prevent any AI projects which allow this to happen.


  • It’s a short distance between this and lethal autonomous weapons. An AI is highly unlikely to be human in the way it thinks and acts.  Unlike humans, who are carbon based lifeforms evolved over thousands of years, an AI will be silicon based and evolve quickly over years, months or days.  The chance of it feeling emotions like guilt, empathy or love in the way a human does, if it feels them at all, is improbable.  If it is given the flexibility to allow human death, its idea of an atrocity may be very different to ours, and due to its speed and accuracy even the fastest reactions in stopping this type of AI may be far too late to prevent a disaster.


Accidental death is the only area where I believe a death involving an AI may be forgiven – and even in this case not in all circumstances.  I would describe an accidental death caused by an AI as one where, in spite of reasonable steps being taken to collect and analyse available data, an accident happened which resulted in death or injury, which was believed to have only a low level of probability, and which became unavoidable.  Here we may see this through the eyes of the Uber driverless vehicle:

  • ‘An accidental death’ – The car should never be allowed to sacrifice human life where it is aware of a significant risk (we will discuss the ‘significant risk’ shortly), opting instead to stop entirely in the safest possible manner.
  • ‘Reasonable steps’ – These should be defined through establishing a reasonable level of risk, above 0% which is tolerable to us. More on this below.
  • ‘Collect and analyse data’ – I think this is where the Uber project went wrong. Better sensors or processing hardware and software may have made this accident preventable.

An AI designed to tolerate only accidental death should not set the preservation of human life as its primary objective.  Clearly defined final objectives for AI seemingly have unintended results: a Matrix-like human farm could maximise human life while sacrificing pleasure.  Maximising pleasure similarly could result in the AI dedicating its resources to generating a new drug to make humans permanently happy, or putting us all in an ideal simulated world.  Indirect normativity (Nick Bostrom, Superintelligence) seems to be a more appealing proposition, instead teaching an AI to:

  1. Drive a car to its destination
  2. Take any reasonable steps to avoid human harm or death while fulfilling step 1
  3. Do the intended meaning of this statement

But what if a driverless car finds itself in a situation where death is unavoidable, where it’s just choosing between one person or another dying?

If an AI designed only to tolerate accidental death finds itself in a situation where its only decision is between one life and another, even if inaction would result in a death, it may still be compliant with this rule.  We should instead measure this type of AI from an earlier moment: the actions leading up to this situation should have been taken to minimise risk of death.  Once new data becomes available which shows that no further options are possible which avoid death or injury, the accident has already happened, and a separate decision making system may come into force to decide what action to take.

A reasonable level of risk?

To enable AI to happen at all we need to establish the reasonable level for risk in these systems.  A Bayesian AI would always attribute a greater than 0% chance of anything happening, including harm or death of a human, in any action or inaction that it takes.  For example, if a robot were to make contact with a human, holding no weapons, travelling slowly and covered in bubble wrap, the chance of it transferring bacteria or viruses which have a small chance of causing harm is higher than 0%.  If we are to set the risk appetite for our AI at 0% its only option will be to shut itself down as quickly and safely as possible.  We must have a minimum accepted level for AI caused harm to progress, and I think we can reach some consensus for this.
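
To make the point concrete, here is a toy sketch (Python 3.9+), assuming an AI that attaches an estimated probability of harm to every candidate action and refuses anything above its risk appetite. The function name, the actions and the probabilities are all invented for illustration; they are not measurements from any real system.

```python
def permitted_actions(harm_estimates: dict[str, float], risk_appetite: float) -> list[str]:
    """Return the actions whose estimated chance of harm is within the risk appetite."""
    return [action for action, p in harm_estimates.items() if p <= risk_appetite]


# Hypothetical estimates: a Bayesian agent never assigns exactly 0% to harm,
# not even to doing nothing.
harm_estimates = {
    "drive passenger to destination": 1e-6,
    "touch a human while wrapped in bubble wrap": 1e-9,
    "stay parked and idle": 1e-12,
}

# With a 0% risk appetite nothing is allowed, inaction included.
print(permitted_actions(harm_estimates, risk_appetite=0.0))   # []

# With a small but non-zero appetite, the lowest-risk actions become possible.
print(permitted_actions(harm_estimates, risk_appetite=1e-8))
```

The interesting question is not whether the threshold should be above zero, since it has to be, but where we collectively decide to put it.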

With the example of the Uber self-driving car we may assume the equivalent number of deaths caused by human and machine.  The machine was unable to avoid the death of a human, and if the evidence presented is an accurate reflection of the circumstances it seems likely a human too would have been unable to avoid the death.  The reaction for this has been strongly anti-automation, so we can tell that a 1-to-1 exchange between human and machine deaths is not the right level – That we would prefer for a human to be responsible for a death if the number of casualties is not reduced by using a machine.

If we are to change this number to 2-to-1 this begins to look different.  If we could halve the number of deaths caused by driving or any other human activity, automation begins to look a lot more appealing to a far greater number of people.  If we extend this to a 99% reduction in deaths and injuries the vast majority of people will lean towards AI over human actors.

Where this number stands exactly I am not certain.  It’s also unlikely that the ratio would remain static, as growing trust in AI may lead us in either direction.  Indirect normativity may be our best option again in this instance, accounting for the moving standard which we would hold it to.

Setting a tolerance rate for error at 0% for anything is asking for failure.  No matter how safe or fool proof a plan may seem there will always be at least a tiny possibility of error.  AI can’t solve this, but it might be able to do a better job than us.  If our goal is to protect and improve human life… maybe AI can help us along the way.


The Geek Shall Inherit

AI has the potential to be the greatest ever invention for humanity.  And it should be for the benefit of all humanity equally, but instead we’re heading towards a world where one particular group, the geeks, will benefit most from AI. AI is fundamentally more likely to favour the values of its designers, and whether we train our AI on a data set gathered from humans, or with pure simulated data through a system like deep reinforcement learning, bias will, to a greater or lesser extent, remain.

A disclaimer – Humans are already riddled with bias.  Be it confirmation, selective or inclusive bias, we constantly create unfair systems and draw inaccurate conclusions which can have a devastating effect on society.  I think AI can be a great step in the right direction, even if it’s not perfect.  AI can analyse dramatically more data than a human and by doing so generate a more rounded point of view.  More rounded however is not completely rounded, and this problem is significant given any AI which can carry out a task orders of magnitude faster than a human.

To retain our present day levels of inequality while building a significantly faster AI we must dramatically reduce the number of unethical decisions it produces.  For example, if we automate a process with a system which produces only 10% as many unethical decisions as a human per transaction, but we make it 1000x faster, we end up with 100x more injustice in the world.  To retain today’s levels that same system would need to make only 0.1% as many unethical decisions per transaction.
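
The arithmetic is easy to check. This is a minimal sketch, assuming that total injustice scales linearly with both the per-transaction rate of unethical decisions and the number of transactions processed; the function name `injustice_multiplier` is my own, made up for illustration.

```python
def injustice_multiplier(relative_error_rate: float, speedup: float) -> float:
    """Total unethical decisions relative to the human baseline.

    relative_error_rate: unethical decisions per transaction as a fraction
        of the human rate (0.10 means 10% as many).
    speedup: how many times more transactions the system gets through.
    Assumes, for illustration only, that harm scales linearly with both.
    """
    return relative_error_rate * speedup


# 10% as many unethical decisions per transaction, but 1000x faster:
print(injustice_multiplier(0.10, 1000))   # roughly 100 times today's level

# 0.1% as many unethical decisions per transaction, 1000x faster:
print(injustice_multiplier(0.001, 1000))  # roughly 1, i.e. today's level
```

Real harm is unlikely to scale this neatly, but the linear model is enough to show why a system that is "only" ten times fairer is nowhere near fair enough once it runs a thousand times faster.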

For the sake of rhyme, I’ve titled this blog the geek shall inherit.  I am myself using a stereotype, but I want to identify the people that are building AI today.  Though I firmly support the idea that anyone can and should be involved in building these systems that’s not a reflection of our world today.  Our society and culture has told certain people, women for instance, from a young age that boys work on computers and girls do not.  This is wrong, damaging and needs remedying.  That’s a problem to tackle in a different blog!  Simply accepting in this instance that the people building AI tend to be a certain type of person – Geeks.  And if we are to stereotype a geek, we’re thinking about someone who is highly knowledgeable in an area, but also socially inept, and probably a man.

With more manual forms of AI creation the problem is at its greatest.  Though we may be using a dataset gathered from a more diverse group of people, there’s still going to be selection bias in that data, as well as bias directly from the developers if they are tasked with the annotation of that data.  Whether intentionally or not, humans are always going to favour things more alike themselves and code nepotism into a system, meaning the system is going to favour geeky men like themselves more so than any other group.

In 2014 the venture capital fund ‘Deep Knowledge Ventures’ developed an algorithm called ‘VITAL’ to join their board and vote on investments for the firm.  VITAL shared a bias with its creators, nepotism, showing a preference to invest in businesses which valued algorithms in their own decision making (Homo Deus, Harari, 2015).  Perhaps VITAL developed this bias independently, but the chances are its developers unconsciously planted the seed of nepotism, and even the preference towards algorithms, due to their own belief in them.

A step beyond this is deep reinforcement learning.  This is the method employed by Google’s DeepMind in the AlphaZero project.  The significant leap between AlphaGo and AlphaGo Zero is that AlphaGo used data recorded from humans playing Go, whereas AlphaGo Zero learned simply by playing against itself in a simulated world.  By doing this, the system can make plays which seem alien to human players, as it’s not constrained by human knowledge of the games.  The exception here is ‘move 37’ against Lee Sedol, which AlphaGo Lee used, prior to the application of Deep Reinforcement Learning.  This move was seen as a stroke of creative brilliance that no human would ever have played, even though this system was trained on human data.

Humans also use proxies to determine success in these games.  An example of this is AlphaZero playing chess.  Where humans use a points system on pieces as a proxy to understand their performance in a game, AlphaZero doesn’t care about its score.  It’ll sacrifice valuable pieces for cheap ones when other moves which appear more beneficial are available, because it doesn’t care about its score, only about winning.  And win it does, if only by a narrow margin.

So where is the bias in this system?  Though the system may be training in a simulated world, two areas for bias remain.  For one, the layers of the artificial neural network are decided upon by those same biased developers.  Second, it is simulating a game designed by humans, where the game board and rules were designed by people.  Go and chess, for instance, both let one colour move first (black in Go, white in chess), which is generally considered an advantage.  Though I prefer to believe that the colours of pieces on a game board have everything to do with contrast and nothing to do with race, we may be subtly teaching a machine that one colour is guaranteed by the rules an advantage over others in life.

The same issue however remains in more complex systems.  The Waymo driverless car is trained predominantly in a simulated world, where it learns free from human input, fatigue and mistakes.  It is, however, still fed the look and feel of human designed and maintained roads, and the human written rules of the highway code.  We might shift here from ‘the geek shall inherit’ to ‘the lawyer shall inherit’.  Less catchy, but simply making the system learn from a system of rules that was designed by a select group of people will introduce some bias, even if it’s simulating its training data within the constraints of those rules.

So, what should we do?

AI still has the potential to be incredibly beneficial for all humanity.  Terminator scenarios permitting, we should pursue the technology.  I would propose tackling this issue from two fronts.


Improving diversity among the people building AI would be hugely beneficial to the technology industry as a whole, but it’s of paramount concern in the creation of thinking machines.  We want our AI to think in a way that suits everyone, and our best chance of success is to have fair and equal representation throughout its development.  We don’t know how much time remains before a hard take-off of an artificial general intelligence, and we may not have time to fix the current diversity problem, but we should do everything we can to fix it.


Damage caused by biased humans, though potentially catastrophic, will always be limited by our inherent slowness.  AI on the other hand can implement biased actions much faster than us humans and may simply accelerate an unfair system.  If we want more equality in the world a system must focus more heavily on equality as a metric than speed, and ensure at the very least that it reduces inequality by as much as the process speed is increased, e.g.:

  1. If we make a process 10x faster, we must reduce the prevalence and impact of unequal actions by at least 90%.
  2. If we create a system 1,000x faster, this reduction must be for a 99.9% reduction of inequality in its actions.

Doing this only retains our current baseline.  To make progress in this area we need to go a step further with the reduction in inequality before increasing the speed.
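
The rule of thumb above can be written down directly. This is a rough sketch, assuming the impact of unequal actions grows in direct proportion to process speed; `required_reduction` is a name I have made up for illustration.

```python
def required_reduction(speedup: float) -> float:
    """Fraction by which unequal actions must fall just to hold today's baseline."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return 1 - 1 / speedup


for speedup in (10, 1000):
    print(f"{speedup}x faster -> reduce by {required_reduction(speedup):.1%}")
# 10x faster -> reduce by 90.0%
# 1000x faster -> reduce by 99.9%
```

Anything short of these figures means the faster system leaves the world less equal than the slower human process it replaced.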