Google unveiled an interesting new feature at their I/O conference last week – Duplex. The concept is this: want your Google Assistant to make a booking for you, but the retailer doesn’t have an online booking system? Looks like you’re going to be stuck making a phone call yourself.
Google wants to save you from that little interaction. Ask the Google Assistant to make a booking for you and Duplex will call the place, tell them what you want to book and when you’re free, and talk the retailer through it… with a SUPER convincing voice.
It’s incredibly convincing, and nothing like the Google Assistant voice we’re used to. It uses seemingly perfect human intonation, with pauses, umms and ahs at the right moments. Knowing that it’s a machine, you feel like you can spot the moments where it sounds a little robotic, but if I’m being honest, if I hadn’t known in advance I’d be hard-pressed to notice anything out of the ordinary, and wouldn’t for a moment suspect it was anything but a human.
My guess is that they’re using a version of the Tacotron 2 speech generation AI that Google demoed last year. Tacotron 2 was a big leap up from the Google Assistant voice we are used to, and it was difficult to tell the difference between it and a human voice. If you want to see whether you can tell the difference, follow this link:
So, what’s the problem?
The big problem is that people are going to feel tricked (or ‘duped’, as I and probably a hundred other people will be tempted to joke). Google addressed this a little, saying that Duplex will introduce itself and tell the person on the other end of the phone that it’s a robot, but I’m still not sure it’s right.
I can absolutely see the utility in making this voice seem more human. If you receive a call from a robotic-sounding voice, you put the phone down. We expect the robot to be polite for just long enough to ask for our credit card details for some obscure reason. When the voice sounds like a person, our behaviour changes: we give that person time to speak, and the respect we would expect to receive from another person, rather than the bluntness we tend to use with our digital assistants. After all, Alexa doesn’t really care whether you say ‘please’ when you ask her to turn the lights off, or just angrily bark the order at her.
Making the booking can turn into a bit of a painful interaction. The second example that Google showed has a person trying to book a table for 4 at a restaurant. It turns out that the restaurant doesn’t take bookings for groups of fewer than 5, and that it’s fine just to turn up, as there will most likely be tables available. Now imagine that same interaction with a machine. Imagine that conversation with one of those annoying IVR systems you reach when you call a company and try to get through to the right person – ‘I want to book a table’… ‘I want to book a table’… ‘TABLE BOOKING’… ‘DINNER’. Our patience runs thin much faster when we’re dealing with a machine than when we’re dealing with a person.
Just because there is utility doesn’t mean this deception is fair. I can see three issues with this.
- Even if the assistant introduces itself as a machine, the person won’t believe it
It might just seem like a comment from completely out of left field, and make people think they’ve misheard something. They’ll either laugh it off for a second and carry on believing it’s a person, or think they just couldn’t quite make the word out – especially as the conversation is happening over the phone.
- They know it’s a robot, but they still behave like it’s a human
Some people may hear that it’s a robot, know that robots are now able to speak like humans, and still react as though it’s a person. This is a bit like the uncanny valley. They know it’s a machine, and the rational part of their mind is telling them so, but the emotional, more instinctive part of their mind hears a human, and they still give it much the same emotion and time that they would give a human.
- They know it’s a machine and treat it like a machine
This is interesting, because I think it’s exactly what Google don’t want people to do. If there weren’t some additional utility in making this system sound ‘human-like’, they wouldn’t have spent the time and money on the new voice model; they would have shipped the feature with the old voice model long ago. If people treat it like a machine, we can assume that the chance of making a booking, or the right kind of booking, is reduced.
If you accept the argument I’ve made here, then Duplex introducing itself as a machine is irrelevant. Google’s intention is still for it to be treated like a human – and is that OK?
I’m not entirely sure it is. When people have these conversations, they’re putting a bit of themselves into the relationship. It reminds me of Jean-Paul Sartre describing his trip to the café. He was expecting to meet his friend Pierre, and set off with all the expectations of the conversation he would have with Pierre, but when he arrived, Pierre was not there. Despite the café being full, it felt empty to Sartre. I imagine a lot of people will feel the same when they realise that they’ve been speaking to a machine. As superficial as the relationships might be when you make a booking over the phone, they are still relationships. When someone arrives for their meal, or their haircut, and realises that the person they spoke to before doesn’t really exist – that it has no conscious experience – they’ll feel empty.
They’ll feel kinda… duped…