How to win a Turing Test: the Loebner Prize
Posted: 6 October 2017 | By Charlie Moloney
The 27th annual Loebner Prize, the world’s oldest Turing Test competition, was held on Saturday 16 September, and I was honoured to be one of four judges standing between the finalist chatbot creators and a $25,000 prize.
During the competition I faced a computer screen with two chat boxes, where I received messages from a human and a bot simultaneously. At the end of a 25-minute round, I had to choose which was which.
The bots tried to pass themselves off as humans. The humans I spoke to, known as ‘confederates’, were told not to pretend to be robots, but just to type as they normally would.
There were four rounds, so I spoke to all four finalist bots and all four human ‘confederates’. Let me show you some of the key moments where I felt the bots shone, and also where things clearly DID NOT COMPUTE.
Note: My messages are green, the bot/human messages are blue. I’ve also included the name of the bot I faced in each round, for anyone who may be interested.
Round 1 — UberBot
This was the hardest round, because one of the chat boxes didn’t reply to me at all.
There were two options: either the chatbot had crashed, or the human confederate had gone for an inopportune toilet break.
With only one chat box answering, I had nothing to compare its replies with, which made the decision much harder.
Two things helped me make the right call:
– The human confederate I was talking to (yes, it was a human) started to overthink what it means to type ‘normally’ and started having a mini existential crisis.
– Ultimately I just believed, from all I’d heard and read about bots, that they were not yet good enough to imitate a human so convincingly. So I took a leap of faith, and of course I was right, for another year at least.
Round 2 — Midge
This round was smooth sailing because the bot and the human were both sending responses, and I got to see for the first time how patently obvious it is which is which.
Pretty much straight away the bot fell out from behind the curtain by spewing definitions of the words I typed to it, and random chunks of songs and poems.
What made it even more obvious was that the bot sent its responses lightning-fast, faster than any human could type.
It deployed some nice tricks and witty sentences, but the speed at which it replied and the dictionary-style responses were a dead giveaway.
Plus, the human I was speaking to made spelling mistakes. Note to chatbot makers: remember the words of Alan Turing: “if a machine is expected to be infallible, it cannot also be intelligent”.
Round 3 — Rose
The bot in round 3 crashed, and crashed hard. It was a real shame, because its answers were actually pretty good.
The bot was able to deliver a nice put-down to me when I tried to confuse it by speaking French. It also insisted I couldn’t trust whether I was human, because my senses could be forged electrical signals.
But there was no hiding the malfunction that was occurring when the bot just began to say ‘huh, huh, huh’ to everything that I said.
By spamming random keys and hitting Enter I overloaded it and made it crash out completely. Radio silence ensued until it comically rejoined with ‘nice to meet you’.
I should also mention this bot replied so quickly that it had responded before my finger let go of the Enter key. All the easier to make it crash when it got stuck in its ‘huh’ spiral of doom.
Round 4 — Mitsuku
I immediately knew that this bot would be the winner. First and foremost, it replied at a nice meditative and slightly more convincing pace. Beyond that it was confidently able to handle my queries.
I could talk to this bot about Star Wars, what’s on TV, Christianity; I could ask questions like ‘how long can you hold your breath underwater?’ and ‘how long is a piece of string?’ and it could fire back a witty and relevant response.
It made two key errors: it offered up the dictionary definition of Mormon for me, and when I asked it if we could sing a song, it offered to search for one on YouTube (a feature of Mitsuku’s online version which you can visit here).
It also goofed when I asked it to describe Youtube and it said, “song off youtube — off youtube = youtube = It’s a website where you can watch videos and upload your own.”
However, it was noticeably superior to the others and so at the end, when we chose our favourites, I picked this chatbot.
Enough of the judges agreed with me for Steve Worswick, Mitsuku’s creator, to win the bronze prize for the third time. You can read his account of the day’s events inside the chatbot creators’ room here.
At the post-Turing Test trip to the pub, the creator of UberBot, Will Rayner, told me he wished there had been more time for a practice run. He thought that could have prevented his bot from crashing in round one.
We discussed the pros and cons of programming a chatbot to make deliberate mistakes and to wait a while before responding. He felt it was more important that the bot sent relevant responses.
As a judge, I’m not sure I agree. If you want to win the Loebner Prize, you need to think about how to trick the judges. But I take Will’s point: a chatbot creator only has so much time. It’s probably important to focus on fundamentals.
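For anyone curious what the "make mistakes and wait before responding" idea might look like in practice, here is a minimal Python sketch. It is not taken from any of the finalist bots: the function names, the 40 words-per-minute typing speed, the 1.5-second thinking pause and the 3% typo rate are all assumptions made up for illustration.

```python
import random


def typing_delay(message: str, words_per_minute: float = 40.0,
                 thinking_seconds: float = 1.5) -> float:
    """Seconds a human typist might plausibly need to send `message`.

    Assumed values: 40 wpm and a 1.5 s pause are illustrative guesses.
    """
    # Typing-speed convention: one "word" counts as five characters.
    chars_per_second = words_per_minute * 5 / 60
    return thinking_seconds + len(message) / chars_per_second


def inject_typos(message: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent letters with probability `rate` at each position."""
    chars = list(message)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so no letter moves twice
        else:
            i += 1
    return "".join(chars)


def humanize(reply: str, rng: random.Random, typo_rate: float = 0.03):
    """Return (delay_in_seconds, text_with_typos) for a bot reply."""
    text = inject_typos(reply, typo_rate, rng)
    return typing_delay(text), text
```

A bot using something like this would generate its reply as usual, pass it through `humanize`, and then sleep for the returned delay before sending the altered text. Whether that effort is better spent on more relevant replies is exactly the trade-off Will and I were debating.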