The Turing Test of Computer Intelligence Is Too Easy
To better test our computer programs’ intelligence, we should ask them for stories and drawings
Alan Turing, an English mathematician who played a key role as a code breaker during WWII, is sometimes called the father of computer science. He is famous for proposing a test of computer intelligence based on a parlor game in which a questioner tried to deduce the gender of two people hidden behind a curtain. The parlor game was simple: written questions and answers were passed back and forth so that tone and pitch of voice did not betray the people behind the curtain. Turing’s proposed test swaps one of the hidden people for a computer. The game is to figure out which curtain hides a human and which conceals artificial intelligence.
We now call this the Turing Test, and it’s typically played out on a computer through a conversation with a chatbot. This past summer, a program that emulates the personality of a 13-year-old Ukrainian boy, "Eugene Goostman," convinced 33 percent of judges that it was human, reports Dan Falk for Smithsonian.
Turing optimistically predicted that by the year 2000 a computer program would trick judges 30 percent of the time. But as yet, most chatbots struggle to truly "sound" human. This year’s feat was met with criticism because Eugene can hide its nature behind the mistakes a teenager who doesn’t speak perfect English might make. Falk writes:
In one of my conversations in 2012, I typed in a simple joke – and the entity I was conversing with instantly changed the subject to hamburgers. (Computer scientist Scott Aaronson recently had a similar experience when he chatted with Eugene via the bot’s website. Aaronson asked Eugene how many legs a camel has; it replied, “Something between 2 and 4. Maybe, three? :-)))” Later, when Aaronson asked how many legs an ant has, Eugene coughed up the exact same reply, triple-smiley and all.)
If people can be fooled by a computer spouting nonsense—not exactly a sign of intelligence—then we should probably come up with a better way to test A.I. Enter the Lovelace Test, named for Ada Lovelace, the first computer programmer.
Lovelace wrote in 1843 that computers can’t be considered intelligent until they can create something original, something they weren’t programmed to do, reports Jordan Pearson for Motherboard. The Lovelace test was first proposed in 2001, but Mark Riedl, an A.I. researcher, explains that, as originally conceived, the test isn’t perfect either. "I'm not sure that the test actually works because it's very unlikely that the programmer couldn't work out how their A.I. created something," he told New Scientist.
His update, the Lovelace 2.0 test, would simply ask the computer to create something original and creative: a story, poem or picture. He writes:
If the judge is satisfied with the result, they make another, more difficult, request. This goes on until the A.I. is judged to have failed a task, or the judge is satisfied that it has demonstrated sufficient intelligence. The multiple rounds means you get a score as opposed to a pass or fail. And we can record a judge's various requests so that they can be tested against many different A.I.s.
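To make the procedure concrete, here is a minimal sketch of how such a multi-round evaluation might be scored. The names `ai_create` and `judge_accepts` are hypothetical stand-ins for the A.I. under test and the human judge; Riedl’s proposal does not prescribe any particular implementation.

```python
# A hypothetical sketch of a Lovelace 2.0-style evaluation loop.
# `ai_create` stands in for the A.I. under test and `judge_accepts`
# for the human judge's verdict; neither comes from a real test harness.

def lovelace_2_score(ai_create, judge_accepts, requests):
    """Run increasingly difficult creative requests until one fails.

    `requests` is an ordered list of creative prompts, easiest first
    (e.g. ["write a short poem about rain",
           "write a short poem about rain that never uses the letter e"]).
    Returns the number of rounds passed, so the result is a score
    rather than a simple pass/fail verdict.
    """
    score = 0
    for request in requests:
        artifact = ai_create(request)             # the A.I.'s story, poem, or picture
        if not judge_accepts(request, artifact):  # judge decides if it meets the request
            break                                 # the first failure ends the test
        score += 1                                # each satisfied round adds a point
    return score
```

Because the score is simply a count of satisfied rounds, a recorded list of a judge’s requests can be replayed against other A.I.s, as the quote above notes.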
The test serves more as a comparison tool between A.I. systems, New Scientist says. But at least it seems that it cannot be stymied by tricks the way the Turing Test can. Also worth noting: the aesthetics of the creation don’t matter. After all, not all living, breathing humans can paint masterpieces. But most can play Pictionary, Riedl points out.