When Will Siri Become Samantha From Spike Jonze’s ‘Her’?

In the Oscar-winning movie by Spike Jonze, a man named Theodore Twombly, played by Joaquin Phoenix, falls in love with his intelligent operating system, an entity who names herself Samantha and is voiced by Scarlett Johansson. Throughout the movie, Samantha demonstrates, or at least approximates, the ability to learn, to interpret a vast variety of informational cues, and perhaps even to love. Fans, critics, and tech writers alike have emerged from the film with a few questions: How far would we have to push artificial intelligence to approximate Samantha? How big a leap would that technology represent over current versions, like Apple’s Siri? What kind of technological capability would we need to make Samantha a reality?

In January, Stephen Wolfram, whose Wolfram Alpha “computational knowledge engine” drives the artificial intelligence-like component of the Siri virtual assistant on Apple’s iPhone, told the Wall Street Journal’s Robin Kawakami that he thinks the technology that would enable an operating system like Samantha isn’t that far off.

“The mechanics of getting the AI to work — I don’t think that’s the most challenging part. The challenging part is, in a sense: Define the meaningful product,” Wolfram said. In contrast to the plethora of roles and tasks that Samantha takes on in the film, the artificial intelligence of the future is likely to be built to complete specific tasks. Though Wolfram previously thought that it would be possible to create “a general purpose AI that is kind of human-like that has a super version of exact human attributes,” he notes that that isn’t the direction he sees the field moving in anymore.

Instead, he predicts that we’ll soon see more capable personal assistants that read and analyze our email, dividing messages by subject matter. Email management systems like Sanebox or even Gmail’s system of inbox tabs are early examples. While Wolfram thinks that it wouldn’t be difficult to build a talking assistant similar to Samantha, he questions the practicality of a voice assistant when visual presentations are so often the way information gets conveyed.

Kawakami also spoke to Peter Norvig, a director of research at Google, who pointed out that perceptions play a large role in our interaction with technology. “Humans are pretty good at deceiving themselves. If you ask Siri the right questions, it does a good job. If you ask it the wrong question, it makes it look silly—the same thing with [IBM’s] Watson.”

In Norvig’s view, elements of artificial intelligence surround us, visible in Netflix and Amazon’s recommendation engines, or in Siri and the Wolfram Alpha software that underpins it. Norvig explained, “I think of AI as figuring out how to do the right thing when you don’t know what the right thing is. We don’t know how to write down the rules for what’s the difference between a face and something else, and so AI is answering that question.”

To Wolfram, the definition of artificial intelligence is a little more nebulous. While many computers can reproduce functions of the human brain, they complete those tasks in a completely different way than the brain does, and that makes defining the difference between intelligence and computation difficult.

He told Kawakami, “I used to think that there was some sort of magic to brain-like activity.” But he noted that, years of research later, he found that there was no “bright line distinction” between what would be considered intelligent and what would be considered “merely” computational; rather, it is shared human experience that distinguishes human intelligence from pure computation.

That line between intelligent and computational is further blurred by the introduction of computer chips that complete artificial intelligence tasks and claim to work in the same way that the human brain does. As John Markoff reported for the New York Times in August, IBM has developed a computer chip, or processor, called TrueNorth, that tries to mimic the way that the brain recognizes patterns, using webs of transistors similar to the brain’s neural networks. In an article published in the journal Science, a group of researchers explained that the chip was built with 4,096 neurosynaptic cores, integrating 1 million programmable “spiking neurons,” which can encode data as patterns of pulses, and 256 million configurable synapses.

TrueNorth’s electronic neurons are able to signal to each other when a type of data passes a certain threshold, such as when light is growing brighter, or changing color or shape. That ability could enable the processor to recognize actions that current computers and robots struggle to interpret. As an example, Markoff notes that the chip could recognize a woman in a video picking up a handbag, something that humans can do easily but current computers can’t.
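To make the “spiking neuron” idea concrete, here is a minimal, hypothetical sketch, not IBM’s actual TrueNorth design, of a leaky integrate-and-fire neuron: it accumulates input over time and emits a pulse only when that input crosses a threshold, so a signal like a brightening light gets encoded as a pattern of spikes.

```python
# Minimal leaky integrate-and-fire neuron: a toy illustration of how a
# "spiking" neuron emits pulses only when accumulated input crosses a
# threshold. This is a hypothetical sketch, not IBM's TrueNorth design.

def simulate_neuron(inputs, threshold=1.0, leak=0.9):
    """Return a list of 0/1 spikes, one per input time step."""
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = potential * leak + current  # integrate input, with leak
        if potential >= threshold:              # threshold crossed: fire
            spikes.append(1)
            potential = 0.0                     # reset after spiking
        else:
            spikes.append(0)
    return spikes

if __name__ == "__main__":
    # A brightening light (steadily rising input) produces more frequent
    # pulses, so the signal is encoded as a pattern of spikes over time.
    rising = [0.1 * t for t in range(10)]
    print(simulate_neuron(rising))  # spikes become denser as input grows
```

In a real neuromorphic chip, millions of such units are wired together through configurable synapses; the sketch above only shows the threshold-and-pulse behavior the article describes.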

The chip is an important achievement in terms of scalability and efficiency. TrueNorth contains 5.4 billion transistors but consumes only 70 milliwatts of power; by comparison, an Intel processor for personal computers might have 1.4 billion transistors and consume anywhere from 35 to 140 watts, roughly 500 to 2,000 times as much power. Wired noted at the time the chip was unveiled that IBM had tested it on common artificial intelligence tasks, like recognizing images, and that it could handle those tasks at comparable speed but with much less power than traditional chips would require.

In those tests, TrueNorth reportedly recognized people, cyclists, cars, buses, and trucks with 80 percent accuracy. But some question whether the technology is significantly different from what’s already available, and if the approach will really bring the significant advances that IBM claims it will.

A traditional processor separates the data storage and data crunching parts of the computer: the memory and the CPU. Neuromorphic chips depart from that architecture by placing memory and computation together in small modules that process information locally but communicate with one another.

But the tasks that the chip can complete so far aren’t robust enough to impress many who are researching machine learning, a subfield of artificial intelligence encompassing systems that can learn from data and act without being explicitly programmed. It remains to be seen how the technology develops and scales, and how well TrueNorth will perform when put to work on large problems, like recognizing many types of objects.

Wired notes that while the chip has performed well on simple image detection and recognition tasks using DARPA’s NeoVision2 Tower dataset, that dataset includes only five categories of objects. By contrast, the software used at Baidu and Google is “trained” on the ImageNet database, which includes thousands of categories of objects. For many researchers, neurochips like IBM’s will need to demonstrate the ability to learn before they can break current computing paradigms.

And it’s not only the fact that approximating human intelligence would require incredible storage and computation capacity that makes Samantha a difficult technology to replicate. Tim Tuttle, chief executive of Expect Labs, told New York Magazine’s Kevin Roose that while current computers are good at imitating common, predictable behaviors — like what we’re going to type into the Google Search bar or what items we’re going to buy on Amazon, given our browsing and purchase history — being able to understand and respond to unpredictable, original input is what differentiates Samantha from Siri. Today, computers can recognize words, match them against a database, and find the information that they think we want. But a virtual assistant that could learn, teach, and identify and interpret nonverbal cues is several steps removed from the technology that’s currently available.
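As a rough illustration of the gap Tuttle describes, the sketch below (purely hypothetical, and not how Siri actually works) matches what a user says against a fixed table of commands. Anything inside the table gets a canned response; the unpredictable, original input a Samantha would have to handle simply falls through.

```python
# Toy command matcher: a hypothetical sketch of keyword-based "assistant"
# behavior, not Siri's actual implementation. Known phrases map to canned
# actions; anything novel falls through to a generic failure response.

COMMANDS = {
    "what's the weather": "Fetching today's forecast...",
    "set an alarm": "Alarm set for 7:00 AM.",
    "play music": "Playing your library.",
}

def respond(utterance: str) -> str:
    text = utterance.lower().strip()
    for phrase, action in COMMANDS.items():
        if phrase in text:          # simple substring match against the table
            return action
    # No understanding happens here: unrecognized input gets a canned fallback
    return "Sorry, I don't know how to help with that."

if __name__ == "__main__":
    print(respond("Hey, what's the weather like?"))              # matches a known command
    print(respond("Do you think you could ever love someone?"))  # falls through
```

The point of the toy is the fallback branch: a lookup table can be made very large, but it never gets the system closer to interpreting a question it hasn’t been given a rule for.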

Considering Her and the technology which, the film seems to say, could be waiting in the not-so-distant future, it’s the ability to learn that enables Samantha to far surpass Theodore’s expectations for her, and seemingly her expectations for herself. Samantha was originally created to do one thing: manage emails, assist with schedules, and keep the user’s electronic life running cleanly. But she fluidly learns to do so much more, developing through her interactions with Theodore and with the rest of the world, virtual and physical. That pushes the concept of machine learning to a new level, one that isn’t matched by today’s technology or the trajectory it’s expected to take.

Roose learned from D. Scott Phoenix, co-founder of the machine learning company Vicarious, that computers assist users by matching what we say against a list of stored commands. The problem is that this isn’t the same thing as understanding language; humans understand the world, and language, through a “sensory universe.” In a concept called the symbol grounding problem, computer scientists theorize that you could load a robot’s database with every symbol in the universe — everything on the internet, everything ever printed in a book, every word ever spoken by a human — but the robot would still not be able to act fully human, because it would have no way to connect those symbols to the objects and concepts that humans experience in the real world.

Similarly, IBM’s Watson, which can read quickly, process natural language, and add to its knowledge base, isn’t able to “think” about problems and situations the same way that a human can, and lacks the ability to process many simple situations that humans can understand without much thought. In reality, the virtual assistants that we’ll see in the near future will likely be more specialized, more mundane, and distinctly less human-like than fans of Her would like to believe.

In a piece for Variety in January, Dag Kittlaus, the co-creator of Siri, noted that Siri was built “to get things done.” But the virtual assistant became a cultural phenomenon “overnight,” not because the assistant made it easier to use a phone, but because Siri was fun, and felt a little bit human.

Kittlaus points out that Samantha has more emotional intelligence than Siri, and from a technological standpoint, building a system that was capable of all of the things that Samantha said and did and understood “would entail massively scaled real-time image recognition, spatial understanding, facial and mood recognition — as well as understanding the subtleties of thousands of social scenarios in order to predict that the couple sitting at the table were on a first date.” Pondering the question of whether Siri can catch up, Kittlaus concludes, “Maybe, but don’t hold your breath.”

Personal assistants that can understand and use natural language, learn complex concepts, and express human emotion likely won’t be available anytime soon. And even if researchers are able to build intelligent computers like Samantha, there’s still the problem that, given all the information in the world, even the most intelligent, human-like computer can never act truly and totally like a real human.
