In a few years, the Internet could look like a very different place than it is today. We’ll be able to complete searches by voice, more efficiently search for images by the people or objects they contain, or use video analysis tools to see videos or ads that are more relevant to the content we’re viewing.
The technologies that will enable all of these improvements are part of one of the most interesting areas of artificial intelligence research: deep learning, a type of artificial intelligence that involves processing data through a network of simulated “neurons” that have been trained using data derived from audio, images, and other types of input. Read on to learn about three important ways that artificial intelligence will change the Internet, giving rise to more advanced and innovative tools to help users around the world find the information they’re looking for.
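The “network of simulated neurons” idea can be sketched in a few lines. This is a hypothetical, minimal illustration rather than any production system: each simulated neuron computes a weighted sum of its inputs and squashes it through a sigmoid, and a layer is just a collection of neurons reading the same inputs; the weights below are invented for the example.

```python
import math

def neuron(inputs, weights, bias):
    """One simulated neuron: weighted sum of inputs passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer applies many neurons, each with its own weights, to one input."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# A tiny two-layer network: 3 inputs -> 2 hidden neurons -> 1 output.
hidden = layer([0.5, -1.2, 3.0],
               [[0.1, 0.4, -0.2], [-0.3, 0.8, 0.05]],
               [0.0, 0.1])
output = layer(hidden, [[1.5, -2.0]], [0.2])
```

In real deep learning systems the weights are not hand-picked but learned from large volumes of training data such as audio or images, as described above.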
1. Voice recognition
In September, VentureBeat reported that Andrew Ng of Baidu, the second-largest web search provider in the world, said that “In five years, we think 50 percent of queries will be on speech or images.” Ng offered the projection during a Gigaom meetup on deep learning. Web search providers like Google, Microsoft, and Baidu are making efforts to integrate deep learning into a variety of applications, and at Baidu, it has informed speech recognition, image search, web ranking, and advertising systems.
Making Baidu’s neural networks more accurate could lead to more effective search products in countries where significant percentages of the population are illiterate. Ng says, “Speech and images are, in my view, a much more natural way to communicate [than text].” One out of every ten queries Baidu receives already comes in through speech, and he envisions how pointing a smartphone camera at a product like a handbag could enable a search engine to identify the model more quickly than phrasing and rephrasing a typed query could. The technology that enables search providers to handle voice or image requests will be increasingly important, and Ng says, “I think that whoever wins AI will win the Internet.”
Even as tech giants like Baidu, Apple, and Google build their own voice recognition technologies that are capable of constantly learning from huge amounts of data, others are putting voice recognition within reach of smaller players. As Rachel Metz reported for MIT’s Technology Review last year, a startup called Wit.ai is making it easier for hardware manufacturers and software developers to add custom voice controls to their products with a natural language service it offers free to those who agree to share their data to improve the technology.
2. Image search
Deep learning has also driven dramatic advances in image processing. Technology Review reports that Facebook researchers used the technique to build a system that can determine almost as well as a human whether two different photos show the same person, and that Google used it to create software that describes complicated images in short sentences.
Technology Review recently reported that a face detection algorithm that capitalizes on advances in a type of machine learning known as a deep convolutional neural network is poised to revolutionize image search. The algorithm, developed by Sachin Farfade and Mohammad Saberian at Yahoo Labs in California and Li-Jia Li at nearby Stanford University, trains a many-layered neural network on a database of annotated examples of pictures of faces from many angles.
The researchers created a database of 200,000 images, including faces at various angles and orientations, and 20 million images without faces. They trained the neural net in batches of 128 images over 50,000 iterations, and built a single algorithm that can detect faces from a wide range of angles, even when partially occluded, and can spot many faces in the same image. The team calls the algorithm the Deep Dense Face Detector, and explains that it compares favorably to other existing algorithms. “We evaluated the proposed method with other deep learning based methods and showed that our method results in faster and more accurate results.”
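The training regime described above — batches of 128 images over many iterations — is standard mini-batch learning: each iteration updates the model on one small random batch of examples rather than the whole dataset. The sketch below shows that loop on a toy stand-in model (a single logistic unit, not the researchers’ deep network); the data, batch size, and iteration count here are illustrative only.

```python
import math
import random

def train_minibatch(data, labels, batch_size=128, iterations=1000, lr=0.5):
    """Mini-batch training loop: each iteration samples one batch,
    averages the gradient over it, and takes a single update step."""
    dim = len(data[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(iterations):
        batch = random.sample(range(len(data)), min(batch_size, len(data)))
        gw, gb = [0.0] * dim, 0.0
        for i in batch:  # accumulate the gradient over the batch
            err = predict(w, b, data[i]) - labels[i]
            gw = [g + err * xj for g, xj in zip(gw, data[i])]
            gb += err
        w = [wj - lr * g / len(batch) for wj, g in zip(w, gw)]
        b -= lr * gb / len(batch)
    return w, b

def predict(w, b, x):
    """Logistic stand-in for a detector's face/no-face score."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

# Toy "face vs. non-face" stand-in: two separable clusters of 2-D points.
data = [[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]]
labels = [1, 1, 0, 0]
w, b = train_minibatch(data, labels, batch_size=2, iterations=500)
```

The same batch/iteration structure scales up to the deep convolutional networks the researchers used, where each example is an image rather than a pair of numbers.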
The Deep Dense Face Detector improves on an earlier algorithm, created by computer scientists Paul Viola and Michael Jones in 2001, that looks for vertical bright bands in an image that could be noses, then for horizontal dark bands that could be eyes, and then looks for other general patterns associated with faces. While the Viola-Jones algorithm was a revelation for detecting faces seen from the front, it can’t accurately detect faces from any other angle, which severely limits how the algorithm can be used by face search engines.
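The speed of Viola-Jones-style band detection comes from the integral image (summed-area table), which lets the sum of any rectangular bright or dark band be computed in constant time from four lookups. A minimal sketch of that trick, using a made-up 3×3 image:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] holds the sum of all pixels
    above and to the left of (y, x), with a zero border row/column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Pixel sum over rows top..bottom-1, cols left..right-1, in O(1)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
# A bright-band vs. dark-band feature is just the difference of two rect_sums.
```

This is why the 2001 algorithm could scan images quickly, but the hand-designed band features are also why it generalizes poorly beyond frontal faces.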
The promise of algorithms like the Deep Dense Face Detector lies in image search. Technology Review notes that it’s straightforward to search for images taken at a specific time or place, but it’s difficult to find images of specific people. The algorithm takes a step in that direction, and towards a future when archives of photos and videos will be searchable.
3. Video analysis
Deep learning research enables computers to go beyond image recognition. VentureBeat recently reported that Clarifai, a startup with a service for recognizing objects in images, has announced that its technology can also point out the objects that appear in videos. As Technology Review explains, Clarifai’s software can analyze video clips to recognize 10,000 different objects or types of scenes. It can analyze video faster than a human can watch it: in a demo given at a conference on deep learning, the software analyzed a 3.5-minute clip in just 10 seconds.
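Analyzing video faster than real time typically means classifying a sample of frames rather than every frame. The sketch below is a hypothetical illustration of that idea, with a stand-in `classify` function in place of a trained recognizer; it is not Clarifai’s actual pipeline.

```python
def analyze_video(frames, classify, stride=24):
    """Sample one frame per `stride` (e.g. one per second at 24 fps)
    and record which labels appear at which frame indices."""
    tags = {}
    for i in range(0, len(frames), stride):
        for label in classify(frames[i]):
            tags.setdefault(label, []).append(i)
    return tags

# Hypothetical stand-in: "frames" are just label strings, and the
# classifier simply returns the frame's own label.
frames = ["car"] * 48 + ["tree"] * 48
tags = analyze_video(frames, lambda frame: [frame])
```

The output maps each detected label to the moments where it appears, which is the kind of index that makes a video collection searchable.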
Clarifai is offering the technology as a service, and expects that it will be used for things like matching ads to the content of online videos, or developing new ways to organize video collections or edit footage. Anyone can test Clarifai’s service on the startup’s website; the objects the analysis might detect include cars, trees, and people, as well as more abstract concepts like “fun” or “togetherness.” In the future, the software will be able to summarize what happens in a video, and recognize when a specific activity has occurred in a video.
The technology could lead to a smarter way of serving advertising alongside the videos you watch online. The software can identify which point in the video is the best to place an ad. Companies can already pay to position their ads next to videos of a particular type, or ones about a specific subject. Being able to automatically match ads with particular moments in videos could be even more attractive to advertisers — and perhaps more tolerable for consumers currently annoyed by irrelevant or repetitious ads.
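Matching an ad to the best moment could be as simple as scoring each moment’s detected content against the ad’s topic and taking the maximum. A hypothetical sketch (the data format and scores below are invented, not any vendor’s API):

```python
def best_ad_slot(moment_scores, ad_topic):
    """Pick the timestamp whose detected-content score for the ad's
    topic is highest. `moment_scores` maps timestamps (seconds) to
    {label: confidence} dicts, the kind of per-moment output a video
    recognizer could emit."""
    best_t, best_s = None, -1.0
    for t, scores in moment_scores.items():
        s = scores.get(ad_topic, 0.0)
        if s > best_s:
            best_t, best_s = t, s
    return best_t

moments = {10: {"car": 0.2},
           95: {"car": 0.9, "road": 0.6},
           150: {"fun": 0.8}}
slot = best_ad_slot(moments, "car")  # the moment the car is most prominent
```

A car ad would land at the 95-second mark here, next to the content it is most relevant to, rather than at an arbitrary pre-roll slot.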