Tech industry giants like Apple and Microsoft have popularized the personal digital assistant by letting smartphone users ask Siri or Cortana to set alarms or answer questions. Now other developers and smaller companies can implement their own versions of such assistants with new open-source software called Sirius.
Sirius was developed by researchers at Clarity Lab at the University of Michigan. Anyone can contribute to the project on GitHub, and the code is released under the BSD license, which means it’s free for anyone to use or distribute. Vice reports that the program is partially backed by Google, the Defense Advanced Research Projects Agency and the National Science Foundation. The researchers aim to democratize the virtual assistant, of which Amazon, Apple, Google, and Microsoft all have their own versions.
VentureBeat reports that the researchers recently gave presentations on the personal assistant software at the International Conference on Architectural Support for Programming Languages and Operating Systems in Turkey. (Stateside, Sirius also made an appearance on Product Hunt.) In an academic paper on the Sirius project, the researchers explain that Sirius is “an open end-to-end IPA web-service application that accepts queries in the form of voice and images, and responds with natural language.”
Is an open-source Siri a smarter Siri?
Vice reports that, so far, Sirius has only been tested on Ubuntu desktops. The researchers hope that one day, it will make it onto phones and other devices. Jason Mars, the researcher who headed up the project, describes Sirius as a Linux-like version of Siri. Sirius already has capabilities that its commercial counterparts can’t match. For example, you could take a picture, input it in Sirius, and then ask a question about it — a process that Siri doesn’t yet enable. But, in turn, unlike Siri, Sirius isn’t yet an elegant solution. It’s cobbled together by integrating services built with other, well-established open-source projects that together give Sirius its range of capabilities.
These projects integrate techniques and algorithms that are representative of those found in commercial systems. They include Carnegie Mellon University’s Sphinx, which represents Gaussian Mixture Model-based speech recognition and is used in combination with Caffe, an open-source framework for deep neural networks; Microsoft Research’s Kaldi and RWTH’s RASR, which represent the industry’s trend toward deep neural network-based speech recognition; OpenEphyra, which represents a question-answering system based on IBM’s Watson; and SURF implemented with OpenCV, which approximates the image matching algorithms used in production applications.
To get Sirius to work, you’ll also need to download each of these programs, but the researchers have made them all available as a download suite. However, Vice notes that downloading the suite and actually getting all of the components to work are two very different things, so the researchers behind Sirius have put together a tutorial on how to work with Sirius.
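As a rough illustration of how such components could fit together, here is a minimal Python sketch of a voice-and-image query passing through speech recognition, image matching, and question answering. It is not Sirius's actual code: the names `Query`, `answer_query`, and the three stand-in stage functions are hypothetical, and the way the image label is attached to the question is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Query:
    audio: bytes                   # recorded voice question
    image: Optional[bytes] = None  # optional photo the question refers to


def answer_query(
    query: Query,
    recognize_speech: Callable[[bytes], str],
    match_image: Callable[[bytes], str],
    answer_question: Callable[[str], str],
) -> str:
    """Route one query through the three stages and return a text answer."""
    question = recognize_speech(query.audio)
    if query.image is not None:
        # Assumption: the matched image label is appended to the question text.
        question = f"{question} ({match_image(query.image)})"
    return answer_question(question)


# Stand-in stages so the sketch runs without any of the real projects installed.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_image_match(image: bytes) -> str:
    return "label for " + image.decode("utf-8")


def fake_qa(question: str) -> str:
    return f"answer to: {question}"


reply = answer_query(
    Query(b"when does this close?", b"photo-1"),
    fake_asr, fake_image_match, fake_qa,
)
print(reply)
```

Passing the stages in as plain functions mirrors the idea that each one is a separate project: a Sphinx-based recognizer could in principle be swapped for a Kaldi-based one without touching the rest of the flow.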
A university statement refers to the demo version of Sirius as a “talking Wikipedia.” Researchers loaded it with a static version of the site, and users can ask it factual questions. But that knowledge base could be swapped for any type of information that researchers or startups think will prove useful; developers could even create personal assistants for specialized areas, like medicine or auto repair. Researchers at the University of Michigan are working with IBM to develop an assistant that could help with academic advising.
Is open-source the key to improving digital assistants?
The researchers behind Sirius identify Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana as examples of intelligent personal assistants that use input — including the user’s voice, images, and contextual information — to provide assistance by answering questions in natural language, making recommendations, and performing actions. The researchers characterize the area of such personal assistants as “one of the fastest growing Internet services,” and note that the growing market for wearable devices, combined with the fact that the design of wearables is often heavily reliant on voice and image input, “indicates that rapid growth in user demand for IPA services is on the horizon.”
But they explain that, in contrast to the queries of traditional browser-based services, intelligent personal assistant queries stream through software components for tasks such as speech recognition, question answering, and image matching. Due to the computational intensity of these components and the data-driven models they use, service providers perform the required computation in datacenter platforms instead of performing it on the mobile devices themselves. This kind of “offloading” approach is used by both Siri and Google Now, which send compressed recordings of voice queries to datacenters for speech recognition and semantic extraction.
But the datacenters they use are designed for traditional web services, and personal assistant queries require far more computing resources than traditional text-based services: the process can be more than 100 times more computationally intensive than a simple text web search. The researchers determined that if voice were to broadly supplant text for web queries, datacenter infrastructure would need to grow by 165 times. So they built Sirius to investigate the viability of various acceleration strategies and to offer insight into future datacenter and server designs for the emerging workload from intelligent personal assistants.
They ran tests to determine the best types of server chips for handling the complex workload of assistants like Sirius, and they conclude that “GPU- and FPGA-accelerated servers can improve the query latency on average by 10x and 16x. Leveraging the latency reduction, GPU- and FPGA-accelerated servers can reduce the TCO [total cost of ownership] by 2.6x and 1.4x, respectively.”
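To put those figures in concrete terms, the short Python sketch below applies the reported factors (165x infrastructure growth, 10x and 16x latency improvement, 2.6x and 1.4x TCO reduction) to a made-up baseline. The baseline values of 1,000 servers, 8 seconds per query, and a TCO normalized to 1.0 are assumptions chosen only for illustration, and the function names are hypothetical.

```python
# Factors quoted from the researchers' findings.
INFRASTRUCTURE_GROWTH = 165
LATENCY_SPEEDUP = {"GPU": 10.0, "FPGA": 16.0}
TCO_REDUCTION = {"GPU": 2.6, "FPGA": 1.4}


def scaled_fleet(current_servers: int) -> int:
    """Servers needed if voice broadly replaced text queries."""
    return current_servers * INFRASTRUCTURE_GROWTH


def accelerated(platform: str, baseline_latency_s: float, baseline_tco: float):
    """Divide hypothetical baseline figures by the reported average factors."""
    latency = baseline_latency_s / LATENCY_SPEEDUP[platform]
    tco = baseline_tco / TCO_REDUCTION[platform]
    return latency, tco


# Hypothetical baseline: 1,000 servers, 8 s per query, TCO normalized to 1.0.
print(f"Fleet size at 165x growth: {scaled_fleet(1000)} servers")
for platform in ("GPU", "FPGA"):
    latency, tco = accelerated(platform, 8.0, 1.0)
    print(f"{platform}: {latency:.2f} s per query, {tco:.2f} of baseline TCO")
```

Under these assumed numbers, FPGAs give the lower latency while GPUs give the larger cost reduction, which is the trade-off the quoted figures describe.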
It’s not clear yet if Sirius will prove a success, and other programmers have tried to build an open-source version of Siri and failed. But in a university statement, Mars notes, “Now the core technology is out of the bag, and we all have access to it. Instead of making an app to run on the Apple Watch, for example, maybe I could make my own watch. We’re very excited to see what the world comes together to build and learn with Sirius as a starting point.”
Mars’s reference to Sirius as the Linux of intelligent personal assistants invites a closer look at the comparison. Linux is a free computer operating system, and competes with commercial operating systems like Apple’s OS X or Microsoft Windows. Though Linux runs on only a small share of desktop computers, it’s been regarded as a revolutionary force in computing. It’s become the most common way to configure and run servers and mainframes, and it forms the foundation of Google’s Android, which has become the most common operating system for smartphones and tablets around the world.
Open-source projects can be modified because their code and design are publicly accessible. They benefit both developers and consumers because they promote the open exchange of ideas and the faster, more collaborative development of software. In the same way that Android has formed the basis of countless different smartphone models from manufacturers around the world, the researchers behind Sirius want it to give developers and device manufacturers a head start on smarter digital assistants.