Why the Internet Needs to Overcome Its Language Barriers
Seven thousand languages are currently spoken around the world, but the Internet speaks surprisingly few of them. That’s a problem both for new Internet users, who are logging on for the first time and need content in a language they can understand, and for the websites, apps, and services that are trying to reach new users in emerging markets around the world.
Tech giants like Google and Facebook realize that they don’t speak nearly enough languages, and are working to identify the languages that they’ll need to support as the next billion users come online around the world. As Quartz reported recently, Google announced the Indian Language Internet Alliance, which aims to get half a billion Indians online by 2017 by showing them content in local languages.
Facebook is also reportedly defaulting to local languages in India, and Iris Orriss, director of internationalization and localization at Facebook, recently wrote about “the internet’s language barrier” in an edition of Innovations, a quarterly journal published by MIT Press, subtitled “Digital Inclusion: The Vital Role of Local Content” (PDF). Orriss wrote:
There are many barriers to connectivity in different parts of the world. For the majority of people not yet connected, the main obstacles are social and economic. The cost of data and devices is too high, and demand for Internet services may be low among people who have yet to understand their value. For a smaller population, mostly in remote regions, it is the absence of basic Internet infrastructure that holds back the spread of the internet — cell towers have yet to be constructed and communities don’t yet have electricity.
Orriss notes that while “these are enormous problems, and they rightly deserve a great deal of attention from those working to close the digital divide,” there is another, often-overlooked challenge that is “just as critical to getting more people to use the Internet and participate in the global knowledge economy. It’s the language barrier.”
She explains that while more than 7,000 languages are spoken around the world, Facebook is available in 75 languages, with another 40 in translation. Mobile devices are available in even fewer languages, and most don’t support the fonts or keyboard input that would be required to display and process non-Roman scripts.
For instance, Android, the most popular mobile operating system in the world, only recently added support for the fonts and input methods that display characters for Hindi, a language with more than 250 million native speakers. Orriss explains, “The result is that people in emerging markets are disinclined to connect to the Internet because they have to use a foreign language to navigate it. This is the language barrier.”
Supporting more languages on mobile devices is important for emerging markets, particularly in India, Africa, and Southeast Asia, where language diversity is high. Orriss explains, “While we don’t know for sure how many languages it will take to connect everyone, we do know it will take many more than are currently available online.” She says that in its efforts to overcome the language barrier, Facebook has identified three problem areas that pose significant challenges to reaching global populations.
The first is enabling everyone to use the right language from the start, which apps and websites can do either by predicting the preferred language or by letting users select a language they understand during registration. The second is ensuring a good user experience in different languages, which means accounting for cultural norms such as multiple scripts or color associations. The third is building a thriving online ecosystem in every language: multiple parties must work together so that mobile operating systems and important information on the Internet are available in languages that all users understand, and so that device makers and service providers support as many languages as possible.
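As a rough illustration of what “predicting the preferred language” can look like in practice, the sketch below reads the language preferences a browser sends in the standard HTTP Accept-Language header and picks the best match from the languages a site supports. It is not Facebook’s method, which the article does not describe. The function names, the list of supported languages, and the English fallback are assumptions made for the example.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a tag with no q-value counts as fully preferred
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if tag and quality > 0:
            # Sort by descending quality, then by original order in the header.
            prefs.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(header, supported, default="en"):
    """Pick a supported language for the user (hypothetical helper).

    Tries an exact match first, then the base language (e.g. "hi" for "hi-IN"),
    and falls back to `default` if nothing matches.
    """
    by_lower = {code.lower(): code for code in supported}
    for tag in parse_accept_language(header):
        if tag == "*":
            continue  # wildcard carries no specific preference
        if tag in by_lower:
            return by_lower[tag]
        base = tag.split("-")[0]
        if base in by_lower:
            return by_lower[base]
    return default


# A browser set to Hindi (India) first, English as a distant second choice.
print(choose_language("hi-IN,hi;q=0.9,en;q=0.5", ["en", "hi", "ta"]))
```

A prediction like this is only a starting point. The header reflects how the device is configured, which may not be the language the person reads best, so the second option Orriss mentions, letting users choose a language during registration, still matters.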
Quartz’s Leo Mirani notes that while language barriers are hardly a new issue, they’re becoming an increasingly visible problem for mobile operators and web giants that are looking to expand their reach in emerging markets and bring more people online. Additionally, the imbalance of content on the Internet has become “too stark to avoid”: most content is about Western Europe, Japan, Korea, and North America, and most of it originates in those places as well, according to Mark Graham, an associate professor at the Oxford Internet Institute.
According to a white paper published by Mozilla and GSMA, titled “Unlocking relevant Web content for the next 4 billion people” (PDF), English-language content continues to dominate the Internet, even though only a small proportion of the world’s population speaks English as a first language. The paper quantifies the disproportion:
Just over half (55.8%) of Web content is estimated to be in English despite the fact that less than 5% of the world’s population speak it as a first language, with only 21% estimated to have some level of understanding. By contrast, some of the world’s most widely spoken languages, such as Arabic or Hindi, account for a relatively small proportion of the Web’s content (0.8% and less than 0.1% respectively). Those designing content have a clear imperative to deliver material that is relevant, understandable, and meets the demands of its audience. With some notable exceptions, this is not something that has yet taken place in much of the world.
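To put those figures in proportion, the short calculation below divides English’s share of Web content by the share of people who speak or understand it. The percentages come from the passage above. The function name is made up for the example, and because the paper says “less than 5%,” using 5% gives a lower bound on the first-language ratio.

```python
def representation_ratio(content_share_pct, population_share_pct):
    """How many times larger a language's share of content is than its share of people."""
    return content_share_pct / population_share_pct


# Figures quoted from the Mozilla and GSMA paper.
ENGLISH_CONTENT_PCT = 55.8
ENGLISH_FIRST_LANGUAGE_PCT = 5.0   # the paper says "less than 5%", so this is an upper bound
ENGLISH_SOME_UNDERSTANDING_PCT = 21.0

first_language_ratio = representation_ratio(ENGLISH_CONTENT_PCT, ENGLISH_FIRST_LANGUAGE_PCT)
understanding_ratio = representation_ratio(ENGLISH_CONTENT_PCT, ENGLISH_SOME_UNDERSTANDING_PCT)

print(f"At least {first_language_ratio:.1f}x relative to first-language speakers")
print(f"About {understanding_ratio:.1f}x relative to everyone with some understanding")
```

By this measure, English takes up at least 11 times more of the Web than its share of first-language speakers would suggest, and still more than two and a half times more when everyone with some understanding of English is counted.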
As Quartz reports, discrepancies in Internet access don’t fully explain the imbalance. Graham says that the Middle East scores much lower in related web content than would be expected based on the number of people online in the region. Ten languages dominate the web, as the World Bank reported over the summer: at least 80% of the Internet’s content is available in English, Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian, or Korean.
Further, the World Bank report explains that nearly half of the world’s languages could die out by the end of the century, given that 96% of the world’s languages are spoken by a mere 4% of its population.
“A vernacular language is the native language or native dialect of a specific population, region or country that is more the language of ordinary speech than formal writing,” the report explains. “Every day, a dozen of these vernacular languages disappear.”
According to “Digital Language Death,” research by Andras Kornai published by PLOS, only about 250 languages could be considered “well-established” online, with another 140 considered borderline. Of the 7,000 languages currently spoken around the world, approximately 2,500 will survive for another century, but many fewer will make it onto the Internet. Additionally, some 3,535 languages have no writing system whatsoever, and thus stand little chance of being included on the Internet.
According to UNESCO’s page on endangered languages, a language disappears “when its speakers disappear or when they shift to speaking another language – most often, a larger language used by a more powerful group,” and many think that the lack of language diversity on the Internet makes it more likely that local languages will fall out of use and become extinct.
The World Bank’s report explains that the Internet has contributed to the trend toward reduced language diversity. African languages, for example, are represented on the Internet, “but not as a widespread communication medium and often with minimum content in the languages themselves.” The report points to countries like Gabon, where vernacular languages still carry most daily communication in rural areas but are in decline in urban areas. It posits that broadband access should be provided in areas where vernacular languages are spoken, and that online information should be made available in those languages, especially information that relates to agriculture, education, and health.
In their white paper, Mozilla and GSMA note that if operators keep investing in network and capacity and the price of mobile phones continues to drop, more people around the world will be able to become Internet users, but “the next billion users will find a less welcoming content landscape, which is effectively closed to their contributions except for a handful of private content silos.”
The white paper posits that enabling local content is the key to unlocking the value of the Internet for the next billion users. Mozilla and GSMA advocate for “a more dispersed digital content ecosystem, in terms of how content is created, distributed and monetized,” to counter the way that digital content creation currently centers around a few geographic locations and languages.
The white paper also points to the increasing dominance of the “Google and Apple duopoly” in mature markets as one consequence of the shift to mobile and of developers and service providers opting to produce content for a “global” audience instead of local ones. However, the paper explains that Mozilla and GSMA envision ways that “the arrival of the Web through smartphones in emerging markets represents an opportunity for challengers to this duopoly, through the arrival of open, collaborative solutions that allow for interoperability across multiple platforms and that ensure healthy participation from all players across the mobile ecosystem.”
Mozilla and GSMA want to create local alternatives to Android and Apple, enabling “a coalition of mobile operators, device manufacturers, educators, international development donors, and NGOs” to positively shape the future of the web.
The white paper adds that “For those willing to address this issue, including mobile operators, this could represent a host of new revenue streams across a broad range of areas, such as health, education and e-commerce, as those currently underserved in those sectors connect to the Web for the first time.”
That statement demonstrates that tackling the Internet’s language barrier is as much about making knowledge and online communities open to more users who speak a diversity of languages as it is about companies gaining a foothold in the emerging markets where they must expand if they want to continue to grow. For the Googles and Facebooks of the world, finding a way to develop local-language-friendly products and services is mandatory. The faster that tech giants figure out ways to support local content and vernacular languages, the better — both for speakers of local languages and for the continued use of the languages themselves.