Do Data Studies Leave Out the Poor and Only Count the Rich?
Data science is the new, hip thing when it comes to marketing, business strategy, and even government decision making. Data scientists appear unstoppable, attempting to predict everything from which movie you might like to watch next (think Netflix, Amazon) to where and when the next crime might occur in a given city borough.
But critics of big data have argued that the emergence of data science has brought with it a phenomenon called “data poverty,” in which certain demographics of society are under-represented in the data. This under-representation means that improvements and advantages flow to those living in data-rich environments, while those living in data poverty experience none of the benefits big data can bring to the table.
To get a better picture of what data inequality looks like, we should take a step back. What, exactly, is big data, and what’s the big deal? The term “big data” is incredibly broad, but one way to interpret it is as any data set that is so large it is abnormally challenging to manage, process, and analyze. Think of data sets on the order of the entirety of Wikipedia, or all search traffic through Google.
The great thing about big data analysis is that it can provide extremely valuable insight that would otherwise be impossible to gain. Big data can help reveal trends, patterns, and associations, particularly when it comes to human behavior and interactions. It’s an extremely valuable industry with applications in everything from marketing to medicine.
Big data has its drawbacks, though. For one, there’s the eerie, Big Brother-esque nature of it all. Privacy and information security concerns have been ringing loudly ever since the Snowden leaks. Many people have become uncomfortable with the idea of sharing too much personal information with businesses, and activists are trying to unwind the government’s aggressive position on data collection. But even if consumers and businesses, or citizens and the government, could find common ground on and develop a cogent set of rules for information collection, management, and use, there is still the issue of data poverty.
Daniel Castro, who directs the Center for Data Innovation, has written on the phenomenon of unequal representation in data, which he terms “data poverty.” Castro notes that, “Already, gaps are appearing where certain groups of individuals do not have data collected about them because of where they live,” and adds that, “If this trend towards a ‘data divide’ continues, we might even see the rise of ‘data deserts’ — areas of the country characterized by a lack of access to high-quality data that may be used to generate social and economic benefits.” As policymakers and others in government begin to rely more and more heavily on big data, Castro argues, we need to take steps to ensure that everyone is fairly represented.
Some of you might be wondering: What, if any, are the consequences of living in data poverty? Well, unfortunately, as big data continues to grow, living in a data-poor environment could have a very profound impact on your life, and that impact is likely to grow as the science continues to progress and as researchers, data scientists, and policymakers further develop and understand its applications.
Castro notes that individuals who grow up in data-rich environments (that is, environments where data is routinely collected about them and their surroundings) have many potential advantages over those who do not. For example, individuals from data-rich backgrounds could have improved health outcomes, increased access to financial services, enhanced educational opportunities, and even more civic participation. “If certain groups are routinely excluded from data sets, their problems may be overlooked and their communities held back in spite of progress elsewhere,” he adds.
“Communities that are poor in data, as well as the individuals living in those communities, may fail to thrive. Rather than being the new oil, data may be the new oxygen,” Castro says.
A good example of the controversy currently surrounding the collection and analysis of big data is an ongoing initiative on the part of the Oregon Department of Transportation in Portland, Oregon. Portland is a city known for its bike friendliness. “Hawthorne Bridge, one of the city’s five bike-friendly bridges, averages 1.7 million trips a year, and the city boasts 300 miles of bike lanes,” according to The Verge, which first published the story.
Yet, up until recently, the city was still utilizing the most rudimentary methods for collecting data on cyclists and the trips they make. The city was sending out volunteers to count riders at various intersections around the city; as a result, the department’s data on cyclists has historically been piss poor. Without better information, notes The Verge, “it’s difficult to improve upon what already exists.”
That is, until Margi Bradway, who serves as ODOT’s active transportation policy lead, noticed that all of her bike enthusiast friends were using an app to track their routes. That app was Strava, an app that users can download onto any GPS device (including their smartphones) and use to track their running or cycling activities. Strava describes itself as “a community of athletes from all over the world … Strava lets you track your rides and runs via your iPhone, Android, or dedicated GPS device and helps analyze and quantify your performance.”
Many cycling advocates, however, say that licensing Strava data to make Portland more bike-friendly is leaving too many people out of the equation. That includes commuters, who are perhaps less likely to use an app that describes itself as being for “athletes,” as well as people who don’t have or can’t afford a smartphone, “a technology that still isn’t affordable to everyone.”
Elly Blue, a Portland resident, bike activist, and author of Bikenomics: How Bicycling Can Save the Economy, says that the “people being counted by Strava are those who already have a powerful voice in bicycle advocacy and whose needs are already well on their way to being met.”
Bradway, of ODOT, says she understands that utilizing Strava isn’t a panacea for the city’s data deficit when it comes to cyclists. “Don’t let perfect be the enemy of good,” she said, in an interview, per The Verge. Meanwhile, other cities are following Portland’s example; Strava says it has already partnered with 15 other cities around the world in order to implement similar programs.
The emergence of big data has given policymakers and consumers alike a lot to think about, but perhaps one of the most profound challenges facing those who might act on the information big data provides is representation and data poverty. Castro notes that in order to avoid data deserts, “policymakers should begin a concerted effort to address the ‘data divide’ — the social and economic inequalities that may result from lack of collection or use of data about individuals or communities.”
But Castro notes that while disparities in data are certainly a problem within the burgeoning field of data science, local governments and businesses shouldn’t take that to mean that data-driven solutions should be avoided. “The solution to the data divide is not to take a step back from using data.” Instead, he says, businesses, policymakers, and others who seek data-driven solutions need to use their creativity to ensure that no groups are systematically excluded. If the divide is addressed, everyone will benefit from the social and economic improvements data has the power to inform.