In light of the U.S. Department of Justice’s report describing racism in the Ferguson, Mo., police department and the recent publicized deaths of unarmed citizens at the hands of police, American police are under much scrutiny. The president even launched a Task Force on 21st Century Policing in December, looking for, among other information, more data on the number of people killed by police. Unfortunately, this is a number that no one knows.

The Bureau of Justice Statistics’ latest report found that more than 25% of killings by police are not recorded in either of two federal databases (its own Arrest-Related Deaths database and the FBI’s Supplementary Homicide Reports). The report suggested that the actual number of annual police killings is probably around 930, about twice what the government has estimated. And FiveThirtyEight, using data from Killed By Police, suggests it’s even more than that. It estimates about 1,240 annual killings, “if you assume that local law enforcement agencies that don’t report any killings have killed people at the same rate as agencies that do.”

So what’s the deal with the conflicting numbers? Why isn’t such an important set of data accounted for?

Why are these deaths so hard to account for?

The BJS report showed how much data is missing, noting that the FBI tally “is estimated to cover 46% of officer-involved homicides at best” for the years 2003-2009 and 2011. And the FBI’s published data covers even less, more like 41%, in part because it doesn’t include data from Florida. Meanwhile, the BJS’s Arrest-Related Deaths database recorded at best about 49% of law enforcement homicides during the same years.

To get a more accurate number for its new report, the BJS merged the two databases of killings by police, its Arrest-Related Deaths database and the FBI’s Supplementary Homicide Reports. It estimated the number of cases that appeared in both databases, so it wouldn’t count any case twice, and it also estimated the number of cases that were in neither database. But it had to make big assumptions to do so, and the researchers had no way of knowing whether the data sets they were looking at were independent or where they overlapped. And the BJS was working with only two data sets. With more data, coming to a more accurate estimate would’ve been easier.
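As a rough consistency check on those coverage figures, the short Python sketch below assumes the two databases miss cases independently of each other, which is exactly the assumption the researchers could not verify. The variable names are illustrative and do not come from the BJS report. Under that assumption, the share of killings missing from both databases is the product of the share each one misses.

```python
# Best-case coverage figures quoted above for 2003-2009 and 2011.
fbi_coverage = 0.46   # FBI Supplementary Homicide Reports
bjs_coverage = 0.49   # BJS Arrest-Related Deaths database

# ASSUMPTION: the two databases miss cases independently of each other.
# The share missing from both is then the product of the two miss rates.
missed_by_both = (1 - fbi_coverage) * (1 - bjs_coverage)

print(f"Share of killings in neither database: {missed_by_both:.1%}")
```

The result is roughly 27.5%, in line with the report’s finding that more than 25% of killings appear in neither database. If agencies that fail to report to one database also tend to skip the other, the true share would be higher.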

Researchers estimate the gaps in data like this by using a method called capture-recapture. Here’s a description of how that process works from FiveThirtyEight:

One way to do that is to compare more than one database. Suppose that the name, age and hometown of each person killed by police in the U.S. last year were written on separate pieces of paper and then each piece of paper was folded up and placed in the middle of a small ball. Then all the balls, representing all the people killed by police, were placed in a large barrel. Suppose we didn’t have the time and resources to count all the balls. So we try another way. First, we send an FBI analyst to the barrel. She collects 10 of the balls at random, removes the pieces of paper and makes a list with each of the victim’s information. Then she puts the balls back in the barrel and stirs them in. Now a BJS analyst approaches. He removes 10 balls randomly and takes down the information of all those victims before returning the balls.

The more overlap seen on the lists, the fewer balls are in the barrel (the fewer people killed by police). If the lists are the same, then there are only 10 victims. But if there are five victims that appeared on both lists, there are about 20 victims, and if only one victim appears on both lists, there are about 100 victims. “And if there is no overlap — if each analyst’s list is entirely different from the other’s — then we can’t say much about the number of victims, although because of the way the math works out, there are probably at least 200,” FiveThirtyEight concludes.
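The arithmetic in the barrel example can be reproduced with the simplest two-list capture-recapture formula, often called the Lincoln-Petersen estimator: multiply the sizes of the two lists and divide by the number of names that appear on both. The Python sketch below is only an illustration of that formula, with a hypothetical function name. It is not the more elaborate procedure the BJS used, and it assumes both lists are random, independent draws from the same population.

```python
def estimate_total(first_list_size, second_list_size, overlap):
    """Simple two-list capture-recapture (Lincoln-Petersen) estimate.

    ASSUMPTION: both lists are random, independent samples of the same
    population, as in the barrel example.
    """
    if overlap == 0:
        # With no shared names the simple formula divides by zero, so it
        # cannot produce an estimate on its own.
        raise ValueError("no overlap: the simple estimator is undefined")
    return first_list_size * second_list_size / overlap


# Each analyst draws 10 balls; vary how many victims appear on both lists.
for overlap in (10, 5, 1):
    total = estimate_total(10, 10, overlap)
    print(f"{overlap} shared names: about {total:.0f} victims")
```

With 10, 5 and 1 shared names, the estimates are 10, 20 and 100 victims, matching the figures above. The formula breaks down when the lists share no names at all, which is why the zero-overlap case can only be given a rough lower bound.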

How can we improve this data collection?

The fact that the BJS attempted such an estimate was a step in the right direction, according to Patrick Ball, co-author of a critique of the BJS’s report and executive director of the Human Rights Data Analysis Group. Ball told FiveThirtyEight that it’s not the BJS’s fault that it underestimated the number; it’s normal for government agencies not to audit their own data or review records that are often missing information.

Clearly, given the method used to estimate the number, there’s a strong need for more data. President Barack Obama’s task force recently issued recommendations for better data collection. “There was a great emphasis on the need to collect more data,” Obama said. “Right now, we do not have a good sense, and local communities do not have a good sense, of how frequently there may be interactions with police and community members that result in a death, result in a shooting.”

FiveThirtyEight suggests that the BJS and FBI databases could be cross-referenced with other data collections, pointing again to the independently operated Killed By Police database, which draws on media reports and counts more than 1,000 killings annually, as well as other government databases that include police killings, like the CDC’s.

