The coronavirus pandemic has spurred interest in using big data to track the spread of the fast-moving pathogen and to plan disease prevention efforts. But the urgent need to contain the outbreak shouldn’t cloud thinking about big data’s potential to do more harm than good.
Companies and governments worldwide are tapping the location data of millions of internet and mobile phone users for clues about how the virus spreads and whether social distancing measures are working. Unlike surveillance measures that track the movements of particular individuals, these efforts analyze large data sets to uncover patterns in people’s movements and behavior over the course of the pandemic.
In the US, mobile advertising companies are reportedly working with the Centers for Disease Control and Prevention and state and local governments to analyze cell phone location data showing how people’s movements have changed and where they are still congregating. Google has launched Community Mobility Reports based on the location data of Google Maps users to provide insights into how Covid-19 measures such as social distancing are working. Under its revamped Disease Prevention Maps initiative, Facebook is providing its research partners with data on population movement and friendship patterns to predict disease spread and compliance with public health measures.
As attractive as these projects might seem, companies and governments should ask whether they will deliver the public health benefits they promise, or misdirect government efforts in ways that endanger the rights of the poorest and most vulnerable people.
The 2014–2016 Ebola epidemic in West Africa offers a cautionary tale on big data. During the outbreak, Harvard-based computational epidemiologists obtained the call records of mobile phone users across the region in a bid to predict the spread of the virus and help public health authorities better target disease-prevention measures. However, this analysis may have been based on the wrong assumption that people’s movements were the primary vector of Ebola transmission, when in fact the virus was primarily spread through caring for the sick and during funeral preparations.
Research on cell phone usage patterns also casts doubt on the theory that call detail records are reliable for tracking people’s movements, even at an aggregate level. In West Africa, many cell phone users own multiple phones to manage various professional, social, and personal roles, and they may share them widely with family, friends, or even entire neighborhoods.
These miscalculations illuminate a broader problem: Big data can obscure or misrepresent complex social realities, with dangerous consequences for both public health and human rights.
In the US, lower social media and cell phone penetration rates among older people and rural populations may distort efforts to divine people’s movements from mobile data, and end up providing a flawed basis for understanding how disease spreads within communities and the measures required to slow transmission. Environmental factors that degrade the accuracy of location data, such as the presence of high-rise buildings, could further undermine this analysis.
Mobility patterns captured by such data also reveal little about why people are moving despite shelter-in-place orders and other restrictions on movement. While it may be tempting to tighten the enforcement of social-distancing measures in low-income areas with stubbornly high levels of traffic, this might disproportionately penalize those looking for shelter, traveling to food banks, or seeking reprieve from dangerously cramped quarters.
Big data’s blind spots could lead public health authorities astray, diverting critical resources from proven containment methods such as aggressive testing. They could also lead to draconian restrictions that disproportionately impact the rights of those under- or misrepresented by the data. In Israel, the government’s cell phone location-tracking program has prompted complaints that the authorities are erroneously confining people to their homes based on inaccurate location data.
While the capacity of big data to help curb the coronavirus outbreak is, at best, uncertain, its risks to privacy are immense. Governments and companies have cited the anonymization of personal data as a key privacy safeguard, but multiple studies show that this may only delay, rather than prevent, re-identification. Location data is particularly vulnerable, since it can be combined with public and private records to create an intricate and revealing map of a person’s movements, associations, and activities.
Google and Facebook say that their initiatives merely disclose aggregated insights into people’s behaviors, not detailed location histories. While data aggregation may be better for privacy, it should be accompanied by other safeguards, such as limits on who has access to data and for what purpose, deletion requirements, and sunset clauses. However, data-sharing practices in the technology sector historically have lacked transparency, making it difficult for data subjects and the broader public to determine whether these safeguards exist, or how stringently they are enforced.
Judicious reliance on data-driven technologies in the current crisis can improve our understanding of the disease, broaden access to health care, and help us stay connected. But the impulse to harness data for good should not be a license to conduct risky experiments that sacrifice privacy and civil liberties without any clear payoff, causing harms that could persist well after the crisis is over.