Last week, when news broke (again) that Cambridge Analytica had allegedly misused 50 million Facebook users’ data, it immediately raised a difficult question: When a company possesses information about some 2 billion people, is its chief obligation to share that information, or protect it?
The answer’s not as obvious as you might think. To social and computer scientists, Facebook is arguably the most valuable data repository on earth. Insight into many of the most pressing issues of our time, from social media’s role in political processes to technology’s impact on individual wellbeing, could well reside on the social network’s servers—a fact that has led many scientists and policymakers to call for more permeable borders between public researchers and Facebook’s private data hoard.
But then Cambridge Analytica happened, and gave a lot of researchers a scare: Tapping into Facebook’s data is already more onerous than many of them would like. How would the company’s reaction to one of its most devastating public disasters to date affect their access going forward?
On Wednesday, they got their first trace of an answer. In a post published to his Facebook wall, Mark Zuckerberg acknowledged his company’s failure to protect its users’ data and outlined how the social network would protect that data going forward. Of relevance to researchers is the second step in Facebook’s three-part plan:
“…we will restrict developers’ data access even further to prevent other kinds of abuse. For example, we will remove developers’ access to your data if you haven’t used their app in 3 months. We will reduce the data you give an app when you sign in – to only your name, profile photo, and email address. We’ll require developers to not only get approval but also sign a contract in order to ask anyone for access to their posts or other private data. And we’ll have more changes to share in the next few days.”
To understand how these changes could affect research, it helps to understand the ways scientists can currently gain access to Facebook’s user data.
Method one is obvious: Work at Facebook. Companies used to farm out data analysis to third parties, but Facebook can afford to employ psychologists, sociologists, and data scientists full time. That’s not to say company employees are given carte blanche: No Facebook employee has access to all the company’s data all the time. To its credit, Facebook has an internal ethics review process for vetting company research, which means access to information is granted on a need-to-use basis and subject to auditing. (This policy was created in reaction to Facebook’s notorious emotional contagion study, another of the company’s missteps.) According to a former member of Facebook’s Core Data Science team who declined to be named for this story, this process is rigorously policed. “If you went digging around in anything you clearly weren’t supposed to, you’d be fired.”
Method two is to partner with Facebook as a collaborator in some official capacity, which can mean a number of different things. For example, Facebook recently granted Stanford economist Raj Chetty and a team of researchers access to a trove of de-identified user data, to aid his investigations into income inequality in America. (Those shocking New York Times infographics on the fate of black boys in America? They’re based on Chetty’s work.)
But Chetty’s relationship with Facebook is unusual. One of the more typical ways that Facebook collaborates with academia is to hire PhD candidates as paid interns, making them fixed-term, full-time employees. That also means they sign all the things employees sign, including intellectual property and nondisclosure agreements. The PhD student may spend a lot of time with raw data, but any and all analysis happens entirely on Facebook infrastructure. And when the candidate’s internship comes to an end, Facebook’s privacy and policy teams ensure that any data they take with them, to continue working toward a publication, for example, exists only in aggregate form. (Another related mode of collaboration is to hire researchers as contractors; they, too, get access, and NDAs.)
Method three is to tap into Facebook’s data through its API. This is how psychologist Aleksandr Kogan acquired information on some 50 million Facebook users, via an app-based quiz called thisismydigitallife. Researchers still use apps to collect data from users who agree to participate in studies, albeit far less data than they could collect when Kogan did so, back in 2014. The language in Zuckerberg’s post indicates Facebook will restrict app-based data access even further going forward, though it doesn’t specify the extent of that restriction, or how it will apply to researchers specifically.
“I think the primary concern for Facebook with that sort of thing would be more with respect to app developers and corporations, but it could certainly affect scientists as a byproduct,” says MIT social scientist Dean Eckles, whose research often incorporates data from Facebook. It could also mean more paperwork for anyone looking to access data through the API—which isn’t necessarily a bad thing; legitimate researchers, Eckles says, “will be largely willing to jump through those hoops.”
Less clear is how Facebook’s restrictions will impact researchers not conducting app-based research. Solomon Messing, director of Pew Research Center’s Data Labs, relies on Facebook’s API to study the effects of congressional rhetoric on the platform. “We’re interested in constituent communication, how their audience responds to different forms of messages, and the API gives us the ability to get the text of what members say and user engagement data—likes, comments, shares, stuff like that.” He says whether Facebook’s crackdown affects his research will depend on specifics—how the company chooses to restrict data access, and what kinds of approval processes he’ll need to go through moving forward. “But I would hope that academic researchers will be able to access whatever is necessary for them to do their research in an ethical and legal way.”
Oh, and some transparency would be nice, too, says Catherine Brooks, director of the Center for Digital Society and Data Studies at the University of Arizona. “There’s an entire world of knowledge inside Facebook, and questions we can’t answer unless we have access to the data,” she says. “But the public also needs more information about how that data is being collected and used, and an opportunity to consent to provide it.”
The challenge for Facebook will be allocating the resources necessary to make it happen. The company currently possesses some of the best social data in the world. Is its chief obligation to share it or protect it? Responsible researchers think the answer is yes.