We cannot definitively rule out the possibility that an omitted variable is biasing our estimates, although we have included controls for many individual-level characteristics theoretically related to acceptance of misinformation and willingness to share content online. Even if our models are correctly specified, we use observational data that cannot provide causal evidence on the determinants of fake news sharing. This study takes advantage of a novel and powerful dataset combining survey responses and digital trace data that overcomes well-known biases in sample selection and self-reports of online behavior (8, 9). However, we are still limited in our ability to collect these data unobtrusively. Despite our high response rate, half of our respondents with Facebook accounts opted not to share their profile data with us. Any inferences are therefore limited insofar as the likelihood of sharing data is correlated with other characteristics of interest.
In addition, while our approach allows for enhanced measurement of online sharing behavior, we lack data on the composition of respondents' Facebook News Feeds. It is possible, for instance, that very conservative Facebook users were exposed to more fake news articles in their networks and that the patterns we observe are not due to differential willingness to believe or share false content across the political spectrum. While other evidence suggests the limits of the "echo chambers" narrative (23), we cannot rule out this possibility. Similarly, the composition of older Facebook users' News Feeds may differ in systematically important ways from that of younger users; while we lack the data in the present work to test this proposition, future research with access to these data could prove illuminating. These concerns aside, the evidence we have presented strongly suggests a relationship between not only ideological affiliation but also age and the still-rare tendency to spread misinformation to one's friends on social media platforms. If the association with age holds in future studies, a host of questions about the causes, mechanisms, and ramifications of the relationship remain for researchers to explore. First, how much of the effect can be attributed to a lack of digital or media literacy as opposed to explanations rooted in cognitive abilities and memory? Answering this question will require developing measures of digital literacy that can be behaviorally validated. Second, what is the role of the (currently) unobserved News Feed and social network environment in people's tendency to see, believe, and spread dubious content? How, if at all, are consumption and sharing related? How does social trust over networks mediate the relationship between age and the sharing of misinformation?
Last, if media literacy is a key explanatory factor, then what are the interventions that could effectively increase people’s ability to discern information quality in a complex, high-choice media environment replete with contradictory social and political cues? Both theory and existing curricula could serve as the basis of rigorously controlled evaluations, both online and in the classroom, which could then help to inform educational efforts targeted at people in different age groups and with varying levels of technological skill. These efforts leave open the possibility that simple interventions, perhaps even built into online social environments, could reduce the spread of misinformation by those most vulnerable to deceptive content. Developing these innovations would be further aided by increased cooperation between academic researchers and the platforms themselves (24).
MATERIALS AND METHODS
Survey data
We designed and conducted a panel survey (fielded by the online polling firm YouGov) during the 2016 U.S. presidential election to understand how social media use affects the ways that people learn about politics during a campaign. In addition to a rich battery of individual-level covariates describing media and social media use, we were able to match many respondents to data on their actual Facebook behavior (see below). The survey had three waves: wave 1 was fielded 9 April to 1 May 2016 (3500 respondents), wave 2 was fielded 9 September to 9 October 2016 (2635 respondents), and wave 3 was fielded 25 October to 7 November 2016 (2628 respondents).
Facebook profile data
We were able to obtain private Facebook profile data from a substantial subset of our survey respondents. Starting 16 November 2016, YouGov e-mailed all respondents with a request to temporarily share information from their Facebook profiles with us. That request read, in part: “We are interested in the news people have read and shared on Facebook this year. To save time and to get the most accurate information, with your permission we can learn this directly from Facebook. Facebook has agreed to help this way. Of course, we would keep this information confidential, just like everything else you tell us.”
Those who consented were asked to provide access to a Facebook web application and to specifically check which types of information they were willing to share with us: fields from their public profile, including religious and political views; their own timeline posts, including external links; and "likes" of pages. We did not have access to the content of people's News Feeds or information about their friends. Respondents read a privacy statement informing them that they could deactivate the application at any time and that we would not share any personally identifying information. The app provided access for up to 2 months after respondents consented to share their data. This data collection was approved by the New York University Institutional Review Board (IRB-12-9058 and IRB-FY2017-150).
Of 3500 initial respondents in wave 1, 1331 (38%) agreed to share Facebook profile data with us. The proportion rises to 49.1% when we consider that only 2711 of our respondents said that they use Facebook at all. We successfully linked profile data for 1191 survey respondents, approximately 44% of the Facebook users in our sample. (See the "Sample details" section for a comparison of sample characteristics on various demographic and behavioral dimensions. Linked respondents were somewhat more knowledgeable and engaged in politics than those who did not share data.) For the purposes of this analysis, we parsed the raw Facebook profile data and identified the domains of any links posted by respondents to their own timelines.
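To make this parsing step concrete, the following is a minimal Python sketch of reducing respondents' posted links to domains; the `timeline_links` structure, respondent identifier, and URLs are hypothetical stand-ins for the raw profile data.

```python
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    """Reduce a link to its bare domain, e.g.
    'https://www.example.com/story?id=1' -> 'example.com'."""
    netloc = urlparse(url).netloc.lower()
    # Treat 'www.example.com' and 'example.com' as the same domain.
    if netloc.startswith("www."):
        netloc = netloc[4:]
    return netloc

# Hypothetical example: one respondent's timeline links mapped to domains.
timeline_links = {
    "respondent_001": [
        "https://www.nytimes.com/2016/11/01/us/politics/story.html",
        "http://abcnews.com.co/obama-signs-executive-order",
    ],
}
domains_by_respondent = {
    rid: [extract_domain(u) for u in urls]
    for rid, urls in timeline_links.items()
}
print(domains_by_respondent)
# {'respondent_001': ['nytimes.com', 'abcnews.com.co']}
```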
We did not have access to posts that respondents deleted before consenting to temporarily share their data with us. It is theoretically possible that some respondents posted fake news articles to their profiles and then deleted them before we had the opportunity to collect the data. To the extent that this activity reflects second-guessing or awareness of how fake news posting is perceived by social connections, we interpret the sharing data that we were able to gather as genuine—posts that our respondents did not feel compelled to remove at a later time. There may be an additional concern that some types of people were more likely to delete fake news articles that they posted, leading us to biased inferences. However, to the extent that these characteristics are negatively correlated with the characteristics that predict posting in the first place, such deletion activity (which is likely very rare) should reduce noise in the data that would otherwise be generated by split-second sharing decisions that are immediately retracted.
Defining fake news
The term fake news can be used to refer to a variety of different phenomena. Here, we largely adopted the definition suggested in (25): knowingly false or misleading content created largely for the purpose of generating ad revenue. Given the difficulty of establishing a commonly accepted ground-truth standard for what constitutes fake news, our approach was to build on the work of both journalists and academics who documented the prevalence of this content over the course of the 2016 election campaign. In particular, we used a list of fake news domains assembled by Craig Silverman of BuzzFeed News, the primary journalist covering the phenomenon as it developed (7). As a robustness check, we constructed alternate measures using a list curated by Allcott and Gentzkow (2), who combined multiple sources across the political spectrum (including some used by Silverman) to generate a list of fake news stories specifically debunked by fact-checking organizations.
The Silverman list is based on the most-shared web domains during the election campaign as determined by the analytics service BuzzSumo. Silverman and his team followed up their initial results with in-depth reporting to confirm whether a domain had the hallmark features of a fake news site: lacking a contact page, featuring a high proportion of syndicated content, being relatively new, etc. We took this list and removed all domains classified as "hard news" via the supervised learning technique used by Bakshy et al. (23) to focus specifically on fake news domains rather than the more contested category of "hyperpartisan" sites (such as Breitbart). (The authors used section identifiers associated with hard news in article URLs shared on Facebook, such as "world" and "usnews," to train a machine learning classifier on text features. They ultimately produced a list of 495 domains comprising both mainstream and partisan websites that produce and engage with current affairs.) The resulting list contains 21 mostly pro-Trump domains, including well-known purveyors such as abcnews.com.co, the Denver Guardian, and Ending the Fed. In analyses using this list, we counted any article from one of these domains as a fake news share. (See below for details on these coding procedures and a list of domains in what we refer to as our main BuzzFeed-based list.)
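As an illustration of this domain-level coding rule, here is a minimal sketch assuming each respondent's shared links have already been reduced to domains as above; the three entries shown are an illustrative subset, and the domain spellings for the Denver Guardian and Ending the Fed are our assumptions, not the full published list.

```python
# Illustrative subset; the paper's BuzzFeed-based list contains 21 domains.
FAKE_NEWS_DOMAINS = {"abcnews.com.co", "denverguardian.com", "endingthefed.com"}

def count_fake_news_shares(shared_domains):
    """Domain-level rule: every link from a listed domain counts as one share."""
    return sum(1 for domain in shared_domains if domain in FAKE_NEWS_DOMAINS)

# A respondent who shared three links, one from a listed domain -> 1.
print(count_fake_news_shares(["nytimes.com", "abcnews.com.co", "cnn.com"]))
```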
The Allcott and Gentzkow list begins with 948 fact checks of false stories from the campaign. We retrieved the domains of the publishers originating the claims and again removed all hard news domains as described above. We then coded any article from this set of domains as a fake news article. For robustness, in table S9, we instead counted only exact URL matches to any of the 948 entries in the Allcott and Gentzkow list, a more restrictive definition of fake news that does not require assuming that every article from a "fake news domain" should be coded as fake news. Because the list contains the researchers' manual coding of the slant of each article, we also presented models analyzing pro-Trump and pro-Clinton fake news sharing activity separately.
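The exact-URL robustness check can be sketched in the same way; the normalization rule below (lowercasing, stripping "www.", query strings, and trailing slashes) is our assumption about a reasonable matching procedure rather than the authors' published code.

```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    """Canonicalize a URL to domain + path for exact matching."""
    parsed = urlparse(url.strip().lower())
    netloc = parsed.netloc
    if netloc.startswith("www."):
        netloc = netloc[4:]
    return netloc + parsed.path.rstrip("/")

def count_exact_matches(shared_urls, debunked_urls):
    """Restrictive rule: a share counts only if its normalized URL
    exactly matches one of the fact-checked story URLs."""
    debunked = {normalize_url(u) for u in debunked_urls}
    return sum(1 for u in shared_urls if normalize_url(u) in debunked)
```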
Additional lists
In addition to these primary measures, we report (below) analyses using three supplementary collections of fake news articles produced after the election. Two lists were also produced by Silverman and his team at BuzzFeed (26), and the third comes from a crowdsourced effort headed by Melissa Zimdars of Merrimack College. Our key results are essentially invariant across these measures of fake news.