Facebook Research Critiques


Engineers at Facebook have worked to continually refine the ‘EdgeRank’ algorithm over the last five or six years. They are addressing the problem of how to filter the 1,500+ pieces of content available at any moment from “friends, people they follow and Pages” down to a more manageable 300 or so pieces of content. Questions have been asked about how EdgeRank functions by two related groups. Marketers and the like are concerned about the ‘reach’ and ‘engagement’ of their content. Political communication researchers have been concerned about how this selection of content (1,500 > 300) relies on certain algorithmic signals that potentially reduce the diversity of sources. These signals are social and practice-based (or what positivists would call ‘behavioral’). Whenever Facebook makes a change to its algorithm it measures success by the increase in ‘engagement’ (I have not seen a reported ‘failure’ of a change to the algorithm), meaning interactions by users with content, including ‘click-through rate’. Facebook is working to turn your attention into an economic resource by manipulating the value of your attention through your News Feed and then selling access to your News Feed to advertisers.
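Facebook has never published the algorithm's internals, so the following is only a schematic sketch of the filtering described above: score each candidate story on social and practice-based signals, then keep the top ~300 of the ~1,500 candidates. The signal names (affinity, weight, time decay) follow the commonly reported description of EdgeRank, but the scoring rule and field names here are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class CandidateStory:
    story_id: str
    affinity: float    # how often the viewer interacts with this source (illustrative signal)
    weight: float      # how much interaction the story itself has attracted (illustrative signal)
    time_decay: float  # 1.0 = just posted, approaching 0 as the story ages (illustrative signal)

def score(story: CandidateStory) -> float:
    # Placeholder scoring rule; the real signals and weights are not public.
    return story.affinity * story.weight * story.time_decay

def build_news_feed(candidates: list[CandidateStory], feed_size: int = 300) -> list[CandidateStory]:
    """Filter the ~1,500 candidate stories down to the ~300 highest-scoring ones."""
    return sorted(candidates, key=score, reverse=True)[:feed_size]
```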

The “random sample of 7000 Daily Active Users over a one-week period in July 2013” has produced many of the figures used in various online news reports on Facebook’s algorithm (via TechCrunch).

Exposure to ideologically diverse news and opinion on Facebook

Recently published research by three Facebook researchers was designed to ascertain the significance of the overall selection of content by the EdgeRank algorithm. They compared two large datasets. The first dataset consisted of pieces of content shared on Facebook, specifically ‘hard’ news content. Through various techniques of text-based machine analysis they placed these pieces of content along a single political spectrum from ‘liberal’ to ‘conservative’. This dataset was selected from “7 million distinct Web links (URLs) shared by U.S. users over a 6-month period between July 7, 2014 and January 7, 2015”. The second dataset consisted of 10.1 million active ‘de-identified’ individuals who ‘identified’ as ‘conservative’ or ‘liberal’. Importantly, it is not clear whether they only included ‘hard news’ articles shared by those in the second set; the data represented in the appended supplementary material suggests that this was not the case. There are therefore two ways the total aggregate Facebook activity and user base was ‘sampled’ in the research. The researchers combined these two datasets to get a third dataset of event-based activity:

This dataset included approximately 3.8 billion unique potential exposures (i.e., cases in which an individual’s friend shared hard content, regardless of whether it appeared in her News Feed), 903 million unique exposures (i.e., cases in which a link to the content appears on screen in an individual’s News Feed), and 59 million unique clicks, among users in our study.

These events — potential exposures, unique exposures and unique clicks — are what the researchers seek to understand in terms of how often ‘cross-cutting’ content (content that cuts across ideological lines) appears and is then engaged with by certain users.
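To put those three event types in proportion, a quick calculation from the quoted aggregate figures shows how sharply the funnel narrows at each stage (a sketch using only the numbers quoted above; per-user rates are not reported there):

```python
# Aggregate funnel implied by the figures quoted above.
potential_exposures = 3_800_000_000  # a friend shared hard content
exposures = 903_000_000              # the content appeared in a News Feed
clicks = 59_000_000                  # the content was clicked

print(f"exposed / potential: {exposures / potential_exposures:.1%}")  # ~23.8%
print(f"clicked / exposed:   {clicks / exposures:.1%}")               # ~6.5%
print(f"clicked / potential: {clicks / potential_exposures:.1%}")     # ~1.6%
```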

The first round of critiques of this research (here, here, here and here) focuses on various aspects of the study, but they all resonate with a key critical point (as distinct from a critique of the study itself): that the research is industry-backed and therefore suspect. I have issues with the study, which I address below, but they are not based on it being an industry study. Is our first response to find any possible reason for being critical of Facebook’s own research simply because it is ‘Facebook’?

Is the study scientifically valid?

The four critiques that I have linked to make critical remarks about the sampling method, specifically how the dataset of de-identified politically-identifying Facebook users was selected. The main article is confusing, and it is only marginally clearer in the appendix, but it appears that both samples were validated against the broader US-based Facebook user population and the total set of news article URLs shared, respectively. This seems clear to me, and I am disconcerted that it is not clear to those others who have read and critiqued the study. The authors discuss validation, specifically in section 1.2 for the user population sample and section 1.4.3 for the validation of the ‘hard news’ article sample. I have my own issues with the (ridiculously) normative approach used here (the multiplicity of actual existing entries for political orientation is reduced to a single five-point continuum of liberal and conservative, just… what?), but that is not the basis of the existing critiques of the study.

Eszter Hargittai’s post at Crooked Timber is a good example. Let me reiterate that if I am wrong about how I am interpreting these critiques and the study, then I am happy to be corrected. Hargittai writes:

Not in the piece published in Science proper, but in the supplementary materials we find the following:

All Facebook users can self-report their political affiliation; 9% of U.S. users over 18 do. We mapped the top 500 political designations on a five-point, -2 (Very Liberal) to +2 (Very Conservative) ideological scale; those with no response or with responses such as “other” or “I don’t care” were not included. 46% of those who entered their political affiliation on their profiles had a response that could be mapped to this scale.

To recap, only 9% of FB users give information about their political affiliation in a way relevant here to sampling and 54% of those do so in a way that is not meaningful to determine their political affiliation. This means that only about 4% of FB users were eligible for the study. But it’s even less than that, because the user had to log in at least “4/7 days per week”, which “removes approximately 30% of users”.

Of course, every study has limitations. But sampling is too important here to be buried in supplementary materials. And the limitations of the sampling are too serious to warrant the following comment in the final paragraph of the paper:

we conclusively establish that on average in the context of Facebook, individual choices (2, 13, 15, 17) more than algorithms (3, 9) limit exposure to attitude-challenging content.

How can a sample that has not been established to be representative of Facebook users result in such a conclusive statement? And why does Science publish papers that make such claims without the necessary empirical evidence to back up the claims?

The second paragraph above continues with a further sentence suggesting that the sample was indeed validated against a survey of 79 thousand other US Facebook users. Again, I am happy to be corrected here, but this at least indicates that the study authors have attempted to do precisely what Hargittai and the other critiques suggest they have not done. From the appendix of the study:

All Facebook users can self-report their political affiliation; 9% of U.S. users over 18 do. We mapped the top 500 political designations on a five-point, -2 (Very Liberal) to +2 (Very Conservative) ideological scale; those with no response or with responses such as “other” or “I don’t care” were not included. 46% of those who entered their political affiliation on their profiles had a response that could be mapped to this scale. We validated a sample of these labels against a survey of 79 thousand U.S. users in which we asked for a 5-point very-liberal to very-conservative ideological affiliation; the Spearman rank correlation between the survey responses and our labels was 0.78.

I am troubled that other scholars are so quick to condemn a study as not valid when it does not appear as if any of the critiques (at the time of writing) attempt to engage with the methods by which the study authors tested validity. Tell me it is not valid by addressing the ways the authors attempted to demonstrate validity; don’t just ignore it.
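For readers who want to see what that validation step amounts to, here is a minimal sketch of the kind of check the appendix describes: compare the five-point labels mapped from profile text with the five-point survey answers for the same users and report the Spearman rank correlation. The arrays below are invented placeholders, not the study’s data (the paper reports a correlation of 0.78 on roughly 79,000 users).

```python
# Minimal sketch of the validation check described in the appendix.
# The two arrays are invented placeholders standing in for the same users'
# profile-derived labels and survey responses on the -2..+2 scale.
from scipy.stats import spearmanr

profile_labels   = [-2, -1, 0, 1, 2, -2, 1, 0, 2, -1]
survey_responses = [-2, -1, 0, 2, 2, -1, 1, -1, 2, 0]

rho, p_value = spearmanr(profile_labels, survey_responses)
print(f"Spearman rank correlation: {rho:.2f}")
```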

What does the algorithm do?

A more sophisticated “It’s Not Our Fault…” critique is presented by Christian Sandvig. He notes that the study does not take into account how the presentation of News Feed posts and then ‘engagement’ with this content is a process in which the work of the EdgeRank algorithms and the work of users cannot be easily separated (orig. emphasis):

What I mean to say is that there is no scenario in which “user choices” vs. “the algorithm” can be traded off, because they happen together (Fig. 3 [top]). Users select from what the algorithm already filtered for them. It is a sequence.**** I think the proper statement about these two things is that they’re both bad — they both increase polarization and selectivity. As I said above, the algorithm appears to modestly increase the selectivity of users.

And the footnote:

**** In fact, algorithm and user form a coupled system of at least two feedback loops. But that’s not helpful to measure “amount” in the way the study wants to, so I’ll just tuck it away down here.

A “coupled system of at least two feedback loops”, indeed. At least one of those feedback loops ‘begins’ with the way that users form social networks — that is to say, ‘friend’ other users. Why is this important? Our Facebook ‘friends’ (and Pages and advertisements, etc.) serve as the source of the content we are exposed to. Users choose to friend other users (or Pages, Groups, etc.) and then select from the pieces of content these other users (and Pages, advertisements, etc.) share to their networks. That is why I began this post with a brief explanation of the way the EdgeRank algorithm works: it filters an average of 1,500 possible posts down to an average of 300. Sandvig’s assertion that “[u]sers select from what the algorithm already filtered for them” is therefore only partially true. The Facebook researchers assume that Facebook users have chosen the sources of news-based content that can contribute to their feed. This is a complex set of negotiations around who or what has the ability, and then the likelihood, of appearing in one’s feed (or what could be described as all the options for organising the conditions of possibility for how content appears in one’s News Feed).

The study is testing the work of the algorithm by comparing the ideological consistency of one’s social networks with the ideological orientation of the stories presented and of the news stories’ respective news-based media enterprises. The study tests the hypothesis that your ideologically-oriented ‘friends’ will share ideologically-aligned content. Is the number of stories presented from across the ideological range — liberal to conservative — (based on an analysis of the ideological orientation of each news-based media enterprise’s URL) different from what the apparent ideological homophily of your social network would predict? If so, then this is the work of the algorithm. The study finds that the algorithm works differently for liberal- and conservative-oriented users.
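A rough sketch of that comparison logic (not the study’s code, and with invented counts) is to track the share of cross-cutting content at each stage of the funnel: what friends share, what the News Feed shows, and what the user clicks. The drop between the first two stages is what the study attributes to the algorithm; the drop between the last two is attributed to individual choice.

```python
from dataclasses import dataclass

@dataclass
class StageCounts:
    total: int          # items at this stage
    cross_cutting: int  # items whose source alignment opposes the user's

def cross_cutting_share(stage: StageCounts) -> float:
    return stage.cross_cutting / stage.total

# Invented aggregate counts for a hypothetical group of liberal-identifying users.
potential = StageCounts(total=1_000_000, cross_cutting=240_000)  # shared by friends
exposed   = StageCounts(total=300_000,  cross_cutting=66_000)    # shown in the News Feed
clicked   = StageCounts(total=20_000,   cross_cutting=3_800)     # actually clicked

for name, stage in [("potential", potential), ("exposed", exposed), ("clicked", clicked)]:
    print(f"{name:>9}: {cross_cutting_share(stage):.1%} cross-cutting")
```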

Nathan Jurgenson spins this into an interpretation of how algorithms govern our behaviour:

For example, that the newsfeed algorithm suppresses ideologically cross cutting news to a non-trivial degree teaches individuals to not share as much cross cutting news. By making the newsfeed an algorithm, Facebook enters users into a competition to be seen. If you don’t get “likes” and attention with what you share, your content will subsequently be seen even less, and thus you and your voice and presence is lessened. To post without likes means few are seeing your post, so there is little point in posting. We want likes because we want to be seen.

Are ‘likes’ the only signal we have that helps shape our online behaviour? No. Offline feedback is an obvious one. What about cross-platform feedback loops? Most of what I talk about on Facebook nowadays consists of content posted by others on other social media networks. We have multiple ‘thermostats’ for calibrating the appropriateness and inappropriateness of posts in terms of attention, morality, sociality, cultural value, etc. I do agree with Jurgenson, however, when he cites Jay Rosen’s observation that “It simply isn’t true that an algorithmic filter can be designed to remove the designers from the equation.” A valid way of testing this has not been developed yet.

The weird thing about this study is that, from a commercial point of view, Facebook should want to increase the efficacy of the EdgeRank algorithms as much as possible, because it is the principal method for manipulating the value of the ‘visibility’ of each user’s News Feed (through frequency/competition and position). Previous research by Facebook has sought to explore the relative value of social networks as compared to the diversity of content; this included a project that investigated the network value of weak-tie social relationships.

Effect of Hard and Soft News vs the Work of Publics

What is my critique? All of the critiques mention that the Facebook research, from a certain perspective, has produced findings that are not really that surprising, because they largely confirm what we already understand about how people choose ideological content. A bigger problem for me is the hyper-normative classification of ‘hard’ and ‘soft’ news, as it obscures part of what makes this kind of research actually very interesting. For example, from the list of 20 stories provided as an example of hard and soft news, at least two of the ‘soft’ news stories are not ‘soft’ news stories by anyone’s definition. From the appendix (page 15):

  • Protesters are expected to gather in downtown Greenville Sunday afternoon to stage a Die In along Main Street …
  • Help us reach 1,000,000 signatures today, telling LEGO to ditch Shell and their dirty Arctic oil!

I did a Google search for the above text. One is a “die in” held as a protest over the death of Eric Garner. The other is a Greenpeace USA campaign.

There are at least two problems for any study that seeks to classify news-based media content according to normative hard and soft news distinctions when working to isolate how contemporary social media platforms have affected democracy:

1. The work of ‘politics’ (or ‘democracy’) does not only happen because of ‘hard news’. This is an old critique, but one that has been granted new life in studies of online publics. The ‘Die-In’ example is particularly important in this context. It is a story on a Fox News affiliate, and I have only been able to find the exact words provided in the appendix by the study authors in reference to this article on Fox News-based sites. Fox News is understood to be ‘conservative’ in the study (table S3 of the appendix), and yet the piece on the ‘Die-In’ protest does not contain any specific examples of conservative framing. It is in fact a straightforward ‘hard news’ piece on the protest, one that I would actually interpret as journalistically sympathetic towards the protests. How many stories were classified as ‘conservative’ simply because they appeared at a Fox News-based URL? How many other allegedly ‘soft news’ stories were not actually soft news at all?

2. Why is ‘cross-cutting’ framed only along ideological lines of content and users, when it is clear that allegedly ‘soft news’ outlets can cover ‘political topics’ that more or less impact ‘democracy’? In the broadcast and print era of political communication, end users had far less participatory control over the reproduction of issue-based publics. They used ‘news’ as a social resource to isolate differences with others, to argue, to understand their relative place in the world, and so on. Of profound importance in the formation of online publics is the way that this work (call it ‘politics’ or not) takes over the front stage in what have been normatively understood as non-political domains. How many times have you had ‘political’ discussions in non-political forums? Or, more importantly for the current study, how many ‘Gamergate’ articles were dismissed from the sample because the machine-based methods of sampling could not discern that they were about more than video games? The study does not address how ‘non-political’ news-based media outlets become vectors of political engagement when they are used as a resource by users to rearticulate political positions within issue-based publics.