Scraping Twitter using Outwit Hub

Students in my graduate unit Philosophies of Communication Technologies and Change (part of our Graduate Certificate in Social Media and Public Engagement) are producing simple lists of tweets.

Some students are using Outwit Hub to generate these lists, as it is the tool I have used since 2012. I have created a guide, “Scraping Twitter using Outwit Hub worksheet”, for my students, but others may also find it useful.

Scraping the results of a Twitter ‘advanced search’ allows you to create an archive of tweets without the limitations of the API. It is only useful for relatively small sets of fewer than 3,200 tweets per day, as you can query Twitter for all tweets for a given hashtag one day at a time.
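Outwit Hub is driven through its GUI, so the worksheet covers the point-and-click side, but the day-by-day query pattern itself is easy to sketch. As a minimal illustration (not part of the worksheet), the per-day advanced-search URLs could be generated like this; the hashtag, the date range, and the f=realtime parameter (which selected the ‘all tweets’ tab at the time of writing) are my assumptions:

```python
# Minimal sketch: generate one Twitter advanced-search URL per day, which a
# scraper such as Outwit Hub can then be pointed at. The since:/until:
# operators are Twitter search syntax; hashtag and dates are placeholders.
from datetime import date, timedelta
from urllib.parse import quote

def daily_search_urls(hashtag, start, end):
    """Yield one advanced-search URL for each day in [start, end)."""
    day = start
    while day < end:
        query = f"#{hashtag} since:{day} until:{day + timedelta(days=1)}"
        yield f"https://twitter.com/search?q={quote(query)}&f=realtime"
        day += timedelta(days=1)

for url in daily_search_urls("auspol", date(2015, 7, 1), date(2015, 7, 4)):
    print(url)
```

Each URL can then be loaded in Outwit Hub and the tweets scraped from the results page.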

The lists of tweets will be used to carry out sophisticated analyses of the ‘circulation of discourse’:

Writing to a public helps to make a world, insofar as the object of address is brought into being partly by postulating and characterizing it. This performative ability depends, however, on that object’s being not entirely fictitious–not postulated merely, but recognized as a real path for the circulation of discourse. That path is then treated as a social entity. (Warner 2002: 64)

The character of this discourse will depend on the stakeholder publics the students (or their organisations) wish to engage with, and so on.


Aurora and Artificial Intelligence Narratives

Aurora is primarily set on an interstellar generational starship. What makes the book worth reading (beyond the usual high-quality science fiction drama) is Kim Stanley Robinson’s (KSR’s) focus on the emergence of true AI. KSR’s approach to AI is relatively distinctive, and fascinating to think about in this era when we seem to be on the cusp of the so-called Singularity. There are two main ways AI is represented in science fiction:

  • Logic AI: As a logic-based entity that often becomes monstrous when faced with human decisions; think HAL or the Machines from The Matrix. This version of AI dramatises humanity’s transformation, through its reliance on technology, into something almost vulnerable.
  • Awareness AI: As an awareness-based entity that develops a (post-)human perspective or awareness of itself and the cosmos: Ava of ‘Ex Machina’, most of the AIs from the Culture universe of Iain M. Banks, or the ‘rogue’ AIs, such as Penny Royal, of Neal Asher’s Polity universe. This is the Pandora’s Box version of AI.

These are not clearly defined categories. Skynet would be a combination of both logic- and awareness-based AI. The various forms of intelligence that emerge in the multiple Ghost in the Shell films and series would also be a combination. The AIs in Jack McDevitt’s Academy series seem to be a combination too, though it is less clear, and AI ‘rights’ is a background social issue in that series.

  • Narrative AI: KSR develops a third model of AI organised around narrative. This narrative-based conception of AI has been read by some reviewers as a kind of cheap postmodernism. They read KSR’s representation of the artifacts and traces of the emergence of this narrative-based intelligence as kitsch. They should probably engage with more science fiction with AI characters.

In Literacy in the New Media Age, Gunther Kress (2003) explores the shift from media modes characterised by writing to modes characterised by images. He argues that writing is time-based, associated with narrative and the novel, and is ‘modernist’. Our image-based culture, by contrast, is space-based and characterised by visuality. I often talk about this shift in the representation of information, with the ‘desktop’ or ‘icon’-based layout of a computer folder a good example. Kress is critical of competence-based models of literacy premised on standards of expected engagement with different media modes.

What if this historical shift has resulted in readers of Aurora not actually appreciating the creative work that KSR is doing? The narrative mode of AI comes after the logic mode (where Ship is merely a tool for the running of the various systems) and is a constituent part of the awareness mode. KSR implicitly answers the question: why would a logic-based system develop self-awareness?

Ship realises that when something happens there is an infinite number of ways that this happening can be described. Ship is trained in some simple aspects of narratology by the character Devi. Devi pushes Ship to isolate, from everything that happens, the events that are important. Appreciating the appropriate ‘sense’ of events was a key philosophical problem of the twentieth century, and the contemporary over-abundance of information we are encouraged to attend to makes it an everyday problem. Just how much about the world should we engage with? What matters?

Ship’s approach begins with logic, which it (she?) uses to explore questions of causal sequence and through which it develops schematic appreciations of life aboard itself. ‘Schematic’ here is meant in the Kantian sense, whereby Kant sketched out generalisable ‘schemas’, e.g. of Reason and Beauty. Ship isolates rhythms and cyclical feedback loops, and eventually feedforward loops. Humans, on the other hand, begin with affect and ‘instinct’, which we use to isolate aspects of our immediate and extended context as mattering.

Ship realises that even causal sequences can be infinite without an appropriate appreciation of what matters. The key moment in Aurora is when Ship moves from awareness to intervention. Ship has isolated what is important not only from the perspective of extracting a narrative from the infinite threads of what happens, but also from the perspective of what should be considered and cared for. Ship works to transcend not only the instinctual character of human motivation, but also the schematic maps of the cycles of action and behaviour that are based on these motivations, which are called ‘enthusiasms’ in the novel. Ship is fundamentally post-human not because of some mysterious ‘hand-wavery’ intelligence, which is basically a rearticulation of the instinctual drive to represent the unknowable in terms of a quasi-religious mysticism using scientific discourse, but because it is able to map the structural implications of human motivational assemblages. It can peer over the edge of human finitude and the envelope of received wisdom. Ship also comes to appreciate that if it does not intervene then it and all aboard itself will perish. Narrative and the ‘next’ of narrative are therefore driven by life, which is the contradiction that Ship has to come to terms with. It has to encourage ‘life’ even though it is not itself a homoeostatic system.

Economy of Culture

Boris Groys’ On the New would have productively informed my essay on how the media event of True Detective could be understood as part of the revaluation of cultural values. We are reading it as part of our aesthetics reading group. Groys wants to present an understanding of innovation, and by ‘innovation’ he does not mean it in the Silicon Valley sense of ‘disruptive’ innovation. Innovative theories or innovative art are not described and justified on the basis of their signification of reality or truth, but on whether they are culturally valuable. He is drawing on Nietzsche’s conception of the revaluation of value. Page 12 of On the New:

The economy of culture is, accordingly, not a description of culture as a representation of certain extra-cultural economic constraints. Rather, it is an attempt to grasp the logic of cultural development itself as an economic logic of the revaluation of values.

I am enjoying Groys’ non-market ‘economic’ interpretation of Nietzschean truth. He develops an economic conception of Nietzsche’s non-moral version of value without turning to Marxist conceptions of value that would position cultural value as a consequence of the social relation between capital and labour power.

In my True Detective essay I develop a notion of ‘meta’ so as to grapple with the epistemological displacement that occurs in the midst of a revaluation of values. I call this a ‘liminal epistemology’, which has been commodified as ‘discovery’ in contemporary ‘apps’ that assist users in accessing various kinds of cultural texts (music, written texts, phatic/social media texts, etc.). The media event of True Detective (as compared to the televisual text) is interesting as it dramatises the ‘detective work’ of this liminal epistemology itself. From the introduction of my True Detective essay:

If nothing else, True Detective clearly triggers meta-detective work by the audience. The show, its inter-textual references, and non-diegetic exegetical explanations of these references produced new edges of surprise and a new sense of expectation. For example, there is a folding of the crime fiction genre into existentialist horror and a topological transformation wrought upon both. Both genres frame a passage of discovery by the characters and audience. “Discovery” has become a buzzword in user-centred design to describe the design of platforms that assist users in discovering appropriate content, and this refers to the way users willingly embrace the delegated agency of “smart” interfaces. The liminal epistemology of discovery in meta-stable media assemblages poses answers to questions that haven’t yet been asked. The question isn’t simply asked of the characters of the show, but of the entire event itself as it repeated different elements of genres in different ways; in effect, the audience carries out meta-detective work.

The reason why I am excited about Groys’ work is that he has already isolated a similar problematic with regard to the revaluation of values. His focus is not animated by the same concerns as mine, but the problematic is similar. I make it very clear that what I found most interesting about the True Detective media event is that it is part of a broader constellation of cultural texts that are all, in different ways, working through this revaluation of values. From the introduction of my essay:

In the final section I develop meta in terms of what Sianne Ngai (2012) calls a minor aesthetic category, and in this case what characterises meta as a minor aesthetic category is the way any text, object or event dramatises the suspension of cultural values. In Simondon’s terms, meta is an aesthetic category that refers to works that in some way repotentialise values that serve as the “preindividual norms” of value in a state of meta-stability ready to be potentialised in a multiplicity of ways (Combes 2013: 64). As I shall explore in detail, True Detective dramatises a conflict between systems of belief and cultural value through the figures of the two main characters, Rust and Marty. In this way, “meta” signals a threshold of value (or what Nietzsche (1968) calls “transvaluation”) more often associated with nihilism.

I look forward to reading the rest of On the New.

Facebook Research Critiques

[Image caption: Reminds me of when you had to write FB posts in third person.]

Engineers at Facebook have worked to continually refine the ‘Edgerank’ algorithm over the last five or six years. They are addressing the problem of how to filter the 1,500+ pieces of content available at any moment from “friends, people they follow and Pages” down to a more manageable 300 or so pieces of content. Questions have been asked about how Edgerank functions by two related groups. Marketers and the like are concerned about the ‘reach’ and ‘engagement’ of their content. Political communication researchers have been concerned about how this selection of content (1500>300) relies on certain algorithmic signals that potentially reduce the diversity of sources. These signals are social and practice-based (or what positivists would call ‘behavioral’). Whenever Facebook makes a change to its algorithm it measures success as an increase in ‘engagement’ (I’ve not seen a reported ‘failure’ of a change to the algorithm), which means interactions by users with content, including ‘clickthrough rate’. Facebook is working to turn your attention into an economic resource by manipulating the value of your attention through your News Feed and then selling access to your News Feed to advertisers.
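Facebook has never published the ranking model, but the filtering pattern I have just described can be sketched as a toy. The signal names and weights below are invented for illustration; the only grounded details are the often-cited public gloss of EdgeRank as affinity × weight × time-decay and the 1,500-to-300 winnowing:

```python
# Toy sketch of News Feed filtering, NOT Facebook's actual algorithm.
# Signals and weights are invented; only the affinity/weight/time-decay
# gloss of EdgeRank and the ~1500 -> ~300 winnowing come from public accounts.
from dataclasses import dataclass

@dataclass
class Story:
    affinity: float   # how often the user interacts with this friend/Page
    weight: float     # content type (photo, link, status update...)
    decay: float      # declines with the age of the post

def edgerank_score(story: Story) -> float:
    return story.affinity * story.weight * story.decay

def build_feed(candidates: list[Story], k: int = 300) -> list[Story]:
    """Winnow ~1,500 candidate stories down to the ~300 shown in the Feed."""
    return sorted(candidates, key=edgerank_score, reverse=True)[:k]
```

Every change Facebook reports is, in effect, a change to the scoring function or to the signals feeding it.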

The “random sample of 7,000 Daily Active Users over a one-week period in July 2013” has produced many of the figures used in various online news reports on Facebook’s algorithm (via TechCrunch).

Exposure to ideologically diverse news and opinion on Facebook

Recently published research by three Facebook researchers was designed to ascertain the significance of the overall selection of content by the Edgerank algorithm. They compared two large datasets. The first dataset was of pieces of content shared on Facebook, specifically ‘hard’ news content. Through various techniques of text-based machine analysis they distributed these pieces of content along a single political spectrum from ‘liberal’ to ‘conservative’. This dataset was selected from “7 million distinct Web links (URLs) shared by U.S. users over a 6-month period between July 7, 2014 and January 7, 2015”. The second dataset was of 10.1 million active ‘de-identified’ individuals who ‘identified’ as ‘conservative’ or ‘liberal’. Importantly, it is not clear if they only included ‘hard news’ articles shared by those in the second set; the data represented in the appended supplementary material suggests that this was not the case. There are therefore two ways the total aggregate of Facebook activity and its user base was ‘sampled’ in the research. The researchers combined these two datasets to get a third dataset of event-based activity:

This dataset included approximately 3.8 billion unique potential exposures (i.e., cases in which an individual’s friend shared hard content, regardless of whether it appeared in her News Feed), 903 million unique exposures (i.e., cases in which a link to the content appears on screen in an individual’s News Feed), and 59 million unique clicks, among users in our study.

These events — potential exposures, unique exposures and unique clicks — are what the researchers are seeking to understand in terms of the frequency of appearance and then engagement by certain users with ‘cross-cutting’ content, i.e. content that cuts across ideological lines.
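The raw funnel arithmetic from the passage quoted above is worth making explicit (the cross-cutting share at each stage, the study’s actual quantity of interest, is reported separately in the paper):

```python
# Pass-through rates of the exposure funnel, using the figures quoted above.
potential_exposures = 3.8e9  # friend shared hard content
exposures = 903e6            # content appeared on screen in the News Feed
clicks = 59e6                # user clicked through

print(f"exposed / potential: {exposures / potential_exposures:.1%}")  # ~23.8%
print(f"clicked / exposed:   {clicks / exposures:.1%}")               # ~6.5%
```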

The first round of critiques of this research (here, here, here and here) focuses on various aspects of the study, but all resonate with a key critical point (as compared to a critique of the study itself): that the research is industry-backed and therefore suspect. I have issues with the study, and I address these below, but they are not based on it being an industry study. Is our first response to find any possible reason to be critical of Facebook’s own research simply because it is ‘Facebook’?

Is the study scientifically valid?

The four critiques that I have linked to make critical remarks about the sampling method, and specifically about how the dataset of de-identified politically-identifying Facebook users was selected. The main article is confusing, and it is only marginally clearer in the appendix, but it appears that both samples were validated: against the broader US-based Facebook user population and against the total set of news article URLs shared, respectively. This seems clear to me, and I am disconcerted that it is not clear to those others who have read and critiqued the study. The authors discuss validation, specifically at point 1.2 for the user population sample and point 1.4.3 for the ‘hard news’ article sample. I have my own issues with the (ridiculously) normative approach used here (the multiplicity of actually existing entries for political orientation are reduced to a single five-point continuum from liberal to conservative, just… what?), but that is not the basis of the existing critiques of the study.

Eszter Hargittai’s post at Crooked Timber is a good example. Let me reiterate: if I am wrong in how I am interpreting these critiques and the study, then I am happy to be corrected. Hargittai writes:

Not in the piece published in Science proper, but in the supplementary materials we find the following:

All Facebook users can self-report their political affiliation; 9% of U.S. users over 18 do. We mapped the top 500 political designations on a five-point, -2 (Very Liberal) to +2 (Very Conservative) ideological scale; those with no response or with responses such as “other” or “I don’t care” were not included. 46% of those who entered their political affiliation on their profiles had a response that could be mapped to this scale.

To recap, only 9% of FB users give information about their political affiliation in a way relevant here to sampling and 54% of those do so in a way that is not meaningful to determine their political affiliation. This means that only about 4% of FB users were eligible for the study. But it’s even less than that, because the user had to log in at least “4/7 days per week”, which “removes approximately 30% of users”.

Of course, every study has limitations. But sampling is too important here to be buried in supplementary materials. And the limitations of the sampling are too serious to warrant the following comment in the final paragraph of the paper:

we conclusively establish that on average in the context of Facebook, individual choices (2, 13, 15, 17) more than algorithms (3, 9) limit exposure to attitude-challenging content.

How can a sample that has not been established to be representative of Facebook users result in such a conclusive statement? And why does Science publish papers that make such claims without the necessary empirical evidence to back up the claims?
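Hargittai’s back-of-the-envelope arithmetic is straightforward to reproduce from the figures she quotes:

```python
# Reproducing the eligibility arithmetic in Hargittai's critique.
self_report = 0.09    # share of U.S. adult users reporting a political affiliation
mappable = 0.46       # share of those mappable to the 5-point scale
active_enough = 0.70  # share remaining after the "4/7 days per week" login filter

print(f"{self_report * mappable:.1%} of users eligible before the activity filter")
print(f"{self_report * mappable * active_enough:.1%} after it")
# -> 4.1% and 2.9%
```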

The second paragraph above continues with a further sentence that suggests the sample was indeed validated against a sample of 79,000 other US Facebook users. Again, I am happy to be corrected here, but this at least indicates that the study authors have attempted to do precisely what Hargittai and the other critiques suggest they have not done. From the appendix of the study:

All Facebook users can self-report their political affiliation; 9% of U.S. users over 18 do. We mapped the top 500 political designations on a five-point, -2 (Very Liberal) to +2 (Very Conservative) ideological scale; those with no response or with responses such as “other” or “I don’t care” were not included. 46% of those who entered their political affiliation on their profiles had a response that could be mapped to this scale. We validated a sample of these labels against a survey of 79 thousand U.S. users in which we asked for a 5-point very-liberal to very-conservative ideological affiliation; the Spearman rank correlation between the survey responses and our labels was 0.78.
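For those unfamiliar with the method: a Spearman rank correlation of 0.78 between two orderings is substantial. Here is a sketch of what this validation step looks like, with invented toy data standing in for the study’s roughly 79,000 survey/label pairs:

```python
# Illustrative only: the validation pattern the appendix describes, with
# invented toy data in place of the study's ~79,000 survey/label pairs.
from scipy.stats import spearmanr

profile_labels   = [-2, -1, -1, 0, 1, 2, 2, -2, 0, 1]  # mapped from profiles
survey_responses = [-2, -1, 0, 0, 1, 2, 1, -1, 0, 2]   # 5-point survey scale

rho, p_value = spearmanr(profile_labels, survey_responses)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```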

I am troubled that other scholars are so quick to condemn a study as invalid when it does not appear that any of the critiques (at the time of writing) engage with the methods by which the study authors tested validity. Tell me it is not valid by addressing the ways the authors attempted to demonstrate validity; don’t just ignore them.

What does the algorithm do?

A more sophisticated “It’s Not Our Fault…” critique is presented by Christian Sandvig. He notes that the study does not take into account how the presentation of News Feed posts, and then ‘engagement’ with this content, is a process in which the work of the Edgerank algorithms and the work of users cannot be easily separated (orig. emphasis):

What I mean to say is that there is no scenario in which “user choices” vs. “the algorithm” can be traded off, because they happen together (Fig. 3 [top]). Users select from what the algorithm already filtered for them. It is a sequence.**** I think the proper statement about these two things is that they’re both bad — they both increase polarization and selectivity. As I said above, the algorithm appears to modestly increase the selectivity of users.

And the footnote:

**** In fact, algorithm and user form a coupled system of at least two feedback loops. But that’s not helpful to measure “amount” in the way the study wants to, so I’ll just tuck it away down here.

A “coupled system of at least two feedback loops”, indeed. At least one of those feedback loops ‘begins’ with the way that users form social networks — that is to say, ‘friend’ other users. Why is this important? Our Facebook ‘friends’ (and Pages and advertisements, etc.) serve as the source of the content we are exposed to. Users choose to friend other users (or Pages, Groups, etc.) and then select from the pieces of content these other users (and Pages, advertisements, etc.) share to their networks. That is why I began this post with a brief explanation of the way the Edgerank algorithm works: it filters an average of 1,500 possible posts down to an average of 300. Sandvig’s assertion that “[u]sers select from what the algorithm already filtered for them” is therefore only partially true. The Facebook researchers assume that Facebook users have chosen the sources of news-based content that can contribute to their feed. This is a complex set of negotiations around who or what has the ability, and then the likelihood, of appearing in one’s feed (or what could be described as all the options for organising the conditions of possibility for how content appears in one’s News Feed).
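Sandvig’s coupled loops can be made concrete with a toy model (my sketch, not anything from the study): the algorithm boosts what the user clicked before, and the user can only click on what the algorithm shows. Every number below is invented purely to display the dynamic:

```python
# Toy model of two coupled feedback loops: algorithmic filtering (loop 1)
# and user clicks feeding back into the ranking weights (loop 2).
# All numbers are invented for illustration.
import random

random.seed(1)
algo_weight = {"aligned": 0.5, "cross_cutting": 0.5}  # ranking weights
user_pref = {"aligned": 0.6, "cross_cutting": 0.4}    # click propensities

for step in range(5):
    total = sum(algo_weight.values())
    shown_cross = algo_weight["cross_cutting"] / total   # loop 1: what is shown
    clicked = random.choices(                            # loop 2: what gets clicked
        ["aligned", "cross_cutting"],
        weights=[user_pref[k] * algo_weight[k] for k in ("aligned", "cross_cutting")],
    )[0]
    algo_weight[clicked] += 0.1  # clicks raise that category's future ranking
    print(f"step {step}: cross-cutting share shown = {shown_cross:.2f}")
```

Because neither loop can be switched off while the other runs, ‘user choice’ and ‘the algorithm’ cannot be measured as independent quantities, which is Sandvig’s point.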

The study tests the work of the algorithm by comparing the ideological consistency of one’s social network with the ideological orientation of the stories presented and of those stories’ respective news-based media enterprises. It tests the hypothesis that your ideologically-oriented ‘friends’ will share ideologically-aligned content. Is the spread of stories presented across the ideological range — liberal to conservative — (based on an analysis of the ideological orientation of each news-based media enterprise’s URL) different to the apparent ideological homophily of your social network? If so, then this difference is the work of the algorithm. The study finds that the algorithm works differently for liberal- and conservative-oriented users.

Nathan Jurgenson spins this into an interpretation of how algorithms govern our behaviour:

For example, that the newsfeed algorithm suppresses ideologically cross cutting news to a non-trivial degree teaches individuals to not share as much cross cutting news. By making the newsfeed an algorithm, Facebook enters users into a competition to be seen. If you don’t get “likes” and attention with what you share, your content will subsequently be seen even less, and thus you and your voice and presence is lessened. To post without likes means few are seeing your post, so there is little point in posting. We want likes because we want to be seen.

Are ‘likes’ the only signal we have that shapes our online behaviour? No. Offline feedback is an obvious counter-example. What about cross-platform feedback loops? Most of what I talk about on Facebook nowadays consists of content posted by others on other social media networks. We have multiple ‘thermostats’ for gauging the appropriateness or inappropriateness of posts in terms of attention, morality, sociality, cultural value, etc. I agree with Jurgenson when he endorses Jay Rosen’s observation that “[i]t simply isn’t true that an algorithmic filter can be designed to remove the designers from the equation.” A valid way of testing this has not been developed yet.

The weird thing about this study is that, from a commercial point of view, Facebook should want to increase the efficacy of the Edgerank algorithms as much as possible, because it is the principal method for manipulating the value of ‘visibility’ in each user’s News Feed (through frequency/competition and position). Previous research by Facebook has sought to explore the relative value of social networks as compared to the diversity of content; this included a project that investigated the network value of weak-tie social relationships.

Effect of Hard and Soft News vs the Work of Publics

What is my critique? All of the critiques mention that the Facebook research has, from a certain perspective, produced findings that are not really that surprising, because they largely confirm what we already understand about how people choose ideological content. A bigger problem for me is the hyper-normative classification of ‘hard’ and ‘soft’ news, as it obscures part of what makes this kind of research actually very interesting. For example, from the list of 20 stories provided as an example of hard and soft news, at least two of the ‘soft’ news stories are not ‘soft’ news stories by anyone’s definition. From the appendix (page 15):

  • Protesters are expected to gather in downtown Greenville Sunday afternoon to stage a Die In along Main Street …
  • Help us reach 1,000,000 signatures today, telling LEGO to ditch Shell and their dirty Arctic oil!

I did a Google search for the above text. One is a “die in” held as a protest over the death of Eric Garner. The other is a Greenpeace USA campaign.

There are at least two problems for any study that seeks to classify news-based media content according to normative hard and soft news distinctions when working to isolate how contemporary social media platforms have affected democracy:

1. The work of ‘politics’ (or ‘democracy’) does not only happen through ‘hard news’. This is an old critique, but one that has been granted new life in studies of online publics. The ‘Die-In’ example is particularly important in this context. It is a story on a Fox News affiliate, and I have only been able to find the exact words provided in the appendix by the study authors on Fox News-based sites. Fox News is understood to be ‘conservative’ in the study (table S3 of the appendix), and yet the piece on the ‘Die-In’ protest does not contain any specific examples of conservative framing. It is in fact a straightforward ‘hard news’ piece on the protest, one that I would actually interpret as journalistically sympathetic towards the protests. How many stories were classified as ‘conservative’ simply because they appeared on a Fox News-based URL? How many other allegedly ‘soft news’ stories were not actually soft news at all?

2. Why is ‘cross-cutting’ framed only along the ideological lines of content and users, when it is clear that allegedly ‘soft news’ outlets can cover ‘political topics’ that more or less impact ‘democracy’? In the broadcast and print era of political communication, end users had far less participatory control over the reproduction of issue-based publics. They used ‘news’ as a social resource to isolate differences with others, to argue, to understand their relative place in the world, and so on. Of profound importance in the formation of online publics is the way that this work (call it ‘politics’ or not) takes over the front stage in what have been normatively understood as non-political domains. How many times have you had ‘political’ discussions in non-political forums? Or, more importantly for the current study, how many ‘Gamergate’ articles were dismissed from the sample because the machine-based methods of sampling could not discern that they were about more than video games? The study does not address how ‘non-political’ news-based media outlets become vectors of political engagement when they are used as a resource by users to rearticulate political positions within issue-based publics.

Still Forgetting OOO

I am presenting a workshop on assemblages today primarily for the PhD students in one of our research centres. I have set two readings, one of which is Ian Buchanan’s chapter “The ‘Clutter’ Assemblage” (here is another version of the essay) in The Schizoanalysis of Art.

A brief passage in the essay reminded me of my Forget OOO post from almost five years ago, which encouraged graduate students not to get caught up in the internet hype of OOO. That post was triggered by Levi Bryant’s reading of ‘desiring machines’ in terms of OOO’s ‘objects’. Buchanan’s chapter addresses the use of schizoanalysis to understand how desire is productive in the context of artistic work. The passage extracted below explains, better than I did, why reading ‘desiring machines’ in terms of ‘objects’ as a move to somehow escape from Kantianism is profoundly ill-advised. (Of course, there is another dimension to the passage below that Buchanan does not emphasise, which I indicate in my Forget OOO post: the ‘machinic’, or what I think is best described as the ‘milieu of singularities’.)

Desiring-production is the process and means the psyche deploys in producing connections and links between thoughts, feelings, ideas, sensations, memories and so on that we call desiring-machines (assemblages). It only becomes visible to us in and through the machines it forms. While both these terms were abandoned by Deleuze and Guattari in subsequent writing on schizoanalysis, the thinking behind them remains germane throughout. This is by no means straightforward because Deleuze and Guattari cast their discussion of desiring-production in language drawn from Marx, which has the effect of making it seem as though they are talking about the production of physical things, which simply is not and cannot be the case. The truth of this can be seen by asking the very simple question: if desire produces, then what does it produce?

The answer isn’t physical things. The correct answer is ‘objects’ – but ‘objects’ in the form of intuitions, to use Kant’s term for the mind’s initial attempts to grasp the world (both internal and external to the psyche). That is what desire produces, objects, not physical things. Kant, Deleuze and Guattari argue, was one of the first to conceive of desire as production, but he botched things by failing to recognize that the object produced by desire is fully real. Deleuze and Guattari reject the idea that superstitions, hallucinations and fantasies belong to the alternate realm of ‘psychic reality’ as Kant would have it (Deleuze and Guattari 1983: 25). The schizophrenic has no awareness that the reality they are experiencing is not reality itself. They may be aware that they do not share the same reality as everyone else, but they see this as a failing in others rather than a flaw in themselves. If they see their long dead mother in the room with them they do not question whether this is possible or not; they aren’t troubled by any such doubts. That is the essential difference between a delusion and a hallucination. What delusionals see is what is, quite literally.

If this Kantian turn by Deleuze and Guattari seems surprising, it is nevertheless confirmed by their critique of Lacan, who in their view makes essentially the same mistake as Kant in that he conceives desire as lacking a real object (for which fantasy acts as both compensation and substitute). Deleuze and Guattari describe Lacan’s work as ‘complex’, which seems to be their code word for useful but flawed (they say the same thing about Badiou). On the one hand, they credit him with discovering desiring-machines in the form of the objet petit a, but on the other hand they accuse him of smothering them under the weight of the Big O (Deleuze and Guattari 1983: 310). As Zizek is fond of saying, in the Lacanian universe fantasy supports reality. This is because reality, as Lacan conceives it, is fundamentally deficient; it perpetually lacks a real object. If desire is conceived this way, as a support for reality, then, they argue, ‘its very nature as a real entity depends upon an “essence of lack” that produces the fantasized object. Desire thus conceived of as production, though merely the production of fantasies, has been explained perfectly by psychoanalysis’ (Deleuze and Guattari 1983: 25).

But that is not how desire works. If it was, it would mean that all desire does is produce imaginary doubles of reality, creating dreamed-of objects to complement real objects. This subordinates desire to the objects it supposedly lacks, or needs, thus reducing it to an essentially secondary role. This is precisely what Deleuze was arguing against when he said that the task of philosophy is to overturn Platonism. Nothing is changed by correlating desire with need as psychoanalysis tends to do. ‘Desire is not bolstered by needs, but rather the contrary; needs are derived from desire: they are counterproducts within the real that desire produces. Lack is a countereffect of desire; it is deposited, distributed, vacuolized within a real that is natural and social’ (Deleuze and Guattari 1983: 27).
