Twitter Labeled Trump Supporters “Russian-linked,” Mixed Humans with Bots

On or around October 31, 2017, Twitter testified about so-called Russian activities related to the 2016 U.S. presidential election. This and subsequent testimonies were slanted to support the Democratic conspiracy theory of “Russian interference.” They might have been prepared together with congressional Democrats.

Even with this understanding, a closer look at Twitter’s testimonies reveals surprising findings:

  • Twitter and the DNC interpreted the word Russian as a reference to the native language and/or place of birth of an individual, not employment by the Russian government or even Russian citizenship. A large part (possibly most) of the so-called Russian-linked accounts belongs to American citizens with no relation to the Russian government. At the same time, Twitter elected to disregard accounts that made election-related tweets from indeterminate locations and were likely to represent foreign governments and other entities.
  • Answering questions from senators, Twitter repeatedly used the term automated account as a synonym for bot, thus dehumanizing Twitter users. The use of automation is allowed and even encouraged by Twitter, which provides a special Application Programming Interface (API) for this purpose. The automation might be as simple as manually scheduling tweets for a later time or auto-responding when the user is away from his or her computer.

Oddly, Twitter also said that it had difficulty distinguishing between automated and human-coordinated activity. Twitter incorrectly identified “nonrandom Tweet timing” as an indicator of a bot. (Technically, it is the other way around: computers are capable of random scheduling whereas humans are not. But this might not be relevant here.)
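The point about timing can be illustrated with a minimal sketch. It assumes two hypothetical schedulers, one posting at a fixed interval and one drawing each gap from an exponential distribution, and measures how regular the resulting gaps are. The function names and the one-hour interval are illustrative assumptions, not anything taken from Twitter's testimony or API.

```python
import random
import statistics

def gaps(times):
    """Return the intervals between consecutive timestamps (in seconds)."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

def coefficient_of_variation(values):
    """Standard deviation divided by the mean; 0 means perfectly regular."""
    return statistics.pstdev(values) / statistics.mean(values)

def fixed_schedule(n, every=3600.0):
    """Hypothetical naive scheduler: one post exactly every `every` seconds."""
    return [i * every for i in range(n)]

def randomized_schedule(n, mean_gap=3600.0, seed=0):
    """Hypothetical scheduler that draws each gap from an exponential distribution."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_gap)
        times.append(t)
    return times

fixed_cv = coefficient_of_variation(gaps(fixed_schedule(200)))
random_cv = coefficient_of_variation(gaps(randomized_schedule(200)))
print(f"fixed schedule: {fixed_cv:.2f}, randomized schedule: {random_cv:.2f}")
```

A few lines of code are enough for an automated account to post at irregular intervals, so regular timing alone says little about whether an account is automated.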

Twitter acknowledged that even under an expansive definition, Russian-linked tweets constituted only a tiny fraction of election-related tweets and would not be worth mentioning in the absence of the Russian hysteria. Nevertheless, the media ignored this acknowledgment and seized on the incorrect analysis to boost its Russian interference conspiracy theory.

Twitter also acknowledged that in the 2016 election season, it had intentionally blocked some tweets harmful to Hillary and the Democratic Party, including some with the hashtags #PodestaEmails and #DNCLeak. Twitter justified this action by citing the Russian interference conspiracy theory.

Look at the Twitter Testimonies

Unless stated otherwise, I refer to Twitter’s testimony Update on Results of Retrospective Review of Russian-Related Election Activity before the United States Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism, from January 2018. It is similar to the October 2017 testimony except for one thing that I will address below.

Twitter representatives alleged that a significantly larger percentage of @realDonaldTrump retweets compared to @HillaryClinton retweets were from “Russian-linked automated accounts,” and the media carried this allegation as truth. In fact, however, this “result” was an artifact of defective methodology, backfitted to produce preordained conclusions.

Twitter considered an account Russian-linked if it met any of several arbitrarily selected criteria. One of them was the language—Twitter labeled an account Russian-linked if “the user’s display name contains Cyrillic characters.” This criterion has likely captured many American citizens who were born in the Soviet Union, had firsthand experience with socialism, and thus preferred Republicans over Democrats.

Ukrainian and some other Slavic languages use Cyrillic as well. Thus, Ukrainians who preferred Trump over Hillary because of his more assertive stance toward Russia were declared Russian-linked participants in the alleged Russian conspiracy.

Twitter also labeled an account Russian-linked if “the user has logged in from any Russian IP address, even a single time.” This captured business travelers who visited Russia, connected in any Russian airport, or even visited other countries served by Russian ISPs. None of the criteria selected by Twitter to label accounts as Russian-linked even tended to identify those linked to the Russian or any other foreign government. Although “12% of Tweets created during the election originated with accounts that had an indeterminate location,” Twitter’s criteria did not include indeterminate location.
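A minimal sketch of how an "any one criterion" rule behaves is given here. It is not Twitter's implementation: the `Account` fields, the `has_cyrillic` helper, and the sample accounts are hypothetical, and the criteria are paraphrased from the testimony quoted in this article. The Cyrillic check assumes that a single character in the basic Cyrillic Unicode block (U+0400 to U+04FF) is enough to trigger the signal.

```python
from dataclasses import dataclass

def has_cyrillic(text):
    """True if any character falls in the basic Cyrillic block (U+0400..U+04FF)."""
    return any("\u0400" <= ch <= "\u04FF" for ch in text)

@dataclass
class Account:
    # All field names are hypothetical, chosen only for this illustration.
    display_name: str = ""
    created_in_russia: bool = False
    russian_phone_or_email: bool = False
    ever_logged_in_from_russian_ip: bool = False

def is_russian_linked(account):
    """Flag the account if it meets even one criterion (a logical OR)."""
    return any((
        account.created_in_russia,
        account.russian_phone_or_email,
        has_cyrillic(account.display_name),
        account.ever_logged_in_from_russian_ip,
    ))

# Hypothetical accounts with no tie to any government.
traveler = Account(display_name="Helen", ever_logged_in_from_russian_ip=True)
ukrainian = Account(display_name="Олена")  # a Ukrainian name written in Cyrillic
neither = Account(display_name="Helen")

for label, acct in [("traveler", traveler), ("ukrainian", ukrainian), ("neither", neither)]:
    print(label, is_russian_linked(acct))
```

Because the criteria are combined with OR, every signal added to the list can only enlarge the flagged set, and none of the signals in this sketch says anything about government affiliation.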

In its congressional testimony in October 2017, Twitter said that an account was considered “Russian-linked” if the user frequently tweeted in Russian. In the January 2018 testimony, Twitter corrected itself by saying this criterion was only considered but not used. That suggests that Twitter was tweaking the methodology of detecting “Russian-linked” accounts in the process of analysis, contrary to the basic rules of statistical research. Such tweaking makes the results of the “research” invalid.

Quotes from Twitter Testimonies before Congress

The testimonies of Twitter executives are listed in reverse chronological order and linked to their respective sources.

September 5, 2018

United States House Committee on Energy and Commerce

Testimony of Jack Dorsey Chief Executive Officer Twitter, Inc.

“.. we examined activity on the platform during a 10-week period preceding and immediately following the 2016 election (September 1, 2016 to November 15, 2016). We focused on identifying accounts that were automated, linked to Russia, and Tweeting election-related content, and we compared activity by those accounts to the overall activity on the platform. We reported the results of that analysis in November 2017, and we updated the Committee in January 2018 about the findings from our ongoing review. Additional information on the accounts associated with the Internet Research Agency is included below.

We identified 50,258 automated accounts that were Russian-linked and Tweeting election-related content, representing less than two one-hundredths of a percent (0.016%) of the total accounts on Twitter at the time. Of all election-related Tweets that occurred on Twitter during that period, these malicious accounts constituted approximately one percent (1.00%), totaling 2.12 million Tweets. Additionally, in the aggregate, automated, Russian-linked, election-related Tweets from these malicious accounts generated significantly fewer impressions (i.e., views by others on Twitter) relative to their volume on the platform.

Twitter is committed to ensuring that promoted accounts and paid advertisements are free from hostile foreign influence. In connection with the work we did in the fall, we conducted a comprehensive analysis of accounts that promoted election-related Tweets on the platform throughout 2016 in the form of paid ads. We reviewed nearly 6,500 accounts and our findings showed that approximately one-tenth of one-percent—only nine of the total number of accounts—were Tweeting election-related content and linked to Russia. The two most active accounts out of those nine were affiliated with Russia Today (“RT”), which Twitter subsequently barred from advertising on Twitter.”

January 19, 2018 [not 2019]

United States Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism

Update on Results of Retrospective Review of Russian-Related Election Activity Twitter, Inc.

“Also as detailed below, through our supplemental analysis, we have identified an additional 13,512 accounts, for a total of 50,258 automated accounts that we identified as Russian-linked and Tweeting election-related content. This represents approximately two one-hundredths of a percent (0.016%) of the total accounts on Twitter at the time. The 2.12 million election-related Tweets that we identified through our retrospective review as generated by Russian-linked, automated accounts constituted approximately one percent (1.00%) of the overall election-related Tweets on Twitter at the time. Those 2.12 million Tweets received only one-half of a percent (0.49%) of impressions on election-related Tweets, based on impressions generated within the first seven days of posting by users logged into the system. In the aggregate, automated, Russian-linked, election-related Tweets generated significantly fewer impressions relative to their volume on the platform.”

“We took a broad approach for purposes of our review of what constitutes an election related Tweet, relying on annotations derived from a variety of information sources, including Twitter handles, hashtags, and Tweets about significant events. For example, Tweets mentioning @HillaryClinton and @realDonaldTrump received an election-related annotation, as did Tweets that included #primaryday and #feelthebern. In total, our review has now encompassed nearly 212 million Tweets annotated in this way out of the total corpus of nearly 18.2 billion Tweets posted during this time period (excluding Retweets).”

“We took a similarly expansive approach to defining what qualifies as a Russian-linked account. Because there is no single characteristic that reliably determines geographic origin or affiliation, we relied on a number of criteria, such as whether the account was created in Russia, whether the user had a Russian phone carrier or a Russian email address associated with the account, whether the user’s display name contains a significant number of Cyrillic characters, and whether the user has logged in from any Russian IP address, even a single time. We considered an account to be Russian-linked if it had even one of the relevant criteria. As clarification, while we initially considered using frequency of Tweeting in Russian as a potential signal, ultimately this signal was not included in either phase of our analysis. For purposes of both the original and supplemental analysis, we focused on account sign-up language as the language signal, as it represents the language displayed to the user in their interface with Twitter.”

“… we observed that a high concentration of automated engagement and content originated from data centers and users accessing Twitter via Virtual Private Networks (“VPNs”) and proxy servers. In fact, based on our analysis at the time of the hearing, nearly 12% of Tweets created during the election originated with accounts that had a masked/indeterminate location on the day they were posted. Use of such facilities obscures the actual origin of traffic.”

“Thus, our continued analysis has reinforced our earlier determination that the number of accounts we could link to Russia and that were Tweeting election-related content was small in comparison to the total number of accounts on our platform during the relevant time period”

“Our data showed that, during the relevant time period, @HillaryClinton Tweets were Retweeted approximately 8.6 million times. Of those Retweets, 47,846—or 0.55%—were from Russian-linked automated accounts. Tweets from @HillaryClinton received approximately 19.2 million likes during this period; 119,730—or 0.62%—were from Russian linked automated accounts. The volume of engagements with @realDonaldTrump Tweets from Russian-linked automated accounts was higher, but still relatively small. The Tweets from the @realDonaldTrump account during this period were Retweeted more than 11 million times; 469,537—or 4.25%—of those Retweets were from Russian-linked, automated accounts. Those Tweets received approximately 28.8 million likes across our platform; 517,408—or 1.8%—of those likes came from Russian-linked automated accounts.”
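The shares quoted in this passage can be rechecked with simple arithmetic. The sketch below takes the counts from the January 2018 testimony as quoted above; the totals are the rounded figures given there ("approximately 8.6 million", "more than 11 million"), which is an assumption of this check, so the recomputed percentages can differ slightly from the quoted ones.

```python
# (count from Russian-linked automated accounts, rounded total, quoted percent)
figures = {
    "@HillaryClinton retweets": (47_846, 8_600_000, 0.55),
    "@HillaryClinton likes": (119_730, 19_200_000, 0.62),
    "@realDonaldTrump retweets": (469_537, 11_000_000, 4.25),
    "@realDonaldTrump likes": (517_408, 28_800_000, 1.8),
}

def share_percent(part, total):
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total

recomputed = {}
for label, (part, total, quoted) in figures.items():
    recomputed[label] = share_percent(part, total)
    print(f"{label}: recomputed {recomputed[label]:.2f}%, quoted {quoted}%")
```

The recomputed values stay within a few hundredths of a percentage point of the quoted ones, which is consistent with the totals having been rounded in the testimony.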

October 31, 2017

United States Senate Committee on the Judiciary, Subcommittee on Crime and Terrorism

Testimony of Sean J. Edgett Acting General Counsel, Twitter, Inc.

“For our review of Twitter’s core product, we analyzed election-related activity from the period preceding and including the election (September 1, 2016 to November 15, 2016) in order to identify content that appears to have originated from automated accounts or from human-coordinated activity associated with Russia.”

“We took a similarly expansive approach to defining what qualifies as a Russian-linked account. Because there is no single characteristic that reliably determines geographic origin or affiliation, we relied on a number of criteria, including whether the account was created in Russia, whether the user registered the account with a Russian phone carrier or a Russian email address, whether the user’s display name contains Cyrillic characters, whether the user frequently Tweets in Russian, and whether the user has logged in from any Russian IP address, even a single time. We considered an account to be Russian-linked if it had even one of the relevant criteria.”

“We first reviewed the accounts’ engagement with Tweets from @HillaryClinton and @realDonaldTrump. Our data showed that, during the relevant time period, a total of 1,625 @HillaryClinton Tweets were Retweeted approximately 8.3 million times. Of those Retweets, 32,254—or 0.39%—were from Russian-linked automated accounts. Tweets from @HillaryClinton received approximately 18 million likes during this period; 111,326—or 0.62%—were from Russian-linked automated accounts. The volume of engagements with @realDonaldTrump Tweets from Russian-linked automated accounts was higher, but still relatively small. The 851 Tweets from the @realDonaldTrump account during this period were Retweeted more than 11 million times; 416,632—or 3.66%—of those Retweets were from Russian-linked, automated accounts.”

“We next examined activity surrounding hashtags that have been reported as potentially connected to Russian interference efforts. We noted above that, with respect to two such hashtags—#PodestaEmails and #DNCLeak—our automated systems detected, labeled, and hid a portion of related Tweets at the time they were created.”

“… we observed that a high concentration of automated engagement and content originated from data centers and users accessing Twitter via Virtual Private Networks (“VPNs”) and proxy servers. In fact, nearly 12% of Tweets created during the election originated with accounts that had an indeterminate location. Use of such facilities obscures the actual origin of traffic.”

“Twitter provides access to the API for developers who want to design Twitter-compatible applications and innovate using Twitter data. Some of the most creative uses of our platform originate with applications built on our API, but we know that a large quantity of automated spam on our platform is also generated and disseminated through such applications.