---
title: "botscan"
author: "Kurt Wirth and Ryan Moore"
date: "`r Sys.Date()`"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
A package extending the capability of [botometer](https://github.com/IUNetSci/botometer-python)
by measuring suspected bot activity in any given Twitter query. This README is
derived from Matt Kearney's excellent [rtweet](https://github.com/mkearney/rtweet)
documentation.
## Install
Install from GitHub with the following code:
```{r install, eval = FALSE}
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("kurtawirth/botscan")
```
This package connects <code>botometer</code> to <code>rtweet</code>. On first
load, rtweet will request authentication in your default browser. Accept the
connection, return to RStudio, and botscan will continue automatically.
Because botscan calls the Python-based <code>botometer</code>, users also need
to install the latest version of Python and, when prompted during installation,
add it to the PATH. Users also need a
[RapidAPI key](https://rapidapi.com/OSoMe/api/botometer-pro), which requires
a RapidAPI account. BotOMeter Pro is free and has a rate limit of 2,000
inquiries per day. Alternatively, users can opt for the Ultra plan, which
allows 17,280 inquiries per day and costs $50/month. Plans can be chosen
[here](https://rapidapi.com/OSoMe/api/botometer-pro/pricing).
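Before going further, it can help to confirm that R can actually see a Python
executable on the PATH. The following is a minimal sketch using only base R;
the helper name `find_python` is made up for this example and is not part of
botscan.

```r
# Look for a Python executable on the PATH using base R only.
# Returns the first path found, or "" when nothing is found.
find_python <- function(candidates = c("python", "python3")) {
  paths <- Sys.which(candidates)          # "" for names that are not found
  found <- unname(paths[nzchar(paths)])
  if (length(found) == 0) "" else found[1]
}

python_path <- find_python()
if (nzchar(python_path)) {
  message("Python found at: ", python_path)
} else {
  message("No Python executable found on the PATH.")
}
```

If no executable is reported, reinstalling Python with the "add to PATH"
option selected is the likely fix.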
## Usage
There are three functions currently live for botscan.
To begin, the user must first enter the following code:
```{r install-botometer, eval = FALSE}
install_botometer()
```
If your Python has been added to your machine's PATH as mentioned above, this
function will install BotOMeter via <code>pip</code>.
Next, a user must create a bom object in their environment with the following
code, inserting their keys where appropriate:
```{r setup-botscan, eval = FALSE}
bom <- setup_botscan("YourRapidAPIKey",
                     "YourTwitterConsumerAPIKey",
                     "YourTwitterConsumerAPISecretKey",
                     "YourTwitterAccessToken",
                     "YourTwitterAccessTokenSecret")
```
Currently, this must be done at the start of every session.
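Since the setup has to be repeated each session, one option is to keep the keys
in environment variables (for example in `~/.Renviron`) rather than pasting
them into scripts. The sketch below is only a suggestion: the variable names
and the `read_keys` helper are hypothetical, and the keys are passed to
<code>setup_botscan</code> in the same order as in the example above.

```r
# Hypothetical environment variable names; choose any names you like and
# define them in ~/.Renviron so they are available in every session.
key_names <- c("RAPIDAPI_KEY",
               "TWITTER_CONSUMER_KEY",
               "TWITTER_CONSUMER_SECRET",
               "TWITTER_ACCESS_TOKEN",
               "TWITTER_ACCESS_TOKEN_SECRET")

# Read the keys and stop early with a clear message if any are missing.
read_keys <- function(names = key_names) {
  keys <- Sys.getenv(names, unset = "")
  missing <- names[!nzchar(keys)]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  unname(keys)
}

# Only attempt the setup when botscan is installed and all keys are set.
if (requireNamespace("botscan", quietly = TRUE) &&
    all(nzchar(Sys.getenv(key_names)))) {
  keys <- read_keys()
  bom <- botscan::setup_botscan(keys[1], keys[2], keys[3], keys[4], keys[5])
}
```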
Next, the fun begins with <code>botscan</code>.
Its first argument takes any Twitter query, complete with boolean operators if
desired, surrounded by quotation marks.
The second argument allows the user to provide external data in the form of any
Twitter object with a column named "screen_name". When the user supplies the
name of such an object from their environment, botscan skips data collection
and uses that data instead.
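As a quick sanity check before handing over external data, the "screen_name"
requirement can be tested in base R. This is a sketch with made-up data; the
helper `has_screen_names` is hypothetical and not part of botscan.

```r
# Check that an object can serve as external data: it must be a data frame
# with a "screen_name" column, as described above.
has_screen_names <- function(x) {
  is.data.frame(x) && "screen_name" %in% names(x)
}

# Toy data for illustration only; the account names are made up.
my_tweets <- data.frame(
  screen_name = c("example_user_a", "example_user_b", "example_user_a"),
  text = c("first tweet", "second tweet", "third tweet"),
  stringsAsFactors = FALSE
)

has_screen_names(my_tweets)          # TRUE
has_screen_names(my_tweets["text"])  # FALSE
unique(my_tweets$screen_name)        # distinct accounts in the data
```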
The third argument determines how long, in seconds, an open stream of tweets
will be collected, with a default of 30. To gather a specific volume of tweets,
run a small initial test to estimate the rate of tweets for the given query,
then scale the stream time accordingly. Users who prefer Twitter's Search API
can instead specify the number of tweets to extract with the sixth argument.
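The "small initial test" suggestion amounts to simple arithmetic, sketched
here. The helper `estimate_timeout` and the numbers are made up for
illustration, and the estimate assumes the tweet rate stays roughly constant.

```r
# Estimate how many seconds to stream in order to collect roughly
# `target_tweets`, given a test run of `test_seconds` seconds that
# returned `test_tweets` tweets.
estimate_timeout <- function(test_tweets, test_seconds, target_tweets) {
  stopifnot(test_tweets > 0, test_seconds > 0, target_tweets > 0)
  rate <- test_tweets / test_seconds   # tweets per second
  ceiling(target_tweets / rate)
}

# Made-up numbers: a 30-second test returned 45 tweets; we want about 600.
estimate_timeout(test_tweets = 45, test_seconds = 30, target_tweets = 600)
# 400
```

The result could then be passed as the `timeout` value, as in the usage
examples.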
The fourth argument takes a number between zero and one that represents the
desired threshold at which an account should be considered a bot. The default
is .430, a reliable threshold as described by BotOMeter's creator [here](http://www.pewresearch.org/fact-tank/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/).
The fifth argument allows the user to toggle between Twitter's Search and
Streaming APIs. The default is the Streaming API, as it is unfiltered by
Twitter and thus produces more accurate data. Search API data is filtered to
eliminate low-quality content, which hampers the identification of bot
accounts.
The sixth argument allows the user to determine the volume of tweets desired
when using the Search API. Note that this argument will be ignored when using
the Streaming API.
The seventh argument determines whether retweets will be included if using the
Search API. Likewise, this argument will be ignored when using
the Streaming API.
The eighth argument allows the user to opt out of auto-parsing of data,
primarily useful when dealing with large volumes of data. The ninth and final
argument defaults to keeping the user informed about the progress of the tool
in gathering and processing data with the <code>verbose</code> package but
can be toggled off.
```{r usage, eval = FALSE}
## load botscan
library(botscan)
## Enter query surrounded by quotation marks
botscan("#rstats")
## Result is a list of three objects, described below
## If desired, choose the stream time and threshold
botscan("#rstats", timeout = 60, threshold = .995)
## Alternatively, choose to use Twitter's Search API and options associated with it.
botscan("#rstats", n_tweets = 1500, retweets = TRUE, search = TRUE, threshold = .995)
## If desired, scan only users rather than the conversation as a whole.
botscan("#rstats", user_level = TRUE)
```
The output from botscan is a list of three objects. The first is a data frame
including all raw data from Twitter and BotOMeter. The second is a string
with the percentage of users in the data set that are estimated to be bots as
determined by the user's provided threshold. The third is the percentage of
tweets that are estimated to be bot-authored as determined by the user's
provided threshold.
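The two percentages can differ because prolific accounts contribute more
tweets. The toy example below shows the distinction; it is not botscan's
internal code. The `score` column name, the scores themselves, and the choice
of `>=` for the comparison are all assumptions made for illustration.

```r
# Toy data for illustration only: one row per tweet, with a made-up
# bot score attached to each account.
tweets <- data.frame(
  screen_name = c("a", "a", "a", "b", "c", "d"),
  score       = c(0.90, 0.90, 0.90, 0.10, 0.20, 0.50),
  stringsAsFactors = FALSE
)
threshold <- 0.430

# Tweet level: share of tweets written by accounts at or above the threshold.
pct_tweets <- mean(tweets$score >= threshold) * 100

# User level: keep one row per account, then take the same share.
users <- tweets[!duplicated(tweets$screen_name), ]
pct_users <- mean(users$score >= threshold) * 100

pct_tweets  # 4 of 6 tweets, about 66.7
pct_users   # 2 of 4 users, 50
```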
This process takes some time, as botscan currently loops over calls to
BotOMeter. A standard pull of tweets via botscan processes approximately 11 to
12 accounts per minute, in addition to the time spent on initial tweet streaming.
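For planning purposes, the quoted rate can be turned into a rough time
estimate. The helper `estimate_minutes` is hypothetical, and the figures are
only as accurate as the 11 to 12 accounts per minute stated above.

```r
# Rough scoring time, in minutes, at 11 to 12 accounts per minute.
estimate_minutes <- function(n_accounts, per_minute = c(slow = 11, fast = 12)) {
  n_accounts / per_minute
}

# Made-up example: 500 accounts take roughly 42 to 45 minutes.
round(estimate_minutes(500), 1)
```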
Twitter rate limits cap the number of Search results returned to 18,000 every
15 minutes. Thus, excessive use of botscan in a short amount of time may result
in a warning and inability to pull results. In this event, simply wait 15
minutes and try again. To help avoid hitting the Twitter rate limit,
botscan defaults to returning 1,000 results when search = TRUE.
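The rate limit arithmetic can be sketched as follows. Both helpers are
hypothetical, and they count only a single request, so any earlier Search
requests made in the same 15-minute window would reduce the headroom further.

```r
# Number of 15-minute rate-limit windows needed for a Search API request,
# using the 18,000-results-per-window cap quoted above.
windows_needed <- function(n_tweets, cap = 18000) {
  ceiling(n_tweets / cap)
}

# Minimum waiting time in minutes: one 15-minute pause between windows.
wait_minutes <- function(n_tweets, cap = 18000, window = 15) {
  (windows_needed(n_tweets, cap) - 1) * window
}

windows_needed(1000)    # 1: the default request fits in a single window
windows_needed(40000)   # 3
wait_minutes(40000)     # 30
```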