<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Copypasta Test</title>
<link rel="icon" type="image/png" href="docs/images/favicon.PNG">
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH" crossorigin="anonymous">
<link href="./docs/css/styles.css" rel="stylesheet"/>
<script src="./docs/js/testDetailsTemplate.js" defer></script>
</head>
<body>
<div class="container-fluid p-5 page-container">
<!-- Header Section -->
<div class="row mb-4">
<!-- Header Logo -->
<div class="col col-auto px-0">
<a href="index.html">
<img class="header-logo" src="docs/images/mango-text.PNG" alt="CIB Mango Tree logo">
</a>
</div>
<!-- Header Text -->
<div class="col">
<div class="display-5">CIB Mango Tree</div>
<div class="fw-light">A Civic Tech DC Project</div>
</div>
</div>
<div class="row p-5 homepage-hero">
<div class="d-flex align-items-start">
<div class="nav flex-column nav-pills me-3 col-3 card card-body sticky" aria-orientation="vertical">
<a class="d-flex justify-content-center nav-link text-center" type="button" href="index.html">About</a>
<a class="d-flex justify-content-center nav-link text-center" type="button" href="download-and-install.html">Download & Install</a>
<a class="d-flex justify-content-center nav-link text-center" type="button" href="data-preparation.html">Data Preparation</a>
<div class="dropdown">
<a class="nav-link dropdown-toggle text-center active" type="button" href="tests.html">Test Library</a>
<div class="dropdown-menu show">
<a class="dropdown-item disabled" href="copypasta.html">Copypasta</a>
</div>
</div>
</div>
<div class="col-9 mb-5">
<div class="card card-body transparent-card">
<div id="template-placeholder"></div>
<test-details>
<span slot="test-title">
Copypasta
</span>
<span slot="test-summary">
This test scans for identical strings of text that appear across
multiple posts in your dataset. Matches can reveal networks of bots
or human users who copy and paste the same message, a practice
known as “copypasta.” By repeating identical text, such networks
can steer discourse toward a particular narrative.
</span>
<a slot="code-link" href="https://cib-mango-tree.github.io/CIB-Mango-Tree-Website/" target="_blank" rel="noopener noreferrer">
<b>See Code for Test on GitHub</b>
</a>
<a slot="reference-link" href="https://link.springer.com/article/10.1007/s13278-021-00815-2" target="_blank" rel="noopener noreferrer">
Weber, Derek, and Frank Neumann. “Amplifying influence through
coordinated behaviour in social networks.” Social Network Analysis
and Mining 11, no. 1 (October 2021).
</a>
<div slot="reqs-boxes">
<div class="reqs-boxes">Username</div>
<div class="reqs-boxes">Unique Post Number</div>
<div class="reqs-boxes">Post Content</div>
</div>
<div slot="optional-reqs-boxes">
<div class="reqs-boxes">Timestamp</div>
</div>
<ol slot="test-process" type="1">
<br>
<li>
<b>N-gram tokenizing: </b>
The test breaks the content of each post in your dataset
into sequences of consecutive words, or “n-grams.”
Each word separated by a space is an individual token,
and the number of tokens in a sequence is the length
of the n-gram.
</li>
<li>
<b>Scanning for repeated n-grams: </b>
The test checks whether any n-gram of length three
(e.g. “he eats snails”) to five (e.g. “The last time
I voted”) also appears in another post, and reports
every n-gram that appears in more than one post.
</li>
<li>
<b>Counting repeated n-grams: </b>
The test tallies how many times each n-gram is repeated
and sorts n-grams by repetition count (highest to lowest)
in the data output, so you can see whether a particular
phrase was copied and pasted many times across posts.
</li>
</ol>
<div slot="output-summary">
<br>
<span>
<b>Output Format: .csv File</b>
</span>
<ul>
<li>
The test produces a CSV file in which each row corresponds
to an individual post where a repeated n-gram was found.
</li>
<li>
Rows are ordered by n-gram length, longest first: all
repeated n-grams of length 5 (if found), then those of
length 4 (if found), then those of length 3 (if found).
</li>
<li>
The content of the n-gram is displayed, and rows are
further sorted by the number of times the particular
n-gram was repeated by all users in the dataset,
starting from the n-grams repeated the most to the
n-grams repeated the least (a minimum of two occurrences).
</li>
<li>
The username of the user who posted the n-gram is shown,
and rows are further sorted by how many times each user
posted that n-gram, from most to least.
</li>
<li>
The content of the entire post in which the n-gram appeared
is presented, as well as the timestamp for when the post was made.
</li>
</ul>
<img slot="output-image" src="./docs/images/copypasta-test-csv-output.png" alt="Sample CSV output" style="max-width: 100%; height: auto; display: block; border: 2px solid #309c5c; border-radius: 8px; padding: 5px;">
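<p>
As a rough illustration of the process and row ordering described above,
here is a minimal JavaScript sketch. It is not the project's actual
implementation: the function names <code>ngrams</code> and
<code>findCopypasta</code> and the post fields <code>user</code> and
<code>text</code> are hypothetical, matching is assumed to be
case-sensitive, and the per-user ordering is left out for brevity.
</p>
<pre>
```javascript
// Illustrative sketch only; names and fields are assumptions, not the tool's API.

// Split a post into space-separated tokens and return every n-gram of length n.
function ngrams(text, n) {
  const tokens = text.trim().split(/\s+/).filter(Boolean);
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(" "));
  }
  return out;
}

// Return one row per (repeated n-gram, post) pair for n-grams of length minN..maxN.
function findCopypasta(posts, minN = 3, maxN = 5) {
  // Map each n-gram to the posts that contain it (counted once per post).
  const seen = new Map();
  for (const post of posts) {
    for (let n = minN; n <= maxN; n++) {
      for (const gram of new Set(ngrams(post.text, n))) {
        if (!seen.has(gram)) seen.set(gram, { n, posts: [] });
        seen.get(gram).posts.push(post);
      }
    }
  }

  // Keep only n-grams that appear in at least two different posts.
  const rows = [];
  for (const [gram, entry] of seen) {
    if (entry.posts.length >= 2) {
      for (const post of entry.posts) {
        rows.push({
          ngram: gram,
          length: entry.n,
          count: entry.posts.length,
          user: post.user,
          text: post.text,
        });
      }
    }
  }

  // Longest n-grams first, then the most frequently repeated.
  rows.sort((a, b) => b.length - a.length || b.count - a.count);
  return rows;
}

// Example usage with made-up posts.
const rows = findCopypasta([
  { user: "alice", text: "the last time I voted was fun" },
  { user: "bob", text: "remember the last time I voted" },
  { user: "carol", text: "he eats snails" },
]);
console.log(rows[0].ngram); // "the last time I voted"
```
</pre>
<p>
Each n-gram is counted once per post here, so <code>count</code> is the
number of distinct posts that contain it.
</p>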
</div>
</test-details>
</div>
</div>
</div>
</div>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/@popperjs/core@2.11.8/dist/umd/popper.min.js" integrity="sha384-I7E8VVD/ismYTF4hNIPjVp/Zjvgyol6VFvRkX/vR+Vc4jQkC+hVqc2pM8ODewa9r" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.min.js" integrity="sha384-0pUGZvbkm6XF6gxjEnlmuGrJXVbNuzT9qBBavbLwCsOGabYfZo0T0to5eqruptLy" crossorigin="anonymous"></script>
</body>
</html>