---
author: "Rose Hörsting & Gina Reinhard"
date: "2024-11-12"
#date-modified: "2024-11-12"
toc-title: "Case study: The semantics of emojis"
include-after-body: abbrv_toc.html
language:
#title-block-modified: "Last updated"
title-block-author-single: "Authors"
---
# The semantics of emojis: Explo`R`ing the results of an experimental study
```{r setup, include=FALSE}
# Depending on which graphic backend RStudio uses (see Global Options > Graphics), we need to render the plots either with AGG or Cairo to ensure that the emojis are correctly rendered in the HTML export.
#install.packages("ragg")
#knitr::opts_chunk$set(dev = "ragg_png", warning = FALSE, message = FALSE)
knitr::opts_chunk$set(dev.args = list(png = list(type = "cairo")))
```
```{r checkdown, include=FALSE}
library(checkdown)
```
::: {.callout-note collapse="true"}
#### **About the authors of this chapter** {.unnumbered}
**Rose Hörsting** is a second-year master’s student in linguistics at the University of Cologne. She completed her bachelor’s degree in linguistics at the Heinrich Heine University Düsseldorf, where she specialised in psycho- and neurolinguistics. Rose is particularly drawn to understanding language processing in both human and machine contexts. She was first introduced to `R` during her bachelor’s thesis, finding it intimidating at first, but has since developed an enthusiasm for statistics and programming. Now, Rose is enjoying the process of mastering `R` as she deepens her skills in data analysis.
**Gina Reinhard** is also a second-year student in the master's programme in linguistics at the University of Cologne, specialising in computational linguistics. Like Rose, she completed her bachelor's degree at Heinrich Heine University Düsseldorf, with a focus on foreign languages and linguistic diversity. Her background includes studying psychology at Osnabrück University and working at an AI company, which led to her interest in all things cognitive science as a combination of linguistics, psychology, and AI. She is currently working as a research assistant in the field of variational linguistics, studying dialects and regiolects while developing her skills in computational methods for linguistic analysis.
The authors made equal contributions to the chapter and are listed here in alphabetical order. Rose and Gina submitted an earlier version of this chapter as a term paper for Elen Le Foll's M.A. seminar "More than counting words: Introduction to statistics and data visualisation in R" (University of Cologne, summer 2024). Elen supervised the project, provided feedback, and contributed to the present revised version of this chapter.
The authors thank Tatjana Scheffler, co-author of the original study reproduced in this chapter, for her valuable feedback. They have further revised the chapter based on her comments.
:::
### Chapter overview {.unnumbered}
This case-study chapter will guide you through the steps to reproduce selected results from a published experimental linguistics study [@fricke2024semantic] using `R`.
The chapter will walk you through how to:
- Explore the data of a published linguistics study
- Preprocess the raw data for analysis (including how to translate, re-order, and re-categorise the levels of categorical variables)
- Analyse and interpret the frequency counts of categorical variables
- Visualise these frequencies as barplots
- Insert and display emojis in `R` and in `ggplot` graphs
- Combine multiple plots into one figure using {patchwork}
- Interpret multi-panel plots
We will work with the original raw data from:
> Fricke, L., Grosz, P. G., & Scheffler, T. (2024). Semantic differences in visually similar face emojis. Language and Cognition, 1–15. <https://doi.org/10.1017/langcog.2024.12>
## Introducing the study 🙂 {#sec-introducing-the-study}
Face emojis are frequently used in text messages. They represent facial expressions and often make fundamental contributions to the subtext of a text message. A few studies have investigated the relationship between emojis and the emotions that they depict [@fugate2021implications; @maier2023emojis; @pfeifer2022all]. However, as emojis are a relatively recent phenomenon, there is still a lot to be discovered. In this chapter, we will look into a study by @fricke2024semantic.
### Deconstructing emojis into Action Units {#sec-deconstructing}
@fricke2024semantic compared "visually similar face emojis" using an emoji annotation system developed by @fugate2021implications. This system is based on the Facial Action Coding System (FACS) for human faces [@ekman1978facial], which is an inventory of facial muscle movements that humans can make (such as raising the inner eyebrows or pulling down the corners of the lips). @fugate2021implications adapted FACS for emojis. The facial features of emojis, like *eyebrows arched* and *eyes wide*, are called Action Units (AUs). For convenience, AUs are assigned numbers, allowing them to be easily referenced. As you can see in @fig-emojipairs, each emoji consists of several AUs.
![Emoji pairs and their AU codes [from @fricke2024semantic: 5, CC-BY]](images/CS_RoseGina_emoji_pairs.png){#fig-emojipairs fig-align="center" width="700"}
@fricke2024semantic defined two different types of emoji pairs: In the **AU+ condition**, the pairs of emojis are similar, but are assigned a different set of AUs. The emoji pairs in the **AU- condition** are also similar, but their AUs are identical. @fricke2024semantic deliberately selected emoji pairs that were as visually similar as possible to each other, while ensuring that the two emojis either differed by exactly one Action Unit (AU+) or had no differences in Action Units (AU-).
AUs capture the facial expressions of emojis and, as such, can assist linguists in accurately describing them. However, only expressions that can be consciously changed by humans receive labels. For example, the AU difference between 😃 and 😆 captures the fact that the former emoji has open eyes, while the latter has closed eyes. Since humans can choose whether to open or close their eyes, this is an **AU+ pair**. If the subtle difference between emojis is not manipulable by humans, as in 😄 and 😁, the emojis are described by identical AUs (**AU-**).
### The experiment {#sec-design}
::: {.callout-note title="How did the experiment work?"}
Three AU+ and three AU- emoji pairs were created (see @fig-emojipairs). Each pair was assigned two contexts, with each context corresponding to the prominent usage of one emoji, but not the other. For example, the contexts of the first pair are *happiness* and *(cheeky) laughter*. The contexts were assigned based on <https://emojipedia.org> and a previous norming study [@scheffler2024affective].
Four single-sentence narratives were created for each of the contexts (see @fig-testitems, translated from German below [translation @fricke2024semantic: 6]).
> 1. Alex writes to his best friend Stefan:
>
> *I just learned that my cousin's dog has his own advent calendar.*
>
> Alex is amused. Which of the emojis matches the message better? 😄😁
>
> <br>
>
> 2. Alex writes to his best friend Stefan:
>
> *I just learned that I won 500 Euro in the lottery.*
>
> Alex is overjoyed. Which of the emojis matches the message better? 😄😁
![Example of a test item in the experiment [from @fricke2024semantic: 6, CC-BY]](images/CS_RoseGina_Fricke_test_items.png){#fig-testitems fig-align="center" width="500"}
These short narratives were divided into four experimental lists of 12 items. Each list also contained 12 filler items, so that each participant saw 24 items. The participants were then asked to help choose the emoji that best matched the context. Each participant saw each emoji pair twice. The experiment measured how often participants chose the context-matching emoji versus the non-matching emoji.
:::
@fricke2024semantic's central research question was: **Do AU differences lead to differences in meaning between the two emojis of a pair?** In line with the pictorial approach by @maier2023emojis, the authors hypothesised that visual differences between emojis which correspond to human facial features (AU+) would be more semantically relevant than those that do not (AU-). However, they noted that if no evidence were found to support this hypothesis, it would align with @grosz2023semantics's lexicalist approach. This approach suggests that visual differences between emojis and their correspondence to human facial features are less significant, placing emphasis instead on the intrinsic meaning of the emoji and its constituent parts.
::: callout-tip
#### Quiz time! {.unnumbered}
Read the abstract of the study:
> Fricke, L., Grosz, P. G., & Scheffler, T. (2024). Semantic differences in visually similar face emojis. Language and Cognition, 1–15. <https://doi.org/10.1017/langcog.2024.12>
[**Q1.**]{style="color:green;"} According to the abstract, what were the results of @fricke2024semantic's experiment?
```{r echo=FALSE, results="asis"}
check_question(c("For both types of pairs, the context-matching emoji was preferred over the non-matching one.", "There were no significant differences between the two conditions."), options = c("Participants chose the context-matching emoji more often in the AU+ condition than in the AU- condition.", "There were no significant differences between the two conditions.", "Participants chose the context-matching emoji more often in the AU- condition than in the AU+ condition.", "For both types of pairs, the context-matching emoji was preferred over the non-matching one."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right!",
wrong = "Not quite. Read the abstract again.")
```
[**Q2.**]{style="color:green;"} The actual results of the experiment were different from what @fricke2024semantic had expected. According to the authors' research hypothesis (visual differences between emojis which correspond to human facial features are more semantically relevant than those that do not), which of these experimental results were expected?
```{r echo=FALSE, results="asis"}
check_question(c("Participants will choose the context-matching emoji more often in the AU+ condition.", "For the AU- pairs, the pattern will be more random."), options = c("Participants will choose the context-matching emoji more often in the AU+ condition.", "For the AU- pairs, the pattern will be more random.", "Participants will choose the context-matching emoji more often in the AU- condition.", "For the AU+ pairs, the pattern will be more random.", "For both types of pairs, the context-matching emoji will be preferred over the non-matching one."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! The hypothesis was that visual differences between emojis would be semantically more relevant if they corresponded to differences in human facial features (AU+). This would lead to participants choosing a context-matching emoji more often in the AU+ condition and making more random choices in the AU- condition.",
wrong = "Not quite. Try again.")
check_hint("Based on the authors' hypothesis, differences corresponding to facial features (AU+) would be more semantically relevant. In contrast, differences that do not correspond to facial features (AU-) would be less semantically relevant, making them less noticeable and leading to more inconsistent choices. How might this affect the frequency with which participants choose the context-matching emoji in each condition?", hint_title = "🐭 Click on the mouse for a hint.")
```
<br>
:::
## Exploring the relationship between gender and emoji understanding {#sec-gender-understanding toc-text="Exploring gender and emoji understanding"}
@fricke2024semantic asked participants about their gender, their attitude towards emojis, how often they use emojis on WhatsApp and how well they think they understand emojis. The authors visualised the distribution of men and women for emoji use and emoji attitude as barplots.
::: {#fig-FrickeGenderBarplots layout-ncol="2"}
{#fig-emojiuse fig-align="center" width="550"}
{#fig-emojiattitude fig-align="center" width="550"}
Barplots from Fricke, Grosz & Scheffler [-@fricke2024semantic: 9-10, CC-BY]
:::
The plots in @fig-FrickeGenderBarplots show that women use emojis more often and have a more positive attitude towards emojis than men. We want to find out whether women also reported a higher level of emoji understanding than men. Our analysis will involve three steps:
1. Calculating the frequencies of the genders in the data\
2. Calculating the frequencies of the different levels of emoji understanding for each gender\
3. Visualising the frequencies in a barplot similar to the plots above.
### Impo`R`ting the data {#sec-importing-the-data}
@fricke2024semantic have made their data and analysis code publicly available on the OSF repository (see @sec-OpenScience). You can access these materials at <https://osf.io/k2t9p/>. There, the data is stored in the file `raw_data.csv`. To follow the steps of this chapter, you will need to download this file.
::: callout-warning
### Session set-up {#sec-session-set-up}
To run the code of this chapter, you will need the following packages. Make sure that they are installed and loaded before starting.
```{r load-libraries, message=FALSE}
library(here)
library(tidyverse)
#install.packages("patchwork")
library(patchwork)
#install.packages("ragg")
library(ragg)
```
:::
We import the authors' raw data using the `read.csv()` and `here()` functions. You will need to adjust the file path to match the folder structure of your computer (see @sec-ImportingDataCSV).
```{r import-data, message=FALSE}
raw_data <- read.csv(file = here("data", "raw_data.csv"))
```
As specified by Fricke, Grosz & Scheffler [-@fricke2024semantic: 8], we filter out participants who exceed the maximum age of 35 years for all following analyses. We do this by using the `filter()` function and store the result in a new data frame called `df`.
```{r filter-age}
df <- raw_data |>
  filter(age <= 35)
```
### Gender frequency analysis {#sec-gender-freq}
Let's first get a general overview: How many men, women, and non-binary people participated in the study?
The relevant variable in the data set is called `gender`. However, you will see that the names of the gender groups are in German. To figure out what the labels of the different gender groups are, we use the `count()` function:
```{r table-gender}
df |>
  count(gender)
```
Before we start analysing, we should translate the labels (levels) of the categories into English. Using a combination of `mutate()` and `recode()`, we translate *männlich* to *men*, *weiblich* to *women*, and *divers* to *non-binary.*[^cs_rosegina-1]
[^cs_rosegina-1]: We decided to translate '*divers*' as 'non-binary', as this is the English term that @fricke2024semantic used in their paper.
```{r recode-gender}
df <- df |>
  mutate(gender = recode(gender,
                         "männlich" = "men",
                         "weiblich" = "women",
                         "divers" = "non-binary"))

df |>
  count(gender)
```
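The behaviour of `recode()` can be checked on a small made-up vector before applying it to the real data. In this sketch, `toy_gender` and the label "keine Angabe" are hypothetical and do not come from the study data. It illustrates that, for character vectors, values without a replacement are left unchanged.

```{r}
library(dplyr)

# Made-up vector for illustration; "keine Angabe" is a hypothetical extra label
toy_gender <- c("weiblich", "divers", "weiblich", "keine Angabe")

# Only the labels listed here are replaced; all others are kept as they are
toy_recoded <- recode(toy_gender,
                      "weiblich" = "women",
                      "divers" = "non-binary")
toy_recoded
```

This is why it is worth running `count()` after recoding: any label that was misspelt in the `recode()` call would simply remain in German.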
Now that the `gender` variable has English labels, we want to determine how many male, female, and non-binary subjects participated. So far, we have used the `count()` function, which determines the number of occurrences of the different genders in the data. But in this case, counting the occurrences is not straightforward. The data frame contains 24 rows for each subject, as each participant saw 24 items (see @sec-design). So, if we were to simply count the occurrences of *men*, *women*, and *non-binary* in the data with `count()`, we would end up with 24 times the actual frequencies.
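The inflation can be illustrated with a minimal sketch on made-up data. Here, the data frame `toy` is hypothetical and contains two participants with three rows each (rather than 24, as in the real data).

```{r}
library(dplyr)

# Made-up data: 2 participants with 3 trials each
toy <- data.frame(
  submission_id = rep(c(1, 2), each = 3),
  gender = rep(c("women", "men"), each = 3)
)

# Counting rows gives 3 per gender, although there is only 1 participant per gender
toy_rows <- toy |>
  count(gender)
toy_rows
```

Each count is three times too high because every participant contributes three rows.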
To determine the actual gender distribution, we need to count the occurrences according to the subjects' unique IDs. To do this, we apply the `distinct()` function to keep only unique occurrences (to be precise, the first unique occurrence) of each `submission_id`. The argument `.keep_all` is set to `TRUE`, which means that all other variables in the data frame are kept and not deleted.
```{r gender-distinct}
df |>
  distinct(submission_id, .keep_all = TRUE) |>
  count(gender)
```
The **mode** (see @sec-Mode) of the `gender` variable in the dataset is *men*, as you can see from the output. The gender distribution is very uneven: 109 men, 47 women, and 3 non-binary people participated in the study. If we are not careful, this imbalance can lead to misleading data visualisations.
::: callout-tip
#### Quiz time! {.unnumbered}
[**Q3.**]{style="color:green;"} Which of these problems are likely to occur if we plot emoji understanding by gender in a barplot with unequal gender groups?
```{r echo=FALSE, results="asis"}
check_question(c("The differences in emoji understanding between gender groups may look bigger or smaller than they actually are."),
options = c("The barplot is likely to be too wide for portrait publishing formats.", "Readers may misinterpret the barplot as a histogram.", "The y-axis of the plot may become distorted and therefore inaccurate.", "The differences in emoji understanding between gender groups may look bigger or smaller than they actually are.", "We will not be able to use ggplot to visualise the data."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Unequal group sizes can distort the differences by showing larger totals for larger groups and smaller totals for smaller groups. <br><br>Can you think of ways to address this issue?",
wrong = "Not quite. Try again.")
check_hint("Only one of these is likely to be a real problem.", hint_title = "🐭 Click on the mouse for a hint.")
```
<br>
:::
To solve this problem, we will use the same strategy as @fricke2024semantic. We will use relative rather than absolute frequencies to make sure that the numbers for the different genders are comparable. This means that we will calculate the percentages of emoji understanding within each gender group, treating the total number of male, female, and non-binary participants separately as 100%, rather than counting all subjects together as 100%. In this way, we can see, for example, what percentage of men, women, and non-binary participants reported a very good emoji understanding and compare the numbers across groups.
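The difference between absolute and relative frequencies can be sketched with made-up numbers. The two groups below are hypothetical and deliberately unequal in size; they are not taken from the study.

```{r}
# Made-up counts for two hypothetical groups of unequal size
group_a <- c(very_good = 40, other = 60) # 100 people in total
group_b <- c(very_good = 25, other = 25) # 50 people in total

# Each group is treated as 100% on its own
pct_a <- proportions(group_a) * 100
pct_b <- proportions(group_b) * 100

pct_a["very_good"]
pct_b["very_good"]
```

In absolute terms, more people in group A chose *very good* (40 vs. 25). Relative to group size, however, the share is higher in group B (50% vs. 40%).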
### How well do the different genders understand emojis? {#sec-gender-understanding-freq}
Next, we calculate the relative frequencies of the different levels of emoji understanding for each gender.
The variable we are interested in is called `emoji_understanding`. Just like with `gender`, we first have to do some data wrangling. We use the `count()` function to get the labels:
```{r table-understanding}
df |>
  count(emoji_understanding)
```
We translate *mittelmäßig* to *moderate*, *eher gut* to *rather good*, *gut* to *good*, and *sehr gut* to *very good*:
```{r recode-understanding}
df <- df |>
  mutate(emoji_understanding = recode(emoji_understanding,
                                      "mittelmäßig" = "moderate",
                                      "eher gut" = "rather good",
                                      "gut" = "good",
                                      "sehr gut" = "very good"))

df |>
  count(emoji_understanding)
```
The levels are still in the wrong order. We need to rearrange them in ascending order from *moderate* to *very good*. To do this, we pass the vector `c("moderate", "rather good", "good", "very good")` to the `levels` argument of the `factor()` function, which converts the variable into a factor whose levels follow this order:
```{r reorder-understanding}
df <- df |>
  mutate(emoji_understanding = factor(emoji_understanding,
                                      levels = c("moderate",
                                                 "rather good",
                                                 "good",
                                                 "very good")))

df |>
  count(emoji_understanding)
```
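For comparison, the following sketch on a made-up vector (`toy_levels`, a hypothetical name) shows what happens without the `levels` argument: `factor()` sorts the levels alphabetically, which is why the explicit order is needed.

```{r}
# Made-up vector containing the four labels in arbitrary order
toy_levels <- c("good", "very good", "moderate", "rather good")

# Without `levels`, the factor levels are sorted alphabetically
default_order <- levels(factor(toy_levels))

# With `levels`, we decide on the order ourselves
explicit_order <- levels(factor(toy_levels,
                                levels = c("moderate",
                                           "rather good",
                                           "good",
                                           "very good")))

default_order
explicit_order
```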
The levels now look good, so we can determine the frequencies for the different gender groups within `emoji_understanding`. We could do this by simply cross-tabulating gender with emoji understanding (see @sec-Mode). But since we know that the sizes of the gender subsets are very unequal, we also want to calculate the relative frequencies to make the numbers comparable. There is an easy way to calculate relative frequencies using the `proportions()` function (see @sec-DistCat). However, we need to make two additional considerations:
1. Our aim is to calculate proportions within groups and not across the whole data.
2. We want to create a comprehensive visualisation that includes both groups of men and women in a single barplot.
To achieve both, we first have to group our data, using the powerful combination of `group_by()` and `count()`. As above, we keep only one row per participant by applying `distinct()` to `submission_id`. We then group the data by gender and count how often each level of `emoji_understanding` occurs within each gender. Further below, we will save the result as a new data frame called `gender_understanding_count`.
```{r gender-understanding-freq}
df |>
  distinct(submission_id, .keep_all = TRUE) |>
  group_by(gender) |>
  count(gender, emoji_understanding)
```
In this table, `n` was calculated by the `count()` function and represents the number of occurrences for each combination of `gender` and `emoji_understanding`. Next, we use `mutate()` to add a column with the relative frequencies, which we calculate with the formula `proportions(n) * 100` to obtain percentages.
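Because the data are still grouped by `gender` at this point, `proportions()` is applied separately within each gender group. A minimal sketch with made-up counts (the data frame `toy_counts` and its numbers are hypothetical) shows that the percentages then add up to 100 within each group rather than across the whole table.

```{r}
library(dplyr)

# Made-up counts for two groups of unequal size
toy_counts <- data.frame(
  gender = c("men", "men", "women", "women"),
  emoji_understanding = c("good", "very good", "good", "very good"),
  n = c(60, 40, 10, 10)
)

# Percentages are calculated within each gender group
toy_pct <- toy_counts |>
  group_by(gender) |>
  mutate(percentage = proportions(n) * 100) |>
  ungroup()
toy_pct
```

Without the `group_by()` step, the four percentages would instead add up to 100 across both groups, which is not what we want here.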
```{r gender-understanding-percent}
#| source-line-numbers: "5"
gender_understanding_count <- df |>
  distinct(submission_id, .keep_all = TRUE) |>
  group_by(gender) |>
  count(gender, emoji_understanding) |>
  mutate(percentage = proportions(n) * 100)

gender_understanding_count
```
This tabular presentation of the data already shows us that non-binary participants reported either a *rather good* or *good* understanding of emojis. A higher percentage of women (44.7%) reported a *very good* emoji understanding compared to men (38.5%). But let's create our barplot to see the distribution more clearly.
### Data visualisation 📊 {#sec-gender-understanding-vis}
As mentioned above, we will visualise the relative rather than the absolute frequencies to make sure that the numbers for the different genders are comparable. In line with Fricke, Grosz & Scheffler [-@fricke2024semantic: 9], we also exclude the three non-binary participants. To this end, we use the `filter()` function combined with the `!=` operator (see @sec-RelationalOperators).
```{r filter-gender}
gender_understanding_count <- gender_understanding_count |>
  filter(gender != "non-binary")
```
We use `ggplot()` to create a barplot with the `emoji_understanding` categories on the *x*-axis and the relative frequencies that we calculated on the *y*-axis. The bars are coloured according to `gender`. We also add a title and axis labels. Finally, we remove the white space between the bottom of the bars and the *x*-axis with an additional `scale_y_continuous(expand = c(0,0))` layer and change the colours to make our plot look nicer. The hexadecimal colour values chosen here are from the colour-blind friendly palette "Set2" from the package [{RColorBrewer}](https://cran.r-project.org/web/packages/RColorBrewer/index.html) [@neuwirth2022package]. Since we only need two colours, we chose to insert them manually to avoid having to install an additional package.
```{r gender-understanding-plot}
ggplot(gender_understanding_count,
       aes(x = emoji_understanding,
           y = percentage,
           fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_y_continuous(expand = c(0,0)) +
  labs(title = "Self-reported Emoji Understanding by Gender",
       x = "Emoji understanding",
       y = "Percent") +
  scale_fill_manual(values = c("#8DA0CB", "#FC8D62")) +
  theme_classic()
```
As you can see from the barplot, the gender distribution for emoji understanding is much more even than for emoji use and emoji attitude (see @fig-FrickeGenderBarplots).
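As a side note, `geom_bar(stat = "identity")` can also be written as `geom_col()`, which plots the values of the *y* variable as they are. The sketch below uses a made-up data frame (`toy_plot_data`, with hypothetical percentages) and is meant as an alternative spelling, not as a replacement for the plot above.

```{r}
library(ggplot2)

# Made-up percentages for illustration only
toy_plot_data <- data.frame(
  emoji_understanding = rep(c("good", "very good"), times = 2),
  gender = rep(c("men", "women"), each = 2),
  percentage = c(60, 40, 50, 50)
)

# geom_col() uses the y values directly, so no `stat` argument is needed
p_toy <- ggplot(toy_plot_data,
                aes(x = emoji_understanding,
                    y = percentage,
                    fill = gender)) +
  geom_col(position = "dodge") +
  theme_classic()
p_toy
```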
::: callout-tip
#### Quiz time! {.unnumbered}
[**Q4.**]{style="color:green;"} How do you interpret this plot?
```{r echo=FALSE, results="asis"}
check_question(c("Proportionally more women than men reported a very good emoji understanding.", "Proportionally more women than men reported a moderate emoji understanding."), options = c("Proportionally more women than men reported a very good emoji understanding.", "Proportionally more women than men reported a moderate emoji understanding.", "Women reported a lower level of emoji understanding than men.", "Around half of all participants reported a rather good understanding of emojis."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! While proportionally more women than men reported a very good emoji understanding, a relatively large number of men also stated that they understood emojis very well. There are also proportionally more women than men in the moderate and rather good category. Around 23% of men and 25% of women reported a rather good understanding, but this does not correspond to 50% of participants as there were more men than women in this study.",
wrong = "Not quite. Try again.")
check_hint("Two of the above options are correct interpretations of the plot.", hint_title = "🐭 Click on the mouse for a hint.")
```
<br>
:::
When comparing our barplot to @fig-FrickeGenderBarplots, it is interesting to note that, whilst women reported more frequent use of emojis and a more positive attitude towards emojis, they did not report a higher understanding of emojis. It is possible that some women were more modest in rating their understanding of emojis than men, which could indicate a gender confidence gap. Reporting a good understanding likely requires more confidence compared to emoji use or attitude towards emojis.
## Comparing matching rates between AU conditions {#sec-au-plot}
We will now turn to exploring the central research question of @fricke2024semantic: **Do AU differences lead to differences in meaning between the two emojis of a pair?** As explained in @sec-deconstructing, AUs are numbers which correspond to human-like facial features. In emoji pairs of the AU+ condition, the visual difference between the emojis is reflected in a number difference, e.g. *grinning face with big eyes* 😃 (AU: **5** + 12 + 25 + 26) and *grinning squinting face* 😆 (AU: 12 + 25 + 26 + **43**). In the AU- condition, the visual difference does not correspond to an AU difference, e.g. *grinning face with smiling eyes* 😄 and *beaming face with smiling eyes* 😁 both have the same AUs (12 + 25 + 26 + 63).
Step by step, we will build an informative plot which will include all the information needed to answer this question. This plot will display how many times each emoji was chosen in its presumed corresponding context.
To achieve this, we need to create a variable that tells us when each participant responded with the matching emoji. In the following, we will create this variable based on the raw data from @fricke2024semantic.
### Preprocessing the data {#sec-au-data-preparation}
First, we need a variable that includes the experimental conditions of each trial. The variable `name` tells us whether a trial consisted of an emoji pair with an AU difference (AU+) or not (AU-), or a filler. Trials with AU+ differences include "AU" in the trial `name`, those with no AU difference begin with "N", and the fillers with "filler".
```{r eval=FALSE}
df |>
  distinct(name)
```
```{r echo=FALSE}
df |>
  distinct(name) |>
  slice(1:10)
```
We use a combination of `mutate()`, `case_when()` and `str_detect()` to construct a new variable (`AU_difference`) that captures the type of trial that we are dealing with. The command essentially says: look for the string "AU" in the column `name`, and in all cases where you find it (`case_when()`), insert the value "AU+" in a new column called `AU_difference`. We follow this procedure for the other trial conditions, too. The conditions are checked in the order in which they are written. If none of "AU", "N", or "filler" is detected, `AU_difference` is set to a missing value (`NA`), which is what `.default = NULL` produces.
```{r AU-variable-complete}
df <- df |>
  mutate(AU_difference = case_when(str_detect(name, "AU") ~ "AU+",
                                   str_detect(name, "N") ~ "AU-",
                                   str_detect(name, "filler") ~ "filler",
                                   .default = NULL))
```
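To see how `case_when()` and `str_detect()` work together, here is a minimal sketch on a made-up vector. The trial names in `toy_names` are hypothetical and only mimic the naming pattern described above. The last name matches none of the patterns and therefore receives `NA`.

```{r}
library(dplyr)
library(stringr)

# Made-up trial names; "practice_item" matches none of the patterns
toy_names <- c("AU_item", "N_item", "filler_item", "practice_item")

# Conditions are checked from top to bottom; unmatched names become NA
toy_type <- case_when(
  str_detect(toy_names, "AU") ~ "AU+",
  str_detect(toy_names, "N") ~ "AU-",
  str_detect(toy_names, "filler") ~ "filler"
)
toy_type
```

Note that `str_detect()` is case-sensitive, so the lower-case "n" in "filler_item" does not match the pattern "N".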
We use `select()` to compare the two columns and check that everything worked.
```{r table-AU}
df |>
  slice(1:10) |>
  select(name, AU_difference)
```
This looks promising. Since we are only interested in the experimental items, we now filter out all filler trials.
```{r filter-filler}
df <- df |>
  filter(AU_difference != "filler")
```
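One thing to keep in mind is that `filter()` only keeps rows for which the condition is `TRUE`. Rows with a missing value in `AU_difference` would therefore be dropped together with the fillers. The following sketch on a made-up data frame (`toy_trials`, hypothetical) shows this behaviour, along with a check that could be run before filtering to see whether any trials were left unclassified.

```{r}
library(dplyr)

# Made-up data with one unclassified trial (NA)
toy_trials <- data.frame(AU_difference = c("AU+", "AU-", "filler", NA))

# The comparison returns NA for the missing value, so that row is dropped, too
kept <- toy_trials |>
  filter(AU_difference != "filler")

# A check for unclassified trials
unclassified <- toy_trials |>
  filter(is.na(AU_difference))

nrow(kept)
nrow(unclassified)
```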
We will now create another variable called `context`. The column of this variable will contain the context descriptions used by Fricke, Grosz & Scheffler [-@fricke2024semantic: 5] in @fig-emojipairs. Again, we combine `mutate()`, `case_when()` and `str_detect()`: In the `question` column, we look for context-characteristic strings, and add the context descriptions whenever we have a match. This time, we check the output with `table()`.
```{r context-variable}
df <- df |>
mutate(context = case_when(str_detect(question, "freut sich") ~ "happiness",
str_detect(question, "lacht") ~ "(cheeky) laughter",
str_detect(question, "macht sich Sorgen") ~ "concern",
str_detect(question, "ist überrascht") ~ "surprise",
str_detect(question, "ist etwas genervt") ~ "mild irritation",
str_detect(question, "ärgert sich") ~ "annoyance",
str_detect(question, "amüsiert sich") ~ "amusement",
str_detect(question, "ist überglücklich") ~ "(intense) happiness",
str_detect(question, "ist enttäuscht") ~ "mild disappointment",
str_detect(question, "ist enttäuscht") ~ "moderate disappointment",
str_detect(question, "ist gut gelaunt") ~ "happiness2",
str_detect(question, "ist verlegen") ~ "bashfulness",
.default = NULL))
table(df$context)
```
::: callout-tip
#### Quiz time! {.unnumbered}
[**Q5.**]{style="color:green;"} Which problems become apparent when checking the content of our new `context` variable using the `table()` function?
```{r echo=FALSE, results="asis"}
check_question(c("All contexts have 159 occurrences, except for mild disappointment which occurs 318 times.", "There are fewer contexts in the output than we coded for."), options = c("All contexts have 159 occurrences, except for mild disappointment which occurs 318 times.", "There are fewer contexts in the output than we coded for.", "The context descriptions were not correctly assigned in the case of matching strings.", "There are too many occurrences of matches per context than can reasonably be assumed.", "There are more contexts in the output than we coded for."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Something must have gone wrong. As you can see, it is always a good idea to check the output of every step of your code for inconsistencies!",
wrong = "Not quite, try again.")
```
:::
The contexts *mild disappointment* and *moderate disappointment* have created some issues: In the `question` column, both are described as *ist enttäuscht* ('is disappointed'). Except for their encoding in the `name` column, these contexts appear to be identical. At this point, we have no choice but to look for additional disambiguating information in @fricke2024semantic's analysis script, which you can access at <https://osf.io/k8dtp>. The relevant information can be found in lines 522 and 523 (see @fig-script).
{#fig-script fig-align="center" width="600"}
The emoji 🙁 (*mild disappointment*) is coded as `N-36` and ☹️ (*moderate disappointment*) as `N-37`. We use this information to assign these two contexts to our `context` variable.
```{r recode-disappointment}
df <- df |>
mutate(context = case_when(
str_detect(name, "N-36") ~ "mild disappointment",
str_detect(name, "N-37") ~ "moderate disappointment",
.default = context))
table(df$context)
```
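The reason why the first attempt failed is that `case_when()` stops at the first condition that is `TRUE`: a second condition with the same pattern is never reached. The following sketch reproduces the problem and the fix on a tiny invented tibble. The English question strings and trial names are placeholders chosen for illustration and are not the study's materials. It assumes {dplyr} (version 1.1.0 or later, for `.default`) and {stringr}.

```r
library(dplyr)
library(stringr)

# Invented rows: two trials share the same wording in `question`
toy <- tibble(
  name     = c("N-36_a", "N-37_a", "AU-01_a"),
  question = c("She is disappointed.", "He is disappointed.", "She is surprised.")
)

# First attempt: the second condition can never be reached
toy <- toy |>
  mutate(context = case_when(
    str_detect(question, "is disappointed") ~ "mild disappointment",
    str_detect(question, "is disappointed") ~ "moderate disappointment",
    str_detect(question, "is surprised")    ~ "surprise"
  ))
first_pass <- toy$context

# Fix: disambiguate via `name` and keep all other values unchanged
toy <- toy |>
  mutate(context = case_when(
    str_detect(name, "N-36") ~ "mild disappointment",
    str_detect(name, "N-37") ~ "moderate disappointment",
    .default = context
  ))

table(toy$context)
```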
Finally, we add the critical variable that describes whether there is a match between the chosen emojis and the contexts: if the emoji and the context agree, the variable will have the value *match*. Otherwise, the value will be *no match*.
```{r match-variable}
df <- df |>
mutate(
match = case_when(
context == "happiness" & response == "grinning_face_with_big_eyes" ~ "match",
context == "(cheeky) laughter" & response == "grinning_squinting_face" ~ "match",
context == "concern" & response == "hushed_face" ~ "match",
context == "surprise" & response == "astonished_face" ~ "match",
context == "mild irritation" & response == "neutral_face" ~ "match",
context == "annoyance" & response == "expressionless_face" ~ "match",
context == "amusement" & response == "grinning_face_with_smiling_eyes" ~ "match",
context == "(intense) happiness" & response == "beaming_face_with_smiling_eyes" ~ "match",
context == "mild disappointment" & response == "slightly_frowning_face" ~ "match",
context == "moderate disappointment" & response == "frowning_face" ~ "match",
context == "happiness2" & response == "smiling_face_with_smiling_eyes" ~ "match",
context == "bashfulness" & response == "smiling_face" ~ "match",
.default = "no match"))
```
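As an aside, the same logic can also be written with a lookup vector instead of twelve separate conditions. This is an alternative sketch and not the approach used in this chapter; the object name `expected` is hypothetical and only two contexts are shown. It assumes {dplyr} is installed.

```r
library(dplyr)

# Hypothetical lookup: context -> emoji that counts as a match
expected <- c(
  "concern"  = "hushed_face",
  "surprise" = "astonished_face"
)

# Invented responses for illustration
toy <- tibble(
  context  = c("concern", "concern", "surprise"),
  response = c("hushed_face", "astonished_face", "astonished_face")
)

toy <- toy |>
  mutate(match = if_else(response == unname(expected[context]),
                         "match", "no match"))

toy
```

With a lookup vector, adding a new context means adding one entry rather than one more `case_when()` line. The explicit `case_when()` version above has the advantage that every pairing is visible at a glance.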
### Building the plots {#sec-au-plot-building}
We will now build our plots to visualise the matching rates per emoji pair. In a new table called `data_AU`, we group the data by context. The command `count(match)` counts matches and non-matches for each context and stores them in a new column called `n`. We add the column `percent` to store the rounded percentage of matches and non-matches within each context.
```{r data-AU-context-match}
data_AU <- df |>
group_by(context) |>
count(match) |>
mutate(percent = round(proportions(n)*100, 2))
```
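To make the role of the grouping more tangible, here is a small sketch with invented responses (not the study's data), assuming {dplyr} is installed. Because the table is still grouped by `context` after `count()`, `proportions(n)` is computed within each context, so the percentages add up to 100 per context rather than across the whole table.

```r
library(dplyr)

# Invented responses: 4 trials for "concern", 5 for "surprise"
toy <- tibble(
  context = c(rep("concern", 4), rep("surprise", 5)),
  match   = c("match", "match", "match", "no match",
              "match", "match", "no match", "no match", "no match")
)

toy_AU <- toy |>
  group_by(context) |>
  count(match) |>
  mutate(percent = round(proportions(n) * 100, 2))

toy_AU
```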
Using the `View()` function, we take a look at our data.
{#fig-data-percentages fig-align="center" width="400"}
We plot the first emoji pair of the AU+ condition 😯 😲 with their respective contexts *concern* and *surprise.*
```{r create-plot-concern-surprise}
#| code-line-numbers: true
plot_concern_surprise <- data_AU |>
filter(context == "concern" | context == "surprise") |>
ggplot(aes(x = context, y = percent, fill = match)) +
geom_col() +
scale_x_discrete(limits = c("concern", "surprise")) +
scale_y_continuous(expand = c(0,0)) +
scale_fill_manual(values = c("#66C2A5", "#E78AC3")) +
geom_text(aes(label = percent), position = position_stack(vjust = 0.5)) +
labs(title = "😯 😲", x = "") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5, size = 20),
legend.title=element_blank())
```
The code above creates a barplot and stores it in `plot_concern_surprise`. Here's what each line of code does:
1. Begin by assigning a clear name for the plot and call the data to be plotted.
2. Filter the contexts, such that only rows of the contexts *concern* or (`|`) *surprise* are plotted.
3. Create a `ggplot` object with the context values on the *x*-axis and percentages on the *y*-axis. Colours are filled corresponding to the match values.
4. Display the data as a barplot. By default, `geom_bar()` counts how many times *match* and *no match* occur. However, as we have already calculated and stored the values in the column `percent`, we use `geom_col()` to be able to use the data as is.
5. The context values, which are displayed on the *x*-axis, are discrete. With this command, we set and order the contexts.
6. Use the "expand" argument of the `scale_y_continuous()` function to remove the white space between the bars and the *x*-axis.
7. Adjust colours with values from "Set2" from the {RColorBrewer} package (see @sec-gender-understanding-vis).
8. Annotate the percentages of matching rates by adding them as text and placing them inside the plot, in the middle of the corresponding bars.
9. Add the corresponding emojis at the top of the plot and remove the superfluous *x*-axis label "context".
10. Add a theme, in this case `theme_classic()`.
11. Plot titles are left-aligned by default. Since we want the emojis to be displayed on top of their corresponding context bars, we move the title to the centre of the plot. To ensure that the emojis are easily interpretable, we also increase the font size to 20 points.
12. Finally, we remove the title of the legend because the *match* and *no match* values are self-explanatory.
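
The difference between `geom_bar()` and `geom_col()` mentioned in step 4 can be illustrated with a toy example (invented values, assuming {ggplot2} is installed). Both plots below end up with the same bar heights: `geom_bar()` counts the rows itself, whereas `geom_col()` uses the values that we have already computed.

```r
library(ggplot2)

# One row per response: geom_bar() does the counting
raw <- data.frame(match = c("match", "match", "match", "no match"))
p_bar <- ggplot(raw, aes(x = match)) +
  geom_bar()

# One row per category with a precomputed value: geom_col() plots it as is
counted <- data.frame(match = c("match", "no match"), n = c(3, 1))
p_col <- ggplot(counted, aes(x = match, y = n)) +
  geom_col()
```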
Let's take a look at our plot. It's looking great, but we don't need just one plot, we need six: one for each emoji pair. We could write it all out for each emoji pair, but since the code is identical (except for the contexts and the emojis), it is much more efficient to define a **function** to do this.
```{r show-plot-concern-surprise}
#| eval: false
plot_concern_surprise
```
```{r}
#| eval: false
#| echo: false
ggsave(plot = plot_concern_surprise, here("images", "CS_RoseGina_plot_concern_surprise.png"), height = 1344, width = 1881, units = "px")
```

::: {.callout-note title="Defining our own functions"}
Functions are reusable code snippets that perform specific tasks. So far, we have only used built-in `R` functions (see @sec-RFunctions) and functions from add-on packages such as {dplyr} from the tidyverse (see @sec-tidyverse), but we have not defined our own functions.
Defining our own functions can help us make our code more efficient and organised. As a rule of thumb, whenever writing new code seems redundant (i.e., when you find yourself copying and pasting entire sections of code), it is best to define a function for that task. This will ensure that the task is always performed in the same way and, if you find that you need to amend the code to perform the task, you will only need to make the change once, within the function assigned to this task.
The basic structure of a function is `function(argument)`. Looks familiar? Accordingly, we define a function the following way: `function(parameters){function body}`
Here are the steps:
1. We define a function using the keyword `function`. After this keyword, we write a list of **parameters** in brackets. **Parameters** act as placeholders for the function's arguments.
2. We then write code in the **function body** and enclose it in curly brackets. The **function body** tells the function what it is meant to do when called upon.
3. We assign our function a name using the assignment operator (`<-`). This name will be used to call up the function. To avoid conflicts (see @sec-Conflicts), we choose a name that is not already assigned to a built-in function.
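
The three steps can be sketched with a deliberately simple function. The name `to_percent` and its parameters are hypothetical and chosen purely for illustration; the function is not used elsewhere in this chapter.

```r
# Steps 1 and 3: keyword `function`, parameters in brackets, name assigned with `<-`
to_percent <- function(counts, digits = 2) {
  # Step 2: the function body converts counts to rounded percentages
  round(counts / sum(counts) * 100, digits)
}

# Calling the function: the arguments fill in the parameters
to_percent(c(3, 1))
```

The parameter `digits` has a default value, so it only needs to be specified when a different number of decimal places is wanted.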
:::
In our case, the process of defining a plotting function is straightforward:
1. We start with the keyword `function()` and state that our function should take `contexts` as its first argument and `emojis` as its second argument, as only these change with each plot.
2. We then simply paste the code that we just wrote for the above barplot inside the curly braces, replacing the specific contexts and emojis with the parameters of our function.
3. We name our function `plot_AU_matches` to make clear what it does: it plots AU matches.
```{r define-function}
plot_AU_matches <- function(contexts, emojis) {
data_AU |>
filter(context %in% contexts) |>
ggplot(aes(x = context, y = percent, fill = match)) +
geom_col() +
scale_x_discrete(limits = contexts) +
scale_y_continuous(expand = c(0,0)) +
scale_fill_manual(values = c("#66C2A5", "#E78AC3")) +
geom_text(aes(label = percent), position = position_stack(vjust = 0.5)) +
labs(x = "", y = "percent", title = emojis) +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5, size = 20),
legend.title=element_blank())
}
```
We then apply this function to all contexts and emoji pairs by filling them in as the arguments.
```{r apply-function}
plot_concern_surprise <- plot_AU_matches(c("concern", "surprise"), "😯 😲")
plot_happiness_cheeky <- plot_AU_matches(c("happiness", "(cheeky) laughter"), "😃 😆")
plot_mild_irr_annoyance <- plot_AU_matches(c("mild irritation", "annoyance"), "😐 😑")
plot_mild_disapp_mod_dissap <- plot_AU_matches(c("mild disappointment", "moderate disappointment"), "🙁️ ☹️")
plot_amusement_int_happiness <- plot_AU_matches(c("amusement", "(intense) happiness"), "😄 😁")
plot_happiness2_bashfulness <- plot_AU_matches(c("happiness2", "bashfulness"), "😊 ☺️")
```
:::: {.callout-note collapse="true"}
#### How to insert emojis in `R` and render plots with emojis 😰
There are various ways to insert emojis in `R`. The easiest is to use the emoji keyboard (see @fig-emoji-keyboard). To open it on macOS, use the keyboard shortcut `Ctrl + ⌘ Cmd + Space` or `🌐 fn + e`; on Windows, use `⊞ Win + . (period)`. The emoji keyboard is also available in RStudio, if you go to the "Edit" drop-down menu and click on "Emojis & Symbols". Alternatively, there are emoji libraries for `R`, for example {emo(ji)} [@wickham2024emoji].
As we want to display emojis within plots, we need to pay even more attention to graphics. Emojis as part of plots created by `ggplot` cannot be displayed by default. Additional problems can occur when rendering a Quarto or RMarkdown document to HTML.
If displaying emojis as part of plots in RStudio does not work for you, you will need to use the high-quality graphics library "AGG" ("Anti-Grain Geometry") or "Cairo" as your graphics backend in RStudio. To do this, head to the "Tools" drop-down menu and click on "Global Options". Then, go to the "Graphics" tab and select the "AGG" or "Cairo" option (see @fig-agg).
::: {#fig-emoji-tools layout-ncol="2"}
{#fig-emoji-keyboard fig-align="center" width="270"}
{#fig-agg fig-align="center" width="300"}
Tools for inserting and displaying emojis in RStudio
:::
If you are using AGG as your graphics backend, you can use the [{ragg}](https://ragg.r-lib.org) package [@pedersen2024ragg] to correctly render your Quarto/RMarkdown document to HTML with all the emojis in the plots. This package provides graphic devices based on AGG and includes advanced text rendering, with support for emojis. To use {ragg} in combination with the {knitr} engine, first install the package and then add the following setup command at the beginning of your document:
```{r setup-example, eval = FALSE}
install.packages("ragg")
knitr::opts_chunk$set(dev = "ragg_png")
```
If AGG does not work for you, you can use Cairo. Cairo comes preinstalled with `R` so you don't need to install it yourself. The set-up command for your Quarto/RMarkdown document is:
```{r setup-example2, eval = FALSE}
knitr::opts_chunk$set(dev.args = list(png = list(type = "cairo")))
```
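If you are unsure which of the two options is available on your system, the following base `R` calls offer a quick check. This is only a rough sketch: it tells you whether your `R` build reports Cairo support and whether {ragg} is installed, not whether emojis will render correctly with your fonts.

```r
# TRUE if this build of R was compiled with Cairo support
has_cairo <- capabilities("cairo")

# TRUE if the {ragg} package is installed (without attaching it)
has_ragg <- requireNamespace("ragg", quietly = TRUE)

has_cairo
has_ragg
```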
::::
### Assembling plots with {patchwork} {#sec-au-patchwork}
By applying our newly created function `plot_AU_matches()` to all emoji pairs and contexts, we have created one barplot for each emoji pair. We will now use the [{patchwork}](https://patchwork.data-imaginist.com) package [@pedersen2024patchwork] to assemble the plots into one figure. As the name suggests, {patchwork} enables us to patch several plots together and arrange them as we wish. The basic operator to combine plots in {patchwork} is the `+` operator. Additionally, plots can be combined:
- Horizontally using `|` and
- Vertically using `/`.
Brackets can be used to combine horizontal and vertical arrangements.
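
As a minimal sketch of these operators, the example below combines three toy plots based on the built-in `mtcars` dataset (these plots are unrelated to the emoji study). It assumes that {ggplot2} and {patchwork} are installed.

```r
library(ggplot2)
library(patchwork)

# Three simple toy plots
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p3 <- ggplot(mtcars, aes(x = hp)) + geom_histogram(bins = 10)

# p1 and p2 side by side in the top row, p3 underneath across the full width
layout_demo <- (p1 | p2) / p3
layout_demo
```

Without the brackets, `R`'s usual operator precedence applies: `/` binds more tightly than `|`, so `p1 | p2 / p3` would place `p1` next to a stacked column of `p2` and `p3`.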
 (CC-BY)](images/AHorst_patchwork.png){#fig-patchwork width="600"}
In line with the research question of @fricke2024semantic, we want to compare the matching rates of emojis and contexts in the AU+ condition with the matching rates in the AU- condition. Our goal is therefore to create a plot that looks similar to @fig-AUFricke created by @fricke2024semantic.
![Plot of individual emoji pairs that compares AU conditions [@fricke2024semantic: 11, CC-BY]](images/CS_RoseGina_Fricke_AUplot.png){#fig-AUFricke width="700"}
First, let us plan the layout of our combined plot with some placeholder names. Our combined plot will have two columns and three rows: In both `column1` and `column2`, three plots are stacked **vertically** on top of each other. These are the plots of the AU+ and the AU- condition, respectively. We then place these patchworks next to each other (**horizontally**) for comparison.
```
column1 <- p1 / p2 / p3
column2 <- p4 / p5 / p6
columns_combined <- column1 | column2
```
We follow the logic above to create our combined plot, choosing informative names for our subplots. To run this code, you will need to have the {patchwork} package installed and the library loaded.
```{r AU-patch-condition}
#install.packages("patchwork")
#library(patchwork)
#AU+ condition:
AU_plus_patch <-
plot_concern_surprise / plot_happiness_cheeky / plot_mild_irr_annoyance
#AU- condition:
AU_minus_patch <-
plot_mild_disapp_mod_dissap / plot_amusement_int_happiness / plot_happiness2_bashfulness
```
To further specify the layout, we use the `plot_layout()` function from {patchwork}. By setting the `guides` argument to "collect", legends that are identical within each patchwork are merged into one. We place the legend at the bottom of each patchwork.
```{r legends}
#AU+ condition:
AU_plus_patch <- AU_plus_patch +
plot_layout(guides = "collect") & theme(legend.position = "bottom")
#AU- condition:
AU_minus_patch <- AU_minus_patch +
plot_layout(guides = "collect") & theme(legend.position = "bottom")
```
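Note the use of `&` rather than `+` before `theme()`. In {patchwork}, `+` adds an element to the last plot only, whereas `&` applies it to every plot in the patchwork. The sketch below shows the same pattern on two toy plots built from the `mtcars` dataset (unrelated to the emoji data), assuming {ggplot2} and {patchwork} are installed.

```r
library(ggplot2)
library(patchwork)

# Two toy plots that share an identical legend
p1 <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()
p2 <- ggplot(mtcars, aes(x = hp, y = mpg, colour = factor(cyl))) + geom_point()

# guides = "collect" merges the identical legends;
# `& theme()` then sets the legend position for all plots at once
toy_patch <- (p1 / p2) +
  plot_layout(guides = "collect") &
  theme(legend.position = "bottom")

toy_patch
```

Since `&` has lower precedence than `+` in `R`, the patchwork and its layout are combined first and the theme is applied to the result.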
Since `AU_plus_patch` and `AU_minus_patch` are to be combined in one plot, we need to add titles to keep them apart. Technically, it is possible (and recommended!) to use the `plot_annotation()` function of the {patchwork} package for this. However, annotations made with this function are only shown at the highest nesting level. As we will be building a double-nested plot, any annotations we make on the "blocks-of-three" level will not be displayed. We can work around this by using the function `wrap_elements()`. This fixes the blocks in their current state and allows us to add titles using `ggtitle()` instead.
```{r patch-titles}
AU_plus_patch <- wrap_elements(plot = AU_plus_patch) +
ggtitle("[AU+] condition")
AU_minus_patch <- wrap_elements(plot = AU_minus_patch) +
ggtitle("[AU-] condition")
```
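The same workaround can be tried out on toy plots first. The sketch below uses the `mtcars` dataset and invented block titles; it assumes {ggplot2} and {patchwork} are installed.

```r
library(ggplot2)
library(patchwork)

# Four simple toy plots
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
p3 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p4 <- ggplot(mtcars, aes(x = factor(gear))) + geom_bar()

# Each stacked block is wrapped so that it can carry its own title
left_block  <- wrap_elements(plot = p1 / p2) + ggtitle("Left block")
right_block <- wrap_elements(plot = p3 / p4) + ggtitle("Right block")

# The titled blocks can then be placed side by side
toy_combined <- left_block | right_block
toy_combined
```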
Finally, we put both columns together to get our final plot.
```{r show-AU-combined-patch}
#| fig-height: 11
#| eval: false
AU_plus_patch | AU_minus_patch
```
```{r}
#| eval: false
#| echo: false
AUpatch_complete <- AU_plus_patch | AU_minus_patch
ggsave(plot = AUpatch_complete,
here("images", "CS_RoseGina_AUpatch_complete.png"),
height = 3200, width = 2300, units = "px")
```
{#fig-AUpatch width="100%"}
@fig-AUpatch contains information on the matching rates of all emoji pairs with their contexts. It contains the same information as @fig-AUFricke from @fricke2024semantic, but does not look exactly the same. Which look do you think is easiest to interpret?
:::: {.callout-note title="Alternative ways of dealing with the legend" collapse="true"}
You will probably have noticed that @fig-AUpatch contains two identical legends. This is the trade-off we accept by using the `wrap_elements()` function: we have fixed the patchworks in their current state, legends included, which means that the legends cannot be merged later. There are a couple of other options that produce different outcomes, but none of them is perfect. Since we do not want to delete the legend completely, one option would be to keep the legends of all six plots, as in @fig-alllegends. Another option would be to keep the legend of one block and delete the other. However, as you can see in @fig-onelegend, the bars then take up the space freed by the legend, so that the bars in one block become wider than in the other.
::: {#fig-AU-plot-options layout-ncol="2"}
{#fig-alllegends fig-align="center" width="550"}
{#fig-onelegend fig-align="center" width="550"}
Options for the legend placement
:::
::::
### Interpreting the plot
By looking at and interpreting @fig-AUpatch, we can now finally answer the research question: Do AU differences lead to differences in meaning between the two emojis of a pair?
Based on the descriptive statistics visualised in @fig-AUpatch, the answer is no, seemingly not. The AU difference does not seem to be critical when deciding which emoji to use in a specific context. The original study also concluded that the matching emoji was "generally preferred with matching rates above chance level" [@fricke2024semantic: 11], both in the AU+ and in the AU- condition. Now, was all this work for nothing?
No, not at all! We can still draw some interesting inferences from the plot we created. For example, we see that minor visual differences between emojis do appear to affect the understanding and selection of emojis in different contexts: By slightly varying the contexts, participants were made to choose emojis with different facial features. Matching rates were quite similar within emoji pairs and, notably, also largely consistent across emoji pairs, regardless of their AU status. Hence, there is much more to explore in future linguistics studies on the semantics of emoji in text messages!
The following quiz questions are about the interpretation of @fig-AUpatch. The questions should help you make sense of the information displayed.
::: callout-tip
#### Quiz time! {.unnumbered}
[**Q6.**]{style="color:green;"} In which context pair did participants choose the matching emojis most often?
```{r echo=FALSE, results="asis"}
check_question("happiness and (cheeky) laughter", options = c("happiness and (cheeky) laughter", "concern and surprise", "amusement and (intense) happiness", "mild irritation and annoyance", "mild disappointment and moderate disappointment", "happiness2 and bashfulness "), type = "radio",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! The vast majority of participants (86.79% and 88.68%) opted for the matching emoji in the contexts *happiness* (😃) and *(cheeky) laughter* (😆).",
wrong = "Not quite, try again.")
```
<br><br> [**Q7.**]{style="color:green;"} How can similar matching rates across emoji pairs (that is, across sub-plots) be interpreted?
```{r echo=FALSE, results="asis"}
check_question("For these pairs, a very similar number of participants chose the matching emoji.", options = c("For these pairs, a very similar number of participants chose the matching emoji.", "For these pairs, participants had the most difficulty choosing one emoji over the other.", "For these pairs, the least number of participants chose the matching emoji.", "For these pairs, half the participants preferred one emoji and half preferred the other."), type = "radio",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Of course, this does not mean that the same participants preferred the matching emojis for these pairs.",
wrong = "This is not correct. Look at the subplots *happiness* 😃 - *(cheeky) laughter* 😆 and *amusement* 😄 - *(intense) happiness* ️😁 and think about their meaning.")
check_hint("Similar matching rates imply that the *match* / *no match* ratio is somewhat constant across contexts. For example, compare the subplots *happiness* 😃 - *(cheeky) laughter* 😆 and *amusement* 😄 - *(intense) happiness*: the matching rates are consistently high for all emojis and contexts. What might this indicate?", hint_title = "🐭 Click on the mouse for a hint.")
```
<br><br> [**Q8.**]{style="color:green;"} Which interpretations of the lower left plot are correct? Select all that apply.
```{r echo=FALSE, results="asis"}
check_question(c("The emoji 😐 was selected for its matching context considerably less often than all the other emojis.", "Only in the 😐😑 pair does the matching rate for one emoji exceed chance level, while the matching rate for the other falls below chance."), options = c("The emoji 😐 was selected for its matching context considerably less often than all the other emojis.", "There is a major difference between the matching rates of the annoyance and happiness contexts.","Only in the 😐😑 pair does the matching rate for one emoji exceed chance level, while the matching rate for the other falls below chance.", "The disparate matching rates of the 😐😑 pair prove that, in general, AU differences affect participants' preferences."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Apparently, two thirds (66.67 %) of participants favoured the non-matching emoji in the *mild irritation* context: Instead of 😐, they chose 😑. One of the narratives in this context was 'a malfunctioning wifi router'. Which emoji would you choose in this context, 😐 or 😑?" ,
wrong = "Not quite. Consider the implications of a low matching rate versus a high matching rate.", alignment = "vertical")
check_hint("Two statements are correct. A low matching rate indicates that participants rarely chose the emoji that was intended for the context, that is, the emoji that the authors identified as the most fitting.", hint_title = "🐭 Click on the mouse for a hint.")
```
<br><br> [**Q9.**]{style="color:green;"} What are plausible reasons for the striking results presented in the lower left barplot? Select all that apply.
```{r echo=FALSE, results="asis"}
check_question(c("The stories of the mild irritation context were perceived as more annoying than the authors had anticipated.", "The stories created for mild irritation and annoyance triggered a similar reaction."), options = c("The stories of the mild irritation context were perceived as more annoying than the authors had anticipated.", "The stories created for mild irritation and annoyance triggered a similar reaction.", "Unlike what was anticipated, most participants used the emojis 😐 and 😑 in very different contexts.", "The participants did not realize there was a difference between 😐 and 😑."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! These are two possible explanations. However, as we have no way of tracking the participants' reasoning, we can only make some educated guesses.",
wrong = "Not quite. Consider what could have caused the differing matching rates.", alignment = "vertical")
check_hint("Two of these are plausible reasons for the observed result. The plot suggests that participants preferred the same emoji for the stories associated with *mild irritation* and *annoyance*.", hint_title = "🐭 Click on the mouse for a hint.")
```
<br>
:::
Overall, @fig-AUpatch shows that there was indeed a preference for context-matching emojis. Importantly, participants preferred the context-matching emoji in both the AU+ condition and the AU- condition: The overall matching rate of AU- pairs is very similar to that of AU+ pairs. This means that whether or not emoji features coincided with human facial features did not (significantly) affect the participants' decision for one emoji or the other.
The findings do not support @fricke2024semantic's experimental hypothesis, which was based on the pictorial approach. Instead, as Fricke, Grosz & Scheffler [-@fricke2024semantic: 12] observe, the results suggest that visually similar emojis can convey different meanings, even when they correspond to the same human facial expressions.
This observation closely aligns with the predictions of the lexicalist approach proposed by @grosz2023semantics. However, the authors caution that it does not provide definitive evidence for the lexicalist approach [@fricke2024semantic: 12].
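
The comparison of overall matching rates between the AU+ and AU- conditions described above can be computed with the same `group_by()` and `count()` pattern used for `data_AU`, grouping by `AU_difference` instead of `context`. The sketch below uses a handful of invented rows purely to show the mechanics; the resulting percentages are not the study's results. It assumes {dplyr} is installed.

```r
library(dplyr)

# Invented rows for illustration only (NOT the study's data)
toy <- tibble(
  AU_difference = c("AU+", "AU+", "AU+", "AU+", "AU-", "AU-", "AU-", "AU-"),
  match = c("match", "match", "match", "no match",
            "match", "match", "match", "no match")
)

# Matching rate per AU condition
toy_rates <- toy |>
  group_by(AU_difference) |>
  count(match) |>
  mutate(percent = round(proportions(n) * 100, 2))

toy_rates
```

Applied to the real `df`, the same pipeline would return one matching rate per AU condition, which can then be compared directly.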
## Conclusion {#sec-conclusion}
You have successfully completed [`r checkdown::insert_score()` out of 9 quiz questions]{style="color:green;"} in this chapter.
You are now a pro in handling (stacked) barplots! You can build, customise, arrange, and interpret them. Barplots are powerful for visualising categorical data, offering a straightforward way to compare frequencies and make patterns apparent. However, they do have their limitations. For instance, they are not ideal for displaying continuous data. Building and assembling plots can be quite fiddly and it can take some trial-and-error to make the plot look like what you had imagined. But there is a solution for (almost) everything and hopefully, the beautiful plot you create in the process will be worth the effort.
This chapter's analysis revealed gender-specific differences in emoji understanding, potentially indicating a confidence gap between men and women. On average, however, both genders reported at least a good understanding of emojis. The visualisations have been adjusted for the gender imbalance in the data, demonstrating the importance of accounting for differences in group sizes.
In this chapter, we have created an informative figure that answers the experiment’s research question by combining multiple plots. The question whether Action Unit (AU) differences are critical for emoji preference was answered in the negative. However, we have made several other discoveries along the way: As noted by @fricke2024semantic, we have found that small changes of emojis’ facial features do affect choice patterns.
Emojis, it turns out, contain lots of information, and there is a science behind them 🤓. While experimentally measuring why we prefer certain emojis over other ones represents a real challenge, @fricke2024semantic provide valuable insights into this fascinating area of study. As the authors shared their data and code, we were able to successfully reproduce their results, as well as create new informative figures on the basis of their data.
::: {.callout-note collapse="true"}
#### **How to cite this chapter** {.unnumbered}
This is a case study chapter of the web version of the textbook "Data Analysis for the Language Sciences: A very gentle introduction to statistics and data visualisation in R" by Elen Le Foll.
Please cite the current version of this chapter as:
> ::: {style="color: black"}
> Hörsting, Rose and Gina Reinhard. 2024. The semantics of emojis: ExploRing the results of an experimental study. In Elen Le Foll (Ed.), *Data Analysis for the Language Sciences: A very gentle introduction to statistics and data visualisation in R*. Open Educational Resource. <https://elenlefoll.github.io/RstatsTextbook/> (accessed DATE).
> :::
:::
## References {.unnumbered}
Fricke, Lea, Patrick G Grosz & Tatjana Scheffler. 2024. Semantic Differences in Visually Similar Face Emojis. *Language and Cognition*. Cambridge University Press 1–15. <https://doi.org/10.1017/langcog.2024.12>.
Fugate, Jennifer MB & Courtny L Franco. 2021. Implications for Emotion: Using Anatomically Based Facial Coding to Compare Emoji Faces Across Platforms. *Frontiers in Psychology*. Frontiers Media SA 12. 605928. <https://doi.org/10.3389/fpsyg.2021.605928>.
Grosz, Patrick Georg, Gabriel Greenberg, Christian De Leon & Elsi Kaiser. 2023. A semantics of face emoji in discourse. *Linguistics and Philosophy*. Springer 46(4). 905–957. <https://doi.org/10.1007/s10988-022-09369-8>.
Maier, Emar. 2023. Emojis as Pictures. *Ergo* 10. <https://doi.org/10.3998/ergo.4641>.
Neuwirth, Erich. 2022. Package "RColorBrewer". *ColorBrewer Palettes* 991. <https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf>.
Pedersen, Thomas Lin. 2024. *Patchwork: The Composer of Plots*. <https://patchwork.data-imaginist.com>.
Pedersen, Thomas Lin & Maxim Shemanarev. 2024. *Ragg: Graphic Devices Based on AGG*. <https://ragg.r-lib.org>.
Pfeifer, Valeria A, Emma L Armstrong & Vicky Tzuyin Lai. 2022. Do all facial emojis communicate emotion? The impact of facial emojis on perceived sender emotion and text processing. *Computers in Human Behavior*. Elsevier 126. 107016. <https://doi.org/10.1016/j.chb.2021.107016>.
Scheffler, Tatjana & Ivan Nenchev. 2024. Affective, semantic, frequency, and descriptive norms for 107 face emojis. *Behavior Research Methods*. Springer 1–22. <https://doi.org/10.3758/s13428-024-02444-x>.
Wickham, Hadley, Romain François & Lucy D’Agostino McGowan. 2024. *Emo: Easily Insert ’emoji’*. <https://github.com/hadley/emo>.
### Packages used in this chapter {.unnumbered}
```{r package-versions, echo=FALSE}
sessionInfo()
```
### Package references {.unnumbered}
```{r generateBibliography, results="asis", echo=FALSE}
CS_RoseGina_packages.bib <- sapply(1:length(loadedNamespaces()), function(i) toBibtex(citation(loadedNamespaces()[i])))
#knitr::write_bib(c(.packages(), "knitr"), "CS_RoseGina_packages.bib")
require("knitcitations")
cleanbib()
options("citation_format" = "pandoc")
read.bibtex(file = "CS_RoseGina_packages.bib")
```