---
author: "Rose Hörsting & Gina Reinhard"
date: "2024-11-12"
#date-modified: "2024-11-12"
toc-title: "Case study: The semantics of emojis"
include-after-body: abbrv_toc.html
language: 
  #title-block-modified: "Last updated"
  title-block-author-single: "Authors"
---

# The semantics of emojis: Explo`R`ing the results of an experimental study

```{r setup, include=FALSE}
# Depending on which graphic backend RStudio uses (see Global Options > Graphics), we need to render the plots either with AGG or Cairo to ensure that the emojis are correctly rendered in the HTML export.

#install.packages("ragg")
#knitr::opts_chunk$set(dev = "ragg_png", warning = FALSE, message = FALSE)

knitr::opts_chunk$set(dev.args = list(png = list(type = "cairo")))
```

```{r checkdown, include=FALSE}

library(checkdown)
```

::: {.callout-note collapse="true"}
#### **About the authors of this chapter** {.unnumbered}

**Rose Hörsting** is a second-year master’s student in linguistics at the University of Cologne. She completed her bachelor’s degree in linguistics at the Heinrich Heine University Düsseldorf, where she specialised in psycho- and neurolinguistics. Rose is particularly drawn to understanding language processing in both human and machine contexts. She was first introduced to `R` during her bachelor’s thesis, finding it intimidating at first, but has since developed an enthusiasm for statistics and programming. Now, Rose is enjoying the process of mastering `R` as she deepens her skills in data analysis.

**Gina Reinhard** is also a second-year student in the master's programme in linguistics at the University of Cologne, specialising in computational linguistics. Like Rose, she completed her bachelor's degree at Heinrich Heine University Düsseldorf, with a focus on foreign languages and linguistic diversity. Her background includes studying psychology at Osnabrück University and working at an AI company, which led to her interest in all things cognitive science as a combination of linguistics, psychology, and AI. She is currently working as a research assistant in the field of variational linguistics, studying dialects and regiolects while developing her skills in computational methods for linguistic analysis.

The authors made equal contributions to the chapter and are listed here in alphabetical order. Rose and Gina submitted an earlier version of this chapter as a term paper for Elen Le Foll's M.A. seminar "More than counting words: Introduction to statistics and data visualisation in R" (University of Cologne, summer 2024). Elen supervised the project, provided feedback, and contributed to the present revised version of this chapter.

The authors thank Tatjana Scheffler, co-author of the original study reproduced in this chapter, for her valuable feedback. They have further revised the chapter based on her comments.
:::

### Chapter overview {.unnumbered}

This case-study chapter will guide you through the steps to reproduce selected results from a published experimental linguistics study [@fricke2024semantic] using `R`.

The chapter will walk you through how to:

-   Explore the data of a published linguistics study
-   Preprocess the raw data for analysis (including how to translate, re-order, and re-categorise the levels of categorical variables)
-   Analyse and interpret the frequency counts of categorical variables
-   Visualise these frequencies as barplots
-   Insert and display emojis in `R` and in `ggplot` graphs
-   Combine multiple plots into one figure using {patchwork}
-   Interpret multi-panel plots

We will work with the original raw data from:

> Fricke, L., Grosz, P. G., & Scheffler, T. (2024). Semantic differences in visually similar face emojis. Language and Cognition, 1–15. <https://doi.org/10.1017/langcog.2024.12>

## Introducing the study 🙂 {#sec-introducing-the-study}

Face emojis are frequently used in text messages. They represent facial expressions and often make fundamental contributions to the subtext of a text message. A few studies have investigated the relationship between emojis and the emotions that they depict [@fugate2021implications; @maier2023emojis; @pfeifer2022all]. However, as emojis are a relatively recent phenomenon, there is still a lot to be discovered. In this chapter, we will look into a study by @fricke2024semantic.

### Deconstructing emojis into Action Units {#sec-deconstructing}

@fricke2024semantic compared "visually similar face emojis" using an emoji annotation system developed by @fugate2021implications. This system is based on the Facial Action Coding System (FACS) for human faces [@ekman1978facial], which is an inventory of facial muscle movements that humans can make (such as raising the inner eyebrows or pulling down the corners of the lips). @fugate2021implications adapted FACS for emojis. The facial features of emojis, like *eyebrows arched* and *eyes wide*, are called Action Units (AUs). For convenience, AUs are assigned numbers, allowing them to be easily referenced. As you can see in @fig-emojipairs, each emoji consists of several AUs.

![Emoji pairs and their AU codes [from @fricke2024semantic: 5, CC-BY]](images/CS_RoseGina_emoji_pairs.png){#fig-emojipairs fig-align="center" width="700"}

@fricke2024semantic defined two different types of emoji pairs: In the **AU+ condition**, the pairs of emojis are similar, but are assigned a different set of AUs. The emoji pairs in the **AU- condition** are also similar, but their AUs are identical. @fricke2024semantic deliberately selected emoji pairs that were as visually similar as possible to each other, while ensuring that the two emojis either differed by exactly one Action Unit (AU+) or had no differences in Action Units (AU-).

AUs capture the facial expressions of emojis and, as such, can assist linguists in accurately describing them. However, only expressions that can be consciously changed by humans receive labels. For example, the AU difference between 😃 and 😆 captures the fact that the former emoji has open eyes, while the latter has closed eyes. Since humans can choose whether to open or close their eyes, this is an **AU+ pair**. If the subtle difference between emojis is not manipulable by humans, as in 😄 and 😁, the emojis are described by identical AUs (**AU-**).
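
To make the distinction concrete, the AU sets of two of the pairs can be compared directly in `R`. The following is only a minimal sketch: the AU codes are those listed for these emojis by @fricke2024semantic, whereas the object names are our own and purely illustrative.

```{r}
# AU codes as reported for these emojis in Fricke et al. (2024).
# The object names are hypothetical and chosen for illustration only.
au_big_eyes  <- c(5, 12, 25, 26)   # grinning face with big eyes
au_squinting <- c(12, 25, 26, 43)  # grinning squinting face
au_grinning  <- c(12, 25, 26, 63)  # grinning face with smiling eyes
au_beaming   <- c(12, 25, 26, 63)  # beaming face with smiling eyes

# AU+ pair: each emoji has one AU that the other emoji lacks
setdiff(au_big_eyes, au_squinting)
setdiff(au_squinting, au_big_eyes)

# AU- pair: both emojis are described by the same set of AUs
setequal(au_grinning, au_beaming)
```

For the AU+ pair, `setdiff()` returns the AU that only one of the two emojis has (open eyes for the first, closed eyes for the second). For the AU- pair, `setequal()` confirms that the two sets are identical.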

### The experiment {#sec-design}

::: {.callout-note title="How did the experiment work?"}
Three AU+ and three AU- emoji pairs were created (see @fig-emojipairs). Each pair was assigned two contexts, with each context corresponding to the prominent usage of one emoji, but not the other. For example, the contexts of the first pair are *happiness* and *(cheeky) laughter*. The contexts were assigned based on <https://emojipedia.org> and a previous norming study [@scheffler2024affective].

Four single-sentence narratives were created for each of the contexts (see @fig-testitems, translated from German below [translation @fricke2024semantic: 6]).

> 1.  Alex writes to his best friend Stefan:
>
>     *I just learned that my cousin's dog has his own advent calendar.*
>
>     Alex is amused. Which of the emojis matches the message better? 😄😁
>
>     <br>
>
> 2.  Alex writes to his best friend Stefan:
>
>     *I just learned that I won 500 Euro in the lottery.*
>
>     Alex is overjoyed. Which of the emojis matches the message better? 😄😁

![Example of a test item in the experiment [from @fricke2024semantic: 6, CC-BY]](images/CS_RoseGina_Fricke_test_items.png){#fig-testitems fig-align="center" width="500"}

These short narratives were divided into four experimental lists of 12 items. Each list also contained 12 filler items, so that each participant saw 24 items. The participants were then asked to help choose the emoji that best matched the context. Each participant saw each emoji pair twice. The authors measured how often participants chose the context-matching emoji versus the non-matching emoji.
:::
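
The numbers in this design can be checked with a few lines of arithmetic. This is only a sketch based on the description above: it assumes that the narratives were distributed evenly across the four lists, and all object names are our own.

```{r}
# Design figures taken from the description of the experiment above
n_pairs      <- 3 + 3           # three AU+ and three AU- emoji pairs
n_contexts   <- n_pairs * 2     # two contexts per pair
n_narratives <- n_contexts * 4  # four narratives per context
n_lists      <- 4               # four experimental lists

# Assumption: narratives are spread evenly across the lists
items_per_list <- n_narratives / n_lists

# Each list also contained 12 filler items
trials_per_participant <- items_per_list + 12

items_per_list
trials_per_participant

# How often does each participant see each emoji pair?
items_per_list / n_pairs
```

The results agree with the description of the experiment: 12 experimental items per list, 24 trials per participant, and two presentations of each emoji pair.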

@fricke2024semantic's central research question was: **Do AU differences lead to differences in meaning between the two emojis of a pair?** In line with the pictorial approach by @maier2023emojis, the authors hypothesised that visual differences between emojis which correspond to human facial features (AU+) would be more semantically relevant than those that do not (AU-). However, they noted that if no evidence were found to support this hypothesis, this would be in line with @grosz2023semantics's lexicalist approach. This approach suggests that visual differences between emojis and their correspondence to human facial features are less important, placing emphasis instead on the intrinsic meaning of the emoji and its constituent parts.

::: callout-tip
#### Quiz time! {.unnumbered}

Read the abstract of the study:

> Fricke, L., Grosz, P. G., & Scheffler, T. (2024). Semantic differences in visually similar face emojis. Language and Cognition, 1–15. <https://doi.org/10.1017/langcog.2024.12>

[**Q1.**]{style="color:green;"} According to the abstract, what were the results of @fricke2024semantic's experiment?

```{r echo=FALSE, results="asis"}

check_question(c("For both types of pairs, the context-matching emoji was preferred over the non-matching one.", "There were no significant differences between the two conditions."), options = c("Participants chose the context-matching emoji more often in the AU+ condition than in the AU- condition.", "There were no significant differences between the two conditions.", "Participants chose the context-matching emoji more often in the AU- condition than in the AU+ condition.", "For both types of pairs, the context-matching emoji was preferred over the non-matching one."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right!",
wrong = "Not quite. Read the abstract again.")

```

 
[**Q2.**]{style="color:green;"} The actual results of the experiment were different from what @fricke2024semantic had expected. According to the authors' research hypothesis (visual differences between emojis which correspond to human facial features are more semantically relevant than those that do not), which of these experimental results were expected?

```{r echo=FALSE, results="asis"}

check_question(c("Participants will choose the context-matching emoji more often in the AU+ condition.", "For the AU- pairs, the pattern will be more random."), options = c("Participants will choose the context-matching emoji more often in the AU+ condition.", "For the AU- pairs, the pattern will be more random.", "Participants will choose the context-matching emoji more often in the AU- condition.", "For the AU+ pairs, the pattern will be more random.", "For both types of pairs, the context-matching emoji will be preferred over the non-matching one."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! The hypothesis was that visual differences between emojis would be semantically more relevant if they corresponded to differences in human facial features (AU+). This would lead to participants choosing a context-matching emoji more often in the AU+ condition and making more random choices in the AU- condition.",
wrong = "Not quite. Try again.")
check_hint("Based on the authors' hypothesis, differences corresponding to facial features (AU+) would be more semantically relevant. In contrast, differences that do not correspond to facial features (AU-) would be less semantically relevant, making them less noticeable and leading to more inconsistent choices. How might this affect the frequency with which participants choose the context-matching emoji in each condition?", hint_title = "🐭 Click on the mouse for a hint.")

```

<br>
:::

## Exploring the relationship between gender and emoji understanding {#sec-gender-understanding toc-text="Exploring gender and emoji understanding"}

@fricke2024semantic asked participants about their gender, their attitude towards emojis, how often they use emojis on WhatsApp and how well they think they understand emojis. The authors visualised the distribution of men and women for emoji use and emoji attitude as barplots.

::: {#fig-FrickeGenderBarplots layout-ncol="2"}
![Emoji use by gender](images/CS_RoseGina_Fricke_emoji_use.png){#fig-emojiuse fig-align="center" width="550"}

![Attitude towards emojis by gender](images/CS_RoseGina_Fricke_emoji_attitude.png){#fig-emojiattitude fig-align="center" width="550"}

Barplots from Fricke, Grosz & Scheffler [-@fricke2024semantic: 9-10, CC-BY]
:::

The plots in @fig-FrickeGenderBarplots show that women use emojis more often and have a more positive attitude towards emojis than men. We want to find out whether women also reported a higher level of emoji understanding than men. Our analysis will involve three steps:

1.  Calculating the frequencies of the genders in the data\
2.  Calculating the frequencies of the different levels of emoji understanding for each gender\
3.  Visualising the frequencies in a barplot similar to the plots above.

### Impo`R`ting the data {#sec-importing-the-data}

@fricke2024semantic have made their data and analysis code publicly available on the OSF repository (see @sec-OpenScience). You can access these materials at <https://osf.io/k2t9p/>. There, the data is stored in the file `raw_data.csv`. To follow the steps of this chapter, you will need to download this file.

::: callout-warning
### Session set-up {#sec-session-set-up}

To run the code of this chapter, you will need the following packages. Make sure that they are installed and loaded before starting.

```{r load-libraries, message=FALSE}

library(here)
library(tidyverse)
#install.packages("patchwork")
library(patchwork)
#install.packages("ragg")
library(ragg)

```
:::

We import the authors' raw data using the `read.csv()` and `here()` functions. You will need to adjust the file path to match the folder structure of your computer (see @sec-ImportingDataCSV).

```{r import-data, message=FALSE}

raw_data <- read.csv(file = here("data", "raw_data.csv"))

```

As specified by Fricke, Grosz & Scheffler [-@fricke2024semantic: 8], we filter out participants who exceed the maximum age of 35 years for all following analyses. We do this by using the `filter()` function and store the result in a new data frame called `df`.

```{r filter-age}

df <- raw_data |> 
  filter(age <= 35)

```

### Gender frequency analysis {#sec-gender-freq}

Let's first get a general overview: How many men, women, and non-binary people participated in the study?

The relevant variable in the data set is called `gender`. However, you will see that the names of the gender groups are in German. To figure out what the labels of the different gender groups are, we use the `count()` function:

```{r table-gender}

df |> 
  count(gender)

```

Before we start analysing, we should translate the labels (levels) of the categories into English. Using a combination of `mutate()` and `recode()`, we translate *männlich* to *men*, *weiblich* to *women*, and *divers* to *non-binary.*[^cs_rosegina-1]

[^cs_rosegina-1]: We decided to translate '*divers*' as 'non-binary', as this is the English term that @fricke2024semantic used in their paper.

```{r recode-gender}

df <- df |> 
  mutate(gender = recode(gender, 
                         "männlich" = "men", 
                         "weiblich" = "women", 
                         "divers" = "non-binary"))

df |> 
  count(gender)

```
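
In {dplyr} versions from 1.1.0 onwards, the documentation marks `recode()` as superseded and points to `case_match()` as a replacement. `recode()` continues to work, but if you prefer the newer function, the same translation could be sketched as follows. The toy data frame `toy_gender` is invented for illustration and is not part of the study data.

```{r}
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_gender <- data.frame(
  gender = c("männlich", "weiblich", "divers", "weiblich")
)

# case_match() uses the formula syntax old_value ~ new_value
toy_gender <- toy_gender |>
  mutate(gender_en = case_match(gender,
                                "männlich" ~ "men",
                                "weiblich" ~ "women",
                                "divers" ~ "non-binary"))

toy_gender
```

Note that the order is reversed compared to `recode()`: the old value is on the left of the `~` and the new value on the right.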

Now that the `gender` variable has English labels, we want to determine how many men, women, and non-binary people participated. So far, we have used the `count()` function, which determines the number of occurrences of the different genders in the data. In this case, however, counting the occurrences is not straightforward. The data frame contains 24 rows for each participant, as each participant saw 24 items (see @sec-design). So, if we simply count the occurrences of *men*, *women*, and *non-binary* in the data with `count()`, as we did above, we end up with 24 times the actual number of participants in each group.

To determine the actual gender distribution, we need to count the occurrences according to the participants' unique IDs. To do this, we apply the `distinct()` function to keep only one row (to be precise, the first row) for each `submission_id`. The argument `.keep_all` is set to `TRUE`, which means that all other variables in the data frame are kept rather than dropped.

```{r gender-distinct}

df |> 
  distinct(submission_id, .keep_all = TRUE) |> 
  count(gender)

```
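
If it is unclear why `distinct()` makes a difference here, a small toy example may help. The data frame `toy_trials` below is invented: it has two participants with three trials each, instead of the 24 trials per participant in the real data.

```{r}
library(dplyr)

# Hypothetical toy data: 2 participants with 3 trials each
toy_trials <- data.frame(
  submission_id = c(1, 1, 1, 2, 2, 2),
  gender = c("women", "women", "women", "men", "men", "men")
)

# Counting rows counts trials, not people
trial_counts <- toy_trials |>
  count(gender)

# Keeping one row per participant first gives the number of people
participant_counts <- toy_trials |>
  distinct(submission_id, .keep_all = TRUE) |>
  count(gender)

trial_counts
participant_counts
```

The first table reports three men and three women, which is the number of rows. Only the second table reflects that the toy data contains one man and one woman.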

The **mode** (see @sec-Mode) of the `gender` variable in the dataset is *men*, as you can see from the output. The gender distribution is very uneven: 109 men, 47 women, and 3 non-binary people participated in the study. If we are not careful, this imbalance can lead to misleading data visualisations.

::: callout-tip
#### Quiz time! {.unnumbered}

[**Q3.**]{style="color:green;"} Which of these problems are likely to occur if we plot emoji understanding by gender in a barplot with unequal gender groups?

```{r echo=FALSE, results="asis"}

check_question(c("The differences in emoji understanding between gender groups may look bigger or smaller than they actually are."), 
               options = c("The barplot is likely to be too wide for portrait publishing formats.", "Readers may misinterpret the barplot as a histogram.", "The y-axis of the plot may become distorted and therefore inaccurate.", "The differences in emoji understanding between gender groups may look bigger or smaller than they actually are.", "We will not be able to use ggplot to visualise the data."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Unequal group sizes can distort the differences by showing larger totals for larger groups and smaller totals for smaller groups. <br><br>Can you think of ways to address this issue?",
wrong = "Not quite. Try again.")
check_hint("Only one of these is likely to be a real problem.", hint_title = "🐭 Click on the mouse for a hint.")

```

<br>
:::

To solve this problem, we will use the same strategy as @fricke2024semantic. We will use relative rather than absolute frequencies to make sure that the numbers for the different genders are comparable. This means that we will calculate the percentages of emoji understanding within each gender group, treating the total number of male, female, and non-binary participants separately as 100%, rather than counting all subjects together as 100%. In this way, we can see, for example, what percentage of men, women, and non-binary participants reported a very good emoji understanding and compare the numbers across groups.

### How well do the different genders understand emojis? {#sec-gender-understanding-freq}

Next, we calculate the relative frequencies of the different levels of emoji understanding for each gender.

The variable we are interested in is called `emoji_understanding`. Just like with `gender`, we first have to do some data wrangling. We use the `count()` function to get the labels:

```{r table-understanding}

df |> 
  count(emoji_understanding)

```

We translate *mittelmäßig* to *moderate*, *eher gut* to *rather good*, *gut* to *good*, and *sehr gut* to *very good*:

```{r recode-understanding}

df <- df |> 
    mutate(emoji_understanding = recode(emoji_understanding,
                                        "mittelmäßig" = "moderate",
                                        "eher gut" = "rather good",
                                        "gut" = "good",
                                        "sehr gut" = "very good"))
df |> 
  count(emoji_understanding)

```

The levels are still in the wrong order. We need to rearrange them in ascending order from *moderate* to *very good*. To do this, we pass the vector `c("moderate", "rather good", "good", "very good")` to the `levels` argument of the `factor()` function, which converts the variable into a factor whose levels follow the order that we specified:

```{r reorder-understanding}

df <- df |> 
    mutate(emoji_understanding = factor(emoji_understanding,
                                        levels = c("moderate",
                                                   "rather good",
                                                   "good",
                                                   "very good")))

df |> 
  count(emoji_understanding)
```
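
To see what the `levels` argument changes, compare the two calls in the following sketch. The vector `toy_levels` is invented for illustration.

```{r}
# Hypothetical toy vector, invented for illustration only
toy_levels <- c("good", "moderate", "very good", "rather good")

# Without explicit levels, factor() sorts the levels alphabetically
default_order <- factor(toy_levels)
levels(default_order)

# With explicit levels, the order follows the vector that we provide
custom_order <- factor(toy_levels,
                       levels = c("moderate", "rather good",
                                  "good", "very good"))
levels(custom_order)

# Values that are not listed in `levels` are turned into NA,
# so typos in the level names are worth checking for
factor("gut", levels = c("moderate", "rather good", "good", "very good"))
```

The last call is a reminder to check the output after reordering: any value that does not exactly match one of the listed levels is silently converted to a missing value.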

The levels now look good, so we can determine the frequencies for the different gender groups within `emoji_understanding`. We could do this by simply cross-tabulating gender with emoji understanding (see @sec-Mode). But since we know that the sizes of the gender subsets are very unequal, we also want to calculate the relative frequencies to make the numbers comparable. There is an easy way to calculate relative frequencies using the `proportions()` function (see @sec-DistCat). However, we need to make two additional considerations:

1.  Our aim is to calculate proportions within groups and not across the whole data.
2.  We want to create a comprehensive visualisation that includes both groups of men and women in a single barplot.

To achieve both, we first have to group our data, using the powerful combination of `group_by()` and `count()`. Our goal is a new data frame, `gender_understanding_count`, which we will build in two steps. As above, we keep only one row per participant, based on their unique `submission_id`. We then group the data by gender and count the frequencies of the different levels of `emoji_understanding` for each gender:

```{r gender-understanding-freq}

df |>
  distinct(submission_id, .keep_all = TRUE) |>
  group_by(gender) |> 
  count(gender, emoji_understanding)
```

In this table, `n` was calculated by the `count()` function and represents the number of occurrences for each combination of `gender` and `emoji_understanding`. Next, we use `mutate()` to add a column with the relative frequencies, which we calculate with the formula `proportions(n) * 100` to obtain percentages.

```{r gender-understanding-percent}
#| source-line-numbers: "5"

gender_understanding_count <- df |>
  distinct(submission_id, .keep_all = TRUE) |>
  group_by(gender) |> 
  count(gender, emoji_understanding) |> 
  mutate(percentage = proportions(n) * 100)

gender_understanding_count
```
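
The `group_by()` step is what makes these percentages add up to 100% within each gender rather than across all participants. The following sketch illustrates the difference with invented counts (`toy_counts`); the numbers are not taken from the study.

```{r}
library(dplyr)

# Hypothetical counts, invented for illustration only
toy_counts <- data.frame(
  gender = c("men", "men", "women", "women"),
  emoji_understanding = c("good", "very good", "good", "very good"),
  n = c(70, 30, 10, 10)
)

# Without grouping: percentages of all 120 toy participants
toy_overall <- toy_counts |>
  mutate(percentage = proportions(n) * 100)

# With grouping: percentages within each gender
toy_within <- toy_counts |>
  group_by(gender) |>
  mutate(percentage = proportions(n) * 100) |>
  ungroup()

toy_overall
toy_within
```

In the ungrouped version, the men in the *very good* category make up 25% of all toy participants and the women only about 8%, simply because there are more men. Within groups, the picture is reversed: 30% of the men but 50% of the women fall into this category.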

This tabular presentation of the data already shows us that non-binary participants reported either a *rather good* or *good* understanding of emojis. A higher percentage of women (44.7%) reported a *very good* emoji understanding compared to men (38.5%). But let's create our barplot to see the distribution more clearly.

### Data visualisation 📊 {#sec-gender-understanding-vis}

As mentioned above, we will visualise the relative rather than the absolute frequencies to make sure that the numbers for the different genders are comparable. In line with Fricke, Grosz & Scheffler [-@fricke2024semantic: 9], we also exclude the three non-binary participants. To this end, we use the `filter()` function combined with the `!=` operator (see @sec-RelationalOperators).

```{r filter-gender}

gender_understanding_count <- gender_understanding_count |> 
  filter(gender != "non-binary")

```

We use `ggplot()` to create a barplot with the `emoji_understanding` categories on the *x*-axis and the relative frequencies that we calculated on the *y*-axis. The bars are coloured according to `gender`. We also add a title and axis labels. Finally, we remove the white space between the bottom of the bars and the *x*-axis with an additional `scale_y_continuous(expand = c(0,0))` layer and change the colours to make our plot look nicer. The hexadecimal colour values chosen here are from the colour-blind friendly palette "Set2" from the package [{RColorBrewer}](https://cran.r-project.org/web/packages/RColorBrewer/index.html) [@neuwirth2022package]. Since we only need two colours, we chose to insert them manually to avoid having to install an additional package.

```{r gender-understanding-plot}

ggplot(gender_understanding_count, 
       aes(x = emoji_understanding, 
           y = percentage, 
           fill = gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_y_continuous(expand = c(0,0)) +
  labs(title = "Self-reported Emoji Understanding by Gender",
       x = "Emoji understanding",
       y = "Percent") +
  scale_fill_manual(values = c("#8DA0CB", "#FC8D62")) +
  theme_classic()

```
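
As a side note, {ggplot2} also provides `geom_col()`, which plots the values in the data as they are and is therefore equivalent to `geom_bar(stat = "identity")`. The sketch below uses an invented data frame (`toy_percent`) rather than the study data. It additionally names the colour values, which is one way of making sure that each colour is assigned to the intended gender regardless of the order of the levels.

```{r}
library(ggplot2)

# Hypothetical percentages, invented for illustration only
toy_percent <- data.frame(
  emoji_understanding = factor(c("good", "very good", "good", "very good"),
                               levels = c("good", "very good")),
  gender = c("men", "men", "women", "women"),
  percentage = c(60, 40, 55, 45)
)

toy_plot <- ggplot(toy_percent,
                   aes(x = emoji_understanding,
                       y = percentage,
                       fill = gender)) +
  geom_col(position = "dodge") +  # same as geom_bar(stat = "identity")
  scale_y_continuous(expand = c(0, 0)) +
  # Named values map each colour explicitly to one level of `gender`
  scale_fill_manual(values = c("men" = "#8DA0CB", "women" = "#FC8D62")) +
  theme_classic()

toy_plot
```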

As you can see from the barplot, the gender distribution for emoji understanding is much more even than for emoji use and emoji attitude (see @fig-FrickeGenderBarplots).

::: callout-tip
#### Quiz time! {.unnumbered}

[**Q4.**]{style="color:green;"} How do you interpret this plot?

```{r echo=FALSE, results="asis"}

check_question(c("Proportionally more women than men reported a very good emoji understanding.", "Proportionally more women than men reported a moderate emoji understanding."), options = c("Proportionally more women than men reported a very good emoji understanding.", "Proportionally more women than men reported a moderate emoji understanding.", "Women reported a lower level of emoji understanding than men.", "Around half of all participants reported a rather good understanding of emojis."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! While proportionally more women than men reported a very good emoji understanding, a relatively large number of men also stated that they understood emojis very well. There are also proportionally more women than men in the moderate and rather good categories. Around 23% of men and 25% of women reported a rather good understanding, but this does not correspond to 50% of participants as there were more men than women in this study.",
wrong = "Not quite. Try again.")
check_hint("Two of the above options are correct interpretations of the plot.", hint_title = "🐭 Click on the mouse for a hint.")

```

<br>
:::

When comparing our barplot to @fig-FrickeGenderBarplots, it is interesting to note that, whilst women reported more frequent use of emojis and a more positive attitude towards emojis, they did not report a higher understanding of emojis. It is possible that some women were more modest in rating their understanding of emojis than men, which could indicate a gender confidence gap. Reporting a good understanding likely requires more confidence compared to emoji use or attitude towards emoji.

## Comparing matching rates between AU conditions {#sec-au-plot}

We will now turn to exploring the central research question of @fricke2024semantic: **Do AU differences lead to differences in meaning between the two emojis of a pair?** As explained in @sec-deconstructing, AUs are numbered codes that correspond to human-like facial features. In emoji pairs of the AU+ condition, the visual difference between the emojis is reflected in a difference in AU numbers, e.g. *grinning face with big eyes* 😃 (AU: **5** + 12 + 25 + 26) and *grinning squinting face* 😆 (AU: 12 + 25 + 26 + **43**). In the AU- condition, the visual difference does not correspond to an AU difference, e.g. *grinning face with smiling eyes* 😄 and *beaming face with smiling eyes* 😁 both have the same AUs (12 + 25 + 26 + 63).

Step by step, we will build an informative plot which will include all the information needed to answer this question. This plot will display how many times each emoji was chosen in its presumed corresponding context.

To achieve this, we need to create a variable that tells us when each participant responded with the matching emoji. In the following, we will create this variable based on the raw data from @fricke2024semantic.

### Preprocessing the data {#sec-au-data-preparation}

First, we need a variable that encodes the experimental condition of each trial. The variable `name` tells us whether a trial consisted of an emoji pair with an AU difference (AU+), a pair without an AU difference (AU-), or a filler. Trials with an AU difference include "AU" in the trial `name`, those with no AU difference begin with "N", and fillers begin with "filler".

```{r eval=FALSE}
df |> 
  distinct(name)
```

```{r echo=FALSE}
df |> 
  distinct(name) |> 
  slice(1:10)
```

We use a combination of `mutate()`, `case_when()` and `str_detect()` to construct a new variable (`AU_difference`) that captures the type of trial that we are dealing with. The command essentially says: look for the string "AU" in the column `name`, and in all cases where you find it (`case_when()`), insert the value "AU+" in a new column called `AU_difference`. We follow this procedure for the other trial conditions, too. The conditions are checked in the order in which they are listed, and the first one that matches determines the value. If none of "AU", "N" or "filler" is detected, a missing value (`NA`) is inserted in `AU_difference`; this is what `.default = NULL` means.

```{r AU-variable-complete}

df <- df |> 
  mutate(AU_difference = case_when(str_detect(name, "AU") ~ "AU+",
                                   str_detect(name, "N") ~ "AU-",
                                   str_detect(name, "filler") ~ "filler",
                                   .default = NULL))

```
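
The following sketch applies the same logic to a handful of invented trial names. These names only mimic the naming pattern described above and are not taken from the data; the last one ("practice") is deliberately chosen so that none of the conditions match.

```{r}
library(dplyr)
library(stringr)

# Hypothetical trial names, invented for illustration only
toy_names <- data.frame(
  name = c("AU-01", "N-36", "filler-03", "practice")
)

toy_names <- toy_names |>
  mutate(AU_difference = case_when(str_detect(name, "AU") ~ "AU+",
                                   str_detect(name, "N") ~ "AU-",
                                   str_detect(name, "filler") ~ "filler"))

toy_names
```

The first three names are assigned to "AU+", "AU-" and "filler", respectively. For "practice", none of the three patterns is found, so `AU_difference` is `NA`. Note that `str_detect()` is case-sensitive: the lower-case "n" in "practice" does not match the pattern "N".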

We use `select()` to compare the two columns and check that everything worked.

```{r table-AU}
df |> 
  slice(1:10) |> 
  select(name, AU_difference)

```

This looks promising. Since we are only interested in the experimental items, we now filter out all filler trials.

```{r filter-filler}

df <- df |>
  filter(AU_difference != "filler")

```
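
One detail of `filter()` is worth keeping in mind here: rows for which the condition evaluates to `NA` are dropped as well. If any trial had not been assigned a condition in the previous step, it would therefore silently disappear together with the fillers. The sketch below demonstrates this behaviour with an invented data frame (`toy_conditions`).

```{r}
library(dplyr)

# Hypothetical toy data with one missing condition label
toy_conditions <- data.frame(
  AU_difference = c("AU+", "AU-", "filler", NA)
)

# The row with NA is removed together with the filler
without_fillers <- toy_conditions |>
  filter(AU_difference != "filler")

# Keeping rows with NA requires an explicit is.na() check
without_fillers_keep_na <- toy_conditions |>
  filter(AU_difference != "filler" | is.na(AU_difference))

nrow(without_fillers)
nrow(without_fillers_keep_na)
```

A quick `df |> count(AU_difference)` before filtering is an easy way to check whether there are any missing values to worry about.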

We will now create another variable called `context`, which will contain the context descriptions used by Fricke, Grosz & Scheffler [-@fricke2024semantic: 5] in @fig-emojipairs. Again, we combine `mutate()`, `case_when()` and `str_detect()`: In the `question` column, we look for context-characteristic strings and add the context descriptions whenever we have a match. This time, we check the output with `table()`.

```{r context-variable}

df <- df |> 
  mutate(context = case_when(str_detect(question, "freut sich") ~ "happiness",
                             str_detect(question, "lacht") ~ "(cheeky) laughter",

                             str_detect(question, "macht sich Sorgen") ~ "concern",
                             str_detect(question, "ist überrascht") ~ "surprise",
                             str_detect(question, "ist etwas genervt") ~ "mild irritation",
                             str_detect(question, "ärgert sich") ~ "annoyance",
                             str_detect(question, "amüsiert sich") ~ "amusement",
                             str_detect(question, "ist überglücklich") ~ "(intense) happiness",
                             str_detect(question, "ist enttäuscht") ~ "mild disappointment",
                             str_detect(question, "ist enttäuscht") ~ "moderate disappointment",
                             str_detect(question, "ist gut gelaunt") ~ "happiness2",
                             str_detect(question, "ist verlegen") ~ "bashfulness",
                                   .default = NULL))


table(df$context)

```

::: callout-tip
#### Quiz time! {.unnumbered}

[**Q5.**]{style="color:green;"} Which problems become apparent when checking the content of our new `context` variable using the `table()` function?

```{r echo=FALSE, results="asis"}

check_question(c("All contexts have 159 occurrences, except for mild disappointment which occurs 318 times.", "There are fewer contexts in the output than we coded for."), options = c("All contexts have 159 occurrences, except for mild disappointment which occurs 318 times.", "There are fewer contexts in the output than we coded for.", "The context descriptions were not correctly assigned in the case of matching strings.", "There are more matches per context than can reasonably be expected.", "There are more contexts in the output than we coded for."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Something must have gone wrong. As you can see, it is always a good idea to check the output of every step of your code for inconsistencies!",
wrong = "Not quite, try again.")

```
:::

The contexts *mild disappointment* and *moderate disappointment* have created some issues: In the `question` column, both are described as *ist enttäuscht* ('is disappointed'). Except for their encoding in the `name` column, these contexts appear to be identical. At this point, we have no choice but to look for additional disambiguating information in @fricke2024semantic's analysis script, which you can access at <https://osf.io/k8dtp>. The relevant information can be found in lines 522 and 523 (see @fig-script).

![Screenshot from the authors' analysis script available at <https://osf.io/k8dtp>](images/CS_RoseGina_disappointment_script_fricke.png){#fig-script fig-align="center" width="600"}

The emoji 🙁 (*mild disappointment*) is coded as `N-36` and ☹️ (*moderate disappointment*) as `N-37`. We use this information to assign these two contexts to our `context` variable.

```{r recode-disappointment}

df <- df |> 
  mutate(context = case_when(
                             str_detect(name, "N-36") ~ "mild disappointment",
                             str_detect(name, "N-37") ~ "moderate disappointment",
                                   .default = context))

table(df$context)

```
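
To see why the first attempt mislabelled the data and why the fix works, it can help to try `case_when()` on a tiny example. The sketch below is not part of the original analysis: it uses a made-up data frame called `toy` with a hypothetical third item code (`N-99`). It illustrates two points. First, `case_when()` assigns the value of the *first* condition that is true, so a second, identical condition can never apply. Second, `.default = context` keeps all existing values that are not explicitly recoded.

```r
library(dplyr)
library(stringr)

# Made-up data mimicking the state before the fix: both disappointment
# items carry the same label ("N-99" is a hypothetical item code)
toy <- tibble(
  name    = c("N-36", "N-37", "N-99"),
  context = c("mild disappointment", "mild disappointment", "bashfulness")
)

# case_when() stops at the first TRUE condition, so the second line is never used
first_match <- case_when(
  str_detect(toy$context, "disappointment") ~ "first label",
  str_detect(toy$context, "disappointment") ~ "second label",
  .default = "other"
)
first_match

# Recode via the item codes; rows matching neither code keep their old value
toy_fixed <- toy |>
  mutate(context = case_when(
    str_detect(name, "N-36") ~ "mild disappointment",
    str_detect(name, "N-37") ~ "moderate disappointment",
    .default = context
  ))
toy_fixed
```

In `toy_fixed`, only the row with the code `N-37` changes its label, while the *bashfulness* row is left untouched.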

Finally, we add the critical variable that describes whether there is a match between the chosen emojis and the contexts: if the emoji and the context agree, the variable will have the value *match*. Otherwise, the value will be *no match*.

```{r match-variable}

df <- df |> 
  mutate(
  match = case_when(
    context == "happiness" & response == "grinning_face_with_big_eyes" ~ "match",
    context == "(cheeky) laughter" & response == "grinning_squinting_face" ~ "match",
    context == "concern" & response == "hushed_face" ~ "match",
    context == "surprise" & response == "astonished_face" ~ "match",
    context == "mild irritation" & response == "neutral_face" ~ "match",
    context == "annoyance" & response == "expressionless_face" ~ "match",
    context == "amusement" & response == "grinning_face_with_smiling_eyes" ~ "match",
    context == "(intense) happiness" & response == "beaming_face_with_smiling_eyes" ~ "match",
    context == "mild disappointment" & response == "slightly_frowning_face" ~ "match",
    context == "moderate disappointment" & response == "frowning_face" ~ "match",
    context == "happiness2" & response == "smiling_face_with_smiling_eyes" ~ "match",
    context == "bashfulness" & response == "smiling_face" ~ "match",
    .default = "no match"))

```
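
As an aside, the same `match` variable could probably also be derived from a small lookup table instead of twelve `case_when()` conditions. The following sketch shows this alternative technique on made-up data; it is not the approach used in this chapter. The names `intended`, `intended_response`, and `toy` are hypothetical, and only two of the twelve context-emoji pairs are included to keep the example short.

```r
library(dplyr)

# Hypothetical lookup table: one row per context with its intended emoji
intended <- tribble(
  ~context,   ~intended_response,
  "concern",  "hushed_face",
  "surprise", "astonished_face"
)

# Made-up responses for illustration
toy <- tibble(
  context  = c("concern", "concern", "surprise"),
  response = c("hushed_face", "astonished_face", "astonished_face")
)

# Attach the intended emoji to each row, then compare it with the response
toy_matched <- toy |>
  left_join(intended, by = "context") |>
  mutate(match = if_else(response == intended_response, "match", "no match"))

toy_matched
```

One difference to keep in mind: if a context is missing from the lookup table, `match` will be `NA` here, whereas the `case_when()` version above assigns *no match* by default.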

### Building the plots {#sec-au-plot-building}

We will now build our plots to visualise the matching rates per emoji pair. In a new table called `data_AU`, we group the data by context. The command `count(match)` counts the matches and non-matches within each context and stores the counts in a new column called `n`. We then add the column `percent`, which stores the percentage of matches and non-matches within each context, rounded to two decimal places.

```{r data-AU-context-match}

data_AU <- df |> 
  group_by(context) |> 
  count(match) |>
  mutate(percent = round(proportions(n)*100, 2))

```
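
To check that this pipeline computes percentages *within* each context (rather than across the whole data set), we can run it on a small made-up data frame. The object names `toy` and `toy_summary` and the values are invented for illustration only.

```r
library(dplyr)

# Made-up data: 4 responses in one context, 5 in the other
toy <- tibble(
  context = c(rep("concern", 4), rep("surprise", 5)),
  match   = c("match", "match", "match", "no match",
              "match", "match", "no match", "no match", "no match")
)

toy_summary <- toy |>
  group_by(context) |>
  count(match) |>                                    # n = count per context and match value
  mutate(percent = round(proportions(n) * 100, 2))   # share of n within each context

toy_summary
```

Because the data are still grouped by `context` after `count()`, `proportions(n)` divides each count by the total of its own context. The two percentages of each context therefore add up to 100.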

Using the `View()` function, we take a look at our data.

![The first 14 rows of the data frame `data_AU` as visualised using the `View()` function in RStudio](images/CS_RoseGina_context_percentages.png){#fig-data-percentages fig-align="center" width="400"}

We plot the first emoji pair of the AU+ condition 😯 😲 with their respective contexts *concern* and *surprise.*

```{r create-plot-concern-surprise}
#| code-line-numbers: true

plot_concern_surprise <- data_AU |> 
  filter(context == "concern" | context == "surprise") |>
  ggplot(aes(x = context, y = percent, fill = match)) +
  geom_col() +
  scale_x_discrete(limits = c("concern", "surprise")) +
  scale_y_continuous(expand = c(0,0)) +
  scale_fill_manual(values = c("#66C2A5", "#E78AC3")) +
  geom_text(aes(label = percent), position = position_stack(vjust = 0.5)) +
  labs(title = "😯 😲", x = "") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 20), 
        legend.title=element_blank())

```

The code above creates a barplot and stores it in `plot_concern_surprise`. Here's what each line of code does:

1.  Begin by assigning a clear name to the plot and calling the data to be plotted.
2.  Filter the contexts, such that only rows of the contexts *concern* or (`|`) *surprise* are plotted.
3.  Create a `ggplot` object with the context values on the *x*-axis and percentages on the *y*-axis. Colours are filled corresponding to the match values.
4.  Display the data as a barplot. By default, `geom_bar()` counts how many times *match* and *no match* occur. However, as we have already calculated and stored the values in the column `percent`, we use `geom_col()` to be able to use the data as is.
5.  The context values, which are displayed on the *x*-axis, are discrete. With this command, we set and order the contexts.
6.  Use the "expand" argument of the `scale_y_continuous()` function to remove the white space between the bars and the *x*-axis.
7.  Adjust colours with values from "Set2" from the {RColorBrewer} package (see @sec-gender-understanding-vis).
8.  Annotate the percentages of matching rates by adding them as text and placing them inside the plot, in the middle of the corresponding bars.
9.  Add the corresponding emojis at the top of the plot and remove the superfluous *x-*axis label "context".
10. Add a theme, in this case `theme_classic()`.
11. Plot titles are left-aligned by default. Since we want the emojis to be displayed on top of their corresponding context bars, we move the title to the centre of the plot. To ensure that the emojis are easily interpretable, we also increase the font size to 20 points.
12. Finally, we remove the title of the legend because the *match* and *no match* values are self-explanatory.
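
The difference between `geom_bar()` and `geom_col()` mentioned in step 4 can be illustrated with a minimal sketch. The two small data frames (`raw` and `counted`) are made up for this purpose: one holds one row per observation, the other holds counts that have already been computed.

```r
library(ggplot2)

# Made-up raw data: one row per response
raw <- data.frame(match = c("match", "match", "match", "no match"))

# The same information, already counted
counted <- data.frame(match = c("match", "no match"), n = c(3, 1))

# geom_bar() counts the rows itself
p_bar <- ggplot(raw, aes(x = match)) +
  geom_bar()

# geom_col() uses the values in the y column as they are
p_col <- ggplot(counted, aes(x = match, y = n)) +
  geom_col()

# Both plots end up with bars of the same heights
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
```

Here, `layer_data()` is only used to inspect the bar heights that {ggplot2} has computed; it is not needed for plotting.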

Let's take a look at our plot. It's looking great, but we need more than one plot: we need six, one for each emoji pair. We could write out the code for each emoji pair, but since it is identical (except for the contexts and the emojis), it is much more efficient to define a **function** to do this.

```{r show-plot-concern-surprise}
#| eval: false
plot_concern_surprise
```

```{r}
#| eval: false
#| echo: false
ggsave(plot = plot_concern_surprise, here("images", "CS_RoseGina_plot_concern_surprise.png"), height = 1344, width = 1881, units = "px")
```

![](images/CS_RoseGina_plot_concern_surprise.png)

::: {.callout-note title="Defining our own functions"}
Functions are reusable code snippets that perform specific tasks. So far, we have only used built-in `R` functions (see @sec-RFunctions) and functions from add-on packages such as {dplyr} from the tidyverse (see @sec-tidyverse), but we have not defined our own functions.

Defining our own functions can help us make our code more efficient and organised. As a rule of thumb, whenever writing new code seems redundant (i.e., when you find yourself copying and pasting entire sections of code), it is best to define a function for that task. This will ensure that the task is always performed in the same way and, if you find that you need to amend the code to perform the task, you will only need to make the change once, within the function assigned to this task.

The basic structure of a function is `function(argument)`. Looks familiar? Accordingly, we define a function the following way: `function(parameters){function body}`

Here are the steps:

1.  We define a function using the keyword `function`. After this keyword, we write a list of **parameters** in brackets. **Parameters** act as placeholders for the function's arguments.
2.  We then write code in the **function body** and enclose it in curly brackets. The **function body** tells the function what it is meant to do when called upon.
3.  We assign our function a name using the assignment operator (`<-`). This name will be used to call the function. To avoid conflicts (see @sec-Conflicts), we choose a name that is not already assigned to a built-in function.
:::
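
Before turning to our plotting function, here is a minimal sketch of these three steps. The function name `to_percent` and its parameters `n` and `digits` are made up for illustration; the body reuses the percentage calculation from above.

```r
# Steps 1 and 3: the keyword `function`, a list of parameters in brackets,
# and a name assigned with `<-`
to_percent <- function(n, digits = 2) {
  # Step 2: the function body, enclosed in curly brackets,
  # converts counts to rounded percentages
  round(proportions(n) * 100, digits)
}

# Call the function by its name and fill in the arguments
to_percent(c(3, 1))
to_percent(c(1, 2), digits = 1)
```

The parameter `digits` has a default value of 2, so it only needs to be specified when a different number of decimal places is wanted.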

In our case, the process of defining a plotting function is straightforward:

1.  We start with the keyword `function()` and state that our function should take `contexts` as its first argument and `emojis` as its second argument, as only these change with each plot.
2.  We then simply paste the code that we just wrote for the above barplot inside the curly braces, replacing the specific contexts and emojis with the parameters of our function.
3.  We name our function `plot_AU_matches` to make clear what it does: it plots AU matches.

```{r define-function}

plot_AU_matches <- function(contexts, emojis) {
  data_AU |> 
    filter(context %in% contexts) |> 
    ggplot(aes(x = context, y = percent, fill = match)) +
    geom_col() +
    scale_x_discrete(limits = contexts) +
    scale_y_continuous(expand = c(0,0)) +
    scale_fill_manual(values = c("#66C2A5", "#E78AC3")) +
    geom_text(aes(label = percent), position = position_stack(vjust = 0.5)) +
    labs(x = "", y = "percent", title = emojis) +
    theme_classic() +
    theme(plot.title = element_text(hjust = 0.5, size = 20), 
          legend.title=element_blank())
}
```

We then apply this function to all contexts and emoji pairs by filling them in as the arguments.

```{r apply-function}

plot_concern_surprise <- plot_AU_matches(c("concern", "surprise"), "😯 😲")
plot_happiness_cheeky <- plot_AU_matches(c("happiness", "(cheeky) laughter"), "😃 😆")
plot_mild_irr_annoyance <- plot_AU_matches(c("mild irritation", "annoyance"), "😐 😑")
plot_mild_disapp_mod_dissap <- plot_AU_matches(c("mild disappointment", "moderate disappointment"), "🙁️ ☹️")
plot_amusement_int_happiness <- plot_AU_matches(c("amusement", "(intense) happiness"), "😄 😁")
plot_happiness2_bashfulness <- plot_AU_matches(c("happiness2", "bashfulness"), "😊 ☺️")
```
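
With six plots, calling the function six times is perfectly manageable. If there were many more pairs, one possible alternative would be to store the contexts in a named list and let `Map()` call the function once per element. The sketch below illustrates this idea under several assumptions: `toy_AU` is a made-up summary table with invented percentages, `plot_pair()` is a simplified, hypothetical stand-in for `plot_AU_matches()` that takes the data as an argument, and plain-text titles are used instead of emojis.

```r
library(ggplot2)

# Made-up summary table in the same shape as data_AU (invented percentages)
toy_AU <- data.frame(
  context = rep(c("concern", "surprise", "happiness", "(cheeky) laughter"), each = 2),
  match   = rep(c("match", "no match"), times = 4),
  percent = c(70, 30, 60, 40, 85, 15, 90, 10)
)

# Simplified, hypothetical stand-in for plot_AU_matches()
plot_pair <- function(data, contexts, title) {
  ggplot(data[data$context %in% contexts, ],
         aes(x = context, y = percent, fill = match)) +
    geom_col() +
    scale_x_discrete(limits = contexts) +
    labs(x = "", title = title)
}

# One list element per emoji pair
pairs <- list(
  concern_surprise = c("concern", "surprise"),
  happiness_cheeky = c("happiness", "(cheeky) laughter")
)
titles <- c("concern vs. surprise", "happiness vs. (cheeky) laughter")

# Map() calls plot_pair() once for each pair and its title
plots <- Map(function(ctx, ttl) plot_pair(toy_AU, ctx, ttl), pairs, titles)

names(plots)
```

The result is a named list of plots, so an individual plot can be retrieved with, for example, `plots$concern_surprise`.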

:::: {.callout-note collapse="true"}
#### How to insert emojis in `R` and render plots with emojis 😰

There are various ways to insert emojis in `R`. The easiest is to use the emoji keyboard (see @fig-emoji-keyboard). To open it on macOS, use the keyboard shortcut `Ctrl + ⌘ Cmd + Space` or `🌐 fn + e`; on Windows, use `⊞ Win + . (period)`. The emoji keyboard is also available in RStudio if you go to the "Edit" drop-down menu and click on "Emojis & Symbols". Alternatively, there are emoji libraries for `R`, for example {emo(ji)} [@wickham2024emoji].
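
A further option, which does not depend on an emoji keyboard or an additional package, is to write emojis as Unicode escapes in base `R`. The sketch below assumes the Unicode code points U+1F62F (hushed face) and U+1F632 (astonished face) for the first emoji pair; the object names are made up for illustration.

```r
# Unicode escapes for the two emojis of the first pair
hushed     <- "\U0001F62F"  # U+1F62F, hushed face
astonished <- "\U0001F632"  # U+1F632, astonished face

# Combine them into a string that could be used as a plot title
pair_title <- paste(hushed, astonished)

# The code point of a single emoji can be checked with utf8ToInt()
utf8ToInt(hushed)
```

Whether the emojis are then displayed correctly still depends on the fonts and the graphics backend, as explained in the remainder of this box.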

As we want to display emojis within plots, we need to pay even more attention to graphics. Emojis as part of plots created by `ggplot` cannot be displayed by default. Additional problems can occur when rendering a Quarto or RMarkdown document to HTML.

If displaying emojis as part of plots in RStudio does not work for you, you will need to use the high-quality graphics library "AGG" ("Anti-Grain Geometry") or "Cairo" as your graphics backend in RStudio. To do this, head to the "Tools" drop-down menu and click on "Global Options". Then, go to the "Graphics" tab and select the "AGG" or "Cairo" option (see @fig-agg).

::: {#fig-emoji-tools layout-ncol="2"}
![The emoji keyboard](images/CS_RoseGina_emoji_keyboard.png){#fig-emoji-keyboard fig-align="center" width="270"}

![Recommended graphics backend in RStudio](images/CS_RoseGina_AGG.png){#fig-agg fig-align="center" width="300"}

Tools for inserting and displaying emojis in RStudio
:::

If you are using AGG as your graphics backend, you can use the [{ragg}](https://ragg.r-lib.org) package [@pedersen2024ragg] to correctly render your Quarto/RMarkdown document to HTML with all the emojis in the plots. This package provides graphics devices based on AGG and includes advanced text rendering, with support for emojis. To use {ragg} in combination with the {knitr} engine, first install the package and then add the following setup command at the beginning of your document:

```{r setup-example, eval = FALSE}

install.packages("ragg")

knitr::opts_chunk$set(dev = "ragg_png")

```

If AGG does not work for you, you can use Cairo. Cairo comes preinstalled with `R` so you don't need to install it yourself. The set-up command for your Quarto/RMarkdown document is:

```{r setup-example2, eval = FALSE}

knitr::opts_chunk$set(dev.args = list(png = list(type = "cairo")))

```
::::

### Assembling plots with {patchwork} {#sec-au-patchwork}

By applying our newly created function `plot_AU_matches()` to all emoji pairs and contexts, we have created one barplot for each emoji pair. We will now use the [{patchwork}](https://patchwork.data-imaginist.com) package [@pedersen2024patchwork] to assemble the plots into one figure. As the name suggests, {patchwork} enables us to patch several plots together and arrange them as we wish. The basic operator to combine plots in {patchwork} is the `+` operator. Additionally, plots can be combined:

-   Horizontally using `|` and
-   Vertically using `/`.

Brackets can be used to combine horizontal and vertical arrangements.
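
As a minimal sketch of these operators, the following example combines three simple placeholder plots based on the built-in `mtcars` data set. The plots themselves are arbitrary and only serve to show the layouts.

```r
library(ggplot2)
library(patchwork)

# Three arbitrary placeholder plots
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + ggtitle("p1")
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar() + ggtitle("p2")
p3 <- ggplot(mtcars, aes(hp, mpg)) + geom_point() + ggtitle("p3")

# Horizontal arrangement: p1 next to p2
side_by_side <- p1 | p2

# Vertical arrangement: p1 on top of p2
stacked <- p1 / p2

# Brackets combine both: p1 and p2 side by side, with p3 underneath
nested <- (p1 | p2) / p3
```

Printing `nested` displays `p1` and `p2` in the top row and `p3` across the full width of the bottom row.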

![Patchwork artwork by [Allison Horst](https://allisonhorst.com/r-packages-functions) (CC-BY)](images/AHorst_patchwork.png){#fig-patchwork width="600"}

In line with the research question of @fricke2024semantic, we want to compare the matching rates of emojis and contexts in the AU+ condition with the matching rates in the AU- condition. Our goal is therefore to create a plot that looks similar to @fig-AUFricke created by @fricke2024semantic.

![Plot of individual emoji pairs that compares AU conditions [@fricke2024semantic: 11, CC-BY]](images/CS_RoseGina_Fricke_AUplot.png){#fig-AUFricke width="700"}

First, let us plan the layout of our combined plot with some placeholder names. Our combined plot will have two columns and three rows: In both `column1` and `column2`, three plots are stacked **vertically** on top of each other. These are the plots of the AU+ and the AU- condition, respectively. We then place these patchworks next to each other (**horizontally**) for comparison.

```         
column1 <- p1 / p2 / p3

column2 <- p4 / p5 / p6

columns_combined <- column1 | column2
```

We follow the logic above to create our combined plot, choosing informative names for our subplots. To run this code, you will need to have the {patchwork} package installed and the library loaded.

```{r AU-patch-condition}

#install.packages("patchwork")
#library(patchwork)

#AU+ condition:
AU_plus_patch <-
  plot_concern_surprise / plot_happiness_cheeky / plot_mild_irr_annoyance

#AU- condition:
AU_minus_patch <-
  plot_mild_disapp_mod_dissap / plot_amusement_int_happiness / plot_happiness2_bashfulness

```

To further specify the layout, we use the `plot_layout()` function from {patchwork}. By setting the `guides` argument to `"collect"`, legends that are identical within each patchwork are merged into one. We place the merged legend at the bottom of each patchwork.

```{r legends}

#AU+ condition:
AU_plus_patch <- AU_plus_patch +
  plot_layout(guides = "collect") & theme(legend.position = "bottom")

#AU- condition:
AU_minus_patch <- AU_minus_patch +
  plot_layout(guides = "collect") & theme(legend.position = "bottom")
```
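
Note that the theme is added with `&` rather than with `+`. In {patchwork}, `+` only modifies the last plot of a patchwork, whereas `&` applies the change to all of its plots. The following sketch illustrates the difference with two arbitrary placeholder plots based on the built-in `mtcars` data set.

```r
library(ggplot2)
library(patchwork)

# Two placeholder plots that share the same legend
p1 <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
p2 <- ggplot(mtcars, aes(hp, mpg, colour = factor(cyl))) + geom_point()

# `+ theme()` only changes the legend position of the last plot (p2)
last_only <- (p1 / p2) + theme(legend.position = "bottom")

# `& theme()` changes the legend position of every plot in the patchwork
all_plots <- (p1 / p2) & theme(legend.position = "bottom")

# Collecting the guides first merges the identical legends into one
collected <- (p1 / p2) +
  plot_layout(guides = "collect") & theme(legend.position = "bottom")
```

Since `&` has lower precedence than `+` in `R`, the last expression first adds the layout to the patchwork and then applies the theme to all of its plots.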

Since `AU_plus_patch` and `AU_minus_patch` are to be combined in one plot, we need to add titles to tell them apart. Technically, it is possible (and recommended!) to use the `plot_annotation()` function of the {patchwork} package for this. However, annotations made with this function are only shown at the highest nesting level. As we will be building a double-nested plot, any annotations that we add at the "blocks-of-three" level will not be displayed. We can work around this by using the function `wrap_elements()`. This fixes the blocks in their current state and allows us to add titles using `ggtitle()` instead.

```{r patch-titles}

AU_plus_patch <- wrap_elements(plot = AU_plus_patch) +
  ggtitle("[AU+] condition")

AU_minus_patch <- wrap_elements(plot = AU_minus_patch) +
  ggtitle("[AU-] condition")
```
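
The interplay between `wrap_elements()` and `plot_annotation()` can be tried out on a small scale. The sketch below uses arbitrary placeholder plots based on the built-in `mtcars` data set; the block names and titles are made up for illustration.

```r
library(ggplot2)
library(patchwork)

# Two arbitrary placeholder plots
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# Wrap each block so that it behaves like a single plot, then give it a title
block_a <- wrap_elements(plot = p1 / p2) + ggtitle("Block A")
block_b <- wrap_elements(plot = p2 / p1) + ggtitle("Block B")

# plot_annotation() is applied at the top level of the final patchwork
combined <- (block_a | block_b) +
  plot_annotation(title = "Overall title")
```

In this sketch, the block titles come from `ggtitle()` on the wrapped blocks, while the overall title comes from `plot_annotation()` at the highest nesting level.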

Finally, we put both columns together to get our final plot.

```{r show-AU-combined-patch}
#| fig-height: 11
#| eval: false

AU_plus_patch | AU_minus_patch

```

```{r}
#| eval: false
#| echo: false

AUpatch_complete <- AU_plus_patch | AU_minus_patch

ggsave(plot = AUpatch_complete, 
       here("images", "CS_RoseGina_AUpatch_complete.png"), 
       height = 3200, width = 2300, units = "px")
```

![Plot combining all six plots into one figure](images/CS_RoseGina_AUpatch_complete.png){#fig-AUpatch width="100%"}

@fig-AUpatch contains information on the matching rates of all emoji pairs with their contexts. It contains the same information as @fig-AUFricke from @fricke2024semantic, but does not look exactly the same. Which look do you think is easiest to interpret?

:::: {.callout-note title="Alternative ways of dealing with the legend" collapse="true"}
You will probably have noticed that @fig-AUpatch contains two identical legends. This is the trade-off that we accept by using the `wrap_elements()` function: we have fixed the patchworks in their current state, including their legends, which means that the legends cannot be merged later. There are a couple of other options that produce different outcomes, but none of them is perfect. Since we do not want to delete the legend completely, one option would be to keep the legends of all six plots, as in @fig-alllegends. Another option would be to keep the legend of one block and delete the other. However, as you can see in @fig-onelegend, the bars then take up the space of the deleted legend, so that the bars in one block become wider than in the other.

::: {#fig-AU-plot-options layout-ncol="2"}
![The combined plot with six legends](images/CS_RoseGina_AU_plot_six_legends.png){#fig-alllegends fig-align="center" width="550"}

![The combined plot with one legend](images/CS_RoseGina_AU_plot_one_legend.png){#fig-onelegend fig-align="center" width="550"}

Options for the legend placement
:::
::::

### Interpreting the plot

By looking at and interpreting @fig-AUpatch, we can now finally answer the research question: Do AU differences lead to differences in meaning between the two emojis of a pair?

Based on the descriptive statistics visualised in @fig-AUpatch, the answer is no, seemingly not. The AU difference does not seem to be critical when deciding which emoji to use in a specific context. The original study also concluded that the matching emoji was "generally preferred with matching rates above chance level" [@fricke2024semantic: 11], both in the AU+ and in the AU- condition. Now, was all this work for nothing?

No, not at all! We can still draw some interesting inferences from the plot we created. For example, we see that minor visual differences between emojis do appear to affect the understanding and selection of emojis in different contexts: By slightly varying the contexts, participants were made to choose emojis with different facial features. Matching rates were quite similar within emoji pairs and, notably, also largely consistent across emoji pairs, regardless of their AU status. Hence, there is much more to explore in future linguistics studies on the semantics of emoji in text messages!

The following quiz questions are about the interpretation of @fig-AUpatch. The questions should help you make sense of the information displayed.

::: callout-tip
#### Quiz time! {.unnumbered}

[**Q6.**]{style="color:green;"} In which context pair did participants choose the matching emojis most often?

```{r echo=FALSE, results="asis"}

check_question("happiness and (cheeky) laughter", options = c("happiness and (cheeky) laughter", "concern and surprise", "amusement and (intense) happiness", "mild irritation and annoyance", "mild disappointment and moderate disappointment", "happiness2 and bashfulness "), type = "radio", 
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! The vast majority of participants (86.79% and 88.68%) opted for the matching emoji in the contexts *happiness* (😃) and *(cheeky) laughter* (😆).",
wrong = "Not quite, try again.")

```

<br><br> [**Q7.**]{style="color:green;"} How can similar matching rates across emoji pairs (that is, across sub-plots) be interpreted?

```{r echo=FALSE, results="asis"}
check_question("For these pairs, a very similar number of participants chose the matching emoji.", options = c("For these pairs, a very similar number of participants chose the matching emoji.", "For these pairs, participants had the most difficulty choosing one emoji over the other.", "For these pairs, the least number of participants chose the matching emoji.", "For these pairs, half the participants preferred one emoji and half preferred the other."), type = "radio", 
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Of course, this does not mean that the same participants preferred the matching emojis for these pairs.",
wrong = "This is not correct. Look at the subplots *happiness* 😃 - *(cheeky) laughter* 😆 and *amusement* 😄 - *(intense) happiness* ️😁 and think about their meaning.")
check_hint("Similar matching rates imply that the *match* / *no match* ratio is somewhat constant across contexts. For example, compare the subplots *happiness* 😃 - *(cheeky) laughter* 😆 and *amusement* 😄 - *(intense) happiness*: the matching rates are consistently high for all emojis and contexts. What might this indicate?", hint_title = "🐭 Click on the mouse for a hint.")
```

<br><br> [**Q8.**]{style="color:green;"} Which interpretations of the lower left plot are correct? Select all that apply.

```{r echo=FALSE, results="asis"}
check_question(c("The emoji 😐 was selected for its matching context considerably less often than all the other emojis.", "Only in the 😐😑 pair does the matching rate for one emoji exceed chance level, while the matching rate for the other falls below chance."), options = c("The emoji 😐 was selected for its matching context considerably less often than all the other emojis.", "There is a major difference between the matching rates of the annoyance and happiness contexts.","Only in the 😐😑 pair does the matching rate for one emoji exceed chance level, while the matching rate for the other falls below chance.", "The disparate matching rates of the 😐😑 pair prove that, in general, AU differences affect participants' preferences."), type = "checkbox", 
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! Apparently, two thirds (66.67%) of participants favoured the non-matching emoji in the *mild irritation* context: Instead of 😐, they chose 😑. One of the narratives in this context was 'a malfunctioning wifi router'. Which emoji would you choose in this context, 😐 or 😑?",
wrong = "Not quite. Consider the implications of a low matching rate versus a high matching rate.", alignment = "vertical")
check_hint("Two statements are correct. A low matching rate indicates that participants rarely chose the emoji that was intended for the context, that is, the emoji that the authors identified as the most fitting.", hint_title = "🐭 Click on the mouse for a hint.")
```

<br><br> [**Q9.**]{style="color:green;"} What are plausible reasons for the striking results presented in the lower left barplot? Select all that apply.

```{r echo=FALSE, results="asis"}

check_question(c("The stories of the mild irritation context were perceived as more annoying than the authors had anticipated.", "The stories created for mild irritation and annoyance triggered a similar reaction."), options = c("The stories of the mild irritation context were perceived as more annoying than the authors had anticipated.", "The stories created for mild irritation and annoyance triggered a similar reaction.", "Unlike what was anticipated, most participants used the emojis 😐 and 😑 in very different contexts.", "The participants did not realize there was a difference between 😐 and 😑."), type = "checkbox",
random_answer_order = TRUE,
button_label = "Check answer",
right = "That's right! These are two possible explanations. However, as we have no way of tracking the participants' reasoning, we can only make some educated guesses.",
wrong = "Not quite. Consider what could have caused the differing matching rates.", alignment = "vertical")
check_hint("Two of these are plausible reasons for the observed result. The plot suggests that participants preferred the same emoji for the stories associated with *mild irritation* and *annoyance*.", hint_title = "🐭 Click on the mouse for a hint.")

```

<br>
:::

Overall, @fig-AUpatch shows that there was indeed a preference for context-matching emojis. Importantly, participants preferred the context-matching emoji in both the AU+ condition and the AU- condition: The overall matching rate of AU- pairs is very similar to that of AU+ pairs. This means that whether or not emoji features coincided with human facial features did not (significantly) affect the participants' decision for one emoji or the other.
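
To put a number on the overall matching rate per condition, one could summarise the data by condition instead of by context. The following sketch shows how this might be done; it is not part of the original analysis. It assumes the grouping of contexts used for the patchworks above (the six contexts in `au_plus_contexts` belong to the AU+ condition, all others to AU-) and uses a made-up data frame (`toy`) with invented responses.

```r
library(dplyr)

# Contexts of the AU+ condition, as in the left-hand column of the combined plot
au_plus_contexts <- c("concern", "surprise", "happiness",
                      "(cheeky) laughter", "mild irritation", "annoyance")

# Made-up responses for illustration only
toy <- tibble(
  context = c("concern", "surprise", "annoyance",
              "amusement", "bashfulness", "happiness2", "mild disappointment"),
  match   = c("match", "no match", "match",
              "match", "match", "no match", "match")
)

toy_rates <- toy |>
  mutate(condition = if_else(context %in% au_plus_contexts, "AU+", "AU-")) |>
  group_by(condition) |>
  summarise(matching_rate = round(mean(match == "match") * 100, 2))

toy_rates
```

With the real data, `toy` would be replaced by `df`, which already contains the columns `context` and `match`.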

The findings do not support @fricke2024semantic's experimental hypothesis, which was based on the pictorial approach. Instead, as Fricke, Grosz & Scheffler [-@fricke2024semantic: 12] observe, the results suggest that visually similar emojis can convey different meanings, even when they correspond to the same human facial expressions.

This observation closely aligns with the predictions of the lexicalist approach proposed by @grosz2023semantics. However, the authors caution that it does not provide definitive evidence for the lexicalist approach [@fricke2024semantic: 12].

## Conclusion {#sec-conclusion}

You have successfully completed [`r checkdown::insert_score()` out of 9 quiz questions]{style="color:green;"} in this chapter.

You are now a pro in handling (stacked) barplots! You can build, customise, arrange, and interpret them. Barplots are powerful for visualising categorical data, offering a straightforward way to compare frequencies and make patterns apparent. However, they do have their limitations. For instance, they are not ideal for displaying continuous data. Building and assembling plots can be quite fiddly and it can take some trial-and-error to make the plot look like what you had imagined. But there is a solution for (almost) everything and hopefully, the beautiful plot you create in the process will be worth the effort.

This chapter's analysis revealed gender-specific differences in emoji understanding, potentially indicating a gender confidence gap between men and women. On average, however, both genders reported at least a good understanding of emojis. The visualisations have been adjusted for the gender imbalance in the data, demonstrating the importance of accounting for differences in group sizes.

In this chapter, we have created an informative figure that answers the experiment’s research question by combining multiple plots. The question of whether Action Unit (AU) differences are critical for emoji preference was answered in the negative. However, we have made several other discoveries along the way: As noted by @fricke2024semantic, we have found that small changes to emojis’ facial features do affect choice patterns.

Emojis, it turns out, contain lots of information, and there is a science behind them 🤓. While experimentally measuring why we prefer certain emojis over other ones represents a real challenge, @fricke2024semantic provide valuable insights into this fascinating area of study. As the authors shared their data and code, we were able to successfully reproduce their results, as well as create new informative figures on the basis of their data.

::: {.callout-note collapse="true"}
#### **How to cite this chapter** {.unnumbered}

This is a case study chapter of the web version of the textbook "Data Analysis for the Language Sciences: A very gentle introduction to statistics and data visualisation in R" by Elen Le Foll.

Please cite the current version of this chapter as:

> ::: {style="color: black"}
> Hörsting, Rose and Gina Reinhard. 2024. The semantics of emojis: ExploRing the results of an experimental study. In Elen Le Foll (Ed.), *Data Analysis for the Language Sciences: A very gentle introduction to statistics and data visualisation in R*. Open Educational Resource. <https://elenlefoll.github.io/RstatsTextbook/> (accessed DATE).
> :::
:::

## References {.unnumbered}

Fricke, Lea, Patrick G Grosz & Tatjana Scheffler. 2024. Semantic Differences in Visually Similar Face Emojis. *Language and Cognition*. Cambridge University Press 1–15. <https://doi.org/10.1017/langcog.2024.12>.

Fugate, Jennifer MB & Courtny L Franco. 2021. Implications for Emotion: Using Anatomically Based Facial Coding to Compare Emoji Faces Across Platforms. *Frontiers in Psychology*. Frontiers Media SA 12. 605928. <https://doi.org/10.3389/fpsyg.2021.605928>.

Grosz, Patrick Georg, Gabriel Greenberg, Christian De Leon & Elsi Kaiser. 2023. A semantics of face emoji in discourse. *Linguistics and Philosophy*. Springer 46(4). 905–957. <https://doi.org/10.1007/s10988-022-09369-8>.

Maier, Emar. 2023. Emojis as Pictures. *Ergo* 10. <https://doi.org/10.3998/ergo.4641>.

Neuwirth, Erich. 2022. Package "RColorBrewer". *ColorBrewer Palettes* 991. <https://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf>.

Pedersen, Thomas Lin. 2024. *Patchwork: The Composer of Plots*. <https://patchwork.data-imaginist.com>.

Pedersen, Thomas Lin & Maxim Shemanarev. 2024. *Ragg: Graphic Devices Based on AGG*. <https://ragg.r-lib.org>.

Pfeifer, Valeria A, Emma L Armstrong & Vicky Tzuyin Lai. 2022. Do all facial emojis communicate emotion? The impact of facial emojis on perceived sender emotion and text processing. *Computers in Human Behavior*. Elsevier 126. 107016. <https://doi.org/10.1016/j.chb.2021.107016>.

Scheffler, Tatjana & Ivan Nenchev. 2024. Affective, semantic, frequency, and descriptive norms for 107 face emojis. *Behavior Research Methods*. Springer 1–22. <https://doi.org/10.3758/s13428-024-02444-x>.

Wickham, Hadley, Romain François & Lucy D’Agostino McGowan. 2024. *Emo: Easily Insert ’emoji’*. <https://github.com/hadley/emo>.

### Packages used in this chapter {.unnumbered}

```{r package-versions, echo=FALSE}
sessionInfo()
```

### Package references {.unnumbered}

```{r generateBibliography, results="asis", echo=FALSE}

CS_RoseGina_packages.bib <- sapply(1:length(loadedNamespaces()), function(i) toBibtex(citation(loadedNamespaces()[i])))

#knitr::write_bib(c(.packages(), "knitr"), "CS_RoseGina_packages.bib")

require("knitcitations")
cleanbib()
options("citation_format" = "pandoc")
read.bibtex(file = "CS_RoseGina_packages.bib")
```