---
title: "Quickstart demo"

date-modified: 'today'
date-format: long

format: 
  html:
    number-sections: true
    number-depth: 2
    footer: "CC BY 4.0 John R Little"

license: CC BY   
---

## Outline

1.  Make RStudio project

    -   Make a data folder
    -   Drag CSV into the data folder
    -   Open Quarto notebook

2.  Load libraries

3.  IMPORT data

    -   `read_csv()`
    -   See also: *RStudio data import wizard*
    -   ATTACH data

4.  Visualize

5.  Wrangle data: core dplyr verbs

    -   `filter`, `select`, `arrange`, `mutate`
    -   `count` / `group_by` & `summarize`

6.  EDA: `skimr::skim(starwars)`

7.  EDA: `summary(favorability)`

8.  `left_join()`

9.  Pivot

10. Interactive visualization

11. Linear regression / models

12. Reports: notebooks, slides, dashboards, word document, PDF, book, etc.

------------------------------------------------------------------------

## `library(tidyverse)` plus other libraries

```{r}
#| message: false
#| warning: false
library(tidyverse)
library(skimr)
library(plotly)
library(moderndive)
library(broom)
```

## Import data

See also the RStudio [data import wizard](https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio).

```{r}
#| message: false
#| warning: false
favorability <- read_csv("data/fav.csv")
```
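
`read_csv()` guesses each column's type. As a minimal sketch, the types can also be declared with `col_types`. The inline CSV text and its values are hypothetical stand-ins for a file on disk; only the column names `name` and `fav_rating` come from the code used later in this notebook.

```{r}
library(readr)

# Hypothetical CSV text; I() tells read_csv() to treat it as literal data
csv_text <- I("name,fav_rating\nLuke Skywalker,8.5\nYoda,9.1\n")

# Declare column types instead of relying on guessing
demo_fav <- read_csv(
  csv_text,
  col_types = cols(
    name = col_character(),
    fav_rating = col_double()
  )
)
demo_fav
```

Declaring types makes an import fail loudly when a column does not parse as expected, rather than silently changing type.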

### Attached on-board data

-   `dplyr::starwars`

```{r}
data("starwars")
```

## Visualization

Visualize with the `ggplot2` library.

```{r}
plot <- ggplot(data = starwars, 
               aes(x = hair_color)) + 
  geom_bar()
plot
```

### One improvement

Arrange bars by frequency using `forcats::fct_infreq()`

```{r}
plot1 <- ggplot(starwars, 
                aes(fct_infreq(hair_color))) + 
  geom_bar()
plot1
```
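
A further possible improvement, sketched here as one option among many: add readable axis labels with `labs()` and flip the axes with `coord_flip()` so the category names do not overlap. The object name `plot_labeled` and the label text are illustrative choices.

```{r}
library(dplyr)    # provides the starwars data
library(ggplot2)
library(forcats)

plot_labeled <- ggplot(starwars, 
                       aes(fct_infreq(hair_color))) + 
  geom_bar() +
  labs(x = "Hair color", y = "Number of characters") +  # readable axis titles
  coord_flip()                                           # horizontal bars
plot_labeled
```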

## Wrangle data

From the `dplyr` library, use the core verbs:

-   `filter`, `select`, `arrange`

-   `mutate`

-   `count`, `group_by` & `summarize`

### `select` to subset data by columns

```{r}
starwars %>% 
  select(name, gender, hair_color)
```

### `filter` to subset data rows

```{r}
starwars %>% 
  filter(gender == "feminine")
```
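
`filter()` also accepts several conditions separated by commas; a row is kept only when all of them are true. A minimal sketch, where the 170 cm cutoff and the name `tall_feminine` are arbitrary choices for illustration:

```{r}
library(dplyr)

# Keep rows that satisfy both conditions;
# rows where a condition evaluates to NA are dropped
tall_feminine <- starwars %>% 
  filter(gender == "feminine", height > 170)
tall_feminine
```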

### `arrange` to sort data

```{r}
starwars %>% 
  arrange(desc(height), desc(name))
```

### `mutate` to add new variable or transform existing

```{r}
starwars %>%
  drop_na(mass) %>% 
  select(name, mass) %>% 
  mutate(big_mass = mass * 2)
```
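
`mutate()` can create several variables in one call, and later expressions can use columns created earlier in the same call. The sketch below computes a body mass index from `mass` (kg) and `height` (cm); the column names `height_m` and `bmi` are illustrative.

```{r}
library(dplyr)
library(tidyr)

starwars_bmi <- starwars %>%
  drop_na(mass, height) %>% 
  select(name, mass, height) %>% 
  mutate(height_m = height / 100,      # convert centimeters to meters
         bmi = mass / height_m^2)      # uses height_m created just above
starwars_bmi
```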

### `count` / `group_by` & `summarize`

Subtotals of a variable

```{r}
starwars %>% 
  count(gender)
```

Variable totals (`summarise()` can also perform other calculations, not shown here)

```{r}
starwars %>% 
  drop_na(mass) %>% 
  summarise(sum(mass))
```

Variable subtotals and calculations

> `group_by(gender, species) %>%  summarise(mean_height = mean(height), total = n())`

```{r}
#| message: false
#| warning: false
starwars %>% 
  drop_na(height) %>% 
  group_by(gender, species) %>% 
  summarise(mean_height = mean(height), total = n()) %>% 
  arrange(desc(total)) %>%
  drop_na(species) %>%
  filter(total > 1) %>% 
  select(species, gender, total, everything())
```
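
To connect the two approaches above: `count()` behaves like a shorthand for `group_by()` followed by `summarise(n())`. A small sketch comparing them; the object names are illustrative, and `.groups = "drop"` is added so the summarised result comes back ungrouped.

```{r}
library(dplyr)

# Shorthand: one row per gender with a count column named "total"
by_count <- starwars %>% 
  count(gender, name = "total")

# Long form: group, then count rows per group
by_summarise <- starwars %>% 
  group_by(gender) %>% 
  summarise(total = n(), .groups = "drop")

# Compare the count columns of the two results
identical(by_count$total, by_summarise$total)
```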

## `skim()`

The `skimr` library presents summary EDA results using the `skim()` function.

```{r}
skim(starwars)
```

## Summary

```{r}
summary(favorability)
```

## `left_join()`

[Joins](https://dplyr.tidyverse.org/articles/two-table.html), or merges, are part of the `dplyr` library.

```{r}
starwars %>% 
  left_join(favorability, by = "name") %>% 
  select(name, fav_rating, everything()) %>% 
  arrange(-fav_rating)
```
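
A `left_join()` keeps every row of the left table and fills the joined columns with `NA` where the right table has no match. The sketch below illustrates this with a small hypothetical ratings table; `demo_ratings`, `demo_rating`, and the rating values are made up for illustration and are not taken from `data/fav.csv`.

```{r}
library(dplyr)

# Hypothetical lookup table with ratings for only two characters
demo_ratings <- tibble(
  name = c("Luke Skywalker", "Yoda"),
  demo_rating = c(8.5, 9.1)
)

joined <- starwars %>% 
  select(name, species) %>% 
  left_join(demo_ratings, by = "name")

# All starwars rows are kept; unmatched rows have NA in demo_rating
joined %>% 
  count(has_rating = !is.na(demo_rating))
```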

## Pivot

-   `pivot_longer()`
-   `pivot_wider()`

```{r}
relig_income %>%
  pivot_longer(!religion, names_to = "income", values_to = "count")
```
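
`pivot_wider()` is the inverse operation. As a sketch, the long table can be widened back to the original shape; the names `relig_long` and `relig_wide` are illustrative.

```{r}
library(tidyr)

# Lengthen: one row per religion and income bracket
relig_long <- relig_income %>%
  pivot_longer(!religion, names_to = "income", values_to = "count")

# Widen again: one column per income bracket
relig_wide <- relig_long %>%
  pivot_wider(names_from = income, values_from = count)
relig_wide
```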

## Interactive visualization

Make `plot1` interactive with `ggplotly()` from the `plotly` library.

```{r}
ggplotly(plot1)
```

## Regression / models

Predict mass from height after removing Jabba, an outlier by mass, from the data set. The model is fitted with base R's `lm()`; `dplyr` and the pipe `%>%` handle the data transformation, and the `broom` library tidies the model output. The `moderndive` library offers alternative functions for model output.

```{r}
my_model <- lm(mass ~ height, data = starwars %>% filter(mass < 500))
broom::tidy(my_model)
broom::glance(my_model)
```
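
For comparison, a sketch of the `moderndive` route to similar output: `get_regression_table()` returns the coefficient table and `get_regression_summaries()` returns model-level statistics. The model is refitted here under the illustrative name `demo_model` so the chunk stands on its own.

```{r}
library(dplyr)
library(moderndive)

# Same model as above: mass predicted from height, Jabba excluded
demo_model <- lm(mass ~ height, data = starwars %>% filter(mass < 500))

get_regression_table(demo_model)      # one row per model term
get_regression_summaries(demo_model)  # one row of model-level statistics
```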

### Visualize regression

`mass` over `height` with a fitted linear regression line from `geom_smooth()`. Here `se = FALSE` suppresses the confidence band; set `se = TRUE` to display it.

```{r}
#| warning: false
#| message: false
starwars %>% 
  filter(mass < 500) %>%
  ggplot(aes(height, mass)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

## Render reports

By changing the `format` option in the YAML header, you can render many report styles. Popular examples include HTML, PDF, MS Word, and PowerPoint documents; websites; slide-deck presentations; books; and interactive documents. See more at the [comprehensive guide to report outputs via Quarto](https://quarto.org/docs/guide/).