-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathp_data_on_display_univariate.qmd
347 lines (240 loc) · 9.1 KB
/
p_data_on_display_univariate.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
---
title: 'Univariate Graphs with Plotly Express'
---
## Intro
In this lesson, you'll learn how to create univariate graphs using Plotly Express. Univariate graphs are essential for understanding the distribution of a single variable, whether it's categorical or quantitative.
Let's get started!
## Learning objectives
- Create bar charts, pie charts, and treemaps for categorical data using Plotly Express
- Generate histograms for quantitative data using Plotly Express
- Customize graph appearance and labels
## Imports
This lesson requires plotly.express, pandas, and vega_datasets. Install them if you haven't already.
```{python}
import plotly.express as px
import pandas as pd
from vega_datasets import data
```
## Quantitative Data
### Histogram
Histograms are used to visualize the distribution of continuous variables.
Let's make a histogram of the tip amounts in the tips dataset.
```{python}
tips = px.data.tips()
tips.head() # view the first 5 rows
```
```{python}
px.histogram(tips, x='tip')
```
We can see that the highest bar, corresponding to tips between 1.75 and 2.24, has a frequency of 55. This means that there were 55 tips between 1.75 and 2.24.
::: {.callout-note title='Side-note'}
Notice that plotly charts are interactive. You can hover over the bars to see the exact number of tips in each bin.
Try playing with the buttons at the top right. The button to download the chart as a png is especially useful.
:::
::: {.callout-tip title='Practice'}
### Practice Q: Speed Distribution Histogram
Following the example of the histogram of tips, create a histogram of the speed distribution (Speed_IAS_in_knots) using the birdstrikes dataset.
```{python}
birdstrikes = data.birdstrikes()
birdstrikes.head()
# Your code here
```
```{python}
#| include: false
# Solution
px.histogram(birdstrikes, x='Speed_IAS_in_knots')
```
:::
We can view the help documentation for the function by typing `px.histogram?` in a cell and running it.
```{python}
px.histogram?
```
From the help documentation, we can see that the `px.histogram` function has many arguments that we can use to customize the graph.
Let's make the histogram a bit nicer by adding a title, customizing the x axis label, and changing the color.
```{python}
px.histogram(
tips,
x="tip",
labels={"tip": "Tip Amount ($)"},
title="Distribution of Tips",
color_discrete_sequence=["lightseagreen"]
)
```
Color names are based on standard CSS color naming from Mozilla. You can see the full list [here](https://developer.mozilla.org/en-US/docs/Web/CSS/named-color).
Alternatively, you can use hex color codes, like `#1f77b4`. You can get these easily by using a color picker. Search for "color picker" on Google.
```{python}
px.histogram(
tips,
x="tip",
labels={"tip": "Tip Amount ($)"},
title="Distribution of Tips",
color_discrete_sequence=["#6a5acd"]
)
```
::: {.callout-tip title='Practice'}
### Practice Q: Bird Strikes Histogram Custom
Update your birdstrikes histogram to use a hex code color, add a title, and change the x-axis label to "Speed (Nautical Miles Per Hour)".
```{python}
# Your code here
```
```{python}
#| include: false
# Solution
px.histogram(
birdstrikes,
x="Speed_IAS_in_knots",
labels={"Speed_IAS_in_knots": "Speed (Nautical Miles Per Hour)"},
title="Distribution of Bird Strike Speeds",
color_discrete_sequence=["#4B0082"] # Indigo color
)
```
:::
## Categorical Data
### Bar chart
Bar charts can be used to display the frequency of a single categorical variable.
Plotly has a `px.bar` function that we will see later. But for **single categorical variables**, the function plotly wants you to use is actually `px.histogram`. (Statisticians everywhere are crying; histograms are supposed to be used for just quantitative data!)
Let's create a basic bar chart showing the distribution of sex in the tips dataset:
```{python}
px.histogram(tips, x='sex')
```
Let's add counts to the bars.
```{python}
px.histogram(tips, x='sex', text_auto= True)
```
We can enhance the chart by adding a color axis, and customizing the labels and title.
```{python}
px.histogram(tips, x='sex', text_auto=True, color='sex',
labels={'sex': 'Gender'},
title='Distribution of Customers by Gender')
```
Arguably, in this plot, we do not need the `color` axis, since the `sex` variable is already represented by the x axis. But public audiences like colors, so it may still be worth including.
However, we should remove the legend. Let's also use custom colors.
For this, we can first create a figure object, then use the `.layout.update` method from that object to update the legend.
```{python}
tips_by_sex = px.histogram(
tips,
x="sex",
text_auto=True,
color="sex",
labels={"sex": "Gender"},
title="Distribution of Customers by Gender",
color_discrete_sequence=["#1f77b4", "#ff7f0e"],
)
tips_by_sex.update_layout(showlegend=False)
```
::: {.callout-tip title='Practice'}
### Practice Q: Bird Strikes by Phase of Flight
Create a bar chart showing the frequency of bird strikes by the phase of flight, `When__Phase_of_flight`. Add appropriate labels and a title. Use colors of your choice, and remove the legend.
```{python}
# Your code here
```
:::
```{python}
# | include: false
# Solution
fig = px.histogram(
birdstrikes,
x="When__Phase_of_flight",
text_auto=True,
color="When__Phase_of_flight",
labels={"When__Phase_of_flight": "Phase of Flight"},
title="Bird Strikes by Phase of Flight",
color_discrete_sequence=px.colors.qualitative.Set3,
)
fig.update_layout(showlegend=False)
```
:::
#### Sorting categories
It is sometimes useful to dictate a specific order for the categories in a bar chart.
Consider this bar chart of the election winners by district in the 2013 Montreal mayoral election.
```{python}
election = px.data.election()
election.head()
```
```{python}
px.histogram(election, x='winner')
```
Let's define a custom order for the categories. "Bergeron" will be first, then "Joly" then "Coderre".
```{python}
custom_order = ["Bergeron", "Joly", "Coderre"]
election_chart = px.histogram(election, x='winner', category_orders={'winner': custom_order})
election_chart
```
We can also sort the categories by frequency.
We can sort the categories by frequency using the `categoryorder` attribute of the x axis.
```{python}
election_chart = px.histogram(election, x="winner")
election_chart.update_xaxes(categoryorder="total descending")
```
Or in ascending order:
```{python}
election_chart = px.histogram(election, x="winner")
election_chart.update_xaxes(categoryorder="total ascending")
```
::: {.callout-tip title='Practice'}
### Practice Q: Sorted Origin State Bar Chart
Create a sorted bar chart showing the distribution of bird strikes by origin state. Sort the bars in descending order of frequency.
```{python}
# Your code here
```
```{python}
#| include: false
# Solution
fig = px.histogram(birdstrikes, x="Origin_State")
fig.update_xaxes(categoryorder="total descending")
```
:::
### Horizontal bar chart
When you have many categories, horizontal bar charts are often easier to read than vertical bar charts. To make a horizontal bar chart, simply use the `y` axis instead of the `x` axis.
```{python}
px.histogram(tips, y='day')
```
::: {.callout-tip title='Practice'}
### Practice Q: Horizontal Bar Chart of Origin State
Create a horizontal bar chart showing the distribution of bird strikes by origin state.
```{python}
# Your code here
```
```{python}
#| include: false
# Solution
px.histogram(birdstrikes, y="Origin_State")
```
:::
### Pie chart
Pie charts are also useful for showing the proportion of categorical variables. They are best used when you have a *small* number of categories. For larger numbers of categories, pie charts are hard to read.
Let's make a pie chart of the distribution of tips by day of the week.
```{python}
px.pie(tips, names="day")
```
We can add labels to the pie chart to make it easier to read.
```{python}
tips_by_day = px.pie(tips, names="day")
tips_by_day_with_labels = tips_by_day.update_traces(textposition="inside", textinfo="percent+label")
tips_by_day_with_labels
```
The legend is no longer needed, so we can remove it.
```{python}
tips_by_day_with_labels.update_layout(showlegend=False)
```
::: {.callout-note title='Pro Tip'}
If you forget how to make simple changes like this, don't hesitate to consult the plotly documentation, Google or ChatGPT.
:::
::: {.callout-tip title='Practice'}
### Practice Q: Wildlife Size Pie Chart
Create a pie chart showing the distribution of bird strikes by wildlife size. Include percentages and labels inside the pie slices.
```{python}
# Your code here
```
:::
```{python}
# | include: false
# Solution
fig = px.pie(birdstrikes, names="Wildlife__Size")
fig.update_traces(textposition="inside", textinfo="percent+label")
fig.update_layout(showlegend=False)
```
:::
## Summary
In this lesson, you learned how to create univariate graphs using Plotly Express. You should now feel confident in your ability to create bar charts, pie charts, and histograms. You should also feel comfortable customizing the appearance of your graphs.
See you in the next lesson.