deploy: d0ec114
trevorcampbell committed Dec 28, 2023
1 parent a2dc8fd commit 0c9713b
Showing 12 changed files with 863 additions and 863 deletions.
29 changes: 19 additions & 10 deletions pull341/_sources/classification1.md
@@ -281,6 +281,7 @@ perimeter and concavity variables. Recall that the default palette in `altair`
is colorblind-friendly, so we can stick with that here.

```{code-cell} ipython3
:tags: ["remove-output"]
perim_concav = alt.Chart(cancer).mark_circle().encode(
x=alt.X("Perimeter").title("Perimeter (standardized)"),
y=alt.Y("Concavity").title("Concavity (standardized)"),
@@ -289,12 +290,16 @@ perim_concav = alt.Chart(cancer).mark_circle().encode(
perim_concav
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:05-scatter", perim_concav)
```

:::{glue:figure} fig:05-scatter
:name: fig:05-scatter
:figclass: caption-hack

Scatter plot of concavity versus perimeter colored by diagnosis label.
```
:::

+++

@@ -1432,6 +1437,7 @@ The new imbalanced data is shown in {numref}`fig:05-unbalanced`,
and we print the counts of the classes using the `value_counts` function.

```{code-cell} ipython3
:tags: ["remove-output"]
rare_cancer = pd.concat((
cancer[cancer["Class"] == "Benign"],
cancer[cancer["Class"] == "Malignant"].head(3)
@@ -1445,12 +1451,16 @@ rare_plot = alt.Chart(rare_cancer).mark_circle().encode(
rare_plot
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:05-unbalanced", rare_plot)
```

:::{glue:figure} fig:05-unbalanced
:name: fig:05-unbalanced
:figclass: caption-hack

Imbalanced data.
```
:::

```{code-cell} ipython3
rare_cancer["Class"].value_counts()
@@ -1947,16 +1957,15 @@ unscaled_plot + prediction_plot
```

```{code-cell} ipython3
:tags: [remove-input]
:tags: [remove-cell]
glue("fig:05-workflow-plot", (unscaled_plot + prediction_plot))
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:::{glue:figure} fig:05-workflow-plot
:name: fig:05-workflow-plot
:figclass: caption-hack

Scatter plot of smoothness versus area where background color indicates the decision of the classifier.
```
:::

+++

105 changes: 69 additions & 36 deletions pull341/_sources/inference.md
@@ -645,7 +645,7 @@ was \$`r round(mean(airbnb$price),2)`.
-->

```{code-cell} ipython3
:tags: [remove-input]
:tags: ["remove-cell"]
glue(
"fig:11-example-means5",
Expand Down Expand Up @@ -681,12 +681,12 @@ glue(
)
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:::{glue:figure} fig:11-example-means5
:name: fig:11-example-means5
:figclass: caption-hack

Comparison of population distribution, sample distribution, and sampling distribution.
```
:::


+++

@@ -699,7 +699,7 @@ sampling distribution of the sample mean. We indicate the mean of the sampling
distribution with a vertical line.
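
As a rough illustration of what the context lines above describe, the sketch below builds the sampling distribution of the sample mean for several sample sizes and reports its centre (where the vertical line would sit) and its spread. It is not the book's code: it swaps in the standard library's `random` and `statistics` modules for `pandas` and `altair`, and `population`, `sampling_distribution`, and `summary` are hypothetical names with placeholder data rather than the Airbnb data set.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed "population" of nightly prices (placeholder data).
population = [random.expovariate(1 / 150) for _ in range(5000)]
population_mean = statistics.mean(population)


def sampling_distribution(sample_size, n_samples=1000):
    """Return the means of n_samples random samples drawn without replacement."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]


summary = {}
for size in (20, 50, 100, 500):
    means = sampling_distribution(size)
    # Centre (position of the vertical line) and spread of the distribution.
    summary[size] = (statistics.mean(means), statistics.stdev(means))
    centre, spread = summary[size]
    print(f"n={size:>3}: centre={centre:.1f}, spread={spread:.1f}")
```

Under these assumptions the centre should stay close to the population mean for every sample size while the spread shrinks as the sample size grows, which is the pattern the faceted plot is meant to show.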

```{code-cell} ipython3
:tags: [remove-input]
:tags: ["remove-cell"]
# Plot sampling distributions for multiple sample sizes
base = alt.Chart(
Expand Down Expand Up @@ -753,12 +753,11 @@ glue(
)
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:::{glue:figure} fig:11-example-means7
:name: fig:11-example-means7
:figclass: caption-hack

Comparison of sampling distributions, with mean highlighted as a vertical line.
```
:::

+++

@@ -963,8 +962,7 @@ one_sample
```

```{code-cell} ipython3
:tags: []
:tags: ["remove-output"]
one_sample_dist = alt.Chart(one_sample).mark_bar().encode(
x=alt.X("price")
.bin(maxbins=30)
@@ -975,12 +973,17 @@ one_sample_dist = alt.Chart(one_sample).mark_bar().encode(
one_sample_dist
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:11-bootstrapping1", one_sample_dist)
```

:::{glue:figure} fig:11-bootstrapping1
:name: fig:11-bootstrapping1
:figclass: caption-hack

Histogram of price per night (dollars) for one sample of size 40.
```
:::

+++

@@ -1002,7 +1005,7 @@ Since we need to sample with replacement when bootstrapping,
we change the `replace` parameter to `True`.

```{code-cell} ipython3
:tags: []
:tags: ["remove-output"]
boot1 = one_sample.sample(frac=1, replace=True)
boot1_dist = alt.Chart(boot1).mark_bar().encode(
@@ -1015,12 +1018,17 @@ boot1_dist = alt.Chart(boot1).mark_bar().encode(
boot1_dist
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:11-bootstrapping3", boot1_dist)
```

:::{glue:figure} fig:11-bootstrapping3
:name: fig:11-bootstrapping3
:figclass: caption-hack

Bootstrap distribution.
```
:::

```{code-cell} ipython3
boot1["price"].mean()
@@ -1055,10 +1063,10 @@ boot20000
Let's take a look at the histograms of the first six replicates of our bootstrap samples.

```{code-cell} ipython3
:tags: []
:tags: ["remove-output"]
six_bootstrap_samples = boot20000.query("replicate < 6")
alt.Chart(six_bootstrap_samples, height=150).mark_bar().encode(
six_bootstrap_fig = alt.Chart(six_bootstrap_samples, height=150).mark_bar().encode(
x=alt.X("price")
.bin(maxbins=20)
.title("Price per night (dollars)"),
@@ -1067,14 +1075,20 @@ alt.Chart(six_bootstrap_samples, height=150).mark_bar().encode(
"replicate:N", # Recall that `:N` converts the variable to a categorical type
columns=2
)
six_bootstrap_fig
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:11-bootstrapping-six-bootstrap-samples", six_bootstrap_fig)
```

:::{glue:figure} fig:11-bootstrapping-six-bootstrap-samples
:name: fig:11-bootstrapping-six-bootstrap-samples
:figclass: caption-hack

Histograms of the first six replicates of the bootstrap samples.
```
:::

+++

@@ -1125,7 +1139,7 @@ boot20000_means
```

```{code-cell} ipython3
:tags: []
:tags: ["remove-output"]
boot_est_dist = alt.Chart(boot20000_means).mark_bar().encode(
x=alt.X("mean_price")
@@ -1137,23 +1151,28 @@ boot_est_dist = alt.Chart(boot20000_means).mark_bar().encode(
boot_est_dist
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:11-bootstrapping5", boot_est_dist)
```

:::{glue:figure} fig:11-bootstrapping5
:name: fig:11-bootstrapping5
:figclass: caption-hack

Distribution of the bootstrap sample means.
```
:::

+++

Let's compare the bootstrap distribution&mdash;which we construct by taking many samples from our original sample of size 40&mdash;with
the true sampling distribution&mdash;which corresponds to taking many samples from the population.

```{code-cell} ipython3
:tags: [remove-input]
:tags: [remove-cell]
sampling_distribution.encoding.x["bin"]["extent"] = (90, 250)
alt.vconcat(
bootstr6fig = alt.vconcat(
alt.layer(
sampling_distribution,
alt.Chart(sample_estimates).mark_rule(color="black", size=1.5, strokeDash=[6]).encode(x="mean(mean_price)"),
@@ -1175,12 +1194,19 @@ alt.vconcat(
)
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:11-bootstrapping6", bootstr6fig)
```

:::{glue:figure} fig:11-bootstrapping6
:name: fig:11-bootstrapping6
:figclass: caption-hack

Comparison of the distribution of the bootstrap sample means and sampling distribution.
```
:::



```{code-cell} ipython3
:tags: [remove-cell]
@@ -1277,7 +1303,7 @@ the middle 95\% of the sample mean prices in the bootstrap distribution. We can
visualize the interval on our distribution in {numref}`fig:11-bootstrapping9`.

```{code-cell} ipython3
:tags: [remove-input]
:tags: [remove-cell]
# Create the annotation for the 2.5th percentile
rule_025 = alt.Chart().mark_rule(color="black", size=1.5, strokeDash=[6]).encode(
x=alt.datum(ci_bounds[0.025])
@@ -1301,15 +1327,22 @@ text_975 = text_025.encode(
rule_975 = rule_025.encode(x=alt.datum(ci_bounds[0.975]))
# Layer the annotations on top of the distribution plot
boot_est_dist + rule_025 + text_025 + rule_975 + text_975
bootstr9fig = boot_est_dist + rule_025 + text_025 + rule_975 + text_975
```

```{code-cell} ipython3
:tags: ["remove-cell"]
glue("fig:11-bootstrapping9", bootstr9fig)
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:::{glue:figure} fig:11-bootstrapping9
:name: fig:11-bootstrapping9
:figclass: caption-hack

Distribution of the bootstrap sample means with percentile lower and upper bounds.
```
:::



+++
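
The hunks above revolve around resampling one sample with replacement and then taking the middle 95% of the bootstrap sample means. A minimal sketch of that idea follows, using only the standard library instead of the `pandas` calls in the book source. The data are placeholder values, and `one_sample` here is a plain list that only borrows the name from the source. `bootstrap_means`, `boot_means`, `lower`, and `upper` are hypothetical names introduced for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 40 nightly prices (placeholder values, not the Airbnb data).
one_sample = [round(random.uniform(50, 400), 2) for _ in range(40)]


def bootstrap_means(sample, n_replicates=5000):
    """Resample with replacement (same size as the sample) and return each mean."""
    return [
        statistics.mean(random.choices(sample, k=len(sample)))
        for _ in range(n_replicates)
    ]


boot_means = bootstrap_means(one_sample)

# n=40 splits the values into 40 equal-probability groups, so the first and
# last cut points are the 2.5th and 97.5th percentiles.
cut_points = statistics.quantiles(boot_means, n=40)
lower, upper = cut_points[0], cut_points[-1]

print(f"sample mean: {statistics.mean(one_sample):.2f}")
print(f"95% percentile interval: ({lower:.2f}, {upper:.2f})")
```

`random.choices` plays the role of sampling with `replace=True`: each replicate has the same size as the original sample, and individual observations can appear more than once.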

235 changes: 112 additions & 123 deletions pull341/classification1.html

Large diffs are not rendered by default.

156 changes: 78 additions & 78 deletions pull341/classification2.html

Large diffs are not rendered by default.

0 comments on commit 0c9713b