Commit

small edits
mgimond committed Apr 17, 2024
1 parent a3f7be5 commit b16a2d3
Showing 9 changed files with 13 additions and 13 deletions.
bivariate.qmd: 10 changes (5 additions, 5 deletions)
@@ -118,7 +118,7 @@ Non-parametric fit applies to the family of fitting strategies that *do not* imp

#### Loess

-A flexible curve fitting option is the **loess** curve (short for **lo**cal regr**ess**ion; also known as the *local weighted regression*). Unlike the parametric approach to fitting a curve, the loess does **not** impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the *smooth* curve. The range of x-values that contribute to each localized regression lines is defined by the $\alpha$ parameter which usually ranges from 0.2 to 1. The larger the $\alpha$ value, the smoother the curve. The other parameter that defines a loess curve is $\lambda$: it defines the polynomial order of the localized regression line. This is usually set to 1 (though `ggplot2`'s implementation of the loess defaults to a 2^nd^ order polynomial).
+A flexible curve fitting option is the **loess** curve (short for **lo**cal regr**ess**ion; also known as the *local weighted regression*). Unlike the parametric approach to fitting a curve, the loess does **not** impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the *smooth* curve. The range of x-values that contribute to each localized regression lines is defined by the **span** parameter, $\alpha$, which usually ranges from 0.2 to 1 (but, it can be greater than 1 for smaller datasets). The larger the $\alpha$ value, the smoother the curve. The other parameter that defines a loess curve is $\lambda$: it defines the **polynomial order** of the localized regression line. This is usually set to 1 (though `ggplot2`'s implementation of the loess defaults to a 2^nd^ order polynomial).

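To make the two loess parameters concrete, here is a minimal sketch using base R's `loess()`, where the `span` argument plays the role of $\alpha$ and `degree` plays the role of $\lambda$. The data are simulated and hypothetical (not the dataset used in this chapter), and the equivalent number of parameters (`enp`) reported by `loess()` is used here only as a rough gauge of how wiggly each curve is.

```r
# Hypothetical simulated data, used only to illustrate the two parameters
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.3)

# span   = alpha: fraction of the points that feed each local regression
# degree = lambda: polynomial order of each local regression
lo_narrow <- loess(y ~ x, span = 0.2, degree = 1)
lo_wide   <- loess(y ~ x, span = 1,   degree = 1)
lo_big    <- loess(y ~ x, span = 2,   degree = 1)  # a span greater than 1 is allowed
lo_quad   <- loess(y ~ x)  # loess() defaults: span = 0.75, degree = 2

# A larger span gives a smoother curve, i.e. fewer equivalent parameters
c(narrow = lo_narrow$enp, wide = lo_wide$enp, big = lo_big$enp)
lo_quad$pars$degree
```

With these settings we would expect `enp` to shrink as the span grows, mirroring the statement that a larger $\alpha$ yields a smoother curve.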
```{r echo=FALSE}
library(dplyr)
@@ -297,7 +297,7 @@ ggplot(df, aes(x = area, y = residuals)) + geom_point() +
```

-We are interested in identifying any pattern in the residuals. If the model does a good job in fitting the data, the points should be uniformly distributed across the plot and the loess fit should approximate a horizontal line. With the linear model `M`, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show *dependence* on the x values.
+We are interested in identifying any pattern in the residuals. **If the model does a good job in fitting the data, the points should be uniformly distributed across the plot** and the loess fit should approximate a horizontal line. With the linear model `M`, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show *dependence* on the x values.

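The residual-dependence check can be sketched without plotting: fit a straight line and a second-order polynomial to simulated curved data (hypothetical values, not the chapter's `cp.ratio` data), then smooth each set of residuals against x with a loess. The range of the smoothed residuals is used here as an ad hoc summary of how far the loess departs from a horizontal line; it is an assumption of this sketch, not a statistic used in the chapter.

```r
# Hypothetical data with a curved (quadratic) relationship
set.seed(2)
x <- runif(60, 0, 10)
y <- 2 + 0.5 * x^2 + rnorm(60, sd = 2)

M_lin  <- lm(y ~ x)           # first-order (straight-line) model
M_quad <- lm(y ~ x + I(x^2))  # second-order polynomial model

# Smooth the residuals of each model against x with a loess
res_lin  <- residuals(M_lin)
res_quad <- residuals(M_quad)
lo_lin  <- loess(res_lin ~ x,  span = 1, degree = 1)
lo_quad <- loess(res_quad ~ x, span = 1, degree = 1)

# Spread of each loess curve: close to zero when residuals do not depend on x
c(linear = diff(range(fitted(lo_lin))), quadratic = diff(range(fitted(lo_quad))))
```

For the straight-line model, the loess of the residuals should trace a U-shaped (convex) curve, while for the polynomial model it should stay close to zero.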
Next, we'll look at the residuals from the second order polynomial model `M2`.

@@ -334,7 +334,7 @@ where $\varepsilon$ is a constant that does not vary as a function of varying $x

### Spread-location plot

-The `M2` and `lo` models do a good job in eliminating any dependence between residual and x-value. Next, we will check that the residuals do not show a dependence with *fitted* y-values. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted `cp.ratio` values (for a univariate analogy, think of the fitted line as representing a *level* across different segments along the x-axis). We'll generate a spread-level plot of model `M2`'s residuals (note that in the realm of regression analysis, such plot is often referred to as a **scale-location** plot). We'll also add a loess curve to help visualize any patterns in the plot.
+The `M2` and `lo` models do a good job in eliminating any dependence between residual and x-value. Next, we will check that **the residuals do not show a dependence with *fitted* y-values**. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted `cp.ratio` values (for a univariate analogy, think of the fitted line as representing a *level* across different segments along the x-axis). We'll generate a spread-level plot of model `M2`'s residuals (note that in the realm of regression analysis, such plot is often referred to as a **scale-location** plot). We'll also add a loess curve to help visualize any patterns in the plot.

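The spread-location values computed in the chunk that follows can be reproduced on simulated data whose noise grows with x (hypothetical values). As in that chunk, the y-axis variable is the square root of the absolute residuals (named `std.res` to mirror the chunk, although it is not a standardized residual), and the loess is fit with `degree = 1`. Evaluating the loess at the two ends of the fitted range is one rough way to read off the trend.

```r
# Hypothetical data: the noise standard deviation grows with x
set.seed(3)
x <- runif(80, 1, 10)
y <- 3 * x + rnorm(80, sd = 0.5 * x)
M <- lm(y ~ x)

# Square root of the absolute residuals vs. fitted values
sl <- data.frame(std.res = sqrt(abs(residuals(M))),
                 fit     = predict(M))

# Loess of spread on level, evaluated at the low and high ends of the fitted range
sl_lo <- loess(std.res ~ fit, data = sl, span = 1, degree = 1)
trend <- predict(sl_lo, newdata = data.frame(fit = range(sl$fit)))
trend
```

A higher smoothed value at the upper end of the fitted range indicates that the residual spread increases with the fitted level, which is the kind of pattern the plot is meant to reveal.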
```{r fig.height=2.5, fig.width=2.5, small.mar=TRUE}
sl2 <- data.frame( std.res = sqrt(abs(residuals(M2))),
@@ -345,12 +345,12 @@ ggplot(sl2, aes(x = fit, y =std.res)) + geom_point() +
method.args = list(degree = 1) )
```

-The function `predict()` extracts the y-values from the fitted model `M2` and is plotted along the x-axis. It's clear from this plot that the residuals are not homogeneous; they increase as a function of increasing *fitted* CP ratio. The "bend" observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may want to increase the loess span.
+The function `predict()` extracts the y-values from the fitted model `M2` and is plotted along the x-axis. It's clear from this plot that the residuals are not homogeneous; they increase as a function of increasing *fitted* CP ratio. The "bend" observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may therefore want to increase the loess span by setting `span = 2`.


```{r fig.height=2.5, fig.width=2.5, small.mar=TRUE}
ggplot(sl2, aes(x = fit, y = std.res)) + geom_point() +
-stat_smooth(method = "loess", se = FALSE, span = 1.5,
+stat_smooth(method = "loess", se = FALSE, span = 2,
method.args = list(degree = 1) )
```

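The effect of the span on a small batch containing one extreme point can be sketched as follows. The 15 simulated values and the outlier added to the last one are hypothetical; the default `ggplot2`/`loess()` span of 0.75 is compared with `span = 2`, and each loess curve is compared with the least-squares line fit to the same points.

```r
# Hypothetical small batch with one extreme value at the right end
set.seed(4)
fit <- seq(1, 10, length.out = 15)
std.res <- 0.5 + 0.1 * fit + rnorm(15, sd = 0.1)
std.res[15] <- std.res[15] + 1.5  # hypothetical outlier

# Least-squares line and two loess curves (degree 1, as in the chunk above)
ols    <- fitted(lm(std.res ~ fit))
lo_075 <- fitted(loess(std.res ~ fit, span = 0.75, degree = 1))
lo_2   <- fitted(loess(std.res ~ fit, span = 2,    degree = 1))

# Largest departure of each loess curve from the least-squares line
c(span_0.75 = max(abs(lo_075 - ols)), span_2 = max(abs(lo_2 - ols)))
```

Because a span of 2 spreads nearly equal weight over all 15 points, its curve should hug the straight line, whereas the 0.75 span lets the last point pull the right end of the curve upward.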
Binary file not shown.
Binary file not shown.
Binary file modified bivariate_files/figure-html/unnamed-chunk-25-1.png
docs/bivariate.html: 12 changes (6 additions, 6 deletions)
@@ -606,7 +606,7 @@ <h3 data-number="24.2.2" class="anchored" data-anchor-id="non-parametric-fits"><
<p>Non-parametric fit applies to the family of fitting strategies that <em>do not</em> impose a structure on the data. Instead, they are designed to let the dataset reveal its inherent structure. One explored in this course is the <em>loess</em> fit.</p>
<section id="loess" class="level4" data-number="24.2.2.1">
<h4 data-number="24.2.2.1" class="anchored" data-anchor-id="loess"><span class="header-section-number">24.2.2.1</span> Loess</h4>
-<p>A flexible curve fitting option is the <strong>loess</strong> curve (short for <strong>lo</strong>cal regr<strong>ess</strong>ion; also known as the <em>local weighted regression</em>). Unlike the parametric approach to fitting a curve, the loess does <strong>not</strong> impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the <em>smooth</em> curve. The range of x-values that contribute to each localized regression lines is defined by the <span class="math inline">\(\alpha\)</span> parameter which usually ranges from 0.2 to 1. The larger the <span class="math inline">\(\alpha\)</span> value, the smoother the curve. The other parameter that defines a loess curve is <span class="math inline">\(\lambda\)</span>: it defines the polynomial order of the localized regression line. This is usually set to 1 (though <code>ggplot2</code>’s implementation of the loess defaults to a 2<sup>nd</sup> order polynomial).</p>
+<p>A flexible curve fitting option is the <strong>loess</strong> curve (short for <strong>lo</strong>cal regr<strong>ess</strong>ion; also known as the <em>local weighted regression</em>). Unlike the parametric approach to fitting a curve, the loess does <strong>not</strong> impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the <em>smooth</em> curve. The range of x-values that contribute to each localized regression lines is defined by the <strong>span</strong> parameter, <span class="math inline">\(\alpha\)</span>, which usually ranges from 0.2 to 1 (but, it can be greater than 1 for smaller datasets). The larger the <span class="math inline">\(\alpha\)</span> value, the smoother the curve. The other parameter that defines a loess curve is <span class="math inline">\(\lambda\)</span>: it defines the <strong>polynomial order</strong> of the localized regression line. This is usually set to 1 (though <code>ggplot2</code>’s implementation of the loess defaults to a 2<sup>nd</sup> order polynomial).</p>
</section>
<section id="how-a-loess-is-constructed" class="level4" data-number="24.2.2.2">
<h4 data-number="24.2.2.2" class="anchored" data-anchor-id="how-a-loess-is-constructed"><span class="header-section-number">24.2.2.2</span> How a loess is constructed</h4>
@@ -703,7 +703,7 @@ <h3 data-number="24.3.1" class="anchored" data-anchor-id="residual-dependence-pl
<p><img src="bivariate_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid" width="240"></p>
</div>
</div>
-<p>We are interested in identifying any pattern in the residuals. If the model does a good job in fitting the data, the points should be uniformly distributed across the plot and the loess fit should approximate a horizontal line. With the linear model <code>M</code>, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show <em>dependence</em> on the x values.</p>
+<p>We are interested in identifying any pattern in the residuals. <strong>If the model does a good job in fitting the data, the points should be uniformly distributed across the plot</strong> and the loess fit should approximate a horizontal line. With the linear model <code>M</code>, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show <em>dependence</em> on the x values.</p>
<p>Next, we’ll look at the residuals from the second order polynomial model <code>M2</code>.</p>
<div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-22_fb96e1aa15975b174d62d20b9081d5bb">
<div class="sourceCode cell-code" id="cb18"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a>df<span class="sc">$</span>residuals2 <span class="ot">&lt;-</span> <span class="fu">residuals</span>(M2)</span>
@@ -735,7 +735,7 @@ <h3 data-number="24.3.1" class="anchored" data-anchor-id="residual-dependence-pl
</section>
<section id="spread-location-plot" class="level3" data-number="24.3.2">
<h3 data-number="24.3.2" class="anchored" data-anchor-id="spread-location-plot"><span class="header-section-number">24.3.2</span> Spread-location plot</h3>
-<p>The <code>M2</code> and <code>lo</code> models do a good job in eliminating any dependence between residual and x-value. Next, we will check that the residuals do not show a dependence with <em>fitted</em> y-values. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted <code>cp.ratio</code> values (for a univariate analogy, think of the fitted line as representing a <em>level</em> across different segments along the x-axis). We’ll generate a spread-level plot of model <code>M2</code>’s residuals (note that in the realm of regression analysis, such plot is often referred to as a <strong>scale-location</strong> plot). We’ll also add a loess curve to help visualize any patterns in the plot.</p>
+<p>The <code>M2</code> and <code>lo</code> models do a good job in eliminating any dependence between residual and x-value. Next, we will check that <strong>the residuals do not show a dependence with <em>fitted</em> y-values</strong>. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted <code>cp.ratio</code> values (for a univariate analogy, think of the fitted line as representing a <em>level</em> across different segments along the x-axis). We’ll generate a spread-level plot of model <code>M2</code>’s residuals (note that in the realm of regression analysis, such plot is often referred to as a <strong>scale-location</strong> plot). We’ll also add a loess curve to help visualize any patterns in the plot.</p>
<div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-24_beb1abcfc562cfc3cb631cc257e96e5b">
<div class="sourceCode cell-code" id="cb20"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a>sl2 <span class="ot">&lt;-</span> <span class="fu">data.frame</span>( <span class="at">std.res =</span> <span class="fu">sqrt</span>(<span class="fu">abs</span>(<span class="fu">residuals</span>(M2))), </span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a> <span class="at">fit =</span> <span class="fu">predict</span>(M2))</span>
@@ -747,10 +747,10 @@ <h3 data-number="24.3.2" class="anchored" data-anchor-id="spread-location-plot">
<p><img src="bivariate_files/figure-html/unnamed-chunk-24-1.png" class="img-fluid" width="240"></p>
</div>
</div>
-<p>The function <code>predict()</code> extracts the y-values from the fitted model <code>M2</code> and is plotted along the x-axis. It’s clear from this plot that the residuals are not homogeneous; they increase as a function of increasing <em>fitted</em> CP ratio. The “bend” observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may want to increase the loess span.</p>
-<div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-25_015ed87426b2e1459bdf2eaf4001adcd">
+<p>The function <code>predict()</code> extracts the y-values from the fitted model <code>M2</code> and is plotted along the x-axis. It’s clear from this plot that the residuals are not homogeneous; they increase as a function of increasing <em>fitted</em> CP ratio. The “bend” observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may therefore want to increase the loess span by setting <code>span = 2</code>.</p>
+<div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-25_580c4e09e0751ca71793d95540cf5875">
<div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(sl2, <span class="fu">aes</span>(<span class="at">x =</span> fit, <span class="at">y =</span> std.res)) <span class="sc">+</span> <span class="fu">geom_point</span>() <span class="sc">+</span></span>
-<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">stat_smooth</span>(<span class="at">method =</span> <span class="st">"loess"</span>, <span class="at">se =</span> <span class="cn">FALSE</span>, <span class="at">span =</span> <span class="fl">1.5</span>, </span>
+<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">stat_smooth</span>(<span class="at">method =</span> <span class="st">"loess"</span>, <span class="at">se =</span> <span class="cn">FALSE</span>, <span class="at">span =</span> <span class="dv">2</span>, </span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> <span class="at">method.args =</span> <span class="fu">list</span>(<span class="at">degree =</span> <span class="dv">1</span>) )</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output-display">
<p><img src="bivariate_files/figure-html/unnamed-chunk-25-1.png" class="img-fluid" width="240"></p>
Binary file modified docs/bivariate_files/figure-html/unnamed-chunk-25-1.png
