small edits

mgimond · Apr 17, 2024 · b16a2d3
1 parent a3f7be5
commit b16a2d3

Showing 9 changed files with 13 additions and 13 deletions.
diff --git a/bivariate.qmd b/bivariate.qmd
@@ -118,7 +118,7 @@ Non-parametric fit applies to the family of fitting strategies that *do not* imp
 
 #### Loess
 
-A flexible curve fitting option is the **loess**  curve (short for **lo**cal regr**ess**ion; also known as the *local weighted regression*). Unlike the parametric approach to fitting a curve, the loess does **not** impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the *smooth* curve. The range of x-values that contribute to each localized regression lines is defined by the $\alpha$ parameter which usually ranges from 0.2 to 1. The larger the $\alpha$ value, the smoother the curve. The other parameter that defines a loess curve is $\lambda$: it defines the polynomial order of the localized regression line. This is usually set to 1 (though `ggplot2`'s implementation of the loess defaults to a 2^nd^ order polynomial).
+A flexible curve fitting option is the **loess**  curve (short for **lo**cal regr**ess**ion; also known as the *local weighted regression*). Unlike the parametric approach to fitting a curve, the loess does **not** impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the *smooth* curve. The range of x-values that contribute to each localized regression lines is defined by the **span** parameter,  $\alpha$,  which usually ranges from 0.2 to 1 (but, it can be greater than 1 for smaller datasets). The larger the $\alpha$ value, the smoother the curve. The other parameter that defines a loess curve is $\lambda$: it defines the **polynomial order** of the localized regression line. This is usually set to 1 (though `ggplot2`'s implementation of the loess defaults to a 2^nd^ order polynomial).
 
 ```{r echo=FALSE}
 library(dplyr)
@@ -297,7 +297,7 @@ ggplot(df, aes(x = area, y = residuals)) + geom_point() +
 
 ```
 
-We are interested in identifying any pattern in the residuals. If the model does a good job in fitting the data, the points should be uniformly distributed across the plot and the loess fit should approximate a horizontal line. With the linear model `M`, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show *dependence* on the x values. 
+We are interested in identifying any pattern in the residuals. **If the model does a good job in fitting the data, the points should be uniformly distributed across the plot** and the loess fit should approximate a horizontal line. With the linear model `M`, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show *dependence* on the x values. 
 
 Next, we'll look at the residuals from the second order polynomial model `M2`.
 
@@ -334,7 +334,7 @@ where $\varepsilon$ is a constant that does not vary as a function of varying $x
 
 ### Spread-location plot
 
-The `M2` and `lo` models do a good job in eliminating any dependence between residual and x-value. Next, we will check that the residuals do not show a dependence with *fitted* y-values. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted `cp.ratio` values (for a univariate analogy, think of the fitted line as representing a *level* across different segments along the x-axis). We'll generate a spread-level plot of model `M2`'s residuals (note that in the realm of regression analysis, such plot is often referred to as a **scale-location** plot). We'll also add a loess curve to help visualize any patterns in the plot.
+The `M2` and `lo` models do a good job in eliminating any dependence between residual and x-value. Next, we will check that **the residuals do not show a dependence with *fitted* y-values**. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted `cp.ratio` values (for a univariate analogy, think of the fitted line as representing a *level* across different segments along the x-axis). We'll generate a spread-level plot of model `M2`'s residuals (note that in the realm of regression analysis, such plot is often referred to as a **scale-location** plot). We'll also add a loess curve to help visualize any patterns in the plot.
 
 ```{r fig.height=2.5, fig.width=2.5, small.mar=TRUE}
 sl2 <- data.frame( std.res = sqrt(abs(residuals(M2))), 
@@ -345,12 +345,12 @@ ggplot(sl2, aes(x = fit, y  =std.res)) + geom_point() +
                           method.args = list(degree = 1) )
 ```
 
-The function `predict()` extracts the y-values from the fitted model `M2` and is plotted along the x-axis. It's clear from this plot that the residuals are not homogeneous; they increase as a function of increasing *fitted* CP ratio. The "bend" observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may want to increase the loess span.
+The function `predict()` extracts the y-values from the fitted model `M2` and is plotted along the x-axis. It's clear from this plot that the residuals are not homogeneous; they increase as a function of increasing *fitted* CP ratio. The "bend" observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may therefore want to increase the loess span by setting `span = 2`.
 
 
 ```{r fig.height=2.5, fig.width=2.5, small.mar=TRUE}
 ggplot(sl2, aes(x = fit, y = std.res)) + geom_point() +
-              stat_smooth(method = "loess", se = FALSE, span = 1.5, 
+              stat_smooth(method = "loess", se = FALSE, span = 2, 
                           method.args = list(degree = 1) )
 ```
 

diff --git a/bivariate_cache/html/unnamed-chunk-25_015ed87426b2e1459bdf2eaf4001adcd.RData b/bivariate_cache/html/unnamed-chunk-25_015ed87426b2e1459bdf2eaf4001adcd.RData
diff --git a/bivariate_cache/html/unnamed-chunk-25_580c4e09e0751ca71793d95540cf5875.RData b/bivariate_cache/html/unnamed-chunk-25_580c4e09e0751ca71793d95540cf5875.RData
diff --git a/...k-25_015ed87426b2e1459bdf2eaf4001adcd.rdb → ...k-25_580c4e09e0751ca71793d95540cf5875.rdb b/...k-25_015ed87426b2e1459bdf2eaf4001adcd.rdb → ...k-25_580c4e09e0751ca71793d95540cf5875.rdb
diff --git a/...k-25_015ed87426b2e1459bdf2eaf4001adcd.rdx → ...k-25_580c4e09e0751ca71793d95540cf5875.rdx b/...k-25_015ed87426b2e1459bdf2eaf4001adcd.rdx → ...k-25_580c4e09e0751ca71793d95540cf5875.rdx
diff --git a/bivariate_files/figure-html/unnamed-chunk-25-1.png b/bivariate_files/figure-html/unnamed-chunk-25-1.png
diff --git a/docs/bivariate.html b/docs/bivariate.html
@@ -606,7 +606,7 @@ <h3 data-number="24.2.2" class="anchored" data-anchor-id="non-parametric-fits"><
 <p>Non-parametric fit applies to the family of fitting strategies that <em>do not</em> impose a structure on the data. Instead, they are designed to let the dataset reveal its inherent structure. One explored in this course is the <em>loess</em> fit.</p>
 <section id="loess" class="level4" data-number="24.2.2.1">
 <h4 data-number="24.2.2.1" class="anchored" data-anchor-id="loess"><span class="header-section-number">24.2.2.1</span> Loess</h4>
-<p>A flexible curve fitting option is the <strong>loess</strong> curve (short for <strong>lo</strong>cal regr<strong>ess</strong>ion; also known as the <em>local weighted regression</em>). Unlike the parametric approach to fitting a curve, the loess does <strong>not</strong> impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the <em>smooth</em> curve. The range of x-values that contribute to each localized regression lines is defined by the <span class="math inline">\(\alpha\)</span> parameter which usually ranges from 0.2 to 1. The larger the <span class="math inline">\(\alpha\)</span> value, the smoother the curve. The other parameter that defines a loess curve is <span class="math inline">\(\lambda\)</span>: it defines the polynomial order of the localized regression line. This is usually set to 1 (though <code>ggplot2</code>’s implementation of the loess defaults to a 2<sup>nd</sup> order polynomial).</p>
+<p>A flexible curve fitting option is the <strong>loess</strong> curve (short for <strong>lo</strong>cal regr<strong>ess</strong>ion; also known as the <em>local weighted regression</em>). Unlike the parametric approach to fitting a curve, the loess does <strong>not</strong> impose a structure on the data. The loess curve fits small segments of a regression lines across the range of x-values, then links the mid-points of these regression lines to generate the <em>smooth</em> curve. The range of x-values that contribute to each localized regression lines is defined by the <strong>span</strong> parameter, <span class="math inline">\(\alpha\)</span>, which usually ranges from 0.2 to 1 (but, it can be greater than 1 for smaller datasets). The larger the <span class="math inline">\(\alpha\)</span> value, the smoother the curve. The other parameter that defines a loess curve is <span class="math inline">\(\lambda\)</span>: it defines the <strong>polynomial order</strong> of the localized regression line. This is usually set to 1 (though <code>ggplot2</code>’s implementation of the loess defaults to a 2<sup>nd</sup> order polynomial).</p>
 </section>
 <section id="how-a-loess-is-constructed" class="level4" data-number="24.2.2.2">
 <h4 data-number="24.2.2.2" class="anchored" data-anchor-id="how-a-loess-is-constructed"><span class="header-section-number">24.2.2.2</span> How a loess is constructed</h4>
@@ -703,7 +703,7 @@ <h3 data-number="24.3.1" class="anchored" data-anchor-id="residual-dependence-pl
 <p><img src="bivariate_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid" width="240"></p>
 </div>
 </div>
-<p>We are interested in identifying any pattern in the residuals. If the model does a good job in fitting the data, the points should be uniformly distributed across the plot and the loess fit should approximate a horizontal line. With the linear model <code>M</code>, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show <em>dependence</em> on the x values.</p>
+<p>We are interested in identifying any pattern in the residuals. <strong>If the model does a good job in fitting the data, the points should be uniformly distributed across the plot</strong> and the loess fit should approximate a horizontal line. With the linear model <code>M</code>, we observe a convex pattern in the residuals suggesting that the linear model is not a good fit. We say that the residuals show <em>dependence</em> on the x values.</p>
 <p>Next, we’ll look at the residuals from the second order polynomial model <code>M2</code>.</p>
 <div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-22_fb96e1aa15975b174d62d20b9081d5bb">
 <div class="sourceCode cell-code" id="cb18"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a>df<span class="sc">$</span>residuals2 <span class="ot">&lt;-</span> <span class="fu">residuals</span>(M2)</span>
@@ -735,7 +735,7 @@ <h3 data-number="24.3.1" class="anchored" data-anchor-id="residual-dependence-pl
 </section>
 <section id="spread-location-plot" class="level3" data-number="24.3.2">
 <h3 data-number="24.3.2" class="anchored" data-anchor-id="spread-location-plot"><span class="header-section-number">24.3.2</span> Spread-location plot</h3>
-<p>The <code>M2</code> and <code>lo</code> models do a good job in eliminating any dependence between residual and x-value. Next, we will check that the residuals do not show a dependence with <em>fitted</em> y-values. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted <code>cp.ratio</code> values (for a univariate analogy, think of the fitted line as representing a <em>level</em> across different segments along the x-axis). We’ll generate a spread-level plot of model <code>M2</code>’s residuals (note that in the realm of regression analysis, such plot is often referred to as a <strong>scale-location</strong> plot). We’ll also add a loess curve to help visualize any patterns in the plot.</p>
+<p>The <code>M2</code> and <code>lo</code> models do a good job in eliminating any dependence between residual and x-value. Next, we will check that <strong>the residuals do not show a dependence with <em>fitted</em> y-values</strong>. This is analogous to univariate analysis where we checked if residuals increased or decreased with increasing medians across categories. Here we will compare residuals to the fitted <code>cp.ratio</code> values (for a univariate analogy, think of the fitted line as representing a <em>level</em> across different segments along the x-axis). We’ll generate a spread-level plot of model <code>M2</code>’s residuals (note that in the realm of regression analysis, such plot is often referred to as a <strong>scale-location</strong> plot). We’ll also add a loess curve to help visualize any patterns in the plot.</p>
 <div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-24_beb1abcfc562cfc3cb631cc257e96e5b">
 <div class="sourceCode cell-code" id="cb20"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a>sl2 <span class="ot">&lt;-</span> <span class="fu">data.frame</span>( <span class="at">std.res =</span> <span class="fu">sqrt</span>(<span class="fu">abs</span>(<span class="fu">residuals</span>(M2))), </span>
 <span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>                   <span class="at">fit     =</span> <span class="fu">predict</span>(M2))</span>
@@ -747,10 +747,10 @@ <h3 data-number="24.3.2" class="anchored" data-anchor-id="spread-location-plot">
 <p><img src="bivariate_files/figure-html/unnamed-chunk-24-1.png" class="img-fluid" width="240"></p>
 </div>
 </div>
-<p>The function <code>predict()</code> extracts the y-values from the fitted model <code>M2</code> and is plotted along the x-axis. It’s clear from this plot that the residuals are not homogeneous; they increase as a function of increasing <em>fitted</em> CP ratio. The “bend” observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may want to increase the loess span.</p>
-<div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-25_015ed87426b2e1459bdf2eaf4001adcd">
+<p>The function <code>predict()</code> extracts the y-values from the fitted model <code>M2</code> and is plotted along the x-axis. It’s clear from this plot that the residuals are not homogeneous; they increase as a function of increasing <em>fitted</em> CP ratio. The “bend” observed in the loess curve is most likely due to a single point at the far (right) end of the fitted range. Given that we have a small batch of numbers, a loess can be easily influenced by an outlier. We may therefore want to increase the loess span by setting <code>span = 2</code>.</p>
+<div class="cell" data-small.mar="true" data-hash="bivariate_cache/html/unnamed-chunk-25_580c4e09e0751ca71793d95540cf5875">
 <div class="sourceCode cell-code" id="cb21"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(sl2, <span class="fu">aes</span>(<span class="at">x =</span> fit, <span class="at">y =</span> std.res)) <span class="sc">+</span> <span class="fu">geom_point</span>() <span class="sc">+</span></span>
-<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>              <span class="fu">stat_smooth</span>(<span class="at">method =</span> <span class="st">"loess"</span>, <span class="at">se =</span> <span class="cn">FALSE</span>, <span class="at">span =</span> <span class="fl">1.5</span>, </span>
+<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>              <span class="fu">stat_smooth</span>(<span class="at">method =</span> <span class="st">"loess"</span>, <span class="at">se =</span> <span class="cn">FALSE</span>, <span class="at">span =</span> <span class="dv">2</span>, </span>
 <span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a>                          <span class="at">method.args =</span> <span class="fu">list</span>(<span class="at">degree =</span> <span class="dv">1</span>) )</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output-display">
 <p><img src="bivariate_files/figure-html/unnamed-chunk-25-1.png" class="img-fluid" width="240"></p>

diff --git a/docs/bivariate_files/figure-html/unnamed-chunk-25-1.png b/docs/bivariate_files/figure-html/unnamed-chunk-25-1.png