Commit

Some final edits in supplemental.
sampottinger committed Dec 18, 2024
1 parent 5c878ee commit 39bcc03
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions paper/supplemental.md
@@ -80,16 +80,16 @@ Further investigation finds that that a minority population of neighborhoods cau
We offer additional information about the specific neural network configuration chosen.

### Input vector
-As this empirically leads to generally better performance, we allow the model to use the count of growing condition estimations. This may serve as a possible measure of uncertainty. We also allow inclusion of the year. However, as can be executed in our open source pipeline, we find that including absolute year generally increases overfitting. Therefore, we use a relative measure (years since the start of the series within the simulations).
+Empirically leading to generally better performance, we allow the model to use the count of growing condition estimations. This may serve as a possible measure of uncertainty. We also allow inclusion of the year. However, as can be executed in our open source pipeline, we find that including absolute year generally increases overfitting. Therefore, we use a relative measure (years since the start of the series within the simulations). Our simulations run for 17 relative years for each series.

### Included years and areas
-To further document how we structure our consideration of timeseries variables, we emphasize that we sample for 17 individual years in the 2030 CHC-CMIP6 series and 17 individual years in 2050 CHC-CMIP6 series. Importantly, projections in these series are not necessarily intended as specific predictions in specific years. We do not provide a year by year timeseries for this reason. Instead, our analysis produces distributions of anticipated outcomes at the 2030 and 2050 timeframes. Note that our choice to create these two series follows a similar structure to CHC-CMIP6. Finally, note that many growers engage in even simple crop rotations so the effective average crop yield for a field used to define yield expectations may span 10 crop years but possibly more than 10 consecutive calendar years.
+To further document how we structure our consideration of timeseries variables, we emphasize that we sample for 17 individual years in the 2030 CHC-CMIP6 series and 17 individual years in 2050 CHC-CMIP6 series. Importantly, projections in these series are not necessarily intended as specific predictions in specific years. We do not provide a year by year timeseries for this reason. Instead, our analysis produces distributions of anticipated outcomes at the 2030 and 2050 timeframes. Note that our choice to create these two series follows a similar structure to CHC-CMIP6. Finally, note that many growers engage in even simple crop rotations so the effective average crop yield for a field used to define yield expectations may span 10 crop years but possibly more than 10 consecutive calendar years. This is reflected in Monte Carlo sampling.

### Instance weight
We document that we build our model with instance weighting. Specifically, we use the number (not value) of SCYM pixels in a neighborhood to weight each neighborhood. In other words, the weight is higher in neighborhoods with more maize growing acreage.

### Error and residuals
-Table @tbl:retrain provides mean absolute error for the selected model from the sweep. A drop in error observed from validation to test with retrain^[Test with retrain specifically refers to retraining a model from scratch using the model configuration selected from our hyper-parameter sweep across both training and validation data together. In both the "with retrain" and "without retrain" cases, the test set remains fully hidden.] performance may be explained by the increased training set size. This may indicate that the model is specifically data constrained by the number of years available for training. Our open source data pipeline can and will be used to rerun analysis as input datasets are updated to include additional years in the future.
+Table @tbl:retrain provides mean absolute error for the selected model from the sweep. A drop in error observed from validation to test with retrain^[Test with retrain specifically refers to retraining a model from scratch using the model configuration selected from our hyper-parameter sweep. This training spans across both training and validation data together. In both the "with retrain" and "without retrain" cases, the test set remains fully hidden.] performance may be explained by the increased training set size. This may indicate that the model is specifically data constrained by the number of years available for training. Our open source data pipeline can and will be used to rerun analysis as input datasets are updated to include additional years in the future.

| **Set** | **MAE for Mean Prediction** | **MAE for Std Prediction** |
| -------------------- | ----------------------- | ---------------------- |
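
The "Instance weight" paragraph in this hunk describes weighting each neighborhood by its number of SCYM pixels. As a reading aid, the following is a minimal sketch of how such weights could enter a mean absolute error calculation. It is not taken from the repository: the function name `weighted_mae` and all sample values are hypothetical, and the text shown here does not say whether the authors apply the weights in the training loss, in the reported MAE, or in both.

```python
def weighted_mae(actual, predicted, pixel_counts):
    """Mean absolute error where each neighborhood counts in proportion
    to its pixel count (a hypothetical stand-in for SCYM pixel counts)."""
    if not (len(actual) == len(predicted) == len(pixel_counts)):
        raise ValueError("All inputs must have the same length.")

    total_weight = sum(pixel_counts)
    if total_weight <= 0:
        raise ValueError("Total weight must be positive.")

    # Sum of absolute errors, each scaled by its neighborhood's weight.
    weighted_error = sum(
        weight * abs(a - p)
        for a, p, weight in zip(actual, predicted, pixel_counts)
    )
    return weighted_error / total_weight


# Hypothetical values for three neighborhoods (not from the paper).
actual = [10.0, 12.0, 8.0]
predicted = [11.0, 12.0, 4.0]
pixel_counts = [100, 300, 10]

print(weighted_mae(actual, predicted, pixel_counts))
print(weighted_mae(actual, predicted, [1, 1, 1]))  # equal weights: plain MAE
```

In this toy data the largest error sits in the neighborhood with the fewest pixels, so the weighted figure comes out lower than the unweighted one. That is the effect the paragraph describes: neighborhoods with more maize growing acreage carry more influence.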
