-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refactor SoilGrids functions to handle multiple properties and … #3455
base: develop
Are you sure you want to change the base?
feat: refactor SoilGrids functions to handle multiple properties and … #3455
Conversation
…improve efficiency
I made changes to refactor SoilGrids functions for improved efficiency and scalability; please take a look,i think this works for you. |
@DongchenZ @Qianyuxuan Can you review this PR and I think it solves the mentioned issue.Just i want you to go through it once and let me know if you feel any changes required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR improves the ability to extract soil properties other than SOC. However, I worry that those functions (which heavily rely on the terra
package) will be limited to broader spatial extents (e.g., North America, Asia, Africa, etc.) where the computation demand and the memory usage will be unrealistic high. Given that, I think it will be helpful to have a prior check on the volume of data requested (e.g., if the size of the data is too large, then never try it and just return an error message).
For larger spatial extents, it would be more efficient to download the tiled SoilGrids data directly (I usually use the wget
command) and merge the tiled images using the gdalwarp
command.
#' @param local_raster Optional. The file path of a local raster to query. If NULL, queries the remote VRT. | ||
#' @return A data frame containing the queried data | ||
#' @export | ||
query_soilgrids <- function(points, soil_property, depth, quantile, local_raster = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my previous comments for projecting the points from lat/lon degrees to the SoilGrids coordinate system.
dplyr::summarize( | ||
mean = mean(value, na.rm = TRUE), | ||
sd = sd(value, na.rm = TRUE), | ||
quantile_05 = quantile(value, probs = quantiles[1], na.rm = TRUE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not have those hardcoded names for different quantiles. Instead, you might consider calculating them first and naming them afterward based on the quantile we choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I'll adjust the code to calculate and name the quantiles dynamically
#' @param quantiles A vector of quantiles to estimate uncertainties (default: c(0.05, 0.5, 0.95)) | ||
#' @return A data frame containing the uncertainties | ||
#' @export | ||
soilgrids_uncertainty <- function(data, quantiles = c(0.05, 0.5, 0.95)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the calculation of uncertainty here is correct or not. Our current calculations will first simulate the uncertainty of soil properties across the vertical profile using the mean
and different quantiles from SoilGrids directly and then summarize the uncertainty once we have a good fit on that (See
cgamma <- function(theta, val, stat) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @Qianyuxuan might have better opinions on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about how you calculate sd here: "sd = sd(value, na.rm = TRUE)" with known quantiles and means. Usually, we estimate means and sds based on the best fits. For organic carbon density (ocd), it was found to follow gamma distribution best based on reported means and quantiles. But for soil texture data (sand, clay, and silt fractions), they follow Dirichlet distribution and estimate of corresponding parameters can be found in my pending PR here: https://github.com/PecanProject/pecan/pull/3406/files#diff-8b26bae8174687f7e2bf631c5188b979a5b8e4bdacc5c604da84e09a2abc0058
…ds_carbonstocks function
That's an important point! Implementing a data volume check could help manage resource usage effectively. Your approach of using tiled downloads and merging with |
upper_bound <- segment[2] | ||
|
||
total_soc <- soc_data %>% | ||
dplyr::filter(depths >= lower_bound & depths < upper_bound) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is “soc_data” soil organic carbon content (SOC, in g/kg) or organic carbon density (OCD, in kg/m³)? I guess it is OCD if you multiply it by thickness (see https://www.isric.org/explore/soilgrids/faq-soilgrids). For total_SOC, it should be something like "total_SOC=sum(OCDthickness(1-coarse_fraction))" but can you explain your calculation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The soc_data
refers to organic carbon density (OCD) in kg/m³. The calculation for total_SOC
is performed as follows:
[
\text{total_SOC} = \sum(\text{OCD} \times \text{thickness} \times (1 - \text{coarse_fraction}))
]
This accounts for the volume of soil represented by the thickness and adjusts for the coarse fraction to provide an accurate estimate of total SOC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Qianyuxuan @DongchenZ any insights on this changes? |
fixes #3416
Refactored the SoilGrids functions in
modules/data.land/R/soilgrids_utils.R
to enhance efficiency and scalability. The changes include:download_soilgrids_raster
to handle multiple soil properties.query_soilgrids
to query data for multiple soil properties.soilgrids_carbonstocks
to include an optional coarse fraction in carbon stock calculations.