
feat: refactor SoilGrids functions to handle multiple properties and … #3455

Open · wants to merge 4 commits into base: develop

Conversation

@AritraDey-Dev (Contributor) commented Feb 24, 2025:

fixes #3416

This PR refactors the SoilGrids functions in modules/data.land/R/soilgrids_utils.R for better efficiency and scalability. The changes include (a rough usage sketch follows the list):

  • A new function, download_soilgrids_raster, to handle multiple soil properties.
  • A generalized query_soilgrids that can query data for multiple soil properties.
  • An updated soilgrids_carbonstocks that optionally accounts for the coarse fraction in carbon stock calculations.
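For context, here is a minimal sketch of how the refactored pieces are meant to fit together. The query_soilgrids signature comes from this PR; the argument values (site coordinates, property code, depth and quantile labels) are illustrative assumptions, not tested calls:

# Illustrative usage only: argument values are assumptions, not from the PR.
points <- data.frame(lon = -72.17, lat = 42.54)  # hypothetical site

# Query organic carbon density for one depth layer and one quantile.
ocd <- query_soilgrids(points,
                       soil_property = "ocd",
                       depth = "0-5cm",
                       quantile = "mean")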

@AritraDey-Dev (Contributor, Author) commented:

I refactored the SoilGrids functions for improved efficiency and scalability; please take a look and let me know whether this works for you.

@AritraDey-Dev (Contributor, Author) commented:

@DongchenZ @Qianyuxuan Could you review this PR? I believe it solves the mentioned issue. Please go through it once and let me know if any changes are required.

@DongchenZ (Contributor) left a review comment:

This PR improves the ability to extract soil properties other than SOC. However, I worry that these functions (which rely heavily on the terra package) will struggle with broader spatial extents (e.g., North America, Asia, Africa, etc.), where the computational demand and memory usage will be unrealistically high. Given that, I think it would be helpful to have a prior check on the volume of data requested (e.g., if the requested data is too large, never attempt the query and just return an error message).
For larger spatial extents, it would be more efficient to download the tiled SoilGrids data directly (I usually use the wget command) and merge the tiled images using the gdalwarp command.
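A minimal sketch of the kind of pre-check being suggested here (the ~250 m native resolution is from the SoilGrids documentation; the function name, extent format, and cell threshold are assumptions for illustration):

# Sketch only: refuse a query whose extent would touch too many cells.
check_request_size <- function(extent, res = 250, max_cells = 1e8) {
  # `extent` is assumed to be a list with xmin/xmax/ymin/ymax in projected
  # meters; SoilGrids 2.0 rasters have a ~250 m native resolution.
  ncells <- ((extent$xmax - extent$xmin) / res) *
            ((extent$ymax - extent$ymin) / res)
  if (ncells > max_cells) {
    stop("Requested extent spans ~", format(ncells, big.mark = ","),
         " cells; download the SoilGrids tiles (wget) and merge them ",
         "with gdalwarp instead.")
  }
  invisible(ncells)
}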

#' @param local_raster Optional. The file path of a local raster to query. If NULL, queries the remote VRT.
#' @return A data frame containing the queried data
#' @export
query_soilgrids <- function(points, soil_property, depth, quantile, local_raster = NULL) {
@DongchenZ (Contributor) commented:

See my previous comments about projecting the points from lat/lon degrees to the SoilGrids coordinate system.
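For reference, the projection step being asked for might look like the sketch below. SoilGrids 2.0 layers are served in the Interrupted Goode Homolosine projection; the PROJ string follows the SoilGrids documentation, and the coordinates are illustrative:

library(terra)

# SoilGrids 2.0 uses the Interrupted Goode Homolosine projection.
igh <- "+proj=igh +lat_0=0 +lon_0=0 +datum=WGS84 +units=m +no_defs"

# Illustrative lon/lat point, re-projected before extraction.
pts <- vect(data.frame(lon = -72.17, lat = 42.54),
            geom = c("lon", "lat"), crs = "EPSG:4326")
pts_igh <- project(pts, igh)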

dplyr::summarize(
mean = mean(value, na.rm = TRUE),
sd = sd(value, na.rm = TRUE),
quantile_05 = quantile(value, probs = quantiles[1], na.rm = TRUE),
@DongchenZ (Contributor) commented:

I would not have those hardcoded names for the different quantiles. Instead, you might consider calculating them first and naming them afterward based on the quantiles we choose.

@AritraDey-Dev (Contributor, Author) replied:

Yes, I'll adjust the code to calculate and name the quantiles dynamically.
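One possible shape for that change (a sketch, not the PR's implementation; the helper name is made up): compute all requested quantiles in one call and derive the column names from the quantiles argument itself, which reproduces the quantile_05/quantile_50/quantile_95 names when the default vector is used:

# Sketch: dynamic quantile naming instead of hardcoded columns.
quantile_summary <- function(value, quantiles = c(0.05, 0.5, 0.95)) {
  q <- stats::quantile(value, probs = quantiles, na.rm = TRUE)
  stats::setNames(
    as.list(q),
    paste0("quantile_", sprintf("%02d", as.integer(round(quantiles * 100))))
  )
}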

#' @param quantiles A vector of quantiles to estimate uncertainties (default: c(0.05, 0.5, 0.95))
#' @return A data frame containing the uncertainties
#' @export
soilgrids_uncertainty <- function(data, quantiles = c(0.05, 0.5, 0.95)) {
@DongchenZ (Contributor) commented:

I wonder if the calculation of uncertainty here is correct. Our current approach first simulates the uncertainty of soil properties across the vertical profile using the mean and the different quantiles from SoilGrids directly, and then summarizes the uncertainty once we have a good fit (see the existing helper: cgamma <- function(theta, val, stat)).

@DongchenZ (Contributor) commented:

I think @Qianyuxuan might have better opinions on that.

@Qianyuxuan (Collaborator) commented:

I am confused about how you calculate the sd here ("sd = sd(value, na.rm = TRUE)") when only quantiles and means are known. Usually, we estimate means and sds from the best-fitting distribution. For organic carbon density (ocd), a gamma distribution was found to fit best based on the reported means and quantiles. The soil texture data (sand, clay, and silt fractions), however, follow a Dirichlet distribution; estimates of the corresponding parameters can be found in my pending PR here: https://github.com/PecanProject/pecan/pull/3406/files#diff-8b26bae8174687f7e2bf631c5188b979a5b8e4bdacc5c604da84e09a2abc0058
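To make the fitting idea concrete, here is a minimal sketch of estimating gamma parameters from a reported mean and quantiles via least squares. This illustrates the approach described above; it is not PEcAn's actual cgamma implementation, and the function name is made up:

# Sketch: fit gamma shape/rate so the implied mean and quantiles match
# the summary statistics reported by SoilGrids.
fit_gamma_to_summary <- function(mean_val, q, probs = c(0.05, 0.5, 0.95)) {
  objective <- function(theta) {
    shape <- exp(theta[1])  # exp() keeps both parameters positive
    rate  <- exp(theta[2])
    pred_mean <- shape / rate
    pred_q <- stats::qgamma(probs, shape = shape, rate = rate)
    (pred_mean - mean_val)^2 + sum((pred_q - q)^2)
  }
  fit <- stats::optim(c(0, 0), objective)
  list(shape = exp(fit$par[1]), rate = exp(fit$par[2]))
}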

@AritraDey-Dev (Contributor, Author) replied:

(Quoting @DongchenZ's review comment above about terra memory limits and the tiled-download/gdalwarp alternative.)

That's an important point! Implementing a data volume check could help manage resource usage effectively. Your approach of using tiled downloads and merging with gdalwarp is a practical solution for larger extents.
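For anyone picking this up later, the tile-merge route might look like the sketch below. The tile file names are placeholders; it assumes the tiles were already fetched (e.g., with wget) and that GDAL is available via the sf package:

# Sketch: mosaic pre-downloaded SoilGrids tiles into a single raster.
tiles <- c("ocd_0-5cm_mean_tile1.tif", "ocd_0-5cm_mean_tile2.tif")  # placeholders
sf::gdal_utils("warp", source = tiles,
               destination = "ocd_0-5cm_mean_merged.tif")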

upper_bound <- segment[2]

total_soc <- soc_data %>%
dplyr::filter(depths >= lower_bound & depths < upper_bound) %>%
@Qianyuxuan (Collaborator) commented:

Is “soc_data” soil organic carbon content (SOC, in g/kg) or organic carbon density (OCD, in kg/m³)? I guess it is OCD if you multiply it by thickness (see https://www.isric.org/explore/soilgrids/faq-soilgrids). For total SOC, it should be something like total_SOC = sum(OCD * thickness * (1 - coarse_fraction)); can you explain your calculation here?

@AritraDey-Dev (Contributor, Author) replied:

The soc_data refers to organic carbon density (OCD) in kg/m³. The calculation for total_SOC is performed as follows:

total_SOC = sum(OCD * thickness * (1 - coarse_fraction))

This accounts for the volume of soil represented by each layer's thickness and adjusts for the coarse fraction to provide an accurate estimate of total SOC.
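Expressed against the code under review, the summation would look roughly like this (a sketch: the ocd, thickness, and coarse_fraction column names are assumptions about soc_data's layout, and %>% assumes dplyr is attached):

# Sketch of the layer-wise summation described above.
total_soc <- soc_data %>%
  dplyr::filter(depths >= lower_bound & depths < upper_bound) %>%
  dplyr::summarize(
    total_SOC = sum(ocd * thickness * (1 - coarse_fraction), na.rm = TRUE)
  )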

@AritraDey-Dev (Contributor, Author) commented:

[Screenshot attached: 2025-02-26 23-51-42]

@AritraDey-Dev (Contributor, Author) commented:

@Qianyuxuan @DongchenZ any insights on these changes?
