Commit
Update index.md
ddooley authored Nov 13, 2024
1 parent d5ebca5 commit 417f8a2
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions docs/Data_Standardization/index.md
@@ -13,9 +13,10 @@ nav_order: 6
1. TOC
{:toc}

-The vision of FAIR **discovery and reuse** of datasets has a number of [motivations and challenges](https://github.com/ClimateSmartAgCollab/Documentation-en/blob/main/docs/Data_Standardization/motivation.md). Standardization opportunities exist at both the project data format (syntax) layer, at provenance/process/workflow/file/folder levels and as well in terms of semantic or equivalent meaning or "aboutness" which often requires references to experimental design, protocol, and other contextual data. This allows us to determine when datasets or data points are comparable or not, and so is important dataset information for downstream data catalogues to absorb and leverage in their search interfaces.
+The vision of FAIR **discovery and reuse** of datasets has a number of [motivations and challenges](https://github.com/ClimateSmartAgCollab/Documentation-en/blob/main/docs/Data_Standardization/motivation.md). Standardization opportunities exist at both the project data format (syntax) layer, at provenance/process/workflow/file/folder levels and as well in terms of dataset semantic meaning or "aboutness" which often requires references to experimental design, protocol, and other contextual data. This allows us to determine when datasets or data points are comparable or not, and so is important dataset information for downstream data catalogues to absorb and leverage in their search interfaces.

-Ultimately, a key requirement for success is a **well-coordinated technical language to describe project research aims and methods, and dataset tables and fields**. Artificial Intelligence will likely be introduced to help pair users' plain language descriptions of sought-after data types and context to the catalogue's holdings, but either way, the discovery and reuse vision requires project data managers to provide sufficient information at various layers as shown below. First we work through a description and recommendations on the mixture of language currently present in describing data standard content at various layers and in various software applications and storage technologies, and then discuss how ontologies can be layered on to help determine semantic comparability.
+Ultimately, a key requirement for success is a **well-coordinated technical language to describe project research aims and methods, and dataset tables and fields**. In the future artificial intelligence will likely pair users' plain language descriptions of sought-after data types and context to data catalogue holdings, but either way, the discovery and reuse vision requires project data managers to provide sufficient information at various layers as shown below. First we work through a description and recommendations on the mixture of language currently present in describing data standard content at various layers and in various software applications and storage technologies, and then discuss how ontologies can be layered on to help determine semantic comparability.
+Much standardization work can be done in advance of introducing ontology id’s to a schema. In a way ontologies provide the comparable fruits of interoperability, but a data schema is the practical tree that needs to be built on which ontology terms hang.

* **Values**: A form field, record field, table row field, spreadsheet cell, computational object/class attribute or property or slot, or variable can hold a **value** (aka data item or datum).
