diff --git a/docs/Data_Standardization/schemas.md b/docs/Data_Standardization/schemas.md
index 929fa92..24885ac 100644
--- a/docs/Data_Standardization/schemas.md
+++ b/docs/Data_Standardization/schemas.md
@@ -42,8 +42,8 @@ Encountering a value that has a syntactic structure beyond random characters sug
 * **Standardization reference**: As detailed in the permanent identifiers section below, an attribute specification can have one or more purl identifiers which point to ontology or other structured vocabulary terms which indicate that the attribute is machine comparable with any similarly marked attribute. Each purl should point to a resource that indicates a term's semantic definition, synonymy and logical constraints. For example the Cell Ontology [cell type](http://purl.obolibrary.org/obo/CL_0000000) purl can be the standardized reference for a "cell_type" attribute, and in fact that term's subordinate terms can be used as the list of possible values a cell_type attribute can hold.
 
-* **Attribute value(s)**: An attribute's schema specification allows at least one value datatype but in some schemas it may have more than one, such as a birthdate integer plus "null value list" (a picklist of missing, not collected, etc. data collection statuses). Standards such as NCBI's [missing value reporting](https://www.ddbj.nig.ac.jp/biosample/submission-e.html#missing-value-reporting) cover this.
-
+* **Attribute value(s)**: An attribute's schema specification allows at least one value datatype but in some schemas it may have more than one, such as a "birthdate" attribute which can have both a date datatype value but also a "null value list" (a picklist of missing, not collected, etc. data collection statuses). Standards such as NCBI's [missing value reporting](https://www.ddbj.nig.ac.jp/biosample/submission-e.html#missing-value-reporting) cover this.
+
 * **Attribute semantics**: An attribute has a narrower fundamental **datatype** which conveys the value syntax related to how it was measured, as well as a contextual **semantic type** that indicates what was being measured. For example while an attribute might simply be called "age" (with datatype integer and implicit year unit), it is not intended to be compared with any dataset's "age" attribute out there in the world, as shown by this list of [age kinds](http://purl.obolibrary.org/obo/OBI_0001167). An attribute plain text definition might state what particular kind of age it is or entity it applies to, but computers are blind to that. This is why data schemas benefit from the addition of both entity and attribute standardization references. Ideally a data schema draws upon a shared library of attributes organized by their semantic differentiae.
 
 * **Measures**: Attributes aren't always measured in the same way, even within the same schema, so in that case different attributes (with different coding and plain names) should be defined that include the precise unit measure or scale in question (such as "age_in_years" or "age_in_weeks" or "gestational_age" or "trimester_age"). This solves computational ambiguity.
 
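
The "Attribute value(s)" bullet describes an attribute that accepts more than one kind of value: a typed value or an entry from a null value list. A minimal Python sketch of that idea follows. The names `AttributeSpec`, `NULL_VALUE_LIST` and `accepts` are hypothetical and do not come from the schema documentation. The picklist holds only the two statuses named in the text, and ISO 8601 (`YYYY-MM-DD`) is assumed as the date syntax.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical picklist: only the two statuses named in the text are listed.
NULL_VALUE_LIST = frozenset({"missing", "not collected"})


@dataclass(frozen=True)
class AttributeSpec:
    """Illustrative attribute specification: a date datatype plus a null value list."""

    name: str
    null_values: frozenset = NULL_VALUE_LIST

    def accepts(self, value: str) -> bool:
        """Return True if value is a null-list entry or parses as an ISO 8601 date."""
        if value in self.null_values:
            return True
        try:
            # Assumption: dates are written as YYYY-MM-DD.
            date.fromisoformat(value)
        except ValueError:
            return False
        return True


birthdate = AttributeSpec(name="birthdate")
print(birthdate.accepts("1990-04-12"))
print(birthdate.accepts("not collected"))
print(birthdate.accepts("soon"))
```

Checking the null value list before the datatype keeps the two value kinds separate, so a data collection status is never mistaken for a malformed date.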