Update index.md
ddooley authored Nov 13, 2024
1 parent 417f8a2 commit b9c22a3
Showing 1 changed file with 2 additions and 2 deletions.
docs/Data_Standardization/index.md: 4 changes (2 additions, 2 deletions)
@@ -21,10 +21,10 @@ Much standardization work can be done in advance of introducing ontology id’s
* **Values**: A form field, record field, table row field, spreadsheet cell, computational object/class attribute or property or slot, or variable can hold a **value** (aka data item or datum).

* **Fundamental datatypes**: Crucial to machine readability, a value can be of a certain fundamental "literal" or syntactic **datatype**, like a string, date, time, integer or decimal number, boolean, categorical value or URL reference type. A few common standard "data-interchange languages" exist that express these: [XML](https://www.w3.org/TR/xmlschema11-2/#built-in-datatypes), [JSON](https://json-schema.org/understanding-json-schema/reference/type) and [SQL](https://www.digitalocean.com/community/tutorials/sql-data-types).
-* **Units**: Numeric values may be accompanied by units (e.g. "1m" for a meter, or "2d" for 2 days). Whether a unit is bundled with a number as a single string datatype value, or whether they are separated out into separate datatype values, is a matter for the schema developers to settle. By themselves, units need a string or coding representation, such as provided by [UCUM codes](https://units-of-measurement.org/) or an ontology of units (e.g. [QUDT](http://qudt.org/), [OM](http://www.ontology-of-units-of-measure.org/), [UO](https://obofoundry.org/ontology/uo)).
+* **Units**: Numeric values may be accompanied by units (e.g. "1m" for a meter, or "2d" for 2 days). Whether a unit is bundled with a number as a single string datatype value, or whether it is stored separately from a value, is a matter for the schema developers to settle. By themselves, units need a string or coding representation, such as provided by [UCUM codes](https://units-of-measurement.org/) or an ontology of units (e.g. [QUDT](http://qudt.org/), [OM](http://www.ontology-of-units-of-measure.org/), [UO](https://obofoundry.org/ontology/uo)).
* A data schema can also provide more complex string datatype extensions by further constraining string syntax, for example to express latitude and longitude coordinates in the form used by [ISO 19115-1:2014
Geographic information — Metadata](https://www.iso.org/standard/53798.html). The usual way of doing this is with [regular expressions](https://en.wikipedia.org/wiki/Regular_expression).
-A data specification meant for just one project or infrastructure's workflows might allow a looser description of some kinds of datatype, for example allowing dates in different formats to be a "date" type, or numbers of different precisions to be a "numeric" type. However, the transition from data specification to data standard ideally minimizes such ambiguities, so that the month, day and year of "04/05/22" can't be confused, or a "10.5" value doesn't throw an error because one database chose to store it as an integer, while another chose a decimal format. It's best to be as precise as possible up front, acknowledging however that characteristics can be measured in different ways (as noted in the attributes section below).
+* A data specification meant for just one project or infrastructure's workflows might allow a looser description of some kinds of datatype, for example allowing dates in different formats to be a "date" type, or numbers of different precisions to be a "numeric" type. However, the transition from data specification to data standard ideally minimizes such ambiguities, so that the month, day and year of "04/05/22" can't be confused, or a "10.5" value doesn't throw an error because one database chose to store it as an integer, while another chose a decimal format. It's best to be as precise and granular as possible about intended datatypes up front, acknowledging however that characteristics can be measured in different ways (as noted in the attributes section below).
* Note: An OCA schema documents all kinds of numbers as a "numeric" datatype, and so requires a regular expression to provide finer granularity, i.e. to distinguish decimal from integer values.

Encountering a value that has a syntactic structure beyond random characters suggests that it carries meaning about something, which leads to the topic of attributes.
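
To make the **Fundamental datatypes** bullet in the diff above concrete, here is a minimal Python sketch that reports which JSON Schema `type` keyword each value of a parsed JSON record corresponds to. The record and its field names are made up for illustration. Note that JSON has no date type: a date travels as a string, which JSON Schema can only annotate with `"format": "date"`.

```python
import json

def json_schema_type(value) -> str:
    """Return the JSON Schema "type" keyword matching a value produced by json.loads."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

# Hypothetical sample record; "collected" is a date, but JSON can only carry it as a string.
record = json.loads('{"collected": "2022-04-05", "count": 3, "depth_m": 10.5, "frozen": false}')
types = {field: json_schema_type(value) for field, value in record.items()}
print(types)  # {'collected': 'string', 'count': 'integer', 'depth_m': 'number', 'frozen': 'boolean'}
```

One caveat: this sketch follows Python's parsed types, so `1.0` is reported as `"number"`, whereas recent JSON Schema drafts also accept `1.0` as an `"integer"`.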
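
The **Units** bullet leaves it to schema developers whether a unit is bundled with its number. A minimal sketch of converting the bundled form ("1m", "2d") into a separate value and unit might look like this; the `BUNDLED` pattern, `Quantity` class and `split_bundled` function are hypothetical, and the pattern only accepts alphabetic unit symbols, so it does not cover the full UCUM code syntax.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for a bundled "number + unit" string such as "1m" or "2d".
# It only accepts alphabetic unit symbols; real UCUM codes can also contain
# characters such as "/", "." or "[" that this sketch does not handle.
BUNDLED = re.compile(r"(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)")

@dataclass(frozen=True)
class Quantity:
    """A value and its unit held as two separate fields."""
    value: float
    unit: str

def split_bundled(text: str) -> Quantity:
    """Split a bundled string like '2d' into a separate value and unit."""
    match = BUNDLED.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected a number followed by a unit: {text!r}")
    return Quantity(float(match.group(1)), match.group(2))

print(split_bundled("1m"))   # Quantity(value=1.0, unit='m')
print(split_bundled("2 d"))  # Quantity(value=2.0, unit='d')
```

Storing the two fields separately lets the unit column be validated against a unit vocabulary on its own, at the cost of a second field per measurement.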

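
The bullet on string datatype extensions mentions regular expressions for latitude and longitude. As a sketch, assuming coordinates are written as signed decimal degrees, range-bounded patterns could look like the following. These patterns are illustrative only and are not a transcription of the ISO 19115-1 rules; `is_valid_coordinate` is a hypothetical helper.

```python
import re

# Hypothetical patterns for signed decimal degrees; they bound the range but are
# not a transcription of the ISO 19115-1 rules.
LATITUDE = re.compile(r"[+-]?(?:90(?:\.0+)?|[1-8]?\d(?:\.\d+)?)")                 # -90 .. 90
LONGITUDE = re.compile(r"[+-]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?)")  # -180 .. 180

def is_valid_coordinate(lat: str, lon: str) -> bool:
    """True if both strings match in full (fullmatch rejects any extra characters)."""
    return LATITUDE.fullmatch(lat) is not None and LONGITUDE.fullmatch(lon) is not None

print(is_valid_coordinate("45.5017", "-73.5673"))  # True
print(is_valid_coordinate("91.0", "10"))           # False: latitude out of range
```

Using `fullmatch` rather than `^...$` anchors avoids accepting a value with a trailing newline, since `$` in Python also matches just before a final `\n`.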
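
The "04/05/22" and "10.5" ambiguities described in the reworked data-specification bullet can be reproduced in a few lines of Python. This is a sketch: `try_load` is a hypothetical loader standing in for a database that enforces an integer or decimal column type.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

raw = "04/05/22"
# The same string yields two different dates depending on the assumed field order.
month_first = datetime.strptime(raw, "%m/%d/%y").date()  # 2022-04-05 (April 5)
day_first = datetime.strptime(raw, "%d/%m/%y").date()    # 2022-05-04 (May 4)

# An ISO 8601 date string (YYYY-MM-DD) has only one reading.
unambiguous = date.fromisoformat("2022-04-05")

def try_load(text: str, column_type: str):
    """Hypothetical loader: parse text for an 'integer' or 'decimal' column, or return None."""
    try:
        if column_type == "integer":
            return int(text)        # int("10.5") raises ValueError
        if column_type == "decimal":
            return Decimal(text)    # keeps "10.5" exactly
    except (ValueError, InvalidOperation):
        return None
    raise ValueError(f"unknown column type: {column_type!r}")

print(month_first, day_first, unambiguous)                       # 2022-04-05 2022-05-04 2022-04-05
print(try_load("10.5", "integer"), try_load("10.5", "decimal"))  # None 10.5
```

Declaring the date format (here ISO 8601) and the exact numeric type in the standard removes both failure modes before data is exchanged.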

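
For the OCA note, the idea of narrowing a generic "numeric" datatype with a regular expression can be sketched as below. The `INTEGER` and `DECIMAL` patterns and the `numeric_subtype` function are hypothetical and are not taken from the OCA specification; they deliberately reject forms such as "10." and "1e3", which a real standard might choose to allow.

```python
import re

# Hypothetical patterns for narrowing a generic "numeric" value; they are
# illustrative, not taken from the OCA specification.
INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?\d+\.\d+")

def numeric_subtype(text: str) -> str:
    """Classify a numeric string as 'integer', 'decimal' or 'invalid'."""
    if INTEGER.fullmatch(text):
        return "integer"
    if DECIMAL.fullmatch(text):
        return "decimal"
    return "invalid"

for sample in ["10", "-3", "10.5", "10.", "1e3"]:
    print(sample, numeric_subtype(sample))
```

In practice a schema would attach just one of these patterns to a given field, for example the integer pattern to a count field.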