Defines the core metadata model for iSamples.
`src/schemas/isamples_core.yaml` defines the iSamples core model in LinkML. It references the vocabulary files in [isamplesorg/vocabularies/vocabulary](https://github.com/isamplesorg/vocabularies/tree/develop/vocabulary), which define the terms for Material Type, Sampled Feature, and Material Sample Object Type.
The following artifacts are generated from the linkml and vocabulary sources:
- Documentation in HTML, available at https://isamplesorg.github.io/metadata/
LinkML and its associated tools require a Python environment (version 3.9 or newer) and use Poetry for dependency management. Poetry can be installed with `pip install poetry`.
To work on project contents and run artifact generators, first grab the source and switch to the develop branch:
git clone https://github.com/isamplesorg/metadata.git
cd metadata
git checkout develop
git pull
Set up a virtual environment (e.g. using poetry or mkvirtualenv):
poetry shell
poetry install
(To exit the poetry shell, use `exit`.)
Artifacts in the `generated/` folder are produced by running `make` or `make all`.
Documentation is rendered with Quarto rather than the default `mkdocs` or Sphinx (Quarto offers many additional features for including computed examples, which are planned). To generate the documentation, install Quarto version 1.2 or later, then run `make`, `make all`, or `make gen-docs`.
This generates intermediate markdown files in the `build/docs` folder, then invokes `quarto render` to generate the HTML docs in the `docs/` folder.
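Before running the documentation targets, it can help to confirm that the installed Quarto meets the 1.2 minimum. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and it assumes `quarto --version` prints a dotted version number such as `1.3.450`.

```python
import re
import shutil
import subprocess

MIN_QUARTO = (1, 2)  # minimum version stated in this README


def parse_version(text):
    """Extract the first dotted number, e.g. '1.3.450' -> (1, 3, 450)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def quarto_is_recent_enough(version_text, minimum=MIN_QUARTO):
    """Return True when the reported version is at least the minimum."""
    return parse_version(version_text) >= minimum


def installed_quarto_version():
    """Return the output of `quarto --version`, or None if quarto is not on PATH."""
    if shutil.which("quarto") is None:
        return None
    result = subprocess.run(
        ["quarto", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `quarto_is_recent_enough(installed_quarto_version())` after checking for `None` gives a quick yes/no answer before `make gen-docs` is run.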
Note that this project uses a version of the LinkML `docgen` tool and templates modified to render markdown for Quarto. The modified `docgen` and templates are located in the `tools/` folder.
Collation of metadata examples and notes for the project
- background: contains diagrams and information about some existing models that include metadata for samples; files are organized broadly by domain.
- examples: example metadata documents from different systems. Subfolders are:
  - raw: metadata from the originating system
  - test: corresponding records generated manually using the iSamples basic template
  - transform: corresponding records generated by an automated ETL process from raw records
- vocabulary: vocabularies related to sample metadata from various systems
This branch shows how to use LinkML to generate various outputs and run operations for iSamples.
The following command converts the iSamples YAML schema to JSON Schema:
gen-json-schema -t PhysicalSampleRecord --not-closed iSamplesSchemaBasic0.3.yaml > iSamplesSchemaBasic0.3.schema.json
In this command, `-t PhysicalSampleRecord` makes the `PhysicalSampleRecord` class the top-level class, so the properties of that class become the top-level properties in the JSON Schema. The converted JSON Schema file is `iSamplesSchemaBasic0.3.schema.json`.
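To make the effect of the top-level class concrete, the sketch below inspects a JSON Schema and lists which properties sit at the root and which classes are kept as nested definitions. The embedded schema is a hypothetical, heavily trimmed stand-in rather than the real generated file, and it assumes the generator places other classes under `$defs` and that `--not-closed` corresponds to `additionalProperties: true`; the helper names are illustrative only.

```python
import json

# Hypothetical miniature of a generated schema; NOT the real output file.
SCHEMA_TEXT = """
{
  "title": "PhysicalSampleRecord",
  "type": "object",
  "additionalProperties": true,
  "properties": {
    "label": {"type": "string"},
    "sampleidentifier": {"type": "string"},
    "producedBy": {"$ref": "#/$defs/SamplingEvent"}
  },
  "$defs": {
    "SamplingEvent": {
      "type": "object",
      "properties": {"resultTime": {"type": "string"}}
    }
  }
}
"""


def top_level_properties(schema):
    """Names of the properties declared at the root of the schema."""
    return sorted(schema.get("properties", {}))


def nested_classes(schema):
    """Names of the class definitions referenced from the root (assumed under $defs)."""
    return sorted(schema.get("$defs", {}))


schema = json.loads(SCHEMA_TEXT)
print("top-level properties:", top_level_properties(schema))
print("nested classes:", nested_classes(schema))
```

To inspect the actual converted file, replace `SCHEMA_TEXT` with the contents of `iSamplesSchemaBasic0.3.schema.json`.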
gen-jsonld-context iSamplesSchemaBasic0.3.yaml > iSampleSchemaBasic0.3.jsonld
This command generates a JSON-LD context from the schema and saves it in the `.jsonld` file. Once the JSON-LD context has been generated, its enumeration entries must be edited manually.
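The manual edit could also be scripted. The sketch below is one possible approach, not the project's own tooling: it assumes the generated file is a JSON object with a top-level `"@context"` key, and it takes the slot-to-type mapping from the enumeration entries in this README's modified context example. The function name `patch_enum_types` is hypothetical.

```python
import copy

# Slot name -> "@type" value, mirroring the modified context example in this README.
ENUM_SLOT_TYPES = {
    "hasContextCategory": "contextcategory",
    "hasMaterialCategory": "materialtype",
    "has_sample_object_type": "specimencategory",
}


def patch_enum_types(document, enum_slot_types=ENUM_SLOT_TYPES):
    """Return a copy of a JSON-LD document whose context types the enumeration slots."""
    patched = copy.deepcopy(document)  # leave the caller's data untouched
    context = patched.setdefault("@context", {})
    for slot, type_name in enum_slot_types.items():
        entry = context.get(slot)
        if not isinstance(entry, dict):
            entry = {}  # slot missing or given as a plain string: start fresh
        entry["@type"] = type_name
        context[slot] = entry
    return patched
```

Load the generated `.jsonld` file with `json.load`, pass the result to `patch_enum_types`, and write it back with `json.dump` to apply the change.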
Modified JSON-LD context example
"@context": { "dct": "http://purl.org/dc/terms/", "isam": "http://resource.isamples.org/schema/", "mat": "http://resource.isamples.org/vocabulary/material/", "pur": "http://resource.isamples.org/vocabulary/samplepurpose/", "rdfs": "http://www.w3.org/2000/01/rdf-schema#", "sf": "http://resource.isamples.org/vocabulary/sampledFeature/", "skos": "http://www.w3.org/2004/02/skos/core#", "spt": "http://resource.isamples.org/vocabulary/sampleobjecttype/", "w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#", "xsd": "http://www.w3.org/2001/XMLSchema#", "@vocab": "http://resource.isamples.org/schema/", "curation": { "@type": "@id" }, "hasContextCategory": { "@type":"contextcategory" }, "hasMaterialCategory": { "@type":"materialtype" }, "has_sample_object_type": { "@type":"specimencategory" }, "id": "@id", "latitude": { "@type": "xsd:decimal" }, "location": { "@type": "@id" }, "longitude": { "@type": "xsd:decimal" }, "producedBy": { "@type": "@id" }, "relatedResource": { "@type": "@id" }, "resultTime": { "@type": "xsd:date" }, "samplingSite": { "@type": "@id" } }
Before validating the instance files, add the modified JSON-LD context at the start of each instance, ahead of its other properties.
Full instance example
{
  "@context": {
    "dct": "http://purl.org/dc/terms/",
    "isam": "http://resource.isamples.org/schema/",
    "mat": "http://resource.isamples.org/vocabulary/material/",
    "pur": "http://resource.isamples.org/vocabulary/samplepurpose/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "spt": "http://resource.isamples.org/vocabulary/sampleobjecttype/",
    "w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://resource.isamples.org/schema/",
    "curation": { "@type": "@id" },
    "hasContextCategory": { "@type": "contextcategory" },
    "hasMaterialCategory": { "@type": "materialtype" },
    "has_sample_object_type": { "@type": "specimencategory" },
    "id": "@id",
    "latitude": { "@type": "xsd:decimal" },
    "location": { "@type": "@id" },
    "longitude": { "@type": "xsd:decimal" },
    "producedBy": { "@type": "@id" },
    "relatedResource": { "@type": "@id" },
    "resultTime": { "@type": "xsd:date" },
    "samplingSite": { "@type": "@id" }
  },
  "@schema": "../../iSamplesSchemaBasic0.2.json",
  "@id": "metadata/21547/Car2PIRE_0334",
  "label": "PIRE_0334",
  "sampleidentifier": "ark:/21547/Car2PIRE_0334",
  "description": "",
  "hasContextCategory": ["Marine Biome"],
  "hasMaterialCategory": ["Organic Material"],
  "has_sample_object_type": ["Whole Organism"],
  "informalClassification": ["Gastropoda"],
  "keywords": ["Aceh", "Sumatra", "Indonesia", "Asia", "Mollusca"],
  "producedBy": {
    "@id": "ark:/21547/Cas2INDO_2016_SEU_1B",
    "label": "INDO_2016_SEU_1B",
    "description": "expeditionCode: INDO_PIRE | samplingProtocol: ARMS | taxonomy team: MINV | projectId: 80",
    "hasFeatureOfInterest": "coral reef",
    "responsibility": ["Aji Wahyu Anggoro", "Andrianus Sembiring"],
    "resultTime": "2016-08-09",
    "samplingSite": {
      "description": "Shallow, coastal reef. Apparent exposure to current, Porites dominated. Less impacted bleaching site, high recruitment, 12 m.",
      "label": "",
      "location": {
        "elevation": "maximumDepthInMeters: 12",
        "latitude": 5.89430,
        "longitude": 95.25293
      },
      "placeName": ["Pulau Seulako"]
    }
  },
  "registrant": "Chris Meyer",
  "samplingPurpose": "genomic analysis",
  "curation": {
    "accessConstraints": "",
    "curationLocation": "",
    "responsibility": ""
  },
  "relatedResource": {
    "label": "subsample tissue",
    "description": "",
    "target": "ark:/21547/Cat2INDO106431.1",
    "relationship": "subsample"
  }
}
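Adding the context to many instance files by hand is error-prone, so the step could be automated. The sketch below is a minimal illustration under assumptions: the context file is assumed to hold its mappings under a top-level `"@context"` key, and the helper names `with_context` and `add_context_to_file` are hypothetical.

```python
import json


def with_context(instance, context):
    """Return a new dict with "@context" first, followed by the instance's own keys."""
    merged = {"@context": context}
    for key, value in instance.items():
        if key != "@context":  # replace any context already present
            merged[key] = value
    return merged


def add_context_to_file(instance_path, context_path):
    """Rewrite one instance file in place so that it starts with the context."""
    with open(context_path, encoding="utf-8") as handle:
        context = json.load(handle)["@context"]
    with open(instance_path, encoding="utf-8") as handle:
        instance = json.load(handle)
    with open(instance_path, "w", encoding="utf-8") as handle:
        json.dump(with_context(instance, context), handle, indent=2, ensure_ascii=False)
```

Python dictionaries preserve insertion order, so building the result with `"@context"` first keeps it at the front of the written file.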
Use the following commands to validate the instance files against the schema:
linkml-validate -s iSamplesSchemaBasic0.3.yaml instance.json
jsonschema -i instance.json iSamplesSchemaBasic0.3.schema.json
The first command validates the instance file against the YAML schema; the second validates it against the JSON Schema.
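When there are many instance files, both checks can be run in a loop. The following sketch simply wraps the two commands shown above; the function names and the directory argument are hypothetical, and it assumes both tools are on the PATH and report failure through a non-zero exit code.

```python
import subprocess
from pathlib import Path


def validation_commands(instance, yaml_schema, json_schema):
    """Build the two validation command lines for a single instance file."""
    return [
        ["linkml-validate", "-s", str(yaml_schema), str(instance)],
        ["jsonschema", "-i", str(instance), str(json_schema)],
    ]


def validate_all(instance_dir, yaml_schema, json_schema):
    """Run both validators on every *.json file; return {file name: [exit codes]}."""
    results = {}
    for instance in sorted(Path(instance_dir).glob("*.json")):
        results[instance.name] = [
            subprocess.run(command).returncode
            for command in validation_commands(instance, yaml_schema, json_schema)
        ]
    return results
```

A file whose two exit codes are both `0` is assumed to have passed both the YAML-schema and the JSON Schema check.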
The iSamples Metadata Docker container is based on the [Docker container from the LinkML project](https://hub.docker.com/r/monarchinitiative/linkml/tags).
First you'll build the image:
docker build -t isamples_linkml .
Running it then opens a bash shell in `/work`, the Docker container volume that maps to the iSamples metadata repository:
docker run -a stdin -a stdout -i -t -v `pwd`:/work isamples_linkml
Then use the following commands to generate LinkML:
- Command 1
- Command 2
- Command 3
- We are still focused on implementing the iSamples schema within LinkML's requirements.
- LinkML still has some bugs and unimplemented features.
- Results and errors can differ between platforms, so we prefer to run LinkML in Docker. Please follow the LinkML tutorial.