2. FAIR Data Cube
The FAIR Data Cube is a set of tools and services that helps researchers at distinct stages of the research data life cycle by (1) creating rich, machine-readable metadata for single- and multi-omics datasets; (2) making omics data findable and accessible for reuse; and (3) facilitating the analysis and integration of omics data from different sources.
Fig. 2 illustrates how the FAIR Data Cube helps different stakeholders, covering distinct aspects of creating and using omics metadata. A detailed introduction can be found in this poster.
Fig. 2. The concept of the FAIR Data Cube.
The FAIR Data Cube helps the stakeholders along the data life cycle as follows:
Dataset owner
A dataset owner registers a dataset by publishing its metadata on the FAIR Data Point (FDP). Because the X-omics communities use a variety of metadata formats, it is reasonable to adopt one standard metadata format as the template for submitting metadata.
In our case, the Investigation-Study-Assay (ISA) metadata schema is adopted. Pipelines are provided to help researchers use the ISA metadata framework to capture experimental metadata and share it on FAIR Data Point servers, which serve as a metadata registry.
The Investigation part of ISA is made compatible with DCAT (the Data Catalog Vocabulary) and is used to fill the standard metadata set of the FDP. The remaining ISA information (Study and Assay) is uploaded to and stored in a separate triplestore such as GraphDB.
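The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record is a heavily simplified, hypothetical ISA-JSON-like document, the field-to-term mapping in `INVESTIGATION_TO_DCAT` is assumed rather than taken from the FAIR Data Cube pipelines, and treating the Investigation as a `dcat:Dataset` is likewise an assumption.

```python
import json

# Hypothetical, heavily simplified ISA-JSON-like record; real ISA documents
# carry many more fields.
isa_record = {
    "identifier": "INV-001",
    "title": "Example multi-omics investigation",
    "description": "Illustrative record, not real data.",
    "studies": [
        {
            "identifier": "STU-001",
            "assays": [
                {
                    "measurementType": "metabolite profiling",
                    "technologyType": "mass spectrometry",
                },
            ],
        }
    ],
}

# Assumed mapping from Investigation-level fields to Dublin Core terms
# that DCAT reuses. The actual pipelines may map more (or other) fields.
INVESTIGATION_TO_DCAT = {
    "identifier": "dcterms:identifier",
    "title": "dcterms:title",
    "description": "dcterms:description",
}


def split_isa(record):
    """Return (dcat_metadata, remainder) for one ISA-like record."""
    dcat = {"@type": "dcat:Dataset"}  # assumed target class
    for isa_key, dcat_term in INVESTIGATION_TO_DCAT.items():
        if isa_key in record:
            dcat[dcat_term] = record[isa_key]
    # Everything that was not mapped (here: the studies and their assays)
    # is kept for the separate triplestore.
    remainder = {
        key: value
        for key, value in record.items()
        if key not in INVESTIGATION_TO_DCAT
    }
    return dcat, remainder


fdp_metadata, triplestore_payload = split_isa(isa_record)
print(json.dumps(fdp_metadata, indent=2))
print(json.dumps(triplestore_payload, indent=2))
```

In this sketch, `fdp_metadata` stands for what would be published on the FDP, and `triplestore_payload` for the Study and Assay details that would be converted to RDF and loaded into the triplestore.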
Researcher
A researcher is a potential dataset user who can search or browse the FDP for datasets of interest. Because the metadata is encoded as a knowledge graph, researchers can run semantic searches over samples, phenotypes, omics measurements, and features through the SPARQL (SPARQL Protocol and RDF Query Language) query interface, asking questions such as:
- Find all studies that use MS-based metabolomics and study a specific metabolic disorder
- Find datasets with more than two omics types and more than 100 individuals
- Find measurements for proteins and metabolites that belong to a particular metabolic pathway
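As a rough illustration, the question about datasets with more than two omics types and more than 100 individuals could be phrased in SPARQL and wrapped in a SPARQL Protocol GET request as shown here. The endpoint URL, the repository name `fdcube`, and every term under the `ex:` prefix are hypothetical placeholders; the real vocabulary depends on how the ISA metadata is converted to RDF.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: host, port, and repository name are placeholders.
ENDPOINT = "http://localhost:7200/repositories/fdcube"

# Hypothetical vocabulary (ex:hasAssay, ex:omicsType, ex:hasIndividual).
QUERY = """\
PREFIX ex: <http://example.org/fdcube#>
SELECT ?dataset
WHERE {
  ?dataset ex:hasAssay ?assay ;
           ex:hasIndividual ?individual .
  ?assay ex:omicsType ?omicsType .
}
GROUP BY ?dataset
HAVING (COUNT(DISTINCT ?omicsType) > 2 && COUNT(DISTINCT ?individual) > 100)
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

To run the query against a live endpoint, the request would be passed to `urllib.request.urlopen` and the response body parsed as JSON. The sketch stops short of that so it works without a server.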
To answer these questions, the researcher submits a computation request to the dataset owner. After receiving the request, the dataset owner supervises its execution and returns the results to the requester.
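The request-and-return flow can be pictured with a toy model. It assumes that the analysis runs on the owner's side and that only the result travels back to the requester. The class names, status values, and the `approve` flag are all hypothetical and are not part of the FAIR Data Cube software.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class ComputationRequest:
    """A request to run an analysis on a dataset held by its owner."""
    requester: str
    dataset_id: str
    analysis: Callable[[list], float]  # the computation to execute
    status: str = "submitted"
    result: Optional[float] = None


class DatasetOwner:
    def __init__(self, datasets):
        # In this sketch the raw data stays inside the owner object.
        self._datasets = datasets

    def handle(self, request, approve=True):
        """Supervise one request: reject it, or run it and attach the result."""
        if not approve or request.dataset_id not in self._datasets:
            request.status = "rejected"
            return request
        request.result = request.analysis(self._datasets[request.dataset_id])
        request.status = "completed"
        return request


owner = DatasetOwner({"DS-1": [4.0, 6.0, 8.0]})
req = owner.handle(ComputationRequest("researcher-a", "DS-1", mean))
print(req.status, req.result)  # completed 6.0
```

The requester only ever sees `status` and `result`. The measurements themselves are never handed over, which is one plausible reading of the owner "supervising" the execution.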
Readers can also get an overview of the FAIR Data Cube via a flash talk presented at ISMB/ECCB 2023.