Home
The Netherlands X-omics initiative identified a key innovation for expanding the strong Dutch ‘omics’ technology footprint: a common data integration and analysis framework to drive high-quality multi-omics studies.
One unaddressed challenge is the limited access to large-scale, well-annotated clinical and population series. Numerous large omics analyses of simple and complex systems have been published, yet the associated data are not always findable, accessible, interoperable and reusable (FAIR). This, together with the lack of harmonization between instruments, greatly hampers the reuse of existing data and means that omics studies are re-run unnecessarily.
The data analysis, integration and stewardship pillar of the Netherlands X-omics initiative aims to contribute to the realization of an integrated X-omics infrastructure and to facilitate multi-omics research by providing means for the creation, analysis and integration of FAIR -omics data.
We propose guidelines and tools to make -omics data FAIR at the source, to facilitate reuse of -omics data, and to facilitate the integration of different layers of -omics data. The FAIR Data Cube is an important infrastructure component to achieve this.
1. Preliminary
1.1 FAIR Principles
The FAIR principles were proposed 1 to guide researchers in describing and sharing their data, with the aim of increasing data reuse and research reproducibility. FAIR stands for Findable, Accessible, Interoperable and Reusable; each term is explained below:
Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services.
Accessible: Once users find the required data, they need to know how the data can be accessed, including authentication and authorisation.
Interoperable: The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
Reusable: The goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
1.2 Metadata in multi-omics studies
To enable researchers to find and understand multi-omics data, it is crucial to describe the data using rich human- and machine-readable metadata. This metadata includes general information about a study, such as study title, authors, or experimental design. It further includes descriptions of study samples, e.g., sample properties and provenance of samples and derived extracts (DNA, RNA, proteins). Information about samples can also include phenotypic or clinical data about specimens or individuals. The experimental metadata also comprises information about the conducted assays, i.e., the platforms, technologies, experimental protocols, and protocol parameters that were used. It is also important to be able to understand the provenance of the acquired and processed data. Hence, metadata on input and (intermediate) output files, data processing pipelines, and analysis parameters should be reported. Finally, to understand the diverse types of omics data, it is necessary to describe the molecular features (genes, proteins, metabolites) in an unambiguous way.
1.3 Investigation-Study-Assay
It is important to use standardized and open metadata formats, ensure that data and metadata have globally unique and persistent identifiers (PIDs), and register the metadata so that it can be found. Interoperability is achieved by mapping metadata to commonly used controlled vocabularies and ontologies. To capture general study metadata, sample (including basic sample characteristics), and assay metadata, we employ the Investigation-Study-Assay (ISA) metadata framework 2. It supports ontology-based descriptions of experiments and has been specifically designed to capture metadata of omics and multi-omics experiments. It is already used in the research community, e.g., submission of data to EMBL-EBI’s MetaboLights database requires metadata to be captured in the ISA format. Other databases such as EMBL-EBI’s ArrayExpress or PRIDE support ISA-Tab via conversion tools. To capture molecular features metadata, we propose an extension of the framework, where each feature is annotated unambiguously with a unique and persistent database identifier including a reference to the database and its version. Phenotypic or clinical data is captured in the Phenopackets format.
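The proposed feature annotation (a persistent identifier plus a reference to the database and its version) could be represented roughly as follows. This is only a minimal sketch: the class name `FeatureAnnotation`, its fields, the helper `is_unambiguous`, and the placeholder values `EXAMPLEDB`, `v1`, and `X0001` are all hypothetical and are not part of the ISA specification or of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureAnnotation:
    """Hypothetical record for one molecular feature (gene, protein, metabolite)."""

    feature_label: str      # human-readable name as used in the data file
    database: str           # database that issued the identifier
    database_version: str   # release of that database the identifier refers to
    identifier: str         # persistent identifier within the database

    def compact_id(self) -> str:
        """Return the identifier prefixed with its database, e.g. 'DB:ID'."""
        return f"{self.database}:{self.identifier}"


def is_unambiguous(annotation: FeatureAnnotation) -> bool:
    """An annotation counts as unambiguous here only if the database,
    its version, and the identifier are all non-empty."""
    required = (
        annotation.database,
        annotation.database_version,
        annotation.identifier,
    )
    return all(value.strip() for value in required)


# Placeholder values for illustration only; they do not refer to a real database.
example = FeatureAnnotation(
    feature_label="example feature",
    database="EXAMPLEDB",
    database_version="v1",
    identifier="X0001",
)
print(example.compact_id(), is_unambiguous(example))
```

Recording the database version alongside the identifier matters because identifiers can be retired or remapped between releases; without the version, the same string may point to different entities over time.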
We adopt and develop metadata schemas for different omics data types and make use of the Investigation-Study-Assay (ISA) metadata framework 2 to capture experimental metadata.
The Investigation/Study/Assay (ISA) Metadata Framework 3 is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences.
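The nesting of the three core ISA entities (an investigation groups studies, and a study groups assays) can be illustrated with a small sketch. The classes below are simplified, hypothetical stand-ins written with plain dataclasses; they are not the API of the ISA software tools, and the field selection and example values are assumptions chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """One assay: what was measured and with which technology."""
    measurement_type: str
    technology_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """One study: its samples and the assays performed on them."""
    title: str
    samples: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level container grouping related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)


def summarize(investigation: Investigation) -> List[str]:
    """Return an indented, line-by-line outline of the hierarchy."""
    lines = [f"Investigation: {investigation.title}"]
    for study in investigation.studies:
        lines.append(f"  Study: {study.title} ({len(study.samples)} samples)")
        for assay in study.assays:
            lines.append(
                f"    Assay: {assay.measurement_type} via {assay.technology_type}"
            )
    return lines


# Illustrative values only; file names and titles are made up.
demo = Investigation(
    title="Example multi-omics investigation",
    studies=[
        Study(
            title="Example study",
            samples=["sample-1", "sample-2"],
            assays=[
                Assay("transcription profiling", "RNA sequencing", ["reads.fastq"]),
                Assay("metabolite profiling", "mass spectrometry", ["run.mzML"]),
            ],
        )
    ],
)
print("\n".join(summarize(demo)))
```

In a multi-omics setting, each omics layer would typically appear as a separate assay under the same study, which is what allows the layers to be linked back to shared samples.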
The concept map below shows the ISA objects/entities and their relation to one another.