This repository has been archived by the owner on Jun 24, 2022. It is now read-only.

How to structure the samples metadata file 2.0

Tram Ngo edited this page Aug 2, 2021 · 1 revision

Background

SODA helps you prepare the samples metadata file. Although SODA generates the file automatically in the required structure, this page explains how the file must be structured according to the SPARC rules, to give some insight into what SODA produces.

How to

  • Format: The samples file is accepted in either xlsx, csv, or json format. SODA generates it in the xlsx format based on the template provided by the Curation Team.

  • Location in the dataset: The samples file is typically expected in the high-level dataset folder.

  • Content: The subject_id and sample_id are mandatory (highlighted in bold and italic below) for all datasets and must be provided with one "Value". The other "experimental setup" elements (highlighted in bold-only below) are also mandatory when available.

    • subject id: Lab-based schema for identifying each subject. This field should match the names of the sub-folders in the primary folder. The subject_id must be unique.

    • sample id: Lab-based schema for identifying each sample. The sample_id must be unique across the whole dataset.

    • was derived from: This is the sample_id of the sample from which the current sample was derived (e.g., slice, tissue punch, biopsy, etc.).

    • pool_id: If data are collected on multiple subjects at the same time, include the identifier of the pool where the data file will be found. If this is included, it should be the name of the top-level folder inside primary.

    • sample experimental group: This field refers to the experimental group that a subject is assigned to in the research project.

    • Sample type: This refers to the physical type of the sample from which the data were extracted.

    • Member of: For cases where we need to include a specimen in a population.

    • Also in dataset: For including the Pennsieve id(s) for other datasets that have data about the same specimen.

    • Laboratory internal id: Provide a mapping for groups that have incompatible internal identifier conventions.

    • Sample anatomical location: This is the organ, or sub-region of organ from which the data were extracted.

    • Date of derivation:

    • Pathology:

    • Laterality:

    • Cell type:

    • Plane of section:

    • Protocol title: This field refers to the title of the protocol on Protocols.io, once the research protocol has been uploaded there. In SODA, users can connect to their Protocols.io account by clicking on "Help me with my protocol information" under the Protocol Information tab. A login interface will instruct users to sign in to their account in the browser at protocols.io. An access token is required for automatic extraction of the protocol titles and links; it can be obtained from the provided website once users are signed in. Once users have connected their account with SODA, they can search for their protocol titles in the input field.

    • Protocol url or doi: This refers to the Protocols.io URL for the protocol title. Once the protocol is uploaded to Protocols.io, it must be shared with the SPARC group, and its Protocols.io URL is entered in this field. In SODA, when users select a protocol title in the previous field (Protocol title), the protocol location or link is filled in automatically for this field.

    • Reference atlas: Enter here the reference atlas and organ.

    • Experimental log file path: This is a file containing experimental records for each sample, whenever applicable.
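
The identifier rules above (subject_id and sample_id are mandatory, sample_id is unique across the dataset, and "was derived from" holds the sample_id of another sample) can be checked with a short script. The following is a minimal sketch, not part of SODA. It assumes the samples file has been saved as csv and that the column headers are spelled subject_id, sample_id, and was_derived_from; the exact header spelling in the Curation Team template may differ. It also assumes that the parent sample is listed in the same file. The function name check_samples and the example rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumed header spellings; adjust to match the actual template.
REQUIRED = ("subject_id", "sample_id")


def check_samples(rows):
    """Return a list of human-readable problems found in the sample rows."""
    problems = []

    # 1. Mandatory fields must be non-empty in every row.
    for i, row in enumerate(rows, start=1):
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"row {i}: missing {col}")

    # 2. sample_id must be unique across the whole dataset.
    ids = [(row.get("sample_id") or "").strip() for row in rows]
    counts = Counter(s for s in ids if s)
    for sid, n in counts.items():
        if n > 1:
            problems.append(f"sample_id {sid!r} appears {n} times")

    # 3. was_derived_from, when given, should name a sample_id in this file
    #    (an assumption of this sketch, see the lead-in).
    known = set(counts)
    for i, row in enumerate(rows, start=1):
        parent = (row.get("was_derived_from") or "").strip()
        if parent and parent not in known:
            problems.append(
                f"row {i}: was_derived_from {parent!r} is not a known sample_id"
            )
    return problems


# Hypothetical example data with three deliberate problems.
EXAMPLE_CSV = """subject_id,sample_id,was_derived_from
sub-1,sam-1,
sub-1,sam-2,sam-1
sub-2,sam-2,
sub-2,,sam-9
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
for problem in check_samples(rows):
    print(problem)
```

To check a real file, replace io.StringIO(EXAMPLE_CSV) with a file opened via open(path, newline=""). An empty list returned by check_samples means none of these three checks failed; it does not guarantee that the file meets every SPARC requirement.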
