How to structure the dataset description metadata file 2.0

Background

SODA helps you prepare the dataset description metadata file conveniently. Although SODA automatically generates the file in the required structure, we explain here how the file must be structured according to the SPARC rules, to give some insight into the file that SODA generates.

How to

Format: The dataset description file is accepted in either xlsx, csv, or json format. SODA generates it in the xlsx format based on the template provided by the Curation Team.
Location in the dataset: The dataset_description file must be included in the high-level dataset folder.
Content: The "Metadata element" and "Value" columns are mandatory, as are the "Definition" and "Example" columns, which should remain unchanged. Some metadata elements are mandatory for all datasets and must be given at least one "Value". In the dataset_description file, mandatory items are marked in green and optional items are marked in yellow. If more than one value is to be provided for a metadata element (for instance, multiple "Contributors"), each subsequent value column must be named "Value 2", "Value 3", "Value 4", and so forth.
- Type: The type of data present in this dataset. This should be either 'experimental' or 'computational' depending on your study.
- Name: Descriptive title for the dataset. This field should exactly match your dataset name on Pennsieve.
- Description: Brief description of the study and the dataset. Equivalent to the abstract of a scientific paper. This could match the subtitle provided on Pennsieve.
- Keywords: A set of 3-5 keywords, other than those already mentioned in the above elements, that will help users find your dataset once it is published on the SPARC portal. Each keyword must be provided in a separate column.
- Funding: Specify the number of your SPARC award (mandatory, in the OT2OD0XXXXX format, if you are a SPARC researcher) and any other funding awards if applicable (optional). If multiple award numbers are specified, each award number must be specified in a separate column.
- Acknowledgements: Acknowledgements beyond funding and contributors.
- Number of subjects: Number of unique subjects in this dataset. This should match the subjects metadata file.
- Study purpose: A description of the study purpose. This should be identical to the relevant section from your dataset's description on Pennsieve.
- Study data collection: A description of the study data collection. This should be identical to the relevant section from your dataset's description on Pennsieve.
- Study primary conclusion: A description of the study's primary conclusion. This should be identical to the relevant section from your dataset's description on Pennsieve.
- Study collection title: This is the collection your dataset will be a part of as selected on Pennsieve.
- Number of samples: Number of unique samples in this dataset. This should match the samples metadata file. Set to zero if there are no samples.
- Contributors: Names of any contributors to the dataset. These individuals need not have been authors on any publications describing the data, but should be acknowledged for their role in producing and publishing the dataset. If there is more than one, add each contributor in a new column. For each contributor, at least one affiliation and at least one role are mandatory, as is an indication of whether the contributor is the contact person.
- Contributor ORCID ID: This is the contributor's ORCID ID number. If you do not have one, you can sign up for one at https://orcid.org. It must be in the format "https://orcid.org/xxxx-xxxx-xxxx-xxxx". It is not mandatory but highly recommended.
- Contributor Affiliation: Institutional affiliation for contributors. A ROR ID in the "https://ror.org/xxxxxxxxx" format can be provided if available. If a contributor has multiple affiliations, they must be separated by semicolons in a single cell.
- Contributor Role: Role(s) of the contributor. Each role must be one of the following roles provided by the DataCite schema: PrincipalInvestigator, Creator, CoInvestigator, CorrespondingAuthor, DataCollector, DataCurator, DataManager, Distributor, Editor, Producer, ProjectLeader, ProjectManager, ProjectMember, RelatedPerson, Researcher, ResearchGroup, Sponsor, Supervisor, WorkPackageLeader, Other. The definition of each of these roles is provided in the document here. The role "CorrespondingAuthor" must be provided for the person marked as "Yes" for contact person, that is, the person who will serve as the primary contact for the dataset (this is done automatically in SODA). If more than one role is to be specified for a contributor, the roles must be comma-separated in a single cell.
- Identifier description: A short description of a related identifier. Multiple identifiers must be split into individual columns.
- Relation type: A prespecified list of relations for common identifiers used in SPARC datasets. The value in this field must be read as the 'relationship that this dataset has to the specified identifier'.
- Identifier: Specify your actual identifier. This can be a web link to a repository, a protocol, or a paper (DOI).
- Identifier type: This states whether your identifier is a 'URL' or a 'DOI'. Use one of those two values to indicate the type of identifier.
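
As an illustration only, several of the formatting rules above can be checked with a few lines of Python. This is a minimal sketch, not part of SODA or of the SPARC validation tools: the function names (value_headers, is_valid_orcid, unknown_roles, split_affiliations) are hypothetical, and the role spellings and the ORCID pattern are assumptions drawn from the descriptions above.

```python
import re

# Role names taken from the "Contributor Role" entry above. The exact
# spellings (e.g. "PrincipalInvestigator") are an assumption; check them
# against the current template before relying on this set.
ALLOWED_ROLES = {
    "PrincipalInvestigator", "Creator", "CoInvestigator", "CorrespondingAuthor",
    "DataCollector", "DataCurator", "DataManager", "Distributor", "Editor",
    "Producer", "ProjectLeader", "ProjectManager", "ProjectMember",
    "RelatedPerson", "Researcher", "ResearchGroup", "Sponsor", "Supervisor",
    "WorkPackageLeader", "Other",
}

# "https://orcid.org/xxxx-xxxx-xxxx-xxxx". The last character of an ORCID iD
# can be the check character "X", so it is allowed in the final position.
ORCID_PATTERN = re.compile(r"https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def value_headers(count):
    """Column names for `count` values (count >= 1): "Value", "Value 2", ..."""
    return ["Value"] + [f"Value {i}" for i in range(2, count + 1)]


def is_valid_orcid(url):
    """True if `url` has the full ORCID URL format described above."""
    return ORCID_PATTERN.fullmatch(url) is not None


def unknown_roles(cell):
    """Split a comma-separated role cell and return the roles not in the list."""
    roles = [role.strip() for role in cell.split(",") if role.strip()]
    return [role for role in roles if role not in ALLOWED_ROLES]


def split_affiliations(cell):
    """Split a semicolon-separated affiliation cell into a list."""
    return [part.strip() for part in cell.split(";") if part.strip()]


if __name__ == "__main__":
    # Three contributors need the columns "Value", "Value 2", "Value 3".
    print(value_headers(3))
    # A role that is not in the list is reported back.
    print(unknown_roles("CorrespondingAuthor, Author"))
```

A check like this only catches formatting slips in a hand-edited file; it does not replace the validation performed by SODA or by the Curation Team.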
