-
Notifications
You must be signed in to change notification settings - Fork 4
Organize and submit a new SPARC dataset with SODA
This is the suggested workflow for preparing and submitting your SPARC datasets with SODA using the Free Form Mode features (accessible from the sidebar). All these steps are mandatory (unless marked otherwise) if you wish to satisfy the SPARC requirements. Once you complete these steps, your datasets will be curated according to the SPARC guidelines and summited to the SPARC Curation Team for review.
This is only required the first time you use SODA. This step will be automatically requested from you when you start SODA if you haven't connected to Pennsieve. You can also trigger this step manually by clicking the edit (pencil icon ) button at any of the
Current account
fields in SODA. We would suggest using the one found under the Create a new dataset
option at Free Form Mode > Manage Datasets
.
To create a new dataset on Pennsieve, use the Create a new dataset
option under Free Form Mode > Manage Datasets
.
You can then confirm your details in the account step. Type out the name of the dataset you want to create on Pennsieve and click on Create dataset
.
Note: You can use the navigation buttons in the bottom right corner of the app to go through all the mandatory steps needed to curate a dataset. These buttons will follow the same steps as detailed in this guide.
By default, the creator of the dataset (you) is the owner. As per SPARC guidelines, the PI of your award needs to be the owner of the dataset. You can use the feature Make PI owner of dataset
under the Manage Datasets
tab to accomplish this task. The dataset you created during step A2 should be automatically selected (you can select another one by clicking on the pencil symbol next to Current dataset
). Simply select your dataset PI from the dropdown list and click on Make PI owner
(read and click Yes
in the warning prompt) to make them the owner.
Your dataset may require additional people or teams to be given permission to access the dataset. You can use the feature Add/edit permissions
under the Manage Datasets
tab to accomplish this task. Currently given permissions will be visible under Current permissions
. To add additional permissions for teammates, click on Add/edit user permissions
, select your teammate's name from the first dropdown list, select their desired permission/role from the second dropdown list, and click on Add permissions for user
. Repeat to add more teammates.
If you need to learn more about the types of permissions and their restrictions, click here.
You can use the feature Add/edit subtitle
under the Manage Datasets
tab to add a subtitle to your dataset. The dataset you created during step A2 should be automatically selected (you can select another one by clicking on the pencil symbol next to Current dataset
). In the textbox, add two or three sentences (limit of 256 characters) that describe the content of your dataset and click on Add subtitle
.
You can use the feature Add/edit description
under the Manage Datasets
tab to add a description to your dataset. The dataset you created during step A2 should be automatically selected (you can select another one by clicking on the pencil symbol next to Current dataset
). In the textbox, provide a detailed description of your dataset and click on Add description
. It is typically recommended to include three sections: Study Purpose, Data Collected, and Primary Conclusion. You can see published datasets on sparc.science for inspiration.
You can use the feature Upload a banner image
under the Manage Datasets
tab to add a banner image to your dataset. The dataset you created during step A2 should be automatically selected (you can select another one by clicking on the pencil symbol next to Current dataset
).
- Click on "Edit banner image"
- Click on "Import image" in the new pop-up window.
- Select the image file you want to use as a banner image.
- Crop the file as desired. Note that all banner images must be square, have a minimum display size of 512x512 px (1024x1024 px preferred), and have a maximum file size of 5 MB.
- Click on "Save changes".
You can use the feature Assign a license
under the Manage Datasets
to assign the SPARC mandated Creative Commons Attribution (CC-BY) license to your dataset. The dataset you created during step A2 should be automatically selected (you can select another one by clicking on the pencil symbol next to Current dataset
). Simply click on Assign Creative Commons Attribution (CC-BY) license
to assign the license.
You can use the feature Create submission.xlsx
under the Prepare Metadata
tab to accomplish this task. Select I want to prepare a new submission.xlsx file
, then fill out the three required fields for this metadata file.
- Enter your award number. We recommend connecting your Airtable account with SODA so you can import your award number automatically and without error from the SPARC Airtable sheet (this will also come in handy later on when adding contributors to the dataset_description metadata file). To do that:
- a. Click on
Click here to import my SPARC award from Airtable
- b. Click
Yes
on the pop-up for adding an Airtable account connection. You need to enter your Airtable API key in SODA to connect your account. To find your Airtable key, please visit your account (login if necessary) and click theGenerate API Key
button. If you see aRegenerate API key option
you may click on the box with the dots to reveal your API key. Copy it and paste it in the dedicated field in the SODA pop-up box (clickYes
if a warning prompt shows up).
- c. Select your SPARC award from the dropdown and click the 'Confirm' button
- Enter the milestone associated with your dataset and the corresponding completion date. The milestone and date should be exactly as reported in the Data Deliverables document associated with your award (see here to find out more). We recommend that you import your Data Deliverables document in SODA to automatically extract milestone information. To do so:
- a. Click on
Import milestones from my Data Deliverables Document
-
b. You can now click on the
Yes, let's import it
button in the first pop-up. In the second pop-up, click on theBrowse here
field to select the path of your Data Deliverable document. This will import all the relevant milestones and submission dates associated with that milestone -
c. You can then select the
milestone(s)
andcompletion date
associated with your dataset.
Click on the Generate
button and select where you want to store the submission.xlsx file on your computer. You will be asked to include this file when organizing your dataset in the later steps.
The expected structure of this file, generated automatically by SODA, is explained in our corresponding "How to" page if you would like to learn about it.
You can find this feature under Create dataset_description.xlsx
under the Prepare Metadata
tab. Select I want to prepare a new dataset_description.xlsx file
. The subsequent interface divides the dataset description file into six convenient sections to facilitate your task. Go through them successively and populate the various fields as indicated (Mandatory fields are indicated):
- Dataset information
- Dataset name: The name of the dataset you created during step A2 should be automatically listed. If not, click on the
Click here to select my dataset from Pennsieve
option and select your dataset.
- Brief description/subtitle of your dataset: This field should be populated automatically with the subtitle of your dataset. It is not required to change it.
- Study information
-
Keywords: Provide at least three keywords (press
Enter
on your keyboard after each). -
Provide the number of subjects and samples in your dataset (numerical value).
- Award and contributor information
- Award number: Click on the
Click here to select my award number and import contributor info
and select your award.
- Contributor Information: Click on Add contributor. In the new pop-up box, select the contributor from the list automatically pulled from the SPARC Airtable sheet. All the information should be automatically populated if it is available in the Airtable sheet. You only need to specify if the contributor is the contact person. Click on
Add contributor
. Repeat to add more contributors. In the contributors' table, you can drag and drop rows to organize contributors in the order that they should appear in the dataset_description file. You can also remove/edit one with the respective delete/edit buttons.
- Protocol Information
Click on Add a protocol
and enter the URL to your protocol on protocols.io.
All other information is optional.
When done, click on the Generate
button to create the dataset_description.xlsx file on your computer. You will be asked to include this file when organizing your dataset in the later steps.
The expected structure of this file, generated automatically by SODA, is explained in our corresponding "How to" page if you would like to learn about it.
You can find this feature under Create subjects.xlsx
under the Prepare Metadata
tab. Click on I want to start a new subjects file
, then click on Add a subject
. In the new interface, enter first the unique subject ID for this subject.
-
Experimental setup Add the
pool_id
andexperimental_group
if applicable/available. -
Species information
- Sex: Select one
- Species: Type and select the applicable option from the suggestions in the dropdown list. SODA will automatically fill with the correct scientific name defined by NCBI Taxonomy as per the SPARC requirements. If not pre-registered in SODA, click on the
Find the scientific name for xxx
dropdown option to look for the standard terminology of your species from the NCBI Taxonomy.
You may just type the name of the animal and click on the dropdown option to get the correct species terminology.
- Strain: Similarly to the species, type and select the applicable options from the suggestions in the dropdown list. If not pre-registered in SODA, click on the
Click here to check xxx
dropdown option to look for the standard strain on Scicrunch as per the SPARC requirements. SODA will automatically pull out the RRID and include it in your subjects file when it is generated.
You may use the predetermined options to retrieve the correct RRID for your metadata file.
If SODA does not already have the strain within the suggestion, please click on the dropdown option to allow SODA to retrieve the appropriate RRIDs.
- Exact age: enter a numerical value in the text field and select the unit from the dropdown list.
All other information is optional. When done, click on the Add subject
button. The added subject will be included in a table of subjects. You can edit/delete/copy existing subjects from the table using the buttons in the last column.
If all the subjects in your dataset have the same characteristics, you can copy information from one subject to another by clicking on the "Copy" icon in the last column and providing the subject id for the new subject.
When all the subjects are added, click on the Generate
button to create the subjects.xlsx file on your computer. You will be asked to include this file when organizing your dataset in the later steps.
The expected structure of this file, generated automatically by SODA, is explained in our corresponding "How to" page if you would like to learn about it.
You can find this feature under Create samples.xlsx
under the Prepare Metadata
tab. The interface is very similar to the subjects.xlsx file feature. Click on I want to start a new samples file
, then click on Add a sample
. In the new interface, enter first the subject ID for this sample is derived from, then enter the unique samples ID for this sample.
- Experimental setup
Enter applicable/available information.
- Specimen Information
- Specimen type: Select one from the dropdown list
- Specimen anatomical location: Type the location
- Species/strain/age: Follow instructions from the subjects.xlsx file feature
All other information is optional. When done, click on the Add sample
button. The added sample will be included in a table of samples. You can edit/delete/copy existing samples from the table using the buttons in the last column.
If all the samples in your dataset have the same characteristics, you can copy information from one sample to another by clicking on the "Copy" icon in the last column and providing the subject and sample ids for the new sample.
When all the samples are added, click on the Generate
button to create the subjects.xlsx file on your computer. You will be asked to include this file when organizing your dataset in the later steps.
The expected structure of this file, generated automatically by SODA, is explained in our corresponding "How to" page if you would like to learn about it.
C. Organize dataset according to the SPARC Dataset Structure (under the Prepare Datasets tab of the Free Form Mode)
All SPARC datasets must follow the top-level SPARC folder structure imposed by the SPARC Dataset Structure. This top-level folder structure is shown in the figure below. If your data organization doesn't follow this structure inherently, you can create it virtually with SODA and then generate it directly on Pennsieve.
Overview of the top level folder structure required for all SPARC datasets (taken from the instructions prepared by the SPARC Curation Team).
You can use the feature Organize dataset
under the Prepare Datasets
tab to accomplish this task.
Click on Prepare a new dataset
. This option is used to start organizing and curating a new dataset. SODA will take you step-by-step through the curation process to organize your dataset, adding your metadata files, and generating your dataset onto Pennsieve.
Select the high-level folder(s) to be included in your dataset. Refer to the description provided in the figure here about the content of each folder to determine which folder you need for your dataset. A high-level folder can only be included from Step 2 and removed from Step 3. You can always come back to this step to include more folders.
Virtually structure your dataset using this interface as if you were organizing it on your computer but without actually modifying any local files. All your requested actions (files to be included, folders to be created, metadata information to be added, etc.) will be registered by SODA and only implemented when the dataset is generated during Step 6.
These are some of the functions you can do while you are in this step.
- Go inside a folder by double-clicking on it.
- Import files/folders inside a folder using drag-and-drop or the "Import" menu located in the upper right corner.
- Create a new folder using the "New folder" button located in the upper right corner. Note that this is only possible inside a high-level SPARC folder. To create a new high-level SPARC folder, go back to Step 2.
- Rename files/folders using the right-click menu option "Rename".
- Remove files/folders using the right-click menu option "Delete".
- Move files/folders using the right-click menu option "Move".
- Multiple-select files/folders by either drag-selecting items or holding Ctrl and clicking items.
- Use the "Details" option from the right-click menu to see the actual path of the file and include metadata (description, Additional Metadata) which will be included in the manifest files if you request SODA to generate them automatically for you (Step 5).
- Use the arrow located in the upper left corner to move up a folder. The current location in the dataset is indicated right next to the arrow.
Click on the applicable panel to include the high-level metadata files of your choice. Note that submission, dataset_description, and subjects files are mandatory for all datasets. The samples file is mandatory if applicable. The other files are optional.
To generate and include the mandatory manifest files automatically, simply toggle the option to "Yes". Note that any existing manifest files at the target location for generating the dataset will be replaced.
Click on the Generate directly on Pennsieve option
and confirm your account.
Select I want to generate on an existing dataset
. If your dataset is not already selected, click on the pencil
icon next to
Current dataset
and select the dataset that you created in Step A2.
Let SODA know how to handle any existing files/folders specified in your dataset that may already exist on the selected Pennsieve dataset. You can use the Replace
option for both if your dataset is empty.
This step serves as a confirmation page before SODA generates your dataset. You can preview your dataset organization with specified SPARC metadata files and specified dataset generate options. This is how your dataset will look once it is generated either on Pennsieve. To edit any details from this step, simply click on the Edit icon next to a section. This will bring you back to the associated section for edits.
When you are ready to generate your dataset, click the Generate button. Wait here until the dataset is generated.
Once the dataset is generated, you will be prompted to share it with the Curation Team. Click on Yes
You can find this feature under Share with Curation Team
under the Disseminate Datasets
tab. The dataset you created during step A2 should be automatically selected (you can select another one by clicking on the pencil symbol next to Current dataset
). Simply click on Share now
to share your dataset with the Curation Team for review. You are now done! Wait to hear back from the Curation Team.
Organize and submit SPARC datasets with SODA
Connect your Pennsieve account with SODA
Upload a local dataset to Pennsieve
Connect your Airtable account with SODA
Create dataset_description.xlsx
Submit for pre-publishing review
Installing the Pennsieve Agent
Pennsieve Agent is already running
Sending log files to SODA Team
Issues regarding hidden files or folders
How to structure the submission metadata file
How to structure the dataset_description metadata file
How to structure the subjects metadata file
How to structure the samples metadata file