Update motivation.md
ddooley authored Nov 6, 2024
1 parent 18fd0de commit 11ab2d1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/Data_Standardization/motivation.md
@@ -23,7 +23,7 @@ Challenges faced by researchers and other data consumers:
* **Search result precision and recall**: Searching for a few general dataset domain keywords in a data catalogue will increasingly yield too many results to humanly sift through (false positives, and not suitably ranked to facilitate a cutoff point). Conversely, providing more keywords, and more specific ones, may winnow down candidate datasets too sharply, excluding good ones that simply used some different keyword synonyms (false negatives). Use of keywords within or across data catalogues is often inconsistent and is a fraction of the terms needed to differentiate what datasets are about.
* **Dimensional filters**: Filters specific to experimental design and protocol, subject demographics, and biological or social context are often missing, leading to further investigation to see if specific data types are being collected, with comparable methods, and in a comparable context. Some efforts to provide standard fields that pertain to database scope exist (e.g. fields for [temporalCoverage](https://schema.org/temporalCoverage), [countryOfOrigin](https://schema.org/countryOfOrigin), and
[contentLocation](https://schema.org/contentLocation)).
- * **Data normalization**: Once a researcher has located fit-for-purpose databases for federation towards answering research questions, it has been estimated that 80% of their analytic time is consumed in field-level preparatory harmonization of the data. FAIR data proponents advocate for an initial 5% investment of project budget towards standardized data products to lessen this downstream burden as well as the lost-opportunity costs of failure to discover data. [FAIR DATA REF - Barens]
+ * **Data normalization**: Once a researcher has located fit-for-purpose databases for federation towards answering research questions, it has been estimated that 80% of their analytic time is consumed in field-level preparatory harmonization of the data. FAIR data proponents advocate for an initial 5% investment of project budget towards standardized data products to lessen this downstream burden as well as the lost-opportunity costs of failure to discover data. [[Invest 5% of research funds in ensuring data are reusable](https://www.nature.com/articles/d41586-020-00505-7)]

Up-front attention to data standardization encourages reuse and avoids later costly work required for peer-to-peer dataset mapping as new downstream users of project data are encountered.
