https://docs.cbioportal.org/file-formats/#case-lists
diff --git a/articles/bringing-portal-data-to-other-platforms-cbioportal.html b/articles/bringing-portal-data-to-other-platforms-cbioportal.html
index d63c8c7..3e0db1b 100644
--- a/articles/bringing-portal-data-to-other-platforms-cbioportal.html
+++ b/articles/bringing-portal-data-to-other-platforms-cbioportal.html
@@ -68,12 +68,20 @@
Document Status: Draft
Estimated Reading Time: 8 min
Document Status: Working
Estimated Reading Time: 8 min
-Functionality demonstrated in this vignette benefited greatly from
-code originally written by hhunterzinck.
+Utils demonstrated in this vignette benefited greatly from code
+originally written by hhunterzinck.
+cBioPortal's requirements change over time, as with any software
+or database. The package is updated to keep up on a yearly submission
+basis, but there may be occasional periods when the workflow is out of
+date with this external system.
First create the study dataset “package” where we can put together the data. Each study dataset combines multiple data types – clinical,
-gene expression, gene variants, etc.
+gene expression, gene variants, etc. Metadata can be edited after the file
+has been created. This will also set the working directory to the new
+study directory.
cbp_new_study(cancer_study_identifier = "npst_nfosi_ntap_2022",
@@ -124,7 +134,7 @@ Add data types to study
Note that:
- These should be run with the working directory set to the study
-dataset directory as set up above to ensure consistent metadata.
+directory as set up above to ensure consistent metadata.
-
Defaults are for known NF-OSI processed data
outputs.
@@ -140,9 +150,8 @@ Add mutations data
@@ -188,8 +197,6 @@ Add expression data
Add clinical data
-- Clinical data should be added last, after all other
-data has been added, for sample checks to work properly.
-
clinical_data
is prepared from an existing Synapse
table. The table can be a subsetted version of those released in the
@@ -205,13 +212,11 @@ Add clinical data
clinical_data <- "select * from syn43278088" # query when the table already contains just the releasable patients
-ref_map <- "https://raw.githubusercontent.com/nf-osi/nf-metadata-dictionary/main/mappings/cBioPortal.yaml"
+ref_map <- "https://raw.githubusercontent.com/nf-osi/nf-metadata-dictionary/main/mappings/cBioPortal/cBioPortal.yaml"
cbp_add_clinical(clinical_data, ref_map)
-There are additional steps such as generating case lists and
-validation that have to be done outside of the package with a
-cBioPortal backend, where each portal may have specific configurations
-(such as genomic reference) to validate against. See the general
-docs for dataset validation.
-For the public portal, the suggested step using the public
-server is given below.
-Assuming your present working directory is
-~/datahub/public
and a study folder called
+
+Validation has to be done with a cBioPortal instance. Each portal may
+have specific configurations (such as genomic reference) to validate
+against.
+For a simple offline validation example, assuming you are
+at ~/datahub/public
and a study folder called
npst_nfosi_ntap_2022
has been placed into it, mount the
dataset into the container and run validation like:
STUDY=npst_nfosi_ntap_2022
-sudo docker run --rm -v $(pwd):/datahub cbioportal/cbioportal:5.4.7 validateStudies.py -d /datahub -l $STUDY -u http://cbioportal.org -html /datahub/$STUDY/html_report
-The html report will list issues by data types to help with any
-corrections needed.
+sudo docker run --rm -v "$(pwd)":/datahub cbioportal/cbioportal:6.0.25 validateData.py -s "/datahub/$STUDY" -n -v
+See the general
+docs for dataset validation for more examples.
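Before starting the container, a quick local check can catch two common problems: a missing `meta_study.txt`, or a `meta_*.txt` file whose `data_filename:` points at a file that is not in the study folder. The sketch below is a hypothetical helper (`check_study_dir` is not part of the package or of cBioPortal). It assumes the usual layout where meta files sit next to their data files, and it is no substitute for `validateData.py`.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of the package or cBioPortal.
# Returns non-zero if meta_study.txt is missing or if any meta_*.txt
# references a data_filename that does not exist in the study folder.
# Note: variables are plain shell globals to stay POSIX-compatible.
check_study_dir() {
  study_dir=$1
  status=0
  if [ ! -f "$study_dir/meta_study.txt" ]; then
    echo "missing: $study_dir/meta_study.txt" >&2
    status=1
  fi
  for meta in "$study_dir"/meta_*.txt; do
    [ -e "$meta" ] || continue  # glob matched nothing
    # Value after "data_filename:", with all whitespace (incl. CR) removed.
    data_file=$(sed -n 's/^data_filename:[[:space:]]*//p' "$meta" | tr -d '[:space:]')
    if [ -n "$data_file" ] && [ ! -f "$study_dir/$data_file" ]; then
      echo "missing: $study_dir/$data_file (referenced by ${meta##*/})" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, from `~/datahub/public`, run `check_study_dir "$STUDY"` and only start the validation container if it succeeds.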
diff --git a/pkgdown.yml b/pkgdown.yml
index a9cf8d1..cdae681 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -7,4 +7,4 @@ articles:
  bringing-portal-data-to-other-platforms-cbioportal: bringing-portal-data-to-other-platforms-cbioportal.html
  revalidation-workflows: revalidation-workflows.html
  survey-public-files: survey-public-files.html
-last_built: 2025-02-13T19:55Z
+last_built: 2025-02-18T21:05Z
diff --git a/reference/cbp_add_expression.html b/reference/cbp_add_expression.html
index 9fa6d17..09a7f4a 100644
--- a/reference/cbp_add_expression.html
+++ b/reference/cbp_add_expression.html
@@ -1,11 +1,9 @@
This should be run in an existing dataset package root.
Note that there are a number of different options generated by the STAR Salmon pipeline.
-cBioPortal has confirmed that they prefer normalized counts gene_tpm.tsv
and,
-though not used, find it helpful to also have raw counts gene_counts.tsv
.
gene_tpm.tsv
.
(Optional) Syn id of raw counts results. See details.
(Optional) Syn id of raw counts if curators explicitly ask for it.
https://docs.cbioportal.org/file-formats/#case-lists
+This relies on a ref_map
specification to know which clinical data to include for cBioPortal
+
This depends on a ref_map
specification to know which clinical data to include for cBioPortal
and how to segregate the clinical attributes into the right files.
-For example, say df
contains clinical variables A-X, but mappings are only specified for
-variables A-C, L-M and others are not meant to be surfaced/made public. This will subset the df
to what's specified in the mapping.
-Conversely, if there is a mapping for variable Z that is not in the clinical data, this will throw error.
ref_map
decides what variables can be made public and how they should be represented in cBioPortal.
+For example, given a table T
on Synapse with variables A-Z and mappings in ref_map
for A-C + L-M,
+we take the intersection of variables present.
+Before subsetting, the function checks that the required variables in ref_map are present in T.
+Then the subset df
is created from T
.
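The rule described above (first check that required variables are present, then keep the intersection of the table's columns and the `ref_map` mappings) can be sketched in plain shell over space-separated variable names. This is a hypothetical illustration, not the package's R implementation; the helper names `has_word`, `check_required`, and `subset_vars` and the example variable lists are made up.

```shell
#!/bin/sh
# Hypothetical sketch of the ref_map subsetting rule; not the package's R code.
# Lists are space-separated variable names (assumed to contain no glob chars).

# Succeeds if word $1 appears in the space-separated list $2.
has_word() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Step 1: every variable marked required in ref_map ($1) must be in T ($2).
check_required() {
  missing=""
  for v in $1; do
    has_word "$v" "$2" || missing="$missing $v"
  done
  if [ -n "$missing" ]; then
    echo "required variables missing from table:$missing" >&2
    return 1
  fi
}

# Step 2: keep mapped variables ($1) that are also present in T ($2),
# in ref_map order; mapped variables absent from T are dropped.
subset_vars() {
  kept=""
  for v in $1; do
    has_word "$v" "$2" && kept="$kept $v"
  done
  echo "${kept# }"
}
```

With mappings `A B C L M Z` and a table holding `A B C D E L M N`, `subset_vars` keeps `A B C L M`: `Z` is mapped but absent from `T`, so it is dropped rather than raising an error, and only a missing required variable stops the process.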