diff --git a/articles/bringing-portal-data-to-other-platforms-cbioportal.html b/articles/bringing-portal-data-to-other-platforms-cbioportal.html index d63c8c7..3e0db1b 100644 --- a/articles/bringing-portal-data-to-other-platforms-cbioportal.html +++ b/articles/bringing-portal-data-to-other-platforms-cbioportal.html @@ -68,12 +68,20 @@

Bringing Portal Data to Other Platforms: cBioPortal

-

Document Status: Draft
Estimated Reading Time: 8 min

+

Document Status: Working
Estimated Reading Time: 8 min

Special acknowledgments

-

Functionality demonstrated in this vignette benefited greatly from -code originally written by hhunterzinck.

+

Utilities demonstrated in this vignette benefited greatly from code +originally written by hhunterzinck.

+
+
+

Important note +

+

The requirements for cBioPortal change, just like with any software +or database. The package is updated on a yearly submission cycle to keep pace, but there may be occasional points in time when this workflow is +out-of-date with the external system.

Intro @@ -107,7 +115,9 @@

Create a new study dataset

First create the study dataset “package” where we can put together the data. Each study dataset combines multiple data types – clinical, -gene expression, gene variants, etc.

+gene expression, gene variants, etc. The study meta file can be edited after it +has been created. This will also set the working directory to the new +study directory.

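For illustration, creating the study package might look like the short sketch below. The function name, arguments, and study identifier are assumptions based on this workflow description rather than a verbatim copy of the package interface, so check the package reference for the exact signature.

library(nfportalutils)

# Hypothetical sketch (names and arguments are assumptions -- see the package reference):
# create the study dataset "package" for a given study identifier; per the text above,
# the generated meta file can be edited afterwards, and the working directory is set
# to the new study directory.
cbp_new_study(cancer_study_identifier = "npst_nfosi_ntap_2022",
              name = "NPST NF-OSI NTAP 2022 (placeholder name)")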
@@ -219,21 +224,17 @@

Add clinical data

Validation

-

There are additional steps such as generating case lists and -validation that have to be done outside of the package with a -cBioPortal backend, where each portal may have specific configurations -(such as genomic reference) to validate against. See the general -docs for dataset validation.

-

For the public portal, the suggested step using the public -server is given below.

-

Assuming your present working directory is -~/datahub/public and a study folder called +

Validation has to be done with a cBioPortal instance. Each portal may +have specific configurations (such as genomic reference) to validate +against.

+

For a simple offline validation example, assuming you are +at ~/datahub/public and a study folder called npst_nfosi_ntap_2022 has been placed into it, mount the dataset into the container and run validation like:

STUDY=npst_nfosi_ntap_2022
-sudo docker run --rm -v $(pwd):/datahub cbioportal/cbioportal:5.4.7 validateStudies.py -d /datahub -l $STUDY -u http://cbioportal.org -html /datahub/$STUDY/html_report
-

The html report will list issues by data types to help with any -corrections needed.

+sudo docker run --rm -v $(pwd):/datahub cbioportal/cbioportal:6.0.25 validateData.py -s /datahub/$STUDY -n -v
+

See the general +docs for dataset validation for more examples.

diff --git a/pkgdown.yml b/pkgdown.yml index a9cf8d1..cdae681 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -7,4 +7,4 @@ articles: bringing-portal-data-to-other-platforms-cbioportal: bringing-portal-data-to-other-platforms-cbioportal.html revalidation-workflows: revalidation-workflows.html survey-public-files: survey-public-files.html -last_built: 2025-02-13T19:55Z +last_built: 2025-02-18T21:05Z diff --git a/reference/cbp_add_expression.html b/reference/cbp_add_expression.html index 9fa6d17..09a7f4a 100644 --- a/reference/cbp_add_expression.html +++ b/reference/cbp_add_expression.html @@ -1,11 +1,9 @@ Export and add expression data to cBioPortal dataset — cbp_add_expression • nfportalutils +cBioPortal has confirmed that they prefer normalized counts gene_tpm.tsv."> Skip to contents @@ -49,8 +47,7 @@

Export and add expression data to cBioPortal dataset

This should be run in an existing dataset package root. Note that there are a number of different options generated by the STAR Salmon pipeline. -cBioPortal has confirmed that they prefer normalized counts gene_tpm.tsv and, -though not used, find it helpful to also have raw counts gene_counts.tsv.

+cBioPortal has confirmed that they prefer normalized counts gene_tpm.tsv.

@@ -67,7 +64,7 @@

Arguments
expression_data_raw -

(Optional) Syn id of raw counts results. See details.

+

(Optional) Syn id of raw counts results; provide only if curators explicitly ask for it.

verbose
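As a usage illustration (the Synapse id below is a placeholder, and the first argument name is an assumption; only expression_data_raw is taken from the argument list above):

# Hypothetical call from the study package root; the "syn..." id is a placeholder.
cbp_add_expression(expression_data = "syn00000001",  # assumed argument name: normalized counts (gene_tpm.tsv)
                   expression_data_raw = NULL)       # raw counts only if curators explicitly ask for them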
diff --git a/reference/make_case_list_maf.html b/reference/make_case_list_maf.html new file mode 100644 index 0000000..8e99025 --- /dev/null +++ b/reference/make_case_list_maf.html @@ -0,0 +1,71 @@ + +Case lists for mutation samples — make_case_list_maf • nfportalutils + Skip to contents + + +
+
+
+ +
+

https://docs.cbioportal.org/file-formats/#case-lists

+
+ +
+

Usage

+
make_case_list_maf(cancer_study_identifier, verbose = TRUE)
+
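For example, from the study package root this could be called with the study identifier used elsewhere in these docs (verbose defaults to TRUE per the usage above):

make_case_list_maf(cancer_study_identifier = "npst_nfosi_ntap_2022")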
+ + +
+ + +
+ + + +
+ + + + + + + diff --git a/reference/write_cbio_clinical.html b/reference/write_cbio_clinical.html index aae48d1..0bcfe35 100644 --- a/reference/write_cbio_clinical.html +++ b/reference/write_cbio_clinical.html @@ -95,11 +95,13 @@

Arguments

Details

-

This relies on a ref_map specification to know which clinical data to include for cBioPortal +

This depends on a ref_map specification to know which clinical data to include for cBioPortal and how to segregate the clinical attributes into the right files. -For example, say df contains clinical variables A-X, but mappings are only specified for -variables A-C, L-M and others are not meant to be surfaced/made public. This will subset the df to what's specified in the mapping. -Conversely, if there is a mapping for variable Z that is not in the clinical data, this will throw error.

+Basically, ref_map decides which variables can be made public and how they should be represented in cBioPortal. +For example, given a table T on Synapse with variables A-Z and mappings in ref_map for A-C and L-M, the function first checks that all variables required by ref_map are present in T, +then creates the subset df from T using only the intersection of mapped and present variables.
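To make that logic concrete, here is a small R sketch; it assumes ref_map has already been read into a named list with a logical required flag per attribute, which is only an assumed representation, not the package's actual data structure or implementation.

# Illustrative sketch only -- not the package implementation.
# `df` is the clinical table pulled from Synapse; `ref_map` is assumed to be a named list
# keyed by source variable, each entry carrying a `required` flag.
mapped_vars   <- names(ref_map)
required_vars <- mapped_vars[vapply(ref_map, function(x) isTRUE(x$required), logical(1))]

# First, required variables must be present in the clinical data.
missing_required <- setdiff(required_vars, names(df))
if (length(missing_required) > 0) {
  stop("Missing required clinical variables: ", paste(missing_required, collapse = ", "))
}

# Then subset df to the intersection of mapped and present variables;
# unmapped variables are simply not surfaced/made public.
df <- df[, intersect(mapped_vars, names(df)), drop = FALSE]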