From 6baff6488fafacdc5915666530dddf0fd4deeba2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Szczepanik?=
Date: Mon, 27 Nov 2023 18:43:08 +0100
Subject: [PATCH 01/21] Fix script names in documentation

This fixes a previous omission and updates usage help snippets to
match the documentation headings (and actual script names).
---
 docs/source/admin.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/admin.rst b/docs/source/admin.rst
index ad2605d..c0e5094 100644
--- a/docs/source/admin.rst
+++ b/docs/source/admin.rst
@@ -112,8 +112,8 @@ files, in JSON format, in the study directory:
 
 .. code-block:: console
 
-   $ icf-utils getmeta_studyvisit -h
-   usage: getmeta_studyvisit [-h] [-o PATH] --id STUDY-ID VISIT-ID
+   $ icf-utils deposit_visit_metadata -h
+   usage: deposit_visit_metadata [-h] [-o PATH] --id STUDY-ID VISIT-ID
 
 ``deposit_visit_dataset``
 """""""""""""""""""""""""
@@ -136,8 +136,8 @@ represents the actual dataset as a compressed archive.
 
 .. code-block:: console
 
-   $ icf-utils dataladify_studyvisit_from_meta -h
-   usage: dataladify_studyvisit_from_meta [-h] [-o PATH] --id STUDY-ID VISIT-ID
+   $ icf-utils deposit_visit_dataset -h
+   usage: deposit_visit_dataset [-h] --id STUDY-ID VISIT-ID [-o PATH] [--store-url URL]
 
 ``catalogify_studyvisit_from_meta``
 """""""""""""""""""""""""""""""""""
@@ -149,4 +149,4 @@ folder in the study directory.
 
 .. code-block:: console
 
    $ icf-utils dataladify_studyvisit_from_meta --help
-   usage: dataladify_studyvisit_from_meta [-h] [-o PATH] --id STUDY-ID VISIT-ID
+   usage: catalogify_studyvisit_from_meta [-h] [-o PATH] --id STUDY-ID VISIT-ID

From a723682c7ed023bcc1d072e23966e4b2ed47bb76 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Szczepanik?=
Date: Wed, 29 Nov 2023 13:59:24 +0100
Subject: [PATCH 02/21] Update docs requirements

Seeing that a build on Sphinx 7.0.1 succeeded, but a build on Sphinx
7.2.6 failed with the Furo 2023.5.20 theme, I decided to bump the Furo
version and narrow Sphinx down to a minor release.

Furo changelog: https://pradyunsg.me/furo/changelog/

If you are reading this commit message and considering updating or
loosening the dependencies, you can probably do so without issues. I
suppose even now an unpinned dependency would work; it is probably
only on rare occasions, when a released version of the theme needs to
catch up with Sphinx development, that problems arise, which is likely
what caused us to introduce pinning in the first place. In either
case, it would seem that pinning Furo to a specific version, but
Sphinx only to a major release, is not a good idea.
---
 docs/requirements.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/requirements.txt b/docs/requirements.txt
index bed3e43..ed6ac1f 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,2 +1,2 @@
-Sphinx >= 7.0, < 8.0
-furo==2023.5.20
+Sphinx == 7.2.*
+furo == 2023.9.10

From 79fee18fbd2cb111435fb0829760e0e42722be5c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Szczepanik?=
Date: Wed, 29 Nov 2023 16:43:40 +0100
Subject: [PATCH 03/21] Draft the user docs for generating DataLad datasets

---
 docs/source/user/datalad-generate.rst | 64 +++++++++++++++++++++++++++
 docs/source/user/index.rst            |  1 +
 2 files changed, 65 insertions(+)
 create mode 100644 docs/source/user/datalad-generate.rst

diff --git a/docs/source/user/datalad-generate.rst b/docs/source/user/datalad-generate.rst
new file mode 100644
index 0000000..37f459d
--- /dev/null
+++ b/docs/source/user/datalad-generate.rst
@@ -0,0 +1,64 @@
+.. _dl-generate:
+
+Generate DataLad datasets
+-------------------------
+
+The ICF archive for a given project contains DICOM files packaged in
+tar archives (DICOM tarballs). In this section we describe creating
+DataLad datasets, which index the content and location of these
+tarballs, for DataLad-based access on institute-local infrastructure.
+
+In principle, such datasets are *lightweight*, meaning that they only
+index the content that can be retrieved from the ICF archive (all
+access restrictions apply). Using DataLad can simplify local access,
+allow raw data versioning, and enable logical transformations of the
+DICOM folder structure - see :ref:`dl-advanced` for examples of the
+latter.
+
+Obtain the tarball
+^^^^^^^^^^^^^^^^^^
+
+First, create an empty directory to be the local dataset store. The
+last path component must be the ``project-ID`` used by the ICF store,
+because the following commands use project and visit IDs to determine
+paths.
+
+.. code-block:: bash
+
+   mkdir -p local_dicomstore/
+
+Download the visit tarball:
+
+.. code-block:: bash
+
+   cd local_dicomstore/
+   datalad download ...
+   cd ../..
+
+Deposit visit metadata alongside tarball
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: bash
+
+   singularity run -B $STORE_DIR icf.sif deposit_visit_metadata --store-dir $STORE_DIR --id
+
+Deposit dataset alongside tarball
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A DataLad dataset is created based on the metadata extracted in the
+previous step. Additionally, you need to provide the base URL of the
+ICF store, ```` (this base URL should not contain study
+or visit ID). The URL, combined with respective IDs, will be
+registered in the dataset as the source of the DICOM tarball, and used
+for retrieval by dataset clones.
+
+.. code-block:: bash
+
+   singularity run -B $STORE_DIR icf.sif deposit_visit_dataset --store-dir $STORE_DIR --store-url
+
+This will produce two files, ...
+
+Remove the tarball
+^^^^^^^^^^^^^^^^^^
+
+The DICOM tarball can be safely removed.
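The store layout that the drafted commands assume can be sketched with plain shell. Note that the study and visit IDs below are hypothetical placeholders, and the ``_dicom.tar`` file name suffix is taken from the removal command used later in this patch series:

```shell
# Hypothetical IDs, used only to illustrate the layout
STUDY_ID='my-study'
VISIT_ID='P000123'

# The last path component of the local store is the project/study ID
mkdir -p "local_dicomstore/${STUDY_ID}"

# A downloaded visit tarball would sit inside the study directory;
# created empty here purely to show where it goes
touch "local_dicomstore/${STUDY_ID}/${VISIT_ID}_dicom.tar"

ls "local_dicomstore/${STUDY_ID}"
```

The metadata and dataset files deposited by the subsequent commands end up alongside this tarball, keyed by the same visit ID.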
diff --git a/docs/source/user/index.rst b/docs/source/user/index.rst
index fcee949..cc60f64 100644
--- a/docs/source/user/index.rst
+++ b/docs/source/user/index.rst
@@ -15,5 +15,6 @@ Please contact `ICF personnel`_ to get access and for any authentication-related
    :caption: Contents:
 
    browser
+   datalad-generate
    datalad
    datalad-advanced

From 9064bcb601c3a07ef3ee7af5db3d23abdb6fbe56 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Szczepanik?=
Date: Tue, 12 Dec 2023 18:54:31 +0100
Subject: [PATCH 04/21] Fill in most blanks in DataLad dataset generation

---
 docs/source/user/datalad-generate.rst | 40 ++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/docs/source/user/datalad-generate.rst b/docs/source/user/datalad-generate.rst
index 37f459d..035cbb7 100644
--- a/docs/source/user/datalad-generate.rst
+++ b/docs/source/user/datalad-generate.rst
@@ -35,12 +35,31 @@ Download the visit tarball:
    datalad download ...
    cd ../..
 
+For the following examples, the *absolute path* to the local DICOM
+store will be represented by ``$STORE_DIR``:
+
+.. code-block:: bash
+
+   export STORE_DIR=$PWD/local_dicomstore
+
+
 Deposit visit metadata alongside tarball
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+Information required to create a DataLad dataset needs to be extracted
+from the tarball:
+
 .. code-block:: bash
 
-   singularity run -B $STORE_DIR icf.sif deposit_visit_metadata --store-dir $STORE_DIR --id
+   singularity run -B $STORE_DIR icf.sif deposit_visit_metadata \
+       --store-dir $STORE_DIR --id
+
+This will generate two files, ``_metadata_dicoms.json`` and
+``_metadata_tarball.json``, and place them alongside the
+tarball. The former contains metadata describing individual files
+within the tarball (relative path, MD5 checksum, size, and a small
+subset of DICOM headers describing acquisition type), and the latter
+describes the tarball itself.
 
 Deposit dataset alongside tarball
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -54,11 +73,24 @@ for retrieval by dataset clones.
 
 .. code-block:: bash
 
-   singularity run -B $STORE_DIR icf.sif deposit_visit_dataset --store-dir $STORE_DIR --store-url
+   singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \
+       --store-dir $STORE_DIR --store-url
 
-This will produce two files, ...
+This will produce two files, ``_XDLA--refs`` and
+``_XDLA--repo-export`` (a text file and a zip archive,
+respectively). Together, they are a representation of a (lightweight)
+DataLad dataset, and contain the information necessary to retrieve the
+data content with DataLad (but do not contain the data content
+itself).
 
 Remove the tarball
 ^^^^^^^^^^^^^^^^^^
 
-The DICOM tarball can be safely removed.
+Finally, the DICOM tarball can be safely removed.
+
+.. code-block:: bash
+
+   rm local_dicomstore//_dicom.tar
+
+The local DICOM store can be used as a DataLad entry point for
+obtaining the DICOM files.

From 82026ff5328d7016fa112b6af63a2f560202bd24 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Szczepanik?=
Date: Fri, 15 Dec 2023 17:49:25 +0100
Subject: [PATCH 05/21] docs: rename datalad-based access

---
 docs/source/user/{datalad.rst => datalad-access.rst} | 0
 docs/source/user/index.rst                           | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename docs/source/user/{datalad.rst => datalad-access.rst} (100%)

diff --git a/docs/source/user/datalad.rst b/docs/source/user/datalad-access.rst
similarity index 100%
rename from docs/source/user/datalad.rst
rename to docs/source/user/datalad-access.rst
diff --git a/docs/source/user/index.rst b/docs/source/user/index.rst
index cc60f64..3625e02 100644
--- a/docs/source/user/index.rst
+++ b/docs/source/user/index.rst
@@ -16,5 +16,5 @@ Please contact `ICF personnel`_ to get access and for any authentication-related
 
    browser
    datalad-generate
-   datalad
+   datalad-access
    datalad-advanced

From e4bc0f5f40ce4c2f973027705204a189c3dcdbd5 Mon Sep 17 00:00:00 2001
From:
=?UTF-8?q?Micha=C5=82=20Szczepanik?=
Date: Fri, 15 Dec 2023 18:23:51 +0100
Subject: [PATCH 06/21] Rewrite datalad-based access

This changes the store base URL used in the examples from
https://data.inm-icf.de to file:///data/group/..., reflecting the fact
that the DataLad datasets are not provided by the ICF -- but they can
be generated on local infrastructure instead. The title is changed
(making it more similar to the previous section), and some
explanations are slightly tweaked.
---
 docs/source/user/datalad-access.rst | 42 ++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/docs/source/user/datalad-access.rst b/docs/source/user/datalad-access.rst
index a4a3001..f03343d 100644
--- a/docs/source/user/datalad-access.rst
+++ b/docs/source/user/datalad-access.rst
@@ -1,5 +1,7 @@
-DataLad-based access
---------------------
+.. _dl-access:
+
+Access data with DataLad
+------------------------
 
 Software requirements
 ^^^^^^^^^^^^^^^^^^^^^
@@ -44,41 +46,43 @@ to DataLad.
 Clone & get
 ^^^^^^^^^^^
 
-A visit dataset can be cloned with DataLad from a URL containing the
-following components:
+If a visit dataset has been prepared (procedure described in
+:ref:`dl-generate`) and saved in an accessible location, it can be
+cloned with DataLad from a URL containing the following components:
 
-* store base URL (e.g., ``https://data.inm-icf.de``)
+* a set of configuration parameters, always constant
+* store base URL (e.g., ``file:///data/group/groupname/local_dicom_store``) [2]_
 * study ID (e.g., ``my-study``)
 * visit ID (e.g., ``P000123``)
-* a set of additional parameters, always constant
+* a file name suffix / template, ``_annex{{annex_key}}`` (verbatim), always constant
 
 The pattern for the URL is::
 
-    'datalad-annex::?type=external&externaltype=uncurl&url=//_{{annex_key}}&encryption=none'
+    'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=//_{{annex_key}}'
 
 Given the exemplary values above, the pattern would expand to
 
 .. code-block::
 
-    'datalad-annex::?type=external&externaltype=uncurl&url=https://data.inm-icf.de/my-study/P000123_{{annex_key}}&encryption=none'
+    'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/my-study/P000123_{{annex_key}}'
 
-.. note:: The URL is arguably a bit clunky. A convenience short cut can be provided via configuration item ``datalad.clone.url-substitute.
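The expansion of the URL pattern is purely mechanical, so it can be sketched with plain shell variable substitution; the values below are the hypothetical examples from the documentation text, and ``{{annex_key}}`` is kept verbatim:

```shell
# Components from the documentation's example (hypothetical values)
STORE_URL='file:///data/group/groupname/local_dicom_store'
STUDY_ID='my-study'
VISIT_ID='P000123'

# Assemble the datalad-annex:: clone URL; {{annex_key}} stays literal
# here and is only filled in later by the special remote machinery
CLONE_URL="datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=${STORE_URL}/${STUDY_ID}/${VISIT_ID}_{{annex_key}}"
echo "$CLONE_URL"
```

The resulting string would then be passed as-is to ``datalad clone`` (quoted, since it contains ``&`` characters).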