Merge pull request #113 from datalad-handbook/yoda
YODA principles
mih authored Sep 1, 2019
2 parents e14667e + 98798df commit e25a68d
Showing 12 changed files with 13,539 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/basics/101-108-run.rst
@@ -39,7 +39,7 @@ to generate a text file that lists speaker and title
name instead.

To do this, we're following a best practice that will reappear in the
-later section on YODA principles (todo: link): Collecting all
+later section on `YODA principles <101-123-yoda.html>`_ : Collecting all
additional scripts that work with content of a subdataset *outside*
of this subdataset, in a dedicated ``code/`` directory,
and collating the output of the execution of these scripts
23 changes: 23 additions & 0 deletions docs/basics/101-122-intro.rst
@@ -0,0 +1,23 @@
.. _intromidterm:

A Data Analysis Project with DataLad
------------------------------------


Time flies, and the semester is rapidly approaching the midterms.
In DataLad-101, students are not given an exam -- instead, they are
asked to complete and submit a data analysis project with DataLad.

The lecturer hands out the requirements. The project needs to

- be prepared in the form of a DataLad dataset,
- contain a data analysis performed with Python tools,
- incorporate DataLad whenever possible (data retrieval, publication,
  script execution, general version control), and
- comply with the YODA principles.

Luckily, although the midterms are only a couple of weeks away, many of the
project requirements will be taught in the upcoming sessions.
Therefore, there is little you can do to prepare for the midterm
other than to be extra attentive in the next lectures on the YODA
principles and DataLad's Python API.
461 changes: 461 additions & 0 deletions docs/basics/101-123-yoda.rst

Large diffs are not rendered by default.

53 changes: 53 additions & 0 deletions docs/basics/101-124-summary.rst
@@ -0,0 +1,53 @@
.. _summary_yoda:

Summary: YODA principles
------------------------

The YODA principles are a small set of guidelines that can make a huge
difference to the reproducibility, comprehensibility, and transparency
of a data analysis project.

These standards are not complex -- quite the opposite, they are very
intuitive. They structure the essential components of a data analysis project --
data, code, computational environments, and finally the results --
in a modular and practical way, and they rely on basic principles and commands
of DataLad that you are already familiar with.
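To make this modular structure concrete, here is a minimal sketch in plain Python (standard library only) of the directory layout such a project might have. It does not create a DataLad dataset: in a real project you would create the dataset and install input data as subdatasets with DataLad commands. Only ``code/`` is taken from the text; the ``inputs/`` and ``envs/`` names and the ``sketch_yoda_layout`` function are hypothetical, and results are left out of the sketch.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Component directories of a YODA-style analysis project. Only code/ is
# named in the text; inputs/ and envs/ are hypothetical placeholder names.
COMPONENTS = {
    "code": "analysis scripts, versioned as part of the analysis dataset",
    "inputs": "input data, each source installed as its own subdataset",
    "envs": "computational environment(s), e.g. container images",
}


def sketch_yoda_layout(root: Path) -> list[str]:
    """Create one directory per component, each with a README describing
    its purpose, and return all paths under root (relative, sorted)."""
    for name, purpose in COMPONENTS.items():
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        # A README documents what belongs in each directory.
        (directory / "README.md").write_text(f"# {name}/\n\n{purpose}\n")
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in sketch_yoda_layout(Path(tmp)):
            print(path)
```

Keeping each component in its own top-level directory is what lets data, code, and environments be versioned, shared, and reused independently.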

This organization of content has many advantages:

- Having input data as independent dataset(s) that are not influenced (only
  consumed) by an analysis allows modular reuse of pure data datasets,
  and does not conflate the data of an analysis with the results or the code.

- Keeping code within an independent, version-controlled directory, but as a part
  of the analysis dataset, makes sharing code easy and transparent, and helps
  to keep directories neat and organized. Moreover,
  with the data as subdatasets, data and code can be automatically shared together.

- Including the computational environment in an analysis dataset encapsulates
  software and software versions. This prevents re-computation failures
  (or sudden differences in the results) once software is updated, as well as
  software conflicts on machines other than the one the analysis was
  originally conducted on.

- Having all of these components as part of a DataLad dataset allows version
  control of all pieces of the analysis regardless of their size, and
  generates provenance for everything, especially if you make use of the tools
  that DataLad provides.

- The yoda procedure is a good starting point on which to build your next
  data analysis project.
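Whether a project follows this layout can be checked in part automatically. The sketch below is a heuristic, not a DataLad feature: it relies only on the ``code/`` convention and on the fact that DataLad records subdatasets as Git submodules in the superdataset's ``.gitmodules`` file. The function names ``registered_subdatasets`` and ``check_yoda_layout`` are hypothetical, and an analysis without external input data would legitimately fail the second check.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Matches "path = <relative path>" lines inside .gitmodules entries;
# other keys (url, datalad-id, ...) are ignored.
_PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def registered_subdatasets(root: Path) -> list[str]:
    """Return the subdataset paths recorded in root/.gitmodules (hypothetical helper)."""
    gitmodules = root / ".gitmodules"
    if not gitmodules.is_file():
        return []
    paths = []
    for line in gitmodules.read_text().splitlines():
        match = _PATH_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def check_yoda_layout(root: Path) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if not (root / "code").is_dir():
        problems.append("no code/ directory for analysis scripts")
    if not registered_subdatasets(root):
        problems.append("no input data registered as subdatasets")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(check_yoda_layout(root))  # both problems are reported
        (root / "code").mkdir()
        # Placeholder submodule entry; the URL is illustrative only.
        (root / ".gitmodules").write_text(
            '[submodule "inputs/raw"]\n'
            "\tpath = inputs/raw\n"
            "\turl = https://example.com/raw\n"
        )
        print(check_yoda_layout(root))  # []
```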

Now what can I do with it?
^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the tools DataLad provides, you can make the most of
your data analysis project. The YODA principles are a guide to accompany
you on your path to reproducibility.

What should have become clear in this section is that you are already
equipped with enough DataLad tools and knowledge that complying with these
standards will feel completely natural and effortless in your next analysis
project.
The next section will build on your existing skills by demonstrating how to
use DataLad within Python scripts as well.
12 changes: 12 additions & 0 deletions docs/contents.rst.inc
@@ -93,6 +93,18 @@ Help yourself

   basics/101-135-help

#########################
Data analyses in datasets
#########################

.. toctree::
   :maxdepth: 1
   :caption: Organizational principles and best practices for data analyses

   basics/101-122-intro
   basics/101-123-yoda
   basics/101-124-summary

#########
Use Cases
#########
303 changes: 303 additions & 0 deletions docs/img/data_origin.svg
5,153 changes: 5,153 additions & 0 deletions docs/img/dataset_modules.svg