YODA principles #113

Merged
merged 31 commits into from
Sep 1, 2019
Changes from 22 commits
11fa1a4
add placeholder for YODA principles
adswa Aug 19, 2019
a1b4eb4
add an intro to the book part
adswa Aug 20, 2019
e65b97f
rename yoda file
adswa Aug 20, 2019
149a911
update contents
adswa Aug 21, 2019
fa90777
YODA and call the DataLad team geeks
adswa Aug 21, 2019
7a77819
summarize YODA practices
adswa Aug 21, 2019
7cfa376
add yoda
adswa Aug 21, 2019
567d965
include yoda
adswa Aug 21, 2019
16b022b
add svg for yoda and modular datasets
adswa Aug 21, 2019
13b7423
start restructuring
adswa Aug 21, 2019
38bd4db
add ref to siblings
adswa Aug 21, 2019
dd17f9f
svg instead of png
adswa Aug 21, 2019
06c94dd
add full YODA wf image
adswa Aug 21, 2019
2b9af3b
WIP on P1 & P2
adswa Aug 21, 2019
86b1a81
add data_origin.svg for P2
adswa Aug 21, 2019
bcb6d78
Merge branch 'master' of github.com:datalad-handbook/book into yoda
adswa Aug 22, 2019
f775916
Merge branch 'master' of github.com:datalad-handbook/book into yoda
adswa Aug 22, 2019
2aa1883
start with a bad example
adswa Aug 22, 2019
e87d789
finalize a first draft
adswa Aug 22, 2019
8a6603c
typos, formatting, tweaks
adswa Aug 26, 2019
ba07a6d
move summary into dedicated page
adswa Aug 26, 2019
103af4e
add yoda procedure
adswa Aug 26, 2019
69694d2
WIP: attempting an outlook in the introduction
adswa Aug 27, 2019
aefd539
first round of comments
adswa Aug 27, 2019
c7fd16d
P1: link figures a bit better
adswa Aug 28, 2019
f52b4a9
finish a high-level overview of what will be learned with the handbook
adswa Aug 29, 2019
a47fe56
upper case heading
adswa Aug 29, 2019
8ef19b9
link FAIR website
adswa Aug 29, 2019
6d93554
explicitly state dataset nesting
adswa Aug 29, 2019
a58c663
Merge branch 'master' of github.com:datalad-handbook/book into yoda
adswa Aug 29, 2019
98798df
add missing links to yoda elsewhere
adswa Aug 29, 2019
23 changes: 23 additions & 0 deletions docs/basics/101-122-intro.rst
@@ -0,0 +1,23 @@
.. _intromidterm:

A Data Analysis Project with DataLad
------------------------------------


Time flies and the semester rapidly approaches the midterms.
In DataLad-101, students are not given an exam -- instead, they are
asked to complete and submit a data analysis project with DataLad.

The lecturer hands out the requirements: The project needs to

- be prepared in the form of a DataLad dataset
- contain a data analysis performed with Python tools
Collaborator

haven't read in full yet, but this wants to tell me that something in here is Python-specific, and I can ignore it for other projects

Contributor Author

That's true, I get your point. I will try to reduce that feeling, if we keep this.
Can I get your take on the general idea? It was to put the YODA principles into the context of the narrative, which I thought would be easiest in the context of a data analysis. My initial idea was:

However, thinking about this now, it also feels like a lot in a single chapter (Yoda, Python API, datalad publish). The alternative would be to have individual parts in their own chapters or as parts of other chapters, and then combine/apply them in a single section.

I'm undecided yet, so if anyone has preferences...

- incorporate DataLad whenever possible (data retrieval, publication,
script execution, general version control), and
- comply with the YODA principles

Luckily, the midterms are only in a couple of weeks, and a lot of the
requirements of the project will be taught in the upcoming sessions.
Therefore, there's little you can do to prepare for the midterm
other than to be extra attentive in the next lectures on the YODA
principles and DataLad's Python API.
433 changes: 433 additions & 0 deletions docs/basics/101-123-yoda.rst

Large diffs are not rendered by default.

53 changes: 53 additions & 0 deletions docs/basics/101-124-summary.rst
@@ -0,0 +1,53 @@
.. _summary_yoda:

Summary: YODA principles
------------------------

The YODA principles are a small set of guidelines that can make a huge
difference towards reproducibility, comprehensibility, and transparency
in a data analysis project.

These standards are not complex -- quite the opposite, they are very
intuitive. They structure essential components of a data analysis project --
data, code, computational environments, and lastly also the results --
in a modular and practical way, and use basic principles and commands
of DataLad you are already familiar with.

There are many advantages to this organization of contents.

- Having input data as independent dataset(s) that are not influenced (only
consumed) by an analysis allows for a modular reuse of pure data datasets,
and does not conflate the data of an analysis with the results or the code.

- Keeping code within an independent, version-controlled directory, but as a part
of the analysis dataset, makes sharing code easy and transparent, and helps
to keep directories neat and organized. Moreover,
with the data as subdatasets, data and code can be automatically shared together.

- Including the computational environment in an analysis dataset encapsulates
software and software versions, and thus prevents re-computation failures
(or sudden differences in the results) once
software is updated, as well as software conflicts arising on machines
other than the one the analysis was originally conducted on.

- Having all of these components as part of a DataLad dataset allows version
controlling all pieces within the analysis regardless of their size, and
generates provenance for everything, especially if you make use of the tools
that DataLad provides.

- The yoda procedure is a good starting point to build your next data analysis
project upon.
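
As a minimal sketch of that last point, the snippet below assembles the
``datalad create -c yoda`` call that sets up a new dataset with the yoda
procedure. The dataset name ``midterm_project`` and both helper functions
are hypothetical and only serve as an illustration; the command is executed
only if a ``datalad`` executable is found on your system.

```python
import shutil
import subprocess


def yoda_create_command(path):
    """Build the command line that creates a dataset with the yoda procedure."""
    # `-c yoda` asks `datalad create` to run the yoda configuration procedure.
    return ["datalad", "create", "-c", "yoda", path]


def create_yoda_dataset(path):
    """Run the command if DataLad is installed; return True if it was run."""
    if shutil.which("datalad") is None:
        # DataLad is not on the PATH: skip instead of failing.
        return False
    subprocess.run(yoda_create_command(path), check=True)
    return True


if __name__ == "__main__":
    # "midterm_project" is a hypothetical dataset name.
    print(" ".join(yoda_create_command("midterm_project")))
```

Calling ``create_yoda_dataset("midterm_project")`` would then create the
dataset in the current directory, from where input data can be added as
subdatasets.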

Now what can I do with it?
^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the tools that DataLad provides, you are able to make the most out of
your data analysis project. The YODA principles are a guide to accompany
you on your path to reproducibility.

What should have become clear in this section is that you are already
equipped with enough DataLad tools and knowledge that complying with these
standards will feel completely natural and effortless in your next analysis
project.
The next section will add to your existing skills by demonstrating how to
also use DataLad within Python scripts.
12 changes: 12 additions & 0 deletions docs/contents.rst.inc
@@ -93,6 +93,18 @@ Help yourself

basics/101-135-help

#########################
Data analyses in datasets
#########################

.. toctree::
:maxdepth: 1
:caption: Organizational principles and best practices for data analyses

basics/101-122-intro
basics/101-123-yoda
basics/101-124-summary

#########
Use Cases
#########
303 changes: 303 additions & 0 deletions docs/img/data_origin.svg
5,153 changes: 5,153 additions & 0 deletions docs/img/dataset_modules.svg