The original thinking was to create a new API layer around a new run-record with a new executor (special remote). With CWL having entered the picture, it is worth taking a step back and re-evaluating the encapsulation. Rationale: if we adopt CWL, we might as well make it maximally useful, and not just an internal tool.
A main attraction of CWL is that it is its own ecosystem, and connecting to it rather than reimplementing it is good. Having compute instructions defined as CWL "steps" that could be linked into larger workflows and executed (outside datalad) via standard batch systems would be great. In such a scenario, we would need to make sure that the versioning precision and data provisioning capabilities of datalad remain available.
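To make this concrete, a payload computation could be expressed as a plain CWL `CommandLineTool` that knows nothing about datalad and only sees standard CWL types. A minimal sketch (the `md5sum` call merely stands in for an arbitrary computation; file and parameter names are illustrative):

```yaml
# payload-step.cwl -- a payload computation as an ordinary CWL tool.
# It operates only on a standard CWL File input and is usable in any
# CWL workflow, with or without datalad in the picture.
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [md5sum]
inputs:
  data_file:
    type: File
    inputBinding:
      position: 1
stdout: checksum.txt
outputs:
  checksum:
    type: File
    outputBinding:
      glob: checksum.txt
```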
One way to achieve this would be to have a dedicated provisioning workflow step. It would use a dedicated tool to create a suitable working environment for a subsequent payload computation step. This datalad tool could obtain/check out/pre-populate a dataset from any supported source/identifier, and then hand over to the next step, in which standard CWL input types like `File` make sense and are sufficient.
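Sketched as CWL, such a two-step workflow could look roughly like this. `provision.cwl` stands for the hypothetical datalad provisioning tool described above, `payload-step.cwl` is the payload tool sketched earlier, and all input/output names are made up for illustration:

```yaml
# compute-with-provisioning.cwl -- sketch of a workflow that chains a
# provisioning step (datalad-aware) with a payload step (datalad-agnostic).
cwlVersion: v1.2
class: Workflow
inputs:
  dataset_url: string      # any supported source/identifier
  dataset_version: string  # the exact commit/tag to provision
  wanted_path: string      # relative path of the file the payload needs
outputs:
  result:
    type: File
    outputSource: compute/checksum
steps:
  provision:
    run: provision.cwl       # hypothetical datalad provisioning tool
    in:
      url: dataset_url
      version: dataset_version
      path: wanted_path
    out: [requested_file]    # the obtained content, exposed as a plain File
  compute:
    run: payload-step.cwl    # the payload step sketched above
    in:
      data_file: provision/requested_file
    out: [checksum]
```

The payload step only ever sees a standard `File`, so it remains usable in any CWL context; only the provisioning step needs to know anything about datasets and versions.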
This approach would also have the advantage that the provisioning provenance is captured automatically.
It would also be a "loose" CWL integration: using any of these tools inside or outside CWL makes sense, and is possible without making either system aware of the other.
There is also no need for a dedicated datalad tool to be the only provisioning solution. It would be perfectly fine to have a series of git-clone/annex-init/annex-get commands.
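For illustration, such a plain git/git-annex provisioning step could itself be wrapped as a CWL tool. This is only a sketch (a possible drop-in for the hypothetical `provision.cwl` referenced above; the command sequence and names are assumptions, not an existing tool):

```yaml
# provision-plain.cwl -- provisioning built only from stock git/git-annex
# commands, no dedicated datalad tool required.
cwlVersion: v1.2
class: CommandLineTool
requirements:
  ShellCommandRequirement: {}
inputs:
  url:
    type: string
  version:
    type: string
  path:
    type: string
arguments:
  - shellQuote: false
    valueFrom: >
      git clone $(inputs.url) worktree &&
      git -C worktree checkout $(inputs.version) &&
      git -C worktree annex init &&
      git -C worktree annex get $(inputs.path)
outputs:
  requested_file:
    type: File
    outputBinding:
      glob: worktree/$(inputs.path)
```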
A `remake` special remote could then make smart decisions. It could

- run `cwltool` directly on the worktree of a dataset (whenever it has the right version and all needed content present), or
- auto-generate a workflow that uses a provisioning helper to build an adequate worktree for a CWL payload-workflow to run.