How can admins enable parallel processing for qiime2 tools? #58
I think one question is: what are the exact semantics of GALAXY_SLOTS? Are these available threads/cores, or something more abstract?
Found the docs here; looks straightforward. I think we can update, and I think we would not do anything for our
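Per those docs, GALAXY_SLOTS carries the number of cores the job runner allocated to the job. A minimal sketch of reading it directly from a tool, with a safe fallback for local runs (the helper name is illustrative, not part of qiime2):

```python
import os

def allocated_cores(default=1):
    """Number of cores Galaxy allocated to this job.

    GALAXY_SLOTS is set by the job runner; fall back to `default`
    when it is unset or not a valid integer (e.g. outside Galaxy).
    """
    try:
        return int(os.environ.get("GALAXY_SLOTS", default))
    except ValueError:
        return default

print(allocated_cores())
```

Reading the variable directly (rather than threading it through a CLI flag) keeps the tool XML free of a user-facing cores parameter, which matches the direction of this thread.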
Also, out of curiosity, is there any mechanism to submit a new job from inside a Galaxy job and retain some kind of reference/future to it? @Oddant1 is refactoring some stuff with our parallel processing and there's an outside chance we could make this happen if such an API existed and server admins were amenable to the concept of it.
This sounds good to me. I think it would be good to add resource requirements to the tools, since otherwise admins (or dynamic job rules) have no means to judge which tools support parallelism:

```xml
<requirements>
    ...
    <resource type="min_cores">X</resource>
    <resource type="max_cores">Y</resource>
</requirements>
```

For completeness, there are more resource types; see here. Edit: for instance, you could add
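To illustrate why declared resources help, here is a hypothetical, self-contained sketch of the kind of decision a dynamic job rule could make with them. The destination names, core counts, and the shape of the resources dict are all invented for illustration; this is not Galaxy's actual rule API:

```python
def route_by_cores(tool_resources, destinations):
    """Pick the first destination whose core count satisfies the
    tool's declared min_cores; fall back to a default destination.

    tool_resources: e.g. {"min_cores": 8} parsed from the tool XML
    destinations:   mapping of destination name -> available cores
    """
    min_cores = tool_resources.get("min_cores", 1)
    for name, cores in destinations.items():
        if cores >= min_cores:
            return name
    return "default"

# Hypothetical cluster layout.
dests = {"small_node": 4, "big_node": 32}
print(route_by_cores({"min_cores": 8}, dests))  # big_node
```

Without the `<resource>` declarations, a rule like this has nothing to go on and every tool looks single-core.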
There is an API that allows executing Galaxy tools that are installed on a Galaxy instance. But doing this from inside tools seems like a bad idea, because it would be difficult for users to trace what has been executed. You could only run existing Galaxy tools, and I think this would be much better implemented as a workflow. Also, I do not know if we can assume that the Galaxy instance can be reached from the executing host. One also needs to keep in mind the diversity of Galaxy job runners (local, SLURM, AWS, Pulsar, ...). So I do not think that there can be a single mechanism, and intuitively I would think that subprocesses / threads are the way to go. Might it be an option to tweak the granularity of the tools in case you need parallelism beyond a single compute node? E.g. by splitting inputs / making the jobs that are subprocesses separate tools?
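The single-node subprocesses/threads approach can be sketched by sizing a worker pool from GALAXY_SLOTS (a thread pool is used here for simplicity; the `square` workload is a placeholder):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Stand-in for a real per-item unit of work.
    return x * x

# Respect the core allocation Galaxy gives the job; default to 1 worker.
n_workers = int(os.environ.get("GALAXY_SLOTS", "1"))

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    # map() preserves input order in its results.
    results = list(pool.map(square, range(8)))

print(results)
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` gives true subprocess parallelism with the same interface, which matches the "subprocesses / threads" suggestion above.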
That makes sense, and I think we could test for it and do something else. But this felt like a long shot either way.
Yep, that all makes sense. I think this just leaves us where we were anticipating. For cross-node parallelism in Galaxy, the answer is: partition your data into a Collection (which maps to the Galaxy Collection) and then proceed normally. Since most of our metagenomic tools are written in this split-apply-combine style inside QIIME 2 pipeline actions, the inner methods already exist, so users should just get in the habit of using them directly instead of the simpler one-shot pipeline actions.
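The split-apply-combine pattern described above can be sketched generically; the function names are illustrative stand-ins, not QIIME 2 APIs, and each `apply` call corresponds to one Galaxy job over one collection element:

```python
def split(seq, n_parts):
    """Partition a sequence into n_parts disjoint slices (striped)."""
    return [seq[i::n_parts] for i in range(n_parts)]

def apply(part):
    """Per-partition work; summing is a stand-in for an inner method."""
    return sum(part)

def combine(partials):
    """Merge the per-partition results into the final answer."""
    return sum(partials)

data = list(range(100))
parts = split(data, 4)
total = combine(apply(p) for p in parts)
print(total)  # 4950
```

Because each partition is processed independently, the scheduler (Galaxy, in this analogy) is free to place each `apply` on a different node, which is exactly what mapping over a Collection buys you.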
Perfect, that should map to our
Hi @ebolyen and @colinvwood,

In #47 the parameters setting the number of cores have been removed from the XML (which was the right thing to do).

I'm wondering how the number of cores can now be set (by admins). Typically Galaxy tools use the GALAXY_SLOTS environment variable (e.g. here) and pass it via a CLI parameter. Alternatively, qiime2 tools could of course directly access the variable.
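The pass-it-via-CLI pattern mentioned above can be sketched as a hypothetical wrapper that defaults a `--cores` flag from GALAXY_SLOTS, so admins control parallelism through the job runner while the flag stays overridable; the flag name is an assumption, not an existing qiime2 option:

```python
import argparse
import os

parser = argparse.ArgumentParser()
# Default the core count from Galaxy's allocation; 1 outside Galaxy.
parser.add_argument(
    "--cores",
    type=int,
    default=int(os.environ.get("GALAXY_SLOTS", "1")),
)

# An empty argv keeps the sketch self-contained; a real wrapper
# would call parse_args() on the actual command line.
args = parser.parse_args([])
print(args.cores)
```

In a tool XML this is typically wired up as `--cores "\${GALAXY_SLOTS:-1}"` in the command block, with the shell fallback covering runners that do not set the variable.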