How can admins enable parallel processing for qiime2 tools #58

Open
bernt-matthias opened this issue Nov 19, 2024 · 7 comments

Hi @ebolyen and @colinvwood

In #47, the parameters setting the number of cores were removed from the XML (which was the right thing to do).

I'm wondering how the number of cores can now be set (by admins). Typically, Galaxy tools use the GALAXY_SLOTS environment variable (e.g. here) and pass it via a CLI parameter. Alternatively, qiime2 tools could of course access the variable directly.
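For illustration, a minimal sketch of the "access the variable directly" alternative; the function name and fallback value are just illustrative, not part of any existing qiime2/q2galaxy API:

```python
import os


def resolve_n_threads(requested=None):
    """Pick the thread count to use inside a Galaxy job.

    GALAXY_SLOTS is set in the job environment by the Galaxy job runner with
    the number of cores allocated to the job; fall back to 1 when it is unset
    (e.g. when running outside Galaxy).
    """
    if requested is not None:
        return int(requested)
    return int(os.environ.get('GALAXY_SLOTS', '1'))


if __name__ == '__main__':
    print(resolve_n_threads())
```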

ebolyen commented Nov 19, 2024

It probably makes the most sense for q2galaxy to access that variable directly; that would be much simpler than templating anything specifically, and since the parameters are marked with their own special type, we can handle this robustly.

I think one open question is the exact semantics of GALAXY_SLOTS: does it mean available threads/cores, or something more abstract?

ebolyen moved this from Needs Triage to Awaiting Info in QIIME 2 - Triage 🚑 on Nov 19, 2024
ebolyen commented Nov 19, 2024

Found the docs here:
https://planemo.readthedocs.io/en/master/writing_advanced.html#developing-for-clusters-galaxy-slots-galaxy-memory-mb-and-galaxy-memory-mb-per-slot

Looks straightforward. I think we can update action_kwargs here to pass $GALAXY_SLOTS as the argument to a parameter whenever it has the Thread primitive type.
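As a rough sketch of that idea (hypothetical helper, not the actual q2galaxy code): when templating a parameter whose type is the Thread primitive, emit Galaxy's runtime substitution instead of a fixed default, so the slot count configured by the admin flows through automatically:

```python
def template_default(primitive_name, declared_default):
    """Value to template into the generated tool for one parameter.

    ``${GALAXY_SLOTS:-1}`` is the standard Galaxy expansion that resolves to
    the number of cores the job destination allocated, defaulting to 1.
    """
    # Hedging on the exact name of the primitive discussed above.
    if primitive_name in ('Thread', 'Threads'):
        return r'${GALAXY_SLOTS:-1}'
    return declared_default


if __name__ == '__main__':
    print(template_default('Threads', 1))  # -> ${GALAXY_SLOTS:-1}
    print(template_default('Int', 10))     # -> 10
```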

I think we would not do anything for our Job type, as that is better handled by Galaxy running the same action over a collection.

ebolyen moved this from Awaiting Info to Needs Prioritization in QIIME 2 - Triage 🚑 on Nov 19, 2024
ebolyen commented Nov 19, 2024

Also, out of curiosity, is there any mechanism to submit a new job from inside a Galaxy job and retain some kind of reference/future to it? @Oddant1 is refactoring some stuff with our parallel processing and there's an outside chance we could make this happen if such an API existed and server admins were amenable to the concept of it.

bernt-matthias commented Nov 20, 2024

This sounds good to me. I think it would also be good to add resource requirements to the tools, since otherwise admins (or dynamic job rules) have no means to judge which tools support parallelism:

<requirements>
   ...
   <resource type="min_cores">X</resource>
   <resource type="max_cores">Y</resource>
</requirements>

For completeness, there are more resource types; see here.

Edit: For instance, you could add

  • <resource type="max_cores">1</resource> for tools that do not support parallelism
  • <resource type="min_cores">1</resource> for tools that do support parallelism (plus <resource type="max_cores">X</resource> if more than X cores would be inefficient).

bernt-matthias commented

Also, out of curiosity, is there any mechanism to submit a new job from inside a Galaxy job and retain some kind of reference/future to it? @Oddant1 is refactoring some stuff with our parallel processing and there's an outside chance we could make this happen if such an API existed and server admins were amenable to the concept of it.

There is an API that allows executing Galaxy tools that are installed on a Galaxy instance. But doing this from inside tools seems like a bad idea, because it will be difficult for users to trace what has been executed. You could only run existing Galaxy tools anyway, and I think this would be much better implemented as a workflow. Also I do not know if we can assume that the Galaxy instance can be reached from the executing host.
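For reference, this kind of remote execution is exposed e.g. through BioBlend; a minimal sketch with placeholder URL, API key, and IDs (and it requires exactly the network reachability questioned above):

```python
from bioblend.galaxy import GalaxyInstance

# Placeholder credentials/IDs; a real caller would need an API key valid on
# the target instance plus the encoded history/dataset IDs.
gi = GalaxyInstance(url='https://galaxy.example.org', key='API_KEY')

# Submit an installed tool as a new job in an existing history.
result = gi.tools.run_tool(
    history_id='HISTORY_ID',
    tool_id='SOME_TOOL_ID',
    tool_inputs={'input1': {'src': 'hda', 'id': 'DATASET_ID'}},
)

# The response lists the created jobs/outputs, which could serve as the
# "reference/future" to poll later.
print(result['jobs'][0]['id'])
```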

One also needs to keep in mind the diversity of Galaxy job runners (local, SLURM, AWS, Pulsar, ...), so I do not think there can be a single mechanism; intuitively I would say that subprocesses / threads are the way to go. Might it be an option to tweak the granularity of the tools in case you need parallelism beyond a single compute node, e.g. by splitting inputs / making the jobs that are subprocesses separate tools?

ebolyen commented Nov 20, 2024

Also I do not know if we can assume that the Galaxy instance can be reached from the executing host.

That makes sense, and I think we could test for it and do something else. But this felt like a long shot either way.

One also needs to keep in mind the diversity of Galaxy job runners (local, SLURM, AWS, Pulsar, ...), so I do not think there can be a single mechanism; intuitively I would say that subprocesses / threads are the way to go. Might it be an option to tweak the granularity of the tools in case you need parallelism beyond a single compute node, e.g. by splitting inputs / making the jobs that are subprocesses separate tools?

Yep, that all makes sense. I think this just leaves us where we were anticipating: for cross-node parallelism in Galaxy, the answer is to partition your data into a Collection (which maps to a Galaxy collection) and then just do things normally. Since most of our metagenomic tools are written in this split-apply-combine style inside QIIME 2 pipeline actions, the inner methods already exist, so users should just get in the habit of using them directly instead of the simpler one-shot pipeline actions.

ebolyen commented Nov 20, 2024

Edit: For instance, you could add

  • <resource type="max_cores">1</resource> for tools that do not support parallelism
  • <resource type="min_cores">1</resource> for tools that do support parallelism (plus <resource type="max_cores">X</resource> if more than X cores would be inefficient).

Perfect, that should map to our Thread % Range(...) predicate, so we can populate these for all tools and provide min_cores+max_cores whenever an action indicates such on a Thread type.
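As a small sketch of that mapping (illustrative only; the real implementation would introspect the action's type expression), a Thread % Range(1, X) parameter could be rendered into the <requirements> block suggested above:

```python
def resource_requirements(min_cores=None, max_cores=None):
    """Render <resource> elements for a generated tool's <requirements> block."""
    parts = []
    if min_cores is not None:
        parts.append(f'<resource type="min_cores">{min_cores}</resource>')
    if max_cores is not None:
        parts.append(f'<resource type="max_cores">{max_cores}</resource>')
    return '\n'.join(parts)


if __name__ == '__main__':
    # An action declaring something like Thread % Range(1, 8):
    print(resource_requirements(min_cores=1, max_cores=8))
    # An action with no Thread parameter at all:
    print(resource_requirements(max_cores=1))
```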
