Reclaim: Have the ability to rank victim nodes based on what's running in them #3997

Open
raravena80 opened this issue Feb 5, 2025 · 6 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@raravena80
Contributor

What is the problem you're trying to solve

A follow-up to this thread: https://cloud-native.slack.com/archives/C011GJDQS0N/p1738726065356349?thread_ts=1738361347.510989&cid=C011GJDQS0N

Reclaim goes node by node without any ranking function based on what's running in them. Is there a way to rank the nodes based on what's running in them, for example by whether they run tasks from jobs that are in lower-priority queues?

Describe the solution you'd like

For example, before running this: https://github.com/volcano-sh/volcano/blob/master/pkg/scheduler/actions/reclaim/reclaim.go#L141

Could we rank the nodes in a certain order?
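To make the request concrete, here is a minimal sketch of one possible ordering: nodes whose running tasks belong to the lowest-priority queue are visited first, and empty nodes go last. Everything in it is an assumption for illustration. `Task`, `Node`, `QueuePriority`, and `rankVictimNodes` are hypothetical names, not Volcano's types or API, and the sketch assumes a smaller `QueuePriority` means a lower-priority queue.

```go
package main

import "sort"

// Task and Node are hypothetical stand-ins for illustration only;
// they are NOT Volcano's scheduler types.
type Task struct {
	Name          string
	QueuePriority int // assumption: smaller value = lower-priority queue
}

type Node struct {
	Name  string
	Tasks []Task
}

// lowestQueuePriority returns the smallest queue priority among the tasks
// on the node. ok is false when the node runs no tasks.
func lowestQueuePriority(n Node) (prio int, ok bool) {
	for i, t := range n.Tasks {
		if i == 0 || t.QueuePriority < prio {
			prio = t.QueuePriority
		}
	}
	return prio, len(n.Tasks) > 0
}

// rankVictimNodes returns a sorted copy of nodes: nodes running tasks from
// the lowest-priority queues come first, nodes with no tasks come last.
// The input slice is left untouched.
func rankVictimNodes(nodes []Node) []Node {
	ranked := append([]Node(nil), nodes...)
	sort.SliceStable(ranked, func(i, j int) bool {
		pi, okI := lowestQueuePriority(ranked[i])
		pj, okJ := lowestQueuePriority(ranked[j])
		if okI != okJ {
			return okI // a node with tasks sorts before an empty node
		}
		return okI && pi < pj
	})
	return ranked
}
```

The reclaim loop could then iterate over the ranked slice instead of the original node order. The minimum queue priority is only one possible score; a count of reclaimable tasks or a weighted sum would slot into the same comparison.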

Thanks!

Additional context

No response

@raravena80 raravena80 added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 5, 2025
@JesseStutler
Member

This is a useful question, and the behavior needs to be enhanced in v1.12 or in a patch version of v1.11. We can filter through all the victims first and then prioritize them accordingly.
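A rough sketch of the "filter first, then prioritize" idea, under stated assumptions: candidates are collected across all nodes, a predicate drops the ones that must not be evicted, and the rest are sorted so the lowest-priority work is reclaimed first. `Victim`, `Reclaimable`, `JobPriority`, and `selectVictims` are hypothetical names for illustration, not existing Volcano code.

```go
package main

import "sort"

// Victim is a hypothetical record for one candidate task; it is NOT a
// Volcano type.
type Victim struct {
	Task, Node  string
	JobPriority int  // assumption: smaller value = evict earlier
	Reclaimable bool // assumption: set by whatever filtering rules apply
}

// selectVictims keeps only the candidates accepted by keep, then orders
// them by ascending JobPriority. Ties keep their original relative order.
func selectVictims(candidates []Victim, keep func(Victim) bool) []Victim {
	var out []Victim
	for _, v := range candidates {
		if keep(v) {
			out = append(out, v)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].JobPriority < out[j].JobPriority
	})
	return out
}
```

Compared with walking node by node, this looks at the whole candidate set before choosing, so a low-priority task on the last node is no longer skipped in favor of a higher-priority one on the first.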

@JesseStutler
Member

cc @Monokaix, should this be planned for the v1.12 milestone?

@raravena80
Contributor Author

Btw, we experienced a similar issue with preemption, where it didn't have context of what was running in the nodes. I'm just adding a comment here in case the solutions are the same.

This is what we tried to fix in #3960, but there may be a better fix than that.

@JesseStutler
Member

> Btw, we experienced a similar issue with preemption, where it didn't have context of what was running in the nodes. I'm just adding a comment here in case the solutions are the same.
>
> This is what we tried to fix in #3960, but there may be a better fix than that.

In preempt action?

@raravena80
Contributor Author

> In preempt action?

Correct.

@sfc-gh-raravena
Contributor

Something else that might be useful here is allowing reclaim to be skipped when, after scoring the nodes, it turns out that all jobs are gang jobs.
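One way to read that suggestion, as a minimal sketch: after the nodes are scored and the candidate victims are known, reclaim proceeds only if at least one candidate is not part of a gang job. The `Victim` type, the `GangJob` flag, and both function names are hypothetical; how a gang job is actually detected is left as an assumption.

```go
package main

// Victim is a hypothetical record for one candidate task; it is NOT a
// Volcano type. GangJob is an assumed flag marking tasks of gang jobs.
type Victim struct {
	Task    string
	Job     string
	GangJob bool
}

// allGang reports whether every candidate belongs to a gang job.
// An empty candidate list returns false.
func allGang(victims []Victim) bool {
	for _, v := range victims {
		if !v.GangJob {
			return false
		}
	}
	return len(victims) > 0
}

// shouldReclaim returns true only when there is at least one candidate
// and the candidates are not exclusively gang-job tasks.
func shouldReclaim(victims []Victim) bool {
	return len(victims) > 0 && !allGang(victims)
}
```

The check would run once per reclaim attempt, after ranking and before any eviction.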


3 participants