The task-topology plugin does not implement DeallocateFunc, which may cause unexpected scheduling results #4003

Open
liuyuanchun11 opened this issue Feb 11, 2025 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

@liuyuanchun11
Contributor

Description

The task-topology plugin implements AllocateFunc (the resource allocation callback) but not DeallocateFunc (the resource deallocation callback). As a result, if a statement.Discard operation occurs during scheduling, the changes made in AllocateFunc cannot be rolled back. When a single openSession has multiple pending jobs to schedule, this can produce scheduling results that deviate from expectations.
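To make the rollback gap concrete, here is a minimal, self-contained Go model of the mechanism described above. The types Event, EventHandler and Statement are simplified stand-ins loosely modeled on Volcano's scheduler framework, not its real definitions, and discardDemo is a hypothetical helper. The assumption being illustrated is that Discard can only undo plugin state through a registered DeallocateFunc, so a handler that supplies only AllocateFunc keeps whatever it changed.

```go
package main

import "fmt"

// Event is a simplified stand-in for the framework event passed to plugin
// callbacks (hypothetical fields; the real event carries a task object).
type Event struct {
	TaskName string
	JobName  string
}

// EventHandler pairs two optional callbacks. A nil DeallocateFunc means the
// plugin has nothing it can undo when a statement is discarded.
type EventHandler struct {
	AllocateFunc   func(*Event)
	DeallocateFunc func(*Event)
}

// Statement records tentative allocations so they can later be discarded.
type Statement struct {
	handlers []*EventHandler
	ops      []*Event
}

// Allocate records the operation and notifies every AllocateFunc.
func (s *Statement) Allocate(ev *Event) {
	s.ops = append(s.ops, ev)
	for _, h := range s.handlers {
		if h.AllocateFunc != nil {
			h.AllocateFunc(ev)
		}
	}
}

// Discard undoes recorded operations in reverse order, but it can only reach
// plugin state through handlers that registered a DeallocateFunc.
func (s *Statement) Discard() {
	for i := len(s.ops) - 1; i >= 0; i-- {
		for _, h := range s.handlers {
			if h.DeallocateFunc != nil {
				h.DeallocateFunc(s.ops[i])
			}
		}
	}
	s.ops = nil
}

// discardDemo allocates one task, discards it, and reports the state left in
// an allocate-only plugin versus a plugin with both callbacks.
func discardDemo() (allocOnly, paired int) {
	stmt := &Statement{handlers: []*EventHandler{
		{AllocateFunc: func(*Event) { allocOnly++ }},
		{AllocateFunc: func(*Event) { paired++ }, DeallocateFunc: func(*Event) { paired-- }},
	}}
	stmt.Allocate(&Event{TaskName: "job-a-0", JobName: "job-a"})
	stmt.Discard()
	return allocOnly, paired
}

func main() {
	a, p := discardDemo()
	fmt.Println(a, p) // 1 0
}
```

The allocate-only plugin still counts the discarded task after Discard, while the plugin with a paired DeallocateFunc returns to zero.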

Steps to reproduce the issue

1. Enable the task-topology plugin.
2. Create two jobs. Job-A contains 2 pods, but the cluster's available resources can satisfy the request of only 1 pod. Job-B contains 1 pod, and the available resources are sufficient for its request. Job-A has a higher priority than Job-B.
3. After scheduling begins, Job-A schedules its first pod. Because the cluster resources cannot satisfy the second pod's request, a statement.Discard operation is triggered to roll back the allocation. Since DeallocateFunc is not implemented, the JobManager.TaskBound state (or resource binding) recorded for Job-A is not rolled back. This incomplete rollback may block or interfere with the subsequent scheduling of Job-B, even though Job-B's resource requirements could otherwise be met.
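The reproduction steps can be simulated with a toy model. Everything in it is hypothetical: pods request abstract CPU units, the cluster has room for exactly one pod, and topologyState.bound is a simplified stand-in for the per-job bookkeeping that JobManager.TaskBound updates. The model only shows that the plugin-side record for Job-A survives the rollback when no DeallocateFunc runs; it does not reproduce how the real plugin's state then affects Job-B.

```go
package main

import "fmt"

// cluster tracks free capacity; the framework restores it on discard.
type cluster struct{ freeCPU int }

// topologyState is a hypothetical stand-in for the plugin's per-job
// bookkeeping (the real JobManager keeps richer state).
type topologyState struct{ bound map[string]int }

type pod struct {
	job string
	cpu int
}

// scheduleJob places every pod of a job or none of them. When a pod does not
// fit, earlier placements are rolled back (mimicking statement.Discard).
// withDeallocate toggles whether the plugin's bookkeeping is rolled back too.
func scheduleJob(c *cluster, s *topologyState, pods []pod, withDeallocate bool) bool {
	var placed []pod
	for _, p := range pods {
		if p.cpu > c.freeCPU {
			for i := len(placed) - 1; i >= 0; i-- {
				c.freeCPU += placed[i].cpu // framework-side undo
				if withDeallocate {
					s.bound[placed[i].job]-- // plugin-side undo (DeallocateFunc)
				}
			}
			return false
		}
		c.freeCPU -= p.cpu
		s.bound[p.job]++ // plugin-side AllocateFunc, analogous to TaskBound
		placed = append(placed, p)
	}
	return true
}

// reproduce schedules Job-A (2 pods, higher priority) and then Job-B (1 pod)
// on a cluster that can hold only one pod. staleA is the plugin's record for
// Job-A right after Job-A's failed attempt.
func reproduce(withDeallocate bool) (jobAOK, jobBOK bool, staleA int) {
	c := &cluster{freeCPU: 1}
	s := &topologyState{bound: map[string]int{}}
	jobA := []pod{{"job-a", 1}, {"job-a", 1}}
	jobB := []pod{{"job-b", 1}}
	jobAOK = scheduleJob(c, s, jobA, withDeallocate)
	staleA = s.bound["job-a"]
	jobBOK = scheduleJob(c, s, jobB, withDeallocate)
	return
}

func main() {
	a, b, stale := reproduce(false)
	fmt.Println(a, b, stale) // false true 1
	a, b, stale = reproduce(true)
	fmt.Println(a, b, stale) // false true 0
}
```

Without the deallocate step, the plugin still believes one Job-A task is bound after Job-A was rolled back, even though the cluster capacity itself was restored.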

Describe the results you received and expected

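The task-topology plugin should implement DeallocateFunc in addition to AllocateFunc, so that statement.Discard fully rolls back the plugin's state and the remaining jobs in the session are scheduled as expected.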
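A sketch of what the requested fix could look like: give the plugin's bind bookkeeping an exact inverse and call it from DeallocateFunc. In Volcano, the two callbacks would typically be registered together on the session's framework.EventHandler. Here, jobManager, taskBound and taskUnbound are hypothetical, simplified names rather than the plugin's actual JobManager API, and the per-node counter layout is an assumption.

```go
package main

import "fmt"

// jobManager is a hypothetical, simplified stand-in for the plugin's per-job
// bookkeeping: how many tasks of each role are bound on each node.
type jobManager struct {
	nodeTaskSet map[string]map[string]int // node -> task role -> count
}

func newJobManager() *jobManager {
	return &jobManager{nodeTaskSet: map[string]map[string]int{}}
}

// taskBound is what an AllocateFunc would call.
func (jm *jobManager) taskBound(node, role string) {
	set, ok := jm.nodeTaskSet[node]
	if !ok {
		set = map[string]int{}
		jm.nodeTaskSet[node] = set
	}
	set[role]++
}

// taskUnbound is the inverse a DeallocateFunc would call. It deletes entries
// that reach zero so that bind followed by unbind leaves no trace.
func (jm *jobManager) taskUnbound(node, role string) {
	set, ok := jm.nodeTaskSet[node]
	if !ok || set[role] == 0 {
		return // nothing recorded; ignore a spurious unbind
	}
	set[role]--
	if set[role] == 0 {
		delete(set, role)
	}
	if len(set) == 0 {
		delete(jm.nodeTaskSet, node)
	}
}

func main() {
	jm := newJobManager()
	jm.taskBound("node-1", "worker")
	jm.taskUnbound("node-1", "worker")
	fmt.Println(len(jm.nodeTaskSet)) // 0
}
```

The design choice is that unbinding must restore exactly the state that existed before binding, and a spurious unbind is ignored rather than driving a counter negative.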

What version of Volcano are you using?

Volcano 1.10

Any other relevant information

No response

@liuyuanchun11 liuyuanchun11 added the kind/bug Categorizes issue or PR as related to a bug. label Feb 11, 2025