
Begin backend scheduler/worker for consistent compaction #4770

Open · wants to merge 95 commits into main
Conversation

@zalegrala (Contributor) commented Feb 28, 2025

What this PR does:

Here we add two new target modules, BackendScheduler and BackendWorker. At present, these modules work together to replace the function of the current Compactor module. They are not included in any of the default targets and must be explicitly enabled in order to function.

The BackendScheduler is responsible for scheduling compaction jobs, tracking their status, and persisting that status to the backend so it can be reloaded at startup. The BackendWorker is responsible for picking up these jobs and executing them, and also for writing the tenant index to object storage. Sharding of which tenants are owned by which worker is handled by the ring, which is copied from the current Compactor module. A small amount of duplicated work on the tenant index during scaling events is a non-issue.

Tenant fairness is achieved with a new tenantselector package, which aims to ensure that all tenants get compacted and that no tenant is neglected for too long. This is a simple approach and not perfect, but it should work for our use case. With the current implementation, the blocklist, outstanding blocks, and the last compaction time are taken into account to determine which tenants need to have jobs scheduled. Eventually, if a tenant has not been compacted for a long time, it will become the priority no matter the length of its blocklist.

Why we need it:

During scaling events of the compactors, the ring may not be fully propagated, and there is a race between compactors that may already be executing a compaction job for a given block and new compactors just joining the ring. This can lead to a small amount of duplicated data in the backend until the ring is fully propagated and stable. For small environments this may never surface as an issue, but in large environments that wish to autoscale their compactors with load, it can be problematic. Additionally, we want to rely on RF1 data in the backend for future work.

Completed and failed jobs are dropped from the state after 1 hour.

Known issues to follow up:

The output blocks are not idempotent, since the destination block ID is not known until the compaction is complete. Including the target block ID in the output block is a smallish change to the encoding package. It is not included in this PR, but we can follow up.

The worker does not wait for in-flight jobs to complete before shutting down. This is something I would like to resolve, but it is not included in this PR.

Tenant retention jobs are not included in this PR, but I expect this to be a fast follow with the current pattern.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala zalegrala force-pushed the beingBackendScheduler branch from 5f0f32f to a8a7ca5 Compare February 28, 2025 15:57
@zalegrala zalegrala force-pushed the beingBackendScheduler branch from 007fc0c to 4614192 Compare March 5, 2025 16:54
var reader backend.RawReader
var writer backend.RawWriter

switch t.cfg.StorageConfig.Trace.Backend {
@zalegrala (Contributor, Author) commented:
I don't love this. I copied the pattern we have in the usagestats module, but perhaps a storage.Store interface extension makes sense here. I'm open to thoughts.
