Ensure GroupByHash does not error when trying to spill (calling try_resize where an error is not acceptable) #14851

Open · Tracked by #14077
Labels: bug (Something isn't working)

EmilyMatt opened this issue Feb 24, 2025 · 1 comment

EmilyMatt commented Feb 24, 2025

Describe the bug

In GroupedHashAggregateStream, update_memory_reservation() makes a try_resize() call. An error returned from this call is usually handled gracefully: we are typically in the middle of emitting and can simply emit again, or we are accumulating and can spill (unless not even a single batch fits in memory, in which case a panic can be justified).

However, when creating a merge stream from the spill data, there is no viable fallback: if the memory pool denies the reservation, the error propagates and the query aborts.
With a custom memory pool it should be possible to implement a "burst" allocation or something similar and avoid aborting the query, but that requires a "declaration of intentions" of sorts from the aggregate. Shouldn't resize() be used in such cases instead of try_resize()?

This is the problematic line:

self.update_memory_reservation()?;
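
For context, here is a minimal sketch of the difference between the two calls, using the MemoryPool API from the datafusion crate (the pool limit, consumer name, and sizes are arbitrary): try_resize() surfaces the pool's refusal as an error, while resize() records the allocation unconditionally, so the operator keeps running and the pool merely goes over budget.

use std::sync::Arc;

use datafusion::execution::memory_pool::{GreedyMemoryPool, MemoryConsumer, MemoryPool};

fn main() {
    // A pool with a deliberately tiny 1 KiB limit.
    let pool: Arc<dyn MemoryPool> = Arc::new(GreedyMemoryPool::new(1024));
    let mut reservation = MemoryConsumer::new("merge-stream").register(&pool);

    // try_resize() returns Err(ResourcesExhausted) when the pool denies it.
    assert!(reservation.try_resize(4096).is_err());

    // resize() records the allocation infallibly; the pool is now over
    // budget, but the query is not aborted.
    reservation.resize(4096);
    assert_eq!(reservation.size(), 4096);
}

With resize(), whether the over-allocation is acceptable becomes a policy decision of the pool implementation rather than a hard failure inside the operator.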

I believe the same concern applies around line 910:

match self.update_memory_reservation() {
    // Here we can ignore `insufficient_capacity_err` because we will
    // spill later, but at least one batch should fit in the memory
    Err(DataFusionError::ResourcesExhausted(_))
        if self.group_values.len() >= self.batch_size =>
    {
        Ok(())
    }
    other => other,
}

This call site should check preemptively whether spilling is still possible; if it is not, it should use resize() rather than the try_resize() that update_memory_reservation() performs. That said, this path at least has a chance of not aborting, so it seems more agreeable. A sketch of the proposed control flow follows.
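
A minimal sketch of that idea (the free function and the can_spill flag are hypothetical stand-ins for the stream's actual state, not DataFusion's current code):

use datafusion::common::Result;
use datafusion::execution::memory_pool::MemoryReservation;

// Hypothetical variant of update_memory_reservation(): the caller
// declares whether a spill fallback still exists for this allocation.
fn update_memory_reservation(
    reservation: &mut MemoryReservation,
    needed: usize,
    can_spill: bool,
) -> Result<()> {
    if can_spill {
        // Recoverable path: a ResourcesExhausted error here is caught
        // by the caller, which spills to disk and retries.
        reservation.try_resize(needed)
    } else {
        // No fallback (e.g. building the merge stream over spill files):
        // record the allocation infallibly instead of failing the query.
        reservation.resize(needed);
        Ok(())
    }
}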

It's possible that other design considerations were taken into account here, but this is my understanding at least.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

EmilyMatt added the bug label Feb 24, 2025
alamb changed the title from "Calling try_resize where error is not acceptable" to "Ensure GroupByHash does not error when trying to spill (calling try_resize where an error is not acceptable)" Feb 24, 2025
alamb (Contributor) commented Feb 24, 2025

Thanks @EmilyMatt -- I agree with the goal of this ticket: when operating in a memory-constrained environment, it is best to avoid erroring during the spilling process itself.

> It's possible that other design considerations were taken into account here, but this is my understanding at least.

> With a custom memory pool it should be possible to implement a "burst" allocation or something similar and avoid aborting the query, but that requires a "declaration of intentions" of sorts from the aggregate. Shouldn't resize() be used in such cases instead of try_resize()?

I think another classic solution is to reduce the memory requirements (though at some cost to performance) using a multi-pass merge, as described in #14692 (comment)

There is some overlap with the discussions that @2010YOUY01 has been having on this ticket (though that one is related to sorting)

So it may be that your idea of improving the reservation calls can work and improve your use case.

However, solving the problem of "being able to merge an arbitrary amount of data / spill files given limited memory" will require a multi-level merge.
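
To illustrate, a schematic of such a multi-level merge (SpillFile and merge_files are placeholder stand-ins, not DataFusion's actual spill machinery): with memory for only fan_in open runs at a time, each intermediate pass trades extra disk I/O for bounded peak memory.

/// Placeholder for a sorted run spilled to disk.
struct SpillFile;

/// Placeholder: stream-merge the given sorted runs into one new run,
/// holding only one read buffer per input in memory.
fn merge_files(_runs: &[SpillFile]) -> SpillFile {
    SpillFile
}

/// Merge an arbitrary number of spill files with a bounded fan-in:
/// each pass needs memory for only `fan_in` read buffers, at the cost
/// of rereading and rewriting the data once per level.
fn multi_level_merge(mut files: Vec<SpillFile>, fan_in: usize) -> SpillFile {
    assert!(fan_in >= 2);
    while files.len() > fan_in {
        // Collapse the current level into ceil(len / fan_in) new runs.
        files = files.chunks(fan_in).map(merge_files).collect();
    }
    merge_files(&files)
}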
