You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the GroupedHashAggregateStream, update_memory_reservation() has a try_resize() call, an error returned from this function is usually handled gracefully, as we are usually in the middle of emitting and can just emit again, or are accumulating and can spill(unless not even a single batch fits in memory, in which case this can be a justified panic).
However, when creating a merge stream using the spill data, there is no viable error fallback, this early exits on error if the memory pool denied the reservation.
Using a custom memory pool, it should be possible to implement a "burst" allocation or something similar, and avoid aborting a query, but this requires the aggregate a "declaration of intentions" of sorts, shouldn't resize() be used in such cases, instead of try_resize()?
match self.update_memory_reservation() {
// Here we can ignore `insufficient_capacity_err` because we will spill later,
// but at least one batch should fit in the memory
Err(DataFusionError::ResourcesExhausted(_))
if self.group_values.len() >= self.batch_size =>
{
Ok(())
}
other => other,
}
This should check preemptively if we have a possibility of spilling here, and if not, it should be a resize(), not a try_resize() as update_memory_reservation() does.
But this one actually has a possibility of not aborting so seems more agreeable.
It's possible that there are other design ideas taken into consideration here, but this is my belief at least.
To Reproduce
No response
Expected behavior
No response
Additional context
No response
The text was updated successfully, but these errors were encountered:
alamb
changed the title
Calling try_resize where error is not acceptible
Ensure GroupByHash does not error when trying to spill (calling try_resize where error is not acceptible)
Feb 24, 2025
Thanks @EmilyMatt -- I agree with the seeming goal of this ticket that when operating in a memory constrained environment it is best to avoid erroring during the spilling process itself
It's possible that there are other design ideas taken into consideration here, but this is my belief at least.
Using a custom memory pool, it should be possible to implement a "burst" allocation or something similar, and avoid aborting a query, but this requires the aggregate a "declaration of intentions" of sorts, shouldn't resize() be used in such cases, instead of try_resize()?
I think another classic solution to this is to reduce the memory requirements (though then also decrease the performance) using a mutli-pass merge, as described on #14692 (comment)
There is some overlap with the discussions that @2010YOUY01 has been having on this ticket (though that one is related to sorting)
So it may be that your idea of improving the reservation calls can work / improve your usecase
However, to solve the problem of "being able to merge in an arbitrary amount of data / spill files given limited memory" will require multi-level merge
Describe the bug
In the GroupedHashAggregateStream, update_memory_reservation() has a try_resize() call, an error returned from this function is usually handled gracefully, as we are usually in the middle of emitting and can just emit again, or are accumulating and can spill(unless not even a single batch fits in memory, in which case this can be a justified panic).
However, when creating a merge stream using the spill data, there is no viable error fallback, this early exits on error if the memory pool denied the reservation.
Using a custom memory pool, it should be possible to implement a "burst" allocation or something similar, and avoid aborting a query, but this requires the aggregate a "declaration of intentions" of sorts, shouldn't resize() be used in such cases, instead of try_resize()?
This is the problematic line:
datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs
Line 1050 in 6c5f214
I believe in line 910 as well,
This should check preemptively if we have a possibility of spilling here, and if not, it should be a resize(), not a try_resize() as update_memory_reservation() does.
But this one actually has a possibility of not aborting so seems more agreeable.
It's possible that there are other design ideas taken into consideration here, but this is my belief at least.
To Reproduce
No response
Expected behavior
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: