-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reject RESPECT NULLS
and IGNORE NULLS
for aggregate functions
#15014
base: main
Are you sure you want to change the base?
Conversation
Adds validation to prevent using RESPECT NULLS and IGNORE NULLS with aggregate functions
@@ -5893,15 +5889,11 @@ SELECT FIRST_VALUE(column1 ORDER BY column2) FROM t; | |||
---- | |||
NULL | |||
|
|||
query I | |||
query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions | |||
SELECT FIRST_VALUE(column1 ORDER BY column2) RESPECT NULLS FROM t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this change -- the query seems valid to me as there is a null and an argument order, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of the weirdness I pointed out in #15006, which is that DataFusion has first_value
as both an aggregate and window function, whereas most other engines only have a window function form.
This invocation is technically the aggregate function form, because there is no OVER clause which would be expected for a window function invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- I think spark behaves this way which is why someone added the feature to DataFusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, there's always another engine 😅
It looks like Spark defines afirst_value(expr[, isIgnoreNull])
function which takes an optional boolean parameter to control whether it ignore nulls.
Spark Docs: https://spark.apache.org/docs/latest/api/sql/index.html#first_value
@@ -349,6 +349,12 @@ impl<S: ContextProvider> SqlToRel<'_, S> { | |||
} else { | |||
// User defined aggregate functions (UDAF) have precedence in case it has the same name as a scalar built-in function | |||
if let Some(fm) = self.context_provider.get_aggregate_meta(&name) { | |||
// Reject RESPECT NULLS and IGNORE NULLS for aggregate functions | |||
// See https://github.com/apache/datafusion/issues/15006 | |||
if null_treatment.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider it to be a parser responsibility, for example duckDB
D SELECT LAST_VALUE(column1) RESPECT NULLS FROM t
;
Parser Error: syntax error at or near "RESPECT"
LINE 1: SELECT LAST_VALUE(column1) RESPECT NULLS FROM t
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I totally agree! Does this mean I should make change in https://github.com/apache/datafusion-sqlparser-rs ?
Which issue does this PR close?
Rationale for this change
As per the issue says
What changes are included in this PR?
Adds validation to prevent using
RESPECT NULLS
andIGNORE NULLS
with aggregate functionsOne thing to note that is:
Accroding to the sqllogic test, this should only affect aggregate function.
Are these changes tested?
Essisting tests have coverd the cases
Are there any user-facing changes?
Maybe? If one uses
RESPECT NULLS
andIGNORE NULLS
with aggregate functions then this is a breaking change