Reject RESPECT NULLS and IGNORE NULLS for aggregate functions #15014

Open
wants to merge 1 commit into
base: main
8 changes: 7 additions & 1 deletion datafusion/sql/src/expr/function.rs
@@ -349,6 +349,12 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
 } else {
 // User defined aggregate functions (UDAF) have precedence in case it has the same name as a scalar built-in function
 if let Some(fm) = self.context_provider.get_aggregate_meta(&name) {
+// Reject RESPECT NULLS and IGNORE NULLS for aggregate functions
+// See https://github.com/apache/datafusion/issues/15006
+if null_treatment.is_some() {
Contributor

I would consider this a parser responsibility. For example, DuckDB rejects the clause at parse time:

D SELECT LAST_VALUE(column1) RESPECT NULLS FROM t
  ;
Parser Error: syntax error at or near "RESPECT"
LINE 1: SELECT LAST_VALUE(column1) RESPECT NULLS FROM t
                                   ^

Contributor Author

Yes, I totally agree! Does this mean I should make the change in https://github.com/apache/datafusion-sqlparser-rs?

Contributor

I think someone needs to check what Spark does too -- as I recall, this was a Spark feature that someone added explicitly...

Maybe @andygrove or @huaxingao remembers 🤔

+return plan_err!("RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions");
+}
+
 let order_by = self.order_by_to_sort_expr(
 order_by,
 schema,
@@ -369,7 +375,7 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
 distinct,
 filter,
 order_by,
-null_treatment,
+null_treatment: None, // See https://github.com/apache/datafusion/issues/15006
 };
 for planner in self.context_provider.get_expr_planners().iter() {
 match planner.plan_aggregate(aggregate_expr)? {
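
The guard added in this file can be sketched in isolation as follows. This is a minimal, self-contained illustration rather than DataFusion's actual code: `NullTreatment`, `check_aggregate_null_treatment`, and the `String` error type are stand-ins assumed here for the planner's real types and the `plan_err!` macro.

```rust
/// Stand-in for the parsed null-treatment clause (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum NullTreatment {
    RespectNulls,
    IgnoreNulls,
}

/// Hypothetical helper mirroring the guard in the diff: an aggregate call that
/// carries any null-treatment clause is rejected during planning.
fn check_aggregate_null_treatment(
    null_treatment: Option<NullTreatment>,
) -> Result<(), String> {
    if null_treatment.is_some() {
        return Err(
            "RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions"
                .to_string(),
        );
    }
    // No clause was given, so planning of the aggregate can continue.
    Ok(())
}
```

Under these assumptions, a call with `None` succeeds, while `Some(NullTreatment::RespectNulls)` and `Some(NullTreatment::IgnoreNulls)` both return the planning error that the updated `.slt` tests expect.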
32 changes: 8 additions & 24 deletions datafusion/sqllogictest/test_files/aggregate.slt
@@ -5863,15 +5863,11 @@ SELECT FIRST_VALUE(column1) FROM t;
 ----
 NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT FIRST_VALUE(column1) RESPECT NULLS FROM t;
-----
-NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t;
-----
-3

 statement ok
 DROP TABLE t;
@@ -5893,15 +5889,11 @@ SELECT FIRST_VALUE(column1 ORDER BY column2) FROM t;
 ----
 NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT FIRST_VALUE(column1 ORDER BY column2) RESPECT NULLS FROM t;
Contributor

I am not sure about this change -- the query seems valid to me as there is a null and an argument order, right?

Contributor

This is a bit of the weirdness I pointed out in #15006, which is that DataFusion has first_value as both an aggregate and window function, whereas most other engines only have a window function form.

This invocation is technically the aggregate function form, because there is no OVER clause which would be expected for a window function invocation.

Contributor

I agree -- I think Spark behaves this way, which is why someone added the feature to DataFusion.

Contributor

@vbarua vbarua Mar 6, 2025

Ha, there's always another engine 😅

It looks like Spark defines a first_value(expr[, isIgnoreNull]) function, which takes an optional boolean parameter to control whether it ignores nulls.

Spark Docs: https://spark.apache.org/docs/latest/api/sql/index.html#first_value
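
To make the flag-based form concrete, here is a minimal sketch of how a `first_value(expr[, isIgnoreNull])` style aggregate could behave over a column of nullable integers. The helper name `first_value`, its slice-based signature, and the sample data in the note below are assumptions for illustration; this is neither Spark's nor DataFusion's implementation.

```rust
/// Hypothetical helper modeled on the documented Spark signature
/// `first_value(expr[, isIgnoreNull])`; `None` stands for SQL NULL.
fn first_value(values: &[Option<i64>], is_ignore_null: bool) -> Option<i64> {
    if is_ignore_null {
        // Skip NULLs and return the first non-NULL value, if there is one.
        values.iter().copied().flatten().next()
    } else {
        // Return the first value as-is, even when that value is NULL.
        values.first().copied().flatten()
    }
}
```

With an assumed input of `[None, Some(3), Some(4)]`, the call with `is_ignore_null = false` yields `None` (NULL), and the call with `is_ignore_null = true` yields `Some(3)`. The design point is that the null handling is an ordinary function argument, so no `IGNORE NULLS` clause is needed after the call.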

-----
-NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT FIRST_VALUE(column1 ORDER BY column2) IGNORE NULLS FROM t;
-----
-4

 statement ok
 DROP TABLE t;
@@ -5915,15 +5907,11 @@ SELECT LAST_VALUE(column1) FROM t;
 ----
 NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT LAST_VALUE(column1) RESPECT NULLS FROM t;
-----
-NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT LAST_VALUE(column1) IGNORE NULLS FROM t;
-----
-4

 statement ok
 DROP TABLE t;
@@ -5945,15 +5933,11 @@ SELECT LAST_VALUE(column1 ORDER BY column2 DESC) FROM t;
 ----
 NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT LAST_VALUE(column1 ORDER BY column2 DESC) RESPECT NULLS FROM t;
-----
-NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT LAST_VALUE(column1 ORDER BY column2 DESC) IGNORE NULLS FROM t;
-----
-3

 statement ok
 DROP TABLE t;