Reject `RESPECT NULLS` and `IGNORE NULLS` for aggregate functions #15014

qazxcdswe123 · 2025-03-05T04:30:01Z

Which issue does this PR close?

Partial fix for RESPECT NULLS / IGNORE NULLS is syntax for window functions, not aggregate functions #15006

Rationale for this change

As the issue says:

In the SQL standard, RESPECT NULLS and IGNORE NULLS are options to be set for the lead, lag, first_value, last_value and nth_value window functions.

What changes are included in this PR?

Adds validation to prevent using RESPECT NULLS and IGNORE NULLS with aggregate functions

One thing to note, quoting the issue:

That being said, part of the weirdness here is that DataFusion defines first_value both as an aggregate function and as a window function.

According to the sqllogictest tests, this should only affect aggregate functions.
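The check described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `NullTreatment`, `CallKind`, and `check_null_treatment` are hypothetical names invented for the sketch, and the only thing taken from the PR is the error message text used in the sqllogictest expectation.

```rust
/// Null-treatment clause as written in the SQL text.
/// Hypothetical type for this sketch, not DataFusion's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NullTreatment {
    RespectNulls,
    IgnoreNulls,
}

/// How a function call was resolved by the planner (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallKind {
    /// The call carries an OVER clause, so it is a window function call.
    Window,
    /// No OVER clause, and the name resolved to an aggregate function.
    Aggregate,
}

/// Reject a null-treatment clause on the aggregate form and accept
/// everything else. Window calls are left untouched.
pub fn check_null_treatment(
    kind: CallKind,
    null_treatment: Option<NullTreatment>,
) -> Result<(), String> {
    match (kind, null_treatment) {
        (CallKind::Aggregate, Some(_)) => Err(
            "RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions"
                .to_string(),
        ),
        _ => Ok(()),
    }
}
```

Under these assumptions, `check_null_treatment(CallKind::Aggregate, Some(NullTreatment::RespectNulls))` returns an error, while the same clause on `CallKind::Window` is accepted.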

Are these changes tested?

Existing tests already cover these cases.

Are there any user-facing changes?

Maybe? If one uses RESPECT NULLS or IGNORE NULLS with aggregate functions, then this is a breaking change.

alamb · 2025-03-05T11:06:18Z

datafusion/sqllogictest/test_files/aggregate.slt

@@ -5893,15 +5889,11 @@ SELECT FIRST_VALUE(column1 ORDER BY column2) FROM t;
 ----
 NULL

-query I
+query error DataFusion error: Error during planning: RESPECT NULLS and IGNORE NULLS are not supported for aggregate functions
 SELECT FIRST_VALUE(column1 ORDER BY column2) RESPECT NULLS FROM t;


I am not sure about this change -- the query seems valid to me as there is a null and an argument order, right?

This is a bit of the weirdness I pointed out in #15006, which is that DataFusion has first_value as both an aggregate and window function, whereas most other engines only have a window function form.

This invocation is technically the aggregate function form, because there is no OVER clause which would be expected for a window function invocation.

I agree -- I think Spark behaves this way, which is why someone added the feature to DataFusion

Ha, there's always another engine 😅

It looks like Spark defines a first_value(expr[, isIgnoreNull]) function, which takes an optional boolean parameter to control whether it ignores nulls.

Spark Docs: https://spark.apache.org/docs/latest/api/sql/index.html#first_value
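To make the semantics under discussion concrete, here is a small sketch of an aggregate-style `first_value` with a boolean null flag, in the spirit of the Spark signature quoted above. It is an assumption-laden illustration: the function below is hypothetical, operates on a slice that is assumed to be already sorted by the ORDER BY key, and is not DataFusion's or Spark's implementation.

```rust
/// First value of an already ordered column (hypothetical sketch).
/// `None` stands for SQL NULL.
///
/// * `ignore_nulls == false` (RESPECT NULLS): return the first row as-is,
///   even if it is NULL.
/// * `ignore_nulls == true` (IGNORE NULLS): skip leading NULLs and return
///   the first non-NULL value, if any.
pub fn first_value(values: &[Option<i64>], ignore_nulls: bool) -> Option<i64> {
    if ignore_nulls {
        // Flatten drops the NULL rows, so `next` yields the first non-NULL value.
        values.iter().copied().flatten().next()
    } else {
        // Take the first row and collapse "no rows" and "NULL row" into None.
        values.first().copied().flatten()
    }
}
```

With rows `[None, Some(2), Some(3)]`, the flag decides between NULL and `2`, which is why a null-treatment option is meaningful for the aggregate form whenever an ordering is given.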

comphead · 2025-03-05T22:56:37Z

datafusion/sql/src/expr/function.rs

@@ -349,6 +349,12 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
        } else {
            // User defined aggregate functions (UDAF) have precedence in case it has the same name as a scalar built-in function
            if let Some(fm) = self.context_provider.get_aggregate_meta(&name) {
+                // Reject RESPECT NULLS and IGNORE NULLS for aggregate functions
+                // See https://github.com/apache/datafusion/issues/15006
+                if null_treatment.is_some() {


I would consider this to be the parser's responsibility. For example, DuckDB:

D SELECT LAST_VALUE(column1) RESPECT NULLS FROM t;
Parser Error: syntax error at or near "RESPECT"
LINE 1: SELECT LAST_VALUE(column1) RESPECT NULLS FROM t
                                   ^

Yes, I totally agree! Does this mean I should make the change in https://github.com/apache/datafusion-sqlparser-rs ?

Reject RESPECT NULLS and IGNORE NULLS for aggregate functions

5eca271

Adds validation to prevent using RESPECT NULLS and IGNORE NULLS with aggregate functions

github-actions bot added the sql (SQL Planner) and sqllogictest (SQL Logic Tests (.slt)) labels Mar 5, 2025

qazxcdswe123 marked this pull request as draft March 5, 2025 04:37

qazxcdswe123 marked this pull request as ready for review March 5, 2025 04:39

alamb reviewed Mar 5, 2025

comphead reviewed Mar 5, 2025

qazxcdswe123 commented Mar 5, 2025

alamb Mar 5, 2025

vbarua Mar 5, 2025

alamb Mar 5, 2025

vbarua Mar 6, 2025 (edited)

comphead Mar 5, 2025

qazxcdswe123 Mar 8, 2025
