Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix array_has_all and array_has_any with empty array #15039

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

LuQQiu
Copy link

@LuQQiu LuQQiu commented Mar 6, 2025

Which issue does this PR close?

What changes are included in this PR?

Fix array_has_any and array_has_all with empty array input

Are these changes tested?

Added the unit test in array.slt

Are there any user-facing changes?

NA

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 6, 2025
@LuQQiu
Copy link
Author

LuQQiu commented Mar 6, 2025

@jayzhan211 Could you help approve the test workflows and review the PR? Thanks in advance

Comment on lines 444 to 445
ComparisonType::All => true,
ComparisonType::Any => false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to me that All is true and Any is false but I agree it matches the descriptions:

array_has_all: Returns true if all elements of sub-array exist in array.
array_has_any: Returns true if any elements exist in both arrays.

I guess, from a set perspective, we have "true if the intersection of the two sets has the same length as needle" (all) and "true if the intersection of the two sets is non-empty" (any).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D select array_has_any([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_any(main.list_value(1, 2, 3), main.list_value()) │
│                          boolean                           │
├────────────────────────────────────────────────────────────┤
│ false                                                      │
└────────────────────────────────────────────────────────────┘
D select array_has_all([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_all(main.list_value(1, 2, 3), main.list_value()) │
│                          boolean                           │
├────────────────────────────────────────────────────────────┤
│ true                                                       │
└────────────────────────────────────────────────────────────┘

LGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes. I just meant it was strange intuition (I was rambling). I was not arguing correctness. I agree this is good.

ComparisonType::All => true,
ComparisonType::Any => false,
};
let result = BooleanArray::from(vec![result_value; haystack.len()]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using BooleanBuffer::new_set and BooleanBuffer::new_unset should be slightly more efficient.

Copy link
Author

@LuQQiu LuQQiu Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using

    if needle.values().len() == 0 {
        return match comparison_type {
            ComparisonType::All => Ok(Arc::new(BooleanBuffer::new_set(haystack.len()))),
            ComparisonType::Any =>  Ok(Arc::new(BooleanBuffer::new_unset(haystack.len())))
        };
    }

Runs into

the trait bound `BooleanBuffer: arrow::array::Array` is not satisfied
the following other types implement trait `arrow::array::Array`:
  &T
  BooleanArray
  DictionaryArray<T>
  FixedSizeBinaryArray
  FixedSizeListArray
  GenericByteArray<T>
  GenericByteViewArray<T>
  GenericListArray<OffsetSize>
and 10 others
required for the cast from `std::sync::Arc<BooleanBuffer>` to `std::sync::Arc<dyn arrow::array::Array>`rustc[Click for full compiler diagnostic](rust-analyzer-diagnostics-view:/diagnostic%20message%20[0]?0#file:///Users/lu/Projects/Github/Work/datafusion/datafusion/functions-nested/src/array_has.rs)

So confusing about the different types loll

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use BooleanArray::from(BooleanBuffer::new_/* ... */)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! Changed to use BooleanBuffer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

array_has_any(column, []) with empty array throws RowConverter column schema mismatch, expected Utf8 got Int64
4 participants