-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix regression in CASE expression #14283
Conversation
In the unit test let then = Arc::new(Literal::new(ScalarValue::Null)); I suggest finding such test reproducible in SLT since it runs end-to-end, ensuring the arguments passed through the function are as expected. For example, in this case, I think we should convert then from For example, the
If there is any reason that we really need to handle let then_value = self.when_then_expr[0]
.1
.evaluate_selection(batch, &when_value)?;
let then_value = if let ColumnarValue::Scalar(ScalarValue::Null) = then_value {
new_null_array(&return_type,batch.num_rows())
} else {
then_value.into_array(batch.num_rows())?
}; |
I also looked into this a little bit. Here's my observations: I personally would have applied the transformation let e = &self.when_then_expr[0].1;
let then_expr = try_cast(Arc::clone(e), &batch.schema(), return_type.clone())?;
let then_value = then_expr
.evaluate_selection(batch, &when_value)?
.into_array(batch.num_rows())?; since to me this seems more obvious what's the intended goal, i.e. "make sure that the then_expr returns the correct datatype for the case expression" and seems to be a little bit more generic. If the suggested fix here is to be accepted I would rather remove the Also: I was surprised to see that I can't reproduce this issue when using datafusion-cli, so from an end-to-end perspective the type coercion seems to work somehow 🤔 > CREATE TABLE batch2(a int[]) AS VALUES ([1, 2, 3]), (null);
> SELECT CASE WHEN a IS NULL THEN NULL ELSE a END FROM batch2;
+--------------------------------------------------------+
| CASE WHEN batch2.a IS NULL THEN NULL ELSE batch2.a END |
+--------------------------------------------------------+
| [1, 2, 3] |
| NULL |
+--------------------------------------------------------+ |
Sorry I just realized that the wording might be a little ambiguous. "suggested fix" refers to this commit |
let then_expr = try_cast(Arc::clone(e), &batch.schema(), return_type.clone())?; I think type handling should be handled correctly in logical layer before physical expression creation, so this doesn't make sense to me. Even checking type in |
@jayzhan211 in the Comet use case we do not have a DataFusion logical plan but are instead translating Spark physical plans to DataFusion physical plans. |
How do you handle types mismatch issue? Does Comet has another type handling logic to find the correct types for datafusion physical plan? |
We map Spark types to Arrow types. For UDFs, we call DataFusion's This is the DataFusion plan that was failing.
|
The function pub fn get_coerce_type_for_case_expression(
when_or_then_types: &[DataType],
case_or_else_type: Option<&DataType>,
) -> Option<DataType> {
let case_or_else_type = match case_or_else_type {
None => when_or_then_types[0].clone(),
Some(data_type) => data_type.clone(),
};
when_or_then_types
.iter()
.try_fold(case_or_else_type, |left_type, right_type| {
// TODO: now just use the `equal` coercion rule for case when. If find the issue, and
// refactor again.
comparison_coercion(&left_type, right_type)
})
} Therefore, I think Comet also need to applies such coercion before calling In datafusion,
|
I have two questions:
|
|
@jayzhan211 Thanks, I will look into using @aweltsch Adding type checks in Tomorrow, I will try updating the Comet PR to use |
Ideally, invoking a physical optimization rule to apply coercion as needed would be nice. |
I agree -- this would be helpful for any system that only uses the Physical layer. I reviewed this PR and I think it looks like it isn't waiting on review (Andy is going to try |
Using Thanks @aweltsch @jayzhan211 @alamb |
Which issue does this PR close?
Closes #14277
Rationale for this change
Add unit test that demonstrates the regression reported in #14277
What changes are included in this PR?
arguments need to have the same data type
Are these changes tested?
Are there any user-facing changes?