fix: add proto roundtrips for Spark tests and fix issues it surfaces #315

Open
wants to merge 10 commits into
base: main

Contributor

@Blizzara Blizzara commented Oct 28, 2024

Adds tests for substrait-spark that check that going from POJO (i.e. a substrait-java plan) -> Proto -> POJO results in the same POJO.

The tests surfaced a number of cases where that assertion fails, mainly because the Java POJOs contain a derived outputType that was in many cases incorrect when the POJO was created from the proto.
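
To illustrate why a derived field can break such a roundtrip, here is a minimal sketch. It does not use the real substrait-java classes: `Pojo`, `Proto`, `toProto`, `fromProto` and `roundtrips` are hypothetical stand-ins, and `derivedNullable` plays the role of the derived output type that is not written to the proto.

```java
public class RoundtripSketch {
    // Hypothetical stand-in for a substrait-java POJO. `derivedNullable` models a
    // derived value (like the output type) that is NOT serialized to the proto.
    public record Pojo(String name, boolean inputNullable, boolean derivedNullable) {
        // Builds a POJO whose derived field is computed from its other fields.
        public static Pojo derived(String name, boolean inputNullable) {
            return new Pojo(name, inputNullable, inputNullable);
        }
    }

    // Hypothetical stand-in for the proto message: only non-derived fields survive.
    public record Proto(String name, boolean inputNullable) {}

    public static Proto toProto(Pojo pojo) {
        return new Proto(pojo.name(), pojo.inputNullable());
    }

    // On read, the derived field is recomputed from the fields that were serialized.
    public static Pojo fromProto(Proto proto) {
        return Pojo.derived(proto.name(), proto.inputNullable());
    }

    // The roundtrip assertion: POJO -> Proto -> POJO must equal the original POJO.
    public static boolean roundtrips(Pojo pojo) {
        return pojo.equals(fromProto(toProto(pojo)));
    }

    public static void main(String[] args) {
        Pojo consistent = Pojo.derived("agg_func_0", false);
        // Derived field disagrees with what would be recomputed on read.
        Pojo inconsistent = new Pojo("agg_func_0", false, true);
        System.out.println(roundtrips(consistent));
        System.out.println(roundtrips(inconsistent));
    }
}
```

In this toy model the roundtrip only holds when the stored derived value matches what is recomputed on read, which is the kind of mismatch the new tests are meant to catch, whichever side produced the wrong value.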

-      // count only needs to be set when it is not -1
-      builder.count(rel.getCount());
-    }
+    var builder = Fetch.builder().input(input).offset(rel.getOffset()).count(rel.getCount());
Contributor Author

While the idea of not setting count when it's -1 is fine, it makes the roundtrip tests fail if count is set to -1 in the POJO. An alternative fix is to ensure that the POJO never has count set when it's -1.
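
A rough sketch of that alternative, assuming count is modeled as an optional value: the POJO normalizes -1 to "unset" at construction time, so both construction paths agree. The `Fetch` record, `toProtoCount` and `fromProto` below are hypothetical stand-ins, not the real substrait-java API.

```java
import java.util.OptionalLong;

public class FetchCountSketch {
    // Hypothetical stand-in for the Fetch POJO (the real one is built via Fetch.builder()).
    public record Fetch(long offset, OptionalLong count) {
        // Compact constructor: normalize the sentinel -1 to an empty count so that
        // "count = -1" and "count not set" are the same POJO.
        public Fetch {
            if (count.isPresent() && count.getAsLong() == -1) {
                count = OptionalLong.empty();
            }
        }
    }

    // Assumption: the proto side uses -1 to mean "no count".
    public static long toProtoCount(Fetch fetch) {
        return fetch.count().orElse(-1);
    }

    // Always passes the proto value through; normalization happens in the POJO.
    public static Fetch fromProto(long offset, long protoCount) {
        return new Fetch(offset, OptionalLong.of(protoCount));
    }

    public static void main(String[] args) {
        Fetch explicitMinusOne = new Fetch(5, OptionalLong.of(-1));
        Fetch roundtripped = fromProto(explicitMinusOne.offset(), toProtoCount(explicitMinusOne));
        System.out.println(explicitMinusOne.equals(roundtripped));
    }
}
```

With the normalization living in the POJO, the proto reader can set count unconditionally, as the changed line does, without breaking equality.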

@@ -131,7 +131,7 @@ class ToSubstraitRel extends AbstractLogicalPlanVisitor with Logging {
val aggregates = collectAggregates(actualResultExprs, aggExprToOutputOrdinal)
val aggOutputMap = aggregates.zipWithIndex.map {
case (e, i) =>
-        AttributeReference(s"agg_func_$i", e.dataType)() -> e
+        AttributeReference(s"agg_func_$i", e.dataType, nullable = e.nullable)() -> e
Contributor Author

@Blizzara Blizzara Oct 28, 2024

These were causing wrong nullability for the type in the created POJOs. I don't think that type field is used anywhere, so it didn't cause harm, but it still failed the roundtrip tests: the type isn't written to the proto, so on read it is derived from the other fields, correctly this time.

@Blizzara Blizzara force-pushed the avo/stronger-testing branch 2 times, most recently from 623ee12 to e374c85 Compare November 21, 2024 17:31
@@ -312,7 +312,7 @@ void scalarSubquery() {
Stream.of(
Expression.ScalarSubquery.builder()
.input(relWithEnhancement)
-                .type(TypeCreator.REQUIRED.struct(TypeCreator.REQUIRED.I64))
+                .type(TypeCreator.REQUIRED.I64)
Contributor Author

Not 100% sure about this: is it actually meant to return a struct type? Given that it's a scalar subquery, that seems a bit weird.

@Blizzara Blizzara force-pushed the avo/stronger-testing branch from 13a3b99 to ad8c73b Compare March 5, 2025 10:07
@Blizzara Blizzara force-pushed the avo/stronger-testing branch from b6b2307 to e7be830 Compare March 5, 2025 15:23
@Blizzara Blizzara changed the title [wip] fix: add proto roundtrips for Spark tests and fix issues it surfaces fix: add proto roundtrips for Spark tests and fix issues it surfaces Mar 5, 2025
@Blizzara Blizzara force-pushed the avo/stronger-testing branch 2 times, most recently from 58ac64e to 4451193 Compare March 5, 2025 15:43
@Blizzara Blizzara marked this pull request as ready for review March 5, 2025 15:48
@Blizzara Blizzara force-pushed the avo/stronger-testing branch from 4451193 to 2945e97 Compare March 5, 2025 16:06
@Blizzara Blizzara force-pushed the avo/stronger-testing branch from 2945e97 to fb788e1 Compare March 5, 2025 16:16
Contributor Author

Blizzara commented Mar 5, 2025

@vbarua @andrew-coleman this has been open for a while, but it's now finally ready for review! The testing change collides a bit with Andrew's #333, but either should be trivial to rebase once the other is in.

@@ -42,7 +46,38 @@ public static Set.SetOp fromProto(SetRel.SetOp proto) {

@Override
protected Type.Struct deriveRecordType() {
-    return getInputs().get(0).getRecordType();
+    // The different inputs may have schemas that differ in nullability, but not in type.
+    // In that case we should return a schema that is nullable where any of the inputs is nullable.
Contributor

Looking at the docs for this (https://substrait.io/relations/logical_relations/#set-operation-types), the output nullability depends on which set operation is being performed.
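
As a rough sketch of what operation-dependent nullability could look like: the rules in the `switch` below are my reading of the linked page and should be checked against it before being relied on. The `SetOp` enum is a reduced, hypothetical version of the real one, each input is modeled only as a list of per-field nullability flags, and at least two inputs are assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class SetNullabilitySketch {
    // Reduced, hypothetical enum; the real Set.SetOp has more variants.
    public enum SetOp {
        MINUS_PRIMARY, MINUS_MULTISET, INTERSECTION_PRIMARY,
        INTERSECTION_MULTISET, UNION_DISTINCT, UNION_ALL
    }

    // inputs.get(i).get(f) is true when field f of input i is nullable.
    // inputs.get(0) is the primary input.
    public static List<Boolean> deriveNullability(SetOp op, List<List<Boolean>> inputs) {
        List<Boolean> primary = inputs.get(0);
        List<List<Boolean>> others = inputs.subList(1, inputs.size());
        List<Boolean> result = new ArrayList<>();
        for (int f = 0; f < primary.size(); f++) {
            final int field = f;
            boolean primaryNullable = primary.get(field);
            boolean anyOtherNullable = others.stream().anyMatch(in -> in.get(field));
            boolean allOthersNullable = others.stream().allMatch(in -> in.get(field));
            // Assumed rules (my reading of the set-operation-types docs):
            boolean nullable = switch (op) {
                // Minus: output follows the primary input.
                case MINUS_PRIMARY, MINUS_MULTISET -> primaryNullable;
                // Intersection (primary): nullable in the primary and in any other input.
                case INTERSECTION_PRIMARY -> primaryNullable && anyOtherNullable;
                // Intersection (multiset): required if required in any input.
                case INTERSECTION_MULTISET -> primaryNullable && allOthersNullable;
                // Union: nullable if nullable in any input.
                case UNION_DISTINCT, UNION_ALL -> primaryNullable || anyOtherNullable;
            };
            result.add(nullable);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Boolean>> inputs = List.of(List.of(true, false), List.of(false, true));
        System.out.println(deriveNullability(SetOp.UNION_ALL, inputs));
        System.out.println(deriveNullability(SetOp.MINUS_PRIMARY, inputs));
    }
}
```

The "nullable where any input is nullable" rule from the diff corresponds only to the union branch here; the other branches are where the derived type would differ.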

Contributor Author

Yep, I realized that as well but forgot to fix it 😅 I'll try to get to it tomorrow.

2 participants