-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add proto roundtrips for Spark tests and fix issues it surfaces #315
base: main
Are you sure you want to change the base?
Conversation
// count only needs to be set when it is not -1 | ||
builder.count(rel.getCount()); | ||
} | ||
var builder = Fetch.builder().input(input).offset(rel.getOffset()).count(rel.getCount()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while the idea of not setting count if it's -1 is fine, this makes roundtrip tests fail if count is set in the pojo. Alternative fix is to ensure in the pojo it's never set if -1.
@@ -131,7 +131,7 @@ class ToSubstraitRel extends AbstractLogicalPlanVisitor with Logging { | |||
val aggregates = collectAggregates(actualResultExprs, aggExprToOutputOrdinal) | |||
val aggOutputMap = aggregates.zipWithIndex.map { | |||
case (e, i) => | |||
AttributeReference(s"agg_func_$i", e.dataType)() -> e | |||
AttributeReference(s"agg_func_$i", e.dataType, nullable = e.nullable)() -> e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these were causing wrong nullability for the type in the created pojos. I don't think that type field is used anywhere so it didn't cause harm, but still failed roundtrip tests as the type isn't written in proto and then it got correctly evaluated from other fields on read.
spark/src/main/scala/io/substrait/spark/logical/ToSubstraitRel.scala
Outdated
Show resolved
Hide resolved
623ee12
to
e374c85
Compare
@@ -312,7 +312,7 @@ void scalarSubquery() { | |||
Stream.of( | |||
Expression.ScalarSubquery.builder() | |||
.input(relWithEnhancement) | |||
.type(TypeCreator.REQUIRED.struct(TypeCreator.REQUIRED.I64)) | |||
.type(TypeCreator.REQUIRED.I64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not 100% sure about this, is it actually meant to return a struct type? given it's scalar that seems a bit weird
13a3b99
to
ad8c73b
Compare
b6b2307
to
e7be830
Compare
58ac64e
to
4451193
Compare
4451193
to
2945e97
Compare
2945e97
to
fb788e1
Compare
@vbarua @andrew-coleman this has been open for a while, but now finally ready for review! The testing change collides a bit with Andrew's #333, but either should be trivial to rebase once the other is in. |
@@ -42,7 +46,38 @@ public static Set.SetOp fromProto(SetRel.SetOp proto) { | |||
|
|||
@Override | |||
protected Type.Struct deriveRecordType() { | |||
return getInputs().get(0).getRecordType(); | |||
// The different inputs may have schemas that differ in nullability, but not in type. | |||
// In that case we should return a schema that is nullable where any of the inputs is nullable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the docs for this (https://substrait.io/relations/logical_relations/#set-operation-types), the output nullability depends on which set operation is being performed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, I realized that as well but forgot to fix 😅 I'll try to tomorrow..
Adds testing for substrait-spark that going from POJO (ie. substrait-java plan) -> Proto -> POJO results in the same POJO.
The test showed a bunch of cases where that assertion fails, mainly due to the java pojos containing a derived outputType which was in many cases incorrect when created from the proto.