-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(isthmus): sql day/time interval conversion #335
base: main
Are you sure you want to change the base?
Conversation
78d6755
to
fff4b6b
Compare
RexLiteral day/time intervals were converted to Substrait using their scale. Despite all day/time interval RexLiteral created from SQL by Calcite having a (default) scale of 6, their value is always in milliseconds. This resulted in intervals generated by Calcite from SQL being wrong by a factor of 1000 seconds. This change treats day/time interval RexLiteral values as milliseconds regardless of their claimed scale. Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
fff4b6b
to
b2d204a
Compare
I am not entirely satisfied with this as a fix, although there look to be other places in the existing code that assumes that Calcite intervals are millisecond precision. I will continue to see if there is a non-breaking way to change the Calcite behaviour so that substrait-java can use the RexLiteral scale value. If that turns out to be possible then I can revisit this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks reasonable to me.
@Blizzara could you take a look as well as you've worked on this in the past.
case FLOAT, DOUBLE -> fp64(n, ((Number) literal.getValue()).doubleValue()); | ||
case REAL -> fp32(n, ((Number) literal.getValue()).floatValue()); | ||
case FLOAT, DOUBLE -> fp64(n, literal.getValueAs(Double.class)); | ||
case REAL -> fp32(n, literal.getValueAs(Float.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL about getValueAs
var seconds = interval.minusDays(days).toSeconds(); | ||
var micros = interval.toMillisPart() * 1000; | ||
|
||
yield intervalDay(n, (int) days, (int) seconds, micros, 6); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also consistent with the conversion from Substrait to Calcite in
substrait-java/isthmus/src/main/java/io/substrait/isthmus/expression/ExpressionRexConverter.java
Lines 236 to 246 in fe08fd4
@Override | |
public RexNode visit(Expression.IntervalDayLiteral expr) throws RuntimeException { | |
long milliseconds = | |
expr.precision() > 3 | |
? (expr.subseconds() / (int) Math.pow(10, expr.precision() - 3)) | |
: (expr.subseconds() * (int) Math.pow(10, 3 - expr.precision())); | |
return rexBuilder.makeIntervalLiteral( | |
// Current Calcite behavior is to store milliseconds since Epoch | |
new BigDecimal((expr.days() * MILLIS_IN_DAY + expr.seconds() * 1_000L + milliseconds)), | |
DAY_SECOND_INTERVAL); | |
} |
var microseconds = totalMicroseconds - 1_000_000 * seconds; | ||
yield intervalDay(n, (int) days, (int) seconds, microseconds, 6 /* micros */); | ||
// Calcite represents day/time intervals in milliseconds, despite a default scale of 6. | ||
var totalMillis = Objects.requireNonNull(literal.getValueAs(Long.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so is the case here that Calcite has always
- literal.getType().getScale() = 6
- literal.getValue() returns millis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you actually already answered that in the PR description:
Despite all day/time interval RexLiteral created from SQL by Calcite having a (default) scale of 6, their value is always in milliseconds.
👌
var seconds = interval.minusDays(days).toSeconds(); | ||
var micros = interval.toMillisPart() * 1000; | ||
|
||
yield intervalDay(n, (int) days, (int) seconds, micros, 6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively you could return a precision of 3 and the millis value 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know which approach is better. Given the current Calcite behaviour, 3 is correct. However, Calcite has a default fractional second precision of 6. My view is that Calcite is not behaving correctly and really should be returning the precision it says it is. If that gets fixed at some point, the precision here will revert to 6. I figured it was better not to change the precision (for now) to minimise the impact of this change to isthmus users, but very happy to take your guidance on the best approach. Just let me know what you prefer.
@Test | ||
void tIntervalDay() { | ||
// Calcite always uses milliseconds | ||
BigDecimal bd = new BigDecimal(TimeUnit.DAYS.toMillis(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use some value that's days + seconds + millis rather than just days?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is specifically testing day intervals since that is the case I hit the issue. It wouldn't be a good test of day intervals if it actually tested other time units. I am happy to add additional test cases though. There is already a day/second test immediately above this one in the file. What other cases do you think it would be valuable to cover? Just day + second + millis?
@@ -155,7 +154,7 @@ public Expression.Literal convert(RexLiteral literal) { | |||
yield timestamp(n, ldt); | |||
} | |||
case INTERVAL_YEAR, INTERVAL_YEAR_MONTH, INTERVAL_MONTH -> { | |||
var intervalLength = literal.getValueAs(BigDecimal.class).longValue(); | |||
long intervalLength = Objects.requireNonNull(literal.getValueAs(Long.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious why this change is okay but getValueAs() returns the object casted if the type matches, otherwise it has some conversion logic for some types, so here it'll first cast the object to BigDecimal and then call .longValue()
on it, so basically what the previous code was doing 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the literal.getValueAs(BigDecimal.class)
that was there previously calls liternal.getValueAs(Long.class)
internally before then converting it into a BigDecimal... which we then convert back to a long. It seems more sensible to just ask for the type we want. Functionally it is the same.
I'm not familiar with Calcite, but seems okay to me! I left couple notes/questions just in case they're helpful |
RexLiteral day/time intervals were converted to Substrait using their scale. Despite all day/time interval RexLiteral created from SQL by Calcite having a (default) scale of 6, their value is always in milliseconds. This resulted in intervals generated by Calcite from SQL being wrong by a factor of 1000 seconds.
This change treats day/time interval RexLiteral values as milliseconds regardless of their claimed scale.
Closes #327