Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(isthmus): sql day/time interval conversion #335

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bestbeforetoday
Copy link
Contributor

@bestbeforetoday bestbeforetoday commented Feb 28, 2025

RexLiteral day/time intervals were converted to Substrait using their scale. Despite all day/time interval RexLiteral created from SQL by Calcite having a (default) scale of 6, their value is always in milliseconds. This resulted in intervals generated by Calcite from SQL being wrong by a factor of 1000 seconds.

This change treats day/time interval RexLiteral values as milliseconds regardless of their claimed scale.

Closes #327

@bestbeforetoday bestbeforetoday changed the title fix(isthmus): SQL day/time interval conversion fix(isthmus): sql day/time interval conversion Feb 28, 2025
RexLiteral day/time intervals were converted to Substrait using their
scale. Despite all day/time interval RexLiteral created from SQL by
Calcite having a (default) scale of 6, their value is always in
milliseconds. This resulted in intervals generated by Calcite from SQL
being wrong by a factor of 1000 seconds.

This change treats day/time interval RexLiteral values as milliseconds
regardless of their claimed scale.

Signed-off-by: Mark S. Lewis <Mark.S.Lewis@outlook.com>
@bestbeforetoday bestbeforetoday marked this pull request as ready for review February 28, 2025 18:29
@bestbeforetoday
Copy link
Contributor Author

I am not entirely satisfied with this as a fix, although there look to be other places in the existing code that assumes that Calcite intervals are millisecond precision. I will continue to see if there is a non-breaking way to change the Calcite behaviour so that substrait-java can use the RexLiteral scale value. If that turns out to be possible then I can revisit this.

Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks reasonable to me.

@Blizzara could you take a look as well as you've worked on this in the past.

case FLOAT, DOUBLE -> fp64(n, ((Number) literal.getValue()).doubleValue());
case REAL -> fp32(n, ((Number) literal.getValue()).floatValue());
case FLOAT, DOUBLE -> fp64(n, literal.getValueAs(Double.class));
case REAL -> fp32(n, literal.getValueAs(Float.class));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about getValueAs

var seconds = interval.minusDays(days).toSeconds();
var micros = interval.toMillisPart() * 1000;

yield intervalDay(n, (int) days, (int) seconds, micros, 6);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also consistent with the conversion from Substrait to Calcite in

@Override
public RexNode visit(Expression.IntervalDayLiteral expr) throws RuntimeException {
long milliseconds =
expr.precision() > 3
? (expr.subseconds() / (int) Math.pow(10, expr.precision() - 3))
: (expr.subseconds() * (int) Math.pow(10, 3 - expr.precision()));
return rexBuilder.makeIntervalLiteral(
// Current Calcite behavior is to store milliseconds since Epoch
new BigDecimal((expr.days() * MILLIS_IN_DAY + expr.seconds() * 1_000L + milliseconds)),
DAY_SECOND_INTERVAL);
}

var microseconds = totalMicroseconds - 1_000_000 * seconds;
yield intervalDay(n, (int) days, (int) seconds, microseconds, 6 /* micros */);
// Calcite represents day/time intervals in milliseconds, despite a default scale of 6.
var totalMillis = Objects.requireNonNull(literal.getValueAs(Long.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so is the case here that Calcite has always

  • literal.getType().getScale() = 6
  • literal.getValue() returns millis?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you actually already answered that in the PR description:

Despite all day/time interval RexLiteral created from SQL by Calcite having a (default) scale of 6, their value is always in milliseconds.

👌

var seconds = interval.minusDays(days).toSeconds();
var micros = interval.toMillisPart() * 1000;

yield intervalDay(n, (int) days, (int) seconds, micros, 6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively you could return a precision of 3 and the millis value 🤷

Copy link
Contributor Author

@bestbeforetoday bestbeforetoday Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know which approach is better. Given the current Calcite behaviour, 3 is correct. However, Calcite has a default fractional second precision of 6. My view is that Calcite is not behaving correctly and really should be returning the precision it says it is. If that gets fixed at some point, the precision here will revert to 6. I figured it was better not to change the precision (for now) to minimise the impact of this change to isthmus users, but very happy to take your guidance on the best approach. Just let me know what you prefer.

@Test
void tIntervalDay() {
// Calcite always uses milliseconds
BigDecimal bd = new BigDecimal(TimeUnit.DAYS.toMillis(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use some value that's days + seconds + millis rather than just days?

Copy link
Contributor Author

@bestbeforetoday bestbeforetoday Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is specifically testing day intervals since that is the case I hit the issue. It wouldn't be a good test of day intervals if it actually tested other time units. I am happy to add additional test cases though. There is already a day/second test immediately above this one in the file. What other cases do you think it would be valuable to cover? Just day + second + millis?

@@ -155,7 +154,7 @@ public Expression.Literal convert(RexLiteral literal) {
yield timestamp(n, ldt);
}
case INTERVAL_YEAR, INTERVAL_YEAR_MONTH, INTERVAL_MONTH -> {
var intervalLength = literal.getValueAs(BigDecimal.class).longValue();
long intervalLength = Objects.requireNonNull(literal.getValueAs(Long.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious why this change is okay but getValueAs() returns the object casted if the type matches, otherwise it has some conversion logic for some types, so here it'll first cast the object to BigDecimal and then call .longValue() on it, so basically what the previous code was doing 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the literal.getValueAs(BigDecimal.class) that was there previously calls liternal.getValueAs(Long.class) internally before then converting it into a BigDecimal... which we then convert back to a long. It seems more sensible to just ask for the type we want. Functionally it is the same.

@Blizzara
Copy link
Contributor

Blizzara commented Mar 6, 2025

I'm not familiar with Calcite, but seems okay to me! I left couple notes/questions just in case they're helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[isthmus] INTERVAL DAY is interpreted incorrectly
3 participants