Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(isthmus): sql day/time interval conversion #335

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -97,10 +97,8 @@ public RexNode visit(Expression.NullLiteral expr) throws RuntimeException {
@Override
public RexNode visit(Expression.UserDefinedLiteral expr) throws RuntimeException {
var binaryLiteral = rexBuilder.makeBinaryLiteral(new ByteString(expr.value().toByteArray()));
return rexBuilder.makeReinterpretCast(
typeConverter.toCalcite(typeFactory, expr.getType()),
binaryLiteral,
rexBuilder.makeLiteral(false));
var type = typeConverter.toCalcite(typeFactory, expr.getType());
return rexBuilder.makeReinterpretCast(type, binaryLiteral, rexBuilder.makeLiteral(false));
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rex.RexLiteral;
Expand All @@ -32,8 +32,6 @@ public LiteralConverter(TypeConverter typeConverter) {
this.typeConverter = typeConverter;
}

private static final long MICROS_IN_DAY = TimeUnit.DAYS.toMicros(1);

static final DateTimeFormatter CALCITE_LOCAL_DATE_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE;
static final DateTimeFormatter CALCITE_LOCAL_TIME_FORMATTER =
new DateTimeFormatterBuilder()
Expand Down Expand Up @@ -96,16 +94,17 @@ public Expression.Literal convert(RexLiteral literal) {
case SMALLINT -> i16(n, i(literal).intValue());
case INTEGER -> i32(n, i(literal).intValue());
case BIGINT -> i64(n, i(literal).longValue());
case BOOLEAN -> bool(n, (Boolean) literal.getValue());
case BOOLEAN -> bool(n, literal.getValueAs(Boolean.class));
case CHAR -> {
var val = literal.getValue();
if (val instanceof NlsString nls) {
yield fixedChar(n, nls.getValue());
}
throw new UnsupportedOperationException("Unable to handle char type: " + val);
}
case FLOAT, DOUBLE -> fp64(n, ((Number) literal.getValue()).doubleValue());
case REAL -> fp32(n, ((Number) literal.getValue()).floatValue());
case FLOAT, DOUBLE -> fp64(n, literal.getValueAs(Double.class));
case REAL -> fp32(n, literal.getValueAs(Float.class));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about getValueAs


case DECIMAL -> {
BigDecimal bd = bd(literal);
yield decimal(n, bd, literal.getType().getPrecision(), literal.getType().getScale());
Expand Down Expand Up @@ -155,7 +154,7 @@ public Expression.Literal convert(RexLiteral literal) {
yield timestamp(n, ldt);
}
case INTERVAL_YEAR, INTERVAL_YEAR_MONTH, INTERVAL_MONTH -> {
var intervalLength = literal.getValueAs(BigDecimal.class).longValue();
long intervalLength = Objects.requireNonNull(literal.getValueAs(Long.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious why this change is okay but getValueAs() returns the object casted if the type matches, otherwise it has some conversion logic for some types, so here it'll first cast the object to BigDecimal and then call .longValue() on it, so basically what the previous code was doing 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the literal.getValueAs(BigDecimal.class) that was there previously calls liternal.getValueAs(Long.class) internally before then converting it into a BigDecimal... which we then convert back to a long. It seems more sensible to just ask for the type we want. Functionally it is the same.

var years = intervalLength / 12;
var months = intervalLength - years * 12;
yield intervalYear(n, (int) years, (int) months);
Expand All @@ -170,19 +169,15 @@ public Expression.Literal convert(RexLiteral literal) {
INTERVAL_MINUTE,
INTERVAL_MINUTE_SECOND,
INTERVAL_SECOND -> {
// we need to convert to microseconds.
// TODO: don't need to anymore
int scale = literal.getType().getScale();
var intervalLength = literal.getValueAs(BigDecimal.class).longValue();
var adjustedLength =
scale > 6
? intervalLength / ((int) Math.pow(10, scale - 6))
: intervalLength * ((int) Math.pow(10, 6 - scale));
var days = adjustedLength / MICROS_IN_DAY;
var totalMicroseconds = adjustedLength - days * MICROS_IN_DAY;
var seconds = totalMicroseconds / 1_000_000;
var microseconds = totalMicroseconds - 1_000_000 * seconds;
yield intervalDay(n, (int) days, (int) seconds, microseconds, 6 /* micros */);
// Calcite represents day/time intervals in milliseconds, despite a default scale of 6.
var totalMillis = Objects.requireNonNull(literal.getValueAs(Long.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so is the case here that Calcite has always

  • literal.getType().getScale() = 6
  • literal.getValue() returns millis?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you actually already answered that in the PR description:

Despite all day/time interval RexLiteral created from SQL by Calcite having a (default) scale of 6, their value is always in milliseconds.

👌

var interval = Duration.ofMillis(totalMillis);

var days = interval.toDays();
var seconds = interval.minusDays(days).toSeconds();
var micros = interval.toMillisPart() * 1000;

yield intervalDay(n, (int) days, (int) seconds, micros, 6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively you could return a precision of 3 and the millis value 🤷

Copy link
Contributor Author

@bestbeforetoday bestbeforetoday Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know which approach is better. Given the current Calcite behaviour, 3 is correct. However, Calcite has a default fractional second precision of 6. My view is that Calcite is not behaving correctly and really should be returning the precision it says it is. If that gets fixed at some point, the precision here will revert to 6. I figured it was better not to change the precision (for now) to minimise the impact of this change to isthmus users, but very happy to take your guidance on the best approach. Just let me know what you prefer.

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also consistent with the conversion from Substrait to Calcite in

@Override
public RexNode visit(Expression.IntervalDayLiteral expr) throws RuntimeException {
long milliseconds =
expr.precision() > 3
? (expr.subseconds() / (int) Math.pow(10, expr.precision() - 3))
: (expr.subseconds() * (int) Math.pow(10, 3 - expr.precision()));
return rexBuilder.makeIntervalLiteral(
// Current Calcite behavior is to store milliseconds since Epoch
new BigDecimal((expr.days() * MILLIS_IN_DAY + expr.seconds() * 1_000L + milliseconds)),
DAY_SECOND_INTERVAL);
}


case ROW -> {
Expand Down
23 changes: 23 additions & 0 deletions isthmus/src/test/java/io/substrait/isthmus/CalciteLiteralTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,29 @@ void tIntervalMillisecond() {
bitest(intervalDaySecondExpr, intervalDaySecond);
}

@Test
void tIntervalDay() {
// Calcite always uses milliseconds
BigDecimal bd = new BigDecimal(TimeUnit.DAYS.toMillis(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use some value that's days + seconds + millis rather than just days?

Copy link
Contributor Author

@bestbeforetoday bestbeforetoday Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is specifically testing day intervals since that is the case I hit the issue. It wouldn't be a good test of day intervals if it actually tested other time units. I am happy to add additional test cases though. There is already a day/second test immediately above this one in the file. What other cases do you think it would be valuable to cover? Just day + second + millis?

RexLiteral intervalDayLiteral =
rex.makeIntervalLiteral(
bd,
new SqlIntervalQualifier(
org.apache.calcite.avatica.util.TimeUnit.DAY, -1, null, -1, SqlParserPos.ZERO));
var intervalDayExpr = intervalDay(false, 5, 0, 0, 6);

// rex --> expression
var convertedExpr = intervalDayLiteral.accept(rexExpressionConverter);
assertEquals(intervalDayExpr, convertedExpr);

// expression -> rex
RexLiteral convertedRex = (RexLiteral) intervalDayExpr.accept(expressionRexConverter);

// Compare value only. Ignore the precision in SqlIntervalQualifier in comparison.
assertEquals(
intervalDayLiteral.getValueAs(BigDecimal.class), convertedRex.getValueAs(BigDecimal.class));
}

@Test
void tIntervalYear() {
BigDecimal bd = new BigDecimal(123 * 12); // '123' year(3)
Expand Down
Loading