Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrects tests regarding coercion, indexing, bag operators, and missing attributes #112

Merged
merged 6 commits into from
May 21, 2024

Conversation

johnedquinn
Copy link
Member

@johnedquinn johnedquinn commented Apr 16, 2024

Description

  • Corrects 80+ incorrect tests.
  • The following sections break down each test and why they're wrong according to the PartiQL Specification.

FROM RHS Coercion

The excerpt below comes from the PartiQL Specification and states that, in strict mode, the query will FAIL when the RHS of a FROM is a scalar. In permissive, it'll be coerced to a bag holding the scalar.

In the following cases the expression in the FROM clause item has the wrong type. Under the type checking option, all of these cases raise an error and the query fails. Under the permissive option, the cases proceed as follows:

Iteration over a scalar value: Consider the query:
FROM s AS v AT p
or the query:
FROM s AS v

where s is a scalar value. Then s coerces into the bag << s >>, i.e., the bag that has a single element, the s. The
rest of the semantics is identical to what happens when the lhs of the FROM item is a bag.
-- PartiQL Specification Section 5.1.1

This is applicable to the following 11 tests:

  • dateTimePartsAsVariableNames
  • various types in from clause
  • rangeTwiceOverScalar
  • single source FROM with scalar
  • single source FROM with scalar and AT clause
  • single source FROM with tuple
  • single source FROM with tuple and AT clause
  • single source FROM with absent value null
  • single source FROM with absent value null and AT clause
  • single source FROM with absent value missing
  • single source FROM with absent value missing and AT clause

Negative Indexing

From the PartiQL Specification:

In the permissive mode, an array navigation evaluation a[i] will result into missing in each of the following cases:
• a does not evaluate into an array, or
• i does not evaluate into a positive integer within the array’s bounds.
...
In type checking mode, the query will fail in each one of the cases above.

This is applicable to the following 2 tests:

  • ordinalAccessWithNegativeIndex
  • ordinalAccessWithNegativeIndexAndBindings

SFW Coercion

The excerpt below comes from the PartiQL Specification and states that SELECT queries shall be coerced.

In each of the following cases a SFW subquery coerces into a scalar...
if it is an SFW subquery expression that (a) is not the collection expression of a FROM clause item and (b) is not
the rhs of an IN. (If it is the rhs of an IN then it should not be coerced; see note on semantics of IN, Section ??.)

This is applicable to the following 4 tests:

  • SELECT VALUE with nested aggregates : SELECT VALUE (SELECT SUM(outerFromSource.col1) AS the_sum FROM <<1>>) FROM simple_1_col_1_group as outerFromSource
  • selectListMultipleAggregatesNestedQuery
  • Expression with multiple subqueriees containing aggregates : CAST((SELECT COUNT(1) FROM products) AS LIST)[0]._1 / CAST((SELECT COUNT(1) FROM suppliers) AS LIST)[0]._1
  • Aggregates with subquery containing another aggregate : SELECT COUNT(1) + CAST((SELECT SUM(numInStock) FROM products) AS LIST)[0]._1 as a_number FROM products

Note: I had to rename some of these due to the inclusion of the queries in their names.

EXCEPT DISTINCT Bug

For more information, please see partiql/partiql-lang#75

This is applicable to the following 1 test:

  • outerExceptDistinct.

Tuple Path Navigation Failures

From the PartiQL Specification:

In the case of tuple paths, since PartiQL does not assume a schema, the semantics must also specify the return value
when:

  1. t is not a tuple (i.e., when the expression t does not evaluate into a tuple), or
  2. t is a tuple that does not have an a attribute.
    ...
    PartiQL can operate in a permissive mode or in a conventional type checking mode, where the
    query fails once typing errors (such as the above mentioned ones) happen. In the permissive mode, typing errors are
    typically neglected by using the semantics outlined next...
    In all of the above cases PartiQL returns the special value missing.

Notably, it seems that, in type checking mode, grouping does NOT coerce any type checking errors to NULL/MISSING. However, if a MISSING is encountered (not an error), then it will be coerced to NULL.

If a grouping expression evaluates to MISSING, it is first coerced into NULL, thus bringing MISSING and NULL in the same group.

Similarly, aggregate functions are modeled as scalar functions that take in BAGS and output scalars. It seems that a type check exception occurring within the arguments of an aggregate function will result in a query failure (it won't be caught) with type checking mode. In permissive, these failures are coerced to MISSING, and the aggregate functions handle them appropriately.

This is applicable to the following 58 tests:

  • selectValueStructConstructorWithMissing
  • SELECT supplierId_missings FROM products_sparse p GROUP BY p.supplierId_missings
  • SELECT p.supplierId_missings FROM products_sparse p GROUP BY p.supplierId_missings
  • SELECT VALUE { 'supplierId_missings' : p.supplierId_missings } FROM products_sparse p GROUP BY p.supplierId_missings
  • SELECT supplierId_mixed FROM products_sparse p GROUP BY p.supplierId_mixed
  • SELECT p.supplierId_mixed FROM products_sparse p GROUP BY p.supplierId_mixed
  • SELECT VALUE { 'supplierId_mixed' : p.supplierId_mixed } FROM products_sparse p GROUP BY p.supplierId_mixed
  • SELECT regionId, supplierId_missings FROM products_sparse p GROUP BY p.regionId, p.supplierId_missings
  • SELECT p.regionId, p.supplierId_missings FROM products_sparse p GROUP BY p.regionId, p.supplierId_missings
  • SELECT VALUE { 'regionId': p.regionId, 'supplierId_missings': p.supplierId_missings } FROM products_sparse p GROUP BY p.regionId, p.supplierId_missings
  • SELECT regionId, supplierId_mixed FROM products_sparse p GROUP BY p.regionId, p.supplierId_mixed
  • SELECT regionId, p.supplierId_mixed FROM products_sparse p GROUP BY p.regionId, p.supplierId_mixed
  • SELECT VALUE { 'regionId': p.regionId, 'supplierId_mixed': p.supplierId_mixed } FROM products_sparse p GROUP BY p.regionId, p.supplierId_mixed
  • SELECT COUNT(1) AS the_count, <ALL_AGGS>(p.price_missings) AS the_agg FROM products_sparse AS p
  • SELECT COUNT(1) AS the_count, <ALL_AGGS>(p.price_mixed) AS the_agg FROM products_sparse AS p
  • SELECT categoryId, COUNT(1) AS the_count, <ALL_AGGS>(p.price_missings) AS the_agg FROM products_sparse AS p GROUP BY categoryId
  • SELECT categoryId, COUNT(1) AS the_count, <ALL_AGGS>(p.price_mixed) AS the_agg FROM products_sparse AS p GROUP BY categoryId
  • SELECT p.categoryId, COUNT(1) AS the_count, <ALL_AGGS>( price_nulls) AS the_agg FROM products_sparse AS p GROUP BY p.categoryId
  • SELECT p.categoryId, COUNT(1) AS the_count, <ALL_AGGS>(p.price_mixed) AS the_agg FROM products_sparse AS p GROUP BY p.categoryId
  • undefinedUnqualifiedVariable_inSelect_withProjectionOption
  • selectListWithMissing
  • ANY with GROUP BY
  • ANY DISTINCT with GROUP BY
  • SOME with GROUP BY
  • SOME DISTINCT with GROUP BY
  • EVERY with GROUP BY
  • EVERY DISTINCT with GROUP BY
  • undefinedUnqualifiedVariableInSelectWithUndefinedVariableBehaviorMissing
  • array element evaluates to MISSING
  • bag element evaluates to MISSING
  • bag element evaluates to MISSING in bag constructor
  • cast and operations with missing argument
  • group by with absent values
  • group by with differenciated absent values

UNPIVOT Coercion

I fixed a handful of tests that do not handle the Unpivot mistyping scenarios. According to the PartiQL Specification:

In the following cases the expression in the FROM UNPIVOT clause item has the “wrong" type, i.e., it is not a tuple.
Under the type checking option, all of these cases raise an error and the query fails. Under the permissive option, the cases proceed as follows:
FROM UNPIVOT x AS v AT n
whereas x is not a tuple and is not MISSING, is equivalent to:
FROM UNPIVOT {'_1': x} AS v AT n

AT Alias When Unordered

I fixed some tests that previously contradicted the following excerpt from the PartiQL Specification:

In the following cases the expression in the FROM clause item has the wrong type. Under the type checking option, all of these cases raise an error and the query fails. Under the permissive option, the cases proceed as follows
• Position variable on bags: Consider the clause:
FROM b AS v AT p
and assume that b is a bag << e0, . . . , en−1 >>. The output is a bag with binding tuples 〈v : ei, p : MISSING. The value MISSING for the variable p indicates that the order of elements in the bag was meaningless.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@johnedquinn johnedquinn marked this pull request as ready for review April 17, 2024 18:00
@johnedquinn johnedquinn changed the title Fixes miscellaneous tests Corrects 76 tests regarding coercion, indexing, bag operators, and missing attributes Apr 17, 2024
evalMode: EvalModeError,
result: EvaluationFail,
},
]
Copy link
Contributor

@rchowell rchowell Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the next test "rangeOverSexp" is also incorrect. An sexp is a collection. The result should be bag::[a, b, c ] not a singleton bag.

Notice that we turn an input Ion list into a collection that we range over, but here we treat the sexp collection as a single value. This does not make sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Will add that test to the scope of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added! Also found and fixed some more incorrect tests -- see 1da4e8b.

@johnedquinn johnedquinn changed the title Corrects 76 tests regarding coercion, indexing, bag operators, and missing attributes Corrects tests regarding coercion, indexing, bag operators, and missing attributes Apr 18, 2024
Comment on lines 300 to 314
assert: [
{
evalMode:EvalModeCoerce,
result:EvaluationSuccess,
output:$bag::[
{
a:5
}
]
},
{
evalMode: EvalModeError,
result: EvaluationFail,
},
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting to track rangeOverStruct failure in strict mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this tuple case is covered in section 5.1.1:

5.1.1. Mistyping Cases
In the following cases the expression in the FROM clause item has the wrong type. Under the type checking option, all
of these cases raise an error and the query fails.
...
• Iteration over a tuple value: Consider the query:
FROM t AS v AT p
or the query:
FROM t AS v
where t is a tuple. Then t coerces into the bag << t >>

so it should give an error in strict mode. Existing permissive mode behavior is correct.

@johnedquinn johnedquinn requested a review from jpschorr April 23, 2024 22:49
Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still have a couple files I'm looking through, namely

  • partiql-tests-data/eval/query/select/from-clause.ion and
  • partiql-tests-data/eval/query/group-by/group-by.ion

test changes in the other files I've looked through look good

{readings:$bag::[0.5]}
]
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree w/ how this test will give an error in strict mode due to the path expression l.sensor in the 2nd grouping expression. I guess the original test was meant to demonstrate how to prevent NULL and MISSING from being grouped together.

Not sure if there's really a way to prevent GROUP BY on l.sensor from erroring in strict mode.

result: EvaluationSuccess,
output: $bag::[{x: 0}]
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch for these tests. Was an oversight on my part -- I completely missed the heading "Mistyping Cases" when porting

@@ -126,7 +126,6 @@ bagOperators::[
evalMode:[EvalModeCoerce, EvalModeError],
result:EvaluationSuccess,
output:$bag::[
1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful to link partiql/partiql-lang#75 and/or leave a comment about the behavior here. Didn't realize the DISTINCT is called on both inputs before calling the EXCEPT.

alancai98
alancai98 previously approved these changes Apr 26, 2024
Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work correcting the conformance tests! Finished looking at all the tests. Left a couple non-blocking comments.

Comment on lines 300 to 314
assert: [
{
evalMode:EvalModeCoerce,
result:EvaluationSuccess,
output:$bag::[
{
a:5
}
]
},
{
evalMode: EvalModeError,
result: EvaluationFail,
},
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this tuple case is covered in section 5.1.1:

5.1.1. Mistyping Cases
In the following cases the expression in the FROM clause item has the wrong type. Under the type checking option, all
of these cases raise an error and the query fails.
...
• Iteration over a tuple value: Consider the query:
FROM t AS v AT p
or the query:
FROM t AS v
where t is a tuple. Then t coerces into the bag << t >>

so it should give an error in strict mode. Existing permissive mode behavior is correct.

evalMode:EvalModeError,
result:EvaluationFail
},
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically all the changed tests from above this point that had used supplierId_missings, supplierId_mixed, price_missings, or price_mixed in the GROUP BY or aggregation expression which should have resulted in an error in strict mode?

jpschorr
jpschorr previously approved these changes May 14, 2024
Copy link
Contributor

@jpschorr jpschorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the changes look better aligned with the spec.

I wonder, however, if what the spec says about FROM RHS Coercion and UNPIVOT actually runs afoul of our ability satisfy the "Gradual Guarantee" as per the "Refined Criteria for Gradual Typing" paper.

Perhaps push all the other changes and hold back the FROM and UNPIVOT changes while we investigate the typing implications?

@johnedquinn johnedquinn dismissed stale reviews from jpschorr and alancai98 via b3bf59f May 21, 2024 20:38
johnedquinn added a commit that referenced this pull request May 21, 2024
@johnedquinn
Copy link
Member Author

Perhaps push all the other changes and hold back the FROM and UNPIVOT changes while we investigate the typing implications?

Totally. I've removed those cases and have pushed them to PR #113. This is now ready for final review.

@johnedquinn johnedquinn merged commit be88ae7 into main May 21, 2024
4 checks passed
@johnedquinn johnedquinn deleted the main-fix-tests branch May 21, 2024 22:09
johnedquinn added a commit that referenced this pull request Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants