Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(test-suite-data): Fix subpath tests #404

Closed
wants to merge 1 commit into from

Conversation

dwalluck
Copy link

@dwalluck dwalluck commented Mar 4, 2025

The purls are valid as input, but the dots should be dropped from the canonical output.

The test should list the canonical component for subpath, not the input component.

The purls are valid as input, but the dots should be dropped from the
canonical output.

The test should list the canonical component for `subpath`, not the
input component.
@dwalluck
Copy link
Author

dwalluck commented Mar 4, 2025

As background, https://github.com/package-url/packageurl-java currently compares all components in the canonical purl to the components in the test suite JSON file, and this is the only test that is currently failing that comparison.

@dwalluck
Copy link
Author

dwalluck commented Mar 4, 2025

The spec is unclear here. I think this is due to confusing the input (when certain things are allowed) with canonical output (where certain things are now allowed). I know this based on how many times different implementations can't agree on things.

Where the spec says "When percent-decoded", maybe it means "When canonicalized".

Not allowed ("MUST NOT"):

"When percent-decoded, a segment: - MUST NOT contain a '/' - MUST NOT be any of '..' or '.' - MUST NOT be empty"

Yet, it apparently is allowed:

"Discard empty, '.' and '..' segments"

dwalluck added a commit to dwalluck/packageurl-java that referenced this pull request Mar 4, 2025
This change makes the two test pass that were introduced in
package-url/purl-spec#368.

See also package-url/purl-spec#404.
@jkowalleck jkowalleck self-requested a review March 5, 2025 08:36
@jkowalleck
Copy link
Member

jkowalleck commented Mar 5, 2025

As background, https://github.com/package-url/packageurl-java currently compares all components in the canonical purl to the components in the test suite JSON file, and this is the only test that is currently failing that comparison.

That is because the test is a non-intended one. one the test-suite was not built for.
It was intended to work the other way around, and therefore, the java tests must be fixed, not the test suite.

the purpose of the test suite:

To test ``purl`` parsing and building, a tool can use this test suite and for
every listed test object, run these tests:
- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
components should return the test canonical ``purl``
- parsing the test ``purl`` should return the components parsed from the test
canonical ``purl``
- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
should return the test canonical ``purl``
- building a ``purl`` from the test components should return the test canonical ``purl``

Copy link
Member

@jkowalleck jkowalleck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

altering the value of the component would drop one of the intended test scenarios:
it would no longer test for the correctness of the following process:

  • building a purl from the test components should return the test canonical purl

as of

To test ``purl`` parsing and building, a tool can use this test suite and for
every listed test object, run these tests:
- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
components should return the test canonical ``purl``
- parsing the test ``purl`` should return the components parsed from the test
canonical ``purl``
- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
should return the test canonical ``purl``
- building a ``purl`` from the test components should return the test canonical ``purl``

Therefore, the modification in this PR is undesired!

lets play this though: the test is: take all the given components, craft a PURL from them, and the result MUST BE the canonicalized purl. This all is intended to work with or without the ../. in the components.
So if we have those ../. in the components, we have a proper test whether the implementation drops those characters in accordance with the spec.
If you'd remove these characters from the test suite, there would be no case anymore.

Therefore, your change is removing a significant part of the test suite.
and therefore, your change is undesired.

@jkowalleck
Copy link
Member

jkowalleck commented Mar 5, 2025

The spec is unclear here. I think this is due to confusing the input (when certain things are allowed) with canonical output (where certain things are now allowed). I know this based on how many times different implementations can't agree on things.

Where the spec says "When percent-decoded", maybe it means "When canonicalized".

Not allowed ("MUST NOT"):

"When percent-decoded, a segment: - MUST NOT contain a '/' - MUST NOT be any of '..' or '.' - MUST NOT be empty"

Yet, it apparently is allowed:

"Discard empty, '.' and '..' segments"

and why would all these rule need modifications on the test suite components that are used for input only?


PS:

let me show whatthe test suite is actually meant for:

  • parsing the test canonical purl then re-building a purl from these parsed
    components should return the test canonical purl

means: make_purl(parse_purl(canonical_purl)) == canonical_purl

  • parsing the test purl should return the components parsed from the test
    canonical purl

means: parse_purl(purl) == parse_purl(canonical_purl)

  • parsing the test purl then re-building a purl from these parsed components
    should return the test canonical purl

means: make_purl(parse_purl(purl)) == canonical_purl

  • building a purl from the test components should return the test canonical purl

means: make_purl({ ...test_components }) == canonical_purl

@dwalluck
Copy link
Author

dwalluck commented Mar 5, 2025

  • building a purl from the test components should return the test canonical purl

All that is still the case in this pull request. The change is about something else entirely.

The change is about what the components themselves are supposed to represent. Currently, it is the case for all other tests except this one that if is_invalid is false, then

name == canonical_purl.name
type == canonical_purl.type
namespace == canonical_purl.namespace
name == canonical_purl.name
version == canonical_purl.version
qualifiers == canonical_purl.qualifiers
subpath == canonical_purl.subpath

This one test changes this assumption by having subpath == purl.subpath.

If the fields can hold either the value of purl or canonical_purl and there's no assumption made there, then it's not even clear which one to use to write a test case.

Then, you have to make the stipulation that implementation authors MUST NOT ever compare against the test fields directly.

@dwalluck
Copy link
Author

dwalluck commented Mar 5, 2025

Currently, packageurl-java compares the canonical purl fields against the fields in the test. This is how I know that this is currently the case and also does not change anything above that you talking about. It is still the case that all of that is true.

In either case make_purl({ ...test_components }) == canonical_purl whther or not test_components represents the fields of purl or canonical_purl or a mix of both.

The question is what test_components is supposed to represent and whether they represent the canonical_purl components as is currently the case. Otherwise, they can represent both or neither, or a mix of the two, and we can't run any tests against them as we do currently.

@jkowalleck
Copy link
Member

jkowalleck commented Mar 6, 2025

Currently, packageurl-java compares the canonical purl fields against the fields in the test. This is how I know that this is currently the case and also does not change anything above that you talking about.

Didi I get you right? you use:
{ ...test_components } == parse_purl(canonical_purl)

But what the purpose of the test suite is:

make_purl({ ...test_components }) == canonical_purl

It is still the case that all of that is true.

i see that the above is quite different. You are holding it wrong.

The question is what test_components is supposed to represent and whether they represent the canonical_purl components as is currently the case. Otherwise, they can represent both or neither, or a mix of the two, and we can't run any tests against them as we do currently.

thats what Ive told you:
test components are there to craft a purl from, and this purl is expected to be equal to the canonical_purl. see below:


  • building a purl from the test components should return the test canonical purl

All that is still the case in this pull request. The change is about something else entirely.

true. and it was also the case before you changed anything.
in fact, you modified a qualitative, meaningful test cases - and now it is meaningless.

your change is undesireable.

@jkowalleck
Copy link
Member

I wonder whether the wording/phasing of the test-purpose is note nought, people seam to be confused. Maybe it can be improved?
Any suggestions how things could be phrased so that it is clear what they suppose to mean?

@dwalluck
Copy link
Author

dwalluck commented Mar 6, 2025

I wonder whether the wording/phasing of the test-purpose is note nought, people seam to be confused. Maybe it can be improved?

Please confirm that this code is wrong, then, since what it's testing cannot be assumed to be true

https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107

However, you intend the test to be used this way

https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L139-L149

However, this doesn't work for tests with scheme problems, because the parsing error could occur even earlier, and "scheme" isn't even one of the listed compnoents.

Any suggestions how things could be phrased so that it is clear what they suppose to mean?

Well, I already tried to state the problem: there's nothing there about the meaning of test_components and what form they are in. Also, it is inconsistent: some tests use canonical form, e.g., type: GOLANG -> golang, but this test is using original form.

@jkowalleck
Copy link
Member

https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107

this can be summarized as the following, right?
{ ...test_components } == parse_purl(purl)
If so, this is none of the intended use cases of the test suite, or is it?

@dwalluck
Copy link
Author

dwalluck commented Mar 6, 2025

this can be summarized as the following, right?
{ ...test_components } == parse_purl(purl)
If so, this is none of the intended use cases of the test suite, or is it?

I would summarize it as:

for t in test_components:
    assert canonical_purl.t == t

I cannot say about parse_purl(purl), is it supposed to return the canonical form or the input/relaxed form? If it is canonical form, then, yes, it seems to be doing what you stated above.

It doesn't appear to be "intended" based on what you said. But again, shouldn't test_components be of one of the forms and not a mix of both? How does a test author or implementation author know which goes there if it has both?

I am referring to this case here

"type": "golang",

That line has the canonical form, whereas this test has the relaxed form.

@jkowalleck
Copy link
Member

jkowalleck commented Mar 6, 2025

this can be summarized as the following, right?
{ ...test_components } == parse_purl(purl)
If so, this is none of the intended use cases of the test suite, or is it?

I would summarize it as:

for t in test_components:
    assert canonical_purl.t == t

but thats not what the code does.
https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107
it does not look at canonical_purl/cpurlString at all. it looks at purl/purlString.

I cannot say about parse_purl(purl), is it supposed to return the canonical form or the input/relaxed form? [...]

please read

To test ``purl`` parsing and building, a tool can use this test suite and for
every listed test object, run these tests:
- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
components should return the test canonical ``purl``
- parsing the test ``purl`` should return the components parsed from the test
canonical ``purl``
- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
should return the test canonical ``purl``
- building a ``purl`` from the test components should return the test canonical ``purl``

I am referring to this case here

"type": "golang",

That line has the canonical form, whereas this test has the relaxed form.

does this have anything todo with the topic of this pullrequest? nope.
please stay fodussed.
I am very much intrigued closing this PR. You could opening a discussion . where you can misscopes, and go backandforth about the same things again and again -- github's discussions have proper thread features to handle these kind of conversations.

@dwalluck
Copy link
Author

dwalluck commented Mar 6, 2025

but thats not what the code does.
https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107
it does not look at canonical_purl/cpurlString at all. it looks at purl/purlString.

That is incorrect. It is like I said: it compares the canonical purl to the fields in the JSON. It does not look a the original string. That string is "lost" when the purl is created.

@dwalluck
Copy link
Author

dwalluck commented Mar 6, 2025

does this have anything todo with the topic of this pullrequest? nope.
please stay fodussed.

The pull request changed the value of the components under the assumption that the components were supposed to be in the canonical form, based on that example. This pull request exposes an inconsistency in the test cases. Right now, it seems to me that the test author may just randomly choose any form to put for the field, which may or may not be related to the input string.

I apologize if you still do not see the issue. As far as the pull request, you may close it if my assumption is incorrect, but I still wish for someone to define what the correct assumption should be.

Thanks.

@dwalluck
Copy link
Author

dwalluck commented Mar 6, 2025

To be more clear, the tests will currently fail on the golang example if comparing the non-canonical/original fields to the JSON fields, or the subpath example if comparing the canonical fields to the JSON fields.

The simple answer may be that the fields can be anything and to never compare fields directly, but to only use them for composing the purl. But, then it's unclear what meaning, if any, should be assigned to those fields in any given example.

@dwalluck dwalluck mentioned this pull request Mar 6, 2025
@dwalluck
Copy link
Author

dwalluck commented Mar 6, 2025

Sorry, I just saw #409 after I posted comments to this issue.

It does clear up the meaning of the fields now. It apparently also shows that the current Java code is making the wrong assumption. And since the Java implementation "loses" the original value of the fields, there's nothing to compare to. I guess it could just delete any field comparison code altogether and only use the fields for constructing.

@jkowalleck
Copy link
Member

Exactly.
The java implementation used the test data in a way it was never used ntended for.

So can we close this PR, then?

@jkowalleck jkowalleck added invalid and removed Ecma specification Work on the core specification labels Mar 11, 2025
dwalluck added a commit to dwalluck/packageurl-java that referenced this pull request Mar 11, 2025
This change makes the two test pass that were introduced in
package-url/purl-spec#368.

See also package-url/purl-spec#404.
@dwalluck
Copy link
Author

So can we close this PR, then?

Sure.

@dwalluck dwalluck closed this Mar 11, 2025
@dwalluck dwalluck deleted the fix-subpath-tests branch March 11, 2025 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants