fix(test-suite-data): Fix subpath tests #404

dwalluck · 2025-03-04T16:19:26Z

The purls are valid as input, but the dots should be dropped from the canonical output.

The test should list the canonical component for subpath, not the input component.

The purls are valid as input, but the dots should be dropped from the canonical output. The test should list the canonical component for `subpath`, not the input component.

dwalluck · 2025-03-04T16:22:27Z

As background, https://github.com/package-url/packageurl-java currently compares all components in the canonical purl to the components in the test suite JSON file, and this is the only test that is currently failing that comparison.

dwalluck · 2025-03-04T16:29:35Z

The spec is unclear here. I think this is due to confusing the input (when certain things are allowed) with canonical output (where certain things are now allowed). I know this based on how many times different implementations can't agree on things.

Where the spec says "When percent-decoded", maybe it means "When canonicalized".

Not allowed ("MUST NOT"):

"When percent-decoded, a segment: - MUST NOT contain a '/' - MUST NOT be any of '..' or '.' - MUST NOT be empty"

Yet, it apparently is allowed:

"Discard empty, '.' and '..' segments"

This change makes the two test pass that were introduced in package-url/purl-spec#368. See also package-url/purl-spec#404.

jkowalleck · 2025-03-05T08:36:17Z

As background, https://github.com/package-url/packageurl-java currently compares all components in the canonical purl to the components in the test suite JSON file, and this is the only test that is currently failing that comparison.

That is because the test is a non-intended one. one the test-suite was not built for.
It was intended to work the other way around, and therefore, the java tests must be fixed, not the test suite.

the purpose of the test suite:

purl-spec/PURL-SPECIFICATION.rst

Lines 473 to 485 in 65d98c4

    
           To test ``purl`` parsing and building, a tool can use this test suite and for 
        
           every listed test object, run these tests: 
        
           - parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed 
        
             components should return the test canonical ``purl`` 
        
           - parsing the test ``purl`` should return the components parsed from the test 
        
             canonical ``purl`` 
        
           - parsing the test ``purl`` then re-building a ``purl`` from these parsed components 
        
             should return the test canonical ``purl`` 
        
           - building a ``purl`` from the test components should return the test canonical ``purl``

jkowalleck

altering the value of the component would drop one of the intended test scenarios:
it would no longer test for the correctness of the following process:

building a purl from the test components should return the test canonical purl

as of

purl-spec/PURL-SPECIFICATION.rst

Lines 473 to 485 in 65d98c4

    
           To test ``purl`` parsing and building, a tool can use this test suite and for 
        
           every listed test object, run these tests: 
        
           - parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed 
        
             components should return the test canonical ``purl`` 
        
           - parsing the test ``purl`` should return the components parsed from the test 
        
             canonical ``purl`` 
        
           - parsing the test ``purl`` then re-building a ``purl`` from these parsed components 
        
             should return the test canonical ``purl`` 
        
           - building a ``purl`` from the test components should return the test canonical ``purl``

Therefore, the modification in this PR is undesired!

lets play this though: the test is: take all the given components, craft a PURL from them, and the result MUST BE the canonicalized purl. This all is intended to work with or without the ../. in the components.
So if we have those ../. in the components, we have a proper test whether the implementation drops those characters in accordance with the spec.
If you'd remove these characters from the test suite, there would be no case anymore.

Therefore, your change is removing a significant part of the test suite.
and therefore, your change is undesired.

jkowalleck · 2025-03-05T08:50:53Z

The spec is unclear here. I think this is due to confusing the input (when certain things are allowed) with canonical output (where certain things are now allowed). I know this based on how many times different implementations can't agree on things.

Where the spec says "When percent-decoded", maybe it means "When canonicalized".

Not allowed ("MUST NOT"):

"When percent-decoded, a segment: - MUST NOT contain a '/' - MUST NOT be any of '..' or '.' - MUST NOT be empty"

Yet, it apparently is allowed:

"Discard empty, '.' and '..' segments"

and why would all these rule need modifications on the test suite components that are used for input only?

PS:

let me show whatthe test suite is actually meant for:

parsing the test canonical purl then re-building a purl from these parsed
components should return the test canonical purl

means: make_purl(parse_purl(canonical_purl)) == canonical_purl

parsing the test purl should return the components parsed from the test
canonical purl

means: parse_purl(purl) == parse_purl(canonical_purl)

parsing the test purl then re-building a purl from these parsed components
should return the test canonical purl

means: make_purl(parse_purl(purl)) == canonical_purl

building a purl from the test components should return the test canonical purl

means: make_purl({ ...test_components }) == canonical_purl

dwalluck · 2025-03-05T14:14:44Z

building a purl from the test components should return the test canonical purl

All that is still the case in this pull request. The change is about something else entirely.

The change is about what the components themselves are supposed to represent. Currently, it is the case for all other tests except this one that if is_invalid is false, then

name == canonical_purl.name
type == canonical_purl.type
namespace == canonical_purl.namespace
name == canonical_purl.name
version == canonical_purl.version
qualifiers == canonical_purl.qualifiers
subpath == canonical_purl.subpath

This one test changes this assumption by having subpath == purl.subpath.

If the fields can hold either the value of purl or canonical_purl and there's no assumption made there, then it's not even clear which one to use to write a test case.

Then, you have to make the stipulation that implementation authors MUST NOT ever compare against the test fields directly.

dwalluck · 2025-03-05T14:20:59Z

Currently, packageurl-java compares the canonical purl fields against the fields in the test. This is how I know that this is currently the case and also does not change anything above that you talking about. It is still the case that all of that is true.

In either case make_purl({ ...test_components }) == canonical_purl whther or not test_components represents the fields of purl or canonical_purl or a mix of both.

The question is what test_components is supposed to represent and whether they represent the canonical_purl components as is currently the case. Otherwise, they can represent both or neither, or a mix of the two, and we can't run any tests against them as we do currently.

jkowalleck · 2025-03-06T12:17:52Z

Currently, packageurl-java compares the canonical purl fields against the fields in the test. This is how I know that this is currently the case and also does not change anything above that you talking about.

Didi I get you right? you use:
{ ...test_components } == parse_purl(canonical_purl)

But what the purpose of the test suite is:

make_purl({ ...test_components }) == canonical_purl

It is still the case that all of that is true.

i see that the above is quite different. You are holding it wrong.

The question is what test_components is supposed to represent and whether they represent the canonical_purl components as is currently the case. Otherwise, they can represent both or neither, or a mix of the two, and we can't run any tests against them as we do currently.

thats what Ive told you:
test components are there to craft a purl from, and this purl is expected to be equal to the canonical_purl. see below:

building a purl from the test components should return the test canonical purl

All that is still the case in this pull request. The change is about something else entirely.

true. and it was also the case before you changed anything.
in fact, you modified a qualitative, meaningful test cases - and now it is meaningless.

your change is undesireable.

jkowalleck · 2025-03-06T12:21:42Z

I wonder whether the wording/phasing of the test-purpose is note nought, people seam to be confused. Maybe it can be improved?
Any suggestions how things could be phrased so that it is clear what they suppose to mean?

dwalluck · 2025-03-06T14:17:08Z

I wonder whether the wording/phasing of the test-purpose is note nought, people seam to be confused. Maybe it can be improved?

Please confirm that this code is wrong, then, since what it's testing cannot be assumed to be true

https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107

However, you intend the test to be used this way

https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L139-L149

However, this doesn't work for tests with scheme problems, because the parsing error could occur even earlier, and "scheme" isn't even one of the listed compnoents.

Any suggestions how things could be phrased so that it is clear what they suppose to mean?

Well, I already tried to state the problem: there's nothing there about the meaning of test_components and what form they are in. Also, it is inconsistent: some tests use canonical form, e.g., type: GOLANG -> golang, but this test is using original form.

jkowalleck · 2025-03-06T14:40:23Z

https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107

this can be summarized as the following, right?
{ ...test_components } == parse_purl(purl)
If so, this is none of the intended use cases of the test suite, or is it?

dwalluck · 2025-03-06T14:53:00Z

this can be summarized as the following, right?
{ ...test_components } == parse_purl(purl)
If so, this is none of the intended use cases of the test suite, or is it?

I would summarize it as:

for t in test_components:
    assert canonical_purl.t == t

I cannot say about parse_purl(purl), is it supposed to return the canonical form or the input/relaxed form? If it is canonical form, then, yes, it seems to be doing what you stated above.

It doesn't appear to be "intended" based on what you said. But again, shouldn't test_components be of one of the forms and not a mix of both? How does a test author or implementation author know which goes there if it has both?

I am referring to this case here

purl-spec/test-suite-data.json

Line 42 in 7f7e82f

"type": "golang",

That line has the canonical form, whereas this test has the relaxed form.

jkowalleck · 2025-03-06T16:40:16Z

this can be summarized as the following, right?
{ ...test_components } == parse_purl(purl)
If so, this is none of the intended use cases of the test suite, or is it?

I would summarize it as:
for t in test_components:
    assert canonical_purl.t == t

but thats not what the code does.
https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107
it does not look at canonical_purl/cpurlString at all. it looks at purl/purlString.

I cannot say about parse_purl(purl), is it supposed to return the canonical form or the input/relaxed form? [...]

please read

purl-spec/PURL-SPECIFICATION.rst

Lines 473 to 485 in 65d98c4

    
           To test ``purl`` parsing and building, a tool can use this test suite and for 
        
           every listed test object, run these tests: 
        
           - parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed 
        
             components should return the test canonical ``purl`` 
        
           - parsing the test ``purl`` should return the components parsed from the test 
        
             canonical ``purl`` 
        
           - parsing the test ``purl`` then re-building a ``purl`` from these parsed components 
        
             should return the test canonical ``purl`` 
        
           - building a ``purl`` from the test components should return the test canonical ``purl``

I am referring to this case here

purl-spec/test-suite-data.json

Line 42 in 7f7e82f

"type": "golang",

That line has the canonical form, whereas this test has the relaxed form.

does this have anything todo with the topic of this pullrequest? nope.
please stay fodussed.
I am very much intrigued closing this PR. You could opening a discussion . where you can misscopes, and go backandforth about the same things again and again -- github's discussions have proper thread features to handle these kind of conversations.

dwalluck · 2025-03-06T20:21:28Z

but thats not what the code does.
https://github.com/package-url/packageurl-java/blob/d23a0ee3f9221e3798ce2e380c918821329366fa/src/test/java/com/github/packageurl/PackageURLTest.java#L87-L107
it does not look at canonical_purl/cpurlString at all. it looks at purl/purlString.

That is incorrect. It is like I said: it compares the canonical purl to the fields in the JSON. It does not look a the original string. That string is "lost" when the purl is created.

dwalluck · 2025-03-06T20:24:36Z

does this have anything todo with the topic of this pullrequest? nope.
please stay fodussed.

The pull request changed the value of the components under the assumption that the components were supposed to be in the canonical form, based on that example. This pull request exposes an inconsistency in the test cases. Right now, it seems to me that the test author may just randomly choose any form to put for the field, which may or may not be related to the input string.

I apologize if you still do not see the issue. As far as the pull request, you may close it if my assumption is incorrect, but I still wish for someone to define what the correct assumption should be.

Thanks.

dwalluck · 2025-03-06T20:33:53Z

To be more clear, the tests will currently fail on the golang example if comparing the non-canonical/original fields to the JSON fields, or the subpath example if comparing the canonical fields to the JSON fields.

The simple answer may be that the fields can be anything and to never compare fields directly, but to only use them for composing the purl. But, then it's unclear what meaning, if any, should be assigned to those fields in any given example.

dwalluck · 2025-03-06T20:40:52Z

Sorry, I just saw #409 after I posted comments to this issue.

It does clear up the meaning of the fields now. It apparently also shows that the current Java code is making the wrong assumption. And since the Java implementation "loses" the original value of the fields, there's nothing to compare to. I guess it could just delete any field comparison code altogether and only use the fields for constructing.

jkowalleck · 2025-03-11T10:28:29Z

Exactly.
The java implementation used the test data in a way it was never used ntended for.

So can we close this PR, then?

This change makes the two test pass that were introduced in package-url/purl-spec#368. See also package-url/purl-spec#404.

dwalluck · 2025-03-11T14:48:08Z

So can we close this PR, then?

Sure.

fix(test-suite-data): Fix subpath tests

bdb0535

The purls are valid as input, but the dots should be dropped from the canonical output. The test should list the canonical component for `subpath`, not the input component.

dwalluck mentioned this pull request Mar 4, 2025

test: unencoded subpath cannot contain . or .. #368

Merged

dwalluck added a commit to dwalluck/packageurl-java that referenced this pull request Mar 4, 2025

Silently drop empty and dot segments for subpath

00c46dd

This change makes the two test pass that were introduced in package-url/purl-spec#368. See also package-url/purl-spec#404.

dwalluck mentioned this pull request Mar 4, 2025

Silently drop empty and dot segments for subpath package-url/packageurl-java#176

Open

johnmhoran added Test suite PURL subpath component Ecma specification Work on the core specification labels Mar 4, 2025

jkowalleck self-requested a review March 5, 2025 08:36

jkowalleck requested changes Mar 5, 2025

View reviewed changes

dwalluck mentioned this pull request Mar 6, 2025

revisit test-suite #409

Open

jkowalleck added invalid and removed Ecma specification Work on the core specification labels Mar 11, 2025

dwalluck added a commit to dwalluck/packageurl-java that referenced this pull request Mar 11, 2025

Silently drop empty and dot segments for subpath

a151d32

This change makes the two test pass that were introduced in package-url/purl-spec#368. See also package-url/purl-spec#404.

dwalluck closed this Mar 11, 2025

dwalluck deleted the fix-subpath-tests branch March 11, 2025 14:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(test-suite-data): Fix subpath tests #404

fix(test-suite-data): Fix subpath tests #404

dwalluck commented Mar 4, 2025

dwalluck commented Mar 4, 2025

dwalluck commented Mar 4, 2025

jkowalleck commented Mar 5, 2025 •

edited

Loading

jkowalleck left a comment •

edited

Loading

jkowalleck commented Mar 5, 2025 •

edited

Loading

dwalluck commented Mar 5, 2025

dwalluck commented Mar 5, 2025

jkowalleck commented Mar 6, 2025 •

edited

Loading

jkowalleck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

jkowalleck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

jkowalleck commented Mar 6, 2025 •

edited

Loading

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

jkowalleck commented Mar 11, 2025

dwalluck commented Mar 11, 2025

	To test ``purl`` parsing and building, a tool can use this test suite and for
	every listed test object, run these tests:

	- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
	components should return the test canonical ``purl``

	- parsing the test ``purl`` should return the components parsed from the test
	canonical ``purl``

	- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
	should return the test canonical ``purl``

	- building a ``purl`` from the test components should return the test canonical ``purl``

fix(test-suite-data): Fix subpath tests #404

fix(test-suite-data): Fix subpath tests #404

Conversation

dwalluck commented Mar 4, 2025

dwalluck commented Mar 4, 2025

dwalluck commented Mar 4, 2025

jkowalleck commented Mar 5, 2025 • edited Loading

jkowalleck left a comment • edited Loading

Choose a reason for hiding this comment

jkowalleck commented Mar 5, 2025 • edited Loading

dwalluck commented Mar 5, 2025

dwalluck commented Mar 5, 2025

jkowalleck commented Mar 6, 2025 • edited Loading

jkowalleck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

jkowalleck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

jkowalleck commented Mar 6, 2025 • edited Loading

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

jkowalleck commented Mar 11, 2025

dwalluck commented Mar 11, 2025

jkowalleck commented Mar 5, 2025 •

edited

Loading

jkowalleck left a comment •

edited

Loading

jkowalleck commented Mar 5, 2025 •

edited

Loading

jkowalleck commented Mar 6, 2025 •

edited

Loading

jkowalleck commented Mar 6, 2025 •

edited

Loading