revisit test-suite #409

jkowalleck · 2025-03-06T17:21:46Z

fixes #407

aspects:

Lines 473 to 485 in 65d98c4

    
           To test ``purl`` parsing and building, a tool can use this test suite and for 
        
           every listed test object, run these tests: 
        
           - parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed 
        
             components should return the test canonical ``purl`` 
        
           - parsing the test ``purl`` should return the components parsed from the test 
        
             canonical ``purl`` 
        
           - parsing the test ``purl`` then re-building a ``purl`` from these parsed components 
        
             should return the test canonical ``purl`` 
        
           - building a ``purl`` from the test components should return the test canonical ``purl``

fixes package-url#407 Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>

jkowalleck · 2025-03-06T17:24:02Z

@dwalluck, @matt-phylum, et al.
please review

dwalluck · 2025-03-06T20:38:01Z

@dwalluck, @matt-phylum, et al.
please review

Yes, thank you. I think that this clears things up, and I hope that you understand that I had a point with #404, but my assumption was the opposite of what was intended, apparently). Feel free to close #404 when this is merged.

Thanks.

dwalluck · 2025-03-06T23:00:18Z

@matt-phylum I am going to try to test this with code first. I don't know if you want to as well.

It could be that my parsing algorithm is wrong, but it could be the tests. I got three failures.

Is version supposed to be null in these two or is it the part after the '@'?

 {
    "description": "a scheme is always required",
    "purl": "EnterpriseLibrary.Common@6.0.1304",
    "canonical_purl": "EnterpriseLibrary.Common@6.0.1304",
    "type": null,
    "namespace": null,
    "name": "EnterpriseLibrary.Common",
    "version": null,
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  },
  {
    "description": "a name is required",
    "purl": "pkg:maven/@1.3.4",
    "canonical_purl": "pkg:maven/@1.3.4",
    "type": "maven",
    "namespace": null,
    "name": null,
    "version": null,
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  },

Also, I am not sure how to parse this one to end up with components given. Maybe type is meant to be "pkg%3Amaven", not "maven" since this purl technically is missing the scheme.

  {
    "description": "invalid encoded colon : between scheme and type",
    "purl": "pkg%3Amaven/org.apache.commons/io",
    "canonical_purl": null,
    "type": "maven",
    "namespace": "org.apache.commons",
    "name": "io",
    "version": null,
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  }

matt-phylum · 2025-03-07T12:58:58Z

Those are all invalid so your parsing algorithm is supposed to fail. In the second case the parser should decode the version before failing, but I don't think it affects the behavior of the test. If the PURL is invalid, you can't parse it, and if the components are invalid you can't format it, so there's nothing left to do.

dwalluck · 2025-03-07T14:18:18Z

In any case, the first two appear to be missing version, no? The parser will fail, but it parses from the right, so it should have the version component (assuming it doesn't check for scheme first, then name would be null, too.

matt-phylum · 2025-03-07T14:23:22Z

Most parsers do not return an invalid PURL so there is no way to assert the version.

dwalluck · 2025-03-07T14:26:18Z

One more, yes, it is a test with "is_invalid": true, but the parsing should be consistent. The spec specifies the order of parsing. The entire part after '?' should be qualifiers, which have no valid keys. The only thing it should parse is the type.

 {
    "description": "check for invalid character in type",
    "purl": "pkg:n&g?inx/nginx@0.8.9",
    "canonical_purl": null,
    "type": "n&g?inx",
    "namespace": null,
    "name": "nginx",
    "version": "0.8.9",
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  }

dwalluck · 2025-03-07T14:29:22Z

This pull request makes the valid purl components consistent with the original purl. Awesome!

However, I have now found inconsistencies in the invalid purl components.

It should be made consistent with the parsing algorithm, or leave everything null. The test author should not make up what they "think" should be the components without following the parsing algorithm laid out in the specification.

dwalluck · 2025-03-07T14:35:50Z

I can be more clear. The test does not test what it thinks it does. It actually tests an invalid qualifier key, not an invalid type. It could test an invalid type if it uses a character other than '#' or '?', I think. So, purl->fields is testing the qualifier keys, but fields->purl is testing the type.

matt-phylum · 2025-03-07T16:06:50Z

That's not how the tests work. The type is an input to formatting only. It has nothing to do with parsing.

dwalluck · 2025-03-07T22:45:51Z

That's not how the tests work. The type is an input to formatting only. It has nothing to do with parsing.

Sorry, what does "input to formatting" mean?

If a purl is valid, then the fields refer to the purl.
If the purl is invalid, then the fields refer to what?

Even though that purl is invalid, the fields there don't refer to that purl URI. I would have thought it would look more like:

 {
    "description": "check for invalid character in qualifier key",
    "purl": "pkg:n&g?inx/nginx@0.8.9",
    "canonical_purl": null,
    "type": "n&g",
    "namespace": null,
    "name": null,
    "version": null,
    "qualifiers": {"inx/nginx@0.8.9": ""},
    "subpath": null,
    "is_invalid": true
  },

matt-phylum · 2025-03-07T22:51:35Z

Sorry, what does "input to formatting" mean?

The values are input to the formatting algorithm described here:

purl-spec/PURL-SPECIFICATION.rst

Line 251 in 7f7e82f

How to build ``purl`` string from its components

If a purl is valid, then the fields refer to the purl.

They do not.

If the PURL is valid, then using those values as input to build a PURL string results in the canonical PURL. Parsing the canonical PURL does not necessarily produce the same values again.

matt-phylum · 2025-03-07T23:00:48Z

There is also a problem with this case:

purl-spec/test-suite-data.json

Lines 662 to 673 in 7f7e82f

    
           { 
        
             "description": "invalid encoded colon : between scheme and type", 
        
             "purl": "pkg%3Amaven/org.apache.commons/io", 
        
             "canonical_purl": null, 
        
             "type": "maven", 
        
             "namespace": "org.apache.commons", 
        
             "name": "io", 
        
             "version": null, 
        
             "qualifiers": null, 
        
             "subpath": null, 
        
             "is_invalid": true 
        
           },

The PURL is invalid, and is_invalid is true, but the type namespace name version qualifiers subpath are all valid. This results in every implementation failing. Should type be null?

dwalluck · 2025-03-07T23:39:13Z

They do not.

I meant purl as in the non-canonical purl field in the JSON. This pull request changed them so that they all (valid ones) match, except for the few invalid cases that I mentioned.

This is for invalid purls, the spec still defines a consistent parse result, so it would be possible for the implementations to get the same results (or not) depending on if they fail-fast or not.

jkowalleck · 2025-03-11T10:35:01Z

Hm, i see.

Let's iterate and improve one thing at a time. This one is basically about valid test data.

There is an open ticket to discuss how invalid test cases must be structured. As soon as that discussion can to a conclusion, according action can follow up.

see package-url/purl-spec#409

dwalluck · 2025-03-11T15:04:24Z

There is an open ticket to discuss how invalid test cases must be structured. As soon as that discussion can to a conclusion, according action can follow up.

Thanks. I am not seeing any issues with the valid test data except when something should be encoded or not. For example, ':' appears at times both unencoded and encoded (encoded in the version, but unencoded in the repository_url key).

As a rule, should the standalone test components always appear decoded, as in these valid examples?

  {
    "purl": "pkg:GOLANG/google.golang.org/genproto@abcdedf#/googleapis/%2E%2E/api/annotations/",
    "subpath": "/googleapis/../api/annotations/",
    "is_invalid": false
  }

or

 {
    "purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io",
    "canonical_purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io",
    "version": "sha256:244fd47e07d1004f0aed9c",
    "is_invalid": false
  }

matt-phylum · 2025-03-11T16:35:22Z

The standalone components are not encoded. If they contain percent signs, those are supposed to be encoded as %25, not decoded like percent encoding.

revisit test-suite

077166e

fixes package-url#407 Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>

jkowalleck added the Test suite label Mar 6, 2025

jkowalleck requested review from johnmhoran and pombredanne March 6, 2025 17:22

matt-phylum approved these changes Mar 6, 2025

View reviewed changes

dwalluck mentioned this pull request Mar 6, 2025

fix(test-suite-data): Fix subpath tests #404

Closed

matt-phylum added a commit to phylum-dev/purl that referenced this pull request Mar 11, 2025

align test generation with spec

213fc4e

see package-url/purl-spec#409

matt-phylum added a commit to phylum-dev/purl that referenced this pull request Mar 11, 2025

align test generation with spec

13dc877

see package-url/purl-spec#409

matt-phylum mentioned this pull request Mar 11, 2025

Update test suite phylum-dev/purl#27

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revisit test-suite #409

revisit test-suite #409

jkowalleck commented Mar 6, 2025 •

edited

Loading

jkowalleck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025 •

edited

Loading

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

dwalluck commented Mar 7, 2025

dwalluck commented Mar 7, 2025 •

edited

Loading

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

matt-phylum commented Mar 7, 2025

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

jkowalleck commented Mar 11, 2025

dwalluck commented Mar 11, 2025

matt-phylum commented Mar 11, 2025

	To test ``purl`` parsing and building, a tool can use this test suite and for
	every listed test object, run these tests:

	- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
	components should return the test canonical ``purl``

	- parsing the test ``purl`` should return the components parsed from the test
	canonical ``purl``

	- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
	should return the test canonical ``purl``

	- building a ``purl`` from the test components should return the test canonical ``purl``

revisit test-suite #409

Are you sure you want to change the base?

revisit test-suite #409

Conversation

jkowalleck commented Mar 6, 2025 • edited Loading

jkowalleck commented Mar 6, 2025

dwalluck commented Mar 6, 2025

dwalluck commented Mar 6, 2025 • edited Loading

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

dwalluck commented Mar 7, 2025

dwalluck commented Mar 7, 2025 • edited Loading

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

matt-phylum commented Mar 7, 2025

matt-phylum commented Mar 7, 2025

dwalluck commented Mar 7, 2025

jkowalleck commented Mar 11, 2025

dwalluck commented Mar 11, 2025

matt-phylum commented Mar 11, 2025

jkowalleck commented Mar 6, 2025 •

edited

Loading

dwalluck commented Mar 6, 2025 •

edited

Loading

dwalluck commented Mar 7, 2025 •

edited

Loading