Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revisit test-suite #409

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jkowalleck
Copy link
Member

@jkowalleck jkowalleck commented Mar 6, 2025

fixes #407

aspects:

To test ``purl`` parsing and building, a tool can use this test suite and for
every listed test object, run these tests:
- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
components should return the test canonical ``purl``
- parsing the test ``purl`` should return the components parsed from the test
canonical ``purl``
- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
should return the test canonical ``purl``
- building a ``purl`` from the test components should return the test canonical ``purl``

fixes package-url#407

Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
@jkowalleck
Copy link
Member Author

@dwalluck, @matt-phylum, et al.
please review

@dwalluck
Copy link

dwalluck commented Mar 6, 2025

@dwalluck, @matt-phylum, et al.
please review

Yes, thank you. I think that this clears things up, and I hope that you understand that I had a point with #404, but my assumption was the opposite of what was intended, apparently). Feel free to close #404 when this is merged.

Thanks.

@dwalluck
Copy link

dwalluck commented Mar 6, 2025

@matt-phylum I am going to try to test this with code first. I don't know if you want to as well.

It could be that my parsing algorithm is wrong, but it could be the tests. I got three failures.

Is version supposed to be null in these two or is it the part after the '@'?

 {
    "description": "a scheme is always required",
    "purl": "EnterpriseLibrary.Common@6.0.1304",
    "canonical_purl": "EnterpriseLibrary.Common@6.0.1304",
    "type": null,
    "namespace": null,
    "name": "EnterpriseLibrary.Common",
    "version": null,
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  },
  {
    "description": "a name is required",
    "purl": "pkg:maven/@1.3.4",
    "canonical_purl": "pkg:maven/@1.3.4",
    "type": "maven",
    "namespace": null,
    "name": null,
    "version": null,
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  },

Also, I am not sure how to parse this one to end up with components given. Maybe type is meant to be "pkg%3Amaven", not "maven" since this purl technically is missing the scheme.

  {
    "description": "invalid encoded colon : between scheme and type",
    "purl": "pkg%3Amaven/org.apache.commons/io",
    "canonical_purl": null,
    "type": "maven",
    "namespace": "org.apache.commons",
    "name": "io",
    "version": null,
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  }

@matt-phylum
Copy link
Contributor

Those are all invalid so your parsing algorithm is supposed to fail. In the second case the parser should decode the version before failing, but I don't think it affects the behavior of the test. If the PURL is invalid, you can't parse it, and if the components are invalid you can't format it, so there's nothing left to do.

@dwalluck
Copy link

dwalluck commented Mar 7, 2025

In any case, the first two appear to be missing version, no? The parser will fail, but it parses from the right, so it should have the version component (assuming it doesn't check for scheme first, then name would be null, too.

@matt-phylum
Copy link
Contributor

Most parsers do not return an invalid PURL so there is no way to assert the version.

@dwalluck
Copy link

dwalluck commented Mar 7, 2025

One more, yes, it is a test with "is_invalid": true, but the parsing should be consistent. The spec specifies the order of parsing. The entire part after '?' should be qualifiers, which have no valid keys. The only thing it should parse is the type.

 {
    "description": "check for invalid character in type",
    "purl": "pkg:n&g?inx/nginx@0.8.9",
    "canonical_purl": null,
    "type": "n&g?inx",
    "namespace": null,
    "name": "nginx",
    "version": "0.8.9",
    "qualifiers": null,
    "subpath": null,
    "is_invalid": true
  }

@dwalluck
Copy link

dwalluck commented Mar 7, 2025

This pull request makes the valid purl components consistent with the original purl. Awesome!

However, I have now found inconsistencies in the invalid purl components.

It should be made consistent with the parsing algorithm, or leave everything null. The test author should not make up what they "think" should be the components without following the parsing algorithm laid out in the specification.

@dwalluck
Copy link

dwalluck commented Mar 7, 2025

I can be more clear. The test does not test what it thinks it does. It actually tests an invalid qualifier key, not an invalid type. It could test an invalid type if it uses a character other than '#' or '?', I think. So, purl->fields is testing the qualifier keys, but fields->purl is testing the type.

@matt-phylum
Copy link
Contributor

That's not how the tests work. The type is an input to formatting only. It has nothing to do with parsing.

@dwalluck
Copy link

dwalluck commented Mar 7, 2025

That's not how the tests work. The type is an input to formatting only. It has nothing to do with parsing.

Sorry, what does "input to formatting" mean?

If a purl is valid, then the fields refer to the purl.
If the purl is invalid, then the fields refer to what?

Even though that purl is invalid, the fields there don't refer to that purl URI. I would have thought it would look more like:

 {
    "description": "check for invalid character in qualifier key",
    "purl": "pkg:n&g?inx/nginx@0.8.9",
    "canonical_purl": null,
    "type": "n&g",
    "namespace": null,
    "name": null,
    "version": null,
    "qualifiers": {"inx/nginx@0.8.9": ""},
    "subpath": null,
    "is_invalid": true
  },

@matt-phylum
Copy link
Contributor

Sorry, what does "input to formatting" mean?

The values are input to the formatting algorithm described here:

How to build ``purl`` string from its components

If a purl is valid, then the fields refer to the purl.

They do not.

If the PURL is valid, then using those values as input to build a PURL string results in the canonical PURL. Parsing the canonical PURL does not necessarily produce the same values again.

@matt-phylum
Copy link
Contributor

There is also a problem with this case:

{
"description": "invalid encoded colon : between scheme and type",
"purl": "pkg%3Amaven/org.apache.commons/io",
"canonical_purl": null,
"type": "maven",
"namespace": "org.apache.commons",
"name": "io",
"version": null,
"qualifiers": null,
"subpath": null,
"is_invalid": true
},

The PURL is invalid, and is_invalid is true, but the type namespace name version qualifiers subpath are all valid. This results in every implementation failing. Should type be null?

@dwalluck
Copy link

dwalluck commented Mar 7, 2025

They do not.

I meant purl as in the non-canonical purl field in the JSON. This pull request changed them so that they all (valid ones) match, except for the few invalid cases that I mentioned.

This is for invalid purls, the spec still defines a consistent parse result, so it would be possible for the implementations to get the same results (or not) depending on if they fail-fast or not.

@jkowalleck
Copy link
Member Author

Hm, i see.

Let's iterate and improve one thing at a time. This one is basically about valid test data.

There is an open ticket to discuss how invalid test cases must be structured. As soon as that discussion can to a conclusion, according action can follow up.

matt-phylum added a commit to phylum-dev/purl that referenced this pull request Mar 11, 2025
matt-phylum added a commit to phylum-dev/purl that referenced this pull request Mar 11, 2025
@dwalluck
Copy link

There is an open ticket to discuss how invalid test cases must be structured. As soon as that discussion can to a conclusion, according action can follow up.

Thanks. I am not seeing any issues with the valid test data except when something should be encoded or not. For example, ':' appears at times both unencoded and encoded (encoded in the version, but unencoded in the repository_url key).

As a rule, should the standalone test components always appear decoded, as in these valid examples?

  {
    "purl": "pkg:GOLANG/google.golang.org/genproto@abcdedf#/googleapis/%2E%2E/api/annotations/",
    "subpath": "/googleapis/../api/annotations/",
    "is_invalid": false
  }

or

 {
    "purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io",
    "canonical_purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io",
    "version": "sha256:244fd47e07d1004f0aed9c",
    "is_invalid": false
  }

@matt-phylum
Copy link
Contributor

The standalone components are not encoded. If they contain percent signs, those are supposed to be encoded as %25, not decoded like percent encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

revisit the test suite
3 participants