Add Schema that defines PURL types #401

Draft: wants to merge 8 commits into main


@stevespringett (Member) commented Mar 1, 2025

This PR adds a formal structure to PURL type definitions. This PR contains:

  • JSON schema for PURL type definitions
  • JSON schema for the index of all PURL type definitions
  • Sample PURL type definitions (e.g. maven, npm)
  • GitHub action and Python script that will automatically generate:
    • The index.json of all PURL type definitions
    • The human-readable documentation for all types in Markdown format
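As a rough illustration of the index.json generation step, here is a minimal Python sketch. It is not the script included in this PR: the one-file-per-type directory layout and the "type" and "$id" keys read from each definition are assumptions, and Markdown generation is left out.

```python
import json
from pathlib import Path


def build_index(types_dir: Path) -> list[dict]:
    """Collect one summary entry per PURL type definition file.

    Assumes (hypothetically) one JSON file per type in `types_dir`,
    each with top-level "type" and "$id" keys.
    """
    entries = []
    for path in sorted(types_dir.glob("*.json")):
        if path.name == "index.json":
            continue  # never index the generated index itself
        definition = json.loads(path.read_text(encoding="utf-8"))
        entries.append({"type": definition["type"], "$id": definition["$id"]})
    # Sort by type name so the generated index is stable across runs.
    return sorted(entries, key=lambda entry: entry["type"])


def write_index(types_dir: Path) -> Path:
    """Write the collected entries to index.json next to the definitions."""
    index_path = types_dir / "index.json"
    text = json.dumps(build_index(types_dir), indent=2) + "\n"
    index_path.write_text(text, encoding="utf-8")
    return index_path
```

Sorting the entries keeps the generated file diff-friendly when a GitHub action regenerates it on every change.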

This PR closes #310

Signed-off-by: Steve Springett <steve@springett.us>
…b action is activated.

@stevespringett changed the title from "Purl type schema" to "Add Schema that defines PURL types" on Mar 1, 2025
Dustin4444 previously approved these changes Mar 2, 2025
Comment on lines +30 to +38
"normalization": {
"type": "string",
"enum": [
"lowercase",
"uppercase",
"none"
],
"description": "Defines if values must be normalized to lowercase, uppercase, or kept as provided."
}
Contributor

This is incompatible with pkg:pypi name normalization: https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization. The simpler rules in the current PURL spec are wrong (#262), but even those can't be described by this normalization section.
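The gap is easy to demonstrate. The packaging specification linked above normalizes a name by collapsing runs of "-", "_" and "." into a single "-" and then lowercasing; the "lowercase" option in the proposed enum can only express the second half. The helper names below are hypothetical.

```python
import re


def pypi_normalize(name: str) -> str:
    # Name normalization from the Python packaging specification:
    # collapse runs of "-", "_" and "." into one "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def lowercase_only(name: str) -> str:
    # All that a "normalization": "lowercase" rule can express.
    return name.lower()
```

For example, Friendly_Bard and friendly-bard name the same PyPI project under the first function but stay distinct under plain lowercasing.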

"case-sensitive",
"case-insensitive"
],
"description": "Determines if case must be preserved or ignored."
Contributor

What is the difference between sensitivity and normalization? AFAIK PURLs are always case-sensitive, and case-insensitive values are normalized to one casing. Un-normalized comparison with selective case insensitivity means that all code comparing PURLs needs to know how to parse them and to apply the comparison rules for each package type correctly.
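A minimal sketch of the normalize-at-creation approach, assuming a hypothetical "acme" type whose names are case-insensitive (the type and helper names are illustrative, not part of the proposal):

```python
# Hypothetical type that a definition would mark as case-insensitive.
CASE_INSENSITIVE_TYPES = {"acme"}


def build_purl(purl_type: str, name: str) -> str:
    """Build a PURL, normalizing case once, at creation time."""
    if purl_type in CASE_INSENSITIVE_TYPES:
        name = name.lower()
    return f"pkg:{purl_type}/{name}"


def same_package(a: str, b: str) -> bool:
    """Canonical PURLs compare as plain strings; no per-type knowledge needed."""
    return a == b
```

With this approach pkg:acme/Foo is never produced, so plain equality suffices. If un-normalized PURLs such as pkg:acme/Foo do circulate, string comparison against pkg:acme/foo reports a mismatch, and every consumer would need acme's rules to compare them correctly.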

Comment on lines +33 to +34
"lowercase",
"uppercase",
Contributor

These are insufficiently defined. Technically, with Unicode, I think this is an infeasible problem. For package types that allow Unicode characters in a PURL component that has normalization rules, the set of characters to be lowercased (or uppercased? does anything actually do that?) may be limited to ASCII or may cover all of Unicode. For Unicode, expect some packaging implementations to have bugs where non-BMP characters are handled incorrectly, which could lead to the need for a "lowercase BMP characters only" rule. There may even be cases where the Unicode version makes a difference, but I doubt package manager authors are thinking about that.
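A few Python examples of why "lowercase" needs a precise definition. ascii_lower is a hypothetical helper that lowercases only A to Z; the other results come from Python's built-in str.lower and str.upper, which follow the Unicode case mappings of the Unicode version bundled with the interpreter.

```python
def ascii_lower(s: str) -> str:
    # Lowercase only the ASCII letters A-Z; every other code point is left as is.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


DOTTED_CAPITAL_I = "\u0130"     # LATIN CAPITAL LETTER I WITH DOT ABOVE
SHARP_S = "\u00df"              # LATIN SMALL LETTER SHARP S
DESERET_CAPITAL = "\U00010400"  # uppercase letter outside the BMP

if __name__ == "__main__":
    for text in (DOTTED_CAPITAL_I, SHARP_S, DESERET_CAPITAL, "\u00c4BC"):
        # Compare full Unicode casing with the ASCII-only variant.
        print(repr(text), repr(text.lower()), repr(text.upper()), repr(ascii_lower(text)))
```

U+0130 lowercases to two code points, ß uppercases to "SS" and does not round-trip back, and the Deseret letter U+10400 has a lowercase mapping that an ASCII-only or BMP-only implementation would silently skip.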

},
"character_constraints": {
"type": "string",
"description": "Regex defining valid characters."
Contributor

This is slightly risky. Not all regex implementations work the same way, so it would be good to specify a well-known regex flavor.

Maybe it would be better to just remove this concept entirely. PURL should not be deciding whether a package name is valid or not. It provides little benefit and causes problems if the rules become more permissive later or when bad data is received from another source.
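To illustrate the flavor problem with the allowed_characters value from the sample definitions: in Python's re module, "$" also matches just before a trailing newline and "\d" matches any Unicode decimal digit, whereas in ECMAScript (without the m flag) "$" matches only at the end of input and "\d" is limited to 0-9. The helper names below are hypothetical.

```python
import re

# allowed_characters value used in the sample definitions
ALLOWED = "^[a-z0-9-]+$"


def accepts_search(value: str) -> bool:
    # Python semantics: "$" also matches right before a trailing "\n".
    return re.search(ALLOWED, value) is not None


def accepts_fullmatch(value: str) -> bool:
    # fullmatch requires the whole string to match, so a trailing "\n" fails.
    return re.fullmatch(ALLOWED, value) is not None


def digits_unicode(value: str) -> bool:
    # In Python 3, \d on str patterns matches any Unicode decimal digit.
    return re.fullmatch(r"\d+", value) is not None


def digits_ascii(value: str) -> bool:
    # re.ASCII restricts \d to [0-9], which is what ECMAScript's \d means.
    return re.fullmatch(r"\d+", value, re.ASCII) is not None
```

So a validator built on re.search accepts "npm\n", while a JavaScript validator using the same pattern string rejects it; likewise the Arabic-Indic digit "\u0663" passes one \d check and fails the other.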

"$id": {
"type": "string",
"description": "The unique identifier for this PURL type definition.",
"pattern": "^https://purl-spec\\.org/types/[a-z0-9-]+\\.json$"
Contributor

This seems overly restrictive. Sometimes people invent their own package types and this rule appears to force those people to masquerade as purl-spec.org.
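A quick check against the proposed "$id" pattern (with the JSON string escaping removed) shows the restriction; the example.com URL stands in for a hypothetical privately defined type.

```python
import re

# "$id" pattern from the proposed type-definition schema
ID_PATTERN = r"^https://purl-spec\.org/types/[a-z0-9-]+\.json$"


def id_is_valid(type_id: str) -> bool:
    # True only for identifiers hosted under purl-spec.org/types/
    return re.fullmatch(ID_PATTERN, type_id) is not None
```

Any self-hosted definition, such as https://example.com/purl-types/acme.json, fails validation under this rule.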

"definition": {
"namespace": {
"requirement": "optional",
"allowed_characters": "^[a-z0-9-]+$",
Contributor

This rule forbids all prefixed packages because the prefix must (currently) begin with @.
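Checking the sample namespace rule against an npm scope, in both its decoded form and its percent-encoded PURL form (the helper name is hypothetical):

```python
import re

# allowed_characters for the namespace in the sample definition
NAMESPACE_PATTERN = "^[a-z0-9-]+$"


def namespace_allowed(namespace: str) -> bool:
    return re.fullmatch(NAMESPACE_PATTERN, namespace) is not None
```

Both "@angular" and "%40angular" are rejected, while the unscoped string "angular" passes, so no scoped package could be described.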

"allowed_characters": "^[a-z0-9-]+$",
"case_rules": {
"sensitivity": "case-sensitive",
"normalization": "lowercase"
Contributor

This is incorrect. NPM package IDs are case-sensitive, so they should not be lowercased.

"allowed_characters": "^[a-z0-9-]+$",
"case_rules": {
"sensitivity": "case-sensitive",
"normalization": "lowercase"
Contributor

Some packages have mixed case names so uppercase must be allowed and the name must not be lowercased.
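A small sketch of both problems, using the hypothetical package names MyLib and mylib, which a case-sensitive registry could treat as two distinct packages:

```python
import re

# allowed_characters and normalization from the sample definition
NAME_PATTERN = "^[a-z0-9-]+$"


def rule_accepts(value: str) -> bool:
    return re.fullmatch(NAME_PATTERN, value) is not None


def apply_lowercase_rule(value: str) -> str:
    # "normalization": "lowercase" as proposed
    return value.lower()
```

The pattern rejects MyLib outright, and the lowercase normalization maps MyLib and mylib to the same string, merging two packages into one identifier.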

},
"name": {
"requirement": "required",
"allowed_characters": "^[a-zA-Z0-9_.-]+$",
Contributor

Member

@matt-phylum This makes sense. We should likely drop allowed_characters.

},
"name": {
"requirement": "required",
"allowed_characters": "^[a-z0-9-]+$",
Contributor

This is incorrect. NPM contains packages like pkg:npm/%2F18_wahajali/.adventure_game@4.0.2.
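To make the example concrete, the sketch below splits that PURL into its parts and checks the name against the sample npm rule. split_npm_purl is a hypothetical, deliberately minimal helper that ignores qualifiers and subpaths; it is not a conforming PURL parser.

```python
import re
from urllib.parse import unquote

# npm "name" allowed_characters in the sample definition
NAME_PATTERN = "^[a-z0-9-]+$"


def split_npm_purl(purl: str) -> tuple[str, str, str]:
    """Split pkg:npm/<namespace>/<name>@<version> (no qualifiers or subpath)."""
    prefix = "pkg:npm/"
    if not purl.startswith(prefix):
        raise ValueError(f"not an npm PURL: {purl}")
    rest = purl[len(prefix):]
    path, sep, version = rest.rpartition("@")
    if not sep:  # no version present
        path, version = rest, ""
    namespace, _, name = path.rpartition("/")
    # Percent-decode each component after splitting on the literal separators.
    return unquote(namespace), unquote(name), unquote(version)


def name_allowed(name: str) -> bool:
    return re.fullmatch(NAME_PATTERN, name) is not None
```

It yields the namespace "/18_wahajali" (the %2F decodes to a slash), the name ".adventure_game" and the version "4.0.2"; the name contains "." and "_", so ^[a-z0-9-]+$ rejects it.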

@johnmhoran added the labels "PURL type definition", "PURL documentation", "Ecma", "PURL type component", and "PURL generation" on Mar 4, 2025
@pombredanne dismissed Dustin4444's stale review on Mar 5, 2025, 16:19

Unknown user, no comments. Looks like spam.

Labels: Ecma, PURL documentation, PURL generation, PURL type component, PURL type definition (Non-core definitions that describe and standardize PURL types)
Projects: None yet
Development: Successfully merging this pull request may close these issues:
Proposal: Solution for Purl Type Definitions
5 participants