This repository has been archived by the owner on Feb 21, 2023. It is now read-only.

Statement by reference #35

Closed
letmaik opened this issue Nov 14, 2022 · 17 comments

Comments

@letmaik
Contributor

letmaik commented Nov 14, 2022

At IETF 115 there were discussions on having some kind of standard way to deal with statements by reference. Here are also the two relevant slides from https://datatracker.ietf.org/doc/slides-115-scitt-combined-scitt-presentations/:

[Two slide images from the linked deck; not reproduced here.]

Simply using COSE detached payloads as defined in the RFC would not be sufficient as the payload would still be required during signature validation when registering the signed statement.

Instead, having a specific content type for referencing external statements may be useful. Note that this format by itself would be a statement.

RFC 9054 gives two examples for such hash structures:

COSE_Hash_V = (
    1 : int / tstr, # Algorithm identifier
    2 : bstr, # Hash value
    ? 3 : tstr, # Location of object that was hashed
    ? 4 : any   # object containing other details and things
    )

and

COSE_Hash_Find = [
    hashAlg : int / tstr,
    hashValue : bstr
]

SUIT's digest container defines this as:

SUIT_Digest = [
  suit-digest-algorithm-id : suit-cose-hash-algs,
  suit-digest-bytes : bstr,
  * $$SUIT_Digest-extensions   ; described as optional extra values required by a hash alg (?)
]

Would having a variant of one of the above as a CBOR content type address this issue?
Should location of the referenced content be included? How? Should location hints be globally unique? Resolvable?
Should a SCITT transparency service know about this content type and at least validate its CDDL schema?
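
To make the COSE_Hash_Find option above a bit more concrete, here is a minimal Python sketch that hand-encodes `[hashAlg, hashValue]` as CBOR for a SHA-256 digest. It assumes the COSE algorithm identifier -16 for SHA-256 from RFC 9054. The helper name `encode_hash_find` and the sample payload are made up for illustration, and a real implementation would more likely use a CBOR library than encode bytes by hand.

```python
import hashlib

COSE_ALG_SHA_256 = -16  # SHA-256 identifier registered by RFC 9054


def encode_hash_find(payload: bytes) -> bytes:
    """Encode [hashAlg, hashValue] as CBOR (hypothetical helper).

    Only handles the fixed shapes needed here: a 2-element array,
    a small negative integer, and a 32-byte byte string.
    """
    digest = hashlib.sha256(payload).digest()
    array_header = bytes([0x82])                    # major type 4, length 2
    alg = bytes([0x20 | (-1 - COSE_ALG_SHA_256)])   # major type 1, value 15
    bstr = bytes([0x58, len(digest)]) + digest      # major type 2, 1-byte length
    return array_header + alg + bstr


encoded = encode_hash_find(b"example statement")
print(encoded.hex())
```

The result is 36 bytes: 4 bytes of CBOR framing followed by the 32-byte digest. Location and content type would need extra fields, which is where a map-style structure such as COSE_Hash_V might fit better.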

@letmaik
Contributor Author

letmaik commented Nov 16, 2022

Another thing to include in the reference should be the content type.

@OR13

OR13 commented Nov 17, 2022

The COSE representations are killing me... I find them incredibly hard to process.

Here is an example I am familiar with:

https://blog.cloudflare.com/cloudflare-distributed-web-resolver/
https://docs.ipfs.tech/concepts/content-addressing/

https://ipfs.io/ipfs/QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR
https://ipfs.io/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi

https://docs.ipfs.tech/how-to/best-practices-for-nft-data/#types-of-ipfs-links-and-when-to-use-them

const cid = await ipfs.add({ content }, {
  cidVersion: 1,
  hashAlg: 'sha2-256'
})

ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi

If you don't want to use IPFS, but you want something similar, you can do what docker has been doing:

sha256:fc92eec5cac70b0c324cec2933cd7db1c0eae7c9e2649e42d02e77eb6da0d15f

^ this won't help you resolve or dereference, but it will help you identify.
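
As a rough illustration of the Docker-style approach, the following Python sketch builds a `sha256:<hex>` identifier and checks content against it. The helper names `digest_id` and `matches` and the sample statement are hypothetical.

```python
import hashlib


def digest_id(content: bytes) -> str:
    """Return a Docker-style 'sha256:<hex>' identifier (hypothetical helper)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def matches(identifier: str, content: bytes) -> bool:
    """Check that content hashes to the given identifier."""
    return identifier == digest_id(content)


statement = b'{"claim": "example"}'
ident = digest_id(statement)
print(ident)
print(matches(ident, statement))         # True
print(matches(ident, statement + b" "))  # False
```

This identifies the bytes but says nothing about where to fetch them or what media type they have, so those would have to travel alongside the identifier.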

@rjb4standards

FYI: SHA-256 IDs have been working well for over a year for our own registry, SAG-CTR. We have not seen any collisions yet.

@OR13

OR13 commented Nov 17, 2022

If you want to build a custom identifier scheme for statements you should consider the precedent of Data URIs:

https://github.com/transmute-industries/did-method-meliorism/blob/f2a7d8673a7b49a6fae84c4348614109ff35409b/src/cli.js#L153

https://en.wikipedia.org/wiki/Data_URI_scheme

data:text/vnd-example+xyz;foo=bar;base64,R0lGODdh

data:text/vnd-scitt+claim;hash=sha256;content-type=application/vnd-cid+ipld;base64,R0lGODdh
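
For reference, a Data URI like the one above can be taken apart with a few lines of Python. This is only a sketch: `parse_data_uri` is a hypothetical helper, and the `hash` and `content-type` parameters are the illustrative ones from the example, not registered media type parameters.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri: str):
    """Split a data URI into (media_type, params, payload_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, data = uri[len("data:"):].partition(",")
    parts = header.split(";")
    is_base64 = parts[-1] == "base64"
    if is_base64:
        parts = parts[:-1]
    media_type = parts[0]
    # Remaining parts are name=value media type parameters.
    params = dict(p.split("=", 1) for p in parts[1:])
    payload = base64.b64decode(data) if is_base64 else unquote_to_bytes(data)
    return media_type, params, payload


uri = ("data:text/vnd-scitt+claim;hash=sha256;"
       "content-type=application/vnd-cid+ipld;base64,R0lGODdh")
media_type, params, payload = parse_data_uri(uri)
print(media_type, params, payload)
```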

See also Tag 42: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml

@letmaik
Contributor Author

letmaik commented Nov 18, 2022

From back to front:

> See also Tag 42: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml

CIDs use multicodec for identifying content types, and those have to be registered here: https://github.com/multiformats/multicodec/blob/master/table.csv. I think SCITT should allow CIDs in some way, but CIDs don't look like a general enough mechanism.

> If you want to build a custom identifier scheme for statements you should consider the precedent of Data URIs:

I think what you're suggesting is creating a new media type where the hash of the referenced data is the content and everything else is put into media type parameters, for example:

Media type: application/scitt-statement-by-reference;hash=sha256;content-type=application/spdx
Content: binary sha256 hash of statement

You could put all that into a Data URI by base64 encoding the content (hash), but I don't see where this would go in the COSE envelope or how it interacts with the cty parameter. Base64 encoding the hash also seems a bit wasteful. To be honest, I don't see how decoding such a Data URI is easier than decoding a CBOR structure.
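
To put a number on the base64 overhead, here is a small Python check for a SHA-256 digest (the sample input is arbitrary):

```python
import base64
import hashlib

digest = hashlib.sha256(b"example statement").digest()
encoded = base64.b64encode(digest)

print(len(digest))   # 32 bytes as a raw byte string
print(len(encoded))  # 44 characters once base64 encoded
```

That is about a third more than the raw digest, before counting the media type and parameters that a Data URI carries as text.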

> The COSE representations are killing me... I find them incredibly hard to process.

Where exactly do you see problems in processing the CBOR representations? Would the same be true for an equivalent JSON representation?

My general feeling is that the detached use case may become the thing that's used exclusively in some settings, and so if we define a standard mechanism for that I think it should be as efficient as possible and not necessarily rely on text representations. In that sense, CBOR CIDs (as mentioned above) go in the right direction but are quite hard to decode (and introduce yet another format next to CBOR) and too limited I think (see above).

@SteveLasker

SteveLasker commented Nov 21, 2022

Have you considered PURLs?
Here's one we did a while back, specifically for this purpose:
https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#oci

We also spent time discussing separating identity from location:

@OR13

OR13 commented Nov 21, 2022

> Where exactly do you see problems in processing the CBOR representations?

Readability and types... getting in the way of "representative examples".

I prefer to argue over a representative example, and then map from it to existing building blocks, not the other way round.

@letmaik
Contributor Author

letmaik commented Nov 21, 2022

Slides:
Discussion - Statement by reference.pdf

@letmaik
Contributor Author

letmaik commented Nov 22, 2022

> > Where exactly do you see problems in processing the CBOR representations?
>
> Readability and types... getting in the way of "representative examples".
>
> I prefer to argue over a representative example, and then map from it to existing building blocks, not the other way round.

Alright, makes sense. So your concern is not about implementations but about facilitating discussion. I guess the two are sometimes intertwined, but let's try anyway.

@letmaik
Contributor Author

letmaik commented Nov 23, 2022

@OR13 This is my attempt at representative examples. Is this what you had in mind? Any others you can think of?

Statement stored in undeclared location

hash alg: sha-256
hash: abc
content type: application/foo

Statement stored on web server

hash alg: sha-256
hash: abc
content type: application/foo
location: https://example.com/statements/abc.json

Statement stored on IPFS

hash alg: sha-256
hash: abc
content type: application/foo
location: ipfs://QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB

Note: The hash embedded within the CID is not the hash of the raw content!
See https://docs.ipfs.tech/concepts/hashing/#content-identifiers-are-not-file-hashes.

Note2: The content type embedded within the CID cannot be arbitrary.
See https://github.com/multiformats/multicodec/blob/master/table.csv and search for "ipld".
The raw type may be a reasonable fall-back and a specific content type may be stored outside
of the CID.
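
To illustrate the first note, the sketch below decodes the CIDv0 from the example into its multihash parts using only the standard library. It assumes a CIDv0 (base58btc text starting with `Qm`) whose hash function code and length each fit in one byte; `b58decode` and `split_cidv0` are hypothetical helpers, not part of any IPFS library.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58btc string (Bitcoin alphabet) into bytes."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def split_cidv0(cid: str):
    """Return (hash_function_code, digest) from a CIDv0 string."""
    raw = b58decode(cid)
    code, length, digest = raw[0], raw[1], raw[2:]
    if len(digest) != length:
        raise ValueError("multihash length mismatch")
    return code, digest


code, digest = split_cidv0("QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB")
print(hex(code), len(digest))
```

The code 0x12 denotes sha2-256 in the multihash table, but the digest covers the IPFS DAG node rather than the raw file bytes, so it would generally not equal the `hash` field in the example above.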

Statement stored in OCI registry

hash alg: sha-256
hash: abc
content type: application/foo
location: docker.io/library/example@sha256:def

Note: The hash in the location is not the hash of the raw content, but rather of a manifest.
There are a few indirections that make it a bit hard to understand.
See the in-development ORAS artifacts spec at https://github.com/oras-project/artifacts-spec.
Does referencing the location with hash add any benefit? Would a flexible tag be enough?

Note2: The Notary project also defines signing over OCI artifacts and may be in conflict.
See https://github.com/notaryproject/notaryproject.

Statement stored in DID service endpoint

hash alg: sha-256
hash: abc
content type: application/foo
location: did:example:123?service=files&relativeRef=/statement.json

Note: DID dereferencing would be used to retrieve the statement from the given location.

Note2: The DID in the location may be distinct from the issuer of the signed statement.
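
One way to read the examples above is as a single record with an optional location hint. The Python sketch below is only an illustration of that reading: the class `StatementReference` and its field names are hypothetical, and only SHA-256 is handled.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatementReference:
    """Hypothetical in-memory form of a statement by reference."""
    hash_alg: str                   # e.g. "sha-256"
    hash_value: bytes               # digest of the raw statement bytes
    content_type: str               # media type of the referenced statement
    location: Optional[str] = None  # hint only; may be absent

    def verify(self, content: bytes) -> bool:
        """Check that retrieved content matches the recorded digest."""
        if self.hash_alg != "sha-256":
            raise ValueError(f"unsupported hash algorithm: {self.hash_alg}")
        return hashlib.sha256(content).digest() == self.hash_value


statement = b'{"example": true}'
ref = StatementReference(
    hash_alg="sha-256",
    hash_value=hashlib.sha256(statement).digest(),
    content_type="application/foo",
    location="https://example.com/statements/abc.json",
)
print(ref.verify(statement))    # True
print(ref.verify(b"tampered"))  # False
```

In this reading the location is never trusted: whatever is fetched from the web server, IPFS, an OCI registry, or a DID service endpoint is checked against the digest over the raw statement bytes.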

@OR13

OR13 commented Nov 30, 2022

@letmaik these are excellent examples.

> Statement stored on IPFS

The 2 notes are interesting.

Perhaps it's not for this repo, but I would love to generate some "real examples" from some "safe / fake data"

So we can see the actual proposed data structures.

If there is a repo where I can do that work, I'm happy to tackle the IPFS examples, I did something similar recently for this:

https://github.com/transmute-industries/ns.transmute.org

@OR13

OR13 commented Dec 2, 2022

We have a generic use case for "signed statement" by reference, which I would love to explore as well.

When my statement refers to several "other signed statements" or "transparent statements" by reference.

@letmaik
Contributor Author

letmaik commented Feb 8, 2023

@OR13 Let's experiment here: https://github.com/ietf-scitt/statements-by-reference

@OR13

OR13 commented Feb 8, 2023

I filed: ietf-scitt/statements-by-reference#1

@yogeshbdeshpande
Contributor

@letmaik to move this issue to the new repository, retaining the full history of the conversation tracked here.

@yogeshbdeshpande
Contributor

@fournet & @letmaik to work on a PR for Architecture

@SteveLasker

Closing as this should be covered by: ietf-wg-scitt/draft-ietf-scitt-architecture#8
