Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

semantics of closing "top" #3727

Open
cuematthew opened this issue Feb 5, 2025 · 7 comments
Open

semantics of closing "top" #3727

cuematthew opened this issue Feb 5, 2025 · 7 comments
Labels
Triage Requires triage/attention

Comments

@cuematthew
Copy link
Contributor

cuematthew commented Feb 5, 2025

Apparently closing top doesn't do anything. I find this is illogical, and think should be changed.

#s: f: _
o: #s & {f: g: 1}

This does not cause an error.

  1. I think this is surprising, and a foot-gun.

  2. It now means we have 3 ways to spell forced-open:

    1. #s: f: {...} This is (imo) by far the easiest to understand. It looks like a struct. ... only has one role in the language. This can't really be misinterpreted.
    2. #s: f: _ This doesn't look like a struct, and it looks like it should be closed. Yet we allow it to unify with structs that add fields.
    3. #s: f: {_} This is surprising syntax. It looks like a struct, yet we allow it to unify with things that are not structs. E.g. o: #s & {f: 1}
  3. It is illogical. Consider this:
    #x: number
    Every int will satisfy this (as does int). E.g. adding o: #x & 5 won't cause an error. If I make the original a bit more specific:
    #x: int
    then still every int will satisfy this. So this is a basic transitive property: if we have "i is more specific than j is more specific than k" and we replace a "k" with a "j" then we still require "i" to unify without error with "j".

    But consider this:
    #y: _
    Currently, every struct will satisfy this (as does {}). E.g. adding o: #y & {f: 1} won't cause an error. Yet, if I make the original a bit more specific:
    #y: {}
    now the behaviour has inverted and values that unified before no longer unify. We have broken transitivity. This causes surprise for users, and makes the mental model harder for everyone to hold -- as every caveat and corner case does.

Therefore, I think it is preferable for

#s: f: _
o: #s & {f: g: 1}

to cause an error -- field not allowed.

There is nothing problematic or undesirable about a closed top precluding any non-empty struct. If the user really wants to allow any struct, then {...} is available, and if they want to allow any value at all, then {_} is available (or {...} | _ which I prefer). I think because {_} is such strange syntax it avoids the point-of-surprise -- you would see it and think "well that's odd, it must be there for a good reason".

@cuematthew cuematthew added the Triage Requires triage/attention label Feb 5, 2025
@myitcv
Copy link
Member

myitcv commented Feb 6, 2025

Copying your example for reference:

#s: f: _
o: #s & {f: g: 1}
  1. I think this is surprising, and a foot-gun.

FWIW, it's not surprising to me! I say that only as a sample in the discussion, certainly not as a point of reference.

For me, the notion of something being closed only applies to structure, i.e. lists and structs. But with _ we have nothing. Hence with such a definition we don't have anything to do with respect to closedness when referring to #s (and by extension #s.f). Note also that close(_) has no meaning:

! exec cue vet x.cue
cmp stderr stderr.golden

-- x.cue --
close(_)

-- stderr.golden --
struct argument must be concrete:
    ./x.cue:1:1

Noting that _ does not imply the value will be a struct because it says nothing. It could equally be a list or any other value than bottom. Indeed for that reason I think that error message is slightly misleading and should perhaps say:

struct or list argument must be concrete
  1. #s: f: {...} This is (imo) by far the easiest to understand. It looks like a struct. ... only has one role in the language. This can't really be misinterpreted.

As you say, , {...} is the spelling of "must be of type struct", and that structure can have any shape.

Contrast (play):

#s: f: {
  g: name?: string
  ...
}

x: #s & {
  f: g: something: "test"
  anything: "else"
}

which gives:

x.f.g.something: field not allowed:
    -:2:6
    -:6:4
    -:7:9

In this example, beyond the field g in f we can define whatever values we like.

The only reason I say this out loud is to point out that ... only plays the role of "arbitrary further structure" in a list or structure, beyond that structure already established.

2. #s: f: _ This doesn't look like a struct, and it looks like it should be closed. Yet we allow it to unify with structs that add fields.

Per my comments above, there is nothing to close here.

3. #s: f: {_} This is surprising syntax. It looks like a struct, yet we allow it to unify with things that are not structs. E.g. o: #s & {f: 1}

I'd tend to agree that this looks a bit odd. My understanding is that it's a consequence of embedding. On this point I reference our offline discussion regarding more explicit syntax for embedding that would cause this to disappear.

3. Consider this:
#x: number
Every int will satisfy this (as does int). E.g. adding o: #x & 5 won't cause an error. If I make the original a bit more specific:
#x: int
then still every int will satisfy this. So this is a basic transitive property: if we have "i is more specific than j is more specific than k" and we replace a "k" with a "j" then we still require "i" to unify without error with "j".

I'm not sure I follow this argument so an example/question by way of reply:

#x: number
o: #x & 5

// some time later we refine to:

#x: float

Has this not violated the property you outlined?

In any case, I'm not clear that property has to hold in general, in CUE at least. It's a breaking change, to my mind, to make such a definition more specific. The only safe change, in general, would be to make is less specific.

@cuematthew
Copy link
Contributor Author

As a general rule, I would expect subsumption to obey transitivity. That is, if k is more specific than j, and j is more specific than i, then k should be more specific than i. This seems quite fundamental. But the fact that closing top does not do anything, breaks this.

i: number
j: int
k: 5

ij: i & j // == j, so j is more specific than i
jk: j & k // == k, so k is more specific than j
ik: i & k // == k, and so the transitive property is established - k is more specific than i

all fine. But:

#i: _
#j: {f: 1}
#k: {}

We know that j is more specific that i. We also know that k is more specific than j. This is because really, we should spell them as:

#j: {f: 1, ...⊥}
#k: {...⊥}

so both have all the fields defined, but for the field f, we have k.f is more specific than j.f because ⊥ is the most specific. Consequently I would expect transitivity to hold as before:

ij: #i & #j // == j, so j is more specific than i
jk: #j & #k // This errors because of the attempt to unify 1 with ⊥, which fails. This is fine.
ik: #i & #k // This does not error.

What is seemingly wrong is that replacing the most general value, i with the value j (that we know is between i and k) produces an error. I think it makes sense for ik to error in addition to jk.

@mpvl
Copy link
Member

mpvl commented Feb 7, 2025

It is also not surprising to me: _ is the top of the lattice. Think of it as any (it may not be a bad idea to rename it to that).

Note that _ is also more broader claim than {...}. The latter states that there may any field of any type within that struct, whereas _, or "any" means that any value is allowed, which includes {...}.

There is an orthogonal concern of {} meaning struct only, whereas {_} meaning an embedding of any value. It would indeed be nice to have a clearer syntax for that, but this is non-trivial, as it interacts with many aspects of CUE.

As a general principle, A is equivalent to {A}.

@mpvl
Copy link
Member

mpvl commented Feb 7, 2025

On this point I reference our offline discussion regarding more explicit syntax for embedding that would cause this to disappear.

with the mentioned syntax this would still be { & _ }, or something like that, which is not necessarily a big improvement here.

A big reason for this syntax (though not the only one) is the interaction with comprehensions, which always emit a value enclosed in braces{}. If we found a way to relax that, we may start contemplating a notation for embedded scalars that looks more like int { #foo: bar }, which may be more intuitive. Again, many interactions that would have to be considered.

@mpvl
Copy link
Member

mpvl commented Feb 7, 2025

Therefore, I think it is preferable for

#s: f: _
o: #s & {f: g: 1}

to cause an error -- field not allowed.

With _ meaning any and being the top (as CUE holds it) of the lattice, this makes no sense, as anything, including {g: 1} is an instance of "any".

@mpvl
Copy link
Member

mpvl commented Feb 7, 2025

@cuematthew :

We know that j is more specific that i. We also know that k is more specific than j. This is because really, we should spell them as:

Actually they are not: {...⊥} is equivalent to defining all possible optional fields as bottom. By making f a regular field, you make it such that the glb is bottom. IOW, #k and #j are unordered relatively to each other and #k is not more specific than #j in this case.

@mpvl
Copy link
Member

mpvl commented Feb 7, 2025

For completeness of the argument, a regular field is more specific than a required field which is more specific than an optional field.

@mvdan mvdan changed the title Closing ⊤ does not. What?! semantics of closing "top" Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Triage Requires triage/attention
Projects
None yet
Development

No branches or pull requests

3 participants