- Abstract
- Problem
- Background
- Proposal
- Rationale
- Alternatives considered
  - Member packs
  - Single semantic model for pack expansions
  - Generalize `expand`
  - Omit `expand`
  - Support expanding arrays
  - Omit each-names
  - Fold expressions
  - Allow multiple pack expansions in a tuple pattern
  - Allow nested pack expansions
  - Use postfix instead of prefix `...`
  - Avoid context-sensitivity in pack expansions
  - Require parentheses around `each`
  - Fused expansion tokens
  - No parameter merging
  - Exhaustive function call typechecking
## Abstract

Proposes a set of core features for declaring and implementing generic variadic functions.
A "pack expansion" is a syntactic unit beginning with ...
, which is a kind of
compile-time loop over sequences called "packs". Packs are initialized and
referred to using "each-names", which are marked with the each
keyword at the
point of declaration and the point of use.
The syntax and behavior of a pack expansion depends on its context, and in some
cases by a keyword following the ...
:
- In a tuple literal expression (such as a function call argument list), `...` iteratively evaluates its operand expression, and treats the values as successive elements of the tuple. `...and` and `...or` iteratively evaluate a boolean expression, combining the values using `and` and `or`, and ending the loop early if the underlying operator short-circuits.
- In a statement context, `...` iteratively executes a statement.
- In a tuple literal pattern (such as a function parameter list), `...` iteratively matches the elements of the scrutinee tuple. In conjunction with pack bindings, this enables functions to take an arbitrary number of arguments.
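
For example, here is a minimal sketch that touches all three contexts (the `Product` function and its caller are invented for illustration):

```carbon
// Tuple literal *pattern*: `... each factor: i64` iteratively matches
// however many arguments are supplied.
fn Product(... each factor: i64) -> i64 {
  var result: i64 = 1;
  // Statement context: `...` executes the statement once per pack element.
  ... result *= each factor;
  return result;
}

fn ForwardToProduct(... each arg: i64) -> i64 {
  // Tuple literal *expression*: `...` forwards each element of `arg` as a
  // successive argument, followed by a final fixed argument.
  return Product(... each arg, 2);
}
```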
## Problem

Carbon needs a way to define functions and parameterized types that are variadic, meaning they can take a variable number of arguments.
## Background

C has long supported variadic functions through the "varargs" mechanism, but that's heavily disfavored in C++ because it isn't type-safe. Instead, C++ provides a separate feature for defining variadic templates, which can be functions, classes, or even variables. However, variadic templates currently suffer from several shortcomings. Most notably:
- They must be templates, which means they cannot be definition-checked, and suffer from a variety of other costs such as needing to be defined in header files, and code bloat due to template instantiation.
- It is inordinately difficult to define a variadic function whose parameters have a fixed type, and the signature of such a function does not clearly communicate that fixed type to readers.
- The design encourages using recursion rather than iteration to process the elements of a variadic parameter list. This results in more template instantiations, and typically has at least quadratic overhead in the size of the pack (at compile time, and sometimes at run time). In recent versions of C++ it is also possible to iterate over packs procedurally, using a fold expression over the comma operator, but that technique is awkward to use and not widely known.
There have been a number of C++ standard proposals to address some of these issues, and improve variadic templates in other ways, such as P1219R2: Homogeneous variadic function parameters, P1306R1: Expansion Statements, P1858R2: Generalized Pack Declaration and Usage, and P2277R0: Packs Outside of Templates. However, C++ has chosen not to pursue definition-checking even for non-variadic functions, so definition-checked variadics seem out of reach. The most recent proposal to support fixed-type parameter packs was rejected. A proposal to support iterating over parameter packs was inactive for several years, but has very recently been revived.
Swift supports variadic parameters so long as all elements have the same type, and has recently approved SE-0393: Value and Type Parameter Packs, which adds support for definition-checked, heterogeneous variadic parameters (with a disjoint syntax). SE-0404: Pack Iteration, which extends that to support iterating through a variadic parameter list, has been positively received, but hasn't yet been approved.
There have been several attempts to add such a feature to Rust, but that work is currently inactive.
## Proposal

See /docs/design/variadics.md in this pull request. Some representative examples:
```carbon
// Computes the sum of its arguments, which are i64s
fn SumInts(... each param: i64) -> i64 {
  var sum: i64 = 0;
  ... sum += each param;
  return sum;
}

// Concatenates its arguments, which are all convertible to String
fn StrCat[... each T:! ConvertibleToString](... each param: each T) -> String {
  var len: i64 = 0;
  ... len += each param.Length();
  var result: String = "";
  result.Reserve(len);
  ... result.Append(each param.ToString());
  return result;
}

// Returns the minimum of its arguments, which must all have the same type T.
fn Min[T:! Comparable & Value](var result: T, ... each next: T) -> T {
  ... if (each next < result) {
    result = each next;
  }
  return result;
}

// Invokes f, with the tuple `args` as its arguments.
fn Apply[... each T:! type, F:! CallableWith(... each T)]
    (f: F, args: (... each T)) -> auto {
  return f(...expand args);
}

// Takes an arbitrary number of vectors with arbitrary element types, and
// returns a vector of tuples where the i'th element of the vector is
// a tuple of the i'th elements of the input vectors.
fn Zip[... each ElementType:! type]
    (... each vector: Vector(each ElementType))
    -> Vector((... each ElementType)) {
  ... var each iter: auto = each vector.Begin();
  var result: Vector((... each ElementType));
  while (...and each iter != each vector.End()) {
    result.push_back((... each iter));
    ... each iter++;
  }
  return result;
}

// Toy example of mixing variadic and non-variadic parameters.
// Takes an i64, any number of f64s, and then another i64.
fn MiddleVariadic(first: i64, ... each middle: f64, last: i64);

// Toy example of using the result of variadic type deduction.
fn TupleConcat[... each T1:! type, ... each T2:! type](
    t1: (... each T1), t2: (... each T2)) -> (... each T1, ... each T2) {
  return (...expand t1, ...expand t2);
}
```
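
As a rough illustration of the callsites these signatures allow, consider the following sketch (the argument values and the `v1`/`v2` parameters are invented for this example):

```carbon
fn Demo(v1: Vector(i64), v2: Vector(f64)) {
  // `... each param: i64` matches all three arguments.
  var total: i64 = SumInts(1, 2, 3);
  // `T` deduces to i64; `... each next` matches the trailing arguments.
  var least: i64 = Min(3 as i64, 1 as i64, 2 as i64);
  // `each T1` deduces to the element types of t1, and likewise `each T2`.
  var pair: (i64, String) = TupleConcat((42 as i64,), ("hello",));
  // Returns a Vector((i64, f64)) pairing up corresponding elements.
  var zipped: Vector((i64, f64)) = Zip(v1, v2);
}
```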
The following comparison shows selected examples of Carbon variadics against equivalent code written in C++20 (with and without the extensions discussed earlier) and Swift.

**Carbon:**

```carbon
// Computes the sum of its arguments, which are i64s
fn SumInts(... each param: i64) -> i64 {
  var sum: i64 = 0;
  ... sum += each param;
  return sum;
}
```

**C++20:**

```cpp
template <typename... Params>
  requires (std::convertible_to<Params, int64_t> && ...)
int64_t SumInts(const Params&... params) {
  return (static_cast<int64_t>(params) + ... + 0);
}
```

**C++20 with extensions (P1219R2):**

```cpp
int64_t SumInts(int64_t... params) {
  return (static_cast<int64_t>(params) + ... + 0);
}
```

**Swift (no extensions needed):**

```swift
func SumInts(_ params: Int64...) -> Int64 {
  var sum: Int64 = 0
  for param in params {
    sum += param
  }
  return sum
}
```

---

**Carbon:**

```carbon
fn Min[T:! Comparable & Value](first: T, ... each next: T) -> T {
  var result: T = first;
  ... if (each next < result) {
    result = each next;
  }
  return result;
}
```

**C++20:**

```cpp
template <typename T, typename... Params>
  requires std::totally_ordered<T> && std::copyable<T> &&
           (std::same_as<T, Params> && ...)
T Min(T first, Params... rest) {
  if constexpr (sizeof...(rest) == 0) {
    // Base case.
    return first;
  } else {
    T min_rest = Min(rest...);
    if (min_rest < first) {
      return min_rest;
    } else {
      return first;
    }
  }
}
```

**C++20 with extensions (P1219R2 and P1306R2):**

```cpp
template <typename T>
  requires std::totally_ordered<T> && std::copyable<T>
T Min(const T& first, const T&... rest) {
  T result = first;
  template for (const T& t : rest) {
    if (t < result) {
      result = t;
    }
  }
  return result;
}
```

**Swift (no extensions needed):**

```swift
func Min<T: Comparable>(_ first: T, _ rest: T...) -> T {
  var result: T = first
  for t in rest {
    if (t < result) {
      result = t
    }
  }
  return result
}
```

---

**Carbon:**

```carbon
fn StrCat[... each T:! ConvertibleToString](... each param: each T) -> String {
  var len: i64 = 0;
  ... len += each param.Length();
  var result: String = "";
  result.Reserve(len);
  ... result.Append(each param.ToString());
  return result;
}
```

**C++20:**

```cpp
template <ConvertibleToString... Ts>
std::string StrCat(const Ts&... params) {
  std::string result;
  result.reserve((params.Length() + ... + 0));
  (result.append(params.ToString()), ...);
  return result;
}
```

**C++20 with extensions (P1306R2):**

```cpp
template <ConvertibleToString... Ts>
std::string StrCat(const Ts&... params) {
  std::string result;
  result.reserve((params.Length() + ... + 0));
  template for (auto param : params) {
    result.append(param.ToString());
  }
  return result;
}
```

**Swift with extensions (SE-0393 and SE-0404):**

```swift
func StrCat<each T: ConvertibleToString>(_ param: repeat each T) -> String {
  var len: Int64 = 0
  for param in repeat each param {
    len += param.Length()
  }
  var result: String = ""
  result.reserveCapacity(len)
  for param in repeat each param {
    result.append(param.ToString())
  }
  return result
}
```
## Rationale

Carbon needs variadics to effectively support interoperation with and migration from C++, where variadic templates are fairly common. Variadics also make code easier to read, understand, and write, because some APIs (such as `printf`) can't be naturally expressed in terms of a fixed number of parameters.

Furthermore, Carbon needs to support generic variadics for the same reasons it needs to support generic non-variadic functions: for example, definition-checking makes APIs easier to read, understand, and write, and easier to evolve. The language as a whole is also easier to understand and write code in if separate features like variadics and generics compose in natural ways, rather than being mutually exclusive.
Variadics are also important for supporting performance-critical software, because variadic APIs can be more efficient than their non-variadic counterparts. For example, `StrCat` is fundamentally more efficient than something like a chain of `operator+` calls on `std::string`, because it does not need to materialize a series of partial results, and it can pre-allocate a buffer large enough for the final result.
Variadics are also needed to support the principle that all APIs are library APIs, because the library representations of types such as tuples and callables will need to be variadic. This proposal may appear to deviate from that principle in some ways, but that appearance is misleading:
- The design of pack expansion expressions treats the tuple literal syntax as built-in, but this isn't a problem because literal syntaxes are explicitly excluded from the principle.
- The design of pack expansion patterns treats tuple types as built-in. This is arguably consistent with the principle, if we regard a tuple pattern as a kind of tuple literal (note that they have identical syntax). This proposal also revises the text of the principle to make that more explicit.
- Pack types themselves are built-in types, with no library API. However, the principle only applies to first-class types, and pack types are decidedly not first-class: they cannot be function return types, they cannot even be named, and an expression cannot evaluate to a value with a pack type unless it's within a pack expansion and it has compile-time expression phase (and even that narrow exception only exists to make the formalism more convenient).
## Alternatives considered

### Member packs

We could potentially support declaring each-names as class members. However, this raises some novel design issues. In particular, pack bindings currently rely exclusively on type deduction for information like the arity of the pack, but for class members, there usually isn't an initializer available to drive type deduction.
In addition, it's usually if not always possible to work around the lack of member packs by using members with tuple or array types instead. Consequently, this feature is deferred to future work.
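
For instance, here is a hedged sketch of the tuple-member workaround (the `Row` class is invented, and assumes variadic class parameters work analogously to variadic function parameters):

```carbon
// Hypothetical: rather than a member pack, store the sequence in a single
// tuple-typed member. Its arity comes from the class's own parameters, so
// no initializer-driven deduction is needed.
class Row(... each T:! type) {
  var elements: (... each T);
}
```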
### Single semantic model for pack expansions

There's a subtle discrepancy in how this proposal models expression pack expansions: at run time, all pack expansions are modeled as procedural loops that successively evaluate the expansion body for each element of the input pack, and within each iteration, expressions have scalar values. However, in the type system, expressions within a pack expansion are notionally evaluated once, producing a pack value. In effect, this treats pack expansions like SIMD code, with expressions operating on "vectors" of data in parallel, rather than iteratively executing the code on a series of scalar values.
This discrepancy leads to an impedance mismatch where the two models meet. In particular, it leads to the result that expressions within a pack expansion have pack types, but do not evaluate to pack values. This contravenes one of the basic expectations of a type system, that the type of an expression equals (or is at least a supertype of) the type of its value.
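
To make the mismatch concrete, consider this sketch (`Print` is an assumed function; the `«i64; N»` notation for pack types follows the segment notation used later in this proposal):

```carbon
fn PrintAll(... each param: i64) {
  // Run-time model: a loop; in each iteration, `each param` is a scalar i64.
  // Type-system model: checked once, with `each param` given the pack type
  // «i64; N» -- even though no pack value is ever produced at run time.
  ... Print(each param);
}
```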
It's tempting to resolve the inconsistency by applying the parallel model at run time as well as in the type system. However, that isn't feasible, because the parallel model has the same limitation for variadics as it does for SIMD: it can't model branching control flow. For example, consider `(... if (expand cond) then F(expand param) else G(expand param))`: if `expand param` truly evaluated to a pack value, then evaluating this expression would require N calls to both `F` and `G`, rather than N calls to either `F` or `G`. Even for expressions that don't contain control flow, the same problem applies when they occur within a statement pack expansion that does. We can't even statically detect these problems, because a branch could be hidden inside a function call. And this isn't just a performance problem -- if `F` or `G` have side effects, it can also be a correctness problem.
An earlier version of this proposal tried to address this problem in a more limited way by saying that expressions within a pack expansion don't have types at all, but instead have "type packs". This shift in terminology nominally avoids the problem of having expressions that don't evaluate to a value of the expression's type, but it doesn't seem to be very clarifying in practice, and it doesn't address the substance of the problem.
The syntax "...expand
expression" behaves like syntactic sugar for
... each x
, where x
is an invented pack binding in the same scope, defined
as if by "let (... each x: auto) =
expression". We could generalize that by
saying that expand
is a prefix operator with the same precedence as *
that
can be used anywhere in a pack expansion, where "expand
expression" is
syntactic sugar for each x
(with x
defined as before, in the scope
containing the pack expansion). This would make expand
more useful, and also
resolve the anomaly where ...expand
is the only syntax that begins with ...
but is not a pack expansion. It is also a precondition for several of the
alternatives discussed below.
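
Concretely, the desugaring would look like this sketch (`G`, `args`, and `each y` are assumed declarations; `x` stands for the invented binding):

```carbon
// Status quo: `...expand` is only valid as a complete pack expansion.
G(...expand args);
// ...behaves as if written:
let (... each x: auto) = args;
G(... each x);

// Under the generalization (hypothetical), `expand` could appear anywhere
// within an expansion:
... G(expand args, each y);
// ...which would behave as if written:
let (... each x: auto) = args;
... G(each x, each y);
```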
However, those semantics could be very surprising in practice. For example:

```carbon
...if (Condition()) {
  var x: auto = expand F(y);
}
```
In this code, `F(y)` is evaluated before the pack expansion is entered, which means that it is evaluated unconditionally, and it cannot refer to names declared inside the `if` block.

We can avoid the name-resolution issue by disallowing `expand` in statement pack expansions, but the sequencing of evaluation could still be surprising, particularly with `if` expressions.
### Omit `expand`

As noted above, `...expand` is fundamentally syntactic sugar, so we could omit it altogether. This would somewhat simplify the design, and avoid the anomaly of having one syntax that starts with `...` but isn't a pack expansion. However, that would make it substantially less ergonomic to do things like expand a tuple into an argument list, which we expect to be relatively common.
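
For example, without `...expand`, `Apply` would need to name an intermediate pack binding, as in this sketch:

```carbon
// With ...expand (status quo):
fn Apply[... each T:! type, F:! CallableWith(... each T)]
    (f: F, args: (... each T)) -> auto {
  return f(...expand args);
}

// Without it (hypothetical), something like:
fn ApplyWithoutExpand[... each T:! type, F:! CallableWith(... each T)]
    (f: F, args: (... each T)) -> auto {
  let (... each arg: auto) = args;
  return f(... each arg);
}
```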
### Support expanding arrays

Statically-sized arrays are very close to being a special case of tuple types: the only difference between an array type `[i32; 2]` (using Rust syntax) and a tuple type `(i32, i32)` is that the array type can be indexed with a run-time subscript. Consequently, it would be fairly natural to allow `expand` to operate on arrays as well as tuples, and even to allow arrays of types to be treated as tuple types (in the same way that tuples of types can be treated as tuple types).
This functionality is omitted from the current proposal because we have no motivating use cases, but it could be added as an extension. Note that there are important motivating use cases under some of the alternatives considered below.
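
Under such an extension, one might write something like this sketch (hypothetical; uses the Rust-style array type syntax from above):

```carbon
// Hypothetical: `expand` applied to a statically-sized array.
fn Demo(arr: [i64; 3]) -> i64 {
  // Would behave like SumInts(arr[0], arr[1], arr[2]).
  return SumInts(...expand arr);
}
```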
### Omit each-names

Rather than having packs be distinguished by their names, we could instead distinguish them by their types. For example, under the current proposal, the signature of `Zip` is:

```carbon
fn Zip[... each ElementType:! type]
    (... each vector: Vector(each ElementType))
    -> Vector((... each ElementType));
```

With this alternative, it could instead be written:

```carbon
fn Zip[ElementTypes:! [type;]]
    (... vectors: Vector(expand ElementTypes))
    -> Vector((... expand ElementTypes));
```
This employs several features not in the primary proposal:

- In cases where the declared type of the each-name does not vary across iterations (like `ElementType`), we can re-express it as an array binding if `expand` supports arrays, and if `expand` is a stand-alone operator. Note that we only need this in type position of a binding pattern, where we could more easily restrict `expand` to avoid the problems discussed earlier.
- In cases where the declared type of the binding does vary, that fact alone implies that the binding refers to a pack, so we can effectively infer the presence of `each` from the type, rather than make the user spell it out explicitly.
This slight change in syntax belies a much larger shift in the underlying semantics: since these are ordinary bindings, a given call to `Zip` must bind each of them to a single value that represents the whole sequence of arguments (which is why their names are now plural). In the case of `ElementTypes`, that follows straightforwardly from its type: it represents the argument types as an array of `type`s. The situation with `vectors` is more subtle: we have to interpret `Vector(expand ElementTypes)` as the type of the whole sequence of argument values, rather than as a generic description of the type of a single argument. In other words, we have to interpret it as a pack type, and that means `vectors` notionally binds to a run-time pack value.
Consequently, when `vectors` is used in the function body, it doesn't need an `each` prefix: we've chosen to express variadicity in terms of types, and it already has a pack type, so it can be directly used as an expansion site.
This approach has a few advantages:

- We don't have to introduce the potentially-confusing concept of a binding that binds to multiple values simultaneously.
- It avoids the anomaly where we have pack types in the type system, but no actual values of those types.
- Removing the `each` keyword makes it more natural to spell `expand` as a symbolic token (earlier versions of this proposal used `[:]`), which is more concise and doesn't need surrounding whitespace.
- For fully homogeneous variadics (such as `SumInts` and `Min`) it's actually possible to write the function body as an ordinary loop with no variadics, by expressing the signature in terms of a non-pack binding with an array type, as sketched below.
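
For instance, `SumInts` might look like this sketch (hypothetical; the exact spelling of an array-typed variadic signature under this alternative is speculative):

```carbon
// `params` is one ordinary binding with an array type, not a pack, so the
// body can use an ordinary loop instead of pack expansions.
fn SumInts(... params: [i64;]) -> i64 {
  var sum: i64 = 0;
  for (param: i64 in params) {
    sum += param;
  }
  return sum;
}
```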
However, it also has some major disadvantages:

- The implicit expansion of pack-type bindings hurts readability. For example, it's easy to overlook the fact that the loop condition `while (...and expand iters != vectors.End())` in `Zip` has two expansion sites, not just one. This problem is especially acute in cases where a non-local name has a pack type.
- We have to forbid template-dependent names from having pack types (see leads issue #1162), because the possibility that an expression might be an expansion site in some instantiations but not others would cause serious readability and implementability issues.
- A given use of such a binding really represents a single value at a time, in the same way that the iteration variable of a for-each loop does, so giving the binding a plural name and a pack type creates confusion in that context rather than alleviating it.
It's also worth noting that we may eventually want to introduce operations that treat the sequence of bound values as a unit, such as to determine the length of the sequence (like `sizeof...` in C++), or even to index into it. This approach might seem more amenable to that, because it conceptually treats the sequence of values as a value in itself, which could have its own operations. However, this approach leaves no "room" in the syntax to spell those operations, because any mention of a pack-type binding implicitly refers to one of its elements.

Conversely, the status quo proposal seems to leave a clear syntactic opening for those operations: you can refer to the sequence as a whole by omitting `each`, so `each vector.Size()` refers to the size of the current iteration's `vector`, whereas `vector.Size()` could refer to the size of the sequence of bound values. However, this could easily turn out to be a "wrong default": omitting `each` seems easy to do by accident, and easy to misread during code review.
There are other solutions to this problem that work equally well with the status quo or this alternative. In particular, it's already possible to express these operations outside of a pack expansion by converting to a tuple, as in `(... each vector).Size()` (status quo) or `(... vectors).Size()` (this alternative). That may be sufficient to address those use cases, especially if we relax the restrictions on nesting pack expansions. Failing that, variadic-only spellings for these operations (like `sizeof...` in C++) would also work with both approaches. So this issue does not seem like an important differentiator between the two approaches.
As a variant of the above approach, it's possible to omit both each-names and pack-type bindings, and instead rely on variadic tuple-type bindings. For example, the signature of `Zip` could instead be:

```carbon
fn Zip[ElementTypes:! [type;]]
    (... expand vectors: (... Vector(expand ElementTypes)))
    -> Vector((... expand ElementTypes));
```
This signature doesn't change the callsite semantics, but within the function body `vectors` will be a tuple rather than a pack. This avoids or mitigates all of the major disadvantages of pack-type bindings, but it comes at a substantial cost: the function signature is far more complex and opaque. That seems likely to be a bad tradeoff -- the disadvantages of pack-type bindings mostly concern the function body, but readability of variadic function signatures seems much more important than readability of variadic function bodies, because the signatures will be read far more often, and by programmers who have less familiarity with variadics.
This approach requires us to relax the ban on nested pack expansions. This does create some risk of confusion about which pack expansion a given `expand` belongs to, but probably much less than if we allowed unrestricted nesting.
The leads chose not to pursue this approach in leads issue #1162.
### Fold expressions

We could generalize the `...and` and `...or` syntax to support a wider variety of binary operators, and to permit specifying an initial value for the chain of binary operators, as with C++'s fold expressions. This would be more consistent with C++, and would give users more control over associativity and over the behavior of the arity-zero case.
However, fold expressions are arguably too general in some respects: folding over a non-commutative operator like `-` is more likely to be confusing than to be useful. Similarly, there are few if any plausible use cases for customizing the arity-zero behavior of `and` or `or`. Conversely, fold expressions are arguably not general enough in other respects, because they only support folding over a fixed set of operators, not over functions or compound expressions.

Furthermore, in order to support folds over operator tokens that can be either binary or prefix-unary (such as `*`), we would need to choose a different syntax for tuple element lists. Otherwise, `...*each foo` would be ambiguous between `*foo[:0:], *foo[:1:],` etc. and `foo[:0:] * foo[:1:] *` etc.
Note that even if Carbon supported more general C++-like fold expressions, we would still probably have to give `and` and `or` special-case treatment, because they are short-circuiting.

As a point of comparison, C++ fold expressions give special-case treatment to the same two operators, along with `,`: they are the only ones where the initial value can be omitted (such as `... && args` rather than `true && ... && args`) even if the pack may be empty. Furthermore, folding over `&&` appears to have been the original motivation for adding fold expressions to C++; it's not clear if there are important motivating use cases for the other operators.
Given that we are only supporting a minimal set of operators, allowing `...` to occur in ordinary binary syntax has few advantages and several drawbacks:

- It might conflict with a future general fold facility.
- It would invite users to try other operators, and would probably give less clear errors if they do.
- It would substantially complicate parsing and the AST.
- It would force users to make a meaningless choice between `x or ...` and `... or x`, and likewise for `and`.
See also the discussion below of using `...,` and `...;` in place of the tuple and statement forms of `...`. This is inspired by fold expressions, but distinct from them, because `,` and `;` are not truly binary operators, and it's targeting a different problem.
### Allow multiple pack expansions in a tuple pattern

As currently proposed, we allow multiple `...` expressions within a tuple literal expression, but only allow one `...` pattern within a tuple pattern. It is superficially tempting to relax this restriction, but fundamentally infeasible.
Allowing multiple `...` patterns would create a potential for ambiguity about where their scrutinees begin and end. For example, given a signature like `fn F(... each xs: i32, ... each ys: i32)`, there is no way to tell where `xs` ends and `ys` begins in the argument list; every choice is equally valid. That ambiguity can be avoided if the types are different, but that would make type non-equality a load-bearing part of the pattern. That's a very unusual thing to need to reason about in the type system, so it's liable to be a source of surprise and confusion for programmers, and in particular it looks difficult if not impossible to usefully express with generic types, which would greatly limit the usefulness of such a feature.
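
For example, given such a signature, every split of the argument list is equally consistent:

```carbon
fn F(... each xs: i32, ... each ys: i32);

fn Demo() {
  // All four interpretations are equally valid:
  //   xs = (),        ys = (1, 2, 3)
  //   xs = (1,),      ys = (2, 3)
  //   xs = (1, 2),    ys = (3,)
  //   xs = (1, 2, 3), ys = ()
  F(1, 2, 3);
}
```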
Function authors can straightforwardly work around this restriction by adding delimiters. For example, the current design disallows `fn F(... each xs: i32, ... each ys: i32)`, but it allows `fn F((... each xs: i32), (... each ys: i32))`, which is not only easier to support, but makes the callsite safer and more readable, since the boundary between the `xs` and `ys` arguments is explicitly marked. By contrast, if we disallowed multiple `...` expressions in a function argument list, function callers who ran into that restriction would often find it difficult or impossible to work around. Note, however, that this workaround presupposes that function signatures can have bindings below top-level, which is currently undecided.
To take a more abstract view of this situation: when we reuse expression syntax as pattern syntax, we are effectively inverting expression evaluation, by asking the language to find the operands that would cause an expression to evaluate to a given value. That's only possible if the operations involved are invertible, meaning that they do not lose information. When a tuple literal contains multiple `...` expressions, evaluating it effectively discards structural information about, for example, where `xs` ends and `ys` begins. The operation of forming a tuple from multiple packs is not invertible, and consequently we cannot use it as a pattern operation. Our rule effectively says that if the function needs that structural information, it must ask the caller to provide it, rather than asking the compiler to infer it.
### Allow nested pack expansions

Earlier versions of this design allowed pack expansions to contain other pack expansions. This is in some ways a natural generalization, but it added nontrivial complexity to the design. In particular, when an each-name is lexically within two or more pack expansions, we need a rule for determining which pack expansion iterates over it, in a way that is unsurprising and supports the intended use cases. However, we have few if any motivating use cases for it, which made it difficult to evaluate that aspect of the design. Consequently, this proposal does not support nested pack expansions, although it tries to avoid ruling them out as a future extension.
### Use postfix instead of prefix `...`

`...` is a postfix operator in C++, which aligns with the natural-language use of "…", so it would be more consistent with both if `...`, `...and`, and `...or` were postfix operators spelled `...`, `and...`, and `or...`, and likewise if statement pack expansions were marked by a `...` at the end rather than the beginning.
However, prefix syntaxes are usually easier to parse (particularly for humans), because they ensure that by the time you start parsing an utterance, you already know the context in which it is used. This is clearest in the case of statements: the reader might have to read an arbitrary amount of code in the block before realizing that the code they've been reading will be executed variadically, so that seems out of the question. The cases of `and`, `or`, and `,` are less clear-cut, but we have chosen to make them all prefix operators for consistency with statements.
This proposal "overloads" the ...
token with multiple different meanings
(including different precedences), and the meaning depends in part on the
surrounding context, despite Carbon's principle of
avoiding context-sensitivity.
We could instead represent the different meanings using separate syntaxes.
There are several variants of this approach, but they all have substantial drawbacks (see the following subsections). Furthermore, the problems associated with context-sensitivity appear to be fairly mild in this case: the difference between a tuple literal context and a statement context is usually quite local, and is usually so fundamental that confusion seems unlikely.
We could use a modifier after `...` to select the expansion's meaning (as we already do with `and` and `or`). In particular, we could write `...,` to iteratively form elements of a tuple, and write `...;` to iteratively execute a statement. This avoids context-sensitivity (apart from `...,` having a dual role in expressions and patterns, like many other syntaxes), and has an underlying unity: `...,`, `...;`, `...and`, and `...or` represent "folds" over the `,`, `;`, `and`, and `or` tokens, respectively. As a side benefit, this would preserve the property that a tuple literal always contains a `,` character (unlike the current proposal).
However, this approach has major readability problems. Using `...;` as a prefix operator is completely at odds with the fact that `;` marks the end of a statement, not the beginning. Furthermore, it would probably be surprising to use `...;` in contexts where `;` is not needed, because the end of the statement is marked with `}`.
The problems with `...,` are less severe, but still substantial. In this syntax, `,` does not behave like a separator, but our eyes are trained to read it as one, and that habit is difficult to unlearn. For example, most readers have found that they can't help automatically reading `(..., each x)` as having two sub-expressions, `...` and `each x`. This effect is particularly disruptive when skimming a larger body of code, such as:

```carbon
fn TupleConcat[..., each T1:! type, ..., each T2:! type](
    t1: (..., each T1), t2: (..., each T2)) -> (..., each T1, ..., each T2) {
  return (..., expand t1, ..., expand t2);
}
```
We could replace the statement form of `...` with a variadic block syntax such as `...{ }`. However, this doesn't give us an alternative for the tuple form of `...`, and yet heightens the problems with it: `...{` could read as applying the `...` operator to a struct literal.

Furthermore, it gives us no way to variadically declare a variable that's visible outside the expansion (such as `each iter` in the `Zip` example). This can be worked around by declaring those variables as tuples, but this adds unnecessary complexity to the code.
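
A sketch of that workaround, reusing the `Zip` body from earlier (hypothetical, under the `...{ }` alternative):

```carbon
// Instead of variadically declaring `each iter` (which a `...{ }` block
// could not make visible outside itself), declare one tuple of iterators:
var iters: auto = (... each vector.Begin());
```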
We could drop `...` altogether, and use a separate keyword for each kind of pack expansion. For example, we could use `repeat` for variadic lists of tuple elements, `do_repeat` for variadic statements, and `all_of` and `any_of` in place of `...and` and `...or`. This leads to code like:

```carbon
// Takes an arbitrary number of vectors with arbitrary element types, and
// returns a vector of tuples where the i'th element of the vector is
// a tuple of the i'th elements of the input vectors.
fn Zip[repeat each ElementType:! type]
    (repeat each vector: Vector(each ElementType))
    -> Vector((repeat each ElementType)) {
  do_repeat var each iter: auto = each vector.Begin();
  var result: Vector((repeat each ElementType));
  while (all_of each iter != each vector.End()) {
    result.push_back((repeat each iter));
    repeat each iter++;
  }
  return result;
}
```
This approach is heavily influenced by Swift variadics, but not quite the same. It has some major advantages: the keywords are more consistent with `each` (and `expand` to some extent), substantially less visually noisy than `...`, and they may also be more self-explanatory. However, it does have some substantial drawbacks.
Most notably, there is no longer any syntactic commonality between the different tokens that mark the root of an expansion. That makes it harder to visually identify expansions, and could also make variadics harder to learn, because the spelling does not act as a mnemonic cue. And while it's already not ideal that under the primary proposal a tuple literal is identified by the presence of either `,` or `...`, it seems even worse if one of those two tokens is instead a keyword.
Relatedly, the keywords have less clear precedence relationships, because `all_of` and `any_of` can't as easily "borrow" their precedence from their non-variadic counterparts. For example, consider this line from `Zip`:

```carbon
while (...and each iter != each vector.End()) {
```

Under this alternative, that becomes:

```carbon
while (all_of each iter != each vector.End()) {
```

I find the precedence relationships in the initial `all_of each iter !=` more opaque than in `...and each iter !=`, to the extent that we might need to require additional parentheses:

```carbon
while (all_of (each iter != each vector.End())) {
```

That avoids outright ambiguity, but obliging readers to maintain a mental stack of parentheses in order to parse the expression creates its own readability problems.
It's appealing that the `repeat` keyword combines with `each` to produce code that's almost readable as English, but it creates a temptation to read `expand` the same way, which will usually be misleading. For example, `repeat expand foo` sounds like it is repeatedly expanding `foo`, but in fact it expands it only once. It's possible that a different spelling of `expand` could avoid that problem, but I haven't been able to find one that does so while also avoiding the potential for confusion with `each`. This is somewhat mitigated by the fact that `expand` expressions are likely to be rare.
It's somewhat awkward, and potentially even confusing, to use an imperative word like `repeat` in a pattern context. By design, the pattern language is descriptive rather than imperative: it describes the values that match rather than giving instructions for how to match them. As a result, in a pattern like `(repeat each param: i64)`, it's not clear what action is being repeated.
Finally, it bears mentioning that the keywords occupy lexical space that could otherwise be used for identifiers. Notably, `all_of`, `any_of`, and `repeat` are all names of functions in the C++ standard library. This is not a fundamental problem, because we expect Carbon to have some way of "quoting" a keyword for use as an identifier (such as Rust's raw identifiers), but it is likely to be a source of friction.
### Require parentheses around `each`

We could give `each` a lower precedence, so that expressions such as `each vector.End()` would need to be written as `(each vector).End()`. This could make the code clearer for readers, especially if they are new to Carbon variadics. However, this would make the code visually busier, and might give the misleading impression that `each` can be applied to anything other than an identifier. I propose that we wait and see whether the unparenthesized syntax has readability problems in practice, before attempting to solve those problems.
We have discussed a more general solution to this kind of problem, where a prefix operator could be embedded in a `->` token, in order to apply the prefix operator to the left-hand operand without needing parentheses. However, this approach is much more appealing when the prefix operator is a symbolic token: `x-?>y` may be a plausible alternative to `(?x).y`, but `x-each>y` seems much harder to visually parse. Furthermore, this approach is hard to reconcile with treating `each` as fundamentally part of the name, rather than an operator applied to the name.
### Fused expansion tokens

Instead of treating `...and` and `...or` as two tokens with whitespace discouraged between them, we could treat them as single tokens. This might more accurately reflect the fact that they are semantically different operations than `...`, and reduce the potential for readability problems in code that doesn't follow our recommended whitespace conventions. However, that could lead to a worse user experience if users accidentally insert a space after the `...`.
### No parameter merging

Under the current proposal, the compiler attempts to merge function parameters in order to support use cases like this one, where merging the parameters of `Min` enables us to pair each argument with a single logical parameter that will match it:

```carbon
fn Min[T:! type](first: T, ... each next: T) -> T;

fn F(... each arg: i32) {
  Min(... each arg, 0 as i32);
}
```
However, this approach makes typechecking hard to understand (and predict), because the complex conditions governing merging mean that subtle differences in the code can cause dramatic differences in the semantics. For example:

```carbon
fn F[A:! I, ... each B:! I](a: A, ... each b: each B);
fn G[A:! I, ... each B:! I](a: A, ... each b: each B) -> A;
```
These two function signatures are identical other than their return types, but they actually have different requirements on their arguments: `G` requires the first argument to be singular, whereas `F` only requires some argument to be singular. It seems likely to be hard to teach programmers that the function's return type sometimes affects whether a given argument list is valid. Relatedly, it's hard to see how a diagnostic could concisely explain why a given call to `G` is invalid, in a way that doesn't seem to also apply to `F`.
We could solve that problem by omitting parameter merging, and interpreting all of the above signatures as requiring that the first argument must be singular, because the first parameter is singular. Thus, there would be a clear and predictable connection between the parameter list and the requirements on the argument list.
In order to support use cases like `Min` where the author doesn't intend to impose such a requirement, we would need to provide some syntax for declaring `Min` so that it has a single parameter, but can't be called with no arguments. More generally, this syntax would probably need to support setting an arbitrary minimum number of arguments, not just 1. For example, an earlier version of this proposal used `each(>=N)` to require that a parameter match at least `N` arguments, so `Min` could be written like this:

```carbon
fn Min[T:! type](... each(>=1) param: T) -> T;
```
However, this alternative has several drawbacks:

- We haven't been able to find a satisfactory arity-constraint syntax. In addition to its aesthetic problems, `each(>=1) param` disrupts the mental model where `each` is part of the name, and it's conceptually awkward because the constraint actually applies to the pack expansion as a whole, not to the each-name in particular. However, it's even harder to find an arity-constraint syntax that could attach to `...` without creating ambiguity. Furthermore, any arity-constraint syntax would be an additional syntax that users need to learn, and an additional choice they need to make when writing a function signature.
- Ideally, generic code should typecheck if every possible monomorphization of it would typecheck. This alternative does not live up to that principle -- see, for example, the above example of `Min`. The current design does not fully achieve that aspiration either, but it's far more difficult to find plausible examples where it fails.
- The first/rest style will probably be more natural to programmers coming from C++, and if they define APIs in that style, there isn't any plausible way for them to find out that they're imposing an unwanted constraint on callers, until someone actually tries to make a call with the wrong shape.
### Exhaustive function call typechecking

The current proposal uses merging and splitting to try to align the argument and parameter lists so that each argument has exactly one parameter that can match it. We also plan to extend this design to try the opposite approach as well, aligning them so that each parameter has exactly one argument that it can match. However, it isn't always possible to align arguments and parameters in that way. For example:

```carbon
fn F[... each T:! type](x: i32, ... each y: each T);

fn G(... each z: i32) {
  F(... each z, 0 as i16);
}
```
Every possible monomorphization of this code would typecheck, but we can't merge the parameters because they have different types, and we can't merge the arguments for the same reason. We also can't split the variadic parameter or the variadic argument, because either of them could be empty.
The fundamental problem is that, although every possible monomorphization typechecks, some monomorphizations are structurally different from others. For example, if `each z` is empty, the monomorphized code converts `0 as i16` to `i32`, but otherwise `0 as i16` is passed into `F` unmodified.
We could support such use cases by determining which parameters can potentially match which arguments, and then typechecking each pair. For example, we could typecheck the above code by cases:

- If `each z` is empty, `x: i32` matches `0 as i16` (which typechecks because `i16` is convertible to `i32`), and `each y: each T` matches nothing.
- If `each z` is not empty, `x: i32` matches its first element (which typechecks because `i32` is convertible to `i32`), and `each y: each T` matches the remaining elements of `each z`, followed by `0 as i16` (which typechecks by binding `each T` to `⟬«i32; ‖each z‖-1», i16⟭`).
More generally, this approach works by identifying all of the structurally different ways that arguments could match parameters, typechecking them all in parallel, and then combining the results with logical "and".
However, the number of such cases (and hence the cost of typechecking) grows quadratically, because the number of cases grows with the number of parameters, and the case analysis has to be repeated for each variadic argument. Fast development cycles are a priority for Carbon, so if at all possible we want to avoid situations where compilation costs grow faster than linearly with the amount of code.
Furthermore, typechecking a function call doesn't merely need to output a boolean decision about whether the code typechecks. In order to typecheck the code that uses the call, and support subsequent phases of compilation, it needs to also output the type of the call expression, and that can depend on the values of deduced parameters of the function.
These more complex outputs make it much harder to combine the results of typechecking the separate cases. To do this in a general way, we would need to incorporate some form of case branching directly into the type system. For example:
```carbon
fn P[T:! I, ... each U:! J](t: T, ... each u: each U) -> T;

fn Q[X:! I&J, ... each Y:! I&J](x: X, ... each y: each Y) -> auto {
  return P(... each y, x);
}

fn R[A:! I&J, ... each B:! I&J](a: A, ... each b: each B) {
  Q(... each b, a);
}
```
The typechecker would need to represent the type of `P(... each y, x)` as something like `(... each Y, X).0`. That subscript `.0` acts as a disguised form of case branching, because now any subsequent code that depends on `P(... each y, x)` needs to be typechecked separately for the cases where `... each Y` is and is not empty. In this case, that even leaks back into the caller `R` through `Q`'s return type, which compounds the complexity: the type of `Q(... each b, a)` would need to be something like `((... each B, A).(1..‖each B‖), (... each B, A).0).0` (where `.(M..N)` is a hypothetical tuple slice notation).
All of this may be feasible, but the cost in type system complexity and performance would be daunting, and the benefits are at best unclear, because we have not yet found plausible motivating use cases that benefit from this kind of typechecking.