Add unstable frontmatter support #137193

epage · 2025-02-17T22:27:45Z

This is an implementation for #136889

This strips the frontmatter, like shebang, because nothing within rustc will be using this. rustfmt folks have said this should be good enough for now as rustfmt should successfully work with this (ie not remove it) even though proper formatting would require AST support. I had considered making this part of lexing of tokens but that led to problems with ambiguous tokens. See also zulip.

This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.

This includes supporting the syntax in

the parser
pretty printing
rustfmt (nop, just needed a test)

This leaves to the future

r-a support (called out in the tracking issue)

rustbot · 2025-02-17T22:27:52Z

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

epage · 2025-02-17T22:33:11Z

compiler/rustc_parse/src/lexer/mod.rs

@@ -55,6 +55,16 @@ pub(crate) fn lex_token_trees<'psess, 'src>(
        start_pos = start_pos + BytePos::from_usize(shebang_len);
    }

+    // Skip frontmatter, if present.
+    if let Some(frontmatter_len) = rustc_lexer::strip_frontmatter(src) {


Other places we call strip_shebang that I have not (yet?) updated

src/tools/rust-analyzer/crates/parser/src/lexed_str.rs

compiler/rustc_ast_pretty/src/pprust/state.rs

This is in part because I'm not too sure what the appropriate way of doing it is.

Other places we call strip_shebang that I have not (yet?) updated [emphasis: mine]

Both of these aren't blocking this PR, that's for sure.

compiler/rustc_ast_pretty/src/pprust/state.rs

Disclaimer: I'm not 100% sure what would happen if you don't strip the frontmatter (e.g., whether it can lead to ICEs or only butchered output) but we should definitely do so (at some point) because otherwise the AST pretty-printer would reinterpret the frontmatter as Rust code when collecting Rust source code comments (which it tries to "reattach" later on, maybe? I'm not quite sure).

I mean you could simply check what happens under -Zunpretty=normal or -Zunpretty=expanded given a frontmatter that contains e.g., // ….

src/tools/rust-analyzer/crates/parser/src/lexed_str.rs

Note: I don't speak for the rust-analyzer team. However, I know that they prefer that one doesn't modify rust-analyzer from within this repository (via). So I recommend you to either open an issue or a PR over at rust-lang/rust-analyzer for frontmatter support.

src/tools/rust-analyzer/crates/parser/src/lexed_str.rs

I've made a separate line item on the tracking issue for r-a's implementation

compiler/rustc_ast_pretty/src/pprust/state.rs

Knowing what to look for in tests, I've gone ahead and add this to this PR, updating the initial test commit and adding a commit specifically for pretty support.

ytmimi · 2025-02-18T04:23:30Z

Might be worth adding a test case to src/tools/rustfmt/tests/target just to show that rustfmt won't strip the frontmatter.

rustbot · 2025-02-18T16:59:29Z

Some changes occurred in src/tools/rustfmt

cc @rust-lang/rustfmt

compiler/rustc_parse/src/lexer/mod.rs

@@ -55,6 +55,16 @@ pub(crate) fn lex_token_trees<'psess, 'src>(
        start_pos = start_pos + BytePos::from_usize(shebang_len);
    }

+    // Skip frontmatter, if present.
+    if let Some(frontmatter_len) = rustc_lexer::strip_frontmatter(src) {


fee1-dead · 2025-02-22T03:42:37Z

compiler/rustc_lexer/src/lib.rs

+/// Frontmatter is a special attribute type reserved for use by external tools
+///
+/// This must be called after [`strip_shebang`]
+pub fn strip_frontmatter(input: &str) -> Option<usize> {


Instead of stripping the frontmatter and returning an Option (None when the frontmatter is invalid), we should make a best effort to parse it first. For example, we should provide a customized error message if we detect the start of frontmatter but it was never properly closed.

I spoke to this in the PR description:

This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.

Frontmatter is an attribute for calling tools to consume. This means that if the frontmatter is invalid, they will report the error before rustc ever sees it. Rather than investing a lot of effort in quality error messages here that almost no one will see, I figured it would be better to put that focus in Cargo where they will be seen.

I have started a discussion on zulip about making cargo's code reusable. In theory, we could share between rustc and Cargo. However, I worry about the release logistics involved to keep us in-sync.

What I felt was important is that rustc at least error in the different cases, even if the message isn't the greatest. We can then use the tests verifying that to track any improvements to the error and prevent accidentally making them success cases without further work (which I'm assuming is unlikely to happen).

Could you speak to your concerns with that plan or why better error reporting is a blocker for this PR?

@fee1-dead from #137193 (review)

So the way I currently see this, is that there is a natural way of splitting these two. I think cargo should only parse valid frontmatters (in terms of the fence syntax) and should provide detailed errors that are invalid within a frontmatter (unknown fields and etc.)

For rustc, because we do recognize frontmatters and attempt to strip it, we should tell users what went wrong. For example, if there are only two -s, then that's an invalid frontmatter start and we should tell them that. If the infostring is not an identifier, we should tell the users that. If the frontmatter does not have a valid close, then we should tell the users that.

...

I'm not sure how you currently see this responsibility being split/duplicated across these two projects, but it is slightly irresponsible to me to just assume that because cargo will do the rest of the job instead, we land this code which doesn't properly consider errors. cargo isn't the primary Rust code parser, so it shouldn't be the one responsible for parser errors. If it can't parse the front matter, it should emit an error, give the rest to rustc and let rustc report good errors.

So, I'm opposed to landing this in this way. Please take some time to work on good errors, which will not be wasted. (if we do decide to put those in cargo instead, you can pretty much still reuse the code) I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard. Feel free to comment here or chat me on Zulip if you disagree.

(moving back to the original thread for continity

So if I'm understanding correctly, you are suggesting to flip this and have cargo ignore the frontmatter syntax if its invalid, and have rustc do the full error report.

My concern is that having rustc report the error is too late in the process. For example, cargo has many commands that operate on manifests-only (e.g. cargo metadata) and that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error. This experience could be even worse for tools like cargo add. We'd need to at least implement go a step further and at least report there was an error (and non-mutating commands could turn that into None so rustc's error isn't masked).

if we do decide to put those in cargo instead, you can pretty much still reuse the code

Pulling in rustc-lexer would be complicated. We publish cargo and would need something that can be published. We'd need to work with T-release to deal with weird ping ponging between our repos as we transition from development to release. We'd also need to work through how to integrate specific assumptions rustc makes to something that cargo can use. This is all then limited to just cargo and not other consumers. We could copy and paste the code but that also doesn't buy us much.

that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error

That's also true. But I don't want people to call rustc and get confusing errors. Perhaps that can be figured out later in the process.

fee1-dead · 2025-02-27T04:40:40Z

compiler/rustc_lexer/src/lib.rs

+        return None;
+    }
+    let (fence_pattern, rest) = rest.split_at(fence_end);
+    let (info, rest) = rest.split_once("\n").unwrap_or((rest, ""));


How is a frontmatter like ---cargo--- going to be parsed? or ---cargo ---?

It looks like they aren't considered to be valid. Consider making that a bit clearer (maybe here? to say that if there is no newline it will be invalid) and add tests

Everything after the 3+ - is the infostring, including any dashes that appear after something else. We check if its an identifier on the following lines which will reject that. We have a test case for invalid identifiers.

Or is the concern more with

strip_frontmatter("---")

(ie a test case where there is no newline after the opening)

compiler/rustc_lexer/src/lib.rs

bors · 2025-03-04T03:15:21Z

☔ The latest upstream changes (presumably #137927) made this pull request unmergeable. Please resolve the merge conflicts.

fee1-dead

So the way I currently see this, is that there is a natural way of splitting these two. I think cargo should only parse valid frontmatters (in terms of the fence syntax) and should provide detailed errors that are invalid within a frontmatter (unknown fields and etc.)

For rustc, because we do recognize frontmatters and attempt to strip it, we should tell users what went wrong. For example, if there are only two -s, then that's an invalid frontmatter start and we should tell them that. If the infostring is not an identifier, we should tell the users that. If the frontmatter does not have a valid close, then we should tell the users that.

Furthermore, we should warn if we are not called from cargo, which should acknowledge to us (probably via some flag) that the frontmatter has been parsed correctly.

I'm not sure how you currently see this responsibility being split/duplicated across these two projects, but it is slightly irresponsible to me to just assume that because cargo will do the rest of the job instead, we land this code which doesn't properly consider errors. cargo isn't the primary Rust code parser, so it shouldn't be the one responsible for parser errors. If it can't parse the front matter, it should emit an error, give the rest to rustc and let rustc report good errors.

So, I'm opposed to landing this in this way. Please take some time to work on good errors, which will not be wasted. (if we do decide to put those in cargo instead, you can pretty much still reuse the code) I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard. Feel free to comment here or chat me on Zulip if you disagree.

epage · 2025-03-06T15:21:20Z

So, I'm opposed to landing this in this way.

This is being added as an unstable feature. Does merging of the current implementation need to be blocked on those, e.g. negatively affecting stable code in some way that I missed?

Leaving this as something to resolve before stabilization on the tracking issue means we unblock me from updating rust-analyzer, checking some more cargo fmt and cargo doc cases, updating cargo to use this, etc. Those and continuing to discuss this can happen in parallel. Otherwise, I fear how long this will drag on as we work through rustc and cargo's requirements or re-discuss topics from the RFC while having multi-day lag between responses, after which I'll be able to start on those other tasks.

epage · 2025-03-06T15:31:55Z

compiler/rustc_parse/src/lexer/mod.rs

+        start_pos = start_pos + BytePos::from_usize(frontmatter_len);
+        let hi = start_pos;
+        let span = Span::with_root_ctxt(lo, hi);
+        psess.gated_spans.gate(sym::frontmatter, span);


@fee1-dead from #137193 (review)

Furthermore, we should warn if we are not called from cargo, which should acknowledge to us (probably via some flag) that the frontmatter has been parsed correctly.

(moving to a thread to better follow the discussion)

Through the RFC process, T-lang insisted this not be Cargo specific, see RFC language.

That still leaves room for the calling tool to be able to register their frontmatter infostring so rustc could warn otherwise. However, we'd first need to determine if this is needed, if there are other problems with doing this, what other cases to handle, etc. For example, if a tool registers an infostring, should that only be for the root module and disallowed otherwise? The problem with that is that this was designed around allowing people to turn random files into scripts, so they need to be allowed further in. So maybe we validate those are from the calling tool. Except do they need to be? So that then leaves we validate the tool's infostring against the actual infostring in the root module. Except any deviation from a sub-module to the root module can happen just as well with the root module because you could have a non-script have a frontmatter for use as a script in whatever other tool you want.

epage · 2025-03-06T16:09:48Z

I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard

Sorry, I overlooked this before.

We have a system in place for this kind of stuff: tracking issues, stabilization reports, and fcp concerns. I have added the error handling to #136889 as an unresolved question which will block stabilization.

fee1-dead

I've taken a closer look at the implementation given the latest comments. Had some nits and I'd be fine with landing this temporarily after the suggestions are applied.

fee1-dead · 2025-03-08T08:53:51Z

compiler/rustc_lexer/src/lib.rs

+/// Frontmatter is a special attribute type reserved for use by external tools
+///
+/// This must be called after [`strip_shebang`]
+pub fn strip_frontmatter(input: &str) -> Option<usize> {


that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error

That's also true. But I don't want people to call rustc and get confusing errors. Perhaps that can be figured out later in the process.

fee1-dead · 2025-03-08T08:59:31Z

compiler/rustc_lexer/src/lib.rs

+    while !rest.is_empty() {
+        let without_spaces = rest.trim_start_matches(|c| is_whitespace(c) && c != '\n');
+        let without_nl = without_spaces.trim_start_matches('\n');
+        if without_nl.len() == rest.len() {
+            // nothing trimmed
+            break;
+        } else if without_nl.len() == without_spaces.len() {
+            // either not a frontmatter or invalid opening
+            return None;
+        }
+        rest = without_nl;
+    }


Probably better here to just trim all the whitespace up front, then either the length is unchanged (starts with no whitespace) or the last trimmed character should be a newline.

Gave that a spin. You can see the squashed change in https://github.com/rust-lang/rust/compare/2501d454739ef26ef52e213fe5b12e94fde3f48c..1e418eeb4c3b0fdf35889295a8036daa4c084ca6

compiler/rustc_lexer/src/lib.rs

fee1-dead · 2025-03-08T09:01:50Z

compiler/rustc_lexer/src/lib.rs

+    // Opens with a line that starts with 3+ `-` followed by an optional identifier
+    const FENCE_CHAR: char = '-';
+    let fence_end =
+        rest.char_indices().find_map(|(i, c)| (c != FENCE_CHAR).then_some(i)).unwrap_or(rest.len());


there's no need to use unwrap_or(rest.len()); here. If the fence does not end, we return None.

With the thought to use ?? We discussed that previously at #137193 (comment)

compiler/rustc_lexer/src/lib.rs

fee1-dead · 2025-03-08T09:05:24Z

compiler/rustc_lexer/src/lib.rs

+    let (fence_pattern, rest) = rest.split_at(fence_end);
+    let (info, rest) = rest.split_once("\n").unwrap_or((rest, ""));
+    let info = info.trim_matches(is_whitespace);
+    if !info.is_empty() && !is_ident(info) {


Suggested change

if !info.is_empty() && !is_ident(info) {

// Either there is no infostring, or the infostring should be a valid indent.

if !(info.is_empty() || is_ident(info)) {

In terms of comments mapping this back to the textual description of this, that is handled an earlier comment. This then ends up just duplicating the logic that immediately follows and I'm not seeing the value of it.

I'm also not sure why the emphasis on pulling out the ! . For me, I'm not seeing the value.

Went ahead and made a slight tweak to the inner comment. Squashed it but its visible in https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c

fee1-dead · 2025-03-08T09:08:44Z

compiler/rustc_lexer/src/lib.rs

+    let rest = if let Some(rest) = rest.strip_prefix(fence_pattern) {
+        rest
+    } else {
+        let nl_fence_pattern = format!("\n{fence_pattern}");
+        let Some(frontmatter_nl) = rest.find(&nl_fence_pattern) else {
+            // frontmatter close is required
+            return None;
+        };
+        &rest[frontmatter_nl + nl_fence_pattern.len()..]
+    };


It might improve the flow here if you use .find('\n') and include the newline in rest, then you just need one single call of .find(&nl_fence_pattern).

I think I get what you were suggesting. Squashed it; see it at https://github.com/rust-lang/rust/compare/1e418eeb4c3b0fdf35889295a8036daa4c084ca6..078a6f68e170033250888604caca3b5ea5a3d4b7

fee1-dead · 2025-03-08T09:08:59Z

compiler/rustc_lexer/src/lib.rs

+    };
+
+    let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));
+    let line = line.trim_matches(is_whitespace);


Suggested change

let line = line.trim_matches(is_whitespace);

let line = line.trim();

(being discussed at #137193 (comment))

fee1-dead · 2025-03-08T09:09:36Z

compiler/rustc_lexer/src/lib.rs

+        &rest[frontmatter_nl + nl_fence_pattern.len()..]
+    };
+
+    let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));


Suggested change

let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));

let (closing_fence_remainder, rest) = rest.split_once("\n").unwrap_or((rest, ""));

For some reason, closing_fence_remainder felt a little off to me. I ended up settling on after_closing_fence. Does that work for you?

Squashed it, see https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c

fee1-dead · 2025-03-08T09:10:05Z

compiler/rustc_lexer/src/lib.rs

+    let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));
+    let line = line.trim_matches(is_whitespace);
+    if !line.is_empty() {
+        // invalid close, even if there are extra `-`s


Suggested change

// invalid close, even if there are extra `-`s

// invalid close if there are extra characters beyond the original fence pattern

extra '-'s is the key part I am trying to emphasize because that is a case that can easily be overlooked, by implementers or users.

Went with a variant that preserves that information. Squashed it, see https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c

rustbot assigned fee1-dead Feb 17, 2025

epage changed the title ~~Add u~~ Add unstable frontmatter support Feb 17, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 17, 2025

epage commented Feb 17, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

epage force-pushed the frontmatter branch from 8d4336e to a5215cf Compare February 18, 2025 15:15

This comment has been minimized.

Sign in to view

epage force-pushed the frontmatter branch from 6c26960 to 06f44ba Compare February 18, 2025 17:09

fmease added the F-frontmatter `#![feature(frontmatter)]` label Feb 19, 2025

fee1-dead requested changes Feb 19, 2025

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2025

epage mentioned this pull request Feb 20, 2025

Tracking Issue for frontmatter #136889

Open

6 tasks

epage force-pushed the frontmatter branch 3 times, most recently from ed5aad9 to 64cecab Compare February 20, 2025 16:03

fee1-dead reviewed Feb 22, 2025

View reviewed changes

fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2025

fee1-dead requested changes Feb 27, 2025

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 27, 2025

fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 28, 2025

epage force-pushed the frontmatter branch from 64cecab to afc14be Compare March 4, 2025 13:34

m4rch3n1ng mentioned this pull request Mar 5, 2025

support frontmatter syntax tree-sitter/tree-sitter-rust#264

Open

fee1-dead requested changes Mar 6, 2025

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 6, 2025

epage commented Mar 6, 2025

View reviewed changes

fee1-dead requested changes Mar 8, 2025

View reviewed changes

epage added 7 commits March 10, 2025 13:46

test: Show frontmatter behavior before impl

76856ca

feat: Add 'frontmatter' unstable feature

58be3dd

feat(lexer): Add frontmatter stripping

be046dc

feat(parse): Strip frontmatter

dfb345c

fix(parse): Feature gate frontmatter

f3b945a

test(fmt): Verify frontmatter is not stripped

c002b34

feat(pretty): Preserve the frontmatter

078a6f6

epage force-pushed the frontmatter branch 4 times, most recently from 1e418ee to 078a6f6 Compare March 10, 2025 19:49

	if !info.is_empty() && !is_ident(info) {
	// Either there is no infostring, or the infostring should be a valid indent.
	if !(info.is_empty() \|\| is_ident(info)) {

	let line = line.trim_matches(is_whitespace);
	let line = line.trim();

	let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));
	let (closing_fence_remainder, rest) = rest.split_once("\n").unwrap_or((rest, ""));

	// invalid close, even if there are extra `-`s
	// invalid close if there are extra characters beyond the original fence pattern

Add unstable frontmatter support #137193

Are you sure you want to change the base?

Add unstable frontmatter support #137193

Conversation

epage commented Feb 17, 2025 • edited Loading

rustbot commented Feb 17, 2025

Choose a reason for hiding this comment

fmease Feb 18, 2025 • edited Loading

Choose a reason for hiding this comment

This comment was marked as off-topic.

This comment was marked as off-topic.

Choose a reason for hiding this comment

This comment has been minimized.

ytmimi commented Feb 18, 2025

This comment has been minimized.

rustbot commented Feb 18, 2025

This comment was marked as off-topic.

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bors commented Mar 4, 2025

fee1-dead left a comment • edited Loading

Choose a reason for hiding this comment

epage commented Mar 6, 2025

epage Mar 6, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

epage commented Mar 6, 2025

fee1-dead left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

epage commented Feb 17, 2025 •

edited

Loading

fmease Feb 18, 2025 •

edited

Loading

fee1-dead left a comment •

edited

Loading

epage Mar 6, 2025 •

edited

Loading