Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unstable frontmatter support #137193

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

epage
Copy link
Contributor

@epage epage commented Feb 17, 2025

This is an implementation for #136889

This strips the frontmatter, like shebang, because nothing within rustc will be using this. rustfmt folks have said this should be good enough for now as rustfmt should successfully work with this (ie not remove it) even though proper formatting would require AST support. I had considered making this part of lexing of tokens but that led to problems with ambiguous tokens. See also zulip.

This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.

This includes supporting the syntax in

  • the parser
  • pretty printing
  • rustfmt (nop, just needed a test)

This leaves to the future

  • r-a support (called out in the tracking issue)

@rustbot
Copy link
Collaborator

rustbot commented Feb 17, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@epage epage changed the title Add u Add unstable frontmatter support Feb 17, 2025
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 17, 2025
@@ -55,6 +55,16 @@ pub(crate) fn lex_token_trees<'psess, 'src>(
start_pos = start_pos + BytePos::from_usize(shebang_len);
}

// Skip frontmatter, if present.
if let Some(frontmatter_len) = rustc_lexer::strip_frontmatter(src) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other places we call strip_shebang that I have not (yet?) updated

  • src/tools/rust-analyzer/crates/parser/src/lexed_str.rs
  • compiler/rustc_ast_pretty/src/pprust/state.rs

This is in part because I'm not too sure what the appropriate way of doing it is.

Copy link
Member

@fmease fmease Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other places we call strip_shebang that I have not (yet?) updated [emphasis: mine]

Both of these aren't blocking this PR, that's for sure.

compiler/rustc_ast_pretty/src/pprust/state.rs

Disclaimer: I'm not 100% sure what would happen if you don't strip the frontmatter (e.g., whether it can lead to ICEs or only butchered output) but we should definitely do so (at some point) because otherwise the AST pretty-printer would reinterpret the frontmatter as Rust code when collecting Rust source code comments (which it tries to "reattach" later on, maybe? I'm not quite sure).

I mean you could simply check what happens under -Zunpretty=normal or -Zunpretty=expanded given a frontmatter that contains e.g., // ….

src/tools/rust-analyzer/crates/parser/src/lexed_str.rs

Note: I don't speak for the rust-analyzer team. However, I know that they prefer that one doesn't modify rust-analyzer from within this repository (via). So I recommend you to either open an issue or a PR over at rust-lang/rust-analyzer for frontmatter support.

This comment was marked as off-topic.

This comment was marked as off-topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/tools/rust-analyzer/crates/parser/src/lexed_str.rs

I've made a separate line item on the tracking issue for r-a's implementation

compiler/rustc_ast_pretty/src/pprust/state.rs

Knowing what to look for in tests, I've gone ahead and add this to this PR, updating the initial test commit and adding a commit specifically for pretty support.

@rust-log-analyzer

This comment has been minimized.

@ytmimi
Copy link
Contributor

ytmimi commented Feb 18, 2025

Might be worth adding a test case to src/tools/rustfmt/tests/target just to show that rustfmt won't strip the frontmatter.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Feb 18, 2025

Some changes occurred in src/tools/rustfmt

cc @rust-lang/rustfmt

@fmease fmease added the F-frontmatter `#![feature(frontmatter)]` label Feb 19, 2025
@@ -55,6 +55,16 @@ pub(crate) fn lex_token_trees<'psess, 'src>(
start_pos = start_pos + BytePos::from_usize(shebang_len);
}

// Skip frontmatter, if present.
if let Some(frontmatter_len) = rustc_lexer::strip_frontmatter(src) {

This comment was marked as off-topic.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2025
@epage epage mentioned this pull request Feb 20, 2025
6 tasks
@epage epage force-pushed the frontmatter branch 3 times, most recently from ed5aad9 to 64cecab Compare February 20, 2025 16:03
/// Frontmatter is a special attribute type reserved for use by external tools
///
/// This must be called after [`strip_shebang`]
pub fn strip_frontmatter(input: &str) -> Option<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of stripping the frontmatter and returning an Option (None when the frontmatter is invalid), we should make a best effort to parse it first. For example, we should provide a customized error message if we detect the start of frontmatter but it was never properly closed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke to this in the PR description:

This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.

Frontmatter is an attribute for calling tools to consume. This means that if the frontmatter is invalid, they will report the error before rustc ever sees it. Rather than investing a lot of effort in quality error messages here that almost no one will see, I figured it would be better to put that focus in Cargo where they will be seen.

I have started a discussion on zulip about making cargo's code reusable. In theory, we could share between rustc and Cargo. However, I worry about the release logistics involved to keep us in-sync.

What I felt was important is that rustc at least error in the different cases, even if the message isn't the greatest. We can then use the tests verifying that to track any improvements to the error and prevent accidentally making them success cases without further work (which I'm assuming is unlikely to happen).

Could you speak to your concerns with that plan or why better error reporting is a blocker for this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fee1-dead from #137193 (review)

So the way I currently see this, is that there is a natural way of splitting these two. I think cargo should only parse valid frontmatters (in terms of the fence syntax) and should provide detailed errors that are invalid within a frontmatter (unknown fields and etc.)

For rustc, because we do recognize frontmatters and attempt to strip it, we should tell users what went wrong. For example, if there are only two -s, then that's an invalid frontmatter start and we should tell them that. If the infostring is not an identifier, we should tell the users that. If the frontmatter does not have a valid close, then we should tell the users that.

...

I'm not sure how you currently see this responsibility being split/duplicated across these two projects, but it is slightly irresponsible to me to just assume that because cargo will do the rest of the job instead, we land this code which doesn't properly consider errors. cargo isn't the primary Rust code parser, so it shouldn't be the one responsible for parser errors. If it can't parse the front matter, it should emit an error, give the rest to rustc and let rustc report good errors.

So, I'm opposed to landing this in this way. Please take some time to work on good errors, which will not be wasted. (if we do decide to put those in cargo instead, you can pretty much still reuse the code) I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard. Feel free to comment here or chat me on Zulip if you disagree.

(moving back to the original thread for continity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm understanding correctly, you are suggesting to flip this and have cargo ignore the frontmatter syntax if its invalid, and have rustc do the full error report.

My concern is that having rustc report the error is too late in the process. For example, cargo has many commands that operate on manifests-only (e.g. cargo metadata) and that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error. This experience could be even worse for tools like cargo add. We'd need to at least implement go a step further and at least report there was an error (and non-mutating commands could turn that into None so rustc's error isn't masked).

if we do decide to put those in cargo instead, you can pretty much still reuse the code

Pulling in rustc-lexer would be complicated. We publish cargo and would need something that can be published. We'd need to work with T-release to deal with weird ping ponging between our repos as we transition from development to release. We'd also need to work through how to integrate specific assumptions rustc makes to something that cargo can use. This is all then limited to just cargo and not other consumers. We could copy and paste the code but that also doesn't buy us much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error

That's also true. But I don't want people to call rustc and get confusing errors. Perhaps that can be figured out later in the process.

@fee1-dead fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2025
return None;
}
let (fence_pattern, rest) = rest.split_at(fence_end);
let (info, rest) = rest.split_once("\n").unwrap_or((rest, ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a frontmatter like ---cargo--- going to be parsed? or ---cargo ---?

It looks like they aren't considered to be valid. Consider making that a bit clearer (maybe here? to say that if there is no newline it will be invalid) and add tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything after the 3+ - is the infostring, including any dashes that appear after something else. We check if its an identifier on the following lines which will reject that. We have a test case for invalid identifiers.

Or is the concern more with

strip_frontmatter("---")

(ie a test case where there is no newline after the opening)

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 27, 2025
@fee1-dead fee1-dead added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 28, 2025
@bors
Copy link
Contributor

bors commented Mar 4, 2025

☔ The latest upstream changes (presumably #137927) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@fee1-dead fee1-dead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the way I currently see this, is that there is a natural way of splitting these two. I think cargo should only parse valid frontmatters (in terms of the fence syntax) and should provide detailed errors that are invalid within a frontmatter (unknown fields and etc.)

For rustc, because we do recognize frontmatters and attempt to strip it, we should tell users what went wrong. For example, if there are only two -s, then that's an invalid frontmatter start and we should tell them that. If the infostring is not an identifier, we should tell the users that. If the frontmatter does not have a valid close, then we should tell the users that.

Furthermore, we should warn if we are not called from cargo, which should acknowledge to us (probably via some flag) that the frontmatter has been parsed correctly.

I'm not sure how you currently see this responsibility being split/duplicated across these two projects, but it is slightly irresponsible to me to just assume that because cargo will do the rest of the job instead, we land this code which doesn't properly consider errors. cargo isn't the primary Rust code parser, so it shouldn't be the one responsible for parser errors. If it can't parse the front matter, it should emit an error, give the rest to rustc and let rustc report good errors.

So, I'm opposed to landing this in this way. Please take some time to work on good errors, which will not be wasted. (if we do decide to put those in cargo instead, you can pretty much still reuse the code) I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard. Feel free to comment here or chat me on Zulip if you disagree.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 6, 2025
@epage
Copy link
Contributor Author

epage commented Mar 6, 2025

So, I'm opposed to landing this in this way.

This is being added as an unstable feature. Does merging of the current implementation need to be blocked on those, e.g. negatively affecting stable code in some way that I missed?

Leaving this as something to resolve before stabilization on the tracking issue means we unblock me from updating rust-analyzer, checking some more cargo fmt and cargo doc cases, updating cargo to use this, etc. Those and continuing to discuss this can happen in parallel. Otherwise, I fear how long this will drag on as we work through rustc and cargo's requirements or re-discuss topics from the RFC while having multi-day lag between responses, after which I'll be able to start on those other tasks.

start_pos = start_pos + BytePos::from_usize(frontmatter_len);
let hi = start_pos;
let span = Span::with_root_ctxt(lo, hi);
psess.gated_spans.gate(sym::frontmatter, span);
Copy link
Contributor Author

@epage epage Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fee1-dead from #137193 (review)

Furthermore, we should warn if we are not called from cargo, which should acknowledge to us (probably via some flag) that the frontmatter has been parsed correctly.

(moving to a thread to better follow the discussion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Through the RFC process, T-lang insisted this not be Cargo specific, see RFC language.

That still leaves room for the calling tool to be able to register their frontmatter infostring so rustc could warn otherwise. However, we'd first need to determine if this is needed, if there are other problems with doing this, what other cases to handle, etc. For example, if a tool registers an infostring, should that only be for the root module and disallowed otherwise? The problem with that is that this was designed around allowing people to turn random files into scripts, so they need to be allowed further in. So maybe we validate those are from the calling tool. Except do they need to be? So that then leaves we validate the tool's infostring against the actual infostring in the root module. Except any deviation from a sub-module to the root module can happen just as well with the root module because you could have a non-script have a frontmatter for use as a script in whatever other tool you want.

@epage
Copy link
Contributor Author

epage commented Mar 6, 2025

I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard

Sorry, I overlooked this before.

We have a system in place for this kind of stuff: tracking issues, stabilization reports, and fcp concerns. I have added the error handling to #136889 as an unresolved question which will block stabilization.

Copy link
Member

@fee1-dead fee1-dead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a closer look at the implementation given the latest comments. Had some nits and I'd be fine with landing this temporarily after the suggestions are applied.

/// Frontmatter is a special attribute type reserved for use by external tools
///
/// This must be called after [`strip_shebang`]
pub fn strip_frontmatter(input: &str) -> Option<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error

That's also true. But I don't want people to call rustc and get confusing errors. Perhaps that can be figured out later in the process.

Comment on lines 286 to 292
while !rest.is_empty() {
let without_spaces = rest.trim_start_matches(|c| is_whitespace(c) && c != '\n');
let without_nl = without_spaces.trim_start_matches('\n');
if without_nl.len() == rest.len() {
// nothing trimmed
break;
} else if without_nl.len() == without_spaces.len() {
// either not a frontmatter or invalid opening
return None;
}
rest = without_nl;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better here to just trim all the whitespace up front, then either the length is unchanged (starts with no whitespace) or the last trimmed character should be a newline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Opens with a line that starts with 3+ `-` followed by an optional identifier
const FENCE_CHAR: char = '-';
let fence_end =
rest.char_indices().find_map(|(i, c)| (c != FENCE_CHAR).then_some(i)).unwrap_or(rest.len());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no need to use unwrap_or(rest.len()); here. If the fence does not end, we return None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the thought to use ?? We discussed that previously at #137193 (comment)

let (fence_pattern, rest) = rest.split_at(fence_end);
let (info, rest) = rest.split_once("\n").unwrap_or((rest, ""));
let info = info.trim_matches(is_whitespace);
if !info.is_empty() && !is_ident(info) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !info.is_empty() && !is_ident(info) {
// Either there is no infostring, or the infostring should be a valid indent.
if !(info.is_empty() || is_ident(info)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of comments mapping this back to the textual description of this, that is handled an earlier comment. This then ends up just duplicating the logic that immediately follows and I'm not seeing the value of it.

I'm also not sure why the emphasis on pulling out the ! . For me, I'm not seeing the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and made a slight tweak to the inner comment. Squashed it but its visible in https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c

Comment on lines 316 to 319
let rest = if let Some(rest) = rest.strip_prefix(fence_pattern) {
rest
} else {
let nl_fence_pattern = format!("\n{fence_pattern}");
let Some(frontmatter_nl) = rest.find(&nl_fence_pattern) else {
// frontmatter close is required
return None;
};
&rest[frontmatter_nl + nl_fence_pattern.len()..]
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might improve the flow here if you use .find('\n') and include the newline in rest, then you just need one single call of .find(&nl_fence_pattern).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

};

let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));
let line = line.trim_matches(is_whitespace);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let line = line.trim_matches(is_whitespace);
let line = line.trim();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(being discussed at #137193 (comment))

&rest[frontmatter_nl + nl_fence_pattern.len()..]
};

let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));
let (closing_fence_remainder, rest) = rest.split_once("\n").unwrap_or((rest, ""));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, closing_fence_remainder felt a little off to me. I ended up settling on after_closing_fence. Does that work for you?

Squashed it, see https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c

let (line, rest) = rest.split_once("\n").unwrap_or((rest, ""));
let line = line.trim_matches(is_whitespace);
if !line.is_empty() {
// invalid close, even if there are extra `-`s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// invalid close, even if there are extra `-`s
// invalid close if there are extra characters beyond the original fence pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra '-'s is the key part I am trying to emphasize because that is a case that can easily be overlooked, by implementers or users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epage epage force-pushed the frontmatter branch 4 times, most recently from 1e418ee to 078a6f6 Compare March 10, 2025 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-frontmatter `#![feature(frontmatter)]` S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants