-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unstable frontmatter support #137193
base: master
Are you sure you want to change the base?
Add unstable frontmatter support #137193
Conversation
r? @fee1-dead rustbot has assigned @fee1-dead. Use |
@@ -55,6 +55,16 @@ pub(crate) fn lex_token_trees<'psess, 'src>( | |||
start_pos = start_pos + BytePos::from_usize(shebang_len); | |||
} | |||
|
|||
// Skip frontmatter, if present. | |||
if let Some(frontmatter_len) = rustc_lexer::strip_frontmatter(src) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other places we call strip_shebang
that I have not (yet?) updated
src/tools/rust-analyzer/crates/parser/src/lexed_str.rs
compiler/rustc_ast_pretty/src/pprust/state.rs
This is in part because I'm not too sure what the appropriate way of doing it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other places we call strip_shebang that I have not (yet?) updated [emphasis: mine]
Both of these aren't blocking this PR, that's for sure.
compiler/rustc_ast_pretty/src/pprust/state.rs
Disclaimer: I'm not 100% sure what would happen if you don't strip the frontmatter (e.g., whether it can lead to ICEs or only butchered output) but we should definitely do so (at some point) because otherwise the AST pretty-printer would reinterpret the frontmatter as Rust code when collecting Rust source code comments (which it tries to "reattach" later on, maybe? I'm not quite sure).
I mean you could simply check what happens under -Zunpretty=normal
or -Zunpretty=expanded
given a frontmatter that contains e.g., // …
.
src/tools/rust-analyzer/crates/parser/src/lexed_str.rs
Note: I don't speak for the rust-analyzer team. However, I know that they prefer that one doesn't modify rust-analyzer from within this repository (via). So I recommend you to either open an issue or a PR over at rust-lang/rust-analyzer for frontmatter support.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/tools/rust-analyzer/crates/parser/src/lexed_str.rs
I've made a separate line item on the tracking issue for r-a's implementation
compiler/rustc_ast_pretty/src/pprust/state.rs
Knowing what to look for in tests, I've gone ahead and add this to this PR, updating the initial test commit and adding a commit specifically for pretty support.
This comment has been minimized.
This comment has been minimized.
Might be worth adding a test case to |
This comment has been minimized.
This comment has been minimized.
Some changes occurred in src/tools/rustfmt cc @rust-lang/rustfmt |
@@ -55,6 +55,16 @@ pub(crate) fn lex_token_trees<'psess, 'src>( | |||
start_pos = start_pos + BytePos::from_usize(shebang_len); | |||
} | |||
|
|||
// Skip frontmatter, if present. | |||
if let Some(frontmatter_len) = rustc_lexer::strip_frontmatter(src) { |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
ed5aad9
to
64cecab
Compare
/// Frontmatter is a special attribute type reserved for use by external tools | ||
/// | ||
/// This must be called after [`strip_shebang`] | ||
pub fn strip_frontmatter(input: &str) -> Option<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of stripping the frontmatter and returning an Option
(None
when the frontmatter is invalid), we should make a best effort to parse it first. For example, we should provide a customized error message if we detect the start of frontmatter but it was never properly closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spoke to this in the PR description:
This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.
Frontmatter is an attribute for calling tools to consume. This means that if the frontmatter is invalid, they will report the error before rustc ever sees it. Rather than investing a lot of effort in quality error messages here that almost no one will see, I figured it would be better to put that focus in Cargo where they will be seen.
I have started a discussion on zulip about making cargo's code reusable. In theory, we could share between rustc and Cargo. However, I worry about the release logistics involved to keep us in-sync.
What I felt was important is that rustc at least error in the different cases, even if the message isn't the greatest. We can then use the tests verifying that to track any improvements to the error and prevent accidentally making them success cases without further work (which I'm assuming is unlikely to happen).
Could you speak to your concerns with that plan or why better error reporting is a blocker for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fee1-dead from #137193 (review)
So the way I currently see this, is that there is a natural way of splitting these two. I think cargo should only parse valid frontmatters (in terms of the fence syntax) and should provide detailed errors that are invalid within a frontmatter (unknown fields and etc.)
For rustc, because we do recognize frontmatters and attempt to strip it, we should tell users what went wrong. For example, if there are only two
-
s, then that's an invalid frontmatter start and we should tell them that. If the infostring is not an identifier, we should tell the users that. If the frontmatter does not have a valid close, then we should tell the users that....
I'm not sure how you currently see this responsibility being split/duplicated across these two projects, but it is slightly irresponsible to me to just assume that because cargo will do the rest of the job instead, we land this code which doesn't properly consider errors.
cargo
isn't the primary Rust code parser, so it shouldn't be the one responsible for parser errors. If it can't parse the front matter, it should emit an error, give the rest to rustc and let rustc report good errors.So, I'm opposed to landing this in this way. Please take some time to work on good errors, which will not be wasted. (if we do decide to put those in cargo instead, you can pretty much still reuse the code) I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard. Feel free to comment here or chat me on Zulip if you disagree.
(moving back to the original thread for continity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I'm understanding correctly, you are suggesting to flip this and have cargo ignore the frontmatter syntax if its invalid, and have rustc do the full error report.
My concern is that having rustc report the error is too late in the process. For example, cargo has many commands that operate on manifests-only (e.g. cargo metadata
) and that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error. This experience could be even worse for tools like cargo add
. We'd need to at least implement go a step further and at least report there was an error (and non-mutating commands could turn that into None
so rustc's error isn't masked).
if we do decide to put those in cargo instead, you can pretty much still reuse the code
Pulling in rustc-lexer would be complicated. We publish cargo
and would need something that can be published. We'd need to work with T-release to deal with weird ping ponging between our repos as we transition from development to release. We'd also need to work through how to integrate specific assumptions rustc makes to something that cargo can use. This is all then limited to just cargo and not other consumers. We could copy and paste the code but that also doesn't buy us much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error
That's also true. But I don't want people to call rustc and get confusing errors. Perhaps that can be figured out later in the process.
compiler/rustc_lexer/src/lib.rs
Outdated
return None; | ||
} | ||
let (fence_pattern, rest) = rest.split_at(fence_end); | ||
let (info, rest) = rest.split_once("\n").unwrap_or((rest, "")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is a frontmatter like ---cargo---
going to be parsed? or ---cargo ---
?
It looks like they aren't considered to be valid. Consider making that a bit clearer (maybe here? to say that if there is no newline it will be invalid) and add tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything after the 3+ -
is the infostring, including any dashes that appear after something else. We check if its an identifier on the following lines which will reject that. We have a test case for invalid identifiers.
Or is the concern more with
strip_frontmatter("---")
(ie a test case where there is no newline after the opening)
☔ The latest upstream changes (presumably #137927) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the way I currently see this, is that there is a natural way of splitting these two. I think cargo should only parse valid frontmatters (in terms of the fence syntax) and should provide detailed errors that are invalid within a frontmatter (unknown fields and etc.)
For rustc, because we do recognize frontmatters and attempt to strip it, we should tell users what went wrong. For example, if there are only two -
s, then that's an invalid frontmatter start and we should tell them that. If the infostring is not an identifier, we should tell the users that. If the frontmatter does not have a valid close, then we should tell the users that.
Furthermore, we should warn if we are not called from cargo, which should acknowledge to us (probably via some flag) that the frontmatter has been parsed correctly.
I'm not sure how you currently see this responsibility being split/duplicated across these two projects, but it is slightly irresponsible to me to just assume that because cargo will do the rest of the job instead, we land this code which doesn't properly consider errors. cargo
isn't the primary Rust code parser, so it shouldn't be the one responsible for parser errors. If it can't parse the front matter, it should emit an error, give the rest to rustc and let rustc report good errors.
So, I'm opposed to landing this in this way. Please take some time to work on good errors, which will not be wasted. (if we do decide to put those in cargo instead, you can pretty much still reuse the code) I know that this is an unstable feature and we can always reiterate on this later, but I feel the need to pushback in order to let this feedback heard. Feel free to comment here or chat me on Zulip if you disagree.
This is being added as an unstable feature. Does merging of the current implementation need to be blocked on those, e.g. negatively affecting stable code in some way that I missed? Leaving this as something to resolve before stabilization on the tracking issue means we unblock me from updating rust-analyzer, checking some more cargo fmt and cargo doc cases, updating cargo to use this, etc. Those and continuing to discuss this can happen in parallel. Otherwise, I fear how long this will drag on as we work through rustc and cargo's requirements or re-discuss topics from the RFC while having multi-day lag between responses, after which I'll be able to start on those other tasks. |
start_pos = start_pos + BytePos::from_usize(frontmatter_len); | ||
let hi = start_pos; | ||
let span = Span::with_root_ctxt(lo, hi); | ||
psess.gated_spans.gate(sym::frontmatter, span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fee1-dead from #137193 (review)
Furthermore, we should warn if we are not called from cargo, which should acknowledge to us (probably via some flag) that the frontmatter has been parsed correctly.
(moving to a thread to better follow the discussion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Through the RFC process, T-lang insisted this not be Cargo specific, see RFC language.
That still leaves room for the calling tool to be able to register their frontmatter infostring so rustc could warn otherwise. However, we'd first need to determine if this is needed, if there are other problems with doing this, what other cases to handle, etc. For example, if a tool registers an infostring, should that only be for the root module and disallowed otherwise? The problem with that is that this was designed around allowing people to turn random files into scripts, so they need to be allowed further in. So maybe we validate those are from the calling tool. Except do they need to be? So that then leaves we validate the tool's infostring against the actual infostring in the root module. Except any deviation from a sub-module to the root module can happen just as well with the root module because you could have a non-script have a frontmatter for use as a script in whatever other tool you want.
Sorry, I overlooked this before. We have a system in place for this kind of stuff: tracking issues, stabilization reports, and fcp concerns. I have added the error handling to #136889 as an unresolved question which will block stabilization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a closer look at the implementation given the latest comments. Had some nits and I'd be fine with landing this temporarily after the suggestions are applied.
/// Frontmatter is a special attribute type reserved for use by external tools | ||
/// | ||
/// This must be called after [`strip_shebang`] | ||
pub fn strip_frontmatter(input: &str) -> Option<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a subpar experience to act as if there is no manifest until something invokes rustc and then to report the error
That's also true. But I don't want people to call rustc and get confusing errors. Perhaps that can be figured out later in the process.
compiler/rustc_lexer/src/lib.rs
Outdated
while !rest.is_empty() { | ||
let without_spaces = rest.trim_start_matches(|c| is_whitespace(c) && c != '\n'); | ||
let without_nl = without_spaces.trim_start_matches('\n'); | ||
if without_nl.len() == rest.len() { | ||
// nothing trimmed | ||
break; | ||
} else if without_nl.len() == without_spaces.len() { | ||
// either not a frontmatter or invalid opening | ||
return None; | ||
} | ||
rest = without_nl; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better here to just trim all the whitespace up front, then either the length is unchanged (starts with no whitespace) or the last trimmed character should be a newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave that a spin. You can see the squashed change in https://github.com/rust-lang/rust/compare/2501d454739ef26ef52e213fe5b12e94fde3f48c..1e418eeb4c3b0fdf35889295a8036daa4c084ca6
// Opens with a line that starts with 3+ `-` followed by an optional identifier | ||
const FENCE_CHAR: char = '-'; | ||
let fence_end = | ||
rest.char_indices().find_map(|(i, c)| (c != FENCE_CHAR).then_some(i)).unwrap_or(rest.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no need to use unwrap_or(rest.len());
here. If the fence does not end, we return None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the thought to use ?
? We discussed that previously at #137193 (comment)
let (fence_pattern, rest) = rest.split_at(fence_end); | ||
let (info, rest) = rest.split_once("\n").unwrap_or((rest, "")); | ||
let info = info.trim_matches(is_whitespace); | ||
if !info.is_empty() && !is_ident(info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !info.is_empty() && !is_ident(info) { | |
// Either there is no infostring, or the infostring should be a valid indent. | |
if !(info.is_empty() || is_ident(info)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of comments mapping this back to the textual description of this, that is handled an earlier comment. This then ends up just duplicating the logic that immediately follows and I'm not seeing the value of it.
I'm also not sure why the emphasis on pulling out the !
. For me, I'm not seeing the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went ahead and made a slight tweak to the inner comment. Squashed it but its visible in https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c
compiler/rustc_lexer/src/lib.rs
Outdated
let rest = if let Some(rest) = rest.strip_prefix(fence_pattern) { | ||
rest | ||
} else { | ||
let nl_fence_pattern = format!("\n{fence_pattern}"); | ||
let Some(frontmatter_nl) = rest.find(&nl_fence_pattern) else { | ||
// frontmatter close is required | ||
return None; | ||
}; | ||
&rest[frontmatter_nl + nl_fence_pattern.len()..] | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might improve the flow here if you use .find('\n')
and include the newline in rest, then you just need one single call of .find(&nl_fence_pattern)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get what you were suggesting. Squashed it; see it at https://github.com/rust-lang/rust/compare/1e418eeb4c3b0fdf35889295a8036daa4c084ca6..078a6f68e170033250888604caca3b5ea5a3d4b7
compiler/rustc_lexer/src/lib.rs
Outdated
}; | ||
|
||
let (line, rest) = rest.split_once("\n").unwrap_or((rest, "")); | ||
let line = line.trim_matches(is_whitespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let line = line.trim_matches(is_whitespace); | |
let line = line.trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(being discussed at #137193 (comment))
compiler/rustc_lexer/src/lib.rs
Outdated
&rest[frontmatter_nl + nl_fence_pattern.len()..] | ||
}; | ||
|
||
let (line, rest) = rest.split_once("\n").unwrap_or((rest, "")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let (line, rest) = rest.split_once("\n").unwrap_or((rest, "")); | |
let (closing_fence_remainder, rest) = rest.split_once("\n").unwrap_or((rest, "")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, closing_fence_remainder
felt a little off to me. I ended up settling on after_closing_fence
. Does that work for you?
Squashed it, see https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c
compiler/rustc_lexer/src/lib.rs
Outdated
let (line, rest) = rest.split_once("\n").unwrap_or((rest, "")); | ||
let line = line.trim_matches(is_whitespace); | ||
if !line.is_empty() { | ||
// invalid close, even if there are extra `-`s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// invalid close, even if there are extra `-`s | |
// invalid close if there are extra characters beyond the original fence pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra '-'
s is the key part I am trying to emphasize because that is a case that can easily be overlooked, by implementers or users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with a variant that preserves that information. Squashed it, see https://github.com/rust-lang/rust/compare/26f461654c1e8ef0d93b5267b3873685688d25c3..2501d454739ef26ef52e213fe5b12e94fde3f48c
1e418ee
to
078a6f6
Compare
This is an implementation for #136889
This strips the frontmatter, like shebang, because nothing within rustc will be using this. rustfmt folks have said this should be good enough for now as rustfmt should successfully work with this (ie not remove it) even though proper formatting would require AST support. I had considered making this part of lexing of tokens but that led to problems with ambiguous tokens. See also zulip.
This does not do any fancy error reporting for invalid syntax. Besides keeping this first PR simple, we can punt on this because Cargo will be the front line of parsing in the majority of cases and it will need to do have as good or better error reporting than rustc, for this syntax, to avoid masking the quality of rustc's error reporting.
This includes supporting the syntax in
This leaves to the future