Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

analysis/cdecl: custom parser for a large enough subset of C's declaration syntax. #1

Draft
wants to merge 3 commits into
base: rewrite-xml
Choose a base branch
from

Conversation

eddyb
Copy link

@eddyb eddyb commented Jan 28, 2024

For some background, see ash-rs#745 (comment).

Example dump output (from second commit): https://gist.github.com/eddyb/5ae0247468892ea7dd2a136ed3afa869.

I haven't spent much time trying to tweak the exact CType AST etc. - I assume that ash would want to also account for differences between C and Rust and other such things that I'm not familiar with.

So the goal here was to be lossless: all the information is included, in an easily-traversable form.

Copy link
Owner

@Friz64 Friz64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some very minor comments, but overall this looks fantastic. Thank you very much!

};

// FIXME(eddyb) deduplicate qualifier parsing somehow.
let mut const_qualif = match left {
Copy link
Owner

@Friz64 Friz64 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this be an Option<()> kind of feels unnecessary to me. It could just as well be a bool.

Here's a patch, to demonstrate what I'm talking about (click to expand)
diff --git a/analysis/src/cdecl.rs b/analysis/src/cdecl.rs
index 105ef89..dcf4522 100644
--- a/analysis/src/cdecl.rs
+++ b/analysis/src/cdecl.rs
@@ -1,4 +1,4 @@
-use std::num::NonZeroU8;
+use std::{mem, num::NonZeroU8};
 
 /// Identifier-category-aware minimal tokenization of a subset of C syntax,
 /// sufficient for parsing the C declarations used in `vk.xml`.
@@ -155,15 +155,6 @@ impl<'a> CDecl<'a> {
     ) -> Result<CDecl<'a>, CDeclParseErrorKind<'a>> {
         use CDeclParseErrorKind as ErrorKind;
 
-        trait InsertIfNone<T> {
-            fn insert_if_none(&mut self, value: T) -> Option<&mut T>;
-        }
-        impl<T> InsertIfNone<T> for Option<T> {
-            fn insert_if_none(&mut self, value: T) -> Option<&mut T> {
-                self.is_none().then(|| self.insert(value))
-            }
-        }
-
         let (mut left, decl_name, mut right) = {
             let mut decl_names =
                 tokens
@@ -214,9 +205,9 @@ impl<'a> CDecl<'a> {
         let mut const_qualif = match left {
             [CTok::Kw("const"), rest @ ..] => {
                 left = rest;
-                Some(())
+                true
             }
-            _ => None,
+            _ => false,
         };
 
         let mut ty = CType::Base(match left {
@@ -262,14 +253,16 @@ impl<'a> CDecl<'a> {
             while let Some((&leftmost, after_leftmost)) = left.split_first() {
                 match leftmost {
                     CTok::Kw("const") => {
-                        const_qualif
-                            .insert_if_none(())
-                            .ok_or(ErrorKind::Multiple("const"))?;
+                        if const_qualif {
+                            return Err(ErrorKind::Multiple("const"));
+                        }
+
+                        const_qualif = true;
                     }
                     CTok::Punct('*') => {
                         ty = CType::Ptr {
                             implicit_for_decay: false,
-                            is_const: const_qualif.take().is_some(),
+                            is_const: mem::take(&mut const_qualif),
                             pointee: Box::new(ty),
                         };
                     }
@@ -336,7 +329,7 @@ impl<'a> CDecl<'a> {
                         };
                     }
                     [CTok::Punct('('), params @ .., CTok::Punct(')')] => {
-                        if const_qualif.is_some() {
+                        if const_qualif {
                             return Err(ErrorKind::Unused("const"));
                         }
 
@@ -375,12 +368,12 @@ impl<'a> CDecl<'a> {
         if let (CDeclMode::FuncParam, CType::Array { .. }) = (mode, &ty) {
             ty = CType::Ptr {
                 implicit_for_decay: true,
-                is_const: const_qualif.take().is_some(),
+                is_const: mem::take(&mut const_qualif),
                 pointee: Box::new(ty),
             };
         }
 
-        if const_qualif.is_some() {
+        if const_qualif {
             return Err(ErrorKind::Unused("const"));
         }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had more uses for insert_if_none and there's a lot more things in the C syntax that work similarly, but are not used in here, I'll have to take a second look.

Booleans aren't great for stuff like this for a few different reasons, but I'm almost certain insert_if_none had more call sites before I refactored all of this to be more general and work for the function pointer typedefs, so I should clean it up.

@eddyb
Copy link
Author

eddyb commented Mar 4, 2024

One thing I should really do, besides addressing the comments, is to go over the terminology used in comments/variable name, since I have mentally validated my approach against the kind of C grammar I used to rely on, and should try to actually use standard nomenclature instead of my L/R stuff.

One subtle thing is that L/R really are prefix/suffix to the declaration, because e.g. T *p, x; implies both *p : T and x : T separately, i.e. equivalent to T *p; and T x;.

The confounding factor is that T* is valid syntax on its own only in e.g. C++ templates (foo<T*>) or some C features (sizeof(T*)/typeof(T*) etc.), but in those positions T[n] is also valid, so they really act like the full declarator syntax for a single variable but omitting the name being declared, and that syntax isn't relevant here.

@Friz64 Friz64 force-pushed the rewrite-xml branch 3 times, most recently from 278849a to d610c44 Compare March 16, 2024 07:08
@Friz64
Copy link
Owner

Friz64 commented Mar 16, 2024

I have a rebased and "improved" version of this branch up at https://github.com/Friz64/ash/tree/rewrite-xml-cdecl.

@eddyb eddyb marked this pull request as draft May 25, 2024 05:41
@eddyb
Copy link
Author

eddyb commented May 25, 2024

Working on this again, marking as draft to avoid accidental merging.

Copy link
Owner

@Friz64 Friz64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyb I've rebased the first commit (as part of ash-rs#867): 6e89a89

:)

}

#[derive(Debug)]
pub struct UnsupportedCTok<'a>(&'a str);
Copy link
Owner

@Friz64 Friz64 Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should implement Error.

impl<'a> fmt::Display for UnsupportedCTok<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unsupported C token {:?}", self.0)
    }
}

impl<'a> Error for UnsupportedCTok<'a> {}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants