-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose instruction encoding, CSR bitfield information as JSON #709
Comments
This would be very useful for some work I’m doing right now regarding functional coverage. I’m using riscv-opcodes as the base at the moment, but would definitely prefer to use sail as the base. |
In theory this wouldn't be too hard. The thing I've been concerned about is a JSON dump of this information might not be something we can guarantee remains stable between Sail versions, once we commit to some JSON representation and have people depending on it we might find ourselves stuck exposing too many internal implementation details. Hyrum's law is very very real, so I think we'd need to be careful. The JSON produced for documentation sidesteps this by mostly just indexing the surface syntax, which is in practice much more stable than the internal abstract syntax tree. |
I was imagining that we wouldn't internal the entire AST, more something along the lines of @ThinkOpenly 's json backend... but instead of hard-coding One thing I don't understand is how you would deal with stuff like
The condition makes it quite complicated and dynamic... On the other hand there are only a few cases like this so maybe it doesn't matter too much. It doesn't have to be perfect.
I think that's probably ok - we could add a schema version number/link to the output at least. If we also maintain some scripts that convert the JSON to C, Go, Rust, Python, etc. (like riscv-opcodes) then I think that would cover most people's use cases. |
With regard to the out-of-tree JSON backend, it extracts a lot of information into JSON. I worry that it is not 100% accurate, but it might offer some value, depending on what you seek.
I think it does a pretty good job here, but we have not done a lot of validation. Known issues include ThinkOpenly/sail#18. Example: {
"mnemonic": "vaadd.vv",
"name": "TBD",
"operands": [ { "name": "vd", "type": "regidx", "optional": false },{ "name": "vs2", "type": "regidx", "optional": false },{ "name": "vs1", "type": "regidx", "optional": false },{ "name": "vm", "type": "bits(1)", "optional": true, "default": "v0.t" } ],
"syntax": "vd,vs2,vs1[,vm]",
"format": "TBD",
"fields": [ { "field": "encdec_mvvfunct6(funct6)", "size": 6 }, { "field": "vm", "size": 1 }, { "field": "vs2", "size": 5 }, { "field": "vs1", "size": 5 }, { "field": "0b010", "size": 3 }, { "field": "vd", "size": 5 }, { "field": "0b1010111", "size": 7 } ],
I think it does a pretty good job here, too, with similar caveats. Example above.
Currently, it extracts names only. ThinkOpenly/sail#10
To support extraction of CSR fields, I don't think would be terribly challenging. The Sail JSON backend builds and runs on top of latest "sail2" branch. The code quality (indeed, the approach used) is subjective, but I'm happy to submit for review if there's sufficient interest. That being said, it does walk the AST. Would merging the backend into the canonical Sail repo alleviate the concern with AST drift? |
The JSON backend does not yet handle that level of complexity, FYI: {
"mnemonic": "lr.b",
[...]
"extensions": [ "Zalrsc" ], ...or it would report the additional extension, "Zabha", here. (I just opened ThinkOpenly/sail#50. 🙂) |
I think that would make sense, but it would have to lose all the hard-coded RISC-V stuff since Sail isn't specific to RISC-V. I haven't looked at your code in detail but do you think it would be feasible to extract the raw information it requires in a generic way, if the RISC-V code had some appropriate annotations (and then the raw information could be processed into the final "friendly" form you have by a script). Does that make sense? |
This would be quite a challenge. The JSON backend necessarily relies on a lot of keywords in the RISC-V Sail code which are not necessarily generic:
None of these constructs have a native representation in Sail. Even the basic mechanisms used by RISC-V Sail (e.g. scattered definitions) may not necessarily be utilized by other architectures. So, it would be a real challenge to make a cross-architectural Sail-to-JSON backend. On the other hand, I see some arch-specific code in Sail:
Could the JSON backend just be RISC-V-specific? |
I think I didn't explain what I meant very well. We could have an architecture-agnostic backend that extracted information from model into an "unfriendly" JSON file, and then an architecture-specific script that processed that file into the one you have made already. At the extreme end, imagine a backend that just dumped the AST as JSON... but now change it so that it only dumps the minimum information you need. Clearly I'll have to actually read your code! 😄 |
This is nearly impossible, as I understand what you are saying. I'd liken it to trying to extract the plot line from two different translations of a novel. It's instructive to compare the Sail code for RISC-V with, say https://github.com/rems-project/sail-arm/tree/master/arm-v9.4-a/src. Picking something roughly at random, this is (as I understand it with a brief look) ANDI:
No scattered definitions for functions, completely different decode mechanism, different way of checking for extensions, etc. I'm not sure they even bothered with assembly syntax (I couldn't find it).
It pairs well with a fine wine. ;-) |
The Arm Sail model is converted from ASL |
The lack of scattered definitions in the Arm model is because we didn't really need an AST for anything that we were doing - but you can see in the quote above that there's a decode/execute distinction that you'd use for one and we don't think it would be very hard to add. Arm also provide machine system register information, but I don't think the Sail translation captures it in an easy form for reuse (e.g., when there are multiple views of one register). |
My primary point was to show that it is very easy to create two Sail-based representations (even of the same architecture) that would be very difficult to extract into a single, unified representation. Sail has far too much flexibility in implementation, for good or bad. There is no single way to represent an encoding scheme, or decoding scheme, or a scheme for extensions. Assembly syntax is optional. Sail itself is able to do this, of course, to create things like emulators, so this may just be a statement on my skill level. ;-) |
Part of the reason we are so flexible was precisely to support a fairly direct translation from ASL. There are definitely a few bad consequences from that, like the set of built-ins being so small as we wanted to plug in a separate stdlib that matched ASL. I think we can consider upstreaming things that are RISC-V specific, but we should put some thought into whether that can be generalised first. |
Currently there isn't an easy way to extract useful information from the Sail model like:
This information is really useful in some contexts (e.g. coverage), and there's definitely a desire for it - hence the UDB effort.
But I would much rather use the Sail model than UDB, so it would be nice to get this information in easily digestible form (i.e. JSON).
There is the JSON doc bundle, but it's targeted at embedding Sail into Asciidoc.
I think this is basically what @ThinkOpenly has been working on, but I don't think we have an issue for it. @ThinkOpenly can you say how far you got, and what it would take to get a JSON file with all this information building in
sail-riscv
's CI?The text was updated successfully, but these errors were encountered: