Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose instruction encoding, CSR bitfield information as JSON #709

Open
Timmmm opened this issue Feb 5, 2025 · 13 comments
Open

Expose instruction encoding, CSR bitfield information as JSON #709

Timmmm opened this issue Feb 5, 2025 · 13 comments
Labels
enhancement New feature or request

Comments

@Timmmm
Copy link
Collaborator

Timmmm commented Feb 5, 2025

Currently there isn't an easy way to extract useful information from the Sail model like:

  • Instruction encoding patterns
  • Instruction assembly format
  • CSR field information

This information is really useful in some contexts (e.g. coverage), and there's definitely a desire for it - hence the UDB effort.

But I would much rather use the Sail model than UDB, so it would be nice to get this information in easily digestible form (i.e. JSON).

There is the JSON doc bundle, but it's targeted at embedding Sail into Asciidoc.

I think this is basically what @ThinkOpenly has been working on, but I don't think we have an issue for it. @ThinkOpenly can you say how far you got, and what it would take to get a JSON file with all this information building in sail-riscv's CI?

@Timmmm Timmmm added the enhancement New feature or request label Feb 5, 2025
@jordancarlin
Copy link
Collaborator

This would be very useful for some work I’m doing right now regarding functional coverage. I’m using riscv-opcodes as the base at the moment, but would definitely prefer to use sail as the base.

@Alasdair
Copy link
Collaborator

Alasdair commented Feb 6, 2025

In theory this wouldn't be too hard.

The thing I've been concerned about is a JSON dump of this information might not be something we can guarantee remains stable between Sail versions, once we commit to some JSON representation and have people depending on it we might find ourselves stuck exposing too many internal implementation details. Hyrum's law is very very real, so I think we'd need to be careful.

The JSON produced for documentation sidesteps this by mostly just indexing the surface syntax, which is in practice much more stable than the internal abstract syntax tree.

@Timmmm
Copy link
Collaborator Author

Timmmm commented Feb 6, 2025

I was imagining that we wouldn't internal the entire AST, more something along the lines of @ThinkOpenly 's json backend... but instead of hard-coding "encdec" etc. for the Sail model, have something more generic where you add attributes to the mappings etc.

One thing I don't understand is how you would deal with stuff like

function amo_width_valid(size : word_width) -> bool = {
  match size {
    BYTE   => extensionEnabled(Ext_Zabha),
    HALF   => extensionEnabled(Ext_Zabha),
    WORD   => true,
    DOUBLE => xlen >= 64,
  }
}

mapping clause encdec = AMO(op, aq, rl, rs2, rs1, size, rd)                                                if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)
  <-> encdec_amoop(op) @ bool_bits(aq) @ bool_bits(rl) @ rs2 @ rs1 @ 0b0 @ size_enc(size) @ rd @ 0b0101111 if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)

The condition makes it quite complicated and dynamic... On the other hand there are only a few cases like this so maybe it doesn't matter too much. It doesn't have to be perfect.

might not be something we can guarantee remains stable between Sail versions,

I think that's probably ok - we could add a schema version number/link to the output at least. If we also maintain some scripts that convert the JSON to C, Go, Rust, Python, etc. (like riscv-opcodes) then I think that would cover most people's use cases.

@ThinkOpenly
Copy link
Contributor

With regard to the out-of-tree JSON backend, it extracts a lot of information into JSON. I worry that it is not 100% accurate, but it might offer some value, depending on what you seek.

  • Instruction encoding patterns

I think it does a pretty good job here, but we have not done a lot of validation. Known issues include ThinkOpenly/sail#18.

Example:

{
  "mnemonic": "vaadd.vv",
  "name": "TBD",
  "operands": [ { "name": "vd", "type": "regidx", "optional": false },{ "name": "vs2", "type": "regidx", "optional": false },{ "name": "vs1", "type": "regidx", "optional": false },{ "name": "vm", "type": "bits(1)", "optional": true, "default": "v0.t" } ],
  "syntax": "vd,vs2,vs1[,vm]",              
  "format": "TBD",
  "fields": [ { "field": "encdec_mvvfunct6(funct6)", "size": 6 }, { "field": "vm", "size": 1 }, { "field": "vs2", "size": 5 }, { "field": "vs1", "size": 5 }, { "field": "0b010", "size": 3 }, { "field": "vd", "size": 5 }, { "field": "0b1010111", "size": 7 } ],
  • Instruction assembly format

I think it does a pretty good job here, too, with similar caveats. Example above.

  • CSR field information

Currently, it extracts names only. ThinkOpenly/sail#10

I think this is basically what @ThinkOpenly has been working on, but I don't think we have an issue for it. @ThinkOpenly can you say how far you got, and what it would take to get a JSON file with all this information building in sail-riscv's CI?

To support extraction of CSR fields, I don't think would be terribly challenging.

The Sail JSON backend builds and runs on top of latest "sail2" branch. The code quality (indeed, the approach used) is subjective, but I'm happy to submit for review if there's sufficient interest.

That being said, it does walk the AST.

Would merging the backend into the canonical Sail repo alleviate the concern with AST drift?

@ThinkOpenly
Copy link
Contributor

One thing I don't understand is how you would deal with stuff like

function amo_width_valid(size : word_width) -> bool = {
  match size {
    BYTE   => extensionEnabled(Ext_Zabha),
    HALF   => extensionEnabled(Ext_Zabha),
    WORD   => true,
    DOUBLE => xlen >= 64,
  }
}

mapping clause encdec = AMO(op, aq, rl, rs2, rs1, size, rd)                                                if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)
  <-> encdec_amoop(op) @ bool_bits(aq) @ bool_bits(rl) @ rs2 @ rs1 @ 0b0 @ size_enc(size) @ rd @ 0b0101111 if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)

The condition makes it quite complicated and dynamic...

The JSON backend does not yet handle that level of complexity, FYI:

{
  "mnemonic": "lr.b",
[...]
  "extensions": [ "Zalrsc" ],

...or it would report the additional extension, "Zabha", here. (I just opened ThinkOpenly/sail#50. 🙂)

@Timmmm
Copy link
Collaborator Author

Timmmm commented Feb 6, 2025

Would merging the backend into the canonical Sail repo alleviate the concern with AST drift?

I think that would make sense, but it would have to lose all the hard-coded RISC-V stuff since Sail isn't specific to RISC-V. I haven't looked at your code in detail but do you think it would be feasible to extract the raw information it requires in a generic way, if the RISC-V code had some appropriate annotations (and then the raw information could be processed into the final "friendly" form you have by a script). Does that make sense?

@ThinkOpenly
Copy link
Contributor

I think that would make sense, but it would have to lose all the hard-coded RISC-V stuff since Sail isn't specific to RISC-V. I haven't looked at your code in detail but do you think it would be feasible to extract the raw information it requires in a generic way, if the RISC-V code had some appropriate annotations (and then the raw information could be processed into the final "friendly" form you have by a script).

This would be quite a challenge. The JSON backend necessarily relies on a lot of keywords in the RISC-V Sail code which are not necessarily generic:

  • encdec, encdec_compressed
  • assembly
  • execute (We get away with not looking for this specifically, because of the way we match things up after parsing the AST.)
  • we currently do some ugly stuff looking for "maybe" function names, presuming they're for optional operands
  • extensionEnabled (and the corresponding "Ext_" strings therein)
  • spc
  • sep

None of these constructs have a native representation in Sail. Even the basic mechanisms used by RISC-V Sail (e.g. scattered definitions) may not necessarily be utilized by other architectures.

So, it would be a real challenge to make a cross-architectural Sail-to-JSON backend.

On the other hand, I see some arch-specific code in Sail:

  • test/aarch64_small
  • old/{x86,power}
  • aarch64
  • aarch64_small
  • doc/code_riscv
  • snapshots/coq/{cheri-mips,riscv}
  • snapshots/hol4/sail/{riscv,mips,aarch64,cheri}
  • snapshots/isabelle/{riscv,aarch64,cheri}

Could the JSON backend just be RISC-V-specific?

@Timmmm
Copy link
Collaborator Author

Timmmm commented Feb 6, 2025

I think I didn't explain what I meant very well. We could have an architecture-agnostic backend that extracted information from model into an "unfriendly" JSON file, and then an architecture-specific script that processed that file into the one you have made already.

At the extreme end, imagine a backend that just dumped the AST as JSON... but now change it so that it only dumps the minimum information you need.

Clearly I'll have to actually read your code! 😄

@ThinkOpenly
Copy link
Contributor

We could have an architecture-agnostic backend that extracted information from model into an "unfriendly" JSON file, and then an architecture-specific script that processed that file into the one you have made already.

At the extreme end, imagine a backend that just dumped the AST as JSON... but now change it so that it only dumps the minimum information you need.

This is nearly impossible, as I understand what you are saying. I'd liken it to trying to extract the plot line from two different translations of a novel. It's instructive to compare the Sail code for RISC-V with, say https://github.com/rems-project/sail-arm/tree/master/arm-v9.4-a/src. Picking something roughly at random, this is (as I understand it with a brief look) ANDI:

val execute_aarch32_instrs_AND_i_Op_A_txt : forall 'd 'n ('setflags : Bool),
  (0 <= 'n & 'n <= 15 & 0 <= 'd & 'd <= 15).
  (bits(1), int('d), bits(32), int('n), bool('setflags)) -> unit

function execute_aarch32_instrs_AND_i_Op_A_txt (carry, d, imm32, n, setflags) = {
    let result : bits(32) = R_read(n) & imm32;
    if d == 15 then {
        if setflags then {
            ALUExceptionReturn(result)
        } else {
            ALUWritePC(result)
        }
    } else {
        R_set(d) = result;
        if setflags then {
            PSTATE.N = [result[31]];
            PSTATE.Z = IsZeroBit(result);
            PSTATE.C = carry
        };
        ()
    }
}

val decode_aarch32_instrs_AND_i_A1enc_A_txt : (bits(4), bits(1), bits(4), bits(4), bits(12)) -> unit

function decode_aarch32_instrs_AND_i_A1enc_A_txt (cond, S, Rn, Rd, imm12) = {
    if ConditionPassed() then {
        assert(cond != 0b1111);
        carry : bits(1) = undefined;
        imm32 : bits(32) = undefined;
        let 'd = UInt(Rd);
        let 'n = UInt(Rn);
        let setflags : bool = S == 0b1;
        (imm32, carry) = A32ExpandImm_C(imm12, PSTATE.C);
        execute_aarch32_instrs_AND_i_Op_A_txt(carry, d, imm32, n, setflags)
    };
    ()
}

function clause __DecodeA32_DataProMisc ((pc, ([_, _, _, _, bitzero, bitzero, bitone, bitzero, bitzero, bitzero, bitzero, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _] as __opcode)) if Slice(__opcode, 28, 4) != 0b1111 & SEE < 2843) = {
    SEE = 2843;
    let cond = Slice(__opcode, 28, 4);
    let S = Slice(__opcode, 20, 1);
    let Rn = Slice(__opcode, 16, 4);
    let Rd = Slice(__opcode, 12, 4);
    let imm12 = Slice(__opcode, 0, 12);
    decode_aarch32_instrs_AND_i_A1enc_A_txt(cond, S, Rn, Rd, imm12)
}

No scattered definitions for functions, completely different decode mechanism, different way of checking for extensions, etc. I'm not sure they even bothered with assembly syntax (I couldn't find it).

Clearly I'll have to actually read your code! 😄

It pairs well with a fine wine. ;-)

@jrtc27
Copy link
Collaborator

jrtc27 commented Feb 6, 2025

The Arm Sail model is converted from ASL

@bacam
Copy link
Collaborator

bacam commented Feb 7, 2025

The lack of scattered definitions in the Arm model is because we didn't really need an AST for anything that we were doing - but you can see in the quote above that there's a decode/execute distinction that you'd use for one and we don't think it would be very hard to add. Arm also provide machine system register information, but I don't think the Sail translation captures it in an easy form for reuse (e.g., when there are multiple views of one register).

@ThinkOpenly
Copy link
Contributor

My primary point was to show that it is very easy to create two Sail-based representations (even of the same architecture) that would be very difficult to extract into a single, unified representation. Sail has far too much flexibility in implementation, for good or bad. There is no single way to represent an encoding scheme, or decoding scheme, or a scheme for extensions. Assembly syntax is optional.

Sail itself is able to do this, of course, to create things like emulators, so this may just be a statement on my skill level. ;-)

@Alasdair
Copy link
Collaborator

Alasdair commented Feb 7, 2025

Part of the reason we are so flexible was precisely to support a fairly direct translation from ASL. There are definitely a few bad consequences from that, like the set of built-ins being so small as we wanted to plug in a separate stdlib that matched ASL.

I think we can consider upstreaming things that are RISC-V specific, but we should put some thought into whether that can be generalised first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants