Expose instruction encoding, CSR bitfield information as JSON #709

Timmmm · 2025-02-05T19:49:10Z

Currently there isn't an easy way to extract useful information from the Sail model like:

Instruction encoding patterns
Instruction assembly format
CSR field information

This information is really useful in some contexts (e.g. coverage), and there's definitely a desire for it - hence the UDB effort.

But I would much rather use the Sail model than UDB, so it would be nice to get this information in easily digestible form (i.e. JSON).

There is the JSON doc bundle, but it's targeted at embedding Sail into Asciidoc.

I think this is basically what @ThinkOpenly has been working on, but I don't think we have an issue for it. @ThinkOpenly can you say how far you got, and what it would take to get a JSON file with all this information building in sail-riscv's CI?

The text was updated successfully, but these errors were encountered:

jordancarlin · 2025-02-06T01:07:40Z

This would be very useful for some work I’m doing right now regarding functional coverage. I’m using riscv-opcodes as the base at the moment, but would definitely prefer to use sail as the base.

Alasdair · 2025-02-06T01:55:31Z

In theory this wouldn't be too hard.

The thing I've been concerned about is a JSON dump of this information might not be something we can guarantee remains stable between Sail versions, once we commit to some JSON representation and have people depending on it we might find ourselves stuck exposing too many internal implementation details. Hyrum's law is very very real, so I think we'd need to be careful.

The JSON produced for documentation sidesteps this by mostly just indexing the surface syntax, which is in practice much more stable than the internal abstract syntax tree.

Timmmm · 2025-02-06T09:32:59Z

I was imagining that we wouldn't internal the entire AST, more something along the lines of @ThinkOpenly 's json backend... but instead of hard-coding "encdec" etc. for the Sail model, have something more generic where you add attributes to the mappings etc.

One thing I don't understand is how you would deal with stuff like

function amo_width_valid(size : word_width) -> bool = {
  match size {
    BYTE   => extensionEnabled(Ext_Zabha),
    HALF   => extensionEnabled(Ext_Zabha),
    WORD   => true,
    DOUBLE => xlen >= 64,
  }
}

mapping clause encdec = AMO(op, aq, rl, rs2, rs1, size, rd)                                                if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)
  <-> encdec_amoop(op) @ bool_bits(aq) @ bool_bits(rl) @ rs2 @ rs1 @ 0b0 @ size_enc(size) @ rd @ 0b0101111 if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)

The condition makes it quite complicated and dynamic... On the other hand there are only a few cases like this so maybe it doesn't matter too much. It doesn't have to be perfect.

might not be something we can guarantee remains stable between Sail versions,

I think that's probably ok - we could add a schema version number/link to the output at least. If we also maintain some scripts that convert the JSON to C, Go, Rust, Python, etc. (like riscv-opcodes) then I think that would cover most people's use cases.

ThinkOpenly · 2025-02-06T18:36:07Z

With regard to the out-of-tree JSON backend, it extracts a lot of information into JSON. I worry that it is not 100% accurate, but it might offer some value, depending on what you seek.

Instruction encoding patterns

I think it does a pretty good job here, but we have not done a lot of validation. Known issues include ThinkOpenly/sail#18.

Example:

{
  "mnemonic": "vaadd.vv",
  "name": "TBD",
  "operands": [ { "name": "vd", "type": "regidx", "optional": false },{ "name": "vs2", "type": "regidx", "optional": false },{ "name": "vs1", "type": "regidx", "optional": false },{ "name": "vm", "type": "bits(1)", "optional": true, "default": "v0.t" } ],
  "syntax": "vd,vs2,vs1[,vm]",              
  "format": "TBD",
  "fields": [ { "field": "encdec_mvvfunct6(funct6)", "size": 6 }, { "field": "vm", "size": 1 }, { "field": "vs2", "size": 5 }, { "field": "vs1", "size": 5 }, { "field": "0b010", "size": 3 }, { "field": "vd", "size": 5 }, { "field": "0b1010111", "size": 7 } ],

Instruction assembly format

I think it does a pretty good job here, too, with similar caveats. Example above.

CSR field information

Currently, it extracts names only. ThinkOpenly/sail#10

I think this is basically what @ThinkOpenly has been working on, but I don't think we have an issue for it. @ThinkOpenly can you say how far you got, and what it would take to get a JSON file with all this information building in sail-riscv's CI?

To support extraction of CSR fields, I don't think would be terribly challenging.

The Sail JSON backend builds and runs on top of latest "sail2" branch. The code quality (indeed, the approach used) is subjective, but I'm happy to submit for review if there's sufficient interest.

That being said, it does walk the AST.

Would merging the backend into the canonical Sail repo alleviate the concern with AST drift?

ThinkOpenly · 2025-02-06T18:55:05Z

One thing I don't understand is how you would deal with stuff like

function amo_width_valid(size : word_width) -> bool = {
  match size {
    BYTE   => extensionEnabled(Ext_Zabha),
    HALF   => extensionEnabled(Ext_Zabha),
    WORD   => true,
    DOUBLE => xlen >= 64,
  }
}

mapping clause encdec = AMO(op, aq, rl, rs2, rs1, size, rd)                                                if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)
  <-> encdec_amoop(op) @ bool_bits(aq) @ bool_bits(rl) @ rs2 @ rs1 @ 0b0 @ size_enc(size) @ rd @ 0b0101111 if extensionEnabled(Ext_Zaamo) & amo_width_valid(size)

The condition makes it quite complicated and dynamic...

The JSON backend does not yet handle that level of complexity, FYI:

{
  "mnemonic": "lr.b",
[...]
  "extensions": [ "Zalrsc" ],

...or it would report the additional extension, "Zabha", here. (I just opened ThinkOpenly/sail#50. 🙂)

Timmmm · 2025-02-06T19:44:51Z

Would merging the backend into the canonical Sail repo alleviate the concern with AST drift?

I think that would make sense, but it would have to lose all the hard-coded RISC-V stuff since Sail isn't specific to RISC-V. I haven't looked at your code in detail but do you think it would be feasible to extract the raw information it requires in a generic way, if the RISC-V code had some appropriate annotations (and then the raw information could be processed into the final "friendly" form you have by a script). Does that make sense?

ThinkOpenly · 2025-02-06T20:35:29Z

I think that would make sense, but it would have to lose all the hard-coded RISC-V stuff since Sail isn't specific to RISC-V. I haven't looked at your code in detail but do you think it would be feasible to extract the raw information it requires in a generic way, if the RISC-V code had some appropriate annotations (and then the raw information could be processed into the final "friendly" form you have by a script).

This would be quite a challenge. The JSON backend necessarily relies on a lot of keywords in the RISC-V Sail code which are not necessarily generic:

encdec, encdec_compressed
assembly
execute (We get away with not looking for this specifically, because of the way we match things up after parsing the AST.)
we currently do some ugly stuff looking for "maybe" function names, presuming they're for optional operands
extensionEnabled (and the corresponding "Ext_" strings therein)
spc
sep

None of these constructs have a native representation in Sail. Even the basic mechanisms used by RISC-V Sail (e.g. scattered definitions) may not necessarily be utilized by other architectures.

So, it would be a real challenge to make a cross-architectural Sail-to-JSON backend.

On the other hand, I see some arch-specific code in Sail:

test/aarch64_small
old/{x86,power}
aarch64
aarch64_small
doc/code_riscv
snapshots/coq/{cheri-mips,riscv}
snapshots/hol4/sail/{riscv,mips,aarch64,cheri}
snapshots/isabelle/{riscv,aarch64,cheri}

Could the JSON backend just be RISC-V-specific?

Timmmm · 2025-02-06T21:44:32Z

I think I didn't explain what I meant very well. We could have an architecture-agnostic backend that extracted information from model into an "unfriendly" JSON file, and then an architecture-specific script that processed that file into the one you have made already.

At the extreme end, imagine a backend that just dumped the AST as JSON... but now change it so that it only dumps the minimum information you need.

Clearly I'll have to actually read your code! 😄

ThinkOpenly · 2025-02-06T22:22:18Z

We could have an architecture-agnostic backend that extracted information from model into an "unfriendly" JSON file, and then an architecture-specific script that processed that file into the one you have made already.

At the extreme end, imagine a backend that just dumped the AST as JSON... but now change it so that it only dumps the minimum information you need.

This is nearly impossible, as I understand what you are saying. I'd liken it to trying to extract the plot line from two different translations of a novel. It's instructive to compare the Sail code for RISC-V with, say https://github.com/rems-project/sail-arm/tree/master/arm-v9.4-a/src. Picking something roughly at random, this is (as I understand it with a brief look) ANDI:

val execute_aarch32_instrs_AND_i_Op_A_txt : forall 'd 'n ('setflags : Bool),
  (0 <= 'n & 'n <= 15 & 0 <= 'd & 'd <= 15).
  (bits(1), int('d), bits(32), int('n), bool('setflags)) -> unit

function execute_aarch32_instrs_AND_i_Op_A_txt (carry, d, imm32, n, setflags) = {
    let result : bits(32) = R_read(n) & imm32;
    if d == 15 then {
        if setflags then {
            ALUExceptionReturn(result)
        } else {
            ALUWritePC(result)
        }
    } else {
        R_set(d) = result;
        if setflags then {
            PSTATE.N = [result[31]];
            PSTATE.Z = IsZeroBit(result);
            PSTATE.C = carry
        };
        ()
    }
}

val decode_aarch32_instrs_AND_i_A1enc_A_txt : (bits(4), bits(1), bits(4), bits(4), bits(12)) -> unit

function decode_aarch32_instrs_AND_i_A1enc_A_txt (cond, S, Rn, Rd, imm12) = {
    if ConditionPassed() then {
        assert(cond != 0b1111);
        carry : bits(1) = undefined;
        imm32 : bits(32) = undefined;
        let 'd = UInt(Rd);
        let 'n = UInt(Rn);
        let setflags : bool = S == 0b1;
        (imm32, carry) = A32ExpandImm_C(imm12, PSTATE.C);
        execute_aarch32_instrs_AND_i_Op_A_txt(carry, d, imm32, n, setflags)
    };
    ()
}

function clause __DecodeA32_DataProMisc ((pc, ([_, _, _, _, bitzero, bitzero, bitone, bitzero, bitzero, bitzero, bitzero, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _] as __opcode)) if Slice(__opcode, 28, 4) != 0b1111 & SEE < 2843) = {
    SEE = 2843;
    let cond = Slice(__opcode, 28, 4);
    let S = Slice(__opcode, 20, 1);
    let Rn = Slice(__opcode, 16, 4);
    let Rd = Slice(__opcode, 12, 4);
    let imm12 = Slice(__opcode, 0, 12);
    decode_aarch32_instrs_AND_i_A1enc_A_txt(cond, S, Rn, Rd, imm12)
}

No scattered definitions for functions, completely different decode mechanism, different way of checking for extensions, etc. I'm not sure they even bothered with assembly syntax (I couldn't find it).

Clearly I'll have to actually read your code! 😄

It pairs well with a fine wine. ;-)

jrtc27 · 2025-02-06T22:25:12Z

The Arm Sail model is converted from ASL

bacam · 2025-02-07T14:41:44Z

The lack of scattered definitions in the Arm model is because we didn't really need an AST for anything that we were doing - but you can see in the quote above that there's a decode/execute distinction that you'd use for one and we don't think it would be very hard to add. Arm also provide machine system register information, but I don't think the Sail translation captures it in an easy form for reuse (e.g., when there are multiple views of one register).

ThinkOpenly · 2025-02-07T14:58:39Z

My primary point was to show that it is very easy to create two Sail-based representations (even of the same architecture) that would be very difficult to extract into a single, unified representation. Sail has far too much flexibility in implementation, for good or bad. There is no single way to represent an encoding scheme, or decoding scheme, or a scheme for extensions. Assembly syntax is optional.

Sail itself is able to do this, of course, to create things like emulators, so this may just be a statement on my skill level. ;-)

Alasdair · 2025-02-07T15:14:51Z

Part of the reason we are so flexible was precisely to support a fairly direct translation from ASL. There are definitely a few bad consequences from that, like the set of built-ins being so small as we wanted to plug in a separate stdlib that matched ASL.

I think we can consider upstreaming things that are RISC-V specific, but we should put some thought into whether that can be generalised first.

Timmmm added the enhancement New feature or request label Feb 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose instruction encoding, CSR bitfield information as JSON #709

Expose instruction encoding, CSR bitfield information as JSON #709

Timmmm commented Feb 5, 2025

jordancarlin commented Feb 6, 2025

Alasdair commented Feb 6, 2025

Timmmm commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

Timmmm commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

Timmmm commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

jrtc27 commented Feb 6, 2025

bacam commented Feb 7, 2025

ThinkOpenly commented Feb 7, 2025

Alasdair commented Feb 7, 2025

Expose instruction encoding, CSR bitfield information as JSON #709

Expose instruction encoding, CSR bitfield information as JSON #709

Comments

Timmmm commented Feb 5, 2025

jordancarlin commented Feb 6, 2025

Alasdair commented Feb 6, 2025

Timmmm commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

Timmmm commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

Timmmm commented Feb 6, 2025

ThinkOpenly commented Feb 6, 2025

jrtc27 commented Feb 6, 2025

bacam commented Feb 7, 2025

ThinkOpenly commented Feb 7, 2025

Alasdair commented Feb 7, 2025