add control flow graph benchmark results

cdump · Feb 24, 2025 · b9b96a0
1 parent 003b53c

Showing 3 changed files with 77 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -410,6 +410,52 @@ $ cast selectors --resolve $(cast code 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc
  </tr>
 </table>
 
+### Control Flow Graph
+
+<i>False Negatives</i> - Valid blocks that the CFG analysis may have incorrectly marked unreachable. A lower count usually indicates better precision.
+
+<table>
+ <tr>
+  <td></td>
+  <td><a href="benchmark/providers/evmole-rs"><b><i>evmole</i></b></a></td>
+  <td><a href="benchmark/providers/ethersolve"><b><i>ethersolve</i></b></a></td>
+  <td><a href="benchmark/providers/evm-cfg"><b><i>evm-cfg</i></b></a></td>
+  <td><a href="benchmark/providers/heimdall-rs"><b><i>heimdall</i></b></a></td>
+  <td><a href="benchmark/providers/evm-cfg-builder"><b><i>evm-cfg-builder</i></b></a></td>
+  <td><a href="benchmark/providers/sevm"><b><i>sevm</i></b></a></td>
+ </tr>
+ <tr>
+  <td><i>Total Blocks</i></td>
+  <td>97.0%🥇<br><sub>661957</sub></td>
+  <td>93.7%<br><sub>639155</sub></td>
+  <td>63.0%<br><sub>429860</sub></td>
+  <td>31.9%<br><sub>217922</sub></td>
+  <td>21.7%<br><sub>148162</sub></td>
+  <td>6.7%<br><sub>45831</sub></td>
+ </tr>
+  <td><i>False Negatives</i></td>
+  <td>3.0%🥇<br><sub>20484</sub></td>
+  <td>6.3%<br><sub>43286</sub></td>
+  <td>37.0%<br><sub>252581</sub></td>
+  <td>68.1%<br><sub>464519</sub></td>
+  <td>78.3%<br><sub>534279</sub></td>
+  <td>93.3%<br><sub>636610</sub></td>
+ </tr>
+ <tr>
+  <td><i>Time</i></td>
+  <td>34s</td>
+  <td>1202s</td>
+  <td>40s</td>
+  <td>206s</td>
+  <td>308s</td>
+  <td>41s</td>
+ </tr>
+</table>
+
+Dataset: largest1k (1000 contracts, 682,441 blocks)
+
+### notes
+
 See [benchmark/README.md](./benchmark/) for the methodology and commands to reproduce these results
 
 <i>versions: evmole v0.7.0; <a href="https://github.com/shazow/whatsabi">whatsabi</a> v0.19.0; <a href="https://github.com/acuarica/evm">sevm</a> v0.7.4; <a href="https://github.com/g00dv1n/evm-hound-rs">evm-hound-rs</a> v0.1.4; <a href="https://github.com/Jon-Becker/heimdall-rs">heimdall-rs</a> v0.8.5</i>

diff --git a/benchmark/README.md b/benchmark/README.md
@@ -52,6 +52,7 @@ $ python3 compare.py
 # Compare specific mode
 $ python3 compare.py --mode=arguments
 $ python3 compare.py --mode=mutability
+$ python3 compare.py --mode=flow
 
 # Filter by dataset/provider and show errors
 python3 compare.py --mode=arguments --datasets largest1k --providers etherscan evmole-py --show-errors
@@ -63,5 +64,33 @@ python3 compare.py --mode=arguments --normalize-args fixed-size-array tuples str
 python3 compare.py --mode=selectors --markdown
 ```
 
+## Control Flow Graph Analysis
+The CFG analysis methodology consists of the following steps:
+
+1. Constructing Basic Blocks
+   - A basic block is a contiguous subsequence of EVM opcodes with:
+     - One entry point (first instruction)
+     - Ends at: JUMP, JUMPI, STOP, REVERT, RETURN, INVALID, unknown opcode, or end of code
+   - JUMPDEST cannot appear inside a block - it marks the start of a new block
+
+2. Filtering Out Definitely Unreachable Blocks
+   A block is definitely unreachable if:
+   - It does not begin at pc = 0 (contract start), AND
+   - First instruction is not JUMPDEST, AND
+   - Previous block does not end with JUMPI whose "false" branch falls through
+
+3. Set Definitions
+   - SET_BB: Set of all basic blocks after initial partitioning and removal of definitely unreachable blocks
+   - SET_CFG: Set of blocks reachable from pc = 0 according to the CFG algorithm under test
+
+4. Error Metrics
+   - False Positives = (SET_CFG - SET_BB)
+     - Blocks the CFG claims are reachable but that are not valid basic blocks
+     - Should be empty in a correct analysis
+   - False Negatives = (SET_BB - SET_CFG)
+     - Valid blocks not marked reachable by the CFG
+     - May include legitimate dead code
+     - Fewer false negatives indicate a more precise analysis
+
 ## Datasets
 See [datasets/README.md](datasets/README.md) for information about how the test datasets were constructed.
diff --git a/benchmark/compare.py b/benchmark/compare.py
@@ -424,7 +424,7 @@ def show_arguments_or_mutability(providers: list[str], all_results: list, show_e
             'providers': ['etherscan', 'evmole-rs', 'evmole-js', 'evmole-py', 'whatsabi', 'sevm', 'heimdall-rs', 'simple']
         },
         'flow': {
-            'datasets': ['largest1k', 'random50k', 'vyper'],
+            'datasets': ['largest1k'],
             'providers': ['evmole-rs', 'evm-cfg', 'ethersolve', 'sevm', 'evm-cfg-builder', 'heimdall-rs']
         }
     }
@@ -472,5 +472,5 @@ def show_arguments_or_mutability(providers: list[str], all_results: list, show_e
         show_arguments_or_mutability(cfg.providers, results, cfg.show_errors)
 
     elif cfg.mode == 'flow':
-        results = [process_flow(cfg.datasets[0], cfg.providers, cfg.results_dir)]
+        results = [process_flow(d, cfg.providers, cfg.results_dir) for d in cfg.datasets]
         show_flow(cfg.providers, results, cfg.show_errors)
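
The methodology added to benchmark/README.md in this commit (basic-block partitioning, filtering of definitely unreachable blocks, and the two set differences) can be illustrated with a small Python sketch. This is not the code used by `compare.py` or by any provider; the function names `basic_blocks`, `candidate_blocks`, and `flow_errors` are hypothetical, blocks are identified by their starting pc, and the "unknown opcode" terminator from the README is left out for brevity.

```python
# Illustrative sketch only: names are hypothetical, not taken from the repository.
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B
STOP, RETURN, REVERT, INVALID = 0x00, 0xF3, 0xFD, 0xFE
PUSH1, PUSH32 = 0x60, 0x7F
# Assumption: unknown opcodes are not detected here, although the README
# also lists them as block terminators.
TERMINATORS = {JUMP, JUMPI, STOP, RETURN, REVERT, INVALID}


def basic_blocks(code: bytes) -> list[tuple[int, int]]:
    """Split bytecode into blocks, returned as (start_pc, last_opcode)."""
    blocks = []
    pc = start = 0
    prev_op = None
    while pc < len(code):
        op = code[pc]
        # A JUMPDEST cannot sit inside a block: it opens a new one.
        if op == JUMPDEST and pc != start:
            blocks.append((start, prev_op))
            start = pc
        prev_op = op
        # PUSH1..PUSH32 carry 1..32 bytes of immediate data that must be skipped.
        pc += 1 + (op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0)
        if op in TERMINATORS:
            blocks.append((start, op))
            start = pc
    if start < len(code):
        # Trailing block closed by the end of the code.
        blocks.append((start, prev_op))
    return blocks


def candidate_blocks(code: bytes) -> set[int]:
    """SET_BB: block starts left after dropping definitely unreachable blocks."""
    keep = set()
    prev_last = None
    for start, last in basic_blocks(code):
        # Keep the entry block, JUMPDEST blocks, and JUMPI fall-through blocks.
        if start == 0 or code[start] == JUMPDEST or prev_last == JUMPI:
            keep.add(start)
        prev_last = last
    return keep


def flow_errors(set_bb: set[int], set_cfg: set[int]) -> tuple[set[int], set[int]]:
    """Return (false_positives, false_negatives) as defined in the README."""
    return set_cfg - set_bb, set_bb - set_cfg


# PUSH1 1; PUSH1 8; JUMPI | STOP | PUSH1 0 | JUMPDEST; STOP | JUMPDEST; STOP
code = bytes.fromhex("60016008570060005b005b00")
set_bb = candidate_blocks(code)   # the block at pc=6 is filtered out
set_cfg = {0, 5, 8}               # pretend output of some CFG tool
false_pos, false_neg = flow_errors(set_bb, set_cfg)
print(sorted(set_bb), sorted(false_pos), sorted(false_neg))
```

In this toy input the block at pc=10 starts with a JUMPDEST that nothing jumps to, so it is counted as a false negative even though it is legitimate dead code, which is the caveat the README notes. In the README table, each provider's Total Blocks and False Negatives counts sum to the dataset's 682,441 blocks.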