Decompiling EVM Bytecode

27 Dec, 202312 Min ReadBy Mantle
MantleWeb3
Decompiling EVM Bytecode

I have recently developed and implemented an algorithm to generate Control Flow Graphs (CFGs) from EVM bytecode. The details of the algorithm can be found in this note, and the implementation is available in the evm-dis repository. In this post I’ll discuss the following aspects:

  • what kind of information do CFGs provide and what can we do with them?
  • how can you use the CFG generator and who can benefit from using it?
  • how I implemented the CFG generator.

What is the purpose of a CFG?

Control flow graphs (CFGs) provide an abstract semantics of programs.

As the name suggests, the CFG of a program captures how the control location is updated during the program execution. The control location indicates what instruction in the program is to be executed next.

A CFG is a standard directed graph with nodes and edges. The nodes are labelled with control locations. A CFG has a designated initial node. Paths in the graph can be constructed by starting at the initial node and following the (directed) edges. A path defines a sequence of nodes. As nodes can be mapped to their labels that are control locations, a path in the CFG defines a sequence of control locations. The set of paths in the graph defines a set of sequences of control locations.

Jumps in the EVM

Most of the EVM instructions update the control location to the next line of code in the program except for JUMP and JUMPI. JUMP is an inconditional jump instruction that updates the control location to the value that is at the top of the stack; it is like a goto statement. JUMPI is similar but conditional: it jumps only if the second topmost element on the stack is non-zero, otherwise it progresses to the next instruction in the code.

Note that relative JUMPs, RJUMPs are similar to JUMPs so we omit them in this discussion.

A (code) segment is a linear sequence of instructions i.e. a sequence that contains no JUMP/JUMPIs. Instead of treating each control location/instruction in the bytecode as a CFG node, we collapse linear sequences of instructions into a single node and each node in the CFG is labelled with a segment rather than with a single control location.

Example

Figure 11 below provides an example of CFG. The bytecode of the source program is the sequence of instructions in Segments 0 to 6, put together, in that order. Each node is labelled by the Segment number and the line of code with which the segment starts.

The initial node in Figure 1 is the node labelled with Segment 0 at the top. An execution of the program starts at the first control location2 of the initial node i.e. 0x00. Plain black arrows denote a normal update of the control location i.e. it goes to the next instruction in the code: for example, the arrow Segment 0 --> Segment 1 indicates that the program executes PUSH1 0x01, PUSH0, and progresses to the instruction after PUSH0 in the code, at 0x03, and executes JUMPDEST, PUSH1 0x0a, .... In contrast, plain green arrows indicate JUMPs: Segment 5 - - > Segment 1 goes from control location 0x2b: JUMP to control location 0x03 and not to the next instruction in the code.

In the example of Figure 1, each node is labelled with a different segment. However, the same (code) segment can be associated with different nodes. This may reflect that the context of the execution of a segment is different in two occurrences. An example is a program with two calls to the same “function” (segment).

Figure 1: A CFG with nested loops. Jumps are indicated with dashed arrows.

Valid CFGs

One of the expected properties of CFGs is that they should over-approximate the set of feasible sequences of control locations in the concrete program. In other words, if the concrete program has an execution (for some input) that corresponds to a sequence of control locations pc0pc_0, pc1pc_1, ......, pckpc_k, then the CFG must contain a path that corresponds to pc0pc_0, pc1pc_1, ......, pckpc_k. CFGs that satisfy this property are valid CFGs.

A trivial valid CFG is a graph that has edges between any two control locations. This is too coarse an over-approximation to be of any use. We want CFGs that are also precise enough to capture some useful information of the program. And this is why the CFG reconstruction problem is non-trivial. In a previous note I have presented an algorithm to construct valid and precise CFGs.

What can we do with CFGs?

Graph properties

In the example of Figure 1 above we have a valid and precise enough CFG to learn some interesting properties from the graph. A first property is that there are strongly connected components3 in the graph. This indicates that there are probably loops (in fact there is a nested loop) in the EVM bytecode.

A second property is that the code at Segments 0 and 2 is only executed once (it is not on a loop).

Another surfacing property is that the EVM is definitely a stack machine: the blue instructions in the CFG are stack operations and they obviously form the majority of the code.

Optimisations

CFGs are used by compilers to optimise the code. Optimisations like contants propagation, minimisation of the number of registers are often computed using a CFG and some program analysis techniques (e.g. reaching definitions, slicing).

A CFG can also be the input to static analysers e.g. in software verification techniques based on abstraction refinement.

Known limitations

As the CFG is an over-approximation there are properties of the program that are not captured by its CFG: for instance, how many times a loop can be entered. The CFG above accepts arbitrary number of iterations for each of the two loops. More generally, a CFG does not evaluate conditions but rather assumes that any condition that is used in a JUMPI can be sometimes true and sometimes false.

How to use the CFG generator and Who can use it?

Usage

The CFG generator is available in C#, Java and Python. You can find the artefacts in the build folder of the repository. For instance, the Java build is run as follows:

macbook1> java -jar evmdis.jar 
usage: <this program>  [--help]  [--dis]  [--proof]  [--segment]  [--all]  [--lib]  arg0 [--cfg]  arg0 [--raw]  arg0 <string>

options
--help      [-h] Display help and exit
--dis       [-d] Disassemble <string>
--proof     [-p] Generate proof object for <string>
--segment   [-s] Print segment of <string>
--all       [-a] Same as -d -p
--lib       [-l] The path to the Dafny-EVM source code. Used to add includes files in the proof object. 
--cfg       [-c] Max depth. Control flow graph in DOT format
--raw       [-r] Display non-minimised and minimised CFGs

The input to the CFG generator is a string, hexadecimal bytecode, like the one produced by solc -bin-runtime. If you want to feed the CFG generator with a file you may just use $(<file.bin) to pipe the content of the file to the input to the CFG generator.

The usage section of the README in the git repository provides guidance how to use the generator.

The generator can output several artefacts:

  • the disassembled bytecode (--dis) i.e. a readable version of the hexadecimal string;
  • the list of segments in the bytecode (--segment);
  • the CFG of the bytecode (--cfg). This option takes a parameter which is the maximum depth when unfolding the graph. For large contracts e.g. an ERC-20, or the Deposit Contract, a max depth of 100 is enough. This max depth is there to ensure termination of the algorithm. If the max depth is not large enough, some traces may not be fully explored and this is reflected in the ouput by self-loops on some nodes in the graph. If that happens, you may increase the max depth and start again.
  • minimised CFGS and raw CFGs. The --raw flag will output two versions of the CFG: a non-minimised and a minimised one. This is only useful for small code snippets, otherwise it gets very large very quickly, so better use the minimised version if you can.

Who is it for?

Generating CFG is useful if you want to learn what EVM bytecode is and how it works. CFGs as the one in Figure 1 provides a readable view of the bytecode. If you want to learn how the solc compiler compiles Solidity code with if-the-else, while loops or complex data structures, using this tool may help you to do so. There are tooltips and annotations to the instructions and segments so you can check what the semantics of an instruction is or what is the effect of a segment on the stack size, or gas costs of simple instructions.

If you are interested in analysing or auditing the bytecode4, this tool may also help. You can use the CFG directly to review the code, but also feed the CFG to automatic analysis tools e.g. Rattle, EthIR. A direct inspection of the graph may reveal that there are some INVALID instructions that are reachable and this may need further investigation. It may also reveal that there is no RETURN instruction that is reachable in the CFG and in that case the bytecode never terminates normally. At the moment the CFG is generated in DOT format but it is relatively easy to implement the generation to other target languages.

If you are a compiler developer, and this is true not only for Solidity to EVM but for other compilers too, you may use the CFG generator to check compilation strategies and compare the results of different ones, or to optimise some sections of the compiled code. For instance, if you look at the CFG of the Deposit Contrat in the repository, you may see that there are a few INVALID isntructions in the compiled code, and that several segments are duplicated: the same sequence of instructions appear several times in the bytecode at different locations. This may be optimised away and save some precious space. Regarding optimisations, solc does not seem to have a lot of strategies to apply and using a precise CFG may help desiging better optimisations (e.g. using reaching definitions, slicing, etc).

Finally, another avenue is to use this tool for formal verification. This is actually my primary motivation and you may have noticed that there is a --proof option in the generator that I have not discussed yet. This option produces some Dafny code, a proof object, that uses the Dafny-EVM semantics [1]. This provides a way to automatically reason about the bytecode. This feature is not fully implemented and I’ll discuss it in another post.

How I implemented the Algorithm

I have used Dafny a lot in the last five years, mostly for formal verification purposes, e.g. [1], [2], [3] and [4]. Dafny is a verification-friendly programming language and I was eager to try and develop real software with Dafny. The overall experience is great, although I am strongly biased as I am a formal verification person.

How did I benefit from using a verification-friendly programming language to develop this piece of software?

  1. I never had to debug errors like array-out-of-bounds, undeflow/overflow, loop termination. The Dafny compiler enforces the programmer to fix these issues before executing the code. To be fair, you can bypass the verification stage in Dafny, and I have tried that in two instances. My motivation was that the “functions” I had written were so obvious that I did not need to provide a proof now, I could do it later and execute the code now. In both instances, this was a mistake. The code I had written was flawed, and I discovered the flaws (infinite loop and off-by-one in a list) when writing the proofs.
  2. I could write tests that are exhaustive, e.g. in the sense that I start with an arbitrary input sequence of nats, and can prove some properties of my functions for all sequences of nats. At the same time I wrote some real and concrete tests to check that some functions were computing the expected results. Both approaches are complementary.
  3. I used constrained types a lot. This is similar to dependent types and provides a lot of safety in the code. For instance, I have defined an Instruction datatype. I have added a predicate to define valid instructions and a type ValidInstruction that are the Instructions that satisfy the Valid predicate. I have done that for several datatypes in the code base. What does it buy me? I saved a huge amount of time5 not having to deal with ill-formed Instructions, for instance whe pretty-priting them.
  4. One of the main “idioms” in Dafny is to use pre- and postconditions to prove properties of functions. I have used it a lot, and the Dafny verification engine makes sure that every time you call a function the preconditions are satisfied. What is amazing is that some postconditions can be proved by Dafny without any hints but under the hood the proof requires reasoning by induction. A good example is the BuildCFG function that builds the CFG for which I have written several constraints and postconditions on the sizes of paths. This helps the programmer to understand their code better and get more confidence about what is computed.

Dafny has several backends to generate artefacts in different languages. I have been able to generate the Java, C# and Python arfetacts. This makes the Dafny code I have written available in several languages and it can be re-used in projects suign these languages. More Dafny backends generators6 are being developed (e.g. Go, Rust, JS) and I expect to generate more artefacts in a near future.

Conclusion

CFGs are instrumental to code analysis and optimisations. Hopefully a tool like the one I presented in this post to generate precise CFGs, can help to get a better understanding of EVM bytecode. An online demo version is available at www.bytespector.org.

My main motivation in building CFGs is to use the CFGs to generate proof objects for EVM bytecode. A proof object can be thought of as instrumented bytecode and can be used with the Dafny-EVM to reason and prove properties of contracts at the bytecode level. I’ll report on this work later in a separate post.

References

[1] Cassez, F., Fuller, J., Ghale, M.K., Pearce, D.J., Quiles, H.M.A. (2023). Formal and Executable Semantics of the Ethereum Virtual Machine in Dafny. In: Chechik, M., Katoen, JP., Leucker, M. (eds) Formal Methods. FM 2023. Lecture Notes in Computer Science, vol 14000. Springer, Cham. DOI

[2] Cassez, F., Fuller, J., Asgaonkar, A. (2022). Formal Verification of the Ethereum 2.0 Beacon Chain. In: Fisman, D., Rosu, G. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2022. Lecture Notes in Computer Science, vol 13243. Springer, Cham. DOI

[3] Cassez, F., Fuller, J., Quiles, H.M.A. (2022). Deductive Verification of Smart Contracts with Dafny. In: Groote, J.F., Huisman, M. (eds) Formal Methods for Industrial Critical Systems. FMICS 2022. Lecture Notes in Computer Science, vol 13487. Springer, Cham. DOI

[4] Cassez, F. (2021). Verification of the Incremental Merkle Tree Algorithm with Dafny. In: Huisman, M., Păsăreanu, C., Zhan, N. (eds) Formal Methods. FM 2021. Lecture Notes in Computer Science(), vol 13047. Springer, Cham. DOI


Footnotes

  1. Hovering on the instructions/segments will reveal some information on the semantics and link to the Dafny-EVM semantics of the instruction. Hovering the reveals gas cost when it is constant.

  2. The control location is displayed on the left, in hexadecimal with the prefix0x.

  3. A strongly connected component is a subgraph in which there is a path from each node to any another node.

  4. Compilers from Solidity to EVM bytecode are not certified correct, so even if you are confident your high-level code is good enough, the bytecode generated by the compiler may be flawed. Several serious compilers bugs have been reported in the last year.

  5. Arguably this is not a specific property of Dafny but a common property of dependent type or strongly typed languages.

  6. The code generators are not certified correct, so there may be (and there are) bugs in the translation to some target languages.

Join the Community

X/Twitter