Abstract
As the security space around EVM-based smart contracts matures, a recurring pattern emerges across the EVM ecosystem: a significant portion of security vulnerabilities arise from re-entrancy.
With this issue, we aim to introduce a built-in re-entrancy check across all functions of a generated contract by default, permitting programmers to explicitly mark their functions as re-entrant via the newly introduced reentrant keyword, which is declared in the same way as override and similar specifiers.
Motivation
Rationale
Re-entrancy attacks are one of the most common root causes of the multi-million dollar exploits recorded to date. Additionally, the concept of re-entrancy is very hard to grasp when coming from a traditional programming background due to the unique nature of the EVM.
Solidity is an evolving language that attempts to cater to the wider EVM development community and has historically introduced tools that make development in the language more secure, such as the built-in arithmetic checks introduced in 0.8.X.
With this change, we aim to introduce built-in re-entrancy protections by default with the ability to bypass these protections explicitly, empowering seasoned developers with maximum flexibility while protecting newcomers from EVM-related caveats they may not be aware of.
Proposal
The proposed keyword (reentrant) is meant to mark a function as explicitly re-entrant. As a result, Solidity's code generation would need to introduce a breaking change that causes the "entrypoint" of the bytecode to evaluate the re-entrancy flag.
For the proposal to function properly, a NON_REENTRANT_FLAG_OFFSET compiler literal would need to be introduced that denotes the storage slot holding the re-entrancy flag that is validated. This slot should preferably be located in the upper half of the [0, type(uint256).max] range that a smart contract's storage slot space supports, to ensure no conflicts with existing implementations.
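To make the flag mechanics concrete, the following is a minimal sketch of how the injected guard could be approximated by hand in current Solidity. It is not the proposed code generation: the modifier name guarded, the error ReentrantCall, and the way the slot value is derived are all assumptions made for illustration, and the proposal itself only asks that the slot sits in the upper half of the slot space.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hand-written approximation of the guard the compiler would inject.
/// Every name in this contract is hypothetical.
abstract contract InjectedGuardSketch {
    // Assumption: a hash-derived slot with the top bit forced to 1 so that it
    // lands in the upper half of the slot space. The proposal does not fix
    // how the concrete value is chosen.
    uint256 internal constant NON_REENTRANT_FLAG_OFFSET =
        uint256(keccak256("sketch.non.reentrant.flag")) | (1 << 255);

    // Same values as the pseudo-code in this issue: 0x02 = entered, 0x01 = not entered.
    uint256 private constant NOT_ENTERED = 1;
    uint256 private constant ENTERED = 2;

    error ReentrantCall();

    modifier guarded() {
        // Copy the constant into a local so inline assembly can reference it.
        uint256 slot = NON_REENTRANT_FLAG_OFFSET;
        uint256 flag;
        assembly {
            flag := sload(slot)
        }
        // A never-written slot reads as 0, which also counts as "not entered".
        if (flag == ENTERED) revert ReentrantCall();
        uint256 entered = ENTERED;
        assembly {
            sstore(slot, entered)
        }
        _;
        uint256 notEntered = NOT_ENTERED;
        assembly {
            sstore(slot, notEntered)
        }
    }
}

contract MockWETH9Guarded is InjectedGuardSketch {
    event Deposit(address indexed dst, uint256 wad);
    mapping(address => uint256) public balanceOf;

    function deposit() public payable guarded {
        balanceOf[msg.sender] += msg.value;
        emit Deposit(msg.sender, msg.value);
    }
}
```

Under the proposal, the guarded modifier would disappear from user code entirely and the equivalent check would be emitted at the dispatcher level instead.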
Keyword vs. Existing Syntax
Upon additional feedback from @pcaversaccio and issue #12996, I would like to add some insight as to why a new keyword was chosen over the existing syntax. The original issue revolved around the concept of a new keyword and switched over to the idea of using unchecked to perform external calls without triggering any re-entrancy safety checks.
The unchecked keyword is meant to apply only to the locale it is declared in (i.e. an upper-most unchecked block will not affect the statements of the internal calls it makes). Such a solution is therefore not viable if we want the new re-entrancy feature to be compatible with existing programming paradigms such as inheritance.
We are faced with either breaking the existing behaviour of unchecked so that it applies to internal call chains or introducing a new keyword. The latter appears to be more explicit and easier to grasp for security auditors and developers alike; however, feedback is appreciated.
Example Showcase
To illustrate how the generated bytecode would be altered, let us take a subset of the WETH9 contract:
contract MockWETH9 {
    event Deposit(address indexed dst, uint wad);
    mapping (address => uint) public balanceOf;

    // The fallback function must be declared external
    fallback() external payable {
        deposit();
    }

    function deposit() public payable {
        balanceOf[msg.sender] += msg.value;
        emit Deposit(msg.sender, msg.value);
    }
}
The above contract contains two functions that mutate the state (fallback & deposit) as well as one function that does not (balanceOf).
Its compilation with current tools would result in the following bytecode, shown in pseudo-code format:
contract MockWETH9Bytecode {
    function main() {
        memory[0x40:0x60] = 0x60;
        // Fallback Function
        if (msg.data.length < 0x04) {
        label_00AF:
            var var0 = 0x00b7;
            deposit();
            stop();
        } else {
            // Extract Function Signature
            var0 = msg.data[0x00:0x20] / 0x0100000000000000000000000000000000000000000000000000000000 & 0xffffffff;
            // Signature Comparison
            if (var0 == 0xd0e30db0) {
                // Pseudo-Code of deposit()
                var1 = 0x03d2;
                deposit();
                stop();
            } else if (var0 == 0x70a08231) {
                // Pseudo-Code of balanceOf(address)
                if (msg.value) { revert(memory[0x00:0x00]); }
                var1 = 0x02cc;
                var2 = msg.data[0x04:0x24] & 0xffffffffffffffffffffffffffffffffffffffff;
                var2 = balanceOf(var2);
                var temp17 = memory[0x40:0x60];
                memory[temp17:temp17 + 0x20] = var2;
                var temp18 = memory[0x40:0x60];
                return memory[temp18:temp18 + (temp17 + 0x20) - temp18];
            } else { goto label_00AF; }
        }
    }
    function deposit() {
        memory[0x00:0x20] = msg.sender;
        memory[0x20:0x40] = 0x03;
        var temp0 = keccak256(memory[0x00:0x40]);
        storage[temp0] = storage[temp0] + msg.value;
        var temp1 = memory[0x40:0x60];
        memory[temp1:temp1 + 0x20] = msg.value;
        var temp2 = memory[0x40:0x60];
        log(memory[temp2:temp2 + (temp1 + 0x20) - temp2], [0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c, msg.sender]);
    }
    function balanceOf(var arg0) returns (var arg0) {
        memory[0x20:0x40] = 0x03;
        memory[0x00:0x20] = arg0;
        return storage[keccak256(memory[0x00:0x40])];
    }
}
Given that the compiler can detect which parts of the generated bytecode will mutate the state and which will not (based on the view / pure keywords), we have two cases of code generation:
- No reentrant Functions Defined
- n ≥ 1 reentrant Functions Defined
No reentrant Function Case
For the first case, the bytecode generation module would inject a blanket check at the beginning of main that would validate the re-entrant state of the contract. Afterwards, the bytecode generation module would introduce an assignment to the re-entrant flag solely in the if-else clauses that execute code mutating the state.
Identifying the correct points of injection should be trivial as the compiler is already aware of which functions mutate the state via the view and pure keywords. To illustrate what the generated bytecode would look like, let us take the first case with the original MockWETH9 smart contract code:
contract MockWETH9Bytecode {
    function main() {
        memory[0x40:0x60] = 0x60;
        // INJECTED CODE: Evaluate whether flag is already set and yield an error in case of re-entrancy
        if (storage[NON_REENTRANT_FLAG_OFFSET] == 0x02) { revert(memory[0x00:0x00]); }
        // Fallback Function
        if (msg.data.length < 0x04) {
        label_00AF:
            // INJECTED CODE: Mutating Function -> Set Non-Reentrant Flag
            storage[NON_REENTRANT_FLAG_OFFSET] = 0x02;
            var var0 = 0x00b7;
            deposit();
            // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
            storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;
            stop();
        } else {
            // Extract Function Signature
            var0 = msg.data[0x00:0x20] / 0x0100000000000000000000000000000000000000000000000000000000 & 0xffffffff;
            // Signature Comparison
            if (var0 == 0xd0e30db0) {
                // Pseudo-Code of deposit()
                // INJECTED CODE: Mutating Function -> Set Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x02;
                var1 = 0x03d2;
                deposit();
                // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;
                stop();
            } else if (var0 == 0x70a08231) {
                // Pseudo-Code of balanceOf(address)
                // Code Injection not necessary as function cannot mutate state
                if (msg.value) { revert(memory[0x00:0x00]); }
                var1 = 0x02cc;
                var2 = msg.data[0x04:0x24] & 0xffffffffffffffffffffffffffffffffffffffff;
                var2 = balanceOf(var2);
                var temp17 = memory[0x40:0x60];
                memory[temp17:temp17 + 0x20] = var2;
                var temp18 = memory[0x40:0x60];
                return memory[temp18:temp18 + (temp17 + 0x20) - temp18];
            } else { goto label_00AF; }
        }
    }
    function deposit() {
        memory[0x00:0x20] = msg.sender;
        memory[0x20:0x40] = 0x03;
        var temp0 = keccak256(memory[0x00:0x40]);
        storage[temp0] = storage[temp0] + msg.value;
        var temp1 = memory[0x40:0x60];
        memory[temp1:temp1 + 0x20] = msg.value;
        var temp2 = memory[0x40:0x60];
        log(memory[temp2:temp2 + (temp1 + 0x20) - temp2], [0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c, msg.sender]);
    }
    function balanceOf(var arg0) returns (var arg0) {
        memory[0x20:0x40] = 0x03;
        memory[0x00:0x20] = arg0;
        return storage[keccak256(memory[0x00:0x40])];
    }
}
The bytecode generator can further optimize the gas cost of the injection by performing the re-entrant flag assignment conditionally via a temporary variable that holds the value of storage[NON_REENTRANT_FLAG_OFFSET] read at the very start of the main block.
reentrant Function Case
This case would simply require the blanket check in the main code block showcased above to be relocated to all if-else bodies that do NOT have the reentrant modifier set. To illustrate how the reentrant keyword would be used, let us adjust the original MockWETH9 code to now permit re-entrancy solely for the deposit function:
contract MockWETH9 {
    event Deposit(address indexed dst, uint wad);
    mapping (address => uint) public balanceOf;

    // The fallback function must be declared external
    fallback() external payable {
        deposit();
    }

    // Explicitly opted in to re-entrancy via the proposed keyword
    function deposit() public payable reentrant {
        balanceOf[msg.sender] += msg.value;
        emit Deposit(msg.sender, msg.value);
    }
}
The bytecode generation would look like the following:
contract MockWETH9Bytecode {
    function main() {
        memory[0x40:0x60] = 0x60;
        // Fallback Function
        if (msg.data.length < 0x04) {
        label_00AF:
            // INJECTED CODE: Evaluate whether flag is already set and yield an error in case of re-entrancy, else set flag
            if (storage[NON_REENTRANT_FLAG_OFFSET] == 0x02) { revert(memory[0x00:0x00]); }
            else { storage[NON_REENTRANT_FLAG_OFFSET] = 0x02; }
            var var0 = 0x00b7;
            deposit();
            // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
            storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;
            stop();
        } else {
            // Extract Function Signature
            var0 = msg.data[0x00:0x20] / 0x0100000000000000000000000000000000000000000000000000000000 & 0xffffffff;
            // Signature Comparison
            if (var0 == 0xd0e30db0) {
                // Pseudo-Code of deposit()
                // INJECTED CODE: Mutating Function -> Set Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x02;
                var1 = 0x03d2;
                deposit();
                // INJECTED CODE: Mutating Function -> Reset Non-Reentrant Flag
                storage[NON_REENTRANT_FLAG_OFFSET] = 0x01;
                stop();
            } else if (var0 == 0x70a08231) {
                // Pseudo-Code of balanceOf(address)
                // INJECTED CODE: Evaluate whether flag is already set and yield an error in case of re-entrancy
                if (storage[NON_REENTRANT_FLAG_OFFSET] == 0x02) { revert(memory[0x00:0x00]); }
                if (msg.value) { revert(memory[0x00:0x00]); }
                var1 = 0x02cc;
                var2 = msg.data[0x04:0x24] & 0xffffffffffffffffffffffffffffffffffffffff;
                var2 = balanceOf(var2);
                var temp17 = memory[0x40:0x60];
                memory[temp17:temp17 + 0x20] = var2;
                var temp18 = memory[0x40:0x60];
                return memory[temp18:temp18 + (temp17 + 0x20) - temp18];
            } else { goto label_00AF; }
        }
    }
    function deposit() {
        memory[0x00:0x20] = msg.sender;
        memory[0x20:0x40] = 0x03;
        var temp0 = keccak256(memory[0x00:0x40]);
        storage[temp0] = storage[temp0] + msg.value;
        var temp1 = memory[0x40:0x60];
        memory[temp1:temp1 + 0x20] = msg.value;
        var temp2 = memory[0x40:0x60];
        log(memory[temp2:temp2 + (temp1 + 0x20) - temp2], [0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c, msg.sender]);
    }
    function balanceOf(var arg0) returns (var arg0) {
        memory[0x20:0x40] = 0x03;
        memory[0x00:0x20] = arg0;
        return storage[keccak256(memory[0x00:0x40])];
    }
}
A yet-to-be-defined behaviour arises if we declare the fallback function as reentrant when it invokes the deposit function, which has not been declared so. To ensure maximal compatibility with existing programming paradigms, we believe that the reentrant keyword should mark a function as re-entrant regardless of its internal call chain. As a result, if we have the following code:
contract MockWETH9 {
    event Deposit(address indexed dst, uint wad);
    mapping (address => uint) public balanceOf;

    // The fallback function must be declared external
    fallback() external payable reentrant {
        deposit();
    }

    function deposit() public payable {
        balanceOf[msg.sender] += msg.value;
        emit Deposit(msg.sender, msg.value);
    }
}
The function fallback will be re-entrant even if it invokes deposit, which we have not marked so. This ensures compatibility with libraries and smart contract dependencies, as otherwise users who deliberately wish to set their functions as reentrant would have to propagate that modifier through the full call chain. Additionally, given that the introduction of the keyword is a conscious and deliberate choice by the developer(s), we consider them to be fully aware of the implications of the reentrant keyword.
Advanced Usage
We are aware that re-entrancy is indeed desirable in a limited set of use cases, the most common being proxy implementations that follow a fragmented logic pattern and thus invoke themselves externally (i.e. Diamond standard cross-facet invocations). To accommodate these implementations, we propose the introduction of an argument to the reentrant keyword, similar to how arguments are present for the override keyword.
In detail, we advise the introduction of a single, optional address argument which marks a function as reentrant but solely for a particular address. In the case of the Diamond standard, for example, we can introduce the reentrant(address(this)) syntax to ensure that the facets of the Diamond can invoke each other without compromising the wider security guarantees of the system.
Additionally, this syntax permits complex smart contract systems that are meant to invoke one another mid-execution to still function securely post-0.9.X. Multi-address support can be introduced; however, it should be delayed until the development community illustrates a sufficient use case that cannot be solved by better programming practices.
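The address-scoped behaviour can be approximated today with a modifier, which may help when reasoning about the intended semantics. The sketch below is an assumption-laden illustration rather than the proposed implementation: reentrantFor, ReentrantCall, FacetSketch, and the plain storage variable standing in for the reserved flag slot are all hypothetical, and it uses an ordinary external self-call instead of an actual Diamond.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Emulation of `reentrant(allowed)` with a modifier. All names are hypothetical.
abstract contract ScopedGuardSketch {
    uint256 private constant NOT_ENTERED = 1;
    uint256 private constant ENTERED = 2;

    // Stand-in for the slot the compiler would reserve at NON_REENTRANT_FLAG_OFFSET.
    uint256 private _flag;

    error ReentrantCall(address caller);

    modifier reentrantFor(address allowed) {
        bool alreadyEntered = _flag == ENTERED;
        // A re-entrant call is accepted only when it comes from the allowed address.
        if (alreadyEntered && msg.sender != allowed) revert ReentrantCall(msg.sender);
        _flag = ENTERED;
        _;
        // Only the outermost frame clears the flag, so a permitted nested call
        // does not disarm the guard for every other caller.
        if (!alreadyEntered) _flag = NOT_ENTERED;
    }
}

contract FacetSketch is ScopedGuardSketch {
    uint256 public counter;

    function bump() external reentrantFor(address(this)) {
        counter += 1;
    }

    function bumpTwice() external reentrantFor(address(this)) {
        counter += 1;
        // External self-call: msg.sender is address(this), so the nested call passes.
        this.bump();
    }
}
```

Calling bumpTwice from an externally owned account increments counter twice, whereas a third-party contract that tried to call bump while bumpTwice was still executing would hit the ReentrantCall revert.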
Specification
While the specification of how the new reentrant keyword will operate can be extracted from the above text, we would also like to highlight which sections of the official Solidity documentation would require adjustments to accommodate this change. A reference specification can be produced upon request for all chapters outlined below should this feature request gain traction.
Contracts Section
A new "Re-Entrancy" chapter would need to be introduced that describes how re-entrancy behaves post-0.9.X (in that it is prohibited) and how developers can make use of the reentrant keyword to bypass this security measure. A warning chapter should be introduced as well to ensure that developers are well aware of the security guarantees they are nullifying by using the keyword.
Cheatsheet
The Modifiers section would need to be expanded with the new reentrant keyword and how it is meant to be used.
Language Grammar
A rule identical to that of the override specifier would need to be introduced, specifying how the new reentrant keyword is meant to be parsed when Solidity code is read by machines.
Layout of State Variables In Storage
This chapter should document the newly reserved NON_REENTRANT_FLAG_OFFSET as part of the specification. Overwriting the storage area of the flag via overlap or a storage slot hash collision, due to the usage of upgradeable patterns and standards such as EIP-1967, should be of negligible concern, with a likelihood akin to that of general hash collisions.
Solidity v0.9.0 Breaking Changes
This chapter should, as its name indicates, highlight the breaking change in how re-entrancy behaves and how contracts compiled with pragma solidity ^0.9.0 will have re-entrancy protections enforced by default.
Backwards Compatibility
As the code generation's behaviour will change to enforce re-entrancy protection by default, this is a breaking change requiring a minor version bump (0.8.X to 0.9.0, since Solidity ships breaking changes in its pre-1.0 minor releases) to prompt developers to get accustomed to the new security measure.
General Concerns
Security Bypass
Given that the change illustrated by this issue would rely on the storage space of the smart contract, developers will be able to explicitly unset the re-entrant flag via assembly blocks that access the low-level nature of the EVM and write to the NON_REENTRANT_FLAG_OFFSET storage slot. Such code is considered malicious by nature and should be flagged by auditors as well as by any static analyzers that aid them.
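For reference, the kind of write that auditors and static analyzers would need to look for could resemble the sketch below. The contract name, the function name, and the concrete slot value are hypothetical placeholders, since the proposal leaves the actual offset to the compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrates the bypass pattern described above. All names are hypothetical.
contract GuardBypassSketch {
    // Placeholder value: the real offset would be defined by the compiler.
    uint256 internal constant NON_REENTRANT_FLAG_OFFSET =
        uint256(keccak256("sketch.non.reentrant.flag")) | (1 << 255);

    /// Raw storage write that clears the guard in the middle of a call.
    /// Any sstore targeting the reserved slot deserves scrutiny in review.
    function disarmGuard() internal {
        // Copy the constant into a local so inline assembly can reference it.
        uint256 slot = NON_REENTRANT_FLAG_OFFSET;
        assembly {
            // 0x01 is the "not entered" value used by the pseudo-code in this issue.
            sstore(slot, 1)
        }
    }
}
```

A detector would likely have to treat any sstore with a non-constant slot argument as a potential write to the reserved slot, since the target can be computed at runtime.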
Bytecode & Gas Increase
The issue attempts to explain the proposed change to the Solidity language in a way that minimizes the gas footprint as well as the bytecode size impact. Nevertheless, both of these numbers will increase for all contracts compiled with 0.9.X and beyond.
We believe the security guarantees achieved by this change to be worth the extra units of gas and size, as evidenced by the developer community itself via the common usage of libraries implementing this trait, such as ReentrancyGuard by OpenZeppelin.
Resources
The pseudo-code of the bytecode was generated by the ethervm decompiler and was subsequently edited by hand to illustrate the smaller subset of MockWETH9 as well as the adjustments that reentrant and post-0.9.X compilation would cause.