
Substrate vs Ethereum: Differences and Similarities

Authors: Sergey Boogerwooger & Alexey Naberezhniy
Security researchers at MixBytes
Intro
Over the past few years, MixBytes has taken part in various projects within the Polkadot and Ethereum ecosystems as security auditors, developers, and testers. We have audited and developed numerous Ethereum/Solidity contracts (which you can find on our GitHub), and we have reviewed and developed Polkadot/Substrate code (see our blog) since the earliest versions of Polkadot. We still work with Substrate today (you can read our article about the security of Substrate pallets). In this article, we aim to share our knowledge about both ecosystems.

We will discuss smart-contract development patterns in two seemingly very different environments: Ethereum/EVM and Polkadot's Substrate/WASM. This is interesting because the environments appear to have many differences at first glance, including different languages and virtual machines. Under the hood, however, they have many similarities caused by strict requirements stemming from the common design and security restrictions of public blockchains. Let's dive a little deeper.
Common blockchain limitations
No matter which blockchain you use, the same restriction applies: execution must be strictly deterministic. Any function that performs a state transition must do so deterministically, meaning that given the same blockchain state and inputs, the function must produce the same state changes. Therefore, the internals behind the layout of storage variables, arithmetic operations, etc., are similar in any well-designed, secure blockchain virtual machine.

Another strict requirement for blockchains is the bounded complexity of any state transition. The Turing-completeness of Solidity and Rust applies only to the language constructs. Any user-triggered execution in a blockchain is strictly bounded by computation units ("gas" in Ethereum and "weight" in Substrate). Without such limits, the network would be vulnerable to DoS attacks, in which users make nodes execute code that is "too heavy" and cause the entire blockchain to get stuck.
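To make the idea of bounded execution concrete, here is a minimal sketch in plain Rust. It does not reproduce how the EVM or Substrate meter execution; `GasMeter`, `OutOfGas`, and the cost of one unit per loop iteration are hypothetical names and numbers chosen for illustration.

```rust
/// Error returned when the budget is exhausted (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct OutOfGas;

/// Hypothetical meter: a fixed budget that every operation must pay from.
pub struct GasMeter {
    remaining: u64,
}

impl GasMeter {
    pub fn new(limit: u64) -> Self {
        GasMeter { remaining: limit }
    }

    /// Deduct `cost`; if it cannot be paid, fail and leave the budget as is.
    pub fn charge(&mut self, cost: u64) -> Result<(), OutOfGas> {
        self.remaining = self.remaining.checked_sub(cost).ok_or(OutOfGas)?;
        Ok(())
    }

    pub fn remaining(&self) -> u64 {
        self.remaining
    }
}

/// Sums 1..=n, paying one unit per loop iteration (illustrative cost).
/// However large `n` is, the loop stops once the budget runs out.
pub fn metered_sum(n: u64, meter: &mut GasMeter) -> Result<u64, OutOfGas> {
    let mut total: u64 = 0;
    for i in 1..=n {
        meter.charge(1)?;
        total = total.wrapping_add(i);
    }
    Ok(total)
}
```

With a budget of 3 units, `metered_sum(1_000_000, ...)` stops after three iterations with `Err(OutOfGas)`, which is the property that protects nodes from "too heavy" calls.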

At present, the main smart-contract development patterns are very similar, regardless of the language and virtual machine you use. Contracts consist of compact, restricted, user-triggered functions that keep the state minimal, avoid storing intermediate data, and rely mostly on key-value operations.
WASM/Rust (Substrate) vs. EVM/Solidity (Ethereum)
Both combinations have their advantages and drawbacks. Other combinations, Rust+EVM (example: SputnikVM) and Solidity+WASM (example: EIP48), are also used in different blockchain implementations. Rust is well suited to smart contracts on any blockchain: it offers strict types, memory management, highly deterministic behavior, and many other important low-level features. Solidity, on the other hand, is designed specifically for the EVM and has evolved into a strict, efficient, and safe language.

WASM offers many possibilities and is supported by numerous languages, enabling super-flexible functionality. The more minimalistic EVM has already paid for its own "super-flexible functionality" in many hacks and has since mitigated many attack vectors.

An important point about WASM is that WASM in blockchains is not the "full" WASM that can be used in your browser; it is restricted to avoid non-deterministic behavior. All blockchains that use WASM ship their own compilers and add resource counters, so only a subset of WASM is actually used in smart contracts, which brings blockchain WASM one step closer to the EVM.

Let's see some examples:
Example of Substrate Rust/WASM code (some parts removed)
Example of Solidity/EVM code (some parts removed)
While the Solidity code looks much simpler, any Solidity developer knows what hides behind the simplicity of balances[to] += amount: a map operation that includes hashing, storage slot calculation, and overflow checking in "+=". In Substrate, all of these steps must be written explicitly.

In Rust, the syntax and compiler dictate explicit behavior. For example, if a call returns a Result, the compiler flags any caller that ignores it. In Solidity, by contrast, you can write transfer(from, to, amount); even if the "transfer()" function returns a result (such unchecked transfers have led to many security bugs in DeFi). In Rust you are pushed to handle the result and its possible errors, which mitigates attack vectors like "unchecked CALL return value" (SWC-104) by design.
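As an illustration of this explicitness, the following sketch models a token transfer in plain Rust. It is not the Substrate code from the example above: `Ledger`, `TransferError`, and the in-memory `HashMap` standing in for on-chain storage are assumptions made for this sketch.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum TransferError {
    InsufficientBalance,
    Overflow,
}

/// Hypothetical in-memory ledger standing in for an on-chain storage map.
pub struct Ledger {
    balances: HashMap<String, u128>,
}

impl Ledger {
    pub fn new() -> Self {
        Ledger { balances: HashMap::new() }
    }

    pub fn set_balance(&mut self, who: &str, amount: u128) {
        self.balances.insert(who.to_string(), amount);
    }

    /// A missing entry reads as zero, like an untouched storage slot.
    pub fn balance_of(&self, who: &str) -> u128 {
        self.balances.get(who).copied().unwrap_or(0)
    }

    /// Every step hidden behind `balances[to] += amount` is spelled out:
    /// read, checked arithmetic, write. Nothing is stored unless all checks pass.
    pub fn transfer(&mut self, from: &str, to: &str, amount: u128) -> Result<(), TransferError> {
        let new_from = self
            .balance_of(from)
            .checked_sub(amount)
            .ok_or(TransferError::InsufficientBalance)?;
        if from == to {
            return Ok(()); // self-transfer: nothing changes
        }
        let new_to = self
            .balance_of(to)
            .checked_add(amount)
            .ok_or(TransferError::Overflow)?;
        self.balances.insert(from.to_string(), new_from);
        self.balances.insert(to.to_string(), new_to);
        Ok(())
    }
}
```

Because `transfer` returns a `Result`, writing `ledger.transfer("alice", "bob", 30);` and discarding the value triggers the compiler's `unused_must_use` warning, which a project can turn into a hard error.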

The important Substrate code difference in the presented functions is the line:

let sender = ensure_signed(_origin)?

It checks that the call originates from a signed transaction and returns the sender's account. This check is used almost everywhere, while in Solidity/EVM it is done by design. If you make a state-changing call to the EVM, the signature of the transaction sender has already been checked; there are no unsigned calls in Ethereum (apart from contract-to-contract calls). In Substrate you can write functions that accept calls from users without a direct single signature, which allows parachains to support multiple signature schemes and different algorithms, or to restrict calls in other ways. At the same time, this can open up many access control attack vectors.
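The origin check can be modeled in a few lines of plain Rust. This is a simplified sketch, not the real `frame_system` implementation: the `Origin` enum, the `String` account type, and the `set_note` function are assumptions for illustration, and the `ensure_signed` below only mimics the shape of the real helper.

```rust
/// Simplified model of where a call comes from (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Origin {
    /// A transaction signed by this account.
    Signed(String),
    /// A privileged call, for example one approved by governance.
    Root,
    /// An unsigned call.
    None,
}

#[derive(Debug, PartialEq)]
pub struct BadOrigin;

/// Accepts only signed origins and returns the sender's account.
pub fn ensure_signed(origin: Origin) -> Result<String, BadOrigin> {
    match origin {
        Origin::Signed(who) => Ok(who),
        _ => Err(BadOrigin),
    }
}

/// Hypothetical user-facing function: it rejects the call before doing any
/// work unless the origin is signed.
pub fn set_note(origin: Origin, note: &str) -> Result<String, BadOrigin> {
    let sender = ensure_signed(origin)?;
    Ok(format!("{}: {}", sender, note))
}
```

A function that forgets this first line, or matches on the wrong origin variant, accepts calls it should reject, which is the access control risk described above.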

Both EVM and Substrate's WASM have events, which play the same role in both blockchains. The underlying data structures and insertion algorithms in both virtual machines are also similar, because the result of applying a transaction in either blockchain is a set of changed key-value pairs in the state database. Let's proceed to the storage.
Storage values
Storage operations in Solidity/EVM and Rust/WASM environments:
Example Substrate storage vars declaration:
Example Solidity storage vars declaration:
An important similarity between the data structures behind blockchain storage variables (especially dynamic types: mappings, arrays) is that values are addressed using hashes. Mappings are the most used data type in blockchains and are implemented similarly in both ecosystems. Storage keys in the state trie are built in a similar way too. In the EVM, the storage key is built from "contract address->storage_slot->hashes of keys", while Substrate uses "module->storagename->hashes of keys". Both environments use a Merkle Patricia Tree under the hood to store values, because it provides efficient prefix-based access.
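The two key layouts can be sketched as follows. This is only a structural illustration: `toy_hash` is a stand-in built on Rust's `DefaultHasher` and is not keccak256 or any hash actually used by either chain, and the function names and 8-byte outputs are assumptions for this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a real hash function. NOT cryptographic; illustration only.
fn toy_hash(data: &[u8]) -> [u8; 8] {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish().to_be_bytes()
}

/// Substrate-like layout: hash(module) ++ hash(storage name) ++ hash(map key).
/// All entries of one map share the first two segments as a common prefix.
pub fn substrate_like_key(module: &str, storage: &str, map_key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(24);
    out.extend_from_slice(&toy_hash(module.as_bytes()));
    out.extend_from_slice(&toy_hash(storage.as_bytes()));
    out.extend_from_slice(&toy_hash(map_key));
    out
}

/// EVM-like layout: inside one contract's storage, a mapping entry lives at
/// hash(map key ++ slot index of the mapping variable).
pub fn evm_like_slot(map_key: &[u8], slot_index: u64) -> [u8; 8] {
    let mut preimage = map_key.to_vec();
    preimage.extend_from_slice(&slot_index.to_be_bytes());
    toy_hash(&preimage)
}
```

In the Substrate-like layout the shared prefix is what makes it cheap to address "everything under this module and storage item" in a prefix tree.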

Another important similarity is dynamic arrays. In blockchain storage, dynamic arrays of non-fixed size don't differ much from mappings, unlike in ordinary non-blockchain code, where the two are distinct structures. The same determinism restrictions require arrays in both EVM and Substrate to be built internally as mappings, where the array indices are the keys. That's why ordered, easily mutable, iterable arrays and maps are uncommon in both EVM and Substrate. You can create an IterableStorageMap in Substrate, but it's better to avoid it.
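A rough sketch of an array built on top of a flat key-value store shows why ordered mutation is awkward. `KvArray`, its string key format, and the `u64` element type are hypothetical choices for this example and do not mirror either VM's actual encoding.

```rust
use std::collections::BTreeMap;

/// Hypothetical array stored in a flat key-value state: one entry for the
/// length and one entry per index.
pub struct KvArray {
    store: BTreeMap<String, u64>,
    name: String,
}

impl KvArray {
    pub fn new(name: &str) -> Self {
        KvArray { store: BTreeMap::new(), name: name.to_string() }
    }

    fn len_key(&self) -> String {
        format!("{}/len", self.name)
    }

    fn item_key(&self, index: u64) -> String {
        format!("{}/{}", self.name, index)
    }

    pub fn len(&self) -> u64 {
        let key = self.len_key();
        self.store.get(&key).copied().unwrap_or(0)
    }

    pub fn get(&self, index: u64) -> Option<u64> {
        if index >= self.len() {
            return None;
        }
        let key = self.item_key(index);
        self.store.get(&key).copied()
    }

    /// Two writes: the new element and the updated length.
    pub fn push(&mut self, value: u64) {
        let n = self.len();
        let item = self.item_key(n);
        let len = self.len_key();
        self.store.insert(item, value);
        self.store.insert(len, n + 1);
    }

    /// Removal that keeps the number of writes constant by moving the last
    /// element into the hole. Preserving order would rewrite every later key.
    pub fn swap_remove(&mut self, index: u64) -> Option<u64> {
        let n = self.len();
        if index >= n {
            return None;
        }
        let last_key = self.item_key(n - 1);
        let last = self.store.remove(&last_key)?;
        let removed = if index == n - 1 {
            last
        } else {
            let hole = self.item_key(index);
            self.store.insert(hole, last)?
        };
        let len = self.len_key();
        self.store.insert(len, n - 1);
        Some(removed)
    }
}
```

Every element is just another key, so the cost of an operation is the number of keys it touches; that is why order-preserving deletion or full iteration is usually avoided on-chain.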

One interesting difference is that in a Substrate mapping you can choose the hashing function used to build the path to the value under a given key. This lets you trade cryptographic security for performance where that is acceptable. In contrast, the EVM uses only the keccak256() function.
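The trade-off can be sketched with a pluggable hasher. The trait and type names below (`KeyHasher`, `OpaqueHash`, `IdentityHash`) are hypothetical and are not the `frame_support` hasher types; `OpaqueHash` uses Rust's non-cryptographic `DefaultHasher` purely as a stand-in for "the caller cannot choose the output".

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A pluggable way to turn a map key into part of a storage path.
pub trait KeyHasher {
    fn hash_key(key: &[u8]) -> Vec<u8>;
}

/// Stand-in for an opaque hash (NOT cryptographic, illustration only).
pub struct OpaqueHash;

/// No hashing at all: cheapest, but the caller fully controls the path.
pub struct IdentityHash;

impl KeyHasher for OpaqueHash {
    fn hash_key(key: &[u8]) -> Vec<u8> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish().to_be_bytes().to_vec()
    }
}

impl KeyHasher for IdentityHash {
    fn hash_key(key: &[u8]) -> Vec<u8> {
        key.to_vec()
    }
}

/// Path of a map entry: the map's prefix followed by the hashed key.
pub fn storage_path<H: KeyHasher>(prefix: &[u8], key: &[u8]) -> Vec<u8> {
    let mut path = prefix.to_vec();
    path.extend_from_slice(&H::hash_key(key));
    path
}

/// Number of leading bytes two paths share, i.e. how deep they travel
/// together in a prefix tree.
pub fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}
```

With `IdentityHash`, a user who picks keys such as `aaaa1` and `aaaa2` decides how long the shared path is and can deliberately build deep, unbalanced branches; with an opaque hash the user has no such control, at the price of hashing every key.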

An important difference between Substrate's runtime storage operations and the EVM is "transactional layers", controlled by the macros #[transactional] and #[without_transactional] (docs). In the EVM, reverting a transaction reverts all of its storage changes, while in Substrate you can:

  • add the #[without_transactional] macro, so that any changes to storage are committed and saved even in case of an error
  • add the #[transactional] macro, which adds another "transactional" layer and makes a group of storage operations atomic (apply all or nothing), making runtime storage handling more flexible

In new versions of Substrate, #[transactional] is turned "on" for any extrinsic by default.
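The idea of nested transactional layers can be sketched with a stack of write overlays. This is a simplified model in plain Rust, not Substrate's storage implementation: `LayeredStorage`, `with_layer`, and the `u64` value type are hypothetical, and deletions are left out for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical storage with a stack of uncommitted write layers.
pub struct LayeredStorage {
    committed: HashMap<String, u64>,
    layers: Vec<HashMap<String, u64>>,
}

impl LayeredStorage {
    pub fn new() -> Self {
        LayeredStorage { committed: HashMap::new(), layers: Vec::new() }
    }

    /// Reads see the newest layer first, then older layers, then committed state.
    pub fn get(&self, key: &str) -> Option<u64> {
        for layer in self.layers.iter().rev() {
            if let Some(v) = layer.get(key) {
                return Some(*v);
            }
        }
        self.committed.get(key).copied()
    }

    /// Writes go to the top layer, or straight to committed state if no
    /// layer is open (the "non-transactional" case).
    pub fn set(&mut self, key: &str, value: u64) {
        match self.layers.last_mut() {
            Some(top) => {
                top.insert(key.to_string(), value);
            }
            None => {
                self.committed.insert(key.to_string(), value);
            }
        }
    }

    /// Runs `f` inside a new layer: its writes are merged into the parent
    /// on `Ok` and discarded on `Err` (all or nothing).
    pub fn with_layer<T, E>(
        &mut self,
        f: impl FnOnce(&mut Self) -> Result<T, E>,
    ) -> Result<T, E> {
        self.layers.push(HashMap::new());
        let result = f(self);
        let top = self.layers.pop().unwrap_or_default();
        if result.is_ok() {
            match self.layers.last_mut() {
                Some(parent) => parent.extend(top),
                None => self.committed.extend(top),
            }
        }
        result
    }
}
```

Calls to `with_layer` can be nested, so an inner group of writes can fail and be discarded while the outer group still commits; writes made with no layer open persist even if the surrounding function later returns an error.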

For a more detailed explanation of Substrate's storage operations, you can refer to these slides which cover many points about storage operations in Substrate. Additionally, a good article on Ethereum data structures can be found here.
Contracts/modules composition
A major difference between Substrate and EVM is the composition of contracts/modules and the way updates are applied to their code. Ethereum contracts have a more "dispersed" structure, with many independent contracts deployed separately. In Substrate, we have one monolithic runtime composed of multiple modules assembled into a single bytecode that can be changed only by the quorum of blockchain validators. However, modules in Substrate have their own isolated storage, so from a security standpoint, the storage composition in both environments looks somewhat similar.

Code updates in Substrate and Ethereum are both similar and different. In the case of Substrate, validators must agree to vote for changes, whereas Ethereum contracts are immutable by default. However, in practice, large Ethereum projects often implement contract upgradeability, sometimes driven by stakeholders' voting, bringing the situation closer to that of Substrate. Additionally, until SELFDESTRUCT is removed from Ethereum, bytecode at certain ETH addresses can be mutated using a CREATE2->CREATE->SELFDESTRUCT->SELFDESTRUCT chain (an article on this topic can be found here). Therefore, the ability to change smart-contract code currently exists in both environments.

Ethereum deployments and code/storage updates are more complicated and "fragmented", but a failure can break only the project itself without affecting the entire network. Substrate validators can fix almost everything, but if the validators' quorum works poorly and is unable to react to hacks and bugs, the cost of errors is incredibly high and may affect the entire network.

It may seem that in Substrate, developers are restricted from integrating with external projects because they need to implement all side projects in their network's runtime. However, Polkadot offers a key feature - cross-chain messaging (XCM) - which natively works in any parachain that wants to use it. XCM is a built-in "messaging bridge" between parachains that allows sending messages from one parachain to another using the main relay chain consensus. It enables accounts in one parachain to interact with accounts in other parachains, send assets, or implement more complicated logic (we tested it on the first XCM versions, and it worked well). Currently, XCM is actively used in Polkadot.
Gas, weight and transaction fees
The most significant difference between transaction fees/weights in Ethereum and Substrate is that in Ethereum, you cannot easily and directly control how much gas a call will spend, while in Substrate, the analog of gas, called "weight", is partially controlled by the developer. This concept is described in more detail here (be sure to check out the example from the Acala parachain, where different functions are "weighted" using constant coefficients and amounts of database reads/writes). In Substrate, a developer has much more control over transaction costs, and network validators can change user fees if something bad happens (or cause that "something bad" themselves).

While this may appear to be a significant difference, the truth is that Ethereum's most expensive operations are SLOAD/SSTORE, the state database reads/writes, which make up the majority of gas used in a transaction. Similarly, weight calculations in Substrate are built from RocksDbWeight::get().reads() and RocksDbWeight::get().writes(). In essence, both environments handle gas/weight in a similar manner.
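A weight formula of this kind can be sketched as a constant plus per-read and per-write costs. The `DbWeight` struct below only imitates the shape of the `reads()`/`writes()` helpers mentioned above; the numbers in `EXAMPLE_DB` and the `BASE` constant are illustrative assumptions, not values from any real runtime.

```rust
/// Hypothetical per-operation database costs, in abstract weight units.
#[derive(Clone, Copy)]
pub struct DbWeight {
    pub read: u64,
    pub write: u64,
}

impl DbWeight {
    /// Cost of `n` storage reads.
    pub fn reads(&self, n: u64) -> u64 {
        self.read.saturating_mul(n)
    }

    /// Cost of `n` storage writes.
    pub fn writes(&self, n: u64) -> u64 {
        self.write.saturating_mul(n)
    }
}

/// Illustrative numbers only.
pub const EXAMPLE_DB: DbWeight = DbWeight { read: 25_000, write: 100_000 };

/// Weight of a hypothetical transfer call: a fixed execution cost plus two
/// reads and two writes (sender and receiver balances).
pub fn transfer_weight(db: &DbWeight) -> u64 {
    const BASE: u64 = 50_000; // assumed constant, e.g. obtained from benchmarking
    BASE.saturating_add(db.reads(2)).saturating_add(db.writes(2))
}
```

In this sketch the storage accesses contribute 250,000 of the 300,000 units, which mirrors the observation that database reads and writes dominate the cost in both environments.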

Now, let's take a look at how the final transaction fee is calculated:
In Ethereum

inclusion_fee = gas_spent * (base_fee + priority_fee)
(explained here)
In Substrate

inclusion_fee = base_fee + length_fee + [targeted_fee_adjustment * weight_fee];
final_fee = inclusion_fee + tip;
(explained here)
Although the formulas may seem different at first glance, after EIP-1559 in Ethereum the two blockchains became more similar in how they charge transaction fees. In Ethereum, "length_fee" is already included in the gas spent, so both schemes split the fee into two similar parts:

  • base_fee:
    • mandatory fee, calculated from the congestion of the network (it moves up when the network is overloaded and down when it is underloaded)
  • priority_fee/tips:
    • optional fee for block producers, raising the chances of moving the transaction up in the inclusion queue
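Both formulas above can be written out as a short sketch. The function and field names are assumptions for illustration, all amounts are plain integers in the chain's smallest unit, and `targeted_fee_adjustment` is modeled as a parts-per-million multiplier to avoid floating point, which is a simplification of this sketch rather than how Substrate represents it.

```rust
/// Ethereum after EIP-1559: fee = gas_spent * (base_fee + priority_fee).
/// Returns None on arithmetic overflow.
pub fn ethereum_fee(gas_spent: u128, base_fee: u128, priority_fee: u128) -> Option<u128> {
    base_fee.checked_add(priority_fee)?.checked_mul(gas_spent)
}

/// Inputs of the Substrate formula (hypothetical struct).
pub struct SubstrateFeeParts {
    pub base_fee: u128,
    pub length_fee: u128,
    pub weight_fee: u128,
    /// targeted_fee_adjustment as parts per million (1_000_000 = x1.0).
    pub adjustment_ppm: u128,
    pub tip: u128,
}

/// inclusion_fee = base_fee + length_fee + targeted_fee_adjustment * weight_fee
/// final_fee     = inclusion_fee + tip
pub fn substrate_fee(p: &SubstrateFeeParts) -> Option<u128> {
    let adjusted_weight_fee = p.weight_fee.checked_mul(p.adjustment_ppm)? / 1_000_000;
    let inclusion_fee = p
        .base_fee
        .checked_add(p.length_fee)?
        .checked_add(adjusted_weight_fee)?;
    inclusion_fee.checked_add(p.tip)
}
```

For example, 21,000 gas at a base fee of 30 and a priority fee of 2 costs 672,000 units, while a Substrate call with base 100, length fee 20, weight fee 1,000, a x1.5 adjustment, and a tip of 5 costs 1,625 units.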

While the calculations of "gas" and "weight" are very different because two different instruction sets and VMs are used, the "resource counters" used to "weight" each instruction are similar. All four types of resources (CPU, memory, storage, network/bandwidth) must be considered to keep the network secure.
Packed calls
There is another important difference between Substrate's runtime and Ethereum smart contracts: the ability to "pack" several calls to different contracts into one atomic transaction. In Ethereum this is a very common pattern; in many cases a user deploys their own contract that performs several DeFi operations in one call. The best example is the use of flashloans. In Ethereum, a user's contract, for example:

  • calls some contract, the "flashloan giver" (for example, the Aave Lending Pool contract)
    • passing callback data along with the call
  • the Aave contract gives the tokens to the user's contract address and calls the callback
  • the user's contract uses the received tokens to:
    • perform one call to some other contract (DEX, lending protocol)
    • perform a second call to some other contract (DEX, lending protocol)
  • the user's contract returns the flashloaned tokens to the Aave contract
  • the Aave contract checks whether all tokens were returned
    • and reverts the whole transaction if something is wrong

All these steps are packed into one atomic call. In Substrate such a construction doesn't work. It is possible to use utility.batchAll to pack several extrinsics into one sequential and atomic call, but these extrinsics (calls) execute in isolation from one another. This removes many attack vectors that are common in Ethereum, but it also rules out constructions like flashloans, which give market makers super-flexible trading solutions. With flashloans a trader can start with a single token on their balance (for example WETH), borrow almost any token just for the duration of a single transaction, perform the trade, and convert the funds back to WETH, adding a "slippage" check as well (the transaction is either profitable or reverted).
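The "lend, call back, check, revert" flow can be modeled in plain Rust with a closure as the callback and a state snapshot as the revert mechanism. This is a toy model and not Aave's interface: `State`, `LoanError`, `flash_loan`, and the single-number balances are assumptions for illustration, and the `trade` closure stands in for the user's calls to other contracts.

```rust
#[derive(Debug, PartialEq)]
pub enum LoanError {
    InsufficientLiquidity,
    NotRepaid,
    Overflow,
}

/// Hypothetical world state: pool liquidity and the trader's own balance.
#[derive(Clone, Debug, PartialEq)]
pub struct State {
    pub pool: u128,
    pub trader: u128,
}

/// Runs the loan; any error leaves `state` half-updated, so the caller
/// (`flash_loan`) is responsible for restoring the snapshot.
fn run<F: FnOnce(u128) -> u128>(state: &mut State, amount: u128, trade: F) -> Result<(), LoanError> {
    state.pool = state
        .pool
        .checked_sub(amount)
        .ok_or(LoanError::InsufficientLiquidity)?;
    // The callback receives the borrowed tokens and returns what it holds
    // after its trades.
    let proceeds = trade(amount);
    // The loan must be covered in full; anything above it is profit.
    let profit = proceeds.checked_sub(amount).ok_or(LoanError::NotRepaid)?;
    state.pool = state.pool.checked_add(amount).ok_or(LoanError::Overflow)?;
    state.trader = state.trader.checked_add(profit).ok_or(LoanError::Overflow)?;
    Ok(())
}

/// Atomic wrapper: on any error the whole state is restored, which models
/// the EVM reverting the entire transaction.
pub fn flash_loan<F: FnOnce(u128) -> u128>(state: &mut State, amount: u128, trade: F) -> Result<(), LoanError> {
    let snapshot = state.clone();
    let result = run(state, amount, trade);
    if result.is_err() {
        *state = snapshot;
    }
    result
}
```

A profitable callback such as `|x| x + 25` leaves the pool unchanged and the trader 25 units richer, while a losing one such as `|x| x - 1` returns `Err(LoanError::NotRepaid)` and leaves the state exactly as it was before the call.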

While Substrate doesn't have a native packed-calls design like Ethereum's, the Substrate runtime can run an EVM inside, allowing almost full duplication of EVM functionality. Substrate has a public pallet_evm, which fully emulates the EVM using SputnikVM, but much more native EVM support is possible. An excellent example is Moonbeam.network, which allows users to interact with Moonbeam using the same wallets, client-side software, address format, deployment logic, and more, as in Ethereum. This network also supports XCM, enabling interaction with other parachains.

It is critical to note the word "almost" in "almost full duplication of EVM functionality" when considering security at the boundary between the EVM and lower-level implementations. Applying changes on two levels may lead to unsynchronized data, and this part of the code requires careful inspection. These problems typically arise in patterns of contract deployment, destruction, operations with account nonces, and relationships between native blockchain tokens/functions and their representation on the EVM side. This applies not only to "EVM-on-[your virtual machine]" designs, but also to interactions between L1 and L2 blockchains. Good examples of such problems are the Optimism Infinite Money Duplication Bugfix and the Avalanche precompile bug.
Conclusion
There cannot be a simple "better/worse" comparison between Ethereum-based and Substrate-based networks. Both environments have numerous large and small differences that work better or worse in different cases. Substrate allows super-flexible blockchain design, with complicated computations and rich control over the on-chain logic on many layers without losing security. Conversely, Ethereum allows for the creation of massively decentralized projects on top of a highly secure network, with many ready-to-use solutions and best practices that have survived over the years.

For a skilled smart-contract developer in either ecosystem, switching to the other is not a significant issue because many approaches are very similar, even if the tooling is very different. The primary data structures work in a similar way, and all important properties of the code are the same. Therefore, don't miss out on parallel technologies as they may enable you to build something great.
Who is MixBytes?
MixBytes is a team of expert blockchain auditors and security researchers specializing in providing comprehensive smart contract audits and technical advisory services for EVM-compatible and Substrate-based projects. Join us on Twitter to stay up-to-date with the latest industry trends and insights.