Contact Us

Hire Team Stars

Hire Team Reality

Hire Dream Team

Audit Application Form

Author: Sergey Boogerwooger

Security researcher at MixBytes

Security researcher at MixBytes

Introduction

We will analyze different parts of the protocol and we'll do our best to show where an auditor's gaze should stop and which parts are security critical. Tornado Cash was chosen as an example because its codebase became the basis for many other projects that use "set-membership" proofs and it has fully proven itself as a real-world ZK application. Also, Tornado is not too large, so it can be observed in the article, and it is not too small, using multiple important SNARK-friendly primitives.

Contracts logic audit

Also, an important thing is the usage of the nullifierHash instead of a "pure" nullifier. It's made to mitigate the griefing of the withdrawer, when an attacker sees user's nullifier, frontruns the withdrawal, creates their own deposit with the same nullifier, withdraws it and so, denies the withdrawal of the original user.

You can read about the whole algorithm in many existing articles, we don't need to dive deeply into it, our goal is to concentrate on the ZK-related part.

Analyzing the ZK-related project structure

How ZK proofs are used

The main potential attack vectors to be considered are:

forging the incorrect proof

in this case an attacker can use some "wrong" values (i.e. values outside allowed ranges, zero values) producing a correct proof. These values later can be used after verification by the protocol

the reuse of existing proofs

in this case, an attacker can take a previously presented proof, change the values that are passed as public inputs keeping this proof valid, and re-submit the proof with the changed values

zero-knowledge abuse

in this case information from the public values or the proof can be used to recover private values and disclose them or to make some suggestions about their contents

Keeping in mind these vectors, we need to determine the values used as inputs and outputs of the proof system and analyze these vectors for each of them. To be more specific, for each input value(s) and proof(s) we need to check:

forging

the value can be generated by a malicious attacker in a way, allowing to generate a valid proof

replay

the same value or proof can be reused a second time, leading to a critical action to be performed

zero-knowledge abuse

the private value can be reconstructed or some of its critical properties can be suggested using public data

With all above, let's move to a higher-level flow analysis.

Analyzing the high-level flow

Deposit() accepts a single bytes32 _commitment value. This commitment is added to the Merkle Tree, its value is stored in the Deposit emitted event, and this commitment is also stored in the contract to avoid duplicate use of the same commitment. The fixed amount of ETH is accepted with the call and added to the contract balance.

The commitment values in events are used by users to reconstruct the Merkle tree, containing their commitment later. Let's remember this fact because these are the values that are somehow controlled by the attacker (any user can add any commitment to the tree).

Withdraw() is the most critical part of logic where ZK proof is used and as the only part of the code that transfers ETH outside, it's the main entry point of possible attacks. We see input parameters:

```
function withdraw(
bytes calldata _proof,
bytes32 _root,
bytes32 _nullifierHash,
address payable _recipient,
address payable _relayer,
uint256 _fee,
uint256 _refund
) external payable nonReentrant {
```

The first three parameters are the most important for us: _proof, _root, _nullifierHash - it's the ZK proof, that proves that the user knows the secret value that is being concatenated with the nullifier and hashed, becomes one of the leaves of the Merkle tree.

The _proof allows to verify if this secret leaf belongs to our Merkle tree but not only this fact. This proof should NOT be reusable with other withdraw parameters because in case when, for example, _recipient is not included in the proof (the proof is valid for different _recipient values), an attacker can steal the proof and use it with their own _recipient address. Let's keep it in mind.

Now, three Solidity checks here:

```
require(_fee <= denomination, "Fee exceeds transfer value");
require(!nullifierHashes[_nullifierHash], "The note has been already spent");
require(isKnownRoot(_root), "Cannot find your merkle root"); // Make sure to use a recent one
```

Another important thing to mention around this check is that arithmetic overflows are handled differently in EVM and inside the circuit (because values are processed over different fields), and some value, that doesn't lead to an overflow in EVM, overflows inside the circuit (or vice versa). So, the range check of the _fee parameter is important here not only to protect the Solidity logic.

The second one is the _nullifierHash check which is used to avoid secondary use of the same commitment. Let me recall how the nullifier is used: a user concatenates secret and nullifier and hashes it using the Pedersen hashing, receiving the commitment. The code from the test is here:

```
function generateDeposit() {
let deposit = {
secret: rbigint(31),
nullifier: rbigint(31),
}
const preimage = Buffer.concat([deposit.nullifier.leInt2Buff(31), deposit.secret.leInt2Buff(31)])
deposit.commitment = pedersenHash(preimage)
return deposit
}
```

The last require(isKnownRoot(_root) check is related to a Tornado's logic of keeping several Merkle roots in a contract, creating a new Merkle root with each leaf added and tracking these roots in a list having the ROOT_HISTORY_SIZE size. This allows a user to create a proof for

Here we see a low severity problem: the ability of an attacker to continuously create new Merkle roots with a speed that doesn't allow users to use "fresh" Merkle roots because they are all already outdated, blocking users from withdrawing (it's expensive for the attacker and doesn't prevent the user from generating new proofs, so it's a minor problem).

Another point to mention is the ability for an arbitrary user to add a new Merkle root and somehow control it. No ideas on this stage, but maybe it can be used in future, let's remember it as well.

The next important part of the code:

```
verifier.verifyProof(
_proof,
[
uint256(_root),
uint256(_nullifierHash),
uint256(_recipient),
uint256(_relayer),
_fee,
_refund
]
)
```

In the parameters we see all the values provided by the user, so for all values: _root, _nullifierHash, _recipient, _relayer, _fee, _refund we need to recall the main vectors of ZKP attacks: forge, replay and zero-knowledge abuse. To accomplish it, we need to move to the circuit analysis.

The next Solidity part of the function is not in our audit scope, instead of saving the used _nullifierHash - this check is important. The withdraw logic and other Solidity parts can be checked by yourselves. Let's move to the ZKP part.

Analyzing the ZKP part

Circuit constraints

From the main Circom circuit template, we receive the whole pack of the inputs:

```
signal input root;
signal input nullifierHash;
signal input recipient; // not taking part in any computations
signal input relayer; // not taking part in any computations
signal input fee; // not taking part in any computations
signal input refund; // not taking part in any computations
signal private input nullifier;
signal private input secret;
signal private input pathElements[levels];
signal private input pathIndices[levels];
```

As shown in the picture above, EVERY input was constrained in the circuit, even the pack of the recipient, relayer, fee, refund variables - they are specially constrained using multiplication (lines 61:64) to avoid their exclusion from the circuit by the Circom compiler (actually, it's not needed now, Circom will add the constraints for the input signal automatically, but here it requires only 4 constraints, which is a negligibly small amount). Now, the proof is "derived" from ALL input signals, and any change in any of them breaks the proof validity.

Circuit processing

We have already checked that all inputs are constrained and each input has a corresponding constraint. Next, we see that nullifier and secret are split into 248-bits by the Num2Bits circuit. Here, we check that ALL the bits of input values are used, none is missed, for example, if the hashing of nullifier uses only 247 bits (not 248), then two different nullifiers will lead to the same commitment hash in the circuit that can lead to repeatable use of the proof with another nullifier). Sometimes developers want to reduce the amount of constraints and make proving (proof generation and size) lighter by reducing the bit lengths of some values, leading to security problems.

- a problem with "non-unique x-coordinate" from the article above is present here (resulting commitment and nullifierHash outputs use only coordinate "x" here, but this particular Pedersen implementation is not susceptible to this issue).
- a "variable-length input" problem of collisions of the preimages with different bit lengths or partial bit lengths less than the curve order. It's also not present here because all inputs have fixed-size encoding.

The next point to be mentioned is the homomorphism of the Pedersen hash, meaning that (simplifying): H(M1 + M2) = H(M1) * H(M2), so having a "previous" Pedersen hash from some secret value s , one can compute H(s + <anything>) as H(s) * H(<anything>) not even knowing s. In our case it's not applicable as the secret and nullifier are used only once, but when Pedersen hashing is used multiple times, we should always consider this case.

Another part of the circuit is the Merkle proof check. Here is the code snippet:

One more issue with Merkle trees is

In Tornado's implementation the "append-only" Merkle tree is used, and each new leaf is added from left to right. So, different scenarios with inclusion/exclusion of the leaves are not applicable here (like repeatable insert/remove/insert/remove scenarios), but you should remember to check them when using more complicated logic with a Merkle tree.

Still another point to check is the size of the tree, here it's 20 levels (here) which is enough, as in case of overflow, new leaves will not be added (checked here) and a new contract with the same logic can be easily deployed.

Inputs forgeability, reuse and zero-knowledge abuse

root

This is the Merkle root of the tree, implemented in this circuit. We cannot forge this root according to this require(isKnownRoot(_root), "Cannot find your merkle root");. The reuse of the same root does make sense only with another nullifier, so there is no problem here. The root is public that's why we don't have a ZK problem here. Passed.

nullifierHash

This is the Pedersen hash of nullifier + secret. We have already discussed Pedersen hashing above, and, since the nullifierHash and secret nullifier and secret inputs are tied and checked by the circuit, we can say it's safe against the forgeability of the nullifierHash. The replayability issues are mitigated by the require(!nullifierHashes[_nullifierHash], "..."); check in the contract here. The zero-knowledge property here is a little tricky because the Pedersen hash doesn't generate fully pseudorandom values and in some cases some suggestions about its preimage can be made, but in our case the secure generation of secret and nullifier mitigates this issue. Passed.

recipient, relayer, fee, refund

These values are external parameters used mostly by the contracts. We have already checked that they are included in the proof, so they cannot be replaced by other values using the same proof. Their replay is possible simply by design and they are public, so no zero-knowledge compromise is present. Passed.

nullifier + secret

Forging these parameters, allowing the same deposit to be reused twice or more, represents the main attack vector against Tornado. The goal of an attacker here is to submit a secondary nullifierHash "derived" from the single deposit (a single pair of secret and nullifier). If the Pedersen hash is checked in the contract, using a different Pedersen implementation (other than the one used in Circom), there could be some issues when the same values lead to different hashes in two implementations, but here the commitment hashing is checked in the circuit, making this attack vector inapplicable. Another forge/replay attack vector is possible when the secret + nullifier encoding somehow uses different bit lengths, allowing situations like (secre[t] + nullifier) == (secre + [t]nullifier), but here this situation is also not applicable. Zero-knowledge abuse cannot be exploited here as well due to the use of Pedersen hashing with enough bits of security and a suitably secure elliptic curve. Passed.

pathElements[levels] + pathIndices[levels]

These values are the Merkle proof of the commitment inclusion. The Merkle tree topics have been discussed above. The amount of levels is a circuit template parameter, so we cannot play with the size of the Merkle proof and secondary preimage attacks. Now, zero-knowledge property: in case when a single, "first in the tree" deposit was performed and a single withdrawal has followed, we, of course, can determine which leaf was used and deanonymize the depositor. Tornado works only when multiple deposits from different users were made. So, it's not possible to deploy "my own Tornado pool for myself", it will face this problem. The anonymity here is provided by many independent users, and when developing solutions based on Tornado scheme, you should always consider this issue. This is a good example of how the security of an underlying ZK protocol can become nonsense if the protocol is used incorrectly.

Final checklist-based analysis

ZK proof reuse with other parameters

already checked above for all parameters, mostly mitigated by nullifierHash. This part can also be checked automatically by sequentially changing values in the witness and checking if the proof is still valid

Parameters, not included in the proof, leading to proof reuse

checked in the circuit part, all parameters are constrained

The sender disclosed by presenting the proof

covered by design, all is good if enough independent deposits have been made

Non-pseudorandom properties of SNARK-friendly hashes (Pedersen, MiMC)

pseudorandom properties of used hashings is not used here, not applicable

Pedersen, homomorphic property, bit lengths, other specific issues

discussed above, checked

MiMC, rounds, enough security

(ref) discussed above

Elliptic curves, issues with the "infinity" and (0,0) points

not applicable here

Merkle tree, zero-leaves, second preimage, multiple inclusin/exclusion issues

discussed above, checked

Field arithmetics, overflows

not applicable here

Groth16 issues, like malleability of proofs

(ref), not applicable here due to nullifierHash

Fiat-Shamir heuristic issues

(ref) this part is related to the underlying proof system. Here, it requires not only the analysis of Tornado Cash circuits, but also the implementation of the proof system. It should be perfomed in a full-scale audit, but it is not the case for our article

Trusted setup procedure

this is an important part of the audit if the proof system with the trusted setup is used. This setup procedure should also be checked by the auditor (there are many articles about this procedure)

…

There are many subjects in the checklist, we will not present them all, but you should definitely have your own checklist and keep it up-to-date as fresh security issues in ZK come up often and you can't miss them.

Conclusion

Moreover, this area is evolving rapidly, new proof systems come out every few months, and sometimes the analysis results become obsolete faster than the whole analysis gets ready - that's one of the reasons for the low number of qualified official ZK audit reports. Teams simply switch to new algorithms, change proof systems, significantly reconsider their internal logic. So, in order to be competitive in this market, you should keep up with many topics in this area, which makes this field extremely attractive for any highly skilled specialist who wants to work at the cutting edge of modern computer science and security. Good luck!

Who is MixBytes?

Disclaimer

Other posts