Ensure the security of your smart contracts

zkSNARKS, Circom
(Part 2)

Author: Sergey Boogerwooger
Security researcher at MixBytes
Intro
In the previous article we have stopped at the sequental hashing allowing us to build Merkle proofs to prove membership of some private value in the list with the given Merkle root or to prove the knowledge of the hash preimage. We tried two hashing functions: Keccak-256 and MiMC and played around with circuit sizes, witness generation complexity and size of the proofs. Now, let’s take a look at other hashing functions and then examine their usage for building the membership proofs.
MiMC, Pedersen, Poseidon and other SNARK-friends
As we understood from the latest experiment, where we replaced Keccak hashing with MiMC, there are hashing algorithms with design that better fits the circuits. But you should remember that their usage differs from our common hashing (Keccak, SHA256). The most effective for finite field operations (used in circuits) are the algorithms using the operations over finite fileds, so we can pass an input signal directly to the hashing function in the circuit (without “splitting” it into separate bits). But there are many differences: safe number of rounds, collision resistance, specific atacks, and it’s good to study them more thoroughly than it’s presented in this paper. Also, you should remember this area is very young, lots of papers are really “fresh”, so we make no claims to completeness.

It’s useful to study this work about how many bits, rounds for different algorithms grant enough level of security. Note that “the amount of bits” in security for algorithms over finite fields doesn’t equal the bit length of the hash because binary presentaion of a number doesn’t have direct “1-to-1” mapping to the point on elliptic curve, the amount of “binary” points is less than the amount of possible binary values. So, in SNARK-friendly algorithms we need to estimate a “corresponding security” (like “this 256bit hash has a 128bit security”). We cannot directly tell you which level of security you should choose, in addition, all these algorithms can be used differently for different purposes, and you need to handle their security level on your own. But we’ll try to present some information about their wrong usage in some cases and give the links to interesting docs.
MiMC
MiMC specs are here. MiMC was designed to have a minimal amount of multiplications generating very compact curcuits. In our previous example we used MiMC implementation from circlomib, it uses MiMC variant with an exponent of 7 and we used 91 rounds in code. As I found in the discussion about hashing here: MiMC/e7r91, 448bits of input, 730 constraints (targeting 256bit security level) and MiMC/e7r46, 448bits of input, 370 constraints (targeting 128bit security level), it seems that even 46 rounds (128bit of security) can be enough for many tasks.

The MiMC round function is the base for more complicated algorithms, the more significant now is GMiMC which uses the MiMC round function in the Festel-based construction, improving the algorithm security. We will meet MiMC as a part of other algorithms and in the tiniest circuit implementations.
Pedersen hash
Pedersen hash (article) is also a popular algorithm for circuits. Pedersen hash returns a point on the curve (Baby JubJub is used in circomlib’s pedersen.circom). As MiMC it also allows to skip Num2Bits<->Bits2Num transformations, reducing the constraints amount by using signals as the inputs.

Pedersen hash has some specific properties that makes it extremely useful to solve some problems like homomorphic addition. It’s a very important feature because you can make a “commitment of a”: present a Pedersen hash of some value a, then make a “commitment of b” and multiply them receiving the commitment(a + b) == commitment(a) * commitment(b) (simplified here, because additional randoms are used in this scheme). A good short explanation of Pedersen commitments is given here. Such a “hidden” addition lets us solve some security problems: for example in zk-votings (summarizing votes without disclosure) or operations with private token balances.

These features are not “free”, we have to “to pay” for them with some security problems - for example, Pedersen hash is collision-safe enough only for fixed-size preimages. It means that in real usage, if you can pass preimages vith variable bitlength or use some padding, it can be dangerous.

Another point is that Pedersen has a “natural collision” with two symmetric points: (x, y) and (x, -y) - that also can be dangerous if you use only the x coordinate as the result of hashing. More detailed explanations of such problems can be found here.

You can experiment with pedersenn.circom in our edu repo by running run_all.sh pedersenn with 256 sequental hashings, but it cannot be directly compared with MiMC by the amount of constraints becuse these hashes are used differently. In MiMC we had 91 rounds, in Pedersen we don’t have “rounds”, but have n inputs. It can be used to build Merkle proofs, but you should always mention the security properties of this algorithm. We will meet Pedersen in cases requiring miltiple inputs and in homomorphic schemes.
Poseidon
The Poseidon (and also its neighbor STARKAD) algorithm improves MiMC design with methods inspired by both the AES (S-blocks) and Keccak (sponge) constructions maintaining security against statistical as well as algebraic attacks. They both use HADES permutations inside and are very efficient in circuits due to a low amount of multiplications. You can find the Poseidon description here), while an interesting analysis of both HADES-based algorithms is given here.

We will meet Poseidon in many modern curcuits thanks to its good balance between security and the amount of curcuit constraints. IMHO, now, in May, 2023, it’s the most popular algorithm for a lot of good zk-cases where the collision-resistant, secure and SNARKs-friendly algorithm is required.
Hashing - conclusion
As you maybe mentioned in the latest article, there are also other SNARK/STARK-friendly hashing algorithms like Marvellous (with Vision and Rescue permutations) and they are also promising.

In some hashing algorithms, you may have mentioned the “sponge” pattern (like the one in Keccak). Such algorithms are so good because of the possibility to use this pattern not only for hashing. With a simple change of the order and contents of incoming data stream, these algorithms can be transformed into other cryptographic primitives: stream cyphers, pseudorandom generators, MAC(Message Authenticaton Codes), used for key derivation, etc.

Unfortunately, we cannot stop on all of these nuances, it’s the work of the academic cryptographers, and it will make the article rather unpractical, but I highly recommend reading at least the basic articles. You should definitely know them to choose correct algorithms for your tasks.
Let’s take a look at the usage of these algorithms in real tasks.
Set membership proofs, Tornado
Now, having hashing in our toolbox, let’s prove some “set membership”. One of good articles describing practicals with set membership is this one demonstrating a process implemented in Tornado Cash. Let’s also examine some useful pieces of code from Tornado tests.
A user generates a commitment for their deposit
  • the commitment for the deposit is pedersenHash(secret, nullifier).

The user sends 1 ETH with the commitment to the contract here
  • the commitment is added to a Merkle tree in the contract, implemented here

Now, the user wants to prove their knowledge of the nullifier to “nullify” their commitment and take 1 ETH back
  • the user prepares the input for the prover (containing nullifier and secret) here
  • the user generates the proof here

The user sends the proof to the withdraw() function of the contract here
  • the contract checks that this nullifier hash was not used previously here
  • the contract verifies the proof using (the verifyProof() function)
  • the contract performs the withrdawal (with fee and fixation of “spent” nullifier)
Now, let’s take a look at the proving circuit here. The main private signals are: nullifier, secret, pathElements[levels] and pathIndices[levels]. The last two parameters are the Merkle proof that is checked in this part of the circuit (same sequental hashings that were in our previous examples).

There is a security-important part in this circuit here where we see strange “unused” internal signals like recipientSquare <== recipient * recipient. The cause is the fact that the recipient, fee, relayer and refund input signals are not used in any calculation inside the circuit and not added to any constraint. Without it, input signals recipient, fee, relayer and refund will not be used in proof generation. It will lead to a situation when we have a single witness (without these values inside) for multiple possible inputs and can use the same witness with any values of recipient, fee, relayer and refund allowing client-side attacks with a reuse of the witness.

This addition includes only 4 constraints and makes the witness “bound” to the input values mitigating the tampering of unused values. Squares are used to guarantee that the constraints will be added to the circuit (in case of simple internal recipientSignal <== recipient optimizer can exclude this useless constraint).

In this version of Tornado, we see the “append only” Merkle tree where after each commitment a new Merkle root is generated while the previous root goes backward in the roots history. In order to build the proof, you need all the leaves of the tree; in Tornado they can be reconstructed from all the past events in the contract (see here). Every owner of a tree leaf can reconstruct the whole tree from the history and return their deposit. It seems inefficient for large amounts of leaves, but you can deploy any number of new verifier contract instances and switch to a new instance if the tree becomes too large, solving the problems with the tree capacity. The most important is the fact that the instances are trustless and can be easily deployed in any quantities. The idea to build a mixer on set-membership proofs was on the surface because “payment for inclusion/exclusion” is a “natural” way for most blockchain applications, and Tornado fully proved that it does work in production.

If you only change the numbers and namings, Tornado can be simply converted to a tickets-selling service, preserving visitors personal data, or non-interactive one-time login, this scheme is very practical. Let’s move to the next examples.
Set-membership proofs, Semaphore
Another example of set-membership usage is the Semaphore project allowing to prove groups membership and ability for group members to cast signals (i.e. votes) privately in the group. The process is managed by clear and minimalistic smart-contracts that are easy to read. You can read the whole contract logic related to groups here, where we find the function verifyProof() - very similar to previous verifications, fixating the nullifier (to prevent its reuse) as well. Semaphore also uses the incremental Merkle tree implemented in this npm package where you can take the implementation containing some features useful for development.

The first one is that you can pass your own hashing function for the tree being created here. It’s useful when you want to try different hashing types for your circuits. We see that Semaphore uses the Poseidon hash for its trees (here).

The second feature of this tree is the ablilty to set arity - the amount of children of each node. This can be needed to build Merkle trees having, for example, five children for each node (quinary Merkle tree). It has a reduced amount of levels, helping to make faster proofs generation, smaller proof sizes and a larger capacity (trading it for gas consumption). In this article, you can learn about the comparison of MiMC, Pedersen and Poseidon in terms of gas consumption, capacity of the tree, and effectiveness of proof generation.

Then, let’s have a look at Semaphore’s circuit here. It looks similar to Tornado, and we can also mention the same as Tornado’s part at the end preventing the tampering of signalHash.
Elliptic curves, keys ownership and signatures proofs
The second important part for almost every practical task with zkSNARKs is proving a private key ownership and signature validation. Any call to an Ethereum smart-contract already contains the signer of the transaction (tx.origin), and it’s not needed to prove the ownership of this address. But the core of ZK-based Verified Credendials protocols is the ability to publicly add some “hidden” address to a list and give its owner the ability to prove later they own this address. To accomplish this task, we need to have the proofs of knowledge of the private key corresponding to the given public key (or ETH address).

To be more practical, let’s study the particluar variant of proof system for Ethereum addresses. As we know, an ETH address is the last 20 bytes of keccak256(public_key), so we need to prove that we know the private signal (private key) that is transformed to the public output (ETH address). Ethereum uses the secp256k1 elliptic curve (same as Bitcoin), and, similar to the hashing in the circuits, this curve is not snark-friendly due to many multiplications, the field size, different operations with points on the curve. But in this case we don’t have a choice, we definitely need these proofs. Let’s see how it’s made in the same circom-ecdsa repo.

First, let’s prove that we own some ETH address. It can be made with eth_addr.circom. To start with, we see the conversion from the private key to a public key using the secp256k maths in the ECDSAPrivToPub circuit. In this circuit, we see the Num2Bits operations here and logic branches handling “is zero” conditions here. Then we see a conversion from the public key to the ETH addr with Keccak here. It seems like they will not be lightweight proofs, let’s check.

You can clone circom-ecdsa, analyse the build_eth_addr.sh script, mention that input_eth_addr.json contains our private key (split by 4x64bit blocks) and run

cd scripts/eth_addr
# copy or download powersOfTau28_hez_final_${POWERTAU}.ptau to ../../circuits/pot20_final.ptau
sh ./build_eth_addr.sh 
I had run this script with POWERTAU=21 (on a really powerful machine) and received:

...
non-linear constraints: 247380
...
****GENERATING WITNESS FOR SAMPLE INPUT****
DONE (9s)
...
****GENERATING PROOF FOR SAMPLE INPUT****
DONE (5s)
...
and the size of the proving key:

172M ../../build/eth_addr/eth_addr.zkey
You can find basic descriptions of implemented functions in the first article from 0xPARC (the creators of circom-ecdsa repo) here. Also, you can find the first benchmarks there and compare the first version of circut (the constraints amount is 416 883) and current values (the constraints amount is 247 380). Maths optimizations can help you a lot, so to build SNARKs, you need to be “up-to-date” with new approaches replacing your circuits with more optimized versions. For elliptic curves, there are many different optimizations. You can read about some of them in the “zk-ECDSA Part 2: Under the Hood” article.

Another important point learned from circom-ecdsa circuits is the difficulty of signature checking. As described in the article above, signature verification requires modular multiplication + inverse (as shown here), elliptic curve multiplication by a scalar and curve point addition (and Keccak to hash the message itself, of course). So, this circuit is much heavier than the “privatekey-to-pubkey” one and requires much more constraints.
As you see, the secp256k1 curve, its signatures and Keccak hashing are not very SNARK-friendly algorithms, and, even with all optimizations, the direct usage of Ethereum addresses, private keys and signatures is extremely heavy for the prover.

In order to avoid this bottleneck, SNARK researchers presented other algorithms, such as new elliptic curves requiring significantly less amounts of constraints allowing to implement more lightweight signature checking and key ownership proofs. This is a long story about different curves and their usage in circuits and these topics will move our article deep into theory, so let’s simply present at least one example of a SNARK-friendly optimization.

A good example is the Baby JubJub curve described in EIP-2494. It’s a “Twisted Edwards Curve”(article) allowing signing using the EdDSA algorithm with some useful properties. For example, it permits to use the same operations for points addition and doubling (BabyAdd and BabyDbl). It’s much simpler than the same operation on the secp256k1 curve (point addition for unequal points here and point doubling here).
Let’s take a look at what can be built with the “elliptic-native”, snark-friendly algorithms like Baby JubJub.
ZK-Voting
Voting is one of key applications of zero-knowledge protocols. There are many problems in it: privacy of the votes, verifiability of the results, resistance to votes buyout, etc. The whole transparency and verifiability of these protocols is a very complicated task which is not fully resolved at the moment. There are only really close approaches that come near the final solution, but the whole pack of problems is still uncovered. There were many attempts to implement a really usable secure voting using, for example, homomorphic schemes (simple article), but SNARKs-based schemes are much more flexible because they provide the means to combine multiple mechanics in a single proof. Our example will be the Vocdoni zk-voting proof-of-concept presented in the Aragon project blog. It’s highly recommended to read this article. In this case the team tried to reach:

  • “Identity and ballot obfuscation” - to obuscate and hide the identity of the voter, hiding the vote itself is also in the plans
  • “Coercion-resistance” - when the ballot is cast, then, after the reveal of the results, even the voter cannot prove, did he vote “yes” or “no”, only prove the fact that they voted and their vote was counted correctly
  • “Receipt-freeness” - to remove the connection “one ballot” = “one receipt(transaction)”. It is truly difficult to avoid the problem. Any relationship between the voter and their ballot should be broken, while keeping the voting results publicly verifiable. It’s planned to achieve with intermediate relays, “mixers” of votes combining multiple votes in one pack where the votes are indistinguishable.

Let’s move to the implementation. First - read about technical aspects of the implementations here. We see a lot of logic around: we need Ethereum storage proofs, pre-registration of voters, also there is a whole ecosystem around the voting (separate chain, aggregating votes, commiting votings in Ethereum). All this is needed to keep all security restrictions above, so be ready - the amount of background logic in your projects can also be significant. The most important part for us is circits and the proof system, so let’s concentrate on them.

First, we mention that checking Ethereum storage proofs cannot be efficiently done (we need to have the proofs with a Keccak256 hashing that are too heavy) and cannot directly use the secp256k1 curve signatures (they are also too heavy). That’s why the circuit uses the Baby JubJub curve, Pedersen hashes on this curve for signatures, and Poseidon hashes for commitments. To have the ability to utilize the non-secp256k1 curve, Vocdoni uses a contract holding the mapping from the ETH address to the corresponding Baby JubJub public key. This allows to setup all voters and their private Baby JubJub public keys before the voting.

In the second place: in the current scheme, when the voter sends the ballot, the vote itself is a public signal, but the voter’s identity is hidden. As it is said: “anyone can verify that any vote has been uniquely cast by an eligible voter”, but it’s impossible to reconstruct whose vote it was. It’s planned to hide the vote in the future but now the vote value itself is public.
Let’s see the input signals of proposed circuits (from here):

We provide THESE values to a witness generator:
public “Census Root” and public “ElectionID”
  • Merkle root of “used/unused” votes + electionID used to identify the current voting
private “Merkle Proof”
  • proving that the current vote is included into the Merkle tree
public “Vote” + private “Private key” and “Vote signature”
  • the signed vote being added to a Merkle tree (signed using Baby JubJub)
public “Nullifier”
  • value constructed as hash(Private key, ElectionID) allowing a user to track their vote and deny the voting twice

or THESE values
private “Reveal keys” with public “Commit keys”
  • “Commit keys” with their corresponding hashes that ALL will be revealed at the end of voting.
Note the last part with commit-reveal keys provided by the “miners” of voting. Any person knowing ALL Reveal keys can produce a valid proof for voting. If somebody had “bought” my vote and requires the proof that I have voted “yes”, anybody (including me) can provide the correct proof for “yes”, or “no”, or whatever, making the proving of the vote useless after the election. Of course, the vote can be bought by “buying the private key” and all private data, but it’s the same situation when the buyer votes himself. It’s also a weak point of the protocol beacuse if all miners collude to tamper the voting results and will reveal all the reveal keys to somebody, they can tamper the votes (but if even one of them is honest and holds the reveal key in private until the end of the election - they cannot do it).

You can use such “OR” proofs to make your proofs useless after their usage. In ZK the “cannot reveal private values” sometimes transforms to a strange “anybody can reveal and prove ANY values” :)

Here, we will study the proof-of-concept circuit without the part with “Commit-Reveal” inputs (only the voting).

I found the code of circuit here. Let’s take a look at important points:

Census circuit, input signals here. We see:
public signals: processId[2], censusRoot, nullifier, votingWeight and voteHash[2]
  • we show to the public: electionID, its Merkle root, the nullifier, our voting “power” and our vote
private signals: factoryWeight, censusSiblings[realNLevels], privateKey
  • we keep private the maximum amount of voting “power” available to us (factoryWeight), Merkle proof of inclusion of our nullifier into the Merkle tree (censusSiblings[realNLevels]) and the Baby JubJub private key
The next stop point in code is when we check that our votingWeight is less than factoryWeight and we encounter the sub-circuit LessEqThan. Inside the cicomlib’s comparators.circom it uses LessThan here that allows to make the so-called “range proofs”. We see Num2Bits transform containing at least “bits amount” of contstraints and signals because in order to compare two numbers on the curve, we cannot simply “sub” them since the result of addition on a curve can lead to a situation when c = a + b will be less (from our usual binary point of view) than a, or b, and there is no way other than analyse the bits. That’s why range proofs are really expensive in zkSNARKs circuits, they add many constraints, especially if they are used with different bitlengths, as a non-optimizable part of the circuits, etc. In the same time they are unavoidable for business logic and security purposes, there are no “overflows” in field arithmetics, and totally different values (in binary) can be “equal” in the field.

Next stop: the Baby JubJub private key -> public key transform is here and then hashing it with Poseidon is here. This simple part of the circuit shows how effective snark-friendly alorithms can be - Baby JubJub requires a small amount of constraints (thanks to optimized multiplications), Poseidon uses the signals without any bits transformations. All these choices allow to significantly reduce the witness size and proof generation time.

Next: the ZkAddress circuit is here (the code is above in the same file). Here, the Poseidon hash of the Baby JubJub public key is reduced to 20 bytes (to produce Vocodni address which is 20 bytes long). It requires the Num2Bits and Bits2Num sub-circuits adding another pack of contstraints (for inputs-outputs of bits<->num). There is no choice here, but if it’s possible, it’s better to avoid bits expansions and reductions (it can also sometimes lead to a lower cryptographic security level).

Next: the Merkle tree, and the verifier of a Merkle proof is here. Here, we work with a sparse Merkle tree - implemented here. The optimization trick of this tree is the design adding new values “from left to right” (a new value is added to the next “free” index) making it possible to have a “sparse” tree at the right side (most of right branches are empty and contain zeroes) while the “condensed” branches with values are at the left side. The description and discussion of these ideas are here (a short version is here). Now, with this primitive we can make the inclusion/exclusion (INSER/UPDATE/DELETE) proofs effectively. In the current circuit:

We check inclusion proof by setting smtClaimExists.fnc <== 0 here (smtClaimExists.fnc <== 1 for exclusion proof) and prove that the previous value in the tree was zero here.

The last: the check of the nullifier is here: it must be equal to the hash made from processId[2] and privateKey.
You can perform all steps to play around with the presented proof-of-concept, I simply compiled the census.circom circuit and recieved ~40k of constraints for 160 levels in the Merkle tree (compare it to the single Keccak hashing from the first article with ~150k of constraints). We used a private->public transform, verified the Merkle proof, hashed some intermediate signals - all this has been done only for 40k constraints saving the witness size and proof generation time for the users. Of course, to become a real application many things need to be added: the second part of circuit with commit-reveal keys, client-side sofrware, preparing proofs, management contracts - all of it is important for the voting system. But the core zk-part seems to be usable enough to become a good solution.

The description of the full voting cycle using Vocdoini was presented in this video and described here.
Conclusion
In this paper, we reviewed some SNARK-friendly algorithms, their usage in real projects and showed they can really help to build the projects working in real life. As I said before, this field is still under active research, and all used algorithms cannot be treated as 100% safe as the approved international standards like Keccak or secp256k1. That’s why the well-known blockchains don’t switch to these algorithms very fast; they wait until they become fully accepted by cryptographers. But if some blockchain’s internals are built completely on SNARK-friendly primitives and its addresses, keys, hashes can be directly passed to effective circuits, such a system can be widely used for any ZK purposes. There are already many projects addressing these problems, and we will discuss them in the following articles.

Also, we would like to show great respect to the academic researchers working in this area in universities and startups. It’s extremely important since we all need safe and effective algorithms. However, at the same time we understand that acceptance by the worldwide cryptography community cannot be fast. If you think about scientific work in cryptography, maybe you would like to join them - it’s real “top-notch” technologies in computer science.

In the next article, we will go deeper and take a look at other proving systems (we used Groth16 in most examples), STARKs, with different requirements for the setup, proving-verifying costs and also will study new use cases. See you…
Who is MixBytes?
MixBytes is a team of expert blockchain auditors and security researchers specializing in providing comprehensive smart contract audits and technical advisory services for EVM-compatible and Substrate-based projects. Join us on Twitter to stay up-to-date with the latest industry trends and insights.
Disclaimer
The information contained in this Website is for educational and informational purposes only and shall not be understood or construed as financial or investment advice.
Other posts