Contact Us

Hire Team Stars

Hire Team Reality

Hire Dream Team

Audit Application Form

(Part 2)

Author: Sergey Boogerwooger

Security researcher at MixBytes

Security researcher at MixBytes

Intro

MiMC, Pedersen, Poseidon and other SNARK-friends

It’s useful to study this work about how many bits, rounds for different algorithms grant enough level of security. Note that “the amount of bits” in security for algorithms over finite fields doesn’t equal the bit length of the hash because binary presentaion of a number doesn’t have direct “1-to-1” mapping to the point on elliptic curve, the amount of “binary” points is less than the amount of possible binary values. So, in SNARK-friendly algorithms we need to estimate a “corresponding security” (like “this 256bit hash has a 128bit security”). We cannot directly tell you which level of security you should choose, in addition, all these algorithms can be used differently for different purposes, and you need to handle their security level on your own. But we’ll try to present some information about their wrong usage in some cases and give the links to interesting docs.

MiMC

The MiMC round function is the base for more complicated algorithms, the more significant now is GMiMC which uses the MiMC round function in the Festel-based construction, improving the algorithm security. We will meet MiMC as a part of other algorithms and in the tiniest circuit implementations.

Pedersen hash

Pedersen hash has some specific properties that makes it extremely useful to solve some problems like homomorphic addition. It’s a very important feature because you can make a “commitment of a”: present a Pedersen hash of some value a, then make a “commitment of b” and multiply them receiving the commitment(a + b) == commitment(a) * commitment(b) (

These features are not “free”, we have to “to pay” for them with some security problems - for example, Pedersen hash is collision-safe enough only for fixed-size preimages. It means that in real usage, if you can pass preimages vith variable bitlength or use some padding, it can be dangerous.

Another point is that Pedersen has a “natural collision” with two symmetric points: (x, y) and (x, -y) - that also can be dangerous if you use only the x coordinate as the result of hashing. More detailed explanations of such problems can be found here.

You can experiment with pedersenn.circom in our edu repo by running run_all.sh pedersenn with 256 sequental hashings, but it cannot be directly compared with MiMC by the amount of constraints becuse these hashes are used differently. In MiMC we had 91 rounds, in Pedersen we don’t have “rounds”, but have n inputs. It can be used to build Merkle proofs, but you should always mention the security properties of this algorithm. We will meet Pedersen in cases requiring miltiple inputs and in homomorphic schemes.

Poseidon

We will meet Poseidon in many modern curcuits thanks to its good balance between security and the amount of curcuit constraints. IMHO, now, in May, 2023, it’s the most popular algorithm for a lot of good zk-cases where the collision-resistant, secure and SNARKs-friendly algorithm is required.

Hashing - conclusion

In some hashing algorithms, you may have mentioned the “sponge” pattern (like the one in Keccak). Such algorithms are so good because of the possibility to use this pattern not only for hashing. With a simple change of the order and contents of incoming data stream, these algorithms can be transformed into other cryptographic primitives: stream cyphers, pseudorandom generators, MAC(Message Authenticaton Codes), used for key derivation, etc.

Unfortunately, we cannot stop on all of these nuances, it’s the work of the academic cryptographers, and it will make the article rather unpractical, but I highly recommend reading at least the basic articles. You should definitely know them to choose correct algorithms for your tasks.

Let’s take a look at the usage of these algorithms in real tasks.

Set membership proofs, Tornado

- the commitment for the deposit is pedersenHash(secret, nullifier).

The user sends 1 ETH with the commitment to the contract here

- the commitment is added to a Merkle tree in the contract, implemented here

Now, the user wants to prove their knowledge of the nullifier to “nullify” their commitment and take 1 ETH back

- the user prepares the input for the prover (containing nullifier and secret) here
- the user generates the proof here

The user sends the proof to the withdraw() function of the contract here

There is a security-important part in this circuit here where we see strange “unused” internal signals like recipientSquare <== recipient * recipient. The cause is the fact that the recipient, fee, relayer and refund input signals are not used in any calculation inside the circuit and not added to any constraint. Without it, input signals recipient, fee, relayer and refund will not be used in proof generation. It will lead to a situation when we have a single witness (without these values inside) for multiple possible inputs and can use the same witness with any values of recipient, fee, relayer and refund allowing client-side attacks with a reuse of the witness.

This addition includes only 4 constraints and makes the witness “bound” to the input values mitigating the tampering of unused values. Squares are used to guarantee that the constraints will be added to the circuit (in case of simple internal recipientSignal <== recipient optimizer can exclude this useless constraint).

In this version of Tornado, we see the “append only” Merkle tree where after each commitment a new Merkle root is generated while the previous root goes backward in the roots history. In order to build the proof, you need all the leaves of the tree; in Tornado they can be reconstructed from all the past events in the contract (see here). Every owner of a tree leaf can reconstruct the whole tree from the history and return their deposit. It seems inefficient for large amounts of leaves, but you can deploy any number of new verifier contract instances and switch to a new instance if the tree becomes too large, solving the problems with the tree capacity. The most important is the fact that the instances are trustless and can be easily deployed in any quantities. The idea to build a mixer on set-membership proofs was on the surface because “payment for inclusion/exclusion” is a “natural” way for most blockchain applications, and Tornado fully proved that it does work in production.

If you only change the numbers and namings, Tornado can be simply converted to a tickets-selling service, preserving visitors personal data, or non-interactive one-time login, this scheme is very practical. Let’s move to the next examples.

Set-membership proofs, Semaphore

The first one is that you can pass your own hashing function for the tree being created here. It’s useful when you want to try different hashing types for your circuits. We see that Semaphore uses the Poseidon hash for its trees (here).

The second feature of this tree is the ablilty to set arity - the amount of children of each node. This can be needed to build Merkle trees having, for example, five children for each node (quinary Merkle tree). It has a reduced amount of levels, helping to make faster proofs generation, smaller proof sizes and a larger capacity (trading it for gas consumption). In this article, you can learn about the comparison of MiMC, Pedersen and Poseidon in terms of gas consumption, capacity of the tree, and effectiveness of proof generation.

Then, let’s have a look at Semaphore’s circuit here. It looks similar to Tornado, and we can also mention the same as Tornado’s part at the end preventing the tampering of signalHash.

Elliptic curves, keys ownership and signatures proofs

To be more practical, let’s study the particluar variant of proof system for Ethereum addresses. As we know, an ETH address is the last 20 bytes of keccak256(public_key), so we need to prove that we know the private signal (private key) that is transformed to the public output (ETH address). Ethereum uses the secp256k1 elliptic curve (same as Bitcoin), and, similar to the hashing in the circuits, this curve is not snark-friendly due to many multiplications, the field size, different operations with points on the curve. But in this case we don’t have a choice, we definitely need these proofs. Let’s see how it’s made in the same circom-ecdsa repo.

First, let’s prove that we own some ETH address. It can be made with eth_addr.circom. To start with, we see the conversion from the private key to a public key using the secp256k maths in the ECDSAPrivToPub circuit. In this circuit, we see the Num2Bits operations here and logic branches handling “is zero” conditions here. Then we see a conversion from the public key to the ETH addr with Keccak here. It seems like they will not be lightweight proofs, let’s check.

You can clone circom-ecdsa, analyse the build_eth_addr.sh script, mention that input_eth_addr.json contains our private key (split by 4x64bit blocks) and run

```
cd scripts/eth_addr
# copy or download powersOfTau28_hez_final_${POWERTAU}.ptau to ../../circuits/pot20_final.ptau
sh ./build_eth_addr.sh
```

I had run this script with POWERTAU=21 (on a really powerful machine) and received:

```
...
non-linear constraints: 247380
...
****GENERATING WITNESS FOR SAMPLE INPUT****
DONE (9s)
...
****GENERATING PROOF FOR SAMPLE INPUT****
DONE (5s)
...
```

and the size of the proving key:

`172M ../../build/eth_addr/eth_addr.zkey`

Another important point learned from circom-ecdsa circuits is the difficulty of signature checking. As described in the article above, signature verification requires modular multiplication + inverse (as shown here), elliptic curve multiplication by a scalar and curve point addition (and Keccak to hash the message itself, of course). So, this circuit is much heavier than the “privatekey-to-pubkey” one and requires much more constraints.

As you see, the secp256k1 curve, its signatures and Keccak hashing are not very SNARK-friendly algorithms, and, even with all optimizations, the direct usage of Ethereum addresses, private keys and signatures is extremely heavy for the prover.

In order to avoid this bottleneck, SNARK researchers presented other algorithms, such as new elliptic curves requiring significantly less amounts of constraints allowing to implement more lightweight signature checking and key ownership proofs. This is a long story about different curves and their usage in circuits and these topics will move our article deep into theory, so let’s simply present at least one example of a SNARK-friendly optimization.

A good example is the Baby JubJub curve described in EIP-2494. It’s a “Twisted Edwards Curve”(article) allowing signing using the EdDSA algorithm with some useful properties. For example, it permits to use the same operations for points addition and doubling (BabyAdd and BabyDbl). It’s much simpler than the same operation on the secp256k1 curve (point addition for unequal points here and point doubling here).

Let’s take a look at what can be built with the “elliptic-native”, snark-friendly algorithms like Baby JubJub.

ZK-Voting

- “Identity and ballot obfuscation” - to obuscate and hide the identity of the voter, hiding the vote itself is also in the plans
- “Coercion-resistance” - when the ballot is cast, then, after the reveal of the results, even the voter cannot prove, did he vote “yes” or “no”, only prove the fact that they voted and their vote was counted correctly
- “Receipt-freeness” - to remove the connection “one ballot” = “one receipt(transaction)”. It is truly difficult to avoid the problem. Any relationship between the voter and their ballot should be broken, while keeping the voting results publicly verifiable. It’s planned to achieve with intermediate relays, “mixers” of votes combining multiple votes in one pack where the votes are indistinguishable.

Let’s move to the implementation. First - read about technical aspects of the implementations here. We see a lot of logic around: we need Ethereum storage proofs, pre-registration of voters, also there is a whole ecosystem around the voting (separate chain, aggregating votes, commiting votings in Ethereum). All this is needed to keep all security restrictions above, so be ready - the amount of background logic in your projects can also be significant. The most important part for us is circits and the proof system, so let’s concentrate on them.

First, we mention that checking Ethereum storage proofs cannot be efficiently done (we need to have the proofs with a Keccak256 hashing that are too heavy) and cannot directly use the secp256k1 curve signatures (they are also too heavy). That’s why the circuit uses the Baby JubJub curve, Pedersen hashes on this curve for signatures, and Poseidon hashes for commitments. To have the ability to utilize the non-secp256k1 curve, Vocdoni uses a contract holding the mapping from the ETH address to the corresponding Baby JubJub public key. This allows to setup all voters and their private Baby JubJub public keys before the voting.

In the second place: in the current scheme, when the voter sends the ballot, the vote itself is a public signal, but the voter’s identity is hidden. As it is said: “anyone can verify that any vote has been uniquely cast by an eligible voter”, but it’s impossible to reconstruct whose vote it was. It’s planned to hide the vote in the future but now the vote value itself is public.

Let’s see the input signals of proposed circuits (from here):

public “Census Root” and public “ElectionID”

- Merkle root of “used/unused” votes + electionID used to identify the current voting

- proving that the current vote is included into the Merkle tree

- the signed vote being added to a Merkle tree (signed using Baby JubJub)

- value constructed as hash(Private key, ElectionID) allowing a user to track their vote and deny the voting twice

or THESE values

private “Reveal keys” with public “Commit keys”

- “Commit keys” with their corresponding hashes that ALL will be revealed at the end of voting.

You can use such “OR” proofs to make your proofs useless after their usage. In ZK the “cannot reveal private values” sometimes transforms to a strange “anybody can reveal and prove ANY values” :)

Here, we will study the proof-of-concept circuit without the part with “Commit-Reveal” inputs (only the voting).

I found the code of circuit here. Let’s take a look at important points:

Census circuit, input signals here. We see:

- we show to the public: electionID, its Merkle root, the nullifier, our voting “power” and our vote

- we keep private the maximum amount of voting “power” available to us (factoryWeight), Merkle proof of inclusion of our nullifier into the Merkle tree (censusSiblings[realNLevels]) and the Baby JubJub private key

Next stop: the Baby JubJub private key -> public key transform is here and then hashing it with Poseidon is here. This simple part of the circuit shows how effective snark-friendly alorithms can be - Baby JubJub requires a small amount of constraints (thanks to optimized multiplications), Poseidon uses the signals without any bits transformations. All these choices allow to significantly reduce the witness size and proof generation time.

Next: the ZkAddress circuit is here (the code is above in the same file). Here, the Poseidon hash of the Baby JubJub public key is reduced to 20 bytes (to produce Vocodni address which is 20 bytes long). It requires the Num2Bits and Bits2Num sub-circuits adding another pack of contstraints (for inputs-outputs of bits<->num). There is no choice here, but if it’s possible, it’s better to avoid bits expansions and reductions (it can also sometimes lead to a lower cryptographic security level).

Next: the Merkle tree, and the verifier of a Merkle proof is here. Here, we work with a sparse Merkle tree - implemented here. The optimization trick of this tree is the design adding new values “from left to right” (a new value is added to the next “free” index) making it possible to have a “sparse” tree at the right side (most of right branches are empty and contain zeroes) while the “condensed” branches with values are at the left side. The description and discussion of these ideas are here (a short version is here). Now, with this primitive we can make the inclusion/exclusion (INSER/UPDATE/DELETE) proofs effectively. In the current circuit:

We check inclusion proof by setting smtClaimExists.fnc <== 0 here (smtClaimExists.fnc <== 1 for exclusion proof) and prove that the previous value in the tree was zero here.

The last: the check of the nullifier is here: it must be equal to the hash made from processId[2] and privateKey.

You can perform all steps to play around with the presented proof-of-concept, I simply compiled the census.circom circuit and recieved ~40k of constraints for 160 levels in the Merkle tree (compare it to the single Keccak hashing from the first article with ~150k of constraints). We used a private->public transform, verified the Merkle proof, hashed some intermediate signals - all this has been done only for 40k constraints saving the witness size and proof generation time for the users. Of course, to become a real application many things need to be added: the second part of circuit with commit-reveal keys, client-side sofrware, preparing proofs, management contracts - all of it is important for the voting system. But the core zk-part seems to be usable enough to become a good solution.

The description of the full voting cycle using Vocdoini was presented in this video and described here.

Conclusion

Also, we would like to show great respect to the academic researchers working in this area in universities and startups. It’s extremely important since we all need safe and effective algorithms. However, at the same time we understand that acceptance by the worldwide cryptography community cannot be fast. If you think about scientific work in cryptography, maybe you would like to join them - it’s real “top-notch” technologies in computer science.

In the next article, we will go deeper and take a look at other proving systems (we used Groth16 in most examples), STARKs, with different requirements for the setup, proving-verifying costs and also will study new use cases. See you…

Who is MixBytes?

Disclaimer

Other posts