Ensure the security of your smart contracts

zkSNARKS, Circom
(Part 1)

Author: Sergey Boogerwooger
Security researcher at MixBytes
Intro
zkSNARKs and proof systems are a super interesting piece of technology. But, like any young technology, they suffer from a lack of articles and best practices. Most of the available material consists of simplified explanations of zkSNARK technology, or highly specialized articles and studies that are difficult to understand until you've tried working with them hands-on. The goal of this series of articles is to fill the gap between common and highly professional knowledge, allowing you to start from "Hello, world" and sequentially move on to your first circuits, prepare your own codebase, and deal with the common problems in zkSNARK development.

First, read the simple basics here if you haven't done so already. You should understand statements like:

  • "I have some private data that, when processed by this algorithm, gives this public output"
    • "I know such input signal(s) that, when processed by the circuit, gives this output signal(s)"
      • "I can provide a proof, constructed from valid signals that proves I know a solution for the constraints system of equations"

In short, I have a "circuit" - similar to an electronic circuit in real life - that takes some input signals, pushes them through different logical and arithmetical gates, and generates some output signals. I take some input signals and perform the computation over the circuit, generating a "witness": a set of all signals valid for this circuit. Then, using this witness, I calculate a much shorter proof. I present this proof and the public part of the inputs and outputs to the verifier. The verifier takes the public values and the proof and verifies them together against the circuit's verification data (the verification key).

It is important to note that different proof systems require a "setup", during which a verification key for the current circuit is generated. This verification key allows anyone who holds it to check proofs for any valid combination of inputs and outputs (where part of them can be kept private).

This is the "ARK" (Arguments of Knowledge) part of "Succinct Non-interactive Arguments of Knowledge" (SNARKs).

"N" means "Non-Interactive" - another property of SNARKS. It means that proofs can be sent to the verifier without any previous interactions - no "challenge-response" protocols, no pseudorandom "nonces", "salts", "challenge" values (we see them everywhere in Web2, while for blockchains absence of additional transactions is a critical property). All public inputs-outputs-proof are sent in one transaction. It's a super important property for using SNARKs in scaling solutions.

What about "S" (Succinct)? In some cases, verification of the proof can be many times faster than the calculation of the witness and the proof. A client can perform a super-heavy computation, processing thousands of user transactions for example, performing a computation with complexity O(N), and generate one, constant-size single proof, that all state changes were performed correctly, according to the state transition function. Verification in some proof systems takes O(1)! You can prove O(N) computation steps using an O(1), constant-size, constant-verification-time proof. It's super "Succinct".

Aaaand, "zk"(zero-knowledge)? "zk" part is written with small letters because in proof systems we control by ourselves, whether our "provable" function provides zero-knowledge or not. All proving logic works absolutely similarly, whether the provided signals are private or public. We receive "zk" property "for free" by designing our circuit function so that private inputs cannot be calculated from public signals. If we prove that we know a and b, such as c <== a * b, then if a and c are public, anyone can compute b, and this circuit is not "zk". But if we calculate some c <== hash(a), then a can be kept private. Many cryptographic operations are built on one-way functions, and we can make some inputs private, making it impossible to reconstruct them from public data (such as the private key from the public key or the hash preimage from the hash itself).
Where are the fantastic things?
Why don't we see proving systems everywhere if they're so cool? That's because the complexity of the circuit function greatly affects the computations required at all stages. The art of developing proving systems lies in composing algorithms that produce highly efficient circuits, keeping verification keys, proofs, and generation times affordable for users. At the moment, only a restricted set of algorithms can be implemented efficiently enough to let users perform zk-transactions from their browsers and wallets without huge multi-GB keys, hours of proof generation, and heavy verification. More can be done when proving systems are used in L2 blockchain nodes for scaling, "packing" multiple state transitions into a single proof that is then "presented" to L1. Potentially, many cool things could also be done in traditional Web2, but "they are too busy" :)

Let's see what can be made and what the costs are.
Hello, multiplying world!
First, let's go through all the basic "hello, world" steps here. It is highly recommended to watch the very informative video from Jordi Baylina (with our great respect to him and the Iden3 team).

When writing code in *.circom files, we don't write code that performs computations. Instead, we write a generator of the circuit that, once generated, will allow us to prove any computation on this circuit. We don't program a "function that calculates c = a*b"; we program "code that generates a program that calculates c <== a*b" and allows proving that we executed this program on some inputs. The process resembles programming a macro rather than executable code.

I will not repeat the basic instructions from the Circom docs. Instead, I will only provide the command lines executed from their docs, which will help you save time repeating the same steps. Additionally, I have removed some options not needed for this article and changed the order of some operations, so that the part common between development iterations of the circuit only needs to be executed once (i.e., the first generation of the Powers of Tau).

To have a clear understanding of what all the commands below do, it's a good idea to look at another informative video from Jordi about the flow of development.

Let's start, and perform the first part, which is circuit-independent (we don't need to repeat it for our simple circuits).

# install all prerequisites (Rust, circom, snarkjs)
snarkjs powersoftau new bn128 12 pot12_0000.ptau -v
snarkjs powersoftau contribute pot12_0000.ptau pot12_0001.ptau --name="First contribution" -v
snarkjs powersoftau prepare phase2 pot12_0001.ptau pot12_final.ptau -v
In the first line, we generated a ptau setup file with POWERTAU=12. The power of tau determines the size of the internal polynomials and restricts the maximum number of constraints in the circuit. This affects the sizes of witnesses, proofs, and witness generation code. For the first example, 12 is enough, but subsequent iterations will require more powers.

The second line, "contribute", adds another party that adds its entropy to the ptau files. This contribution procedure allows not only the creator of the circuit but any other N parties to setup the proof system, and if >= 1 of N will honestly "forget" and will not disclose its random entropy, the whole proof system is considered secure. In the next repos and scripts, we will skip "contribute" phases, to simplify and speed up the process, but they're still important for the project's security in a production setup. Also, some proving systems don't require additional contributions to setup.

The third line, "prepare phase2", finalizes the ptau generation after all contributions.

Now, the circuit-dependent part (repeats when the code in multiplier2.circom changes):

Let's create a Rank-1 Constraint System (R1CS), saving it to a .r1cs file, and generate the .wasm file that allows the user to calculate the witness:

circom multiplier2.circom --r1cs --wasm
We generated multiplier2.r1cs containing a system of equations that should have a solution: a combination of signals that makes all constraint equations hold. For multiplier2, this "system" is a single rank-1 constraint a * b - c = 0 over the witness vector (1, a, b, c) (the constant 1 is always part of the witness). Proving knowledge of such a solution proves that we performed a computation that satisfies all constraints.

Now, we move to the second part of the trusted setup. First, we use our multiplier2.r1cs file to generate multiplier2_0000.zkey. Then, after the contribution, we generate the final .zkey, multiplier2_0001.zkey, which will be used for proving. Finally, in the last line, we export verification_key.json (which will be used for verification).

snarkjs groth16 setup multiplier2.r1cs pot12_final.ptau multiplier2_0000.zkey
snarkjs zkey contribute multiplier2_0000.zkey multiplier2_0001.zkey --name="1st Contributor Name" -v
snarkjs zkey export verificationkey multiplier2_0001.zkey verification_key.json
It's crucial to carefully read the setup docs provided, as the Groth16 proving system requires the circuit-specific contribution phase, whereas the Plonk and FFLONK systems do not require this additional "trusted setup" with contributions.

With all the necessary components now in place, the process of proving and verifying can be done on the client's side. Let's proceed to the multiplier2_js dir.

cd multiplier2_js

# create input.json file with {"a": "3", "b": "11"}
echo "{\"a\": \"3\", \"b\": \"11\"}" > input.json

# calculate the witness, containing all signals
node generate_witness.js multiplier2.wasm input.json witness.wtns

# generate the proof
snarkjs groth16 prove ../multiplier2_0001.zkey witness.wtns proof.json public.json

# check the proof!
snarkjs groth16 verify ../verification_key.json public.json proof.json
We constructed some a and b (a=3, b=11) in input.json, then generated witness.wtns (containing the values of the a, b, and c <== a*b signals), then made proof.json from witness.wtns using multiplier2_0001.zkey (created during the trusted setup).

After that, we present the resulting value of c (in public.json; it's the only public signal in our circuit) and proof.json to the verification procedure. Then, anybody who has verification_key.json can check that we have a witness with a and b such that the public c <== a*b.

You can play with other values by changing input.json or the public.json contents and check whether the proof is still correct. But also remember that arithmetic operations in circuits work over finite fields, so operations with very large values will not overflow; instead, the results wrap around modulo the field prime. We'll return to these nuances later.
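As a quick illustration of this modular behaviour (a sketch; p below stands for the default ~254-bit BN128 scalar field prime used by Circom):

pragma circom 2.0.0;

// Functionally the same circuit as multiplier2: all arithmetic happens
// modulo the BN128 scalar field prime p, not over ordinary integers.
template FieldMul() {
    signal input a;
    signal input b;
    signal output c;
    c <== a * b;   // computed as (a * b) mod p
}

component main = FieldMul();

// Worked example (values illustrative): with a = p - 1 and b = 2 in input.json,
// the witness will contain c = p - 2, because 2 * (p - 1) = 2p - 2 = p - 2 (mod p).
// There is no overflow error and no wrap to zero as with fixed-width integers.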
Going deeper
First, read the Circom language docs, because many common things are simply not possible in circuits. A review of the language is not the purpose of this article, but a few notes are important to clarify the questions we will address next.

You should remember that a circuit has a number of operations that is known at compile time(!), so there can be no loops with an unknown number of iterations, no "new" signals beyond the predefined ones, etc. All signals can only go forward, so a "second write" to the same signal is not possible (signals are immutable). Also, circuits allow only quadratic combinations of signals: you can write s <== a * b + c but cannot write s <== a * b * c + d (remember the graphical scheme of the part of the circuit shown here).
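If you do need a higher-degree expression, you split it into quadratic steps via intermediate signals. A minimal sketch (names are illustrative):

pragma circom 2.0.0;

template CubicSplit() {
    signal input a;
    signal input b;
    signal input c;
    signal input d;
    signal output s;

    // s <== a * b * c + d;  // not allowed: cubic in the signals

    signal ab;
    ab <== a * b;         // first quadratic step
    s <== ab * c + d;     // second quadratic step: only one product of signals
}

component main = CubicSplit();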

Another super important thing is the meaning of the operators <-- and === (when we write <==, it means exactly <-- first, then ===). <-- generates a new signal from other signals and variables and only assigns the value of the next signal (used when we generate a witness), while === adds a constraint that is used to prove this computation (used to build the verification key). Missing the === constraint checks after calculating signals with <-- can lead to a situation where your proving side calculates the witness correctly, but the proof check returns a "false positive" result on wrong, specially crafted data. It's described in more detail in this video.
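A minimal sketch of this pitfall, using field division (computing an inverse), which can only be expressed with <--:

pragma circom 2.0.0;

template Inverse() {
    signal input a;
    signal output inv;

    // <-- only tells the witness generator HOW to compute inv;
    // it adds no constraint and is never checked by the verifier.
    inv <-- 1 / a;

    // Without the following === the circuit is under-constrained: a malicious
    // prover could put ANY value into inv and still produce a "valid" proof.
    inv * a === 1;
}

component main = Inverse();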

Let's continue to develop our next circuit.
Powering a*b
I will use one of our educational repos here (tested on Ubuntu 20.04) to automate all the steps mentioned above in a single bash script, allowing us to measure the execution time and the size of the generated files.

The main run_all.sh script takes CIRCUIT_NAME - the name of the .circom template file - and does the following:

  • Downloads large "ptau" files from the snarkjs repo (instead of generating them) for the POWERTAU power of tau
  • Re-creates a new empty BUILD_DIR for each new run, where all zkeys, wasm, inputs, proofs will be generated
  • Compiles the .circom circuit
  • Performs generation of the proving .zkey
  • Generates inputs
  • Generates the witness
  • Generates the proof
  • Checks the proof
  • Outputs file sizes, time of witness generation, proving, verifying

It's boring, but you'll really need similar automation, because during development you need to examine all the parameters of your proving system: the time of witness generation, proving, and verifying, the sizes of the files, the number of constraints, etc. Repos with circuits contain similar scripts (such as this one), because blind development of SNARKs is useless. It's better to study the run_all.sh script, find where you can modify inputs and powers of tau, add your own debug output, or simply rewrite it as you like. In the repo, you will find the multiplier2.circom from the first example. You can run it with ./run_all.sh multiplier2 and experiment.

Now, let's try to prove the calculation of something with more multiplications, for example (a*b)^N. You can try to modify the previous multiplier2.circom to accomplish this, add accumulator vars in a loop, and try to put the multiplications into a single "signal-accumulator" like we do in other languages, but it will be painful because such a flow cannot be transformed into a one-way predefined circuit. The solution is to use a fixed N as a circuit template parameter and generate the proving system for this fixed N, like "raising a private (a*b) to the power of N=32".

I made it (possibly wrong) in powerabn.circom, which you can find here. You can run the example from the same repo with ./run_all.sh powerabn.

powerabn.circom
pragma circom 2.0.0;

template PowerABN(N) {
    signal input a;  
    signal input b;  

    signal output c;

    signal d[N + 1];
    signal e[N + 1];

    var i = 1;

    d[0] <== 1;
    e[0] <== 1;

    while (i <= N) {
        d[i] <== a * d[i-1];
        e[i] <== b * e[i-1];
        i++;        
    }

    c <== d[i-1] * e[i-1]; 
}

component main = PowerABN(32);
Let's first take a look at the circuit parameters. The output of snarkjs info -c powerabn.r1cs showed us that:

[INFO]  snarkJS: # of Wires: 68
[INFO]  snarkJS: # of Constraints: 63
while in the previous example with multiplier2 there were:
multiplier2.circom

pragma circom 2.0.0;

template Multiplier2 () {  
   signal input a;  
   signal input b;  
   signal output c;  

   c <== a * b;  
}
component main = Multiplier2();
and

[INFO]  snarkJS: # of Wires: 4
[INFO]  snarkJS: # of Constraints: 1
In multiplier2 we took a and b (2 wires), connected them with the signal c (+1 wire), plus the constant-one wire that is always present in an R1CS (+1 wire), and checked the single constraint c <== a * b.

In the powerabn circuit we have 68 wires: a, b, and c plus the constant-one wire (+4 wires), and (+32 * 2) wires used for the internal signal packs d[] and e[] (d[0] and e[0] are the constant 1 and get optimized away). And 31 * 2 + 1 = 63 constraints: 31 non-linear constraints each for the d[] and e[] writes (d[1] <== a * 1 and e[1] <== b * 1 are just linear copies), plus 1 for the c output.

Don't forget to study the snarkjs options for examining R1CS circuits. In some cases you will need to print info about the constraint system, or at least you should understand the size of the witness, because each prover has to generate such a witness on the client's side.

Next step: let's increase N and look at our circuit. Change N in powerabn.circom to 10000 here. With POWERTAU=12 in run_all.sh, we cannot build the circuit and get "circuit too big for this power of tau ceremony. 20001*2 > 2**12". In such cases, you need to re-setup the proving system with more powers of tau. For large numbers of constraints, the first phase can take a very long time; even if you download the files, they are very big. Later we will need at least POWERTAU=21 to play with cryptographic proofs, so put POWERTAU=21 in run_all.sh to continue. All the previous circuits will still work because they are smaller, but cryptographic primitives require powersoftau* files of significant size.

Now, with N=10000, you can notice that the sizes of *witness.wtns and *powerabn.wasm have greatly increased:

...
non-linear constraints: 20001
...
[PROFILE] 4.0K build/powerabn_verification_key.json
...
[PROFILE] Witness generation time: 0:00.03
...
[PROFILE] Prove time: 0:00.92
...
[PROFILE] Verify time: 0:00.40
...
[PROFILE] 116K powerabn.wasm
[PROFILE] 628K powerabn_witness.wtns
Compare it to the previous setup with N=32:

...
non-linear constraints: 63
...
[PROFILE] 4.0K build/powerabn_verification_key.json
...
[PROFILE] Witness generation time: 0:00.02
...
[PROFILE] Prove time: 0:00.43
...
[PROFILE] Verify time: 0:00.39
...
[PROFILE] 36K powerabn.wasm
[PROFILE] 4.0K powerabn_witness.wtns
You can also play with powerabn.circom by increasing N further, for example to 200000. Now the proving time becomes significant:

non-linear constraints: 400001
...
[PROFILE] 4.0K build/powerabn_verification_key.json
...
[PROFILE] Witness generation time: 0:00.14
...
[PROFILE] Prove time: 0:07.55
...
[PROFILE] Verify time: 0:00.42
...
[PROFILE] 1.6M powerabn.wasm
[PROFILE] 13M powerabn_witness.wtns
But you may have noticed the boring "always the same" line with the 4.0K size of verification_key.json and the same verification time of ~0:00.40s. All our circuits, with large POWERTAU and tons of constraints, have a constant 4KB verification key, and their proofs are checked in constant time!

This is the "silver bullet" of zkSNARKs. A 4kb size is affordable to be put in a smart-contract or microservice, and if the verification time is small enough, we can prove massive computations non-interactively, having only kilobytes of code on the "server" side. This allows us to use SNARKs to prove processing of large packs of operations, "packing" them in a single proof.

But you should remember that this works only until the size of the circuit and the constraint system becomes too big and the client's side becomes too heavy. If you add more powers of tau, constraints, and signals, you can easily end up with a working SNARK that is totally useless in a real environment because proving is simply unaffordable for the client. Effective design of circuits, keeping all these things in balance, is the heart of composing practical proving systems.
Closer to practice - hashing
Which algorithms have the properties that allow them to be turned into really useful circuits? Maths is good, but cryptographic operations are better. Maths in circuits is restricted by the finite fields and requires adapting common algorithms before they can be used in circuits. Many cryptographic algorithms operate on finite fields of constant size, use field multiplications, allow one-way transformations (extremely useful for ZK), and allow for the creation of efficient fixed-size circuits. This is the reason for the abundance of cryptography-related circuits and the lack of widely used general-purpose computations with SNARKs.

We're going to try keccak hashing, attempting to hash some input N times. I took the implementation of the keccak hashing circuit (just copied all the needed files and added circomlib to the dependencies with npm install) from this repo - a library of circuits that allows proving different cryptographic operations. We will return to it many times later.

It's useful to take a look at the keccak.circom template and the implementation in C (e.g., the "absorb" phase). They are similar because hashing and symmetric cryptography algorithms are "circuits" by design, performing the same operations many times over a finite set of bits. Let's prove the hashing of something.

Let's take a look at:

keccakn.circom

pragma circom 2.0.0;

include "vocdoni-keccak/keccak.circom";
include "../node_modules/circomlib/circuits/bitify.circom";

template KeccakN(N) {
    signal input a;
    signal keccak_in[256];
    signal keccak_out[256];
    signal output out;

    component toNBits = Num2Bits(256); // (1)
    component fromNBits = Bits2Num(256);


    // need to build N keccak circuits to perform N-times hashing
    component keccak[N];

    toNBits.in <== a;


    var i;
    keccak[0] = Keccak(256,256);
    for (i=0; i<256; i++) {
        keccak[0].in[i] <== toNBits.out[i];
    }


    var j;
    for (j=1; j<N; j++) {  // (2)
        keccak[j] = Keccak(256,256); // (3)
        for (i=0; i<256; i++) {
            keccak[j].in[i] <== keccak[j-1].out[i];
        }
    }


    for (i=0; i<256; i++) {
        fromNBits.in[i] <== keccak[j-1].out[i];
    }
    out <== fromNBits.out;
}

component main = KeccakN(1); // (4)
We see the very useful primitive circuits Num2Bits() and Bits2Num() at (1), which allow us to convert a signal value to an array of bits and back. Then, we initialize the first keccak circuit keccak[0] = Keccak(256,256); and after that, in (2), we add N-1 additional keccak circuits, passing the outputs of the previous circuit to the next one. The inputs of the keccak circuit are bits, and each reuse of keccak requires adding 256 new signals and the corresponding constraints.

You can run it with ./run_all.sh keccakn (and POWERTAU=21). Note that even for N == 1, this circuit is large (151k constraints for a single operation, and a large witness size):

non-linear constraints: 151104
...
[PROFILE] 4.0K ../keccakn_verification_key.json
...
[PROFILE] Witness generation time: 0:00.23
...
[PROFILE] Prove time: 0:02.65
...
[PROFILE] Verify time: 0:00.42
...
[PROFILE] 844K keccakn.wasm
[PROFILE] 4.7M keccakn_witness.wtns
With N==16 you will get "circuit too big for this power of tau ceremony. 2413824*2 > 2**21", so let's try N==8:

...
non-linear constraints: 1207040
...
[PROFILE] 4.0K ../keccakn_verification_key.json
...
[PROFILE] Witness generation time: 0:01.80
...
[PROFILE] Prove time: 0:19.26
...
[PROFILE] Verify time: 0:00.41
...
[PROFILE] 4.9M keccakn.wasm
[PROFILE] 37M keccakn_witness.wtns
Here we have a much longer proving time, a very large witness, and a large keccakn.wasm. All of this affects the client side, which is much less powerful than our server.

I selected the keccak256 algorithm to demonstrate that the direct use of existing crypto algorithms in proof systems can be tricky. We made a proof system allowing the user to prove that they have a preimage for some hash, or a "hash onion with N layers". This proof is useful on its own, allowing you to prove that you really own some data with this hash. But to move forward, we need circuits with many more sequential hashing iterations, because "it's a Merkle tree there".
A (sparse) Merkle tree there
You should know what a Merkle tree is (otherwise it's better to choose another article). When we have a list with 2^N elements, the Merkle proof has size N, and checking the Merkle proof is exactly sequential hashing (as we go up to the Merkle root). We implemented a circuit with N sequential keccak hashings, but a tree with our poor N=8 allows only 256 elements.

That's why modern zkSNARKs use hash functions that have fewer multiplications and operate in finite fields while still being safe. One example is the MiMC hash function, designed especially for SNARKs (see the article). Let's try to repeat N hashings with MiMC (you can run it with ./run_all.sh mimcn):

mimcn.circom
pragma circom 2.0.0;

include "../node_modules/circomlib/circuits/bitify.circom";
include "../node_modules/circomlib/circuits/mimc.circom";

template MiMCN(N) {
    signal input a;
    signal output out;

    component mimc[N];

    var k = 1;
    mimc[0] = MiMC7(91); // amount of rounds
    mimc[0].x_in <== a;
    mimc[0].k <== k;

    var j;
    for (j=1; j<N; j++) {
        mimc[j] = MiMC7(91);
        mimc[j].x_in <== mimc[j-1].out;
        mimc[j].k <== k;
    }

    out <== mimc[j-1].out;
}

component main = MiMCN(256);
We used N=256 iterations of hashing, having:

non-linear constraints: 93184
...
[PROFILE] 4.0K ../mimcn_verification_key.json
...
[PROFILE] Witness generation time: 0:00.06
...
[PROFILE] Prove time: 0:02.24
...
[PROFILE] Verify time: 0:00.42
...
[PROFILE] 408K mimcn.wasm
[PROFILE] 2.9M mimcn_witness.wtns
That's really much, much better! A Merkle tree with 2^256 leaves is much better than a tree with 2^8 leaves, while all the parameters seem affordable on the client's side.

There are several SNARK-friendly cryptographic primitives, and we'll talk about them in the next articles.

After playing around with Keccak and MiMC sequential hashing, we have discovered the most frequently used pattern in modern SNARKs - "membership proofs". With a list of values (which could potentially be private), we can prove that an element belongs to a list defined by a single Merkle root. By using a "Sparse Merkle Tree" pattern, we can also efficiently make inclusion and exclusion proofs.
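To illustrate what such a membership proof looks like in a circuit, here is a hedged sketch of a depth-N Merkle path check built from the same MiMC7 primitive as above. Note the simplifications: combining two child nodes as MiMC7(x_in = left, k = right) is for illustration only (real projects use a dedicated 2-to-1 hash such as MiMCSponge or Poseidon from circomlib), and the template and signal names are hypothetical:

pragma circom 2.0.0;

include "../node_modules/circomlib/circuits/mimc.circom";

template MerkleMembership(N) {
    signal input leaf;             // private: the element we prove membership for
    signal input pathElements[N];  // private: sibling hashes along the path
    signal input pathIndices[N];   // private: 0 if the current node is the left child
    signal output root;            // public: the Merkle root defining the list

    component h[N];
    signal cur[N + 1];
    signal left[N];
    signal right[N];

    cur[0] <== leaf;

    for (var i = 0; i < N; i++) {
        // each path index must be a bit
        pathIndices[i] * (pathIndices[i] - 1) === 0;

        // order the pair (current, sibling) depending on the path bit
        left[i]  <== cur[i] + pathIndices[i] * (pathElements[i] - cur[i]);
        right[i] <== pathElements[i] + pathIndices[i] * (cur[i] - pathElements[i]);

        // hash the pair one level up (simplified 2-to-1 hashing, see the note above)
        h[i] = MiMC7(91);
        h[i].x_in <== left[i];
        h[i].k <== right[i];
        cur[i + 1] <== h[i].out;
    }

    root <== cur[N];
}

component main = MerkleMembership(20);

With a depth of 20, such a circuit covers a list of about a million (2^20) elements and stays well within the constraint counts that the MiMC example above showed to be affordable.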

This scheme allows building architectures where the server keeps a super-compact, constant-sized amount of data - a Merkle root plus a verification key - while all the data and the proving logic reside on the client's side. Proving private facts such as "my address is on the list," "I added my address to the list," or "I removed my address from the list" enables the creation of votings, verifiable credentials, on-chain mixers, and other super-scalable, constant-sized services for users.
Conclusion
Unfortunately, the complexity on the client side is still a big problem. Even with an efficient verifier on the smart contract side, the client's effort to prove something complicated requires very heavy computations. That's why SNARKs are now widely used to build L2 solutions: L2 nodes have enough resources to calculate proofs for large packs of operations. Pure "in-browser" zk-proofs are also present in production. A good example is Tornado.Cash, whose proofs are suitable for real user tasks and work from the browser.

We will continue studying more cryptographic circuits and their usage in real applications in the next "Part 2" article.

P.S. Don't forget one important thing when playing with Circom - don't use "cirquit" instead of "circuit" :)
Who is MixBytes?
MixBytes is a team of expert blockchain auditors and security researchers specializing in providing comprehensive smart contract audits and technical advisory services for EVM-compatible and Substrate-based projects. Join us on Twitter to stay up-to-date with the latest industry trends and insights.
Disclaimer
The information contained in this Website is for educational and informational purposes only and shall not be understood or construed as financial or investment advice.