I have recently been looking into how many nodes in a sharded architecture are needed for network security.
Probability Calculation of a network failing
Note that, in dividing the nodes into shards the samples are without replacement rather than with replacement. In binomial we make the assumption of with replacement, so hypergeometric is the right choice.
Binomial distribution is a good approximation for the hypergeometric distribution if N and K are large compared to n and p(1 β p) is bounded away from 0, where p = K/N. Binomial model assumes independence, this is far away N from independence, so we model it by hypergeometric distribution.
We do the following calculation based on the hypergeometric formulation:
where we use the same notation as it was used in whitepaper.
- N is the total number of registered validators.
- K is the total number of malicious validators.
- n is the number of validators in a shard.
- k is the number of malicious validators in a shard.
Note that this formula is only true for k β€ K
We want to compute the probability of a network failure: Suppose there are m shards(that is mn = N) using the notation as above, the probability of a no breakdown of the network is equal to
We sum over all possible arrangements/division of malicious shards in which all of them have less than 1/3 malicious nodes. That is, the sum is 3 taken over all possible integer {xi}mi=1 such that 0 β€ xi β€ n and they add 3 up to K. The reasoning is the following:
β The network does not fail if each shard has less then 1/3 malicious 3 nodes
β Suppose the first shard has x_{1} many malicious nodes out of a total of n nodes in the first shard, now, the total number of nodes in the network is N-n and number of malicious nodes left are K-x_{1}. So now we compute the probability that x_{2} malicious nodes enter the second shard out of K-x_{1} malicious nodes and n-x_{2} honest nodes enter the second shard out of a total of (N-n)-(K-x_{1}) honest nodes.
Based on this, we compute the probability of non failure of a network depending on N, m(hence n), K. (Here we assumed the percentage of malicious nodes to be 25%.)
Some Sanity checks:
a) As number of shards increase, keeping other things constant(for example look at first and sixth row of the table), probability of network failure increases as it becomes more likely to have one failure.
b) As total number of nodes increase it decreases the probability of less likely event, so probability of network/shard failing decreases.
Observe that with 4000 nodes and 10 shards, there should be at most one failure in 50 years assuming 12 hours epoch time. But what about the whole network? I expect it would be close to (0.99996)10 = 0.9996, because the difference between dependence and independence fades away when number of nodes per shard increases. So the main take away is if we want our chain to not have a single failure in say 100 years(probability of failure < 0.0014%), if we want to have 10 shards(with 25% malicious nodes), we need at least 5684 nodes.
So for 10 shards and 25% malicious nodes, we need around 5684 validators for having a 100 years non security breach(this is certainly probabilistic).