Probability of Network Failure


#1

I have recently been looking into how many nodes in a sharded architecture are needed for network security.

Probability Calculation of a network failing

Note that, in dividing the nodes into shards the samples are without replacement rather than with replacement. In binomial we make the assumption of with replacement, so hypergeometric is the right choice.

Binomial distribution is a good approximation for the hypergeometric distribution if N and K are large compared to n and p(1 βˆ’ p) is bounded away from 0, where p = K/N. Binomial model assumes independence, this is far away N from independence, so we model it by hypergeometric distribution.

We do the following calculation based on the hypergeometric formulation:

image

where we use the same notation as it was used in whitepaper.

  • N is the total number of registered validators.
  • K is the total number of malicious validators.
  • n is the number of validators in a shard.
  • k is the number of malicious validators in a shard.
    Note that this formula is only true for k ≀ K

We want to compute the probability of a network failure: Suppose there are m shards(that is mn = N) using the notation as above, the probability of a no breakdown of the network is equal to

We sum over all possible arrangements/division of malicious shards in which all of them have less than 1/3 malicious nodes. That is, the sum is 3 taken over all possible integer {xi}mi=1 such that 0 ≀ xi ≀ n and they add 3 up to K. The reasoning is the following:

– The network does not fail if each shard has less then 1/3 malicious 3 nodes
– Suppose the first shard has x_{1} many malicious nodes out of a total of n nodes in the first shard, now, the total number of nodes in the network is N-n and number of malicious nodes left are K-x_{1}. So now we compute the probability that x_{2} malicious nodes enter the second shard out of K-x_{1} malicious nodes and n-x_{2} honest nodes enter the second shard out of a total of (N-n)-(K-x_{1}) honest nodes.

Based on this, we compute the probability of non failure of a network depending on N, m(hence n), K. (Here we assumed the percentage of malicious nodes to be 25%.)

Some Sanity checks:
a) As number of shards increase, keeping other things constant(for example look at first and sixth row of the table), probability of network failure increases as it becomes more likely to have one failure.

b) As total number of nodes increase it decreases the probability of less likely event, so probability of network/shard failing decreases.

Observe that with 4000 nodes and 10 shards, there should be at most one failure in 50 years assuming 12 hours epoch time. But what about the whole network? I expect it would be close to (0.99996)10 = 0.9996, because the difference between dependence and independence fades away when number of nodes per shard increases. So the main take away is if we want our chain to not have a single failure in say 100 years(probability of failure < 0.0014%), if we want to have 10 shards(with 25% malicious nodes), we need at least 5684 nodes.

So for 10 shards and 25% malicious nodes, we need around 5684 validators for having a 100 years non security breach(this is certainly probabilistic).


#2

This is a critical analysis @swg. It is a wake up call for all sharding-based protocols in that the commonly used hyper-geometric probability of a single shard failing is not at all the same as the true probability of failure among the entire set of shards in the network.

One thing that reassures me is that β€œfailure” in this case is not catastrophic. Failure is simply that malicious nodes control more than 1/3 of the voting power in a committee. Unless they also manage to partition the shard network, these malicious nodes can only stall consensus and render the shard useless for the duration of the epoch. Certainly a bad outcome but not one that breaks the entire network.

The probability of the whole network breaking is the probability of 2/3 malicious majority in any shard, which I would guess is still exceedingly small even in this new formulation.