# Analysis of gossip based dissemination

Gossip based protocols are widely used in distributed systems for robust dissemination of information. The problem: spreading a message among a set of processes. For example, in the Bitcoin P2P network, whenever a new transaction happens, it needs to be broadcast to all peers in order for it to end up on the blockchain. Typically, such information originates at one of the nodes in the network, and needs to be communicated to the rest of the peers.

One elegant solution to this problem mimics how rumours spread in the society by
word of mouth, namely “gossip” based protocols. The gossip component of every
non-faulty process in the network maintains two main pieces of state: a list of
other known live peers, and a buffer of recent messages. Then every `T`

seconds,
(called the *gossip period*), every node executes the following:

- Pick
`f`

peers from the membership list at random. Here,`f`

is called the gossip fanout. - Send recent messages from our messages buffer to each of these peers.

Of course, in parallel, each node must listen for messages and update its message buffer.

The details like the value of the gossip fanout, and what exact messages to relay, and for how many rounds, etc. are what make different gossip protocols different, but we aren’t going to talk much about it here.

## Analysis

The analysis focuses on finding upper bounds for:

- The number of rounds needed to get an update to all participants.
- The load on each participating peer.

To begin with, we note that the inherent random nature of the algorithm means
that we can only comment about the *expected* behaviour of the protocol —
i.e., we are after the number of rounds that a cluster of `N`

nodes needs to
have a message disseminated across the cluster *with high probability*.

### Dissemination time

This refers to the number of gossip rounds before we can be reasonably sure about a message having been disseminated across the cluster.

This analysis is borrowed (indirectly) from work on epidemiology — our problem is not so different from a situation where an infected organism comes in contact with random uninfected individuals, thereby infecting them. These individuals in turn go on to infect others and so on.

For our purpose, we shall say that a node in possession of a particular message
`M`

is “infected”, while nodes which don’t know about `M`

yet are uninfected.
Let’s further suppose that after round `T`

, `x`

nodes are still uninfected, and
`y`

nodes are infected. Of course, at `T=0`

, `x=N-1`

, and `y=1`

(i.e., we have
one “infected” node that knows of the message `M`

and is now out to infect
others with this knowledge). Now because each node picks `f`

others at random to
gossip with, and because the proportion of uninfected nodes in the cluster is
$\frac{x}{N}$, on average any given infected node will pick $\frac{x}{N}f$
uninfected nodes. Since there are $y$ such infected nodes, on average, we’ll
see $\frac{f}{N}xy$ infected-uninfected interactions in a round. Since an
uninfected node turns into an infected node after receiving the message `M`

, on
average, each round results in a *decrease* in the number of uninfected nodes
($x$) by this quantity $\frac{f}{N}xy$. Therefore,

Let $ \beta = \frac{f}{N} $

Then,

Let $\frac{1}{x\left(N-x\right)} = \frac{A}{x} + \frac{B}{N-x} \implies A\left(N-x\right) + Bx = 1$.

Now setting $x=0 \implies A = \frac{1}{N}$, and setting $x=N \implies B = \frac{1}{N}$. Using this in $\eqref{eqn}$, we have

Where $C$ is the constant of integration. To find it, we note that at $T=0, x=N-1$. Using this in \eqref{eqn2}, we find that $C=-\frac{ln\left(N-1\right)}{N}$. Substituting this value for $C$ in \eqref{eqn2} then yields:

So, as $T$ (number of gossip rounds) grows, the expected value of the fraction
$x\over N$ of uninfected nodes rapidly approaches zero. Since the number of
*infected* nodes $y=N-x$, we have the proportion of uninfected nodes

As we grow $T\to\infty$,

Hence the expected proportion of infected nodes reaches 100% as the protocol
keeps running. But this begs the question, how long in practice do we have to
wait before the whole cluster gets the new message with high probability? Let’s
call $T_{1\over2}$ the number of rounds when *half* of the cluster gets the
update, i.e., at $T=T_{h}$, $\frac{x}{N}=\frac{y}{N}=\frac{1}{2}$.
Therefore, from the expression for $\frac{x}{N}$ above,

Now using $\beta = {f\over N}$, we get $T_h = \frac{1}{f}\ln{\left(N-1\right)} = O\left(\ln{N}\right)$. Hence, the half life of the process is logarithmic in the number of participants $N$. Now, to find the number of rounds it takes for $99\%$ of the uninfected nodes to befome infected can be computed as:

So we see that within a logarithmic number of rounds, a *most* of the nodes
get infected with high probability.

### Message load per member

A dissemination mechanism is no good if it puts unreasonable load on member
nodes, and/or fails to spread out the load evenly across the cluster. In gossip
based dissemination, each node sends out $f$ messages, and receives on
expectation $f\over N$ messages *per round*. Since $\Theta\left(\ln{N}\right)$
rounds are needed for a message to get to the whole cluster, the load on each
node is also logarithmic in $N$, and every node has similar load. This is quite
nice, since one can grow the cluster to very huge sizes, and even then the load
on each node remains reasonably low.

## Conclusion

Gossip based protocols are amazingly simple and robust, which is why they form crucial elements of failure detection and membership layers of many large scale distributed systems. Note that I did not talk about nodes failing in this post, and the focus was on the use of gossip for dissemination. However, gossip itself can be used to implement failure detectors, like the SWIM failure detector(very readable paper). Here, every node periodically gossips cluster membership updates. The protocol is augmented with several features like acknowledgements and indirect acknowledgements allows the failure detector to scale with tunable false positive characteristics.