Explaining the IOTA Congestion Control Algorithm
The full article was originally published by William Sanders on blog.iota.org. Read the full article here.
This article explains in some technical detail the IOTA Congestion Control Algorithm (ICCA) used in the IOTA 2.0 protocol (Coordicide). We wanted to offer this overview to clarify the relationship between access and mana that was raised in our previous post about mana. Furthermore, we are aware that many are eager to learn more about Coordicide, and stakeholders from across the IOTA landscape benefit from this knowledge. That said, the information contained in this article is not “required knowledge” to use IOTA. In addition, all of the aspects discussed in this article will be part of our full specification. If you have questions, please visit our Discord to discuss them with our developers and researchers. Congestion control is an important and fascinating aspect of Coordicide, so we hope that you enjoy this explanation.
We would particularly like to express our gratitude towards Bob Shorten and his team at Imperial College London for their work in co-developing this important algorithm. Their work has been instrumental in developing this important component and validating its behavior.
For a fully detailed explanation, see our first article on this topic which has been accepted for presentation at the 7th IEEE/IFIP Workshop on Security for Emerging Distributed Network Technologies.
What is congestion control and why do we need it?
In most DLTs, information is relayed across the network through a process called gossiping, where participating nodes receive messages from one neighbor and then forward them to other neighbors. Naturally, nodes receive many messages, so they must decide which of these messages to relay to their neighbors and in what order to do so. A congestion control algorithm makes these decisions.
Congestion occurs when a network has more traffic than it can handle. Without proper congestion control, the network can become oversaturated and cease to function because nodes can overwhelm their neighbors with more messages than can be handled by an individual node. By deciding which messages to forward, a congestion control algorithm is utilized to prevent this oversaturation. In an unsharded DLT, the problem is particularly acute because every network participant (“node”) must process all valid transactions. Therefore, the total throughput of the network (informally known as transactions per second) is ultimately limited by the available resources such as internet connectivity, device processing, and storage capabilities.
However, since a congestion control algorithm decides what messages are gossiped, it essentially determines who has access to the network; the congestion control algorithm has a profound effect on the user experience. In order to elucidate “Coordicide”, the authors would like to explain ICCA which will be used in the IOTA 2.0 protocol (Coordicide).
Congestion control is a very old topic that was intently studied as the infrastructure of the Internet was developed. However, unlike most networks, DLTs have very strict requirements which make congestion control a particularly difficult problem to tackle. In particular, the following requirements must be satisfied (there are other requirements which we go into further detail below, but these are the most important):
- Fair access: network access must be granted proportionally to some “scarce resource”
- Attack resistance: malicious nodes cannot disrupt the network
- Consistency: all nodes need to write the same messages to their local ledger
Blockchains solve congestion control by using some mechanisms to elect leaders to mine blocks. Access is fair since the rate a leader is chosen is proportional to either the amount of stake or hashing power they have. Since access is limited only to the leaders, congestion is never possible, even during an attack. Lastly, since blockchains have a consensus module to elect a leader, consensus is achieved. Limiting access to “leaders” makes it easier for PoS and PoW architectures but centralises the access.
Since IOTA uses a DAG, and not a blockchain, we cannot (and do not want to) use any sort of leader election process. Instead, the ICCA uses what is called a scheduler, which selects messages which are “scheduled”. Scheduled messages are written to the local tangle and forwarded on to the node’s neighbors. The scheduler schedules messages at a constant rate, which ensures that none of the neighbors are overwhelmed.
Now we can discuss how the ICCA achieves our three key requirements mentioned above, fair access, attack resistance, and consistency. First, we can see that locally, each node schedules traffic fairly according to mana. It turns out that this also holds true globally and that access is fairly granted according to mana.
Second, what happens if an attacker tries to spam the network? Locally, nodes will not process the attacker’s messages faster than their allowed rate. So the attacker’s messages will pile up and their queue will grow. All of the nodes will detect this and banish the attacker from the network. Of course, there are numerous other attack methods we have studied, but the scheduler is difficult to trick so the algorithm is quite resilient against attackers.
Lastly, since the scheduler never deletes honest messages, the algorithm ensures that they eventually reach all the nodes. If an attacker is banished from the network, what happens to the messages they issued? Since the attacker’s behavior is detectable, most nodes will banish the attacker at the same moment, ensuring the ledgers are almost the same. Then, the gossip protocol has a mechanism called solidification which can rectify minute differences in tangles.
Below you will find a more technical overview of the ICCA as a deeper analysis of the requirements is presented.
Requirements for any DLT
There are some general requirements and a few of those are key in the success of any DLT congestion control algorithm: consistency, fair access, utilization, fair latency, and attack resistance.
Consistency means that every node writes the exact same messages to the ledger. When the network is congested, nodes may be obliged to drop some messages leading to inconsistencies. The goal of the ICCA is to reduce to the minimum message drops and to fix the remaining inconsistencies through solidification requests. Without this, there is no consensus.
Since the available network resources are finite, access (equivalently, throughput) has to be restricted in some way. In DLTs, this is usually guaranteed by restricting access through a “scarce resource”. Many DLTs use energy as a scarce resource by requiring miners to do PoW. Proof of Stake DLTs often use the token itself as a scarce resource; IOTA uses mana (see our mana blog post and its follow up).
Fair access means that all nodes should get a fair share of throughput. It’s important to note that this access must be linearly proportional with respect to the amount of mana: if access is given sublinearly – giving increasingly less access with more mana – then large actors will split their mana into smaller nodes to gain unfair access. If it is superlinear – giving more access with more mana – then big holders have an advantage. This incentivises nodes to pool their resources, like with mining pools. Both sublinear and superlinear access allocations are unfair in general and promote behavior that could hurt the network in the long term.
If there is demand, the full potential of the network will be used.
For example, think of a trivial access control where throughput of each node is capped by their percentage of mana, say a 10% mana allows a user a maximum of 10% of the bandwidth. This would be fair, but inefficient: in fact, if say 95% of the mana is offline, then the network would operate at 5% capacity. An efficient congestion control mechanism will have to repurpose that unused capacity.
Combining utilization and fairness is complicated. The question of “How much mana do I need for X TPS” is hard to answer. But remember, utilization is a good thing! The network would be unusable without it.
Utilization and fairness together form a concept called “Max-min fairness”. This is an important concept from classical networking. Routers relaying internet traffic need these exact properties (imagine if your local routers did not have these properties, your home Internet would be unusable).
All honest users should experience a similar latency. Latency is the time it takes for an issued message to propagate to all nodes. This latency must be small for both a good user experience and for the stability of the protocol.
User experience should speak for itself; nobody likes delays. For stability of the protocol, however, if the latency was dramatically different for different users, then it would introduce perverse incentives to deviate from the algorithm. Punishing malicious nodes’ latency is good, from a game theory perspective, which is why if node latencies were to increase when they misbehave, then they would be receiving punishment, and are thus encouraged to behave.
These properties mentioned above must hold for honest nodes even if an attacker tries to break the system. Nodes have an incentive to increase their access and decrease their latency. Deviations should be impossible or punished.
IOTA Congestion Control Algorithm (ICCA)
The IOTA Congestion Control Algorithm (ICCA) streamlines the transaction process to minimize the impact of potential congestion and to regulate writing access to the ledger. The ICCA accomplishes this through its three parts: the scheduler, the rate setter, and the blacklister.
The scheduler determines which transactions are written into the ledger and gossiped. The rate setter uses a classical Additive Increase Multiplicative Decrease (AIMD) algorithm to adjust each individual’s rate. Blacklisting censors nodes attacking the network.
This – as far as we know – is the first non Proof of Work, DAG-based congestion control algorithm that has been published within the cryptocurrency field. It is the most novel component of Coordicide and a keystone to the IOTA 2.0 protocol.
From the networking layer (see this link on terminology), messages arrive through gossip from neighbors, are filtered for duplicates and invalid messages, and arrive in the inbox where they queue per the issuing node ID. The inbox also receives messages created locally by the node’s rate setter. The blacklister monitors the queues of the inbox for malicious behaviour, and the scheduler decides which messages get to leave the inbox. After a message is scheduled, the following tasks occur simultaneously:
- The message is gossiped to all neighbors but the one from which we have received it (given back to the networking layer)
- The relevant application (from the application layer) is called to parse the payload
- The message is “written” to the local tangle, where it can be considered as a tip, voted upon by FPC, booked to the ledger, etc
In the ICCA, we are proposing the use of a modified version of a Deficit Round Robin scheduler. Apart from the fact that such a scheduler addresses all the aforementioned requirements (i.e. consistency, utilization, fairness, and attack resistance), we chose it for the fact that it is lightweight. For instance, it is the default option to efficiently handle a very large number of threads for the Linux kernel.
The scheduler works as such: after messages are received, they are placed in the inbox. Each message is assigned a “work score” based on how many resources it consumes. Larger messages have larger scores, since they use more bandwidth. A message with a transaction containing 100 signatures has a higher score than a transaction with one signature, because it takes more resources validating the signatures. Essentially, the scheduler makes sure that a maximum number of resources are consumed per second. The researchers and engineers developing this have complete freedom to define this work function in order to tailor the algorithm to the exact needs of the IOTA network.
When messages are received, they are sorted by their issuing node into different queues. The scheduler visits each queue, schedules a number of messages based on the mana (the “scarce resource” used in IOTA) held by the issuing node, and then moves onto the next queue. If a queue is empty, it is skipped; the scheduler then schedules from other queues. In this way, all nodes have their messages processed at a rate proportional to their mana unless their queue becomes empty.
The inbox of a node is divided up into queues, with a queue for every node in the network; a Deficit Round Robin scheduler can efficiently handle tens of thousands of queues. Messages are then placed in the queue of the node that issued the message. The scheduler visits each non-empty queue in a deterministic order. Each queue is assigned a quantity called “credit” to that node. When the scheduler visits a queue, it adds a certain priority value proportionally to the mana of the corresponding node ID. It then schedules messages from that node’s queue. Every time a message is scheduled and further processed, the message’s “work score” is deducted from the credit. When no more messages from N’s queue can be scheduled or the credit becomes negative, the scheduler moves onto the next queue.
Currently, we schedule messages at a fixed/maximum preset rate. And at this time, we are analyzing and studying different means to make this threshold dynamic where sharding would let the network scale naturally.
But what happens if an attacker or a selfish node does not follow the scheduler rules? Assume that a selfish node decides to forward only its own transactions. That would lead to inflating its corresponding queue at honest nodes’ inbox buffers resulting in large delays only for its own transactions. The same happens for malicious nodes: as long as honest nodes follow the deficit round robin scheduler, transactions from attackers would not be forwarded.
How does a node know how many messages it can issue? It uses the rate setter. Based on this queue length, the rate setter uses the AIMD algorithm to regulate its own transaction generation rate. A similar algorithm is used by routers in the Internet to adjust traffic.
A node can monitor the length of its own queue. When this queue gets too long, it knows it’s issuing too many messages. A node increases their issuance rate slowly (Additive increase) until it’s queue gets too long. At this point, the node rapidly decreases its issue rate (Multiplicative decrease) to prevent upcoming congestion. Eventually, this makes the rate setter converge to a fair issuance rate, depending on the current traffic conditions.
As an additional benefit, the rate setter makes sure that latency for transactions issued by nodes following protocol is low. Nodes that do not follow this algorithm will flood their queues and increase their latency. This way, nodes are incentivized to use the rate setter.
What happens if malicious nodes don’t use the rate setter? What if they don’t care about increasing the latency, for example a spam attack? The following sections contain security measures we set in place that would trigger before an attacker could harm consistency.
What happens if the node does not use the rate setter? Then they will not reduce their rate when their queue grows and so their queue will continue to grow. Because the other honest nodes only have so much memory, no queue cannot be allowed to grow indefinitely. So, the ICCA has the blacklister to limit the queue lengths. Depending on the mana, a node is allowed to have a certain number of messages enqueued. When this number exceeds a certain threshold, the node is temporarily blacklisted, meaning that no other transactions from that node are added to the inbox for some time. If a node’s queue begins to explode, the new incoming messages for that given node are dropped.
An attacker can attempt to flood the network with large amounts of spam, directly trying to overload hundreds of nodes at once. To protect against such attacks we use a mechanism called Rate control as a coarse break, which physically limits the amount of messages one node can make. For this, we use an adaptive PoW as a rate control mechanism. For honest nodes, the PoW difficulty is relatively low. But if a node begins to spam, the difficulty exponentially increases, physically preventing them from issuing more messages. So, this mechanism would prevent the system from being completely overloaded.
As useful as the rate control mechanism is, it may not be necessary. The ICCA has the potential to be strong enough that this module is redundant. However, in the first version of the Coordicide protocol we want to guarantee the maximum protection, and potential subsequent decrease or removal of any PoW are left as a future optimization. On the GoShimmer testnet, we will study the interplay between the Adaptive PoW Rate control and the ICCA.
Zero mana messages
Unfortunately, the ICCA requires nodes to have some mana to issue messages, because 0 mana nodes will never receive credit and without credit, their messages will not be scheduled. This is important to avoid inexpensive attacks: without it low mana nodes could create bursts of transactions, thereby congesting the network and affecting ledger consistency.
We can offer various options to let zero mana nodes be able to issue their transactions. One way is via a faucet where any node can voluntarily offer its unused mana to new people willing to try out the network, for whatever use case they have in mind. Another option is to use the mana market where you can determine how and when to rent mana as you need. Additionally, another method includes buying tokens and transferring them by pledging them to the node.
Finally, we are investigating a way to dynamically adapt the minimum mana threshold depending on the percentage of active mana and on the current traffic conditions.
During periods of low congestion, “validated low mana nodes” are allowed to issue their transactions and will be scheduled by honest nodes. A “validated low mana node” is a node which satisfies the following two conditions: one, its mana is less than the minimum mana threshold; and two, the node has recently performed a large PoW. When congestion is low, then honest nodes will temporarily allow scheduling from these validated low mana nodes. Once congestion becomes large, validated low mana nodes cannot issue their transactions; they have to wait for low congestion periods.