The full article was originally published by Olaf van Wijk on Medium. Read the full article here.
How best to describe what I am trying to by quoting the problem statement from Serguei Popov?
From the #tangle-math discord channel on 20–11–2018:
Serguei Popov [IF]11/20/2018: “Let me call it “semipermanode problem”. It is of course difficult to store all the transaction since the beginning, since it requires a lot of space. On the other hand, many of those transactions need not be stored, because they are almost/completely useless (e.g. spam). Assume that you want to store only a specific set of transactions: say, the historical data from the temperature sensors captured in the center of Munich (which is of course important and needs to be stored forever since you believe that many people or entities will need it in the future). To prove that that data is genuine, you don’t need to store the whole Tangle; you just need a connected (to the “present time”) subgraph of the Tangle which contains the transactions of the interest. So, question: how to efficiently build-and-maintain connected subgraphs which contain “interesting” information? Also, we may need to have some “quick time-traversing chains” (so that it’s possible to go “quickly” to the past along them), but those can be issued by the semipermanode owners. And, the semipermanodes can charge something for access to the data. “In Bitcoin, you pay for transactions, but accessing the ledger is free. In IOTA, the transactions are free, but you may have to pay for accessing the ledger (in case you don’t want to store it yourself)” — this looks as a reasonable general principle.”
The challenge consists of multi sub-problems, one of them is actual data-storage. For that I started looking more into the inner workings of IRI and created a couchbase database adapter to have more fine-grained control over the ‘permanence’ of data storage. You can read the article here.
As Dr Popov’s says is that there is a need for connected subgraphs and this is the focus of the encoding mechanism I am describing here.
Specifically: “A space efficient path description”
The space efficient part in here is extremely important. Because if you want to store a specific transaction it would be ridiculous if you need 100x the space to describe what you actually want to store.
A requirement as well is that given a node that wants to obtain the data can request and validate the transactions immediately from a node holds the data (with a connected subgraph).
To achieve this I identified 4 unique descriptors that can describe the entire tangle. The benefit of just 4 descriptors is that each descriptor only requires 2 bits: