Efficient Use of IOTA MAM for Storing Machine and Sensor Data
The full article was originally published by TangleKit on Medium.
How to Store and Update Machine Data on IOTA
Proposal for a Message Structure using IOTA MAM
Masked Authenticated Messaging (MAM for short) allows sensor data to be published on the IOTA tangle. The data is stored in so-called messages: bundles of IOTA transactions that are automatically distributed across the nodes of the IOTA network and stored in immutable form.
MAM is focused on messaging, that is, publishing up-to-date data for further use within the IOTA environment, for example by the upcoming smart contracts. Because this data is already published on the immutable IOTA tangle, it could also serve as long-term documentation. This could bring transparency to processes in which partner companies or customers are interested in the data. Possible examples are the tracking of production and supply chain processes, where customers get reliable information about the origin and production circumstances of their products.
This article therefore focuses on long-term storage and use of data. New concepts are introduced to improve the MAM functionalities to better serve the described examples. While the current MAM focuses on messaging, additional features could be of use to search through and process data.
Experiences with MAM
In a previous article we presented a prototype sensor station for monitoring dikes (also: levees, embankments etc.). This sensor station measures several environmental conditions as examples and publishes them on the IOTA tangle using Masked Authenticated Messaging (MAM). We used the sensor station to learn about the use of IOTA MAM.
Besides the limited time that transactions, and therefore data, remain available on the IOTA network (a result of snapshots and the current unavailability of permanodes), there are two other important points to note:
#1: Linked List offers no Random Access
The first point is a direct result of MAM's linked-list structure for channels. This structure works fine if a client keeps up with the channel and always reads the newest information. If a client only needs sporadic access to specific information, the structure allows only relatively inefficient access. If permanodes and long-running sensor channels are combined, very long lists will result. A sensor station publishing its data every 10 minutes produces over 52,000 messages in a single year. Searching for the sensor readings of a specific day then requires iterating through these messages from the start of the channel until the required ones are found.
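To make the size of the problem concrete, the following minimal sketch computes how many messages a channel accumulates per year and how many a reader must fetch before reaching a given day. It assumes a fixed publishing interval, one message per reading, and a reader that starts at the first message of the channel; the function names are illustrative and not part of MAM.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def messages_per_year(interval_seconds: int) -> int:
    """Messages published in a 365-day year at a fixed interval."""
    return SECONDS_PER_YEAR // interval_seconds


def messages_to_traverse(day_of_year: int, interval_seconds: int) -> int:
    """Messages a reader must fetch, starting at the first message of the
    channel, before reaching the first message of the given day (day 1 is
    the first day of operation)."""
    messages_per_day = SECONDS_PER_DAY // interval_seconds
    return (day_of_year - 1) * messages_per_day


# One message every 10 minutes (600 seconds):
print(messages_per_year(600))          # 52560
# Reaching the last day of the year means walking almost the whole list:
print(messages_to_traverse(365, 600))  # 52416
```

A 10-minute interval gives 52,560 messages per year, and the count grows in inverse proportion to the interval: a 10-second interval gives 3,153,600.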
#2: Handling of Erroneous Data
During the operation of the sensor station, erroneous data was generated multiple times. The causes ranged from hardware damage (the water temperature sensor failed and returned essentially random values) to human error (such as starting the sensor station's operating system to apply changes without first disabling the sensor functionality). These erroneous sensor readings were uploaded to the IOTA tangle automatically, so false data was mixed in with the correct data.
While such errors will become less likely with a non-prototype sensor station and established operating procedures, a realistic scenario would include huge numbers of sensor stations running in parallel, making erroneous data on the tangle very likely.
These erroneous records, mixed in with the correct ones, limit the value of the latter for long-term documentation and as a basis for further development of the measured system.
MAM (and its successor MAM2) focus on encryption and authentication of data published on the IOTA-tangle. In this article we propose another layer building upon MAM, structuring the published data for improved usability.
The approach has two main goals, which address the previously described limitations:
- Index the records within the messages to allow fast access
- Allow records to be corrected or declared invalid, while keeping the history for data integrity
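The second goal can be illustrated with an append-only log in which a correction or invalidation is published as a new entry instead of overwriting the old one. This is only a sketch under assumptions: the `Entry` type, its fields, and the rule that the latest entry for an ID wins are hypothetical and are not taken from MAM or from the message format proposed in this article.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    record_id: int          # unique, increasing ID (e.g. a Unix timestamp)
    value: Optional[float]  # sensor reading; None when only invalidating
    valid: bool = True      # False declares the record invalid


def current_view(log: List[Entry]) -> Dict[int, Entry]:
    """Latest entry per record ID; later entries override earlier ones."""
    view: Dict[int, Entry] = {}
    for entry in log:
        view[entry.record_id] = entry
    return view


def history(log: List[Entry], record_id: int) -> List[Entry]:
    """All entries ever published for one record, in publication order."""
    return [entry for entry in log if entry.record_id == record_id]


log = [
    Entry(100, 21.5),
    Entry(110, 85.0),               # faulty reading
    Entry(120, 21.7),
    Entry(110, None, valid=False),  # later: declare record 110 invalid
    Entry(100, 21.4),               # later: correct record 100
]

view = current_view(log)
print(view[100].value)         # 21.4
print(view[110].valid)         # False
print(len(history(log, 110)))  # 2
```

Because nothing is deleted, a reader can always reconstruct what was originally published, which matches the requirement to keep the history for data integrity.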
The core idea is to link messages together like MAM, but in a more complex structure. Additional messages are added to hold metadata.
While adjustments are necessary, the approach is equally applicable to MAM and MAM2.
To find a message containing a specific record, either iteration through the messages or additional metadata is necessary. The assumption is that every record has a unique ID, with newer records having higher IDs. A simple example is a timestamp in Unix epoch form. A possible search criterion would be all records between two timestamps.
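The value of the ordering assumption can be seen in a small sketch: when IDs only ever increase, a range query between two timestamps reduces to two binary searches. The example assumes the record IDs are already available as a sorted in-memory list, which on the tangle would itself require an index; `records_between` is an illustrative name.

```python
from bisect import bisect_left, bisect_right
from typing import List


def records_between(record_ids: List[int], start: int, end: int) -> List[int]:
    """Return all IDs with start <= ID <= end from a sorted list of IDs."""
    first = bisect_left(record_ids, start)  # first position with ID >= start
    last = bisect_right(record_ids, end)    # first position with ID > end
    return record_ids[first:last]


# Unix timestamps of readings taken every 10 minutes (600 seconds)
ids = [1546300800 + 600 * i for i in range(6)]

print(records_between(ids, 1546301400, 1546302600))
# [1546301400, 1546302000, 1546302600]
```

Both lookups take logarithmic time in the number of records, in contrast to the linear walk through a plain linked list.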
While metadata for an index could be stored outside the tangle, this would enable manipulations by referencing alternative messages, which could also be published on the tangle. The approach presented in this article publishes all metadata on the channel, guaranteeing data integrity.
The figure shows a simplified example of the proposed index structure. The black block represents the entry point, the first message on the channel. From there the yellow blocks form a linked list. The numbers in the blocks represent the highest record ID contained.
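The role of the number in each block can be sketched as follows: a reader compares the target record ID with a block's highest contained ID and moves on while the target is larger. The `Block` class and `find_block` function are hypothetical and cover only the part of the structure described so far, a linked list of blocks reached from the entry point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    highest_id: int                 # highest record ID contained in the block
    next: Optional["Block"] = None  # following block in the list, if any


def find_block(first: Optional[Block], target_id: int) -> Optional[Block]:
    """Return the first block whose highest ID is >= target_id, or None."""
    block = first
    while block is not None:
        if block.highest_id >= target_id:
            return block
        block = block.next
    return None


# Three blocks holding record IDs up to 10, 20 and 30
third = Block(30)
second = Block(20, third)
first = Block(10, second)

print(find_block(first, 15).highest_id)  # 20
print(find_block(first, 31))             # None
```

Each step needs only a single comparison against the block's number, so the reader does not have to inspect every record to decide whether a block can contain the target.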