FAIR Data Principles & Digital Twins

IOTICS

- Last Updated: December 2, 2024

In plays, films, books, and music, there is often a key moment where everything in the story comes together. Software and data engineers know these moments: after days of work, the code and data finally come together and you can write and run the ‘if’ statement. A version of that ‘if’ statement might be: if river.level > X and rainfall.forecast > Y, then…. When it comes to rainfall and rivers, the ‘then’ part could involve millions of pounds of damage, weeks of transport disruption, and possible loss of life. We will use this flooding algorithm to explore how FAIR data principles and digital twins could interact and cooperate.

'In a FAIR world, computers can find and understand data, but we still can’t program them with that “if” statement when the data is in large datasets.' -IOTICS

The "If" Statement

The ‘if’ statement is our first kind of data interaction. A computer algorithm brings two pieces of data together so they can be compared, and some insight can be gained. But what those pieces of data are and how they get to the “if” statement is more complex than you might think.
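As a minimal sketch, the flooding check could be written in Python as follows. The class names, field names, and threshold values are hypothetical placeholders standing in for X and Y; they are not real operational figures.

```python
from dataclasses import dataclass


@dataclass
class RiverReading:
    level_m: float  # river level, assumed to be in meters


@dataclass
class RainfallForecast:
    forecast_mm: float  # forecast rainfall, assumed to be in millimeters


# Hypothetical thresholds standing in for X and Y in the text.
RIVER_LEVEL_THRESHOLD_M = 2.5
RAINFALL_THRESHOLD_MM = 30.0


def flood_risk(river: RiverReading, rainfall: RainfallForecast) -> bool:
    """Return True when both values exceed their thresholds."""
    return (
        river.level_m > RIVER_LEVEL_THRESHOLD_M
        and rainfall.forecast_mm > RAINFALL_THRESHOLD_MM
    )


if __name__ == "__main__":
    if flood_risk(RiverReading(level_m=3.1), RainfallForecast(forecast_mm=42.0)):
        print("flood risk: raise an alert")
```

The comparison itself is one line. Everything else in this article is about getting the right two values into it.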

There’s no search engine for data: not publicly and rarely within enterprises. There are attempts at searchability such as data.gov.uk, but they are intended for people, not algorithms. It is said that data scientists spend at least 50 percent of their time looking for data rather than looking at it. This epic waste of time arises because data is hidden, deliberately or unintentionally, in silos, in datasets, behind APIs, or in program-unfriendly formats such as PDF. Such data is not findable by machines, but what if it were?

Access & Interoperability

Access and interoperability are linked. A computer may be able to find and access some data but still be unable to understand it. It would help interoperability if the data carried metadata indicating, for example, that the river level was measured in meters and the rainfall in millimeters. With find, access, and interoperate in place, the data interaction in the ‘if’ statement is the re-use of that data for our new purpose.
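To illustrate why unit metadata matters, here is a small sketch in which every value carries its unit, so an algorithm can normalize it before comparing. The `Measurement` type, the `in_meters` helper, and the sample values are assumptions made for illustration, not part of any FAIR specification.

```python
from dataclasses import dataclass

# Conversion factors to meters for the units this sketch knows about.
TO_METERS = {"m": 1.0, "cm": 0.01, "mm": 0.001}


@dataclass
class Measurement:
    value: float
    unit: str      # metadata: unit of measure, e.g. "m" or "mm"
    quantity: str  # metadata: what was measured, e.g. "river_level"


def in_meters(measurement: Measurement) -> float:
    """Convert a length measurement to meters using its unit metadata."""
    try:
        return measurement.value * TO_METERS[measurement.unit]
    except KeyError:
        raise ValueError(f"unknown unit: {measurement.unit!r}") from None


# Made-up example values: one source reports centimeters, another millimeters.
level = Measurement(value=310.0, unit="cm", quantity="river_level")
rain = Measurement(value=42.0, unit="mm", quantity="rainfall_forecast")
```

Without the `unit` field, the algorithm would have no way to tell 310 centimeters from 310 meters.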

This is the basis of the FAIR data principles, conceived by a consortium of leading scientists and organizations to ensure that scientific data sets could be found and used by machines, with minimal human intervention. FAIR stands for Findable, Accessible, Interoperable, and Reusable – and it’s going mainstream.

The FAIR World

In a FAIR world, computers can find and understand data, but we still can’t program them with that “if” statement when the data is in large datasets. In our flooding scenario, what our algorithm also needs is the river level at a specific location and the rainfall forecast at a different location, probably well upstream from the place where the flood is likely to occur. So, even if our algorithm can find the right dataset, it still needs to know how to run a query against the dataset to find the data it wants.
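The query problem can be sketched with a tiny in-memory stand-in for a large dataset. The station names, field names, and values below are all invented; the point is only that the algorithm must already know the dataset's structure before it can ask for one location.

```python
# A tiny in-memory stand-in for a large dataset; all rows are made up.
RIVER_LEVELS = [
    {"station": "upstream-01", "level_m": 1.8},
    {"station": "town-bridge", "level_m": 3.1},
    {"station": "downstream-07", "level_m": 2.2},
]


def level_at(dataset, station):
    """Return the river level for one station, or None if it is absent."""
    for row in dataset:
        # The caller has to know that the key is "station" and the value
        # is "level_m"; a different dataset may use different names.
        if row["station"] == station:
            return row["level_m"]
    return None
```

For example, `level_at(RIVER_LEVELS, "town-bridge")` returns the one reading the flood check needs, but only because the field names were hard-coded in advance.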

There is an element of granularity of the data that is important - and that’s where digital twins come in. Digital twins are a virtualization of an asset’s data. The asset itself is a useful level of granularity here. Our algorithm needs to choose the appropriate rainfall forecasts and required river levels. Metadata about the assets beyond their location might also be useful. Knowing who operated them would help our algorithm assign weight to the readings if some operators’ data proved more reliable and accurate than others. Having some provenance of the data as actually coming from that twin and the twin really being the one operated by the Environment Agency, for example, would build trust in the output of our algorithm. The exchange of metadata between twins to establish trust and access is our second data interaction.
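As a rough sketch of asset-level granularity, each twin below carries metadata about its operator and location, and the algorithm weights readings by operator. The `SensorTwin` type, the second operator name, and the weights are hypothetical, and the sketch does not attempt to verify provenance, which would need a real identity mechanism.

```python
from dataclasses import dataclass


@dataclass
class SensorTwin:
    twin_id: str
    operator: str   # metadata: who operates the asset
    location: str   # metadata: where the asset is
    level_m: float  # latest river level reading in meters


# Hypothetical reliability weights per operator (0.0 to 1.0).
OPERATOR_WEIGHT = {"Environment Agency": 1.0, "Volunteer Network": 0.5}


def weighted_level(twins):
    """Average the twins' readings, weighted by operator reliability."""
    total_weight = 0.0
    weighted_sum = 0.0
    for twin in twins:
        # Operators missing from the table get weight 0 and are ignored.
        weight = OPERATOR_WEIGHT.get(twin.operator, 0.0)
        total_weight += weight
        weighted_sum += weight * twin.level_m
    if total_weight == 0.0:
        raise ValueError("no readings from a known operator")
    return weighted_sum / total_weight


twins = [
    SensorTwin("twin-a", "Environment Agency", "town-bridge", 3.0),
    SensorTwin("twin-b", "Volunteer Network", "town-bridge", 3.6),
]
```

Because each twin is one asset, the algorithm can pick and weight individual sources instead of querying a whole dataset.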

Timeliness

The final step to get to the ‘if’ statement is about timeliness. Homeowners won’t appreciate being told on Wednesday that a flood would occur on Tuesday when their houses are already knee-deep in muddy water. The data needs to flow between the twins and the algorithm as close to real time as possible so that the predictions are available in a timely way. This is not just important in our flooding scenario; it’s important in business, where latency between something happening and the business reacting to it can cost millions.
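One simple way to express timeliness in code is to reject readings that are too old before they reach the ‘if’ statement. The 15-minute limit and the `is_fresh` helper are assumptions chosen for illustration; a real system would set the limit from its own requirements.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limit on how old a reading may be before it is ignored.
MAX_AGE = timedelta(minutes=15)


def is_fresh(observed_at: datetime, now: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    """Return True if the reading is no older than max_age at time `now`.

    Readings timestamped in the future are treated as not fresh.
    """
    return timedelta(0) <= now - observed_at <= max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_fresh(now - timedelta(minutes=5), now))
```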

We have reached a point where we have an algorithm running and exchanging data with digital twins. But what does the algorithm do in the ‘then’ part of the ‘if’ statement? What if it could share the data back with other digital twins, or create new twins of the likely flood locations and have them share into a growing ecosystem of cooperative twins?

Data & Twin Interactions

Giving the algorithm its own digital twin simplifies the model: everything is a twin, which creates symmetry. The twin of the algorithm interacts with the twins of the data sources. Data interactions are twin interactions, and twin interactions are the exchange of data and metadata between twins. If FAIR data principles and twins could interact and cooperate, imagine what transformations could be achieved.
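The symmetry described above can be sketched as a toy in-process model in which the algorithm is itself a twin that follows source twins and shares its result for others to follow. The `Twin` and `FloodAlgorithmTwin` classes, their methods, and the thresholds are invented for this illustration and are not the API of any real digital twin platform.

```python
class Twin:
    """A toy twin: holds metadata and shares data with its followers."""

    def __init__(self, twin_id, metadata=None):
        self.twin_id = twin_id
        self.metadata = metadata or {}
        self._followers = []

    def follow(self, other, callback):
        """Ask `other` to call `callback(source_twin, data)` on each share."""
        other._followers.append(callback)

    def share(self, data):
        """Send data to every follower of this twin."""
        for callback in self._followers:
            callback(self, data)


class FloodAlgorithmTwin(Twin):
    """The algorithm as a twin: follows sources and shares its own output."""

    def __init__(self, twin_id, level_threshold_m, rain_threshold_mm):
        super().__init__(twin_id, {"type": "flood_algorithm"})
        self.level_threshold_m = level_threshold_m
        self.rain_threshold_mm = rain_threshold_mm
        self.latest = {}

    def on_data(self, source, data):
        """Store the newest values and share a result once both are known."""
        self.latest.update(data)
        if "level_m" in self.latest and "forecast_mm" in self.latest:
            at_risk = (
                self.latest["level_m"] > self.level_threshold_m
                and self.latest["forecast_mm"] > self.rain_threshold_mm
            )
            self.share({"flood_risk": at_risk})


# Hypothetical twins and thresholds for the flooding scenario.
river = Twin("river-gauge", {"unit": "m"})
rain = Twin("rain-forecast", {"unit": "mm"})
algo = FloodAlgorithmTwin("flood-algo", level_threshold_m=2.5, rain_threshold_mm=30.0)
listener = Twin("alert-listener")
alerts = []

algo.follow(river, algo.on_data)
algo.follow(rain, algo.on_data)
listener.follow(algo, lambda source, data: alerts.append(data))

river.share({"level_m": 3.1})
rain.share({"forecast_mm": 42.0})
```

In this sketch the listener follows the algorithm's twin in exactly the same way the algorithm follows its data sources, which is the symmetry the text describes.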