CCC – FAIR Foundation Forum

“The evolving role of DATA in the AI era”

18 September 2023, Leiden

“If we regulate AI and get it wrong, then the future of AI belongs to the Chinese.” When you hear a really challenging statement within five minutes of getting through the door, then you know that, in terms of conferences and seminars, you are in the right place at the right time. The seminar leaders, supported by the remarkable range of expertise displayed by the speakers, gave a small group with wide data experience exactly the antidote needed to the last nine months of generative AI hype: a cold, clean, refreshing glass of reality. It was time to stop reading press releases and start thinking for ourselves.

FAIR’s leadership, committed to a world where DATA is findable, accessible, interoperable, and reusable, began the debate at the requisite point. While it is satisfying that 40% of scientists know about FAIR and what it stands for, why is it that when we communicate the findings of science, and the claims and assertions which result from experimentation, we produce old-style narratives for human consumption rather than, as a priority, creating data in formats and structures which machines can use, communicate, and interact with? After all, we are long past the point where human beings could master the daily flows of new information in most research domains: only in a machine intelligence world can we hope to deploy what we know is known in order to create new levels of insight and value.

So do we need to reinvent publishing? The mood in the room was much more in favour of enabling publishers and researchers to live and work in a world where the vital elements of the data that they handled were machine-actionable. Discussion of the FAIR enabling resources and of FAIR Digital Objects gave substance to this. The emphasis was on accountability and consistency in a world where the data stays where it is, and we use it by visiting it. Consistency and standardisation therefore become important if we are not to find a silo with the door locked when we arrive. It was important, then, to think about DATA being FAIR “by design” and of FAIRification as a normal workflow process.
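To make the idea concrete: here is a minimal sketch, ours rather than the forum's, of what a FAIR Digital Object record might look like, written in Python for illustration. Every identifier, field name, and URL below is a hypothetical placeholder; real FAIR Digital Objects use persistent identifiers (such as Handles) resolved through registries of typed objects.

```python
# Illustrative sketch only: a FAIR Digital Object as a typed record.
# All identifiers and URLs are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class FairDigitalObject:
    pid: str            # persistent identifier, resolvable by machines
    object_type: str    # registered type, telling a machine how to act on it
    metadata: dict      # FAIR metadata: what the data is, licence, schema
    data_location: str  # the data stays put; agents "visit" it here


fdo = FairDigitalObject(
    pid="hdl:21.T11148/example-dataset-001",  # hypothetical Handle
    object_type="dataset",
    metadata={
        "title": "Example assay results",
        "license": "CC-BY-4.0",
        "conforms_to": "https://example.org/schemas/assay-v1",
    },
    data_location="https://repository.example.org/datasets/001",
)

# A machine agent resolves the identifier and reads the typed metadata
# first, then decides whether and how to operate on the data in place.
print(fdo.object_type, "->", fdo.data_location)
```

The design point is the one made in the room: the agent visits the data where it lives, and consistent, typed metadata is what keeps the silo door unlocked when it arrives.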

If we imagine that by enabling better, more consistent machine-to-machine communication we will improve AI accuracy and derive benefits in cost and time terms, then we are probably right. If we think that we are going to reduce mistakes and errors, or eliminate “hallucinations”, then we need to be careful. Some hallucinations at least might well be machine-to-machine communications that we, as humans, do not understand very well! By this time, we were in the midst of discussion on augmenting our knowledge transfer communication processes, not by a new style of publishing, but by what the FAIR team termed “nanopublishing”. Isolating claims and assertions, and enabling them to be uniquely identified and coded as triples, offered huge advantages. These did not end with the ability of knowledge graphs to collate and compare claims. This form of communication had built-in indicators of provenance which could be readily machine assessed. And there was the potential to add indicators which could be used by researchers to demonstrate their confidence in individual findings. The room was plainly fascinated by the way in which the early work of Tobias Kuhn and his colleagues was developed by Erik Schultes, who effectively outlined it here, and the GO FAIR team. Some of us even speculated that we were looking at the future of peer review!
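For the technically curious, a minimal sketch of such a nanopublication, written by us in Python with the rdflib library. The claim, the experiment, the researcher, and the confidence-score predicate are all made-up example.org placeholders rather than anything shown at the forum; the three named graphs (assertion, provenance, publication info) follow the published nanopublication schema.

```python
# A minimal sketch of one nanopublication, assuming the rdflib library.
# All example.org identifiers, and the confidenceScore predicate, are
# illustrative placeholders.
from rdflib import Dataset, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/np1/")
NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")

ds = Dataset()

# Head graph: ties the three parts together as one nanopublication.
head = ds.graph(EX.Head)
head.add((EX.pub, RDF.type, NP.Nanopublication))
head.add((EX.pub, NP.hasAssertion, EX.assertion))
head.add((EX.pub, NP.hasProvenance, EX.provenance))
head.add((EX.pub, NP.hasPublicationInfo, EX.pubinfo))

# Assertion graph: the scientific claim itself, one identified triple.
assertion = ds.graph(EX.assertion)
assertion.add((EX.geneX, EX.isAssociatedWith, EX.diseaseY))

# Provenance graph: where the claim came from, plus a researcher-supplied
# confidence indicator (a hypothetical predicate, for illustration).
provenance = ds.graph(EX.provenance)
provenance.add((EX.assertion, PROV.wasDerivedFrom, EX.experiment42))
provenance.add((EX.assertion, EX.confidenceScore,
                Literal("0.85", datatype=XSD.decimal)))

# Publication-info graph: who published this nanopublication, and when.
pubinfo = ds.graph(EX.pubinfo)
pubinfo.add((EX.pub, PROV.wasAttributedTo, EX.researcherA))
pubinfo.add((EX.pub, PROV.generatedAtTime,
             Literal("2023-09-18T09:00:00", datatype=XSD.dateTime)))

print(ds.serialize(format="trig"))
```

Because the claim is a single identified triple sitting alongside machine-readable provenance, a knowledge graph can collate and compare such claims across publications, which is exactly the advantage discussed in the room.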

Despite the speculative temptations, the thinking in the room remained very practical. How did you ensure that machine interoperability was built in from the beginning of communication processing? FAIR were experimenting with Editorial Manager, seeking to implant nanopublishing within existing manuscript processing workflows. Others believed we needed to go down another layer. Persuade SAP to incorporate it (not huge optimism there)? Incorporate it into the electronic lab notebook? FAIR should not be an overt process, but a protocol as embedded, unconsidered, and invisible as TCP/IP. The debate in the room about how best to embed change was intense, although agreement on the necessity of doing so was unanimous.

The last section of the day looked closely at the value, and the ROI, of FAIR. Martin Romacker (Roche) and Jane Lomax (SciBite) clearly had little difficulty, pointing to benefits in cost and time, as well as in a wide range of other factors. In a world where the meaning as well as the acceptance of scientific findings can change over time, certainty in terms of identity, provenance, versioning, and relationships became foundational requirements for everyone working with DATA in science. Calling machines intelligent and then not talking to them intelligently, in a language that they could understand, was not acceptable, and the resolve in the room at the end of the day was palpable. If the AI era is to deliver its benefits, then improving the human-to-machine interface in order to enable the machine-to-machine interface was the vital priority. And did we resolve the AI regulatory issues as well? Perhaps not: maybe we need another forum to do that!

The forum benefited hugely from the quality of its leadership, provided by Tracey Armstrong (CCC) and Barend Mons (GO FAIR). Apart from the speakers mentioned above, valuable contributions were made by Babis Marmanis (CCC), Lars Jensen (NNF Center for Protein Research) and Lauren Tulloch (CCC).