Talking to Machines in Their Language, not Ours


We have been over this ground before, you may be thinking. Isn’t this the one where I say there are now more readers in machine intelligence than in bone, blood and tissue? And that these machine readers communicate effectively with each other, perform advanced analysis, and are increasingly able to act as writers of at least parts of articles as well as readers? So traditional human roles in scholarly communications – like reading all the pertinent articles, doing the literature review, or managing the citations – can increasingly be automated. Yes, it is that article again, but with a new twist, courtesy of an interview conducted by Frontiers with Professor Barend Mons, who holds the chair in BioSemantics at the University of Leiden. Going behind the interview and following up some of his references quickly showed me how facile I had been when I first described here the trends I was seeing. If we follow Dr Mons, then we turn scholarly publishing on its head – almost literally.

Let’s start where all publishing begins. Scholarly communications reflects the way in which humans behave. They tell stories. The research article is a structured narrative. As publishers we have always known that narrative was the bedrock of communication. So the issues that Dr Mons broaches are startling, obvious and critical. Narrative is not the language of machines. Data is the language of machines. In order for our machines to understand what we are talking about, we have to explain what we mean. So we turn content into data and then add explanations, pointers and guidelines in the form of metadata. And even then we struggle, because we still see the research article as the primary output of the research process. But as far as the machine-as-reader is concerned, this is not true. The primary output of the research process is the evidential data itself and what emerges from it. The research article, in a machine-driven world, is a secondary artefact, a narrative explanation for humans, and one which needs far more attention than it currently gets if it is ever to be of real use to machines.

So we are in a mess. The availability of data is poor and inconsistent (my words). Mons points to the speed of theoretical change in science: knowledge is no longer dominated for a generation by a scholar and his disciples (he quotes Max Planck to the effect that in his day science progressed funeral by funeral). The data is not prepared consistently and (again, my words) publishers are doing little to coordinate data availability, linkage and preparation. They do not even see it as their job. Dr Mons, as an expert in the field (he was one of the chief proponents of the FAIR principles and is regarded as a leading Open Science advocate; he is also the elected president of CODATA, the research data arm of the International Science Council), plainly sees as urgent the need to enrich metadata and enable more effective machine-based communication. When I look back on the past decade of entity extraction, synonym definition, taxonomy creation and ontology development, I find it dispiriting that we are not further on than we are. But then Dr Mons directed his listeners towards BioPortal (https://bioportal.bioontology.org), which helps to visualise the problem: 896 ontologies in biomedicine alone, creating 13,315,989 classes. Only the machine can now map evidence produced under different ontologies at scale, and by Dr Mons’s account it needs standardisation, precision and greater detail to do so.

If the people we call publishers today are going to take seriously the challenge of migrating from journals and monographs to becoming the support services element of the scholarly communications infrastructure, then the challenge begins here. We need a business model for enhancing, standardising and co-ordinating data availability. This is not necessarily about repositories, storage or the Cloud, but it is all about discoverable availability, data cataloguing and establishing improved standards for metadata application. And there is an implied challenge here. Dr Mons nods towards the role of advanced data analysis in helping the discovery of Covid-19 vaccines. But the task he describes is not one of using what we know to track more dangerous and contagious variants. He sees the challenge as the requirement to use our data and analytical powers to profile all of the feasible variants which could possibly become a threat, and to develop our vaccines to the point where they meet those threats before they arise. If our data is not reusable and is unfit for purpose, we do not get there. The depth of the challenge could not be clearer.

Some participants clearly see a way forward. The recent announcement of a strategic relationship between Digital Science and Ontochem is an encouraging example. But most of us are still article-centric rather than data-driven. We urgently need business models for data enhancement, to do what we should have been doing over the past decade.

Jul 28