We have been over this ground before, you may be thinking. Isn’t this the one where I say there are now more readers in machine intelligence than in bone, blood and tissue? And that these machine readers communicate effectively with each other, perform advanced analysis, and are increasingly able to perform as writers of at least parts of articles as well as readers? So traditional human roles in scholarly communications – like reading all the pertinent articles, or doing the literature review, or managing the citations – can be increasingly automated. Yes, it is that article again, but with a new twist, courtesy of an interview conducted by Frontiers with Professor Barend Mons, who holds the chair in BioSemantics at the University of Leiden. Going behind the interview and picking up some of his references quickly showed me how facile I was being when I first described here the trends that I was seeing. If we follow Dr Mons, then we turn scholarly publishing on its head – almost literally.

Let’s start where all publishing begins. Scholarly communications reflect the way in which humans behave. They tell stories. The research article is a structured narrative. As publishers we have always known that narrative was the bedrock of communication. So the issues that Dr Mons broaches are startling, obvious and critical. Narrative is not the language of machines. Data is the language of machines. In order for our machines to understand what we are talking about we have to explain what we mean. So we turn content into data and then add explanations, pointers and guidelines in the form of metadata. And even then we struggle, because we still see the research article as the primary output of the research process. But as far as the machine-as-reader is concerned this is not true. The primary output of the research process is the evidential data itself and what emerges from it. The research article, in a machine-driven world, is a secondary artefact, a narrative explanation for humans, and one which needs far more attention than it currently gets if it is ever to be of real use to machines.

So we are in a mess. The availability of data is poor and inconsistent (my words). Mons points to the speed of theoretical change in science – knowledge is no longer dominated for a generation by a scholar and his disciples (he quotes Max Planck to the effect that in his day science progressed funeral by funeral). The data is not prepared consistently and (again, my words) publishers are doing little to coordinate data availability, linkage, and preparation. They do not even see it as their job. Dr Mons is an expert in the field: he was one of the chief proponents of the FAIR principles, is regarded as a leading Open Science advocate, and is the elected president of CODATA, the research data arm of the International Science Council. He plainly sees as urgent the need to enrich metadata and enable more effective machine-based communication. When I look back on the past decade of entity extraction, synonym definition, taxonomy creation and ontology development, I find it dispiriting that we are not further on than we are. But then Dr Mons directed his listeners towards BioPortal (https://bioportal.bioontology.org). This helps to visualise the problem: 896 ontologies in biomedicine alone, creating 13,315,989 classes. Only the machine can now map evidence produced under different ontologies at scale, and by Dr Mons’s account it needs standardisation, precision and greater detail to do so.

If the people we call publishers today are going to take seriously the challenge of migrating from journals and monographs to becoming the support services element of the scholarly communications infrastructure, then the challenge begins here. We need a business model for enhancing, standardising and co-ordinating data availability. This is not about repositories, storage or cloud, necessarily – but it is all about discoverable availability, data cataloguing and establishing improved standards for metadata application. And there is an implied challenge here. Dr Mons nods towards the role of advanced data analysis in helping the discovery of Covid-19 vaccines. But the task he describes is not one of using what we know to track more dangerous and contagious variants. He sees the challenge as the requirement to use our data and analytical powers to profile all of the feasible variants which could possibly become a threat, and to develop our vaccines to the point where they meet those threats before they arise. If our data is not re-usable and is unfit for purpose we do not get there. The depth of the challenge could not be clearer.

Some participants clearly see a way forward. The recent announcement of a strategic relationship between Digital Science and OntoChem is an encouraging example. But most of us, for the most part, are still article-centric and not data-driven. We urgently need business models for data enhancement, to do what we should have been doing over the past decade.

“It’s as though the creative process is no longer contained within an individual skull, if indeed it ever was. Everything today is to some extent the reflection of something else.” William Gibson (‘Pattern Recognition’, 2003)

We are now fairly used to AI. There is nothing very unexpected, then, about intelligent systems that write sports reports or business news. Or fully synthesised AI voiceovers in advertisements, which may also be created entirely by CGI. Creating new Beatles-alike pop songs is likewise six years in the past for the Sony CSL Lab Flow Machines. Even replacing the missing pieces of Rembrandt’s ‘Night Watch’ or creating a wholly new ‘Rembrandt’ seems not entirely unusual. Indeed, with GitHub’s Copilot, we even have a picture of the machine sitting at our elbow writing the code that we were thinking about writing, a vision of some future scenario of autonomous machine creativity. Given enough data, enough machine intelligence and enough machine learning capacity, we believe that anything is possible. So why, in terms of our laws and regulations, do we fail to register the capacity of intelligent machines to generate original and creative work which cannot belong entirely to the owners of the machines, or the writers of the programs, or the owners of the data?

A very informative seminar organised by IBiL, University College London, tackled the issues surrounding copyright and AI last week. Excellent speakers from the UK Intellectual Property Office and from the music rights licensing body laid out the current steps to review the law and the conservative view of protecting the livelihoods of living creators. All of this is clear and understandable. We know that the law will always be five years behind the front line of innovation, and we certainly want the rights of the living creators of original works to be protected. But there is also a real worry here that we will delay or inhibit development work vital to the growth of a strong AI-based sector of the economy, and within it the creativity and originality that should be associated with this. Tobias McKenney of Google spoke graphically of the iterative processes of modelling, adjusting and remodelling, and wondered whether current provision protected the model, or the process, or the output. He also pointed out the need to regulate against bias, and the use of selective data, through audit and certification, and for the protection of AI creativity to be global. Martin Senftleben, from Amsterdam University, proved a fascinatingly different professor of IP in that he argued that the objectives of our society were more important than the legal objectives, and saw copyright as something aimed at restricting acts rather than encouraging them. So perhaps AI creativity did need a boost, and perhaps a new neighbouring right, a ‘single equitable payment’, was needed to secure the data availability that would in turn stimulate development.

The quotation at the head of this piece reminds us that all art, and science, is derivative in some sense. Not only do we stand on “the shoulders of giants” but we look through their eyes and make use of their brains as well. Perhaps it would be better to shelve the debate on whether a machine can be original and creative, and concentrate on ensuring that the data is available and licensable to make AI the effective boon to our society that it certainly can be. This means arguing for something like a re-use right which makes data holders explain why their data cannot be used, neighbouring rights around standard terms and payments to licensing societies on data re-use, and standard core terms for data and text mining licences. Let’s get this up the agenda, and leave the arguments about ownership, creativity and originality until later, until we are prepared to debate whether a machine can have a legal personality – or, indeed, a simulacrum of consciousness.


Seminar:

AI voice synthesis: www.WellSaidLabs.com and www.VocaliD.ai

Daddy’s Car (Flow Machines, Sony CSL): https://www.youtube.com/watch?v=LSHZ_b05W7o

NLG data narrative platforms: www.narrativescience.com; www.automatedinsights.com; www.yseop.com; www.primer.ai; www.arria.com

