Dear reader, I am aware that I have been a poor correspondent in recent weeks, but in truth I have been doing something I should have done long ago: gaining some experience of AI companies, talking to their potential customers and reading a book. Let’s start at the end and work backwards.

The book that has eaten the last week of my life is Edward Wilson-Lee’s fine new publication, The Catalogue of Shipwrecked Books, which describes the eventful life of Christopher Columbus’ illegitimate son, Hernando, and his attempts to build a universal library of human knowledge. Hernando collected printed works, including pamphlets and short works, in an age when many scholars still regarded all print as meretricious rubbish. He built a catalogue of his collection, and then realised that he could not search it effectively unless he knew what was in the books, so he started compiling summaries – epitomes – and then subject indexes, as well as inventing hieroglyphs to describe the physical properties of the books. In other words, in the 1520s in Seville he built an elaborate metadata environment, but was eventually defeated by the avalanche of new books pouring out of the presses of Venice and Nuremberg and Paris. Wilson-Lee very properly draws many parallels with the early days of the Internet and the Web.

As I closed this wonderful book, my mind went back to an MIT Media Lab talk given by Marvin Minsky in 1985. We need reminding how long the central ideas of AI have been with us. At the end of his talk, the Father of AI kindly took questions, and a tame librarian in the front row asked: “Professor, if you were looking back from some inconceivably distant date, like, say, 2020, what would surprise you that you have in 2020 but which we do not have now?” After a thoughtful moment, the great man replied: “Well, I guess that I would praise your wonderful libraries, but still be surprised that none of the books spoke to each other.” At that he left the room, but from then on the idea of books interrogating books, updating each other and creating fresh metadata, and then fresh knowledge, in the process of interaction has been part of my own Turing test. So I find it easy to say that we do not have much AI in what we call the information industry. We have a meaningless PR AI, a sort of magic dust we sprinkle liberally (AI-enhanced, AI-driven, AI-enabled, etc.), but few things pass the “books speaking to books and realising things not known before” test.

And yet we can and we will. The key questions are, however: will current knowledge ownership permit this without a struggle, and will there be a dispute over the ownership of the results of these interactions? This battle is already shaping up in academic and commercial research, so it was dispiriting to find, when talking to AI companies, that there is really no business model in place yet to enable co-operation. Partly this is a problem of perception. Owners and publishers see the AI players as technicians adding another tier of value under contract, and then going away again. The AI software developers see themselves as partners, developing an entirely new generation of knowledge engine. And neither of them will really get anywhere until we all begin to accept the implications of the fact that no one, not even Elsevier, has enough stuff in one place to make it work at scale. And while one can imagine real AI in broad niches (Life Sciences, say), the same still applies. And if we try it in narrow niches, how do we know that we have fully covered the crossovers into other disciplines which have been so illuminating for researchers in this generation? In our agriscience intelligent system, how much do we include on food packaging, or consumer market research, or plant diseases, or pricing data?

So what happens next? In the short term it is easy to envisage branded AI: Elsevier AI, Springer Nature AI? I am not sure where this gets us. In the medium term I certainly hope to see some data sharing efforts to invest in AI partnerships and licence data across the face of the industry. It is true that there are some players – Clarivate Analytics, for example, and in some ways Digital Science – who are neutral to the knowledge production cycle and have hugely valuable metadata collections. They could be a vital building block in joint ventures with AI players, but their coverage is still narrow, and in the course of the last month I even heard a publisher say: “I don’t know why we let Clarivate use our data – we don’t get anything for it!”

Of course, unless we share our data we are not going to get anywhere. And given the EU Parliament’s rejection of data metering and enhanced copyright protection last week, all these markets are wide open for massive external problem solving – who remembers Google Scholar? The solution is clear: we need a collaborative model for data licensing and joint ownership of AI initiatives. We have to ensure that data software entrepreneurs get a payback and that investment and data licensing show proper returns, just as Hernando rewarded the booksellers who collected his volumes all across Europe. In a networked world, collaboration is often said to be the natural way of working. It is probably the only way that AI can be fully implemented by the scholarly communications world. Hernando died knowing his great scheme had failed. AI will succeed if it shows real benefits to research and those who fund it. As it succeeds, it will find other ways of sourcing knowledge if those who commercially control access today are not able to find a way of leading the charge rather than being dragged along in its wake.

