When I first talked about open access and the decline of the scientific journal, 20 years ago, it was fortunate that I had Dirk Haank available to tell the world not to listen to demented consultants with no skin in the game. When I spoke, some 15 years ago, about the inevitable decline of the subscription science journal, it was pleasing to hear Kent Anderson reassuring us all that I was simply a mad dog out on license. Now, as I read the strategy revision for their open access policy published by the Gates Foundation on April 7, I am very happy to indulge the Panglossian philosophers of the scholarly communications marketplace once again, and while I wait for them to tell us that nothing has really changed and everything will go on just as before in the best of all journal-publishing worlds, I am heading down to the marketplace to link arms with Cassandra. We shall chant: “O woe! O woe! The day of the open access journal is nearly over, and its end can be told with confidence!”

Of course, this might take another 15 years. I’ve reached an age myself when time is not a very worrying factor. In the 57 years that have passed since I started work in the educational and academic publishing sector, I have been acutely aware that commercial publishers, while being politely prepared to entertain speculation about the future, have necessarily to attend to this year’s financial results and the expectations of investors. When my speculations were deemed too far-fetched, my clients in the boardroom tended to say: “Our strategies are clear – follow the money!” Today, my response to them would be quick and immediate: “Watch what the funders are doing with the money, and then follow the data!”

Many will argue that Gates is a small funder in terms of article contributions. Its work creates around 4,000 articles a year, and through its payment of APCs it contributes a mere $6 million per annum to the coffers of scholarly publishers. But it is an influential player, and in its revised open access strategy it may have detected something which is present in the minds of the larger funders, and eventually of governments themselves. What is the duty of the funder in terms of ensuring that articles detailing research results are available to the community at large? In the time of Henry Oldenburg in the 1660s, the answer would have been to get them into the Transactions of the Royal Society. Today, it is to get them onto an authorised pre-print server with a CC-BY license as soon as possible after the research is completed and the article is ready, and to accompany them with linked datasets of the evidential material, on a similar license, on a similarly approved site. Speed is of the essence; access for all is key and critical. Subsequent reuse of the material in a journal, subsequent acts of peer review and downstream reuse are not the key concerns of the funding foundation. By this fresh twist to its open access policy, the Gates Foundation has saved $6 million, which can now go back into the research fund. And by using F1000, who already supply the internal Gates publishing systems, to create VeriXiv, the pre-print server of choice, it has provided tools which researchers can use (or not) to fulfil the mandate.

If other funders follow this route, then the scholarly communications research community in science faces a choice. For many, more pressurised by getting the next research programme underway than by anything else, it will be simple to leave things there, and not necessarily press forward to eventual journal publication. For others, given the needs of institutions for publication to secure tenure or satisfy other funders’ requirements, publication will remain essential until the way in which science results are assessed begins to change.

One of the things that I recall from conversations with Eugene Garfield in the 1980s was his repeated assertion that better ways than citation indexing would be found to assess the worth of science research articles. Like Winston Churchill on democracy, he maintained vigorously that what he had created was the “best worst way” of doing the job. The challenge now, I would suggest, is whether some latter-day Garfield can repeat his 1956 breakthrough and create a way of indexing and illuminating what is good science for a modern world. That measurement and indexation has to be available as soon as possible after the first appearance of the claim, wherever it appears in digital form.

In the meanwhile, getting the knowledge immediately into the marketplace, and getting the data available to aid reproducibility, supports other research in progress and supports integrity. And that is critical for funders and researchers alike.

Such new systems will emerge in their own time. In the meanwhile, the ways we measure achievement, which have been gamed and manipulated endlessly and need in any case to be renewed or replaced, experience increasing pressure. This applies as much to peer review as to anything else. If publishers are to stay in the loop, then they need to change their relationships as well. As the relationship between Gates and F1000 shows, whatever takes place in terms of “publication”, and where it takes place in the ecosystem, may become more important to the institution or the funder than to the researcher or the research lab. In terms of attracting sponsorships, investment and industrial research cooperation, universities may have more interest in publication than most, especially if the research community sorts out a better way of ranking science than by citation indexing. (Footnote: what a clever man that Vitek Tracz was! The Tesla of science publishing! Long after his retirement, we shall be using the tools he created for white-label sponsored publishing!)

So there it is! Cassandra and I have now done a full lap of the forum, and I can feel that the rotten vegetables are getting ready to fly through the air! Next time, if I survive, I plan to “follow the data” myself, and look at the role of publishers as data aggregators, data curators and data traders. And we shall remember the old saying: “How do you know if the searcher is a person or a machine? Well, only machines read the full article!”

Two contrasting views of the future struggle against each other whenever we sit down to talk data strategy. One could be called the Syndication School. It says: “Forget getting all the data into one environment – use smart tools and go out and search it where it is, using licensing models to get access where it is not public.” And if the data is inside a corporate firewall, as distinct from a paywall? MarkLogic’s excellent portal for pharmaceutical companies is an example of an emerging solution.
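To make the contrast concrete, here is a minimal Python sketch of the syndication idea: the query travels out to each source, nothing is copied into a central store, and non-public sources are consulted only where a licence exists. Every class, field and source name below is invented for illustration; this is not MarkLogic’s, or anyone else’s, actual API.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Source:
    name: str
    public: bool                          # open web, or behind a paywall/firewall?
    licence_token: Optional[str] = None   # credential under a licensing deal, if any

def search_in_place(sources, query):
    """Fan the query out to each source; touch nothing we are not entitled to see."""
    hits = []
    for src in sources:
        if not src.public and src.licence_token is None:
            continue  # no licence negotiated: leave the data where it is
        # In a real system this would call the source's own search API.
        hits.append(f"{src.name}: results for '{query}'")
    return hits

if __name__ == "__main__":
    sources = [
        Source("preprint-server", public=True),
        Source("pharma-firewall", public=False, licence_token="abc123"),
        Source("paywalled-journal", public=False),  # skipped: no licence
    ]
    for hit in search_in_place(sources, "kinase inhibitor assay"):
        print(hit)

The design point is simply that access control travels with each source, rather than being solved once by central copying.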

But what happens if the data is in document content files with insufficient metadata? Or if that metadata has been applied differently in different sources? Or if three or four different sorts of content-as-data need to be drawn from differently located source files, which must be identified and related to each other before being useful in an intelligent study process? Let’s call this the Aggregation School – assembly has to take place before processing gets going. But let’s not confuse it with bulk aggregators like ProQuest.
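A hedged sketch of what that assembly step involves, assuming just two sources whose metadata fields disagree. The field names and source labels are invented for illustration; real schema crosswalks are far larger and messier.

# Each source labels the same concepts differently; map them to one shared schema.
FIELD_MAPS = {
    "source_a": {"doi": "doi", "author_list": "authors", "pub_date": "date"},
    "source_b": {"DOI": "doi", "creators": "authors", "issued": "date"},
}

def normalise(record, source):
    """Rename a record's fields into the shared schema; keep only what maps."""
    mapping = FIELD_MAPS[source]
    return {common: record[local] for local, common in mapping.items()
            if local in record}

merged = [
    normalise({"doi": "10.1000/x1", "author_list": ["Ng"], "pub_date": "2023"}, "source_a"),
    normalise({"DOI": "10.1000/x2", "creators": ["Ray"], "issued": "2024"}, "source_b"),
]
print(merged)  # both records now share the same keys: doi, authors, date

Only after this kind of normalisation can the two sources be studied as one collection – which is exactly why assembly has to precede process.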

And now put AI out of your mind. The term is now almost as meaningless as a prime ministerial pronouncement in the UK. This morning saw the announcement of three more really exciting new funding awards in the Catalyst Awards series from Digital Science. BTP Analytics, Intoolab and MLprior are all clever solutions using intelligent analysis to serve real researcher and industry needs. But the need to label everything AI is perverse: those who lived through 25 years of expert systems and neural networks will know the difference between great analytics and breakthrough creative machine intelligence.

But while we are waiting, real problem-tackling work is going on in the business of aggregating multi-sourced content. The example that I have seen this week is dramatic and needs wider understanding. But let’s start with the issue: the ability, or inability, especially in the life sciences, of one researcher to reproduce the experiments created, enacted and recorded in another lab simply by reading the journal article. The reasons are fairly obvious: the data was not linked to the article, or not published; the methodology section of the article was a bare summary (video could not be published in the article?); the article has only an abbreviated references section; the article does not have metadata coverage sufficient to discover what it does contain; the metadata schema used was radically different from that of other aligned articles of interest; the relevant reproducibility data is not in the article but on a pre-print server, in conference proceedings, in institutional or private data repositories, in annotations, in responses to blogs or commentaries, in code repositories, in thesis collections, or even in pre-existing libraries of protocols. And all or any of these may be open, or paywalled.

In other words, the prior problem of reproducibility is not enacting the experiment by producing the same laboratory conditions – it lies in researching and assembling all the evidence around the publication of the experiment. This time-consuming detective work is a waste of research time and a constraint on good science, and calling for AI does not fix it.

But profeza.com claim they are well down the arduous track towards doing so. And it seems to me both a fair claim and an object lesson in the real data-handling problems that remain when no magic-wand technology can be applied. Profeza, an India-based outfit founded by two microbiologists, started with the grunt work and are now ready to apply the smart stuff. In other words, they have now made a sufficient aggregation of links between the disparate data sources listed above to begin to develop helpful algorithms and to roll out services and solutions. The first, CREDIT Suite, will be aimed at publishers who want to attract researchers as users and authors by demonstrating that they are improving reproducibility. Later services will involve key researcher communities, and market support services for pharma and reagent suppliers, as well as intelligence feeds for funders and institutions. It is important to remember that whenever we think of connecting dispersed datasets, the outcome is almost always multiple service development for the markets thus connected.
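To illustrate the shape of that grunt work – and this is my own guess at the pattern, not Profeza’s actual data model or code – here is a toy Python sketch: record links between an article and each piece of dispersed evidence, then score how much of the expected evidence has been found. The evidence types and identifiers are invented.

from collections import defaultdict

# Evidence types a reproducibility check might want to see linked (illustrative).
EXPECTED = {"dataset", "protocol", "code", "full_methods"}

links = defaultdict(set)  # article DOI -> set of (evidence kind, location)

def add_link(article_doi, kind, location):
    """Record that an evidence item of `kind` was found at `location`."""
    links[article_doi].add((kind, location))

def coverage(article_doi):
    """Fraction of expected evidence types with at least one link found."""
    kinds = {kind for kind, _ in links[article_doi]}
    return len(kinds & EXPECTED) / len(EXPECTED)

add_link("10.1000/demo", "dataset", "zenodo-record-42")
add_link("10.1000/demo", "protocol", "protocol-library-item-7")
print(f"evidence coverage: {coverage('10.1000/demo'):.0%}")  # prints 50%

Only once such links exist at scale does the “smart stuff” – ranking, recommendation, reproducibility signals – become possible; the aggregation really does have to come first.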

Twenty years ago, publishers would have shrugged and said: “If researchers really want this, they can do it for themselves.” Today, in the gathering storm of Open, publishers need to demonstrate their value in the supply chain before the old world of journals turns into a pre-print server before our very eyes. And before long we may have reproducibility factors introduced into methodological peer review. While it will certainly have competitors, Profeza have made a big stride forward by recognising the real difficulties, doing the underlying work of identifying and making the data linkages, and then creating the service environment. They deserve the success which will follow.
