If you are an STM publisher reading this, then it may already be too late for you to act decisively enough to put yourself in the vanguard of change. For I am not the first to say what I am about to say, and there is now a good literature based around the idea that the network is a world of small beginnings, followed by mass change at unprecedented rates that catches whole industries unawares. We are coming to one of those points, and my growing realization was triggered into certainty by being sent a link to a Harvard Business Review article from November 2010 (thank you, Alexander van Boetzelaar, for making sure I saw this). Since HBR, as an old world publisher, makes a business of paid-for reprints, I cannot give the link, but it is reprint R1011B.

The article is called “The Next Scientific Revolution”, by Tony Hey, a director of Microsoft Research and one of the Fourth Paradigm people who made such an impact in 2009. Their arguments, pioneered by the late Jim Gray, saw scientific enquiry gathering force as the experimental methods of early Greece and China were subsumed into the modern theoretical science of the Newtonian age, and then carried forward through computation and simulation into the age of high performance computing in the last century. So now we stand on the verge of a fourth step: the ability to concentrate unprecedented quantities of data and apply to it data mining and analytics that, unlike the rule-based enquiries of the previous period, are able to throw out unsuspected relationships and connections that in turn are the source of further enquiry.

All of this reminds me of Timo Hannay of Nature and his work with the Signalling Gateway consortium of cell science researchers based in San Diego. I am not sure how successful that was for all parties involved, and to an extent it does not matter (especially given the lead time in experience given to Nature by this work). To me this was a signal of something else: on the network the user will decide and make the revolutionary progress, and we “publishers” will have to be ready in an instant to follow, developing the service envelope in which users will be able to do what they need to do. At the moment we are all sitting around in STM talking about overpublishing, the impossibility of bench science absorbing the soaring output of research articles, or of libraries keeping up on restricted budgets. The real underlying problem, which we are not seeing, is that the evidence behind those articles is “unpublished” and unconcentrated, and that as advanced data mining and analytics tools become increasingly available, they lack targets of sufficient scale in terms of collected data.

Of course, there are big data collections available. And their usage and profitability are significant. Many are non-profit and some are quasi-monopolistic. But I see huge growth in this area, especially in physics, chemistry and the life sciences, to the point where “evidence aggregation and access management and quality control” is the name of the business, not journal publishing. Mr Hey comments in his article: “Critically, too, most of us believe scientific publishing will change dramatically in the future.” “We foresee the end product today – papers that discuss an experiment and its findings and just refer to datasets – morphing into a wrapper for the data themselves, which other researchers will be able to access directly over the internet, probe with their own questions, or even mash into their own datasets in creative ways that yield insights that the first researcher may never have dreamed of.”

What does “access directly” mean in this context? Well, it could mean that universities and researchers allow outside access to evidential data, but this poses other problems. Security and vetting loom large. Then again, evidential peer review may be a requirement: was the evidence created accurately, ethically and using reliable methodologies? Plenty of tasks for publishers here. Then again, can I hire tools to play in this sandpit? Is the unstructured content searchable, and is metadata consistent and reliable? These are all services “publishers” can offer, in a business model that attracts deposit fees for incoming data as well as usage fees. But there will be natural monopolies. It may be true, as Mr Hey claims, that “through data analysis scientists are zeroing in on a way to stop HIV in its tracks”, but how many human immunodeficiency virus data stores can there be? Right, only one.

So the new high ground will have fewer players. A few of those will be survivors from the journal publishing years, and I hope one at least will have the decency to blush when recalling the pressure put on people like me, in my EPS days, to remove the ever-growing revenues of the science database industry (human genomics, geospatial, environmental, for the most part) from the STM definition, since it was not “real” science publishing – and reduced their share-of-market figures! But then again, maybe they should look around them. Isn’t what is being described here exactly what LexisNexis are doing with Seisint and Choicepoint, or Thomson Reuters with Clearforest? And why? Because their users dictate that this shall be so. For the same reason this is endemic in patent enquiry: see my erstwhile colleague David Bousfield anatomizing this fascinatingly only last week (https://clients.outsellinc.com/insights/index.php?p=11416). And why have market-leading technology companies in this space – think of MarkLogic and their work on XML and the problems of unstructured data – made such an impact in recent years in media and government (aka intelligence)? I see a pattern, and if I am right, or even half right, it poses problems for those who do not see it.

I rest my case. Next Friday I shall do the Rupert Murdoch 80th birthday edition, for which I plan to bake a special cake!

