Sometimes you go to a conference that just crackles with the excitement of a moment that has come. When Houlihan Lokey were putting together their conference on data and analytics, which took place last week, I can well imagine that there was a conversation that went “we need to attract 150 at least so let’s invite 350”. We have all done it. And then comes a day when almost 300 of the invitees turn up, it’s standing room only in front of the coffee urn, and the room pulsates with conversation, networking, and commentary. So it was at the Mandarin Oriental in London last Wednesday, and there were other virtues as well. Organised in panels of corporate leaders and entrepreneurs, a short conference with short sessions had real insight to offer. There is a lesson there for all of us still inclined to put on three-day events – short, intensive and double-track does leave a worry that one might have missed something, as well as an appetite for more.

After a keynote by Phil Snow, the CEO of FactSet, the conference resolved into four panels covering insurance, research and IP, risk and compliance, and lastly a group of founders talking about their companies. And while companies like FactSet now take a fully integrated view of the marriage of content and technology with data and analytics, it is also clear that companies in the sectors covered straggle across the entire spectrum from a few APIs and data feeds, right through to advanced algorithmic experimentation and prototyped machine learning applications. And everywhere we spoke about what AI might mean to the business. But nowhere did we define what exactly that might mean, or demonstrate very tangibly real examples of it in action. And this for me strengthens a prejudice. It is one thing to look back on the algorithms that we have been using for five years and refer to them in publicity as an “AI-driven service”, but quite another thing to produce decision-making systems capable of acting autonomously and creatively.

Yet the buzz of conversation in the tearoom was all about people wanting to take advantage of the technology breakthroughs and data availability, and wanting to invest in opportunistic new enterprises. This is much better than the other way round, of course: many of us remember the period after the “dot com bust” when the money dried up and investors only wanted to look at historic cash flows. But as the data and analytics revolution presses forward further, there have to be satisfying opportunities to create real returns in a measurable timespan. I do not think this will be a problem but I do think that we have to expect disappointments after the exaggerated wave of expectations around AI and machine learning. And from conferences like this it is becoming clearer and clearer that workflow will remain a key focus. Creating longer and longer strands of work process robotics and using intelligent technology to provide decision-making support and then improved decision-making itself seems likely. While RPA (robotic process automation) is making real inroads into clerical process, it is not yet having an impact either on nontrivial decision-making or on the business of bringing wider ranges of knowledge to bear on decisions normally made by that most fallible of qualities, human judgement.

Looking back, there was another element that did not surface at Wednesday’s fascinating event. Feedback is what improves machines and makes the development track accelerate. But as we build more and more feedback loops into these knowledge systems we learn more and more about the behaviour of customers, and the gaps between how people actually behave and what they say (or we think) they want grow larger. The “exhaust data” resulting from usage does not get much of a mention on these occasions. But if, for example, we looked at the field of scholarly communications and the research and IP markets, I could at least make the argument that content consumption at some point in the future will be the prerogative of machines only. The idea of researchers reading research articles or journals will become bizarre. There will simply be too much content in any one discipline. The most important thing will be for machines to read, digest, understand and map the knowledge base, allowing researchers to position their own work in terms of the workflow of the domain. And one other piece of information will then become vitally important. The researcher will need feedback to know who has downloaded his own findings, how they were rated, and whether other scholars’ knowledge maps matched his own. Great contextual data drawn from a wider and wider range of sources is fuelling the revolution in data and analytics. Great analysis of feedback data coming off these new solutions will drive the direction of travel.

None of this lies at the door of Houlihan Lokey. By providing a place for a heterodox group of investors and entrepreneurs to mingle and talk they do us all a favour, and in the process demonstrate just how hot the data and analytics field is at the present moment.

You can see a long way from Fiesole. John Milton, in Paradise Lost, remembered the red orb of the sun sinking over the Tuscan hills and likened it to the burnished shield on Satan’s back as he is cast into Hell. Some of the delegates at the annual Fiesole Retreat, looking at Open Access and the future of scholarly communication, may have felt similarly cast down, but, if so, they kept it to themselves in a meeting, celebrating its 21st birthday, that lived up to a reputation for real debate and direct speaking, but also total respect for the positions of delegates from all sides of the scholarly information workflow. This meeting, a joint venture of Casalini Libri and the Charleston library conference, was at its very best as the European Commission, critically important library interests, publishers of all disciplines, and OA providers alongside traditional subscription journals all contributed viewpoints on a developing situation in scholarly communications which desperately needs the debate engendered here.

As an observer of the debate and anchorman for the ensuing discussion I have waited ten days before adding my own view to all this. In truth, I cannot sum up the complexity and detail, or render the passion and eloquence of many of the arguments. But the cumulative effect on me was to sharpen the conclusion that I was witnessing something coming, however slowly, to an end. The debate about OA and Plan S is not an end in itself. Subscription publishing will never reassert itself and OA disappear. Nor will the world slowly become totally OA. The changes and the debate point to bigger and more fundamental changes. I was left feeling that just as we have been through Digital Replacement – all paper-based content went digital – followed by Digital Transformation – the workflows and processes went digital and became wholly network interconnected – we now approach Digital Re-invention – in which the forms and artefacts of the analogue world themselves give way to digital connectivity which not only alters relationships in the network, but introduces the computer, the machine as reader and researcher, into the workflow.

We are now in a situation where the old generalities are becoming useless. STM and HSS are near meaningless, given the differences between Life Sciences and Physics, or Chemistry, as research communication fields. The same goes for the statistical social sciences and the humanities. And when I asked what the identifiable critical information problems of scientists were I got two answers – Reproducibility and Methodology. In other words, researchers were anxious to repeat previous experiments using the same or different data or conditions in order to see if results were the same, and they wished to explore the methods used by successful experiments in order to justify a choice of methodology. Response to these demands requires that all of the data is available and connected by metadata, which is evidently not the case. And of course, specialist services will come into play to meet the needs – in these cases protocols.io, and Ripeta and Gigantum (both new members at Digital Science). These are the type of tools that researchers will use. So what about the books, journals, articles? Who will read them? The answer of course is the intelligent machine, and the nomenclature will change as it becomes obvious that the machine is only interested in content-as-data, not in format at all.

I asked, again and in vain, whether any publishers present had an idea of the current proportion of usage made by machines other than search bots. But the fact is we are not measuring this. And we all nodded when someone said the next generation just want to get the preprint done and stop there – getting something into the network with a growing confidence that it will be found seems to be the thing. We are certainly getting smarter at measuring impact and dissemination, though still behind the curve in accomplishing those vital matters. And, Lordy, Lordy, we do have an industry hang-up about the way academics are rewarded with tenure and grant support. Is it so frightening for us to imagine change here because we have hung the future of academic publishing around the neck of an archaic system of academic rewards? Why is it that we always think that change only occurs in our sub-sector and the rest of the world stays constant? There is already movement around impact factors in academic review systems. The very fact of Plan S shows funders getting more interested in measuring impact and increasing dissemination. The only certainty about a network is that when one position alters, so do all the rest.

So my concerns about this sector remain more about the pace of change than the direction. Work like the eLife Reproducible Document Stack (RDS) is fascinating in this regard – will we interconnect the research lab manuals and review the work in progress at some point? Or will publishing be an automated function of the RDS in time? Whatever happens, we will always need the presence of cross-industry, multi-disciplinary groups like Fiesole to get the vital perspective, the view from a hill.
