Sitting through the summer months beside a misty inlet on the Nova Scotian coast, it is all too easy to lose oneself in the high politics of OA and OER, of the negotiations between a country as large as California and a country as large as Elsevier, or of whether a power like Pearson can withstand a force as large as McGraw with added Cengage. I am in the midst of Churchill’s Marlborough: His Life and Times. There momentous events revolve around a backstairs word at Court. There great armies wheel in the Low Countries as Louis XIV and William of Orange contend for supremacy. Wonderful stuff, but the stuff of history? Nothing about peasants as soldiers, or about harvests and food supplies? Likewise, if we tell the story of the massive changes taking place in the way content is created and intermediated for re-use by scholars and teachers without starting with the foot-soldiers, by which I mean not just researchers and teachers but students and pupils as well, then I think we are in danger of mistaking the momentum as well as the impact of what is happening now.

When our historians look back, hopefully a little more analytically than Churchill, I think they will be amazed by the slowness of it all. We are now 30 years beyond the ARPANET becoming the Internet, and over 20 years into life in a Web-based world. Phone books are an historical curiosity and newspapers in print are about to follow. Business services have been transformed and the way most of us work and communicate and entertain ourselves is firmly digital. Yet nothing has been as conservative and loath to change as academic and educational establishments throughout the developed world, and they have maintained their success in imposing these constraints on the rest of the world. From examination systems to pre-publication peer review, traditional quality markers have remained in place for the assurance, it is held, of governments, taxpayers and all participants in the process. And while the majority of inert content became digital very early in the 30-year cycle of digitisation, workflow and process did not. Thus content providers were held in a hiatus. As change took place at the margins, you needed to supply learning systems as well as textbooks (who would have guessed that it would be 2019 before Pearson declared itself Digital First?). And by the same token, who could have imagined that we would be in 2019 before eLife’s Reproducible Document Stack made it technically feasible for an “article” to contain video, moving graphics, manipulable graphs and evidential datasets?

It is not hard to identify the forces of conservatism that created this content Cold War, when everyone had to keep things as they had always been, and as a result of which publishing consolidated, and is still consolidating, into two or three big players in each sector. It is harder to detect the forces of change that are turning these markets into an arms race. These factors are mostly not to do with the digital revolution, much as commentators like me would like the opposite to be true. Mostly they are to do with the foot soldiers of Marlborough’s armies, those conscripted peasants, those end users. When we look back we shall see that it was the revolt of middle class American parents and their student children against textbook prices, the wish of the Chinese government to get its research recognised globally without a paywall, the wish of science researchers to demonstrate outcomes quicker in order to secure reliable forward funding, and the wish of all foot soldiers to secure more interoperability of content in the device-dominated, data-centric world into which they have now emerged, that made change happen.

And how do we know that? You need an instrument of great sensitivity to measure change, or maybe change is a reflection of an image in the glass plate of some corporate office. Whatever else is said of them, I hold Elsevier to be a hugely knowledgeable reflection of the markets they serve. So I regard their purchase of Parity Computing as a highly significant move. When publishers and information providers buy their suppliers, not their competitors, it says to me that whatever tech development they are doing in their considerable in-house services, it is neither enough nor fast enough. It says that still more must be done to ensure that their content-as-data is ready for intelligent manipulation. It also says that the developments being created by that supplier are too important, and their investment value too great, to think of sharing them with a competitor using that supplier.

Markets change when users change. But when the demand for change occurs, we usually have the technology to meet that new demand: think of the 20-year migration from expert systems and neural networks to machine learning and AI. The push is rarely the other way round.

Apologies to those kind readers who expected an earlier interjection in December. Truth to tell, I was speechless. Caught somewhere between astonishment at my fellow countrymen’s mania for national self-harm, my own complete self-identification historically, culturally and psychologically as a “European”, and impatience with all the wise and honest Americans who I know and who cannot collectively somehow re-enact the Emperor’s clothes nursery tale, there suddenly seemed nothing left to say worth saying, least of all around the topic of electronic information and digital society.

But then I returned to Nova Scotia again for the holidays, and in its clear, cold, sunny air it seems a dereliction of a blogger’s duty not to have a message at New Year. And by dint of looking over everyone’s shoulders, I see that Rule One of the New Year message is to make a recommendation, preferably to nominate something as the something of the Year. And as it happens I do have a Book of the Year for this information industry. Please read The Catalogue of Shipwrecked Books, by Edward Wilson-Lee. The inevitable pesky publisher’s subtitle in the US purports to sell it as a book about Christopher Columbus and his son, but the UK edition hits the point: it is about the attempt by Columbus’s son to build a universal library in Seville, getting royal patronage and setting up buying agents in the great early cities of print to create an early Internet Archive, making available a stream of knowledge as rich as the gold and silver of Peru and Mexico just then flowing into the royal coffers.

The attempt fails of course, but it does set off arguments about the nature of Knowledge which we need to keep having as we dimly perceive the arrival of the leading edge of the development of knowledge products and solutions. And here comes Rule Two: Issue a Warning. And here is mine: refrain in 2019 from labelling everything you see as AI-sourced, AI-related or AI-derived. We are still in the Colon Columbus stage in building the universal knowledge base. Let’s save AI as a term for when AI arrives. Many people are doing really clever things, but they are at best embryonic knowledge products. We are really quite far away from new knowledge created in a machine-driven context without human intervention. Indeed we are still a long way from getting enough information as metadata in a machine-understandable form, and when we do we usually do not understand what we have done.

So here comes Rule Three: declare a News Story of the Year. And here is mine. The gracious acknowledgement by Google that their automated recruitment system, which analyses thousands of CVs to produce the best candidates, had a male bias built into it. And of course it did! Feed the past into an expert system and it replicates the flaws of the past. And it’s not that the systems doing the analytics are not clever, it’s just that the dumb data and the dumb documents are not as dumb as we think, and in fact they are larded with all of the mistakes we have ever made. And we need to know that before we evaluate the outcomes as Intelligent, or even believable.

And if we need to be careful about the nature of the information we are using, we need to deal in known quantities. Rule Four: try to make an insight. Mine concerns differentiating between data and documents. The other night, as one does on a cold and isolated coastline, we fell to discussing derivations. My wife produced her weightlifter’s copy of Merriam-Webster, and we got into derivations old-style. Datum, neutral, is always related to single objects of an incontrovertible nature. Documentum carries the idea of learning throughout its history. When we talk about content-as-data, what do we really mean? And when we talk about AI, do we speak of Intelligence created by machines deriving knowledge from pure data, or of machines learning from knowledge available, fallacies and all, in order to postulate new knowledge? We do need to be clear about our, as derived from our inputs, or we will surely be disappointed by what happens next. We need to start listening very carefully to conversations about concept analysis, concept-based searching and conceptual analysis.

Which logically brings me to Rule Five. End with a prediction. Mine would concern a question I asked in several sessions at Frankfurt this year, to which I have had little but confusion in reply. My question was “What proportion of your readership is machines, and what economic benefits does that readership bring to you?” I think machine readership will become much more important in 2019, as we seek to monetise it and as we seek to evaluate what content in context means for analytical systems. So just as none of us knew how many machines were reading us this year, next year I think most of us will be aware, and will know whether those were just browsers, or bots, or knowledge harvesters, or what.

And then I notice there is a Rule Six. You end by wishing every kind reader who reaches this point a happy, healthy and prosperous New Year, which I do for all in 2019. After all, using my rule-based system this column could be written by a machine next year – and read by one too!
