In the past week I have attended two MarkLogic World events, one in London and the other in Amsterdam. Your modern software company greets its clients these days in football stadiums (Arsenal and Ajax respectively). Audiences of publishers are large and getting larger – over 500 attended the two events. And enthusiasm grows as more and more data, and content-as-data, owners begin to realise what they must do if they are to enable themselves for the age of data. While MarkLogic is not the only platform which can accomplish that enablement, it is by far the most prevalent in “Big Publishing”, and its recent price policy change now brings it into the range of medium-sized and niche players. Adding semantic analysis brings even closer the notion that this type of platform can be instrumental in helping us towards ever faster, customer-responsive new product development, and reminds us of the hassle that many content providers still suffer in bringing together diverse data streams, created at different points in time within different logical structures, in order to develop the new solutions demanded by the market.

It is not hard to get enthusiastic about all of this. It comes on the back of the growing fashion for NoSQL databases, of which MarkLogic is probably the leading exponent. It comes at a time when the visible problems of the relational database world are becoming more important than the historical virtues. This poses problems, and timing issues, for the industry giants (Oracle, IBM, SAP etc). But the last two weeks made one thing very clear to me: those remaining publishers who still think that they can build and maintain their own underlying platform structures are living in a dream world. This game is moving away from them and into a speed of development and complexity of tools that makes it improbable that you can stay competitive and profitable without utilizing a third party solution of this type. This is demonstrated to me by the worry of CTOs in Europe about whether they can recruit enough MarkLogic proficient staff quickly enough.

My interest in all of this derives both from trying to measure how the industry will modernize itself in the face of data-driven demand, and work I have done with MarkLogic on how they present themselves as a solution-vendor. And in the latter role I found myself wondering at these meetings about our ability to reach a common language. One which allows software players to use their own images, but express them in terms that the CTO and the CFO can understand. At present so much of the dialogue of the software vendors is specialized to the world of the CTO and CIO. In publishing we have to engage the people who write the cheques, and while I have regularly in this column pleaded for a greater effort from senior management to really understand something about the software on which their businesses are based, I also feel that vendors must extend their efforts to find a language of communication that makes it easy.

It starts with the very word “platform”. Something on which everything sits? Yes, indeed – but what? In my view, for example, platform without search is a non sequitur: how can you re-use differently structured or unstructured data without it? Or interrogate third party data? Then again, I am with those who define “platform” in enterprise terms. Surely we cannot go on addressing our business as publishers in a series of silos. If the platform carries our data, then it must carry it all – customers, sales, usage, performance as well as product and content, so that the solutions that we build come out of all that we know. And this means that the platform must be addressable in a number of ways: it interested me to see MarkLogic, so long in the XML/XQuery world, now enabling Java and JavaScript.

But if we are worried that there are no standard descriptions of a “platform”, it is even more worrying that the whole world of semantics is now beset by a thorn hedge of imprecise language. And when I commented on this to friends and colleagues, they all, to a man, asked how I would explain it. And since I heard these terms first at a lecture by Tim Berners-Lee on SPARQL some long years ago, I share their timidity about departing from the sacred canon. But we really do have to do more than try to persuade the CEO that even if he does not understand triples and triple stores, it will all be all right on the night! So try telling him how to teach a machine to read – vital if it is to understand how other machines write in an M2M age. Surely you would start by creating a specialized word list – followed by a lesson in basic sentence structure so that machine understanding of subject/verb/object was on the ground floor of the learning process. And when you had a vocabulary and a way of understanding the positioning of a word in context, and lots and lots of those positional contexts, you next need some rules which allow you to infer meaning in context. Lo and behold, we have built triple stores, taxonomy, inference rules and ontology and still never defined RDF!
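For the technically curious, the “teach a machine to read” story can be sketched in a few lines of Python. This is only a toy, not how MarkLogic or any real triple store works: the vocabulary, the predicate names `is_a` and `is_a_kind_of`, and the helper functions `infer` and `ask` are all invented here for illustration. It shows the three ingredients in turn: a word list, subject/verb/object sentences, and a rule that lets the machine infer something nobody told it.

```python
from typing import List, Set, Tuple

# A "sentence" the machine can read: (subject, verb, object).
Triple = Tuple[str, str, str]

# Hypothetical word list and facts, invented for this illustration.
triples: Set[Triple] = {
    ("Statute", "is_a_kind_of", "LegalDocument"),    # taxonomy
    ("LegalDocument", "is_a_kind_of", "Document"),   # taxonomy
    ("CompaniesAct", "is_a", "Statute"),             # a plain fact
}


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Apply two simple rules repeatedly until nothing new is learned.

    Rule 1: if A is a kind of B and B is a kind of C, then A is a kind of C.
    Rule 2: if X is a B and B is a kind of C, then X is a C.
    """
    known = set(facts)
    while True:
        new: Set[Triple] = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if o != s2 or p2 != "is_a_kind_of":
                    continue
                if p in ("is_a", "is_a_kind_of"):
                    new.add((s, p, o2))
        if new <= known:  # nothing new was inferred, so stop
            return known
        known |= new


def ask(facts: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object recorded for the given subject and verb."""
    return sorted(o for (s, p, o) in facts if s == subject and p == predicate)


inferred = infer(triples)
print(ask(inferred, "CompaniesAct", "is_a"))
# → ['Document', 'LegalDocument', 'Statute']
```

Nobody ever stated that the Companies Act is a document; the machine worked that out from the word list and the rules, which is roughly the point one would want to land with the Board.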

The purists will hate this, I know. And I am almost certainly over-generalising, simplifying too much and generally getting it wrong. But my point remains: if we are to carry this next stage of the software revolution which is driving change in our industry then we have to find the words to express it to the Board, and despite the huge amount of re-platforming taking place amongst the 500 or so publishers that I have sat with in the past two weeks, we do not yet approach an explanatory language.

Footnote: One linguistic innovation – bitemporality! Introducing MarkLogic 8, “bitemporal” was used as a term for dating content arrival and subsequent access, a problem that I have always encountered in forays into legal data (What law was in force then? etc) and in compliance datasets (Did they have the information? Did they look at it at the time or subsequently?) This is a very valuable additional resource and again indicates a vendor listening to its clients, but I hope they never have to defend this miscegenated term before an audience of lawyers! OK, I know it is the correct expression in the SQL world, but when we speak to the CEO please can we call it an Audit trail?
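A small sketch may make the two clocks easier to picture. In database usage, “bitemporal” generally means that every record carries two independent timelines: when the statement was true in the world, and when the system came to know it. The Python below is a simplified illustration under that assumption, not MarkLogic’s implementation; the `Version` class, the `as_of` function and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

FOREVER = date(9999, 12, 31)  # stand-in for "no end date yet"


@dataclass(frozen=True)
class Version:
    text: str
    valid_from: date   # when the rule starts to apply in the real world
    valid_to: date     # when it stops applying (exclusive)
    recorded_on: date  # when this version arrived in our system


# Invented sample data: a rate that we first believed was open-ended,
# then learned (on 2013-06-15) would change on 2013-07-01.
history: List[Version] = [
    Version("Rate is 5%", date(2012, 1, 1), FOREVER, date(2011, 12, 1)),
    Version("Rate is 5%", date(2012, 1, 1), date(2013, 7, 1), date(2013, 6, 15)),
    Version("Rate is 6%", date(2013, 7, 1), FOREVER, date(2013, 6, 15)),
]


def as_of(versions: List[Version], valid_on: date, known_on: date) -> Optional[Version]:
    """What did we believe on `known_on` about the rule in force on `valid_on`?"""
    candidates = [
        v for v in versions
        if v.recorded_on <= known_on and v.valid_from <= valid_on < v.valid_to
    ]
    # Of everything we had been told by then, trust the latest arrival.
    return max(candidates, key=lambda v: v.recorded_on, default=None)


# Same real-world date, two different states of knowledge.
before = as_of(history, valid_on=date(2013, 8, 1), known_on=date(2013, 6, 1))
after = as_of(history, valid_on=date(2013, 8, 1), known_on=date(2013, 12, 1))
print(before.text)  # → Rate is 5%
print(after.text)   # → Rate is 6%
```

The first answer is what the compliance officer could have known at the time; the second is what we know now. Keeping both, rather than overwriting one with the other, is the audit trail.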

“It’s a moral and an ethical system”. Richard Charkin, in a passable imitation of the new business-like Archbishop of Canterbury, defended copyright at last week’s epic Publishers Forum in Berlin, though we all knew that he was referring to a set of trading rules which led Byron to tell his publisher, John Murray, that “Barabbas was the first publisher”. Klopotek’s Berlin show, over 250 strong this year, has become a stadium for opposing positions and sharply contrasted stances. Consider, for example, the contrast between the aforesaid Mr Charkin and Harald Greiner, his fellow opening keynoter on the first day. The Bloomsbury Executive Director remains the delightful iconoclast of his earlier years, though he moves in illustrious establishment circles as an ex President of the Publishers Association now about to become President of the International Publishers Association. A Prince amongst Publishers and our Renaissance Man, in fact, with a track record second to none in STM, reference, mass market paperback, fiction, professional, and in print and digital. Our old world looking into the new with the same passion, argumentativeness, curiosity and determination. A dealer and collaborator – his deal with Faber in drama is a clear sign of the times, as was his half-joking suggestion that Writers and Artists Yearbook was a portal for self-publishing.

Then step forward Mr Greiner. Here we saw the necessary technocrat preparing to create another world that all “publishers” (whatever that now means) increasingly recognize. Mr Greiner runs the IT infrastructure – an increasingly strategic component – of Holtzbrinck. For those of us who recall the German newspaper group, this is now a publishing corporation which owns only Die Zeit, which has 75% of its revenues outside of Germany and which has built a powerful science and education interest to replace its former news organization. With technology hubs in the US and the UK as well as in Germany, Harald Greiner’s drive was towards the industrialization and the professionalization of the industry. Older readers will recall the 1960s lament that the accountants were taking over publishing: the equivalent today is the new men of technology, and, if they are like Mr Greiner, they will be very impressive colleagues (as well as the people who return the margins to the business).

They talk the talk of services and solutions, and walk the Agile way, these New Men. Another who surfaced later on the first day was Marcello Vena, CEO of Digital Publishing from RCS Libri in Italy (think Fabbri, Rizzoli etc). Here was the Technologist as Digital Adventurer – from his eBooks Aboard experiment of making eBook reading free on the fast trains of Italy (clever marketing – get stuck in, then you have to download to finish it when you arrive at your destination) to Big Jump, a joint venture with Amazon (yes, that is correct!) on a self-publishing, contest-based, crowd-reviewed platform which has generated 500 new books and 500,000 views. This excitingly followed Bob Stein, who pointed us back to the steady march of social reading, reminded us that writing will change as the Social Book becomes more important, and then pointed to the future of independent bookselling – in recommendation and review sites like

Day 2 set us different challenges. Put your head into 2020 and tell us what you see, the keynoters were asked. Nigel Newton, founder of Bloomsbury, saw the revival of the SME as the technology allowed small start-ups in a world where Amazon had sliced and diced the margins of big players. Francis Bennett, Deputy Chairman at Yale University Press and well-known as the creator of the book trade’s first digital metadata system, saw the role of publishing in the branded competition of universities struggling to attract research funds and grants. Monograph publishing was commercially exhausted and scholarly communication which had value needed immediate availability. Sven Fund, CEO and architect of the rebirth of Walter de Gruyter, saw focus and specialization and size as the answers to the fragmentation he saw around him, and stressed the need for partnership and technology standards in the world we are entering. Matt Turner, CTO at MarkLogic, took up that theme. There was no time at which it was more important for publishers to concentrate their working capital of data on one platform, to have complete control over it and access to it, to be able to search it fluently within the platform and relate it to third party or remotely held data, and to be able to fully enhance it with semantic analysis.

And as we began to debate the future that these voices described it became ever clearer that the “publishing” community is not owned by those who describe themselves as publishers. Baldur Bjarnason of Unbound challenged the very right of publishers to exist in a Viking raid on the high ground of publishing morality (a very different concept from that of Richard Charkin), and the Prince of Self Publishing, Hugh Howey, earnt real respect from an audience which might have felt challenged as he displayed some of the potential of self publishing, pointed out that it is a larger activity than most people think (and larger than “publishing” itself), and guided our thinking away from selection of original works and towards investing in the marketing and development of existing self-published work. And with Fionnuala Duggan and Eric Razenberg (CEO, ThiemeMeulenhoff) underscoring the revolution in education around learner-centric networks and the arrival of real personalised learning, the revolution seemed complete. Hugh Howey, Porter Anderson and Ed Nawotka ended the day in style, but the voice I recalled that night was that of Brian O’Leary. A quiet voice calling for a new architecture of Collaboration. A calm and rational presence embedded in two days of high excitement in a publishing conference that really did bring all the voices to the table.

Helmut von Berg, indefatigable organizer of this event for a decade, retires this year. He earnt the grateful thanks of all of us present in Berlin. He is succeeded by Ruediger Wischenbart and an editorial board who now know how hard it will be to improve on this.
