In the past week I have attended two MarkLogic World events, one in London and the other in Amsterdam. Your modern software company greets its clients these days in football stadiums (Arsenal and Ajax respectively). Audiences of publishers are large and getting larger – over 500 attended the two events. And enthusiasm grows as more and more owners of data, and of content-as-data, begin to realise what they must do if they are to enable themselves for the age of data. While MarkLogic is not the only platform which can accomplish that enablement, it is by far the most prevalent in “Big Publishing”, and its recent price policy change now brings it into the range of medium-sized and niche players. Adding semantic analysis brings even closer the notion that this type of platform can be instrumental in helping us towards ever faster, customer-responsive new product development, and reminds us of the hassle that many content providers still suffer in bringing together diverse data streams, created at different points in time within different logical structures, in order to develop the new solutions demanded in the market.

It is not hard to get enthusiastic about all of this. It comes on the back of the growing fashion for NoSQL databases, of which MarkLogic is probably the leading exponent. It comes at a time when the visible problems of the relational database world are becoming more important than its historical virtues. This poses problems, and timing issues, for the industry giants (Oracle, IBM, SAP etc). But the last two weeks made one thing very clear to me: those remaining publishers who still think that they can build and maintain their own underlying platform structures are living in a dream world. This game is moving away from them, towards a speed of development and a complexity of tools that make it improbable that you can stay competitive and profitable without utilizing a third-party solution of this type. This is demonstrated to me by the worry of CTOs in Europe about whether they can recruit enough MarkLogic-proficient staff quickly enough.

My interest in all of this derives both from trying to measure how the industry will modernize itself in the face of data-driven demand, and from work I have done with MarkLogic on how they present themselves as a solution vendor. And in the latter role I found myself wondering at these meetings about our ability to reach a common language: one which allows software players to use their own images, but express them in terms that the CEO and the CFO can understand. At present so much of the dialogue of the software vendors is specialized to the world of the CTO and CIO. In publishing we have to engage the people who write the cheques, and while I have regularly in this column pleaded for a greater effort from senior management to really understand something about the software on which their businesses are based, I also feel that vendors must extend their efforts to find a language of communication that makes it easy.

It starts with the very word “platform”. Something on which everything sits? Yes, indeed – but what? In my view, for example, a platform without search is a non sequitur: how can you re-use differently structured or unstructured data without it? Or interrogate third-party data? Then again, I am with those who define “platform” in enterprise terms. Surely we cannot go on addressing our business as publishers in a series of silos. If the platform carries our data, then it must carry it all – customers, sales, usage, performance as well as product and content, so that the solutions that we build come out of all that we know. And this means that the platform must be addressable in a number of ways: it interested me to see MarkLogic, so long in the XML/XQuery world, now enabling Java and JavaScript.

But if we are worried that there are no standard descriptions of a “platform”, it is even more worrying that the whole world of semantics is now beset by a thorn hedge of imprecise language. And when I commented on this to friends and colleagues, they all, to a man, asked how I would explain it. And since I heard these terms first at a lecture by Tim Berners-Lee on SPARQL some long years ago, I share their timidity about departing from the sacred canon. But we really do have to do more than try to persuade the CEO that even if he does not understand triples and triple stores, it will all be all right on the night! So try telling him how to teach a machine to read – vital if it is to understand how other machines write in an M2M age. Surely you would start by creating a specialized word list – followed by a lesson in basic sentence structure, so that machine understanding of subject/verb/object was on the ground floor of the learning process. And when you had a vocabulary and a way of understanding the positioning of a word in context, and lots and lots of those positional contexts, you would next need some rules which allow you to infer meaning in context. Lo and behold, we have built triple stores, taxonomy, inference rules and ontology and still never defined RDF!
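
For those who would rather see the reading lesson than hear it described, here is a minimal sketch in Python. It is a toy and nothing more: the facts, the predicate names (`is_a`, `subclass_of`) and the two rules are all invented for illustration, and it is neither RDF, nor SPARQL, nor anything resembling MarkLogic's own interfaces. It simply shows subject/verb/object facts, a small taxonomy, and a pair of inference rules that produce facts nobody wrote down.

```python
# A toy "triple store": every fact is a (subject, predicate, object) tuple.
# All names and facts below are hypothetical, chosen only for illustration.
triples = {
    ("Companies Act", "is_a", "statute"),
    ("statute", "subclass_of", "legal document"),
    ("legal document", "subclass_of", "document"),
}


def infer(facts):
    """Apply two simple inference rules repeatedly until nothing new appears."""
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if o != s2:
                    continue  # the two facts do not chain together
                # Rule 1: subclass_of is transitive (this is the taxonomy).
                if p == "subclass_of" and p2 == "subclass_of":
                    new.add((s, "subclass_of", o2))
                # Rule 2: a member of a class is a member of its superclasses.
                if p == "is_a" and p2 == "subclass_of":
                    new.add((s, "is_a", o2))
        if new <= facts:
            return facts  # no rule produced anything we did not already know
        facts |= new


def ask(facts, subject, predicate):
    """Return, in sorted order, every object recorded for a subject and predicate."""
    return sorted(o for (s, p, o) in facts if s == subject and p == predicate)


inferred = infer(triples)
print(ask(inferred, "Companies Act", "is_a"))
```

We only ever told the machine that the Companies Act is a statute; the rules let it work out for itself that it is also a legal document, and a document. That, in miniature, is what the CEO is being asked to pay for.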

The purists will hate this, I know. And I am almost certainly over-generalising, simplifying too much and generally getting it wrong. But my point remains: if we are to carry this next stage of the software revolution which is driving change in our industry then we have to find the words to express it to the Board, and despite the huge amount of re-platforming taking place amongst the 500 or so publishers that I have sat with in the past two weeks, we do not yet approach an explanatory language.

Footnote: One linguistic innovation – bitemporality! At the introduction of MarkLogic 8, “bitemporal” was used as a term for dating content arrival and subsequent access, a problem that I have always encountered in forays into legal data (What law was in force then? etc) and in compliance datasets (Did they have the information? Did they look at it at the time or subsequently?). This is a very valuable additional resource and again indicates a vendor listening to its clients, but I hope they never have to defend this miscegenated term before an audience of lawyers! OK, I know it is the correct expression in the SQL world, but when we speak to the CEO please can we call it an audit trail?
