The news (BBC, 29 December) that Orang Utans in Milwaukee are using iPads to watch David Attenborough while covertly observing each other’s behaviour reminds me at once of how “early cycle” our experience of tablet tech still is, and of how little we still extract from the experience we have of all digital technologies. So, by way of apologizing for missing last week (minor knee procedure, but the medical authorities advised that no reader of mine could possibly deserve my last thoughts before going under the anaesthetic…) and wishing you all (both…?) a belated happy Christmas, I am going to sort through the December in-tray.

The key trends of 2011 will always be, for me, the landmark strides made towards really incorporating content into the workflow of professionals, and the progress made in associating previously unthinkable data collections (not linked by metadata, structure and/or location) in ways that allowed us to draw out fresh analytical conclusions not otherwise available to us. These are the beginnings of very long processes, but already I think that they have redefined “digital publishing” or whatever it is that we name the post-format (book, chapter, article, database, file) world we have been living in now for a few years and are at last beginning to recognize. Elsevier recognized it all right with their LIPID MAPS lipid structures App earlier this month, and I should have been quicker to see this. This App on SciVerse handles the workflow around lipid metabolism and is thus integral to the research into lipids-based diseases (stroke, cancer, diabetes, Alzheimer’s, arthritis, to name a few). The LIPID MAPS consortium is a multi-institutional, research-based organization which has marshalled into its mapping all of the metadata and nomenclature available – common and systematic names, formula, exact mass, InChIKey, classification hierarchies and links to relevant public databases. Elsevier adds the entity searching that allows the full text and abstracts to support the mapping and, in data analysis terms, to draw the sting from a huge amount of researcher process effort. Whenever I hear the old Newtonian saw about “standing on the shoulders of giants” I replace “shoulders” with “platforms”.

So how do Elsevier pull off a trick like this? By being ready and spending years in the preparatory stages. Elsevier, in my view, has become two companies, and alongside a traditional, conservative journal publisher has evolved a high tech science data handling company, conceived in ScienceDirect and reaching, via Scirus and Scopus, a sort of adolescence in SciVerse. This effort now moves beyond pure data into the worktool App, driven by SciVerse Applications and the network of collaborating third-party developers which is increasingly driving these developments. This is and will be a vital component. Not even Elsevier can do all these things alone. The future is collaborative, and here is the market leader showing it understands that, and knows that science goes forward by many players, large and small, acting together. And if developers can find, under the Elsevier technology umbrella, a way of exposing their talents and earning from them (as authors were wont to do with publishers) then another business model extension has been made. There is much evidence here of the future of science “publishing” – and while it may be doubted that many (two?) companies can accomplish these mutations successfully, Elsevier are making their bid to be one of them.

And there is always a nagging Google story somewhere left un-analysed, usually because one could either write a book on the implications or ignore them, on the grounds that they may never happen. But Google is the birthplace of so much that has happened in Big Data that I am loath to neglect BigQuery. With an ordinary sized and shaped company this would all be different. I could say, for example, that LexisNexis is taking its Big Data solution, HPCC, Open Source because it wants to get its product implemented in many vertical market solutions without having to go head to head with IBM, Oracle or SAP. But Google clearly relishes the thought of taking on the major analytics players on the enterprise solutions platforms, and clearly has that in mind with this SQL-based service, which has been around for about a year and now enters beta with a waitlist of major corporate users anxious to test it. And yet, wait a minute: Google, Facebook and Twitter led us into the NoSQL world because the data types, particularly mapping, and the size of the databases involved, pushed us into the Big Data age and past the successful solutions created in the previous decade in SQL enquiry. So is what Google is doing here driven mostly by its analysis of the data and capabilities of major corporates (Google doing market research and not giving the market what Google thinks is good for them!), or is this something else, a low level service environment that may take off and splutter into life, or may beta and burn like so many predecessors? Hard to tell, but worth asking the question of the Google Man Near You. Meanwhile, the closest thing to a Big Data play in publishing markets remains MarkLogic 5.0.

Coming back to where I started on Big Data, one of the most significant announcements in a crowded December had LexisNexis – law this time, not Risk Solutions – using MarkLogic 5 as the way to bring its huge legal holdings together, search them in conjunction with third party content and mine previously unrecognized connectivities. Except that I should not have said “mine”. Apparently “mining” and “scraping” are now out of favour: now we “extract” as we analyse and abstract!

However, I wish every scraper and miner seeking a way forward every good wish for 2012. And me? Well, I am going to check out those Orang Utans. They may have rewritten Shakespeare by now.



Content was once valuable. Then content about content, the metadata that identified our content values and made them accessible, became a greater and more powerful value. Soon we stood at the edge of a universe where no searching would take place which did not involve machine interrogation of metadata. We evolved ever more complex systems of symbology to ensure that customers who used our content were locked into accepting our view of the content universe by virtue of accepting our coding and metadata, and using it in relation to third party content. Further, we passed into European law, in terms of the provisions of the so-called directive on the legal protection of databases, the notion that our metadata was itself a protectable database. Now content is less valuable, more commoditized, and inevitably widely copied. So it is our fallback position that our metadata contains the unique intellectual property, and as long as we still have that in a place of safety we are secure. And can sleep easily in our beds.

Until the day before yesterday, that is. For on 14 December the European Union’s Official Journal published a settlement offer from Thomson Reuters in a competition enquiry which has run for two years. The case concerns Thomson Reuters’ use of its RIC codes. Insofar as they have become the standard way in which traded equities are described in datafeeds, the fact that the market bought the Reuters solution as a surrogate for standardization did give Thomson Reuters competitive advantage – and this is made clear by the fact that the Commission investigation was prompted by its commercial rivals. But that advantage was not unearnt, and the standardization that resulted from it brought benefits across the market. Now Thomson Reuters, to end the process, offers licensing deals and increased access to its metadata. This may turn out to be a pivotal moment for the industry.

I have no interest here in examining whether Thomson Reuters are right or wrong to seek a deal. From Microsoft to Google to Apple, the frustrations of enquiries by the competition commissioner’s office in Brussels have worn down the best and most resilient. But I do want to comment on what may be happening here. If you accept my thesis that content is becoming increasingly commoditized and that systems for describing it are increasingly valuable, we may have to recalibrate our picture of what is happening as a result of this news. What if, in fact, the commoditization involved here spreads slowly up the entire information value chain over time? In this model, the famous value pyramid which we have all used to subjugate our audiences and colleagues is under commoditization water at its base, which is where raw data and published works are kept. Now the next level is becoming slightly damp from this rising tide, as descriptive modalities get prised off and become part of the common property of all information users. So information vendors scramble further up the pyramid, seeking dry land where ownership can be re-asserted. Maybe more advanced metadata will offer protection and enhance asset value. The SCORM dataset in an educational product can annotate learning outcomes and allow objects and assessment to be associated. Or, following the financial services theme here, maybe we add Celerity-style intelligence to content which allows a news release to be “read” in machine-to-machine dialogue, and trading actions sparked by the understanding created. We will certainly do all these things, because no one will buy our services if they do not accord with the most appropriate descriptive norms available. But will they protect our intellectual property in content or data? No, I am increasingly afraid that they will not.

It will take many years to happen. And it will happen at a very different pace in different marketplaces. But the days when you valued a company by its content IP, by its copyrights and its unique ownership value, have been over for some time. And now we can see that the higher order values are themselves becoming susceptible to competition regulation which seems, in this age, to override IP rights in every instance. So what are we actually doing when we say we are building value? Normally, it seems to me, we are combining content with operational software systems to create value represented by utility. From the app to the workflow system, content retains its importance in the network because we shape it not just for research, but for action, for process, for communication. And that, after all, is where the definition of a networked society with a networked economy lies.

And if we were in doubt about this, reflect on the current preoccupation with Big Data. Is our society going to be willing to hold up the vital release of “new” scientific knowledge from the ossified files of journal publishers just because some of this stuff is owned by Elsevier and some by Wiley? The water of analytic progress is already flowing around the dams of copyright ownership, and this week surged past a major player protecting his coding, though the proposed licensing scheme does leave a finger in the hole in the dyke. We seem to me to be running at ever greater speed towards a service economy in professional information where the only sustaining value is the customer appreciation of service given, measured in terms of productivity, process improvement, and compliance. These benefits will be created from content largely available on the open web, and increasingly using metadata standards which have gone generic and are now, like RICs, part of the common parlance of the networked marketplace. The language of IP in the information economy is getting to sound a bit old-fashioned.
