The news (BBC, 29 December) that Orang Utans in Milwaukee are using iPads to watch David Attenbrough while covertly observing each others behaviour reminds me at once of how “early cycle” our experience of tablet tech still is, while how little we still extract from the experience we have of all digital technologies. So, by way of apologizing for missing last week (minor knee procedure, but the medical authorities advised that no reader of mine could possibly deserve my last thoughts before going under the anaesthetic…) and wishing you all (both…?) a belated happy Christmas I am going to sort through the December in-tray.

The key trends of 2011 will always be, for me, the landmark strides made towards really incorporating content into the workflow of professionals, and the progress made in associating previously unthinkable data collections (not linked by metadata, structure and /or location) in ways that allowed us draw out fresh analytical conclusions not otherwise available to us. These are the beginnings of very long processes, but already I think that they have redefined “digital publishing” or whatever it is that we name the post-format (book, chapter, article, database, file) world we have been living in now for a few years and are at last beginning to recognize. Elsevier recognized it all right with their LIPID MAPS lipid structures App ( earlier this month and I should have been quicker to see this. This App on SciVerse does all of the workflow around lipid metabolisms  and is thus integral to the research into lipids-based diseases (stroke, cancer, diabetes, Alzheimer’s, arthritis, to name a few). The LIPIDS MAP consortium is a multi-institutional, research-based organization which has marshalled into its mapping all of the metadata and nomenclature available – common and systematic names, formula, exact mass, InChiKey, classification hierarchies and links to relevant public databases. Elsevier adds the entity searching that allows the full text and abstracts to support the mapping and in data analysis terms to draw the sting from a huge amount of researcher process effort. Whenever I hear the old Newtonian saw about “standing on the shoulders of giants” I replace shoulders with “platforms”.

So how do Elsevier pull off a trick like this? By being ready and spending years  in the preparatory stages. Elsevier, in my view, has become two companies, and alongside a traditional, conservative journal publisher has evolved a high tech science data handling company, conceived in Science Direct and reaching, via Scirus and Scopus a sort of  adolescence in SciVerse. This effort now moves beyond pure data into the worktool App, driven by SciVerse Applications ( and the network of collaborating third party developers which is increasingly driving these developments ( This is and will be a vital component. Not even Elsevier can do all these things alone. The future is collaborative, and here is the market leader showing it understands that, and knows that science goes forward by many players, large and small, acting together. And if developers can find, under the Elsevier technology umbrella, a way of exposing their talents and earning from them (as authors were wont to do with publishers) then another business model extension has been made. There is much evidence here of the future of science “publishing” – and while it may be doubted that many (two?) companies can accomplish these mutations successfully, Elsevier are making their bid to be one of them.

And there is always a nagging Google story somewhere left un-analysed, usually because one could either write a book on the implications or ignore them , on the grounds that they may never happen. But Google is the birthplace of so much that has happened in Big Data that I am loath to neglect BigQuery. With an ordinary sized and shaped company this would all be different. I could say for example that LexisNexis is taking its Big Data solution, HPCC ( Open Source because it wants to get its product implemented in many vertical market solutions without having to go head to head with IBM, Oracle or SAP. But Google clearly relishes the thought of taking on the major analytics players on the enterprize solutions platforms, and clearly has that in mind with this SQL based service, which has been around for about a year and now enters beta with a waitlist of major corporate users anxious to test it. And yet, wait a minute, Google, Facebook and Twitter led us into the No SQL world because the data types, particularly mapping, and the size of databases involved, pushed us into the Big Data age and past the successful solutions created in the previous decade in SQL enquiry. So is what Google is doing here driven mostly by its analysis of the data and capabilities of major corporates (Google doing market research and not giving the market what Google thinks is good for them!) or is this something else, a low level service environment that may take off and splutter into life, or may beta and burn like so many predecessors. Hard to tell but worth asking the question of the Google Man Near You. Meanwhile, the closest thing to a Big Data play in publishing markets remains MarkLogic 5.0. Coming back to where I started on Big Data, one of the most significant announcements in a crowded December had Lexis Nexis – law this time, not Risk Solutions – using MarkLogic 5 as the way to bring its huge legal holdings together, search them in conjunction with third party content and mine previously unrecognized connectivities. Except that I should not have said “mine”. Apparently “mining” and “scraping” are now out of favour: now we “extract” as we analyse and abstract!

However, I wish every scraper and miner seeking  a way forward every good wish for 2012. And me? Well, I am going to check out those Orang Utans. They may have rewritten Shakespeare by now.



keep looking »