We had a wonderful Christmas, thanks – and I hope that you, Dear Reader, did likewise. Immediately afterwards (dawn on Boxing Day) we departed for Nova Scotia to bring greetings to my Canadian mother-in-law – and drove into a violent snowstorm on the Halifax-Lunenburg road which completely wiped my mental tapes of what I was going to do by way of a year-end communication. I only recall that I had promised one reader that I would try not to be so apocalyptic in 2013. Well, here then is a first attempt at a lulling, measured message to that individual: "Have no fears for the future – the disruptions that you might have feared are now fully in play, and the only thing we now await is the full effect!"

As I dashed from the Hut on Boxing Day, I picked up two clippings that curiously underline this theme. The first concerns Cureus (pronounced "curious"), a new Open Source medical journal launched by a Stanford neurosurgeon (http://scopeblog.stanford.edu/2012/12/19). We could, if we wished, take Cureus as an example of the imminent demise of journal publishing in the sciences. Dr John Adler, its progenitor, appears to have at least two issues with current publishing procedures. On the one hand he complains that research results need to be made available immediately: "allowing researchers to publish their findings at no cost within days, rather than the months or even years that it typically takes for research to be made public". And on the other hand he is an opponent of traditional peer review who wants to crowd-source opinion on an article from both expert and non-expert readers. "Nowadays you wouldn't go to a restaurant without Yelping it first. You wouldn't go to see a movie without seeing what Rotten Tomatoes had to say about it. But medical journals are stuck in this 200-year-old paradigm."

So in fact the delightful thing about Cureus is that it ignores both Open Access as practised by PLoS and by commercial publishers and even the technical evaluation favoured by PLoS One, and demands a further level of democratization of access at the same time. "The average Joe has little or no access to the medical literature today. It's a right. It's a human right." This would delight the early Open Access campaigners in the US, but the crowd-sourcing idea is more valuable than the access for non-academic users. We now have a raft of innovations, some of them going back to the Dotcom Boom, which, if fully applied to current publishing processes, would have a hugely disruptive effect. Are we looking at a time when both conventional journal publishing and newly "conventional" Open Access publishing are overtaken by the delayed "boomerang" effect of network publishing procedures now taken for granted elsewhere?

I had the same reaction to a wonderful piece by Bill Rosenblatt, a doyen of internet-rights commentators, published on 15 December 2012 (http://paidcontent.org/2012/12/15) and entitled "The Right to Re-Sell: A Ticking Time Bomb over Digital Goods". In it Bill makes two critical points that will be very important on the 2013 agenda. In the first instance, most music and eBook products are now sold without DRM. DRM files were hated by users and arguably created more customer service issues than they were worth. So while legal embargoes on resale or re-use remain in everyone's licences, the physical barrier has largely disappeared. File transfer between friends – "I'll loan you that book when I have finished it" – is allegedly commonplace, though I have seen few attempts at quantification. Bill doesn't offer any, since he has bigger game in his sights. He has been looking at ReDigi (www.redigi.com), a music resale service. This includes a forward-and-delete function, so that the company can protect itself against the whole idea that it is a front for IP theft, but it could well become the Chegg of the music industry.
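
For readers who want to picture what "forward-and-delete" means in practice, here is a deliberately simplified sketch in Python. It is not ReDigi's actual system – the function, paths and filenames are invented for illustration – but it shows the basic idea: the file is forwarded to the buyer and the seller's copy is deleted, so only one copy survives the resale.

```python
# A conceptual sketch of a "forward-and-delete" resale transfer. Hypothetical paths and
# names; this illustrates the idea attributed to ReDigi above, not its real implementation.
import shutil
from pathlib import Path

def forward_and_delete(track: Path, buyer_library: Path) -> Path:
    """Copy a purchased file into the buyer's library, then remove the seller's copy."""
    buyer_library.mkdir(parents=True, exist_ok=True)
    destination = buyer_library / track.name
    shutil.copy2(track, destination)   # forward the file to the buyer
    track.unlink()                     # delete the seller's copy, so only one copy remains
    return destination

# Hypothetical usage:
# forward_and_delete(Path("seller/song.mp3"), Path("buyer/library"))
```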

Bill's other point concerns ORI (the Owners Rights Initiative grouping), whose membership brings libraries and resellers together in unholy alliance to lobby for protection against litigious publishers under the slogan "You bought it, You own it" (http://www.prnewswire.com/news-releases/you-bought-it-you-own-it-owners-rights-initiative-launches-to-protect-consumers-rights-175435921.html). And here lies the fundamental point that I distil from Bill's piece. As we moved into the networked world we never resolved the fundamental issue of intellectual property ownership. After 500 years of print reproduction, we thought that we could still own the content and control its re-use. And we are still trying to do that in a network of users who adhere to a completely different view of ownership. They think that a digital object is synonymous with a physical one, and that, having successfully ignored or evaded the law in the real world of real objects, they will be able to do exactly the same in the virtual world.

Meanwhile, the current digital world offers just about every variant on lending and resale rights that one might possibly imagine. And the bellwether world of journal publishing illustrates yet further variation on the theme of open network publishing. For publishers, and those who aspire to recreate publishing, the key remains how you add value to processes that were once your sole domain but which can now be performed anywhere by anyone with network access. The key to 2013 remains what it has been for the last 20 years: understand how users want to behave in the network, and get there before demand crystallises, with appropriate value-adding proposals that they will want to pay for. Next year, as in all those years, "just publishing" will not be enough.

So have we all got it now? When our industry (the information services and content provision businesses, sometimes erroneously known as the data industry) started talking about something called Big Data, it was self-consciously re-inventing something that Big Science and Big Government had known about and practised for years. Known about and practised (especially in Big Secret Service; for SIGINT see the foot of this article), but worked upon in a "finding a needle in a haystack" context. The importance of this only revealed itself when I found myself at a UK Government Science and Technology Facilities Council event at the Daresbury Laboratory in the north of England earlier this month. I went because my friends at MarkLogic were one of the sponsors, and spending a day with 70 or so research scientists gives more insight into customer behaviour than going to any great STM conference you may care to name. I went because you cannot see the centre until you get to the edge, and sitting amongst perfectly normal folk who spoke of computing in yottaflops (processing speeds of 10 to the power of 24 operations per second) as if they were sitting in a laundromat watching the wash go round is fairly edgy for me.

We (they) spoke of data in terms of Volume, Velocity and Variety, sourced from the full gamut of output from sensor to social. And we (I) learnt a lot about the problems of storage, which went well beyond the problems of a Google or a Facebook. The first speaker, from the University of Illinois, at least came from my world: Kalev Leetaru is an expert in text analytics and a member of the Heartbeat of the World Project team. The Great Twitter Heartbeat ingests Twitter traffic, then sorts and codes it so that US citizens going to vote, or Hurricane Sandy respondents, can appear as geographical heatmaps trending in seconds across the geography of the USA. The SGI UV which did this work (it can ingest the printed resources of the Library of Congress in 3 seconds) linked him to the last speaker, the luminous Dr Eng Lim Goh, SVP and CTO at SGI, who gave a magnificent tour d'horizon of current computing science. His YouTube videos are as wonderful as the man himself (a good example is his 70th birthday address to Stephen Hawking, his teacher, but also look at http://www.youtube.com/watch?v=zs6Add_-BKY). And he focussed us all on a topic not publicly addressed by the information industry as a whole: the immense distance we have travelled from "needle in a haystack" searching to our current preoccupation with analysing the differences between two pieces of hay – and mapping the rest of the haystack in terms of those differences. For Dr Goh this resolves to the difference between arranging stored data as a cluster of nodes and working in shared memory (he spoke of 16 terabyte supernodes). As the man with the very big machine, his problems lie in energy consumption as much as anything else. In a process that seems to create a workflow that goes Ingest > Store and Organize > Analytics > Visualize (in text and graphics – like the heatmaps), the information service players seem to me to be involved at every point, not just the front end.
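
To make that Ingest > Store and Organize > Analytics > Visualize workflow a little more concrete, here is a toy Python sketch. Everything in it is invented for illustration – the messages, the sentiment scores, the choice of SQLite and matplotlib – and it bears no relation to the SGI or Leetaru systems; it simply shows how each stage hands its output to the next, and why information service players can add value at every point rather than only at the front end.

```python
# A toy Ingest > Store and Organize > Analytics > Visualize pipeline (illustrative only).
import sqlite3
import matplotlib.pyplot as plt

# Ingest: pretend these records just arrived from a streaming feed (invented data).
messages = [
    {"lat": 40.7, "lon": -74.0, "text": "power is out again", "score": -0.8},
    {"lat": 34.1, "lon": -118.2, "text": "lovely evening here", "score": 0.6},
    {"lat": 41.9, "lon": -87.6, "text": "long lines at the polls", "score": -0.2},
]

# Store and organize: keep them in a queryable store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msgs (lat REAL, lon REAL, text TEXT, score REAL)")
conn.executemany("INSERT INTO msgs VALUES (:lat, :lon, :text, :score)", messages)

# Analytics: pull back location and sentiment for aggregation.
rows = conn.execute("SELECT lat, lon, score FROM msgs").fetchall()
lats, lons, scores = zip(*rows)

# Visualize: a crude stand-in for a geographical sentiment heatmap.
plt.scatter(lons, lats, c=scores, cmap="coolwarm", s=200)
plt.colorbar(label="sentiment")
plt.title("Toy sentiment map (illustrative only)")
plt.savefig("heatmap.png")
```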

The largest data sourcing project on the planet was represented in the room (the SKA, or Square Kilometre Array, is a remote sensing telemetry experiment with major sites in Australia and South Africa). Of course, NASA is up there with the big players, and so are the major participants in cancer research and human genomics. But I was surprised by how Big the Big Data held by WETA Data (look at all the revolutionary special effects research at http://www.wetafx.co.nz/research) in New Zealand was, until I realised that this is a major film archive (and NBA Entertainment is up there too on the data A-list). This reflects the intensity of data stored from film frame images and their associated metadata, now multiplied many times over in computer graphics-driven production. But maybe it is time now to stop talking about Big Data, the term which has enabled us to open up this discussion, and begin to reflect that everyone is a potential Big Data player. However small our core data holding may be compared to these mighty ingestors, if we put proprietary data alongside publicly sourced Open Data and customer-supplied third-party data, then even very small players can experience the problems that induced the Big Data fad. Credit Benchmark, which I mentioned two weeks ago, has little data of its own: everything will be built from third-party data. The great news aggregators face similar data concentration issues, as their data has to be matched with third-party data.
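
The matching problem hinted at here – aligning a small proprietary dataset with open or third-party records that describe the same entities in slightly different ways – is easy to illustrate. The sketch below is a toy with invented names and numbers, and emphatically not Credit Benchmark's methodology; it simply shows why even a modest data holding starts to feel "Big" once it has to be reconciled with outside sources.

```python
# Toy entity matching between a proprietary dataset and third-party records.
# All names, scores and ratings are invented for illustration.
from difflib import SequenceMatcher

proprietary = {"Acme Holdings PLC": 0.72, "Globex Corp": 0.55}          # internal scores
third_party = {"ACME Holdings plc": "BBB", "Globex Corporation": "A-"}  # external ratings

def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None if nothing is close enough."""
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c) for c in candidates]
    ratio, match = max(scored)
    return match if ratio >= threshold else None

for name, internal_score in proprietary.items():
    match = best_match(name, third_party)
    external = third_party.get(match, "unmatched")
    print(f"{name}: internal={internal_score}, external={external}")
```

Even in this three-row dataset the abbreviation "Corp" defeats a naive similarity threshold and leaves one entity unmatched, which is exactly the kind of reconciliation work that balloons as outside sources are added.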

And I was still thinking this through when news came of an agreement signed by MarkLogic (www.marklogic.com) with Dow Jones on behalf of News International this week. The story was covered in interesting depth at http://semanticweb.com/with-marklogic-search-technology-factiva-enables-standardized-search-and-improved-experiences-across-dow-jones-digital-network_b33988 but the element that interested me, and which highlights the theme of this note, concerns the requirement not just to find the right article, but to compare articles and demonstrate relevance in a way which only a few years ago would have left us gasping. Improved taxonomic control, better ontologies and more effective search across structured and unstructured data lie at the root of this, of course, but do not forget that good results at Factiva now depend on effective Twitter and blog retrieval, and on effective ways of pulling back more and more video content, starting with YouTube. The variety of forms takes us well beyond the good old days of newsprint, and underlines the fact that we are all Big Data players now.
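
For a sense of what "search across structured and unstructured data" involves at its very simplest, here is a generic sketch in which taxonomy tags act as a structured filter while a naive term-overlap score ranks the free text. It is not MarkLogic or Factiva code – the documents, tags and scoring are all invented – but it illustrates why better taxonomies and ontologies improve retrieval across mixed content such as newswire, blogs and video transcripts.

```python
# A generic sketch: structured metadata filtering combined with naive full-text ranking.
# Documents, tags and scoring are invented; this is not MarkLogic or Factiva code.
documents = [
    {"id": 1, "source": "newswire", "tags": ["energy", "policy"],
     "text": "Regulators approve offshore wind expansion"},
    {"id": 2, "source": "blog", "tags": ["energy"],
     "text": "Why wind subsidies matter for grid pricing"},
    {"id": 3, "source": "video", "tags": ["sport"],
     "text": "Match highlights and post-game interviews"},
]

def search(query, taxonomy_tag=None):
    """Rank documents by term overlap with the query, optionally filtered by a taxonomy tag."""
    terms = set(query.lower().split())
    results = []
    for doc in documents:
        if taxonomy_tag and taxonomy_tag not in doc["tags"]:
            continue  # structured filter: drop documents outside the requested category
        score = len(terms & set(doc["text"].lower().split()))
        if score:
            results.append((score, doc))
    return [doc for score, doc in sorted(results, key=lambda r: r[0], reverse=True)]

for doc in search("wind energy pricing", taxonomy_tag="energy"):
    print(doc["id"], doc["source"], doc["text"])
```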

Note: Alfred Rolington, formerly CEO at Janes, will publish a long-awaited book with OUP, "Strategic Intelligence in the Twenty-First Century", in January; it can be pre-ordered on Amazon at http://www.amazon.co.uk/Strategic-Intelligence-21st-Century-Mosaic/dp/0199654328/ref=sr_1_1?s=books&ie=UTF8&qid=1355519331&sr=1-1. And I should declare, as usual, that I do work from time to time with the MarkLogic team, and thank them for all they have done to try to educate me.
