It's Big Data week, yet again. In the last two months we have seen all of the dramas and confusions attendant upon emerging markets, yet none of the emerging clarity which one might expect when a total sea change is taking place in the way in which we extract value from data content. Then this week, with all the aplomb of an elephant determined not to be left behind in a world which has apparently decided that the hula hoop is the only route to sanity, Oracle announced its enterprise Big Data solution. Again. Only now it is called the Big Data Appliance. It started shipping on Tuesday. And the world will never be the same again.

At the heart of the Oracle launch is a Hadoop license. This baby elephant lies at the heart of almost everything. The two Hadoop-based commercializations have both raised finance in the lead-up to 2012: Cloudera ($40m) and Hortonworks ($20m), while other sector players like MapR, who also exploit Hadoop, found 2011 a really good time to raise money. And this had a radiating effect on the whole data handling sector. Neo4j, a database technology for graph storage and resolution (Neo Technology, based in Malmo and Menlo Park), raised $10m in a round led by Fidelity. Meanwhile, Microsoft signed a deal with Hortonworks, IBM said it would launch Hadoop in the Cloud, EMC (Greenplum) went for MapR, Dell announced a Hadoop-based initiative, and the world waits and wonders what Hewlett Packard will do, now that it has Autonomy for analytics.

So now we have plenty of initiatives, and, as usual, not much idea of who the next generation of users will be. The first generation speak for themselves. We can see the benefits that Facebook derive from being able to use Hadoop-based tools to find connections and meanings in their content that would have been impossible to cost-effectively reveal in a prior age. And the same would be true of such unlikely bedfellows as the Department of Homeland Security, or Walmart, or Sony (think Playstation Network), or the Israeli Defence Force, or the US insurance industry (via Lexis Risk), or Lexis Nexis (who announced a Big Data integration with MarkLogic), let alone the two players who effectively started all this: Yahoo! (Hadoop) and Google (MapReduce). So asking where it goes next is a legitimate question, but one which can only be answered if we accept that the next group of users are never going to recreate the Google server farms in order to break into these advantageous processing environments. The next group of intensive users will have their XML content on MarkLogic, or their graph data on Neo4j. They will want to use the US census data remotely (so will contract with Amazon for process time on the Amazon web presence), and will use a large variety of third party content held in similar ways. Some of their own content will still be held locally on MySQL databases – like Facebook – while others will be working in part or fully in the Cloud, and combining that with their own NoSQL applications. But the essential point here is that no one will be building huge data warehousing operations governed by rigid and mechanistic filing structures. Literally, we are increasingly leaving the data where it is, and bringing the analytical software to it, in order to produce results that are independent of any single data source.

And this too produces another sort of revolution. The front door to working in this way is now the organizational software itself. When Lexis Risk announced at the end of last year that they were going to take HPCC open source, a number of critics saw that as turning their back on an exploitation opportunity. Yet it makes very real sense in the context of Oracle, Microsoft and IBM seeking to build their own “solutions”. Some businesses will want to run their own solutions, and will make a choice between open source Hadoop and open source HPCC. Others in systems integration will seek out open source environments to create unique propositions. But since it was always unlikely that Lexis Risk was going to challenge the enterprise software players in their own bailiwick, open source is a way of getting a following, harvesting vital feedback, and earning not insignificant returns in servicing and upgrading users.

I am also delighted to see that another winner seems likely to be MarkLogic, since I have been proud of working with them and speaking at their meetings for a number of years. For publishers and information providers, it is now clear that XML remains the route forward. But MarkLogic 5 is clearly being positioned as the information service providers' socket for plugging into the Big Data environment. Anyone who believes that scientists will NOT want to analyse all data in a segment, or engineers source all relevant briefs with their ancillary information, or lawyers cross examine all documentation regardless of location, or pharma companies examine research files in the context of contra-indications should stop reading now and take up fishing. My observation is that Big Data is like Due Diligence: once someone does it, even if the first results are not impressive, all competitors have to do it. The risk of not trying to find the indicative answer by the most advanced methods is too great to take.




My holiday reading, courtesy of Skip Pritchard who gave it to me, has been Michael Korda’s vast biography of T E Lawrence, and despite my familiarity with the story, I have found it an entrancing experience. Lawrence is almost impossible to reconstruct, since he shone a different light in the direction of every individual he met, and one is left feeling that nowhere does a real Lawrence exist. So very like the information game, then! Every observer sees a different fraction of play, and no one can predict the outcome. This comment is meant to mask my residual guilt at reading my book while my knee mended and not writing pages of forecasts and predictions for the amusement of readers, and to confirm my frailties as a prophet of anything.

Lawrence wrote “The Seven Pillars of Wisdom”, one of the world’s unread classics (and almost unreadable in parts: he lost the only copy of the full manuscript at Reading train station and had to recreate 200,000 words, during which he clearly became bored). In 800 words I can communicate seven thoughts – not so much Pillars as pillows, and not predictions but observations of this unknowable industry. Here goes:

1. Some think it's about content and others that it is about platforms and technology. For me it is still about communications, and the greatest challenge is still holding people’s attention, having gained their recognition. Even Facebook hits a plateau. The gods remain Reputation, Identity, and Attention.

2. You are either a communication company or you are not. News Corp is a format company. It does newspapers, film and television and has little corporate bandwidth for non-format communications. This cannot be changed by executive whim, and the collapse of Beyond Oblivion, its music initiative, before the holidays, as well as the veil of silence around the performance of The Daily on the iPad, following on as they do the oblivion that was MySpace, demonstrate all of this very well. Yet Mr Murdoch has signed on to Twitter. There is no evidence yet that the world can be saved with a single Tweet. There is no evidence yet that traditional media and information businesses can recreate themselves in new marketplaces without either starting afresh somewhere else or buying a new business and moving into it. Boinc.

3. Apple, according to MacRumors, is about to enter the textbook market, maybe with Pearson and certainly via the iPad. This was apparently a dearly held dream of Steve Jobs, at least according to Walter Isaacson, who is shaping up to be not just the biographer but also the Delphic oracle. I have some doubts – not about the iPad as a display device, but about whether markets want textbooks re-invented. Learners would like learning re-invented, and made easier and more compelling. Textbooks are an extinct format. And learning should operate equally well on whatever platform you have available. What a waste of all this energy around eLearning if we abolish the old formats like textbooks and replace them with rigid device platforms. And yet I am sure that the analysts are right – there are only a few global growth markets and education is the largest.

4. Then I had a great comment from Brad Patterson at EduLang. He points out that 500 million people are trying to learn English and only 50 million can afford textbooks, online or otherwise. So his business model for his interesting TOEFL and TOEIC Simulators is “pay what you can”, with half going to a reading charity. In many ways this is very neat – it reaches out to 450 million people with a trust relationship, and could be a really interesting business model to watch. Above all, how encouraging it is to see someone moving the goalposts – we did not score many goals in regular business model configurations so let's applaud the courage of someone doing something different.

5. Semantic Web technology and deployment in mass markets is getting closer and closer. I took part in the beta of Garlik some 3 years ago, partly because of an interest in technology around identity, and partly out of interest in technologies derived from the University of Southampton Computer Science department, and blessed by such eminences as Wendy Hall, Nigel Shadbolt – and Sir Tim Berners-Lee himself. Two days before Christmas Garlik was sold to Experian, in a move that I think was as significant as Reuters buying ClearForest all those years ago. Garlik protects personal identity through web search, was founded by the men who built the UK online banks Egg and First Direct, and backed by Doughty Hanson. This is a straw in a wind which will go gale force.

6. But if the Semantic Web is going to be so clever, and linked data will recreate so many service environments, where is it now? Well, look at the obvious places. In most of our economies building and construction is the largest sector in terms of activity and players, large and small, and has great companies serving it with supplier and materials information. Thus the US market is replete with Reed Construction, Hanley Wood and McGraw-Hill. But what if a semantic web-based environment were able to search all online catalogues and directories to produce a sweeping coverage of suppliers and products that was at once more detailed and more comprehensive than any directory-style database, and could include more metadata from suppliers and users to create a continually developing industry specification site, deliverable and self-formatting to every platform and device? That is what interests me about MaterialSource, as well as its use of SPARQL, Semantic Web Pages for faceted and graph-based browsing, smartphone and tablet Apps using HTML5, ontologies etc, etc. If they do it, someone will have to buy them!

7. I keep on thinking about the neglect of audio, so I was delighted to see SoundCloud. There has to be room for an audio portal, and a community for sharing sound and cross-referencing its sources and users. I anticipate that they know things about users that Beyond Oblivion didn’t.

Last words of a predictive nature before I get back to real work. A correspondent asks “what technology are you following in 2012?” Since I say every week that I am not following technologies but users, I take mild offence at this, but I do admit to a penchant for 3D printing. Now that really could have an impact. Especially in medical workflow. I have also been asked by a venture capitalist, who should know better, what is likely “to be certain” to succeed this year. He is a serious man so I owe him a serious answer: anything that saves more time and money than it costs. The prime example this year in the UK has been Shutl, a delivery logistics service that gets your online purchases to you physically (average delivery time in London was 90 minutes, with a cost of £5). Is that all the queries? I am beginning to feel like an Agony Aunt!

