As we in the information services market start to get our thinking right about the influence of Big Data and our current obsession with workflow, I am beginning to think that we will need to revise our whole approach to collaborative working in marketplaces. At the moment we are playing all the old tunes, but none of them quite fits the customer mood. Like that old vaudeville star, Jimmy “Schnozzola” Durante, we need to tinkle those ivories again and again until we find it. The Lost Chord!

So here is a sample of my keyboard doodling. I reason that we cannot “productize” information services forever. Our customers are now too clever, and as we open our APIs and let them self-actualize or customize, we face real dangers. At the top end of most markets in most sectors, the top 10 customers are well equipped in terms of skills, and are surrounded by systems integrators who can service them expensively but effectively. And amongst the medium and small enterprises in our client base, the cost of doing anything other than allowing them to customize for themselves is prohibitive. And we are sitting in the middle of this, talking passionately about selling solutions and always seeking stickiness, while our client base shows dangerously independent tendencies.

There are two answers. We could sell less: just license everything, put the APIs in place and let the user community get on with it. For me, this is like sleep-walking on a cliff edge. Our only potent quality as service providers has been our knowledge of what users do with our data and how they work. Make the relationship one of pure licensing and we cut off the feedback loop, isolating ourselves from the way in which workflow software is being tweaked and refined, and from the way our data grows, or diminishes, in importance as a result. Or we could go to the opposite extreme, way past the current middle ground where we build “solutions” and customers adopt and install them as applications, with all the difficulties described above. The opposite extreme is equally difficult, but at least it keeps us in the game.

So what is the opposite extreme? Simply this: we go on building solutions, but we increasingly customize them for our major customers, working in partnership with systems integrators and with software solution partners whose Big Data environment, analytics or data mining is part of the key to our service specification. Setting up our own systems integration, by alliance or as an in-house installation, could be vital to our ability to stay sticky, to bring the client’s own data and resources into play, and to learn where the market is going. I hear cries of “We are a content company, not a software house!” Not so for the major players in B2B and STM, who have been fully invested in software for five years or so, and are more likely these days to buy a tool-set than a data-set. Much more cogent are the protests of those who do not want to own major pieces of systems software: the answer there is strategic alliance. Discussing the pharma market the other day, where size is very important, I found myself advocating approaches to major customers to outsource large areas of non-research process, which would offer real productivity gains to the user and give the services solutions player and its systems software partner the ability to work inside the firewall and grow with the client’s needs.

There may be 1,000 major global clients across all verticals with whom this approach would work. It certainly works in government and financial services, traditionally the targets of the major players in Big Data software. But it exposes two new problems. First, it leaves behind the bulk of the market: the medium and small players unable to afford this type of soup-to-nuts solutioning. This, again, is a real opportunity for solution packaging with a systems integrator, either external or internal to the content player, enabling 3-5 year contracts covering upgrades, data updating and maintenance. In some instances integration will go further and permit scaled-down custom solutions that parallel what the major players are doing. The trick will be to start by selling the standard integration package, and then respond to the smaller customer’s need for customization. And there is a market of small players and consortia where this type of solutioning has been working for some time. It’s Education, and the service area to watch is Pearson Learning Solutions.

And the other problem for the bigger data content players? Simply that there are killer whales out there! As the major enterprise software vendors see what is happening, they will feel that this type of solutioning undermines some sacred territory. We see that with Oracle in particular, but IBM and SAP are also always ready to buy on a vast scale. Some of today’s Big Data ex-start-ups, the 5-10 year old Valley vintages, will be absorbed into these big players, which could be a difficulty, or an opportunity, if your content solution is tied to that newly acquired player. In fact, if the major content providers are not talking regularly to the mighty enterprise software players about how these worlds come together, then they are less smart than I think they are. At the moment, in my experience, at least some of the enterprise software players are saying “We should probably buy some of them – but we have no experience of managing content.” If you ever find yourself saying “I never imagined that Springer or Elsevier or Wiley would end up as part of the solutions division at Oracle”, then I hope that you will recall an article that went right to that point. And at least that would integrate all access at all points!

So the UK government has decided to monitor every tweet, every email and every social network connection, all in the good cause of the greater security of the citizen. While I am up to my eyes in articles defending the civil liberties of the citizen (at least some of whom are more afraid of the police than of the terrorists), I see little commentary on the logistics of all this, and at best guesstimates that owe more to powerful imagination than to logistical reason. My mind goes to the software involved, and that prompts a wider question: while we are now familiar with Hadoop and the techniques used by the cloud-based systems of Yahoo!, Google, Amazon and Facebook, what deployable software is there in the market that works at platform level and interfaces information systems with very large data aggregations on the one side and user interfaces on the other?

In the media and information services area the obvious answer is MarkLogic. Now a standard for performance in its sector, MarkLogic chose media alongside the government sector as its two key areas of market exposure in its development years. Throughout those years I have worked with them and supported their efforts to “re-platform” the industry. MarkLogic 5.0 is just about as good as it gets for information services going the semantic discovery route, and the testimony to this is its installations in various information divisions of every global and many national information service providers. So when MarkLogic roll out the consultancy sell these days, they do so with almost unparalleled experience of sector issues. I have no prior knowledge, but I am sure that they would be players in that Home Office contract.

Other potential players come from outside the media sector and outside its concentration on creating third-party solutions. In other words, rather than creating a platform on which a content holder develops client-side solutions, their experience is directly with the end-user organization. Scanning the field, the most obvious player is Palantir. A Palo Alto start-up of the 2004 vintage (Stanford and PayPal are in its genes), this company targeted government and finance as its key starter markets, and has doubled in size every year since its foundation. It raised a further $90m in finance in the difficult year of 2010, and informal estimates of its worth are now over $3 billion. It does very familiar things in its ability to cross-search structured, unstructured, relational, temporal and geospatial data, and it now seems to be widening its scope around intelligence services, defense, cyber security, healthcare and financial services, where its partner on quant services is Thomson Reuters (QA Studio). This outfit is a World Economic Forum 2012 Tech pick (we all love an award), and as we hurry along to fill in the forms for the UK intelligence service, I expect to find them already inside, measuring the living space – and the storage capacity.

My next pick is something entirely different. Have a look at Treato. This service, from First Life Research, is more Tel Aviv than Palo Alto, but it provides something that UK security will be wanting: a beautifully simple answer to a difficult question. Here the service analysed 160,000 US blog sites and the comment sections of health portals to track down what people said about the drugs they were taking. They have now examined 600m posts from 23 million patients commenting on 8,500 drugs, and the result, sieved through a clinical ontology-based system, is aggregated patient wisdom. When you navigate this, you know that it will have to find a place in evidence-based medicine before too long, and that the global service environment is on the way. In the meantime, since the UK National Health Service cannot afford this, let’s apply it to the national email systems, and test the old theory that the British have only two subjects: their symptoms and the weather.

We started with two Silicon Valley companies, so it makes sense next to go to New Zealand. Pingar starts where most of us start: getting the metadata to align and work properly. From automated metatagging to automatic taxonomy construction, this semantics-based solution, while clearly one of the newest players on the pitch, has a great deal to offer. As with the other players, I will come back to Pingar in more detail and give it the space it deserves, but in the meantime I am very impressed by some indicative uses. Its sentiment analysis features will surely come in useful in this Home Office application, as we search for those citizens more or less likely to cause a breach of the peace. If there are few unique features, here or anywhere in these services, there is nonetheless a plenitude of tools that can make a real difference. Growing up in the shadow of MarkLogic and Palantir is a good place to be if you can move fast and stay agile.

But there are others. Also in the pack is Digital Reasoning, Tim Estes’ company from Franklin, TN. Their Synthesys product has scored considerable success in, guess where? The US government. Some analysts see them as Palantir’s closest competitor in size terms, and here is how they define the problem:

“Synthesys is the flagship product from Digital Reasoning that delivers Automated Understanding for Big Data. Enterprise and Government customers are awash with too much data. This data has three demanding characteristics – it is too big (volume), it is accumulating too fast (velocity) and it is located in many location and forms (variety). Solutions today have attempted to find ever better methods of getting the user to the “right” documents. As a result, data scientists and data analysts today are confronted with the dilemma of an ever-increasing need to read to understand. This is an untenable problem.”

I hear the UK department of spooks saying “hear, hear”, so I guess we shall see these gentlemen in the room. But I must turn now to welcome a wonderfully exciting player which, like Pingar, seems to have emerged at the right place at the right time. In 1985 I became a founder member of the Space Society. This could have been my recognition of the vital task of handling remotely sensed data, or the alluring nature of the Organizing Secretary who recruited me. She moved on, and so did I, ruefully reflecting that no software environment then existing could handle the terabytes of data that poured from even the early satellites. Now we have an order of magnitude more data, but at last practical solutions like SpaceCurve, from Seattle. Here is the conversation we all wanted then: pattern recognition systems, looking at parallel joins between distributed systems and indexing geospatial polygons… working on multi-dimensional, temporal, geospatial data, data derived from sensors, and analysis of social graphs. Now, if I thread together the third of the words on their website that I understand, I perceive that large-scale geospatial has its budding solutions too, and its early clients have been in commodities (the goal of all that geospatial thinking years ago) and defense. Of course.

So I hope to see them filling in their applications as well. In the meantime, I shall study hard and seek to produce, in the next few months, a more detailed analysis of each. But if you are gloomy about the ability of the great information companies to survive the current firestorm of change, reflect on this. Three of my six (Palantir, Treato and SpaceCurve) share a common investor in Reed Elsevier Ventures. They should take a bow for keeping their owners anchored within the framework of change, and making them money while they do it.
