So the UK government has decided to monitor every tweet and every email and every social network connection, all in the good cause of the greater security of the citizen. While I am up to my eyes in articles defending the civil liberties of the citizen (at least some of whom are more afraid of the police than the terrorists), I see little commentary on the logistics of all of this, and at best guesstimates that owe more to powerful imagination than logistical reason. My mind goes to the software involved, and that prompts a wider question: while we are now familiar with Hadoop and the techniques used by the cloud-based systems of Yahoo!, Google, Amazon and Facebook, what deployable software is there in the market which works at a platform level and interfaces information systems with very large data aggregations on the one side, and user interfaces on the other?

In the media and information services area the obvious answer is MarkLogic (www.marklogic.com). Now a standard for performance in its sector, MarkLogic chose media alongside the government sector as its two key areas of market exposure in the development years. Throughout those years I have worked with them and supported their efforts to “re-platform” the industry. MarkLogic 5.0 is just about as good as it gets for information services going the semantic discovery route, and the testimony to this is its installations in different divisions of every global, and many national, information service providers. So when MarkLogic roll out the consultancy sell these days, they do so with almost unparalleled experience of sector issues. I have no prior knowledge, but I am sure that they would be players in that Home Office contract.

Other potential players come from outside the media sector and outside of its concentration on creating third party solutions. In other words, rather than creating a platform for a content holder to develop client-side solutions, their experience is directly with the end-user organization. Scanning the field, the most obvious player is Palantir (www.palantir.com). A Palo Alto start-up of the 2004 vintage (Stanford and PayPal are in its genes), this company targeted government and finance as its key starter markets, and has doubled in size every year since foundation. It raised a further $90m in finance in the difficult year of 2010, and informal estimates of its worth are now over $3 billion. It does very familiar things in its ability to cross search structured, unstructured, relational, temporal and geospatial data, and it now seems to be widening its scope around intelligence services, defense, cyber security, healthcare and financial services, where its partner on quant services is Thomson Reuters (QA Studio). This outfit is a World Economic Forum 2012 Tech pick – we all love an award – and as we hurry along to fill in the forms for the UK intelligence service, I expect to find them inside already measuring the living space – and the storage capacity.

My next pick is something entirely different. Have a look at www.treato.com. This service, from First Life Research, is more Tel Aviv than Palo Alto, but it provides something that UK security will be wanting – a beautifully simple answer to a difficult question. Here the service analysed the comment sections of 160,00 US blog sites and health portals to try to track down what people said about the drugs they were taking. They have now examined 600m posts from 23 million patients commenting on 8,500 drugs, and the result, sieved through a clinical ontology-based system, is aggregated patient wisdom. When you navigate this, you know that this will have to find a place in evidence-based medicine before too long, and that the global service environment is on the way. In the meanwhile, since the UK National Health Service cannot afford this, let’s apply it to the national email systems, and test the old theory that the British only have two subjects, their symptoms and the weather.

We started with two Silicon Valley companies, so it makes sense next to go to New Zealand. Pingar (www.pingar.com) starts where most of us start – getting the metadata to align and work properly. From automating meta tagging to automatic taxonomy construction, this semantic-based solution, while clearly one of the newest players on the pitch, has a great deal to offer. As with the other players, I will come back to Pingar in more detail and give it the space it deserves, but in the meanwhile I am very impressed by some indicative uses. Its sentiment analysis features will surely come in useful in this Home Office application, as we search to find those citizens more or less likely to create a breach of the peace. If there are few unique features, here or anywhere in these services, then there is a plenitude of tools that can make a real difference. Growing up in the shadow of MarkLogic and Palantir is a good place to be if you can move fast and stay agile.

But there are others. Also in the pack is Digital Reasoning (www.digitalreasoning.com), Tim Estes’ company from Franklin, TN. Their Synthesys product has scored considerable success in, guess where? The US government. Some analysts see them as Palantir’s closest competitor in size terms, and here is how they define the problem:

“Synthesys is the flagship product from Digital Reasoning that delivers Automated Understanding for Big Data. Enterprise and Government customers are awash with too much data. This data has three demanding characteristics – it is too big (volume), it is accumulating too fast (velocity) and it is located in many location and forms (variety). Solutions today have attempted to find ever better methods of getting the user to the “right” documents. As a result, data scientists and data analysts today are confronted with the dilemma of an ever-increasing need to read to understand. This is an untenable problem.”

I hear the UK department of spooks saying “hear, hear” so I guess we shall see these gentlemen in the room. But I must turn now to welcome a wonderfully exciting player, which, like Pingar, seems to have emerged at the right place at the right time. In 1985 I became a founder member of the Space Society. This could have been my recognition of the vital task of handling remotely sensed data, or the alluring nature of the Organizing Secretary who recruited me. She moved on, and so did I, ruefully reflecting that no software environment yet existing could handle the terabytes of data that poured from even the early satellites. Now we have an order of magnitude more data, but at last practical solutions like SpaceCurve (www.spacecurve.com) from Seattle. Here is the conversation we all wanted then: pattern recognition systems, looking at parallel joins between distributed systems and indexing geospatial polygons… working on multi-dimensional, temporal, geospatial data, data derived from sensors, and analysis of social graphs. Now, if I thread together the third of the words on their website that I understand, I perceive that large scale geospatial has its budding solutions too, and its early clients have been in commodities (the goal of all that geospatial thinking years ago) and defense. Of course.

So I hope to see them filling in their applications as well. In the meanwhile, I shall study hard and seek to produce in the next few months a more detailed analysis of each. And if you are gloomy about the ability of the great information companies to survive the current firestorm of change, reflect on this. Three of my six – Palantir, Treato and SpaceCurve – share a common investor in Reed Elsevier Ventures. They should take a bow for keeping their owners anchored within the framework of change, and making them money while they do it.


Comments

1 Comment so far

  1. BIG Data: Six of the Best | BIIA.com on April 22, 2013 12:35

    [...] David Worlock Blog  (David Worlock is Chairman of BIIA) This entry was posted in B2B Media, BIG DATA, BIG DATA, [...]