It reminds one superficially of mineral extraction. Who owns the seam of diamonds: the miner or the landowner? And what happens when rights are unclear or landownership is in dispute? But this business of text or data mining is not really like that at all. Blog posts this week from two old friends reminded me that one question remains open: who owns the results of data extraction from thousands or millions of unstructured files, where the data retrieved from any individual dataset may be tiny (well within most fair use provisions) but its contribution to the value of the whole may be huge? Play this out in the context of Big Data and real questions emerge.

Let's go back to the beginning. Here are a couple of top-of-the-head examples from life on the planet that hint at what is worrying me:

* According to research quoted by the UK’s National Centre for Text Mining, “fewer than 7.84% of scientific claims made in a full text article are reported in the abstract for that article”. This, they point out, makes cross-searching of full-text articles using data mining and extraction techniques very important to scientific research. Fortunately JISC, which licenses journal article content from publishers on behalf of UK universities, permits researchers to data-mine these files, and presumably this was agreed with the publishers as part of the licence. But the question in my mind is this: who owns the product created by the data mining, and is it a new value which can be resold to someone else?

* Lexis Risk Management use many hundreds of public and private US data resources in their Big Data environment to profile people and companies. Both private and public data are searched, and it will often be the case that unique connections are thrown up which encourage or discourage users from doing business with the data subject. Clearly Lexis own the output of their custom sweep of the data, and clearly that output needs to be updated and amended over time as fresh data becomes available or more data is licensed into the mine. But do Lexis, or any other data extractor, own what the extraction process retrieves? They are able to sell a value derived from it, and that value emerges directly from the searching and the weighting of answers that they have carried out. But do they own, or need to own, the content itself (which may be different in ten minutes' time when another search is run on the same subject)? And can the insurance company that buys that result as part of its risk management model resell the data content itself to a third party?

I have offered two examples because I do not wish to polarize the argument into publishers versus government. The issue arises in the UK, as the media lawyers’ lawyer, Laurie Kaye, has pointed out, because the Hargreaves Review of copyright law recommends that rights be retained by the data miner, so that you can make new products by recombining other people’s data. The UK government has adopted this recommendation with its usual emphatic “maybe”. Elsewhere in the world of August, which I deserted to take a holiday, the UK government has come out with a storming endorsement of Open Data. As Shane O’Neill has repeatedly pointed out in his blogs, this contrasts sharply with the content retention policies pursued by UK civil servants, who are even now creating a Public Data Corporation in order to frustrate the political drive of their masters (how easily a licensing authority becomes a restricting body!).

Two aspects of this trouble me deeply. In the first place, we are not going to get the data revolution, the Berners-Lee dream of linked data, the creation of hybrid workflow content modelling, or the Big Data promise of new product and service development unless our society makes a primary assumption that all Open Web content, and all government or taxpayer-funded content, is available for data cross-searching, national security considerations apart. Nor will we get them unless it becomes a standard expectation in data licensing that discoveries made across multiple files create new services for the person who puts the intellectual effort into that discovery and, hopefully, new wealth and employment in our society. If we simply continue to debate copyright as if it meant transferring real-world rights into the digital network, then we shall constrain the major hope of intellectual property development this century.

And the second thing? Well, I am realist enough to know, after 20 years of lobbying on this point, that it is unreasonable to expect the UK government to change its attitude to an information society in my lifetime. So maybe we can undermine these guardians of “my information is my power” by saying that we do not want their content, just the right to search it. After all, if it is good enough for the universities and the progress of science, it should be good enough for Ordnance Survey and the Land Registry!

References

Making Open Data Real (http://www.data.gov.uk/opendataconsultation)

The Public Data Corporation (http://discuss.bis.gov.uk/pdc/)

Response to the Hargreaves Report (http://www.bis.gov.uk/assets/biscore/innovation/docs/g/11-1199-government-response-to-hargreaves-review)

National Centre for Text Mining (http://www.nactem.ac.uk/)

Laurence Kaye (http://laurencekaye.typepad.com/)

Shane O’Neill (http://www.shaneoneill.co.uk/)

So, having recently noted here the story of the Jana/Teachers activist shareholders at McGraw-Hill, no one is more surprised than me to see it come true almost instantly. I am left wondering just how that happened. So Terry McGraw gets a letter from Jana saying “You would be better off in two parts”, and instead of saying “Who the hell are you?” responds “Smart idea, boys, we’ll do it next week!” The only explanation is that this loaf was already half-cooked, and the Jana intervention gave Chairman McGraw the opportunity to do what he wanted to do anyway, and to follow Thomson, Reed, Wolters Kluwer and others in the one respect they all have in common: they all sold out of education. Of course, this is blue-blood McGraw-Hill, so you don’t sell out, you just cast it adrift, while climbing adroitly into an accompanying lifeboat.

As a result we have two vessels now heading in opposite directions. McGraw Markets (everything which is not education), including all the B2B and credit rating assets, is in one, and everything education is in the other. But Pat English, a shareholder and CEO of Fiduciary Management Inc, told Reuters that this was only the start: “It doesn’t make sense to have S&P ratings, S&P indices, Capital IQ, Platts, and other companies under one roof”. So what happens in October? Do we see Chairman McGraw skip down the gangplank and set sail in the SS S&P, leaving the waste barge B2B to sink in the Hudson? Anything is possible of course: we are watching one of the largest corporate deconstructions in the sector since D&B sold all of their global subsidiaries to franchise holders.

And why? The answer is a not inconsequential $3 billion: the difference between the expected valuations of Markets and Education as separate companies and the current, pre-announcement, value of the combined group. Education is seen to be in the slow lane and holding back a higher valuation of S&P. No one has ever explained cogently to me why companies, however large, cannot have valuations which reflect the intrinsic worth of their parts, or why “true” valuations can only be revealed by a break-up, but clearly I am in the nursery class in these matters. My eye also caught the Chairman’s statement that $1 billion in overheads would be saved. That I really appreciate. I can see that the corporate office of a chairman, for example, would need fewer aides, fewer executive jets and less travel in a global $4.5 billion company than in a global $6.5 billion company, but since Chairman Terry is going to Markets, there will have to be another Chairman at Education, also aided and abetted and privately flying around a $2 billion company. So where does the saving come in?

And where does the future come in? The US education market is grossly over-published. Margins are too low to attract investment (hence this deal). The nation hovers on the brink of radical IT solutions to a national standards deficit, present across the developed world, which can only be tackled through individualized digital learning: everything else has failed. McGraw Education have a decent record of innovation, good assessment assets like the California Test Bureau, and 20 years of struggle, from Primis onwards, to show for it. But they sit on the edge of the same decreasingly relevant mountain of textbook assets that also contains Houghton Mifflin Harcourt. They have a junior position in non-US markets compared with their major competitor, and no one can currently compete with Pearson. Cengage have learnt to go global and diversify. McGraw could go with Harcourt, but the resulting debt pile would be bigger than the Greek economy, so this is unlikely. Maybe the “we now have the message” boys at IBM, or Intel, or Cisco, will buy them. But why? There are some good assets in medical education (Harrison’s), but are we looking here at a slow death by asset sales until only the unsaleable are left? Eventually Pearson’s major competitor in global markets will be a born-digital platform company, but these assets will not substantively help them reach that position. On the other hand, my telescope, scanning the horizon desperately for a rescue vessel, sees the sleek global liner HP, just refuelling on high-octane Autonomy. Vast interests in education there, and the potential to be the platform player to fight Pearson?

Back at Markets there are problems of a different kind. Platts, aviation and construction all have heavy data capable of real impact in workflow-orientated networking. Although serious attempts have been made to leverage this, there has been little evidence of stomach for the fight, some critical people have left, and the failing magazine/advertising/subscription businesses are, well, still failing. Pity that the “very best thinking” of the management team, which the Chairman quoted as the reason for the split, was not applied here some years ago. Alongside these are really good, but unrelated, businesses like J.D. Power. And then there is the high-grade financial services business, with fast-growing Capital IQ and, of course, the S&P play, most valuable of all. I am forced to repeat Mr English’s question in other words: unless these businesses are radically changed in strategic direction, this company looks as much like a portfolio conglomerate as its now deceased parent ever did. Will this management make those changes? Or will they sell the most marginal assets next year and use the cash to buy back more shares? And is this portfolio nature a real poison pill against a purchase by another mega corp? So is break-up ultimately inevitable?

More questions than answers, but as we all search for value on the ocean bed of this recession, there can be no doubt that this will become a common path for beleaguered corporates in the years to come, at least until markets recover and growth seriously returns.
