“Keep it Simple, Stupid” was an acronym I brought home from the first management course I ever attended yet it has taken me years to find out what it really means. There are, clearly, few things more complex than simplicity, and one man’s “Simple” is another man’s Higgs Boson. So I was very energised to have a call last week from an information industry original who has been offering taxonomy and classification services to the information marketplace since 1983. When I first met Ross Leher in the late 1980s we were both wondering how far we would have to go into the 1990s until information providers recognized that they needed high quality metadata to make their content discoverable in a networked world. Ross had sold his camera shop to take the long bet on this, but he worked at his new cause with a near religious persuasion, as I realised when I went to see him in the 1990s at his base in Denver, Colorado. Denver at that time was home to IHS, whose key product involved researching regulatory material from a morass of US government grey literature. Denver people did metadata. It was a revolution waiting to happen.

So when I heard his voice on the phone last week my first emotion was relief – that he had not simply given up and retired to Florida – and then agreement. Yes, we were 15 years too early. And many of the people we thought were primary customers, like the Yellow Page companies and the phone books and the industrial directories – are now either dead or dying, or in the trauma of complete technological makeover. Ross’s company, WAND Inc (www.wandinc.com) is now very widely acknowledged as a market leading player in horizontal and multi-lingual taxonomy and classification development. They are the player you go to if you have to classify content, if you are in a cross-over area between disciplines (he has a great case study around taxonomies for medical image libraries), and if you have real language problems (“make this search work just as effectively in Japanese and Spanish”). What they do is really simple.

Your taxonomy requirement is going to start with broad terms that define your content and its area of activity. These can then be narrowed and specified to give additional granularity in any specific field. These classifications can be incorporated into the WAND Preferred Term Code, given a number, and used in a programmatic, automated way to classify and mark up your content (www.datafacet.com). Preferred terms can be matched to synonyms, and the codes can be used to extend the process to very many different languages. So someone whose company, for example, was created in Spanish can be found in the same list as someone who has a Japanese outfit, as the result of a search made by a Chinese user working in Chinese.

And from synonyms we can extend the process  to extended terms themselves, and then map the WAND system to third party maps – think of UNSPSC, Harmonized Codes or NAICS, as well as those superficial and now dwindling Yellow Page classifications. WAND can isolate and list attributes for a term, and can then add brand information. All of these activities add value to commoditized data, and one would think that the newspaper industry at least would have been deep into this for 15 years. Yet few examples – Factiva is an honourable example – exist which demonstrate this.

Not the least interesting part of Ross’s account of the past few years was the interest now shown by major enterprize software and systems players in this field of activity. Reports from a variety of sources (IDC, Gartner) have high-lighted the time being wasted in  internal corporate search. Both Oracle and Microsoft have metadata initiatives relevant to this, and it still seems to me more likely that Big Software will see the point before the content industry itself. With major players like Thomson Reuters (Open Calais) deeply concerned about mark-up, there are signs that an awareness of the role of taxonomy is almost in place, but as the major enterprize systems players bump and grunt competitively with the major, but much smaller, information services and solutions players, I think this is going to be one of the competitive areas.

And there is a danger here. As we talk more and more about Big Data and analytics, we tend to forget that we cannot discard all sense of the component added value of our own information. We know that our content is becoming commoditized, but that is not improved by ignoring now conventional ways of adding value to it. We also know that the lower and more generalized species of metadata are becoming commoditized; look for instance at the recent Thomson Reuters agreement with the European Commission to widen the ability of its competitors to utilize its RICs equity listings codes. This type of thing means that, as with content, we shall be forced to increase the value we add through metadata in order to maintain our hold on the metadata – and content – which we own.

And, one day, the only thing worth owning – because it is the only thing people search and it produces most of the answers that people want – will be the metadata itself. When that sort of sophisticated metadata becomes plugged into commercial workflow and most discovery is machine to machine and not person to machine we shall have entered a new information age. Just let us not forget what people like Ross Leher did to get us there.


keep looking »