KISS, “Keep it Simple, Stupid”, was an acronym I brought home from the first management course I ever attended, yet it has taken me years to find out what it really means. There are, clearly, few things more complex than simplicity, and one man’s “Simple” is another man’s Higgs Boson. So I was very energised to have a call last week from an information industry original who has been offering taxonomy and classification services to the information marketplace since 1983. When I first met Ross Leher in the late 1980s, we were both wondering how far we would have to go into the 1990s before information providers recognized that they needed high-quality metadata to make their content discoverable in a networked world. Ross had sold his camera shop to take the long bet on this, and he worked at his new cause with near-religious conviction, as I realised when I went to see him in the 1990s at his base in Denver, Colorado. Denver at that time was home to IHS, whose key product involved researching regulatory material from a morass of US government grey literature. Denver people did metadata. It was a revolution waiting to happen.

So when I heard his voice on the phone last week, my first emotion was relief that he had not simply given up and retired to Florida, and then agreement. Yes, we were 15 years too early. And many of the people we thought were primary customers, like the Yellow Page companies, the phone books and the industrial directories, are now either dead or dying, or in the trauma of complete technological makeover. Ross’s company, WAND Inc (www.wandinc.com), is now very widely acknowledged as a market-leading player in horizontal and multi-lingual taxonomy and classification development. They are the player you go to if you have to classify content, if you are in a cross-over area between disciplines (he has a great case study around taxonomies for medical image libraries), and if you have real language problems (“make this search work just as effectively in Japanese and Spanish”). What they do is really simple.

Your taxonomy requirement is going to start with broad terms that define your content and its area of activity. These can then be narrowed and specified to give additional granularity in any specific field. These classifications can be incorporated into the WAND Preferred Term Code, given a number, and used in a programmatic, automated way to classify and mark up your content (www.datafacet.com). Preferred terms can be matched to synonyms, and the codes can be used to extend the process to many different languages. So someone whose company, for example, was listed in Spanish can be found in the same result list as someone who has a Japanese outfit, in response to a search made by a Chinese user working in Chinese.
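To make the idea concrete, here is a minimal sketch in Python of how a numbered preferred term with synonyms and per-language labels might drive a cross-language search. It is an illustration only, not WAND's actual format: the `PreferredTerm` class, the code number 1001, the labels and the company names are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PreferredTerm:
    """A hypothetical preferred term: one numeric code, many labels."""
    code: int                                  # made-up code number
    labels: Dict[str, str]                     # language -> preferred label
    synonyms: Dict[str, Set[str]] = field(default_factory=dict)
    parent: Optional[int] = None               # broader term, if any


def build_index(terms: List[PreferredTerm]) -> Dict[Tuple[str, str], int]:
    """Map every (language, label or synonym) pair to its term code."""
    index: Dict[Tuple[str, str], int] = {}
    for term in terms:
        for lang, label in term.labels.items():
            index[(lang, label.casefold())] = term.code
        for lang, words in term.synonyms.items():
            for word in words:
                index[(lang, word.casefold())] = term.code
    return index


def search(query: str, lang: str,
           index: Dict[Tuple[str, str], int],
           records: List[Tuple[str, int]]) -> List[str]:
    """Resolve the query to a code, then return every record with that code."""
    code = index.get((lang, query.casefold()))
    if code is None:
        return []
    return [name for name, record_code in records if record_code == code]


# Invented example data: a broad term and a narrower one beneath it.
terms = [
    PreferredTerm(1000, {"en": "Photographic equipment"}),
    PreferredTerm(
        1001,
        {"en": "Cameras", "es": "Cámaras", "ja": "カメラ", "zh": "照相机"},
        synonyms={"es": {"cámara fotográfica"}, "zh": {"相机"}},
        parent=1000,
    ),
]

# Records were classified once, by code, whatever language they arrived in.
records = [("Fotografía Ruiz", 1001), ("東京カメラ店", 1001)]

index = build_index(terms)
results = search("相机", "zh", index, records)
```

Because the records carry the code rather than a word, the Chinese query finds both the Spanish and the Japanese listing without either having been translated.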

And from synonyms we can move on to extended terms, and then map the WAND system to third-party schemes: think of UNSPSC, Harmonized Codes or NAICS, as well as those superficial and now dwindling Yellow Page classifications. WAND can isolate and list attributes for a term, and can then add brand information. All of these activities add value to commoditized data, and one would think that the newspaper industry at least would have been deep into this for 15 years. Yet few examples exist to demonstrate this, Factiva being an honourable exception.
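The mapping step can be sketched in the same spirit. The Python below assumes a simple lookup from a term code to third-party classifications, attributes and brands. Every value is a placeholder: the external codes are deliberately not real UNSPSC or NAICS numbers, and the table and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical crosswalk from an internal term code to external schemes.
# The values are placeholders, not real UNSPSC or NAICS codes.
CROSSWALK: Dict[int, Dict[str, str]] = {
    1001: {"UNSPSC": "UNSPSC-PLACEHOLDER-A", "NAICS": "NAICS-PLACEHOLDER-A"},
}

# Hypothetical attribute and brand lists attached to the same term code.
ATTRIBUTES: Dict[int, List[str]] = {1001: ["sensor type", "lens mount"]}
BRANDS: Dict[int, List[str]] = {1001: ["ExampleBrand"]}


def enrich(record: Dict[str, str], code: int) -> Dict[str, object]:
    """Return a copy of the record with classification metadata added."""
    return {
        **record,
        "term_code": code,
        "external_codes": dict(CROSSWALK.get(code, {})),
        "attributes": list(ATTRIBUTES.get(code, [])),
        "brands": list(BRANDS.get(code, [])),
    }


listing = enrich({"name": "Fotografía Ruiz"}, 1001)
unmapped = enrich({"name": "Unclassified Ltd"}, 9999)
```

A record that started life as a bare name now carries its own term code, its equivalents in other schemes, and the attributes and brands a buyer might filter on. That is the value added to otherwise commoditized data.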

Not the least interesting part of Ross’s account of the past few years was the interest now shown by major enterprise software and systems players in this field of activity. Reports from a variety of sources (IDC, Gartner) have highlighted the time being wasted in internal corporate search. Both Oracle and Microsoft have metadata initiatives relevant to this, and it still seems to me more likely that Big Software will see the point before the content industry itself. With major players like Thomson Reuters (Open Calais) deeply concerned about mark-up, there are signs that an awareness of the role of taxonomy is almost in place, but as the major enterprise systems players bump and grunt competitively with the major, but much smaller, information services and solutions players, I think this is going to be one of the competitive areas.

And there is a danger here. As we talk more and more about Big Data and analytics, we tend to forget that we cannot discard all sense of the component added value of our own information. We know that our content is becoming commoditized, but that is not improved by ignoring now-conventional ways of adding value to it. We also know that the lower and more generalized species of metadata are becoming commoditized; look for instance at the recent Thomson Reuters agreement with the European Commission to widen the ability of its competitors to utilize its RIC equity listing codes. This type of thing means that, as with content, we shall be forced to increase the value we add through metadata in order to maintain our hold on the metadata, and the content, which we own.

And, one day, the only thing worth owning, because it is the only thing people search and it produces most of the answers that people want, will be the metadata itself. When that sort of sophisticated metadata becomes plugged into commercial workflow, and most discovery is machine-to-machine and not person-to-machine, we shall have entered a new information age. Just let us not forget what people like Ross Leher did to get us there.

 


Comments


4 Comments so far

  1. Fabrizio Cardinali on January 30, 2012 15:11

    Great input. These are the sort of directions publishers should (must) follow quickly to preserve their leadership as knowledge “mediators”, or their services will be eaten up by the next big Amazon on the road… It's not a matter of defending (their) paid vs. free content, but of turning their assets and heritage into well tagged and STRUCTURED content (hence less valuable in machine terms). The next step could be that of adopting industry-wide standards for SPECIALIZED (i.e. personalized to specific verticals) structured content and starting to put structured content management architectures at the heart of their (enterprise) organizations, as many non-publishing industries, like the banking and financial industries, did at the dawn of XML transactions…

    The best solution out there, more than DocBook or ePub, seems to be DITA, the Darwin Information Typing Architecture, donated by IBM to OASIS. Many publishers should be looking into it, as should content management solutions, as we are (more info at
    http://bit.ly/xNJRRQ )

  2. Metadata - The importance of being found. on January 31, 2012 09:11

    [...] Warlock writes another interesting post about the value of Metadata and its growing importance in today’s information world and makes [...]

  3. Christopher Spigner on May 17, 2012 16:17

    Lol, I could have never thought that you can transform KISS into this, that was so so funny, many thanks for sharing!

  4. Taxonomy: The Route to Value Added – Major Enterprise Software Vendors Get the Message – while Content Providers in General Appear Ignorant | BIIA.com on April 20, 2013 19:32

    [...] recommend that you read David Worlock’s recent blog (KISS – but don’t tell) on his interpretation what Wand taxonomy means for our [...]