In this industry five years is enough to benchmark fundamental change. This week I have been at the 9th Publishers’ Forum, organized as always by Klopotek, in Berlin. This has become, for me, a must-attend event, largely because while the German information industry is one of the largest in Europe, German players have been marked by a conservative attitude to change, and a cautious approach to what their US and UK colleagues would now call the business model laws of the networked information economy. At some level this connects to a deep German cultural love affair with the book as an object, and how could that not be so in the land that produced Gutenberg? On another level, it demonstrates that German business needs an overwhelming business case to justify change, and that it takes time for such proofs to become available. Which is not to say that German businesses in this sector have not been inventive. An excellent two-part case study run jointly by Klopotek and de Gruyter was typical: de Gruyter are the most transformed player in the STM sector because they have seized upon distribution in the network and selling global access as a fast growth path, and Klopotek were able to supply the eCommerce and back-office capabilities needed to make this ambition feasible.

And above all, in a room of more than 300 newspaper, magazine and book executives, we were at last able to fully exploit the language and practice of the network in information handling terms. This dialogue would have been impossible in Germany five years ago. A huge attitudinal change has taken place. Now we can deploy our APIs and allow users to get at the value and richness of our content, contextualised to their needs, instead of covering them with the stuff and hoping they get something they want.

In some ways the Day 2 keynote from Andrew Jordan, CTO at Thomson Reuters’ GRU business, exemplified the extent of this. The incomparable Brian O’Leary had started us off on Day 1 in good guru-ish style by placing context in its proper role and reminding us that it is not content as such but its relationships that increasingly concern us. You could not listen to him and still believe that content was the living purpose of the industry, or that the word “publishing” had not changed meaning entirely. Michael Healy of CCC and Peter Clifton of +Strategy followed him to hammer home the new world of collaboration and licensing, and the increasing importance of metadata in identifying and describing tradeable entities. By then we were well on the way towards a recognition of new realities, and Jim Stock of MarkLogic ferried us there before dinner, using the connected content requirements of BBC Sport in an Olympic year to get us started in earnest on semantic approaches to discovery, and on our urgent need to create platform environments that allow us to use our content fluently in this context.

So the ground was well prepared for Andrew Jordan. He took us on a journey from the acquisition of ClearForest by Reuters (while Reuters itself was being acquired by Thomson), to the new company’s use of this software to create OpenCalais, allowing third parties (over 60 of them) to get into entity extraction (events and facts, essentially) and then into the creation of complex cross-referencing environments, and finally to the use of this technology by Thomson Reuters themselves in the OneCalais and ContentMarketplace environments. So here was living proof of the O’Leary thesis, on a vast scale: building business-orientated ontologies, employing social tagging in a business context, and drawing together the whole data assets of a huge player to service the next customer set or market gap. And no longer feeling obliged to wrap all of this in a single-instance database, but searching across separately-held corporate datasets in a federated manner, using metadata to find and cross-reference entities or perform disambiguation mapping. Daniel Mayer of Temis was able to drive this further, providing a wide range and scale of cases from a technology provider of note. The case was made: whether or not what we are now doing is publishing, it is fundamentally changed once we realize that what we know about what we know is as important as our underlying knowledge itself.
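To make the federated idea concrete, here is a deliberately toy sketch (not the actual OpenCalais API; every name, field and dataset below is invented for illustration) of how metadata can cross-reference the same real-world entity across two separately-held datasets, without merging them into a single-instance database:

```python
# Toy illustration of federated entity cross-referencing: two datasets stay
# separate; a metadata index built over canonicalised names joins them.

def normalise(name: str) -> str:
    """Crude canonical form for a company name (lowercase, strip suffixes)."""
    stop = {"inc", "inc.", "ltd", "ltd.", "corp", "corp.", "co", "co."}
    tokens = [t for t in name.lower().replace(",", "").split() if t not in stop]
    return " ".join(tokens)

# Two separately-held corpora with different schemas (invented records).
news_dataset = [{"id": "n1", "company": "Acme Corp."},
                {"id": "n2", "company": "Globex Ltd"}]
filings_dataset = [{"id": "f9", "registered_name": "ACME Corp"},
                   {"id": "f3", "registered_name": "Initech Inc."}]

# Build one lightweight metadata index over both; the source data never moves.
index = {}
for rec in news_dataset:
    index.setdefault(normalise(rec["company"]), []).append(("news", rec["id"]))
for rec in filings_dataset:
    index.setdefault(normalise(rec["registered_name"]), []).append(("filings", rec["id"]))

# Entities seen in more than one dataset are the cross-references we want.
matches = {k: v for k, v in index.items() if len(v) > 1}
print(matches)  # {'acme': [('news', 'n1'), ('filings', 'f9')]}
```

Real disambiguation of course uses far richer metadata (addresses, identifiers, ontology links) than a name-normalisation key, but the shape of the federated join is the same.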

And of course we also have to adjust our business models and our businesses to these new realities – patient Klopotek have been exercising expertise in enabling that systems re-orientation for many years. And we must recognize that we have not arrived somewhere, but that we are now in perpetual trajectory. One got a real sense of this from an excellent presentation to a very crowded room by Professor Tim Bruysten of richtwert on the impact of social media, and, in another way, from Mike Tamblyn of Kobo when he spoke of the problems of vertical integration in digital media markets. And, in a blog earlier this week, I have already reported on the very considerable impact of Bastiaan Deblieck of TenForce.

Speaking personally, I have never before attended a conference of this impact in Germany. Mix up everything in the cocktail shaker of Frank Gehry’s great Axica conference centre alongside the Brandenburg Gate, with traditional book publishers rubbing shoulders with major information players, and chatting to software gurus, industry savants, newspaper and magazine companies, enterprise software giants and business service providers, and you create a powerful brew in a small group. Put them through separate German and English streams, then mix them up in Executive Lounge seminars and discussion Summits, and the inventive organizers give everyone a chance to speak and to talk back. This meeting had real energy and, for those who look for it, an indication that the changes wrought by the networked economy and its needs in information/publishing terms now burn brightly in the heart of Europe.

Now that we are entering the post-competitive world (with a few exceptions!), it is worth pausing for a moment to consider how we are going to get all of the content together and create the sources of linked data which we shall need to fuel the service demand for data mining and data extraction. Of course, this is less of a problem if you are Thomson Reuters or Reed Elsevier. Many of the sources are relationships that you have had for a long time. Others can be acquired: reflect on the work put in by Complinet to source the regulatory framework for financial services prior to its acquisition by Thomson Reuters, and reflect that relatively little of this data is “owned” by the service provider. Then you can create expertise and scale in content sourcing, negotiating with government and agency sources, and forming third party partnerships (as Lexis Risk Management did with Experian in the US). But what if you lack these resources, find that source development and licensing would create unacceptable costs, and yet still feel under pressure to create solutions in your niche which reflect a very much wider data trawl than could be accomplished using your own proprietary content?

The answer to this will, perhaps, reflect developments already happening in the education sector. Services like Global Grid for Learning, or the TES Connect Resources which I have described in previous blogs, give users and third party service developers (typically teachers’ centres or other “new publishers”) the ability to find quality content and re-use it, while collaborations like Safari and CourseSmart allow customization of existing textbook products. So what sort of collaborations would we expect to find in B2B or professional publishing which would provide the quarries from which solutions could be mined? They are few and far between, but, with real appreciation for the knowledge of Bastiaan Deblieck at TenForce in Belgium, I can tell you that they are coming.

Let’s first of all consider Factual Inc. Here are impeccable credentials (Gil Elbaz, the founder, started Applied Semantics and worked at Google) and a VC-backed attempt to corner big datasets, apply linkage and develop APIs for individual applications. The target is the legion of mash-up developers and the technical departments of small and medium sized players. Here is what they say about their data:

“Our data includes comprehensive Global Places data, with over 60MM entities in 50 countries, as well as deep dives in verticals such as U.S. Restaurants and U.S. Healthcare Providers. We are continually improving and adding to our data; feel free to explore and sign up to get started!

Factual aggregates data from many sources including partners, user community, and the web, and applies a sophisticated machine-learning technology stack to:

  1. Extract both unstructured and structured data from millions of sources
  2. Clean, standardize, and canonicalize the data
  3. Merge, de-dupe, and map entities across multiple sources.

We encourage our partners to provide edits and contributions back to the data ecosystem as a form of currency to reduce the overall transaction costs via exchange.”
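Steps 2 and 3 of that pipeline can be sketched in a few lines. This is purely illustrative (the records, fields and matching key below are invented, and Factual’s actual machine-learning stack is far more sophisticated), but it shows why canonicalisation has to precede merging and de-duplication:

```python
# Sketch of clean/canonicalise (step 2), then merge and de-dupe (step 3).

def canonicalise(place: dict) -> dict:
    """Standardise the fields we match on: collapse whitespace and case,
    reduce phone numbers to digits only."""
    return {
        "name": " ".join(place["name"].lower().split()),
        "phone": "".join(ch for ch in place.get("phone", "") if ch.isdigit()),
    }

# Invented raw records from multiple sources describing the same places.
raw_records = [
    {"name": "Joe's  Diner", "phone": "+1 (555) 010-2000", "source": "web"},
    {"name": "JOE'S DINER",  "phone": "555 010 2000",      "source": "partner"},
    {"name": "Moe's Tavern", "phone": "555-010-3000",      "source": "web"},
]

merged = {}
for rec in raw_records:
    c = canonicalise(rec)
    key = (c["name"], c["phone"][-7:])  # last 7 digits: a crude phone match
    # Merge: keep one canonical entity, accumulate the sources reporting it.
    merged.setdefault(key, {"record": c, "sources": []})["sources"].append(rec["source"])

print(len(merged))  # 2 distinct entities survive from 3 raw records
```

The value sits in the keys: once every source agrees on a canonical form, de-duplication is just a dictionary lookup rather than a fuzzy search.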

As mobile devices proliferate, this quarry is for the App trade, and here, in the opinion of Forbes (19 April 2012), is another potential Google in the field of business intelligence.

But Los Angeles is not the only place where this thinking is maturing. Over in Iceland, now that the banking has gone, they are getting serious about data. DataMarket, led by Hjalmar Gislason from a background of startups and of developing new media for the telco in Iceland, offers a very competitive deal, also replete with API services and revenue sharing with re-users. Here is what they say about their data:

“DataMarket’s unique data portal provides access to thousands of data sets holding hundreds of millions of facts and figures from a wide range of public and private data providers including the United Nations, the World Bank, Eurostat and the Economist Intelligence Unit. The portal allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.

DataMarket’s data publishing solutions allow data providers to easily publish their data both through DataMarket’s portal and on their existing websites through embedded content and branded versions of DataMarket’s systems, enabling all of the portal’s functionality on top of their own data collections.”

And finally, in Europe we seem to take a more public interest-type view of the issues. A certain amount of impetus seems to have come from the Open Data Foundation, a not-for-profit which has helped to stimulate sites like OpenCharities, OpenSpending (how does your government spend your money?), and OpenlyLocal, designed to illuminate the dark corners of UK local and regional government. All of these sites offer free data, available under a Creative Commons-style licence, but perhaps the most interesting, still in beta, is OpenCorporates. Claiming to have data on 42,165,863 companies (as of today) from 52 different jurisdictions, it is owned by Chrinon Ltd and run by Chris Taggart and Rob McKinnon, both of whom have long records in the open data field. This will be another site where the API service (as well as a Google Refine service) will earn the value-add revenues. Much of the data is in XML, and this could form a vital source for user- and publisher-generated value-add services. The site bears a recommendation from the EC Information Society Commissioner, Neelie Kroes, so we should also record that TenForce themselves are leading players in the creation of the Commission’s major Open Data Portal, which will progressively turn all that “grey literature”, the dandruff of bureaucracy, back into applicable information held as data.
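For a publisher weighing up such a quarry, the integration work is modest. The sketch below shows the shape of a call to a companies API such as OpenCorporates’: the endpoint path, version number and response layout are assumptions based on its public documentation at the time of writing and may have changed, and the sample response uses invented values.

```python
# Hedged sketch: building a company-search request and extracting entities
# from an illustrative (invented) response of the kind such APIs return.
import json
from urllib.parse import urlencode

BASE = "https://api.opencorporates.com/v0.4"  # assumed version prefix

def search_url(query, jurisdiction=None):
    """Compose a companies-search URL; jurisdiction narrows by register."""
    params = {"q": query}
    if jurisdiction:
        params["jurisdiction_code"] = jurisdiction
    return f"{BASE}/companies/search?{urlencode(params)}"

url = search_url("klopotek", "de")
print(url)

# Illustrative response shape (invented values), and how a value-add service
# might pull out the tradeable entities:
sample = json.loads("""{"results": {"companies": [
    {"company": {"name": "EXAMPLE GMBH", "company_number": "HRB 00000",
                 "jurisdiction_code": "de"}}]}}""")
names = [c["company"]["name"] for c in sample["results"]["companies"]]
print(names)  # ['EXAMPLE GMBH']
```

The point is less the three lines of parsing than the licence behind them: the raw register data stays open, and the revenue sits in exactly this kind of programmatic access.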

We seem here to be at the start of a new movement, with a new range of intermediaries coming into existence to broker our content to third parties, and to enable us to get the licences and services we need to complete our own service developments. Of course, today we are describing start-ups: tomorrow we shall be wondering how we provided services and solutions without them.

