RPA – Robotic Process Automation – is the new target of the Golden Swarm of software VC investors. It is sometimes misleadingly known, in more refined versions, as IPA (Intelligent Process Automation, not warm English beer). 

In my view the central strategic question for anyone who owns, or collects and manages, news and information, educational and professional content, or prices and market data relating to business verticals and commodities is now simply this: when I license data to process automation, what is my expectation of the life of that annuity revenue stream, and how fast will my user connections and market-requirement sensitivity decay? Over the past five years we have seen an industry predicated on the automation of mundane clerical work take huge strides into high-value workflows. Any doubt around this thought can be clarified by looking at the speed of advance of automated contract construction in the legal services market. The ability to create systems that assemble precedents, check due diligence, create drafts and amend them is as impressive as it is widespread. The fact that many law firms charge as much for signing off on the results as they did for the original work says more for their margins than it does for the process. But that message is the clearest of all: automating process software is expensive, but eventually does wonders for your margins in a world where revenue growth is hard to come by for many. 

And, at least initially, RPA systems are greedy eaters of content. Some early players, like Aravo Solutions, became important middlemen for information companies like Thomson Reuters and Wolters Kluwer in creating custom automation for governance, risk and compliance systems. Their successors, productising the workflow market, have been equally enthusiastic about licensing premium content; like their custom predecessors they have enjoyed the branded value of that starter content, but they have also found that it matters less over time. If the solution works effectively and reduces headcount, that seems to be enough. And over time, systems can become self-sufficient in terms of content, often updating information online or finding open data solutions to diminish licensing costs. 

The ten companies in this sector (which included Century Tech as an example of learning as a workflow) that I started to follow three years ago have matured rapidly. Three have become clear market leaders in the past six months. Automation Anywhere and UiPath in the US, together with Blue Prism in Europe, have begun, from admittedly low starting points, to clock up 100-500%+ annualised revenue growth rates. But a note of caution is needed, and was importantly provided by Dan McCrum writing in the FT on 13 September (https://ftalphaville.ft.com/2018/09/13/1536811200000/The-improbably-profitable–loss-making-Blue-Prism/). He demonstrated that by allocating all of its sales costs (mostly incurred through third parties) to fixed administration costs, Blue Prism was able to claim close to 100% EBITDA and score a £1.7 billion valuation on the London AIM market while revenues were £38m and losses were still building. UiPath (revenues $100m, revenue growth 500%, valuation $1bn) and Automation Anywhere (valuation $1.8bn) follow a similar trajectory. 

All content markets are looking at a future where machines use more content than people. This makes it more important than ever that information is sourced in ways that can be verified, audited, validated and scored. This is not just an “alternative facts” or “fake news” issue – it is about trust in the probity of infrastructures we will have to rely upon. Content owners need to be able to sell trust with content to stay in place in the machine age, at least until we know where the trusted machines are kept. In the meantime it will be interesting to see which information, data and analytics companies acquire one of these new software players, or which of these new high-value players uses the leverage of that valuation to move on a branded and trusted information source.

Dear reader, I am aware that I have been a poor correspondent in recent weeks, but in truth I have been doing something I should have done long ago: gaining some experience of AI companies, talking to their potential customers and reading a book. Let's start at the end and work backwards. 

The book that has eaten the last week of my life is Edward Wilson-Lee’s fine new publication, The Catalogue of Shipwrecked Books, which describes the eventful life of Christopher Columbus’ illegitimate son, Hernando, and his attempts to build a universal library of human knowledge. Hernando collected printed works, including pamphlets and short works, in an age when many scholars still regarded all print as meretricious rubbish. He built a catalogue of his collection, and then realised that he could not search it effectively unless he knew what was in the books, so he started compiling summaries – epitomes – and then subject indexing, as well as inventing hieroglyphs to describe the physical properties. In other words, in the 1520s in Seville he built an elaborate metadata environment, but was eventually defeated by the avalanche of new books pouring out of the presses of Venice and Nuremberg and Paris. Wilson-Lee very properly draws many parallels with the early days of the Internet and the Web. 

As I closed this wonderful book, my mind went back to an MIT Media Lab talk in 1985 given by Marvin Minsky. We need reminding how long the central ideas of AI have been with us. At the end of his talk, the Father of AI kindly took questions, and a tame librarian in the front row asked “Professor, if you were looking back from some inconceivably distant date, like, say, 2020, what would surprise you that you have in 2020 but which we do not have now?”. After a thoughtful moment, the great man replied “Well, I guess that I would praise your wonderful libraries, but still be surprised that none of the books spoke to each other”. At that he left the room, but from then on the idea of books interrogating books, updating each other and creating fresh metadata and then fresh knowledge in the process of interaction has been part of my own Turing test. So I find it easy to say that we do not have much AI in what we call the information industry. We have a meaningless PR AI, a sort of magic dust we sprinkle liberally (AI-enhanced, AI-driven, AI-enabled etc), but few things pass the “books speaking to books and realising things not known before” test.

And yet we can and we will. The key questions are, however: will current knowledge ownership permit this without a struggle, and will there be a dispute over the ownership of the results of these interactions? This battle is already shaping up in academic and commercial research, so it was dispiriting to find, when talking to AI companies, that there is really no business model in place yet enabling co-operation. Partly this is a problem of perception. Owners and publishers see the AI players as technicians adding another tier of value under contract – and then going away again. The AI software developers see themselves as partners, developing an entirely new generation of knowledge engine. And neither of them will really get anywhere until we all begin to accept the implications of the fact that no one, not even Elsevier, has enough stuff in one place to make it work at scale. And while one can imagine real AI in broad niches – Life Sciences, say – the same still applies. And if we try it in narrow niches, how do we know that we have fully covered the crossovers into other disciplines which have been so illuminating for researchers in this generation? In our agriscience intelligent system, how much do we include on food packaging, or consumer market research, or plant diseases, or pricing data? 

So what happens next? In the short term it is easy to envisage branded AI – Elsevier AI, Springer Nature AI? I am not sure where this gets us. In the medium term I certainly hope to see some data-sharing efforts to invest in AI partnerships and licence data across the face of the industry. It is true that there are some players – Clarivate Analytics, for example, and in some ways Digital Science – who are neutral to the knowledge production cycle and have hugely valuable metadata collections. They could be a vital building block in joint ventures with AI players, but their coverage is still narrow, and in the course of the last month I even heard a publisher say “I don’t know why we let Clarivate use our data – we don’t get anything for it!”. 

Of course, unless we share our data we are not going to get anywhere. And given the EU Parliament's rejection of data metering and enhanced copyright protection last week, all these markets are wide open for massive external problem solving – who remembers Google Scholar? The solution is clear – we need a collaborative model for data licensing and joint ownership of AI initiatives. We have to ensure that data software entrepreneurs get a payback and that investment and data licensing show proper returns, just as Hernando rewarded the booksellers who collected his volumes all across Europe. In a networked world collaboration is often said to be the natural way of working. It is probably the only way that AI can be fully implemented by the scholarly communications world. Hernando died knowing his great scheme had failed. AI will succeed if it shows real benefits to research and those who fund it. As it succeeds it will find other ways of sourcing knowledge, unless those who commercially control access today can find a way of leading the charge rather than being dragged along in its wake. 
