We live in fevered times. What happens at the top cascades. This must be the explanation for why revered colleagues like Richard Poynder and Kent Anderson are conducting Mueller-style enquiries into OA (Open Access). And they do make a splendidly contrasting pair of prosecutors, like some Alice in Wonderland trial where off-with-his-head is a paragraph summary, not a judgement. Richard (https://poynder.blogspot.com/2018/11/the-oa-interviews-frances-pinter.html) wants to get for-profit out of OA, presumably not planning to be around when the foundation money dries up and new technology investment is needed. Kent vigorously defends the right of academic authors to make money from their work for people other than themselves, and is busy, in the wonderful Geyser journal (thegeyser.substack.com), sniffing the dustbins of Zurich to find “collusion” between the Swiss Frontiers and the EU. Take a dash of Brexit, add some Trumpian bitters and the zest of rumour, shake well and pour into a scholarly-communications-sized glass. Perfect cocktail for the long winter nights. We should be grateful to them both.

But perhaps we should not be too distracted. For me, the month since I last blogged on Plan S and got a full postbag of polite dissension has been one of penitent reflection on the state of our new data-driven information marketplace as a whole. In the midst of this, Wellcome announced its Data Re-Use prize (https://wellcome.ac.uk/news/new-wellcome-data-re-use-prizes-help-unlock-value-research?utm_source=linkedin&utm_medium=o-wellcome&utm_campaign=), which seems to me to exemplify much of the problem. Our recognition of data has not properly moved on from our content years. The opportunities to merge, overlap, drill down through, and mine together related data sets are huge. The ability to create new knowledge as a result has profound implications. But we are still on the nursery slopes when it comes to making real inroads into the issues, and while data and text mining techniques are evolving at speed, the licensing of access and the ownership of outcomes still pose real problems. We will not be a data-driven society until sector data sources have agreed protocols on these issues. Too much data behind paywalls creates ongoing issues for owners as well as users. Unexploited data is valueless.

It’s not as if we have collected all the data in the marketplace anyway. At this year’s NOAH conference in London at the beginning of the month, I watched a trio of start-ups in the HR space present, and then realised that they were all using the same data, collected differently. There has to be an easier way of pooling data in our society: ensuring privacy protection, but also aligning clean resources for re-use with different analytics and market targets to create different service entities. Let’s hope the Wellcome thinking is pervasive. But then my NOAH attention went elsewhere, as I found myself in a fascinating conversation about a project which is re-utilising a line of content as data that has been gratuitously ignored – and that in scholarly communication, one of the best-ploughed fields on the data farm.

Morressier, co-founded in Berlin by Sami Benchekroun, with whom I had the conversation, is a startling example of the cross-over utility of neglected data. With Justus Weweler, Sami has concerned himself with the indicative data you would need to evaluate progress in early-stage science. Posters, conference agendas, seminar announcements, links to slide sets – Morressier is exploring the hinterland of emerging science, enabling researchers and funders to gauge how advanced work programmes are and how they can map the emerging terrain in which they work. Just when we imagined that every centimetre of the scholarly communication workflow had been fully covered, here comes a further chapter, full of real promise, whose angels include four of the smartest minds in scholarly information. Morressier.com is clearly one to watch.

And one to give us heart. There really are no sectors where data has been so eked out that no further possibilities remain, especially of adding value through recombination with other data. In fact, in my daily rounds, I usually find that the opposite is true. Marketing feedback data is still often held aloof from service data, and few can get an object-based view of how data is being consumed. And if this is true at the micro level in terms of feedback, events companies have been particularly profligate with data collection, assessment and re-use. And while this is changing, it still does not have the priority it needs. Calling user data “exhaust” does not help: we need a catalytic converter to make it effective when used with other data in a different context.

When we have all the data and we are re-combining it effectively, we shall begin to see the real problems emerge. And they will not be the access and re-use issues of today, but the quality, disambiguation and “fake” data problems we are all beginning to experience now and which will not go away. Industry co-operation will be even more needed, and some players will have to build a business model around quality control. The arrival of the data-driven marketplace is not a press release, but a complex and difficult birth process.
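To make the disambiguation point concrete, here is a minimal sketch, in Python, of the problem as it appears the moment two data sets are recombined: the same entity arrives under different strings and must be matched before any value can be added. The sample records, suffix list and threshold are invented for illustration; production pipelines lean on shared identifiers and far richer matching.

```python
# Minimal sketch of the disambiguation problem: the "same" entity arrives
# from two sources under different strings and must be matched before the
# data sets can be recombined. Sample records are invented for illustration.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lower-case, drop punctuation and common corporate suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    suffixes = {"ltd", "plc", "inc", "gmbh", "group"}
    return " ".join(w for w in cleaned.split() if w not in suffixes)

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; real pipelines also use identifiers."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

source_a = ["Blue Prism Group PLC", "Automation Anywhere Inc.", "UiPath"]
source_b = ["Blue Prism", "Automation Anywhere", "Ui-Path Ltd"]

# Naive all-pairs matching; above a threshold, treat the records as one entity.
THRESHOLD = 0.8
for a in source_a:
    best = max(source_b, key=lambda b: similarity(a, b))
    score = similarity(a, best)
    verdict = "same entity" if score >= THRESHOLD else "unresolved"
    print(f"{a!r} -> {best!r} (score {score:.2f}, {verdict})")
```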

RPA – Robotic Process Automation. The new target of the Golden Swarm of software VC investors. Sometimes misleadingly known, in more refined versions, as IPA (Intelligent Process Automation, not warm English beer).

In my view the central strategic question for anyone who owns or collects and manages news and information, educational and professional content, or prices and market data relating to business verticals and commodities, is now simply this: when I license data to process automation, what is my expectation of the life of that annuity revenue stream, and how fast do my user connections and market-requirement sensitivity decay? Over the past five years we have seen an industry predicated on the automation of mundane clerical work take huge strides into high-value workflows. Any doubt around this thought can be dispelled by looking at the speed of advance of automated contract construction in the legal services market. The ability to create systems that assemble precedents, check due diligence, create drafts and amend them is as impressive as it is widespread. The fact that many law firms charge as much for signing off on the results as they did for the original work says more for their margins than it does for the process. But that message is the clearest of all: process-automation software is expensive, but eventually does wonders for your margins in a world where revenue growth is hard to come by for many.
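For the avoidance of mystique, the core move in contract assembly is simple enough to sketch. The Python toy below, with an invented clause library and deal terms, shows the pattern: pick precedent clauses, fill in deal-specific terms, emit a draft for human sign-off. Real systems add precedent retrieval, due-diligence checks and redlining on top.

```python
# Toy illustration of clause assembly, the core move in automated contract
# drafting: pick precedent clauses, fill in deal-specific terms, emit a draft.
# The clause library and deal terms here are invented for illustration.
from string import Template

CLAUSE_LIBRARY = {
    "parties": Template("This Agreement is made between $supplier and $customer."),
    "term": Template("The Agreement runs for $months months from $start_date."),
    "governing_law": Template("This Agreement is governed by the laws of $jurisdiction."),
}

def assemble_draft(clause_ids, deal_terms):
    """Assemble a first draft from precedent clauses; a human still signs off."""
    return "\n\n".join(CLAUSE_LIBRARY[c].substitute(deal_terms) for c in clause_ids)

draft = assemble_draft(
    ["parties", "term", "governing_law"],
    {
        "supplier": "Acme Data Ltd",
        "customer": "Example Analytics Inc",
        "months": "24",
        "start_date": "1 January 2019",
        "jurisdiction": "England and Wales",
    },
)
print(draft)
```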

And, at least initially, RPA systems are greedy eaters of content. Some early players, like Aravo Solutions, became important middlemen for information companies like Thomson Reuters and Wolters Kluwer in creating custom automation for governance, risk and compliance systems. Their successors, productising the workflow market, have been equally enthusiastic about licensing premium content, but unlike their custom predecessors they have found that, while they enjoy the branded value of the starter content, it becomes less important over time. If the solution works effectively and reduces headcount, that seems to be enough. And over time, systems can become self-sufficient in terms of content, often updating information online or finding open data solutions to diminish licensing costs.

The ten companies in this sector (which included Century Tech as an example of learning as a workflow) that I started to follow three years ago have matured rapidly. Three have become clear market leaders in the past six months. Automation Anywhere and UiPath in the US, together with Blue Prism in Europe, have begun, from admittedly low starting points, to clock up 100-500%+ annualised revenue growth rates. But a note of caution is needed, and was importantly provided by Dan McCrum writing in the FT on 13 September (https://ftalphaville.ft.com/2018/09/13/1536811200000/The-improbably-profitable–loss-making-Blue-Prism/). He demonstrated that, by writing all of its sales costs (mostly incurred through third parties) to fixed administration costs, Blue Prism was able to claim close to 100% EBITDA and score a £1.7 billion valuation on the London AIM market, while revenues were £38m and losses are still building. UiPath (revenues $100m, revenue growth 500%, valuation $1bn) and Automation Anywhere (valuation $1.8bn) follow a similar trajectory.
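The arithmetic behind that headline number is worth spelling out. The sketch below uses the £38m revenue figure above with a purely invented cost split, to show how reclassifying sales costs as fixed administration can push a claimed margin toward 100% while the underlying loss is unchanged.

```python
# Sketch of the classification effect McCrum pointed to, using the £38m
# revenue figure above; the cost split is invented purely for illustration.
revenue = 38.0          # £m, reported revenue
sales_costs = 30.0      # £m, hypothetical third-party sales costs
other_costs = 20.0      # £m, hypothetical everything else

# Treat sales costs as cost of sales: the gross margin looks ordinary.
gross_margin_strict = (revenue - sales_costs) / revenue

# Reclassify the same costs as fixed administration: no cost of sales
# remains, so the headline margin climbs toward 100%.
gross_margin_reclassified = revenue / revenue

# The underlying operating result is identical either way.
operating_loss = revenue - sales_costs - other_costs

print(f"strict gross margin:       {gross_margin_strict:.0%}")
print(f"reclassified gross margin: {gross_margin_reclassified:.0%}")
print(f"operating result (£m):     {operating_loss:+.1f}")
```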

All content markets are looking at a future where machines use more content than people. This makes it more important than ever that information is sourced in ways that can be verified, audited, validated and scored. This is not just an “alternative facts” or “fake news” issue – it is about trust in the probity of infrastructures we will have to rely upon. Content owners need to be able to sell trust with content to stay in place in the machine age, at least until we know where the trusted machines are kept. In the meanwhile it will be interesting to see which information, data and analytics companies acquire one of these new software players, or which of these new high-value players uses the leverage of that valuation to move on a branded and trusted information source.
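What “verified and audited” might mean mechanically is easy to illustrate. A minimal sketch, assuming nothing more than a hash binding a payload to its recorded source and timestamp; the field names and sample feed are invented, and real provenance schemes add cryptographic signatures and scoring on top.

```python
# Minimal sketch of a verifiable content record: a hash binds the payload
# to its recorded source and timestamp so downstream systems can audit it.
# Field names and the sample feed are invented for illustration.
import hashlib, json
from datetime import datetime, timezone

def _digest(record: dict) -> str:
    """Hash the provenance-relevant fields in a canonical order."""
    core = {k: record[k] for k in ("source", "retrieved_at", "payload")}
    return hashlib.sha256(json.dumps(core, sort_keys=True).encode()).hexdigest()

def make_record(payload: str, source: str) -> dict:
    record = {
        "source": source,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    record["digest"] = _digest(record)
    return record

def verify(record: dict) -> bool:
    """Recompute the digest; any tampering with payload or provenance fails."""
    return record["digest"] == _digest(record)

rec = make_record("FTSE 100 closed at 7,000", source="example-feed")
assert verify(rec)               # untouched record passes the audit
rec["payload"] = "FTSE 100 closed at 9,000"
assert not verify(rec)           # altered content is detectable
```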
