Show me your provenance!

When I was forced to temporarily cease blogging a few years ago (see personal note below), AI was a fact of life. Every year we saw improvements in the use of increasingly sophisticated algorithms. We noted the rise and rise of robotic process automation. Those of us with two decades of industrial memories recalled expert systems and neural networks. Those of us with four decades remembered hearing Marvin Minsky at MIT, telling us that he wanted the books in our libraries to speak to each other, to exchange and update knowledge and to build new knowledge out of that exchange. Yet nothing here prepared us for 2023.

When historians get last year into some perspective, they will probably conclude that what happened owed as much to the content creation requirements of online advertising, or the financial services requirement for a new wave of Silicon Valley investment frenzy, as it did to a breakthrough in AI capabilities. Yet what actually happened last year, even without such a perspective, is truly amazing. The year installed AI as a key component in any strategic planning exercise in almost any commercial activity. Hyper-investment, and the hyperactivity resulting from it, produced tools in generative AI which, a mere year later, had become immensely more powerful. Compare ChatGPT 3 to the current iteration of Gemini: a context window of 122,000 tokens to one of 1 million. Then look at the public recognition factor, and you find a world in which there is now a normal expectation that machine intelligence and machine interoperability will be a part of everyone’s everyday life. It is as if a switch has been flicked on, illuminating a new room into which we have walked for the first time. All of us know that we can never now go back through that door or switch off that light. Pandora’s Law.

And we should not want to go back, either. What has happened should simply remind us that change does not happen evenly, and that the realisation of change sometimes takes longer to happen than we anticipate. But in 2023 I detected something else as well: a fear of change that was a little beyond normal anxiety. In the world in which I have worked for over 50 years, the idea that content creation through the exercise of machine intelligence could be more threatening than beneficial gained a powerful currency and soon turned into dystopian editorials in both trade and consumer media. As a result, we have come out of 2023, the year of AI megahype, with both an enhanced view of the speed and power with which machine intelligence will help, support, and change our society, and a hysterical fear of evils unknown which may result from quantum computers secretly plotting our downfall on the network. Since the invention of the wheel, mankind has been learning to accommodate and live with the machine, and we shall surely do so in the world of AI as well. Yet, in the clan to which I belong, the data, services and solutions vendors who called themselves content companies and information providers a few years back (and before that used to describe themselves in Gutenberg terms as publishers), there has been fear of a different sort. Whether it meant anything or not, they have always embraced the consolation of copyright, the belief that intellectual property can be described and identified and protected, as one of the bulwarks of their commercial viability. The idea that individual creativity could be mirrored by machine intelligence, or that the machine might regurgitate, in whole or in part, content acquired as part of training data, or that the value of content or data once described as “proprietary” could be lost in the machine intelligence age: these ideas are the very stuff of panic.
Then add to them the knowledge that machine intelligence can produce “hallucinations”, that some related answers may not always be accurate and correct, and the long-held belief that machines loaded with garbage do indeed produce rubbish, and we find integrity fears added alongside fear of theft and diminishing valuations.

One of my mentors of many years ago, recommending me to a potential client, commented that “while generally sound on strategy he can be unreliable on copyright”. I have over the years tried to be better behaved, but it is difficult because it takes so long to bring the heavy guns of copyright law to bear on problems that have usually departed long before adequate legislation is available to control them. Early regulation on AI, like the EU AI Act, seems in any case more bent on risk control than anything else.

While the copyright lawyers are anxiously seeking reregulation for a machine age, I for one would take the arguments much more seriously if copyright holders paid real attention to marking their works with appropriate metadata and persistent identifiers (PIDs) that indicated ownership and provenance. It is hard to imagine machine-interoperable checking on the copyright status of works if those same works are not identified in ways that machines can recognise and understand. Once they are, it becomes more possible to put pressure on AI developers to ensure that they licensed the genuine article, recognised the credentials of the real thing publicly, and increased the integrity of their solutions by showing users that only the real thing was used in the construction of the outcomes desired. This is beginning to happen in some encouraging ways: the fact that both Google and OpenAI now accept C2PA, the coding system developed for images and videos, shows what can be done by persuading people that being licit and responsible is good for business. Rather than have “fake” hung round their necks, it is better to say that you will check and code every image that you use, especially in an American election year. In text and data there are similar emerging conventions. The ISCC (International Standard Content Code) is now a draft ISO standard. The long-established FAIR principles, promoted by the GO FAIR Foundation, create metadata standards that render data “findable, accessible, interoperable and reusable”. Data and content owners who make it clear to interested parties and machines what the scope and ownership of their asset entails have a much better chance of working successfully with it in this new world. And in particular, they have a better chance of entering into proper and satisfactory licensing agreements around it.
If we are able to persuade the machine intelligence world that integrity is vital to business success, then we have a far better chance of creating the sort of licensing environments that pioneers like the Copyright Clearance Center have advocated and piloted for years. Businesses in the network have to make for themselves the business conditions that work in the network.
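
To make the point about machine-readable marking a little more concrete, here is a minimal sketch in Python of the kind of check a machine could run before a work is admitted to a training set or a licence negotiation. Everything in it is an assumption made for illustration: the `WorkRecord` structure, its field names and the sample identifiers are hypothetical, and the sketch does not implement C2PA, the ISCC or any FAIR tooling. It simply shows that a work with no identifier, rights holder, licence or source recorded cannot be checked by a machine at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkRecord:
    """Hypothetical metadata record for one work (field names are illustrative)."""
    title: str
    persistent_id: Optional[str] = None   # e.g. a DOI or other PID
    rights_holder: Optional[str] = None   # who owns the work
    licence: Optional[str] = None         # terms under which it may be used
    source: Optional[str] = None          # where the copy was obtained


# The fields a machine needs before it can say anything about provenance.
REQUIRED_FIELDS = ("persistent_id", "rights_holder", "licence", "source")


def missing_provenance(record: WorkRecord) -> list[str]:
    """Return the names of the required fields that are empty for this record."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


def audit(records: list[WorkRecord]) -> dict[str, list[str]]:
    """Map each incompletely marked work's title to the fields it lacks."""
    report = {}
    for record in records:
        gaps = missing_provenance(record)
        if gaps:
            report[record.title] = gaps
    return report


# Made-up sample data: one fully marked work, two with gaps.
works = [
    WorkRecord("Annual report", "doi:10.0000/example-1", "Example Press",
               "CC-BY-4.0", "publisher feed"),
    WorkRecord("Scraped essay"),
    WorkRecord("Dataset A", persistent_id="hdl:0000/example-2",
               rights_holder="Example Lab"),
]

report = audit(works)
for title, gaps in report.items():
    print(f"{title}: missing {', '.join(gaps)}")
```

The fully marked work passes silently; the other two are flagged with exactly what is missing, which is the conversation a rights holder and a licensee would then need to have.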

So who will police and patrol all of this until law and regulation finally catch up, if they ever do? The publisher and copyright lawyer Charles Clark, my fellow delegate to the European Commission Legal Information Observatory, invented the maxim “the answer to the machine lies in the machine”. It was never better applied than at this point. If you want to find bias in machine intelligence, then the simplest way to do so is programmatically. If you wish to know whether training data has been derived from legitimate known sources that will vouch for accuracy and currency, ask the machine to interrogate the machine. For the AI companies, the price of reputation may be breaking open the black box and demonstrating good practice in creating answers from the very best inputs.

PERSONAL NOTE: I maintained this blog continuously from 2009 to 2021. I suffered eyesight problems which have left me with some 40% of my vision. My road back to this form of communication has taken three years, during which I’ve had the huge pleasure of writing two books, drafting a third and eventually returning to blogging. Writing in the world of text-to-speech and speech-to-text software is different. As I say at the end of all of my communications at work: “If you find errors of syntax, grammar or spelling in what I’ve written, please remember that it is much harder for me to edit than ever before, so try to smile indulgently. On the other hand, if you think that I have written utter gibberish, please contact me immediately!”

Mar 9
