Who benefits is never a bad question to ask. In my mind, after long years in the information industry, it is a question closely related to “follow the money”. And it is much on my mind at the moment, since I have been reading the UK Information Commissioner’s consultation (https://ico.org.uk/about-the-ico/what-we-do/our-work-on-artificial-intelligence/generative-ai-second-call-for-evidence/) on the use of personal data in AI training sets and research data. The narrative surrounding the consultation invokes, for me, all sorts of ideas about the nature of trust.

Let me try to explain my ideas about trust, since I think the subject is becoming so controversial that each of us needs to state a position before we begin a discussion. For example, I trust the brand of marmalade to which I am fairly addicted. My father was an advocate of Frank Cooper’s Oxford marmalade, and this is probably the only respect in which I have followed him. We certainly have over 100 years of male Worlock usage of this brand of marmalade. Furthermore, in modern times, the ingredients are listed on the jar, together with any chemical additives. Should I suffer a serious medical condition as a result of my marmalade addiction, I can clearly follow the trail and find where it was made and the provenance of its ingredients. And in the 60 or so years that I have been enjoying it, it has not varied significantly in flavour, taste or ingredients.

I also believe, being a suspicious countryman, in something that I call “the law of opposites”. Therefore, when people say that they “do no evil” or claim that they practise “effective altruism”, I wonder why they need to tell me this. My bias then becomes the reverse of their intentions: I tend to think that they are telling me they are good because they are trying to disguise the fact that they are not. This becomes important as we move from what I would term an open trust society – exemplified by the marmalade – into a blind trust society – exemplified by “black box” technology, which, we are told, is what it is, and cannot be tracked, audited or regulated in any of the normal ways.

The UK Information Commissioner has similar problems to mine, but naturally at a greater level of intellectual intensity. In its latest consultation document, his office asks whether personal data can be used in a context without purpose. Under data privacy rules, the use of personal data, where permitted, has to be accompanied by a defined purpose. Whether the data is used to detect shifts in consumer attitudes or to demonstrate the efficacy of a drug therapy, the data use is defined by its purpose. General models of generative AI, with no stated or specific purpose, violate current data protection regulation if they use personal data in any form, and this should set us wondering about the outcomes, and the way in which they should earn our trust.

The psychologist Daniel Kahneman, who died this week, earned his Nobel prize in economics for his work on decision-making behaviours. His demonstration that decisions are seldom made on a purely rational basis, but are usually derived from preferences based on bias and experience (whether relevant or not), should be ever present in our minds when we think about the outputs of generative AI. Our route to trusting those outcomes should begin with questions like: what is the provenance of the data used in the training sets? Do I trust that data and its sources? Can I, if necessary, audit the bias inherent in that data? How can I understand or apply the output from the process if I do not understand the extent and representativeness of the inputs?

I sense that there will be great resistance to answering questions like this. In time there will be regulation. I think it is a good idea now for data suppliers and providers to annotate their data with metadata which demonstrates provenance, and provides a clear record of how it has been edited and utilised, as well as what detectable bias was inherent in its collection. One day, I anticipate, we shall have AI environments that are capable of detecting bias in generative AI environments, but until then we have to build trust in any way that we can. And where we cannot build trust we cannot have trust, and the lack of it will be the key factor in slowing the adoption of technologies that may one day even surpass the claims of the current flood of press releases about them. Meanwhile, cui bono? Mostly, it seems to me, Google, Microsoft, OpenAI and Meta. Are they ethically motivated or are they in it for the money? For myself, I need them to demonstrate clearly, in self-regulation, that they are as trustworthy as Frank Cooper’s Oxford marmalade.
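To make the annotation idea concrete, here is a minimal sketch of what such a provenance record might look like in practice. Every field name and value below is my own invention for illustration, not any existing standard:

```python
# A minimal, hypothetical provenance record for a dataset.
# Field names are illustrative only, not drawn from any real standard.
provenance = {
    "source": "Example Consumer Survey 2023",            # where the data came from
    "collected_by": "Example Research Ltd",              # who gathered it
    "collection_method": "online panel, self-selected",  # a known source of bias
    "stated_purpose": "consumer attitude tracking",      # purpose at collection time
    "edits": [
        {"date": "2023-06-01", "action": "removed incomplete responses"},
        {"date": "2023-07-15", "action": "anonymised personal identifiers"},
    ],
    "known_biases": ["self-selection", "English-language respondents only"],
}

def audit_summary(record):
    """Return a one-line summary an auditor could scan quickly."""
    return (f"{record['source']} | purpose: {record['stated_purpose']} | "
            f"edits: {len(record['edits'])} | "
            f"known biases: {', '.join(record['known_biases'])}")

print(audit_summary(provenance))
```

The point is not the particular fields but that provenance, editing history and known bias travel with the data, so that the questions posed above can actually be answered.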

In the history of software, as I have suffered it over the past 45 years, the most time-wasting difficulty has been the false dawn syndrome. My first CTO, Norman Nunn Price, was a grizzled Welshman with an unquenchable enthusiasm for the ability of software to solve all problems. As a young man, he had worked on radar in submarines in the Second World War. When, as his more youthful CEO, I sometimes questioned his predictions, the reply often included “look, we won the bloody war using this stuff, didn’t we?”. But as the years passed by in our development of a start-up in legal information retrieval, we began to notice that when Norman and his team announced that the job was done, or the fix was in place, or the application was ready and the assignment was completed, we were actually at the beginning of another work phase, and not at a point of implementation. Once, in frustration, I pointed out forcibly to Norman that, despite his optimistic announcement that he had once more brought us successfully to a moon landing, it appeared that I was still 50 feet above the surface with no available mechanism to get me down there. It became a company saying.

I find myself using it regularly as I listen to the way in which data and analytics companies are learning to live with AI. I cannot fault the ambition. It is clear that many service providers are framing solutions that are going to provide really dramatic advances in value to the widest possible range of societal requirements. But once the service design and the value added have been determined, we come to that familiar place which the software engineers describe in terms of ETL – the whole business of extracting, transforming and loading data. It is here that we discover that our data is not like other data. It is too structured, or not structured at all. It has been marked up in a way that makes it difficult to transform, or it has not been marked up at all, which makes it difficult to transform. It either lacks metadata to guide the process, or has too much metadata, or nobody can understand and use the metadata. So we must pause and create a solution.
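A toy illustration of why the transform step stalls: three records of nominally the same kind of content, each marked up differently, so each needs its own hand-written rule. The records, tags and parsing rules below are entirely invented for the sake of the example:

```python
# Three records carrying the same information in three different shapes:
# structured, marked up in a house style, and not marked up at all.
records = [
    {"title": "Case A", "date": "2021-03-04"},              # already structured
    "<doc><h1>Case B</h1><when>4 March 2021</when></doc>",  # bespoke markup
    "Case C, decided 04/03/2021",                           # plain text
]

def transform(record):
    """Normalise one record to {"title", "date"} - each shape needs its own rule."""
    if isinstance(record, dict):                    # structured: pass through
        return record
    if record.startswith("<doc>"):                  # crude parse of the markup
        title = record.split("<h1>")[1].split("</h1>")[0]
        date = record.split("<when>")[1].split("</when>")[0]
        return {"title": title, "date": date}
    # unstructured text: guess from position - the fragile case
    title, _, date = record.partition(", decided ")
    return {"title": title, "date": date}

loaded = [transform(r) for r in records]  # the "L" of ETL
print(loaded)
```

Note that even after this effort the dates remain in three incompatible formats, which is exactly the kind of residue that makes the “done” announcement premature.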

This is a well-trodden track. Others have gone before us. The problems of integrating data into cloud data services like Databricks and Snowflake have slowed progress and added to costs for the past five years. It is interesting to see that a small industry has grown up to ease the problem, with companies like prophecy.com emerging with effective solutions. One might imagine that the same will happen with AI. Data transformation will cease in time to be an issue, since a raft of services will have emerged to deal with common problems, and the data creators will have reacted and adapted to the issues that arise when data is ingested into AI environments of all sorts.

But of course, this will not stop the press releases, which will continue to claim that something has happened some time before it might possibly happen. Yet it should moderate our expectations a little. Many feel that we have not yet hit the problems of getting first-generation AI services fully operational, even if we are talking as if we were rolling out second-generation services, tried and tested by legions of users. 50 feet above the moon can be a good place to be if it provides an opportunity to pause for thought, and realign our thinking before we make the slow eventual descent to the lunar surface.
