May
18
Facing up to Father: The pleasures and pains of a Cotswold childhood
New book by David Worlock. Pre-order now at Marble Hill Publishers or Amazon.
A small Cotswold farm is the setting for a classic struggle of wills. Robert Worlock, eccentric and demanding, resolutely maintains the old ways, determined above all to make his son into a farmer fit to take over the family acres. His son, David, is equally determined not to be bullied into something he neither wants nor likes. His childhood becomes a battleground: can he find a way to make his father love him without denying his right to determine his own life?
Jul
28
Take Three Ideas!
I am sitting here on a deck in Nova Scotia, staring at the deep blue waters in the bay and listening to the lapping of the tide. And I am confused. I try to think of the implications of the digital age in general, and of the age of AI, the wave now crashing on the shore of the information world, in particular.
This is really hard. What is happening now is shattering so many preconceptions, and is exhibiting so many differences from the way in which the digital world subsumed the world of printed content. The result of my reading is that I now have a group of ideas, mutually incompatible, fighting each other in a bag like Kilkenny cats. In today’s blogger’s world, the thing to do is just write them down and let you tell me how to sort them out. So here is my dilemma:
- AI will disintermediate the world of the information provider and content creator. The ability of the AI developers both to create fresh content and to synthesise data, combined with the refusal of governments, led by the US government, to protect copyright holders, preferring instead to sacrifice them to the economic growth model, points in this direction. I find much to support this view in the arguments of Tim O’Reilly about B2B markets, and in a series of articles this month by Ben Kaube in Scholarly Kitchen and on LinkedIn. It will happen slowly, and not uniformly, but all the signs are there.
- I am beginning to think (some of you will say, not before time!) that we need to pay more attention to the way that user behaviour is changing, and to the way we absorb and use information, ideas, knowledge and even wisdom. We pay far more attention to the technology and the business models. In the 45 years which have passed since I first walked through the doors of an information industry startup, the changes in user behaviour have been startling. I have been reading Maryanne Wolf’s latest book, Reader, Come Home. I do not agree with the whole thesis – the idea that the next generations never do things as well as, or in the way that, we did is somewhat tiresome – but there are real truths here. We are increasingly intolerant of longform (witness my own publisher and readers on the subject of length – whether we are discussing the books, the paragraphs or even the words!). Of course the smartphone has changed the way in which we communicate. I still believe that we are a narrative-driven animal, however, and communication patterns can circle back as well as change: the first English novels, like Samuel Richardson’s Clarissa, were after all narratives told through exchanges of letters.
- As a result of various discussions this month, I was brought back to Walter Ong. I remember being very impressed years ago by his arguments about the development from what he called a world governed by orality to a world of literacy. I have put an AI summary of the distinction that he makes at the foot of this note. Perhaps I have been too much influenced by what has happened in my own life, but I think that we are turning once again towards a voice-moderated society. Since I was registered blind two years ago, I have tracked with interest my ability to substitute voice for eyesight in a society that seems to have been doing much the same. I have been known to describe the keyboard as the longest cul-de-sac in the history of civilisation! But if we are returning to a world where oracy is the more dominant force, do all the changes in consciousness that Ong noted go into reverse? What has to happen to memory, comprehension and precision of expression, to name but three facets of communication, to enable us to work with equal facility in either form of expression?
Walter Ong died in 2003 and, as a Jesuit, was more concerned than I am with the theological implications of these ideas. Yet by defining writing as a technology, and by introducing the idea of secondary orality to account for the influence of radio, telephone and television in his lifetime, he points us, I feel, towards taking the further advance of voice in a smartphone-based, AI-moderated world very seriously indeed. The knowledge workers of today have been brought up to think and rationalise through writing. How will this change, and what will the effect of those changes be on the way in which our information systems work?
In 1980, I became CEO of EUROLEX, the first UK legal information retrieval service. I wonder, as I gaze out at the bay, what I would need to do to make such a service competitive today. Comprehensiveness in data terms is a given. Superb AI capability that fully understands the legal context is a must. But I think that to maintain competitive strength I would want to add two other elements. First, I would want my AI to have a voice-based prompt engineering component. In other words, I would want my users to be able to discuss their questions in detail before finalising them, getting critical feedback on phrasing and coverage before posing the question, with provision for reiteration and refinement to encourage the idea that a range of answers is better than a “solution”. Second, I would create and publicise a panel of leading legal experts and jurists employed to provide alternative judgements and interpretations of the law. I would argue that these, when added to the model, give additional reach and value, but they would not be available publicly outside the service. In an age when all the data is held by everybody, the things that machine intelligence finds most difficult – total originality, eccentricity, illogical reasoning with an acceptable result – may command a premium. And all the while I would try to study the behaviour of my users minutely. My ability to stay ahead would depend entirely on how closely I modelled the way they communicated.
But just now it is a whole lot easier to watch the seabirds circling across the bay!
- Oral Cultures: Ong characterized oral cultures as relying heavily on sound, memory, and direct interaction. They tend to be communal, situational, and rely on formulaic language and storytelling for knowledge transmission.
- Literate Cultures: Writing, for Ong, is a technology that restructures consciousness. It enables abstraction, analysis, and the development of a more individualistic worldview. Literacy allows for the storage and manipulation of knowledge in ways not possible in oral cultures, leading to new forms of thought and social organization.
(Thanks to Perplexity for the summary – glad there is no copyright in AI outputs!)
Jun
24
The conversation often goes like this:
“What do you think are the most important issues for the information industry today?”
“Well, of course it’s AI, and getting these AI developers to act responsibly around data.”
“You mean, act responsibly and transparently and identify the data used and held in their models?”
“Yes, of course, and acting responsibly also means paying a decent license fee for our data content!”
“Yes, they have to realise that they cannot ignore the powerful legal and moral position of those who hold copyrights in valuable data.”
I too am a firm advocate of data licensing for AI modelling reuse. When IP is used for any purpose, I believe that it has to be recognised, the usage has to be by consent, and proper acknowledgement in monetary terms needs to be made to recognise the effort and curation involved. In saying this, of course, I also want to make it clear that I know that most of the data owned by B2B organisations and used in their information services was not originally their IP; the current owners obtained it in the course of creating an information service of some sort or another. In doing this, they often edited it, structured it, improved it, added metadata to it and created value as a result. The original owners – governments, private citizens, research organisations, corporate bodies etc. – create the data by virtue of their existence and their activity, and in some instances need it to be collected and manipulated for reasons of public policy, research and innovation, compliance activity or reputation management. For most information service providers, the data that they have collected is the most valuable commodity in their world – “the oil of the virtual world”. They prize it highly and they think it is unique. They are right to value it, but we are all gradually becoming aware that there is more data in the world than is contained in the world’s commercial databases, the Cloud or even the Internet.
As I look for and try to map the various emerging data licensing agencies, the breadth of possibility becomes clear. The powerhouse that is CCC, the Copyright Clearance Center (www.copyright.com), is central to everything and is concentrated around scientific and medical data. ProRata (prorata.ai) builds AI-based attribution and monetization technologies and solutions that credit and compensate content owners for the value of their work. Human Native (humannative.ai) says: “Better AI starts with better data. We bring together suppliers of high quality, premium data with reputable AI developers—come join the ecosystem”. Created by Humans (createdbyhumans.ai) calls itself “The AI rights licensing platform for books”, while Narrativ (narritiv.ai) is a licensing site for voices and voice data. And the Data Licensing Alliance, run by Dave Myers, is more than four years old and seeks to build a marketplace of buyers and sellers in STEM data (www.diadata.com).
Yet all of this rich variety exists in the domain of human creativity. The data needs of AI models are not confined to human creativity. Data derived from machine intelligence now becomes a factor in creating AI models, and just as we have heard about synthetic data in financial services, so we are now beginning to think about synthetic data in AI modelling. The announcement last week of the funding of SandboxAQ by Nvidia takes this former Google startup into new territory.
SandboxAQ (www.sandboxaq.) is, it says, “leading the next wave of enterprise AI with Large Quantitative Models (LQMs) — grounded in physics and built to simulate real-world systems. Across biopharma, chemicals, advanced materials, cybersecurity, healthcare, navigation, and more—LQMs provide the scientific accuracy and computational scale to solve the world’s most complex challenges.” So, in financial services and in scientific research and innovation at least, we can make our own data and need not be wholly dependent upon the world of owned and traded data. And as this new scenario becomes apparent, some of us will begin to wonder what its effect will be on real world data valuations.
The use of AI in this way to create logical extensions of existing knowledge is already well established. I notice that the industry is beginning to refer to “synthetic” data as opposed to the “real world data” (inevitably, RwD) found in books and journals, government reports and newspapers. Of course, AI businesses, large or small, point to the licensing cost of data as a crippling tax which will restrict innovation. So far it does not seem to have strangled the competitive appetites of Silicon Valley, but will it stop startup innovators in small markets and niche sectors?
It seems that the data industry is already thinking about that. There is serious activity now around the idea of Open Data in this context (as already exists in Open Science), not just as a way of sharing datasets amongst researchers, but also as a way of helping small-scale developers to build effective models without severe licensing costs. Common Pile v0.1 is a development of this type (https://huggingface.co/blog). The duty of ensuring that data is complete, accurate, and has not been distorted or polluted is a vital one, and ensuring that building effective models is not limited to the developers with the deepest pockets is important as well. The huge collaboration that has built the Common Pile (University of Toronto and Vector Institute, Hugging Face, the Allen Institute for Artificial Intelligence, Teraflop AI, Cornell University, MIT, CMU, Lila Sciences, poolside, University of Maryland, College Park, and Lawrence Livermore National Laboratory) is trying to build public standards for both data quality and transparency. We should all be grateful for their work.
So now we have data in a variety of forms. Information industry data that can be exchanged and traded shares the business of AI model development with Open Data resources built and released for that very purpose, and with AI-created data built as a way of testing probability and computing the logical extensions of the world we already know. Is this also a pointer towards the ability of machines to create the resources that machines require? Perhaps we should be thinking not just about the value of data and data licensing transactions, but also about the duration and lifespan of data licensing markets themselves.