You can tell the sort of industry we are becoming by the language we use to describe what is happening. Does an industry which refers to data “mining”, or entity “extraction”, seem to you to want to align itself with the softer values of literary publishing? Our senior management teams are now replete with data or content “architects” working alongside data or process “engineers” to ensure that we handle data as content in the right way for today, while staying “agile” in terms of new product development. We are “solutions” orientated because now, for the first time in history, we really can tell how our content is being used, what problems users commonly encounter, and how we can ease their processes, help their learning, improve their workflow or deepen their insight by adjusting, or helping them to self-adjust. The way in which data-as-content is recorded in our systems creates new dataflows which are all about how users react. Some of us used to throw that data away because we could not “read” it. Now the “exhaust data” blown out of the back of our machines when they are running full tilt could be just the place to pan for gold.

And just as diesel is apparently more noxious than petrol, and heavy vehicles more so than modest family cars, there are clearly many different varieties of exhaust. I have always worried greatly about the use to which events organizers have put the rich data derived from registrations, exhibitor profiles, attendee tracking and preference listings. Within privacy constraints, there is clearly scope here to add third party data from venues and elsewhere, and to go beyond the needs of an individual show into service development for the target group more generally. I have been told in the past that there is too much data to handle, or too little to give significant results: excuses which wear increasingly thin in the age of data. And the same opportunities exist in the usage data created by online services generally.

But much of the exhaust data potential is less obvious. Jose Ferreira, founder and CEO at Knewton, notes in his latest blog:

“OER (Open Education Resources) represents a tectonic shift in education materials. Try typing “mitosis” into Google. Almost every search result on the first few pages is for OER exploring the process of cell division. The same is true for nearly any other concept you type in: “subject-verb agreement,” “supply and demand,” “Pythagorean theorem” — you name it. And what you can find today on the Internet is probably less than one tenth of one percent of the OER out there. Most is trapped on teachers’ PCs.”

And I bet he is right. Services such as TES Connect already exploit this exhaust from the teaching processes of individual teachers, but Jose’s argument goes further. If you are able to employ the OER (what I think I used to call the “learning object”), then you are able to see who stumbled over it and what the exhaust data of assessment shows about understanding and the accomplishment of learning objectives. From there you should be able to move towards genuinely adaptive learning that understands learning difficulty and recognises the speed of learning acquisition.

Another form of feedback loop came to light this week in a note from f1000Research, the Open Access service in STM which is clearly bent on adding fresh layers of meaning to the expression “on the fly”. Using studies of Drosophila melanogaster (fly – geddit?) in his paper on genetic variation in different populations, the Professor of Neurogenetics at Regensburg, together with f1000, has released for the first time an article in which not only can the professor’s data change as fresh evidence emerges, but other labs are invited to add their own data to one part of the dataset to get a comparative view. The article is thus a “living” entity, showing fresh results “on the fly” every time it is opened. It also, of course, allows every lab to compare its own results with the Regensburg results, introducing a fresh instance of the “repeatability” principle to peer review. And the interactions of other labs with the article produce a fresh stream of exhaust data, some of which may itself be citable.

Like “robotic milking”, the new craze in farming, this should be seen as a great gift to publishers. A robotic cash cow that milks itself! But I fear it will be very specialised in its applications, since looking a gift cow in the mouth, or swatting a data fly, are more traditional pastimes for those-once-called-publishers than searching for gold in the exhaust.
