Dec 14
Lightning in the Cloud
Filed Under B2B, Big Data, Blog, data protection, Financial services, Industry Analysis, internet, news media, Publishing, Search, semantic web, social media, STM, Uncategorized, Workflow
So have we all got it now? When our industry (the information services and content provision businesses, sometimes erroneously known as the data industry) started talking about something called Big Data, it was self-consciously re-inventing something that Big Science and Big Government had known about and practised for years. Known about and practised (especially in Big Secret Service; for SIGINT see the foot of this article), but worked upon in a “finding a needle in a haystack” context. The importance of this only revealed itself when I found myself at a UK Government Science and Technology Facilities Council event at the Daresbury Laboratory in the north of England earlier this month. I went because my friends at MarkLogic were one of the sponsors, and because spending a day with 70 or so research scientists gives more insight into customer behaviour than going to any great STM conference you may care to name. I went because you cannot see the centre until you get to the edge, and sitting amongst perfectly regular, normal folk who spoke of computing in yottaflops (processing speeds of 10 to the power of 24 operations per second) as if they were sitting in a laundromat watching the wash go round is fairly edgy for me.
We (they) spoke of data in terms of Volume, Velocity and Variety, sourced from the full gamut of output from sensor to social. And we (I) learnt a lot about problems of storage which go well beyond those of a Google or a Facebook. The first speaker, from the University of Illinois, at least came from my world: Kalev Leetaru is an expert in text analytics and a member of the Heartbeat of the World Project team. The Great Twitter Heartbeat ingests Twitter traffic, then sorts and codes it so that US citizens going to vote, or Hurricane Sandy respondents, can appear within seconds as geographical heatmaps trending across the USA. The SGI UV which did this work (it can ingest the printed resources of the Library of Congress in 3 seconds) linked him to the last speaker, the luminous Dr Eng Lim Goh, SVP and CTO at SGI, who gave a magnificent tour d’horizon of current computing science. His YouTube videos are as wonderful as the man himself (a good example is his address for the 70th birthday of Stephen Hawking, his teacher, but also look at http://www.youtube.com/watch?v=zs6Add_-BKY). And he focussed us all on a topic not publicly addressed by the information industry as a whole: the immense distance we have travelled from “needle in a haystack” searching to our current preoccupation with analysing the differences between two pieces of hay, and mapping the rest of the haystack in terms of those differences. For Dr Goh this resolves to the difference between arranging stored data as a cluster of nodes and working in shared memory (he spoke of 16 terabyte supernodes). As the man with the very big machine, his problems lie in energy consumption as much as anything else. In a process that seems to create a workflow running Ingest > Store and Organize > Analyze > Visualize (in text and graphics, like the heatmaps), the information service players seem to me to be involved at every point, not just the front end.
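To see the shape of that workflow, here is a minimal sketch in Python using invented sample data and a crude text heatmap. It illustrates the Ingest > Store and Organize > Analyze > Visualize chain only; it is not the Twitter Heartbeat code or anything SGI actually runs.

```python
# Toy illustration of the Ingest > Store and Organize > Analyze > Visualize
# workflow described above. All data, place names and grid sizes are invented;
# the real Twitter Heartbeat pipeline ran on very different infrastructure.
from collections import defaultdict

# 1. Ingest: a handful of fake geotagged messages (lat, lon, mentions_storm).
messages = [
    (40.7, -74.0, True),    # New York
    (40.8, -73.9, True),
    (34.0, -118.2, False),  # Los Angeles
    (41.9, -87.6, False),   # Chicago
    (25.8, -80.2, True),    # Miami
]

# 2. Store and organize: bucket the messages into a coarse 5-degree grid.
grid = defaultdict(list)
for lat, lon, mentions_storm in messages:
    cell = (int(lat // 5) * 5, int(lon // 5) * 5)
    grid[cell].append(mentions_storm)

# 3. Analyze: per cell, message volume and the share of storm-related traffic.
analysis = {cell: (len(flags), sum(flags) / len(flags)) for cell, flags in grid.items()}

# 4. Visualize: print a crude text "heatmap", one row per occupied grid cell.
for (cell_lat, cell_lon), (volume, share) in sorted(analysis.items()):
    print(f"cell ({cell_lat:>4}, {cell_lon:>5}): {'#' * volume:<4} {share:.0%} storm-related")
```

The point of the toy is simply that every stage, from ingestion through organization to analysis and display, is a place where an information service can add value, which is the argument of the paragraph above.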
The largest data-sourcing project on the planet was represented in the room (the SKA, or Square Kilometre Array, is a remote-sensing telemetry experiment with major sites in Australia and South Africa). Of course, NASA is up there with the big players, and so are the major participants in cancer research and human genomics. But I was surprised by how Big the Big Data held by WETA Data in New Zealand was (look at all the revolutionary special effects research at http://www.wetafx.co.nz/research), until I realised that this is a major film archive (and NBA Entertainment is up there too on the data A List). This reflects the intensity of data stored from film frame images and their associated metadata, now multiplied many times over in computer graphics-driven production. But maybe it is time now to stop talking about Big Data, the term which has enabled us to open up this discussion, and begin to reflect that everyone is a potential Big Data player. However small our core data holding may be compared to these mighty ingestors, if we put proprietary data alongside publicly sourced Open Data and customer-supplied third-party data, then even very small players can experience the problems that induced the Big Data fad. Credit Benchmark, which I mentioned two weeks ago, has little data of its own: everything will be built from third-party data. The great news aggregators face similar data concentration issues, since their data has to be matched with third-party data.
And I was still thinking this through when news came of an agreement signed by MarkLogic (www.marklogic.com) with Dow Jones on behalf of News International this week. The story was covered in interesting depth at http://semanticweb.com/with-marklogic-search-technology-factiva-enables-standardized-search-and-improved-experiences-across-dow-jones-digital-network_b33988 but the element that interested me, and which highlights the theme of this note, concerns the requirement not just to find the right article, but to compare articles and demonstrate relevance in a way which only a few years ago would have left us gasping. Improved taxonomic control, better ontologies and more effective search across structured and unstructured data lie at the root of this, of course, but do not forget that good results at Factiva now depend on effective Twitter and blog retrieval, and on effective ways of pulling back more and more video content, starting with YouTube. The variety of forms takes us well beyond the good old days of newsprint, and underlines the fact that we are all Big Data players now.
Note: Alfred Rolington, formerly CEO at Janes, will publish a long-awaited book with OUP on “Strategic Intelligence in the Twenty First Century” in January, which can be pre-ordered on Amazon at http://www.amazon.co.uk/Strategic-Intelligence-21st-Century-Mosaic/dp/0199654328/ref=sr_1_1?s=books&ie=UTF8&qid=1355519331&sr=1-1. And I should declare, as usual, that I do work from time to time with the MarkLogic team, and thank them for all they have done to try to educate me.
Dec 11
Education and Knowledge – Unlatched
Filed Under Blog, eBook, Education, eLearning, Industry Analysis, internet, Pearson, Publishing, Uncategorized
You never come away from a meeting with Dr Frances Pinter without feeling that everything is possible. If, in the whole world of print-to-digital transformation, you had thought that the scholarly monograph was publishing’s most certain lost cause, then you need to take tea with her – urgently. She is the supreme exponent of an idea I have long held dear: that the network turns business models on their heads, and that if you have the courage to stand on your own head to view them, then eventually a new mode of operation will result. Frances calls her new company “Knowledge Unlatched” (www.knowledgeunlatched.org). I call it Logic Unmatched, but before we consider what the new pre-sold scholarly monograph model might be, let’s just think about what has been unlatched this last week in education and scholarship.
For me, it started with Ingram (http://www.prweb.com/releases/2012/12/prweb10192884.htm). I know that Coventry University in the UK is not a high-fashion place of learning, but the announcement that Ingram, as a book supplier (yes, those funny printed things), would be working on free distribution of textbooks as part of course fees struck me as fully redolent of the “standing on heads” style of change. Once our society begins to accept that learning materials are not extras that students buy but a constituent element of the service that the university provides to them, then we are on a roll. A roll that may change the functions of every other part of the system – and not least the library. The idea of “all-in” course materials, once rooted, has the capacity to change the relationships between publisher, courseware selector and user fundamentally, and the idea being pursued in print at Coventry is deeply reflective of what is happening digitally on a widespread basis. I do not care what Harvard and Cambridge do: things only get interesting when Coventry do them. So I was not really surprised to encounter, next, a press release from Pearson’s EQUELLA digital educational resource about its work at Palm Beach Atlantic University. Another unfashionable, but deeply normal, institution, one suspects, but also one that relishes solutions for the problems of “efficient sharing and repurposing of learning objects for online course development in a team design environment”. EQUELLA is really very interesting, especially when used with its Content Exchange extension:
“To support continued innovation, the EQUELLA Content Exchange, part of EQUELLA version 6, provides an easy-to-use platform to share and sell content between EQUELLA instances. Private exchanges within a consortium, free exchanges of OER resources, and various eCommerce models can now all easily be powered by EQUELLA. Resources can be provided free of charge, sold outright or by subscription. At launch, the EQUELLA Content Exchange offers nearly one million Open Educational Resources from a variety of sources. These resources can easily be discovered and downloaded for free to any version 6 installation of EQUELLA via Content Without Borders, an open access repository powered by EQUELLA. This publicly accessible repository promotes and provides access to resources contributed by academic institutions and repositories from around the world, which are available through content harvesting, and direct access to the website.” (http://www.pearsonapac.com/index.php?id=247&action=view&section=46&module=information_librarymodule&src=%40random4e816d5c9ff34)
EQUELLA, of course, is a Pearson APAC development, so now our range of revolutionary budgeting, course content provision, and upside-down thinking rings the globe. And so does the stress on old-style publishing as it seeks accommodation with this new world. This week brought results from Wiley (http://eu.wiley.com/WileyCDA/PressRelease/pressReleaseId-101829.html), and while it is good to hymn the arrival of another Wiley generation onto the board, and good to note acquisitions like Deltak and Electronic Learning Systems (ELS), this is not a brilliant set of figures. Hopefully post-Hurricane Sandy catch-up will help, but education at Wiley is 6% off in the second quarter, and the profit contribution to the group is down 12% on the period. For the players who now trail Pearson round the world, these are worrying times. Have they invested enough, quickly enough? And, in terms of this week’s news, have they understood enough about what has happened to their customers to stand proficiently on their heads?
Which brings us back to Knowledge Unlatched. The problem with much of our publishing effort, especially where short-run printing was concerned, is the inability of users to afford the price needed to redeem the publisher’s origination costs from the initial print run. And since realistic pricing compels a fall-off in volume, the problem intensifies over time. Library sales fall away as controlled budgets permit less acquisition than before, leading to the sort of lamentations in this week’s Wiley report. The whole cyclical process of decline, which has now been in progress for at least 25 years, has already driven many publishers out of areas of their traditional books business. Ten years ago we thought that print on demand would solve all of this, and in some ways it does. But it does not resolve the question of those origination costs. Take them away and the publisher could print to order. Which is where Dr Pinter comes in. If, she argues, you could put together global library consortia with block buying power, then you could, if the group were large enough, recreate eighteenth-century subscription publishing and have all of the origination costs covered, for the titles selected, by the libraries’ digital purchase prior to publication. This then plays a key role in feeding (populating) the resource development engines being created by people like Pearson. Then the publisher will be able to deal with the Long Tail by ebook or print on demand, pricing to margin on every sale. And the library will be Open Access!
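The arithmetic behind the unlatching idea is easy to sketch. The numbers below are purely illustrative assumptions of mine, not Knowledge Unlatched figures, but they show why covering origination costs up front changes the economics of every subsequent sale.

```python
# Illustrative arithmetic for the "unlatching" model. Every number below is an
# assumption made up for this sketch, not a Knowledge Unlatched figure.
origination_cost = 12_000    # fixed cost of editing, typesetting and design
consortium_size = 200        # libraries pledging before publication
unit_cost = 4.00             # marginal cost of a print-on-demand or ebook copy
unit_price = 12.00           # price of a long-tail sale after publication

# Each consortium member pays an equal share of the fixed origination costs...
title_fee = origination_cost / consortium_size
print(f"Per-library pledge: {title_fee:.2f}")                   # 60.00

# ...so every later sale only has to clear its own marginal cost, and the
# consortium-funded digital edition can be released as Open Access.
margin_per_sale = unit_price - unit_cost
print(f"Margin on each long-tail sale: {margin_per_sale:.2f}")  # 8.00
```

Under those assumed numbers, each library pays less for its pledge than it would for a typical monograph today, the publisher starts publication with origination costs already redeemed, and any long-tail print or ebook sale is pure margin over manufacturing cost.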
I know. When you are standing on your head, the blood rushes to your head as well. But think carefully about this. Contrarian business models must make sense in a business which now has no sensible business model of its own.