When a movement in this sector gets a name, it gets momentum. The classic is Web 2.0; until Tim O’Reilly coined it, no one knew what to call the trends they had been following for years. Similarly Big Data: now that we can see it in the room, we know what it is for and can approach it purposefully. And we know it is an elephant in this room, for no better reason than that Doug Cutting named his system for managing and sorting large, various, distributed, structured and unstructured data Hadoop – after his small boy’s stuffed elephant. This open source environment, built on Google’s MapReduce programming model and developed over the past five years largely at Yahoo, is now being commercialized by a spin-off named HortonWorks, in tribute to the Dr Seuss elephant that Hadoop really was. With me so far? Ten years of development since the early years of Google, resulting in lots of ways to agglomerate, cross-search and analyse very large collections of data of various types. Two elephants in the room (really only one), and it is Big Search that is leading the charge on Big Data.
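For readers who have never looked under Hadoop’s bonnet, the MapReduce pattern it implements can be sketched in a few lines. This is a toy, single-machine illustration of the pattern only – not Hadoop code – using the classic word-count example:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (key, value) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle and reduce: group the pairs by key, then sum each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big search", "big elephants"]
counts = reduce_phase(map_phase(docs))
# counts["big"] == 3
```

The point of the real thing, of course, is that the map and reduce steps run in parallel across thousands of machines, each holding a shard of the data.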

So what is Big Data? Apparently, data at such a scale that its very size is the first problem encountered in handling it. And why has it become an issue? Mostly because we want to distill real intelligence from searching vast tracts of stuff, regardless of its configuration, but we do not necessarily want to go to the massive expense of putting it all together in one place with common structures and metadata – or ownership prevents us from doing so even if we could afford it. We have spent a decade refining and acquiring intelligent data mining tools (the purchase of ClearForest by Reuters, as it then was, first alerted me to the implications of this trend five years ago). Now we have to mine and extract, using our tools, inference rules and advanced taxonomic structures to find meaning where it could not be seen before. So in one sense Big Data is like reprocessing spoil heaps from primary mining operations: we had an original purpose in assembling discrete collections of data and using them for specific purposes. Now we are going to reprocess everything together to discover fresh relationships between data elements.
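The spoil-heap point can be made concrete with a toy sketch (the datasets, fields and values here are invented for illustration): two collections assembled for unrelated purposes, joined on a shared element to surface a relationship neither holds on its own:

```python
# Two collections built for separate, original purposes,
# each happening to share one element: a postcode.
insurance_claims = [
    {"postcode": "EC1A", "claim_type": "flood"},
    {"postcode": "SW1A", "claim_type": "subsidence"},
]
land_registry = [
    {"postcode": "EC1A", "former_use": "riverside warehouse"},
    {"postcode": "SW1A", "former_use": "clay pit"},
]

# "Reprocessing everything together": index one collection by the
# shared key, then join to conjecture a fresh relationship.
by_postcode = {rec["postcode"]: rec["former_use"] for rec in land_registry}
linked = [
    (claim["claim_type"], by_postcode[claim["postcode"]])
    for claim in insurance_claims
    if claim["postcode"] in by_postcode
]
# linked pairs each claim type with the historical land use at that postcode
```

At Big Data scale the join runs over billions of records and the keys are messier, but the shape of the operation is the same.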

What is very new about that? Nothing really. Even in this blog we discussed (https://www.davidworlock.com/2010/11/here-be-giants/) the Lexis insurance solution for risk management. We did it in workflow terms, but clearly what this is about is cross-searching and analysing data connections, where US government files, insurers’ own client records, Experian’s data sets, and whatever Lexis holds in ChoicePoint and its own huge media archive are elements in conjecturing new intelligence from well-used data content. And it should be no surprise to see Lexis launching its own open source competitor to all those elephants, blandly named Lexis HPCC.

And they are right to do so. For the pace is quickening, and all around us people who can count to five are beginning to realize what the dramatic effects of adding three data sources together might be. July began with WPP launching a new company called Xaxis (http://www.xaxis.com/uk/). This operation will pool social networking content, mobile phone and interactive TV data with purchasing and financial services content and with geolocational and demographic content. Most of this is readily available without breaking even European data regulations (though it will force a number of players to reinforce their opt-in provisos). Coverage will be widespread in Europe, North America and Australasia. The initial target is 500 million individuals, including the entire population of the UK. The objective is better ad targeting; “Xaxis streamlines and improves advertisers’ ability to directly target specific audiences, at scale and at lower cost than any other audience-buying solution,” says its CEO. By the end of the month 13 British MPs had signed a motion opposing the venture on privacy grounds (maybe they thought of it as the poor man’s phone hacking!).

And by the end of the month Google had announced a new collaboration with SAP (http://www.sap.com/about-sap/newsroom/press-releases/press.epx?pressid=17358) to accomplish “the intuitive overlay of enterprise data onto maps to fuel better business decisions”. SAP is enhancing its analytics packages to deal with the content needed to populate locational display: the imagined scenarios here are hardly revolutionary, but the impact is immense. SAP envisage telco players analysing dropped calls to locate a faulty tower, doing risk management for mortgage lenders, or overlaying census data. DMGT’s revolutionary environmental risk search engine Landmark was doing this to historical land use data 15 years ago. What has changed is speed to solution, scale of operation, and the availability of data filing engines, data discovery schema, and advanced analytics leading to quicker and cheaper solutions.
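The dropped-call scenario is, at heart, a group-and-rank over geolocated events before anything is drawn on a map. A minimal sketch of that first step (tower IDs, call records and the threshold are all invented for illustration):

```python
from collections import Counter

# Each record: (tower_id, call_outcome), drawn from the telco's call logs.
calls = [
    ("T1", "ok"), ("T1", "dropped"), ("T1", "ok"),
    ("T2", "dropped"), ("T2", "dropped"), ("T2", "dropped"), ("T2", "ok"),
    ("T3", "ok"), ("T3", "ok"),
]

# Count calls and drops per tower, then flag towers whose
# drop rate exceeds a chosen threshold.
totals = Counter(tower for tower, _ in calls)
drops = Counter(tower for tower, outcome in calls if outcome == "dropped")
suspect = [t for t in totals if drops[t] / totals[t] > 0.5]
# suspect == ["T2"]: three of its four calls dropped
```

Overlaying `suspect` on a map by each tower’s coordinates is then the “intuitive” part; the analytics package does the counting above at scale.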

In one sense these moves link to this blog’s perennial concern with workflow and the way content is used within it and within corporate and commercial life. In another they push forward the debate on Linked Data and the world of semantic analysis that we are now approaching. But my conclusion in the meanwhile is that while Big Data is a typically faddish information market concern, it should be very clearly within the ambit of each of us who looks to understand the way in which information services and their user relevance are beginning to unfold. As we go along, we shall rediscover that data has many forms, and that at present we are mostly dealing only with “people and places” information. Evidential data, as in science research, poses other challenges. Workflow concentrations, such as those Thomson Reuters are currently building into their GRC environments, raise still more questions about relationships. For the moment we should welcome Big Data as a concept that needed to surface, while not losing sight of its antecedents and the lessons they teach.

For some years, strategy consultants, this writer amongst them, have talked about “migration” to the Platform. My erstwhile colleagues at Outsell have been strong on the point, but all of us have been short of exemplars in the education sector. It has therefore been hard for industry participants to see exactly what we mean or how it might be applied. Here, then, is a chance to explain what I at least mean, as demonstrated by English360 (www.english360.com), a platform developed for English Language Teaching (ELT). And with major non-educational interests (like Bertelsmann this week, and News Corp when their minds are not elsewhere) thinking of education as a banker zone and seeking strategic investments, this is a very pertinent area to look at, even though few current textbook players seem to have got it right yet.

First of all, some preliminaries:

What is a Platform?

In most digital content marketplaces there is a need for an interface between users and technology, to allow users to manipulate content and present it to themselves, their colleagues or their learners. This may be very lite, or it may be serious industrial-strength technology. Either way, its aim is to let users work with content without having to master the machinery underneath.

Sorry to repeat what so many know already, and to do so in a context where, when we have been talking workflow and process, we have seen such a spate of examples already: GlobalSpec in engineering, Ascend Worldwide in aircraft leasing, and DataExplorers in securities lending, to name but three. And elsewhere, open APIs invite users to work on the vendors’ platforms as a matter of course. Yet this has been rare in education, where looking at the tools needed to empower teachers has usually not survived the scorn of publishers who say that teachers will never do anything for themselves until last thing on Sunday night before the Monday class.

Then again we hear that teachers already have quite enough technology that they are not using – VLEs, LMSs – so why add more? Well, one of the reasons why that technology is not working, in the UK at least, is that the level of digital literacy amongst teachers is often lower than that of their pupils, and the technologies installed in every school and classroom in the UK are high level and require a thin platform to give an intuitive interface to teaching tasks like lesson preparation, individualized learner guidance and diagnostic assessment. This is not altered by the fact that an institution has Moodle, or that its VLE is stuffed to the brim with unused content or lesson plans created by teachers in previous years (but not updated).

English360 is a cogent demonstration of this interfacing platform role. I even forgive them for talking about blended learning, since I see how desperately they are trying to de-ice these concepts from the prejudicial beliefs of publishers and teachers alike. They are providing the authoring tools required to draw even less-motivated teachers into flexible course design. Through Cambridge University Press initially, and now through a widening range of published materials, they are adding digital learning objects to allow for the construction of contextualised and personalised learning, and they include all the collaborative tools needed to enable learners to learn together – which remains one of the most successful learning strategies, and one which the internet enhances considerably. And in terms of personalised learning they add tags that reflect the diagnostic readings made by the platform, enabling users to follow remedial pathways designed to correct their mistakes and support their weaknesses.

When I was a publisher responsible for an ELT list (now lost in the mists of time, fortunately!), there was ELT and there was ESP – English for special purposes, like the oil industry. And the problem with both was that teachers came from all backgrounds (and sometimes none), and learners were equally fragmented by time, place and purpose. Now everything can be treated as ESP, since platform publishing allows special vocabularies or learning content to be available in the same context as basic language learning. It seems to me that English360 drives beyond the only comparable play in this sector (Macmillan’s English Campus), and not only do I think that publishers should invest content in it, but I think they should license it and use it as a white label environment for branding their approach to digital ELT. Then they can drive it towards mobile platforms, the future focus of many ELT markets, and even (just imagine, collaborative publishers!) share content to create learning solutions.

Of course, outside ELT some publishers are already getting platform savvy in education. Pearson always has been, and it is interesting to see how MyLab has developed as an all-markets vehicle. A dramatic late convert now rushing into the front line is McGraw-Hill: on 18 July they announced the launch of McGraw-Hill Campus (www.mhcampus.com) in higher education, allowing all of their disaggregated content to be used in any LMS – “a universal solution for any institution’s LMS”. And McGraw already has Connect as a learning platform and Create as a publishing tool established in these markets. But still there is little progress in K-12, and many seem to see textbooks on a Kindle as a revolution. Of course it is in one sense – it revolutionizes publisher textbook margins downwards and further complicates the rental market. But it is facsimile, not change. Until we believe that Blended is over and the Textbook is dead, it is really hard to reinvent. Which is why English360 is so welcome.
