Jan 8
Acquisition + Collaboration = Complete?
Filed Under B2B, Big Data, eBook, Industry Analysis, internet, Reed Elsevier, STM, Uncategorized | 1 Comment
Sit down to read this with the mind of a research engineer in the public or the private sector. On the screen in front of you there are links to the foremost research resources that you are likely to use in everyday life. Behind them are links to a host of other services that you may also use. Above all, you want to be able to search this corpus of knowledge as an entity, and you want the alerts and intelligence services that you use to reflect updates and developments across the entire waterfront of engineering knowledge. And the data types are very different. Some of it is classic data, occurring in the evidential material that underlies academic research, or in reports and findings on the performance or failure of materials. Other information exists as design specifications, patents and standards, or as structured academic articles and ebooks. Some exists in index entries and as citations or references. Still more is available online in newspaper files, video archives, blogs, tweets and magazine morgues. Engineering research was never easier, but it is still not easy. And few subjects are as fragmented as engineering – or have a more important task than ensuring that knowledge is shared across those fragmentations when necessary, for the sake of progress and the health and safety of everyone. Here is a classic Big Data argument waiting to be made.
Yet as it came to the Web, few areas were more diverse than engineering. Despite the early attempts of Engineering Village (later bought by Elsevier), it was not until Warburg Pincus funded GlobalSpec that real vertical search arrived, and with it the focus on a huge user-contributed library of specifications. This service is now owned by IHS, who are able to align with it their equally vast collections of patents and standards. So is this the starting place for all enquiry, given that GlobalSpec also indexes the content of vastly authoritative sources like the IEEE? Well, almost – but the academic articles remain in the locked service environments of journal publishers like Elsevier, the leading player in this field. So we still have to sign up for all those journals wherever they are published? Well, yes, until yesterday, that is, when Elsevier announced the acquisition of Knovel (http://www.elsevier.com/about/press-releases/corporate/elsevier-acquires-knovel,-provider-of-web-based-productivity-application-for-the-engineering-community). Knovel indexes all of the 100 professional and scholarly journal publishers in this sector, including the IET. It is a fast-expanding online source which claims to have added 20% more data in the past year. So what do we now need on our engineer's dashboard?
Well, we certainly need GlobalSpec/IHS, with links to IEEE, and we certainly need Elsevier/Knovel, with links to ScienceDirect, but wouldn't it be better to have a single point of access and complete cross-search in a Big Data context? Just a minute, though. Way back in 2006 a really good database visionary called Scott Virkler, then VP Business Development at GlobalSpec, helped to put in place a strategic collaboration with Elsevier, and after that became Elsevier's VP of search strategy. So are those links still in place? And can you easily cross-search all of these files from one place, as Scott undoubtedly intended? I ask because it seems to me that consolidation and collaboration is the name of the game, and the game need have no losers. Alexander van Boetzelaer, who runs the corporate markets sector of Elsevier, has a fine record in collaboration. He and his team created GeoFacets, for the oil and gas industry, and IHS was one of their partners in doing that. But in order for collaboration to work, partners have to be determined to make it work, creating interfaces with shared ownership, developing ways of exchanging user-derived data, and sharing marketing efforts and knowledge where necessary. There is still a tendency for collaboration to develop into a market of two – and then end in a situation which is just one step away from what users really want.
All this takes time, and since it is over a decade since Elsevier invested in Engineering Village, we appear to have plenty of that. Knovel was not even founded then, but it now amounts to a very considerable step forward in Elsevier's further work with corporate markets. It claims 700 corporate customers and will add real muscle to the corporate markets drive at Elsevier, but we need to bear in mind that acquisition is no longer what it once was among the major market players in information. Thankfully we have matured since the 1990s, when acquisition was about corporate ego and machismo when it was not driven by a desire to hoover up all the proprietary content in the sector. Now that we know content is not king, we can buy with confidence in the search to create more marketing connections, while developing premium value-added services designed, whether collaboratively or not, to satisfy the customer's service need in full. And all that Knovel data and all that GlobalSpec data will not do that in separate containers: they need to be combined and intermixed in the user's workflow. The next chapter here is the next level of service development, and, given the differentiation of their resources and the fragmentation of the market, it seems to me unlikely that either Elsevier or IHS can do this alone in engineering. There was never a better moment, as in many markets, for talking to the apparent, but unreal, competitor.
Dec 14
Lightning in the Cloud
Filed Under B2B, Big Data, Blog, data protection, Financial services, Industry Analysis, internet, news media, Publishing, Search, semantic web, social media, STM, Uncategorized, Workflow | Leave a Comment
So have we all got it now? When our industry (the information services and content provision businesses, sometimes erroneously known as the data industry) started talking about something called Big Data, it was self-consciously re-inventing something that Big Science and Big Government had known about and practised for years. Known about and practised (especially in Big Secret Service; for SIGINT, see the foot of this article), but worked upon in a "finding a needle in a haystack" context. The importance of this only revealed itself when I found myself at a UK Government Science and Technology Facilities Council event at the Daresbury Laboratory in the north of England earlier this month. I went because my friends at MarkLogic were one of the sponsors, and spending a day with 70 or so research scientists gives more insight into customer behaviour than going to any great STM conference you may care to name. I went because you cannot see the centre until you get to the edge, and sitting amongst perfectly regular, normal folk who spoke of computing in yottaflops (processing speeds of 10 to the power of 24 operations per second) as if they were sitting in a laundromat watching the wash go round is fairly edgy for me.
We (they) spoke of data in terms of Volume, Velocity and Variety, sourced from the full gamut of output from sensor to social. And we (I) learnt a lot about the problems of storage, which went well beyond the problems of a Google or a Facebook. The first speaker, from the University of Illinois, at least came from my world: Kalev Leetaru is an expert in text analytics and a member of the Heartbeat of the World Project team. The Great Twitter Heartbeat ingests Twitter traffic, then sorts and codes it so that US citizens going to vote, or Hurricane Sandy respondents, can appear as geographical heatmaps trending in seconds across the geography of the USA. The SGI UV which did this work (it can ingest the printed resources of the Library of Congress in 3 seconds) linked him to the last speaker, the luminous Dr Eng Lim Goh, SVP and CTO at SGI, who gave a magnificent tour d'horizon of current computing science. His YouTube videos are as wonderful as the man himself (a good example is his 70th birthday address to Stephen Hawking, his teacher, but also look at http://www.youtube.com/watch?v=zs6Add_-BKY). And he focussed us all on a topic not publicly addressed by the information industry as a whole: the immense distance we have travelled from "needle in a haystack" searching to our current preoccupation with analysing the differences between two pieces of hay – and mapping the rest of the haystack in terms of those differences. For Dr Goh this resolves to the difference between arranging stored data as a cluster of nodes and working in shared memory (he spoke of 16 terabyte supernodes). As the man with the very big machine, his problems lie in energy consumption as much as anything else. In a process that seems to create a workflow that goes Ingest > Store and Organize > Analytics > Visualize (in text and graphics, like the heatmaps), the information service players seem to me to be involved at every point, not just the front end.
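That Ingest > Store and Organize > Analytics > Visualize chain is easy to sketch in miniature. The toy Python below, with invented messages, a crude five-degree grid and a made-up sentiment word list (none of it drawn from the Heartbeat project or the SGI systems themselves), simply shows the shape of the workflow: take geotagged text in, organise it by place, analyse it, and print a text stand-in for the heatmap.

from collections import defaultdict

# Ingest: each record is (latitude, longitude, text); the sample is invented for illustration
messages = [
    (40.7, -74.0, "voting today feels great"),
    (34.1, -118.2, "long lines but still voting"),
    (41.9, -87.6, "storm damage everywhere, an awful night"),
]

POSITIVE = {"great", "good", "happy"}
NEGATIVE = {"awful", "bad", "damage"}

def sentiment(text):
    # Crude score: +1 for each positive word, -1 for each negative word
    words = text.lower().replace(",", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Store and Organize: bucket messages into a coarse five-degree latitude/longitude grid
grid = defaultdict(list)
for lat, lon, text in messages:
    cell = (round(lat / 5) * 5, round(lon / 5) * 5)
    grid[cell].append(text)

# Analytics: average sentiment per grid cell
heat = {cell: sum(sentiment(t) for t in texts) / len(texts) for cell, texts in grid.items()}

# Visualize: a plain-text stand-in for the geographic heatmap
for cell, score in sorted(heat.items()):
    print("cell", cell, "sentiment", round(score, 2))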
The largest data-sourcing project on the planet was represented in the room (the SKA, or Square Kilometre Array, is a remote-sensing telemetry experiment with major sites in Australia and South Africa). Of course, NASA is up there with the big players, and so are the major participants in cancer research and human genomics. But I was surprised by how Big the Big Data held by WETA Data (look at all the revolutionary special effects research at http://www.wetafx.co.nz/research) in New Zealand was, until I realised that this is a major film archive (and NBA Entertainment is up there too on the data A-list). This reflects the intensity of data stored from film frame images and their associated metadata, now multiplied many times over in computer graphics-driven production. But maybe it is time now to stop talking about Big Data, the term which has enabled us to open up this discussion, and begin to reflect that everyone is a potential Big Data player. However small our core data holding may be compared to these mighty ingestors, if we put proprietary data alongside publicly sourced Open Data and customer-supplied third-party data, then even very small players can experience the problems that induced the Big Data fad. Credit Benchmark, which I mentioned two weeks ago, has little data of its own: everything will be built from third-party data. The great news aggregators face similar data concentration issues, as their data has to be matched with third-party data.
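To see why even a small holding behaves this way, consider what matching looks like in practice. The Python sketch below uses invented company names and fields (nothing here comes from Credit Benchmark or any aggregator); the point is only that the work lies in reconciling identifiers across proprietary and third-party records, not in raw volume.

def normalise(name):
    # Crude name normalisation so records from different sources can be joined
    name = name.lower().replace("limited", "").replace("ltd", "").replace(".", "")
    return " ".join(name.split())

# Proprietary data: a tiny in-house table (invented for illustration)
proprietary = [
    {"company": "Acme Widgets Ltd", "internal_rating": "B+"},
    {"company": "Borealis Pumps", "internal_rating": "A-"},
]

# Third-party data: the same entities described with different identifiers (also invented)
third_party = [
    {"name": "ACME WIDGETS LIMITED", "public_score": 62},
    {"name": "Borealis Pumps Ltd.", "public_score": 71},
]

# Index the third-party records by normalised name, then match the proprietary ones against it
index = {normalise(r["name"]): r for r in third_party}
for rec in proprietary:
    match = index.get(normalise(rec["company"]))
    if match:
        print(rec["company"], rec["internal_rating"], "matched with public score", match["public_score"])
    else:
        print(rec["company"], "has no third-party match")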
And I was still thinking this through when news came of an agreement signed by MarkLogic (www.marklogic.com) with Dow Jones on behalf of News International this week. The story was covered in interesting depth at http://semanticweb.com/with-marklogic-search-technology-factiva-enables-standardized-search-and-improved-experiences-across-dow-jones-digital-network_b33988, but the element that interested me, and which highlights the theme of this note, concerns the requirement not just to find the right article, but to compare articles and demonstrate relevance in a way which only a few years ago would have left us gasping. Improved taxonomic control, better ontologies and more effective search across structured and unstructured data lie at the root of this, of course, but do not forget that good results at Factiva now depend on effective Twitter and blog retrieval, and on effective ways of pulling back more and more video content, starting with YouTube. The variety of forms takes us well beyond the good old days of newsprint, and underlines the fact that we are all Big Data players now.
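As an illustration of what "comparing two pieces of hay" means in the simplest possible terms, here is a minimal Python sketch that ranks a handful of invented articles against a query using bag-of-words vectors and cosine similarity. Factiva's actual relevance machinery, with its taxonomies and ontologies and its reach across tweets, blogs and video, is of course far richer than this.

import math
from collections import Counter

def vectorise(text):
    # Bag-of-words term frequencies for a piece of text
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented articles standing in for news, blog and video transcript content
articles = {
    "news article": "regulator approves merger of two engineering groups",
    "blog post": "thoughts on the engineering merger and what the regulator missed",
    "video transcript": "highlights of last night's basketball game",
}

query = vectorise("engineering merger regulator")
ranked = sorted(articles, key=lambda name: cosine(query, vectorise(articles[name])), reverse=True)
for name in ranked:
    print(round(cosine(query, vectorise(articles[name])), 2), name)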
Note: Alfred Rolington, formerly CEO at Jane's, will publish a long-awaited book with OUP on "Strategic Intelligence in the Twenty-First Century" in January, which can be pre-ordered on Amazon at http://www.amazon.co.uk/Strategic-Intelligence-21st-Century-Mosaic/dp/0199654328/ref=sr_1_1?s=books&ie=UTF8&qid=1355519331&sr=1-1. And I should declare, as usual, that I do work from time to time with the MarkLogic team, and thank them for all they have done to try to educate me.