Mar 23
The Odd Future of Aggregation
Filed Under B2B, Big Data, Blog, data analytics, Education, eLearning, Financial services, healthcare, Industry Analysis, internet, mobile content, news media, Publishing, Search, semantic web, social media, STM, Workflow | 1 Comment
Mr Bezos and his cohorts at WaPo (the Washington Post to you and me, earthlings) have decided, we are told, on an aggregation model. As far as skilled translators of Bezos-speak can tell, this will mean bringing together licensed news content from all over the globe – spicy bits of the Guardian, a thin but meaty slice of the London Times, a translated and processed sausage segment of FAZ, a little sauerkraut from Bild, a fricassee of Le Monde… well, you get the picture, and the fact that I am writing this before dinner. These ingredients will get poured into a WaPo membership pot, heated and served to members who want to feel that they are on top of global goings-on, from the horse's mouth, and without having to read the endless recyclings and repetitions which characterize the world media at source.
Well, I see the point, and the idea of membership and participation seems to me one which has endless energy these days. But I have been thinking for several years now that the Aggregation business model as experienced from 1993 onwards on the Web is on its last legs. Believing that “curation” is too often a word which we use when we are trying to maintain or defend a job, I have tried to steer clear of imagining that storage, the ultimate network commodity, was a good place to start building a business. In the early days of the Web it was certainly different. Then we could create the whole idea of the “one-stop shop” as a way of simplifying access and reducing friction for users. All of the things we were collecting and storing, for the purposes of aggregation, were in fact “documents”, and their owners wanted them to be stored and owned as documents, bought as documents and downloaded as documents. The early reluctance of STM publishers to apply DOI identity beyond the article level, and to make citations, references or other document subdivisions separately searchable, seems in retrospect to demonstrate the willingness of IP owners to manipulate access in order to protect the business model.
Three clear developments have comprehensively undermined the utility of content aggregation:
* the desire of users to move seamlessly from one part of one document, through a link, to another part of a different document seems to them a natural expression of their existence as Web users – and we in the content industries encouraged this belief.
* the availability of search tools on the Web which permit this self-expression simply raises the frustration level when content is locked away behind subscription walls, and increases the likelihood that such content will be outed to the open Web.
* the increasing use of semantic analysis, and the huge extension of connectivity and discoverability that it brings, make the idea that we need only collect all (or sufficient) content into a storehouse, defining it as a utility for users by the mere act of inclusion, look very outdated indeed.
It seems to me that for the past decade the owners of major service centres in the aggregation game – think Nexis, or Factiva, or Gale or ProQuest – have all at various times felt a shiver of apprehension about where all of this is going, but with sufficient institutional customers thinking that it is easier to renew than rethink, the whole aggregation game has gone gently onwards, not growing very much, but not declining either. And while this marriage of convenience between vendors and payers gives stability, end users are getting frustrated by a bounded Web world which increasingly does not do what it says on the tin. And since the Web is not the only network service game in town, innovators look at what they might do elsewhere on internet infrastructure.
So, if content aggregation seems old-fashioned, will it be superseded by service aggregation, creating cloud-based communities of shared interests and shared/rented software toolsets? In one sense we see these in the Cloud already, as groups within Salesforce, for example, begin to move from a tool-using environment to user-generated content and, more recently, the licensing of third party content. This is not simply, though, a new aggregation point, since the content externally acquired is now framed and referenced by the context in which users have used and commented upon it. Indeed, with varying degrees of enthusiasm, all of the great Aggregators mentioned above have sought to add tools to their armoury of services, but usually find that this is the wrong way round – the software must first enhance end-user performance, then lend itself to community exploitation – and then you add the rich beef stock of content. For me, Yahoo were the guys who got it right this week when they bought Vizify (www.vizify.com), a new way of visualizing data derived from social media. This expresses where we are far more accurately than the lauded success of Piano Media (www.pianomedia.com). I am all for software companies emerging as sector specialists from Slovakia onto a world stage, but the fact that there is a whole industry, exemplified by Newsweek’s adoption of Piano this week, concerned with building higher and harder paywalls instead of climbing up the service ladder to higher value seems to me faintly depressing.
And, of course, Mr Bezos may be right. He has a good track record in this regard. And I am told that there is great VC interest in “new” news: Buzzfeed $46m; Vox $80m; Business Insider $30m, including a further $12m last week; Upworthy $12m. Yet I still think that the future is distributed, that the aggregated collection has a sell-by date, and that the WaPo membership could be the membership that enables me to discover the opinions of the world rather than the news, through a smartly specialized search tool that exposes editorial opinion and thinking – and saves us from the drug of our times: yet more syndicated news!
Jan 9
Post-Pub and Preprint – The Science Publishing Muddle
Filed Under B2B, Big Data, Blog, data analytics, healthcare, Industry Analysis, internet, Publishing, Reed Elsevier, Search, semantic web, STM, Uncategorized, Workflow | 2 Comments
New announcements in science publishing are falling faster than snowflakes in Minnesota this week, and it would be a brave individual who claimed to be on top of a trend here. I took strength from Tracy Vence’s review, The Year in Science Publishing (www.the-scientist.com), since it did not mention a single publisher, confirming my feeling that we are all off the pace in the commercial sector. But it did mention the rise, or resurrection, of “pre-print servers” (now an odd expression, since no one has printed anything since Professor Harnad was a small boy, but a way of pointing out that PeerJ’s PrePrints and Cold Spring Harbor’s bioRxiv are becoming quick and favourite ways for life sciences researchers to get the data out there and into the bloodstream of scholarly communication). And Ms Vence clearly sees the launch of NCBI’s PubMed Commons as the event of the year, confirming the trend towards post-publication peer review. Just as I was absorbing that, I also noticed that F1000, which still seems to me to be the pacemaker, had just recorded its 150,000th article recommendation (and a very interesting piece it was about the effect of fish oil on allergic sensitization, but please do not make me digress…)
The important things about the trend to post-publication peer review are all about the data. Both F1000 and PubMed Commons demand the deposit or availability of the experimental data alongside the article, and I suspect that this will be a real factor in determining how these services grow. With reviewers looking at the data as well as the article, comparisons are already being drawn with other researchers’ findings, and the evidential data throws up connections that do not appear if the article alone is searched in the analysis. F1000Prime now has 6000 leading scientists in its Faculty (including two who received Nobel prizes in 2013) and a further 5000 associates, but there must be questions still about the scalability of the model. And about its openness. One of the reasons why F1000 is the poster child of post-publication peer review is that everything is open (or, as they say in these parts, Open). PubMed Commons, on the other hand, has followed the lead of PubPeer and demanded strict anonymity for reviewers. While this follows the lead of the traditional publishing model, it does not allow the great benefit of F1000: if you know who you respect and whose research matters to you, then you also want to know what they think is important in terms of new contributions. The PubPeer folk are quoted in The Scientist as saying in justification that “A negative reaction to criticism by somebody reviewing your paper, grant or job application can spell the end of your career.” But didn’t that happen anyway despite blind, double blind, triple blind and even SI (Slightly Intoxicated) peer reviewing?
And surely we now know so much about who reads what, who cites what and who quotes what that this anonymity seems out of place, part of the old lost world of journal brands and Open Access. The major commercial players, judging by their announcements as we were all still digesting turkey, see where the game is going and want to keep alongside it, though they will milk the cash cows until they are dry. Take Wiley (www.wiley.com/WileyCDA/pressrelease), for example, whose fascinating joint venture with Knode was announced yesterday. This sees the creation of a Knode-powered analytics platform provided as a learned society and industrial research service, allowing Wiley to deploy “20 million documents and millions of expert profiles” to provide society executives and institutional research managers with “aggregated views of research expertise and beyond”. Anyone want to be anonymous here? Probably not, since this is a way of recognizing expertise for projects, research grants and jobs!
And, of course, Elsevier can use Mendeley as a guide to what is being read and by whom. Their press release (7 January) points to the regeneration of the SciVal services, “providing dynamic real-time analytics and insights into the… (Guess What?)… Global Research Landscape”. The objective here is one that has been dear to governments in the developed world for years – to help research managers benchmark themselves and their departments so that they know how they rank and where it will be most fruitful to specialize. We seem, quite predictably, to be entering an age where time to read is coming under pressure from the sheer volume of available research articles and evidential data, so it is vital to know, and know quickly, what is important, who rates it, and where to put the most valuable departmental resources – time and attention span. And Elsevier really do have the data and the experience to do this job. Their Scopus database of indexed abstracts, all purpose-written to the same taxonomic standard, now covers some 21,000 journals from over 5,000 publishers. No one else has this scale.
The road to scientific communication as an open, and not a disguised, form of reputation management will have some potholes of course. CERN found one, well reported in Nature’s News on 7 January (www.nature.com/news) under the headline “Particle physics papers set free”. CERN’s plan to use its SCOAP3 project to save participating libraries money, which was then to be disbursed to push journals into going Open Access, met resistance – but from the APS rather than the for-profit sector. Meanwhile the Guardian published a long article (http://www.theguardian.com/science/occams-corner/2014/jan/06/radical-changes-science-publishing-randy-schekman) arguing against the views of Nobel laureate Dr Randy Schekman, the proponent of boycotts and bans for leading journals and for supporters of impact factor measurement. Perhaps he had a bad reputation management experience on the way to the top? The author, Steve Caplan, comes out in favour of those traditional things (big brands and impact factors), but describes their practices in a way which would encourage an uninformed reader to support a ban! More valuably, the Library Journal (www.libraryjournal.com/2014/01) reports this month on an AAP study of the half-life of articles. Since this was done by Phil Davis it is worth some serious attention, and the question is becoming vital – how long does it take for an article to reach half of the audience who will download it in its lifetime? Predictably, the early results are all over the map: the health sciences are quick (6-12 months), but maths and physics, as well as the humanities, have long half-lives. So this is another log on the fire of the argument between publishers and funders over the length of Green OA embargoes. This problem would not exist, of course, in a world that moved to self-publishing and post-publication peer review!
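For readers who like the arithmetic spelled out, here is a minimal sketch in Python of the half-life calculation as defined above: the elapsed time by which half of an article’s lifetime downloads have occurred. This illustrates the definition only, not the methodology of the AAP/Phil Davis study; the function name and the sample figures are invented for the example.

```python
# Illustrative sketch only: computes an article's download half-life,
# i.e. the time by which 50% of its lifetime downloads have occurred.
# Function name and sample data are hypothetical, not from the AAP study.

def download_half_life(download_days):
    """Given download times (days since publication), return the day
    by which half of all lifetime downloads had taken place."""
    if not download_days:
        raise ValueError("no downloads recorded")
    ordered = sorted(download_days)
    midpoint = (len(ordered) + 1) // 2 - 1  # index of the 50% download
    return ordered[midpoint]

# A health-sciences-style pattern: most downloads come early, so the
# half-life is short relative to the article's total lifetime.
print(download_half_life([1, 2, 5, 9, 20, 60, 150, 300, 500, 900]))  # -> 20
```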
POSTSCRIPT For the data trolls who pass this way: the Elsevier SciVal work mentioned here is powered by HPCC (High Performance Computing Cluster), now an Open Source Big Data analytics engine, but created for and by LexisNexis Risk to manage their massive data analytics tasks as ChoicePoint was absorbed and they set about creating the risk assessment system that now predominates in US domestic insurance markets. It is rare indeed among major information players to see technology and expertise developed in one area used in another, though of course we all think it should be easy.