
Scientific Publishing: Disruption and Semantic Build-Up

Abstract:
A new technology paired with a viable business model will have a disruptive impact on incumbent companies in a given market if they do not reevaluate and update their business models accordingly. As the Internet matures, Semantic Web technologies enable applications for meaning-based, dynamic filtering and processing of information, which has a disruptive impact on scientific publishing. This article calls on publishers to adopt semantic technologies and emphasises the “need to include a semantic strategy in their business models” (Hawkins 2009). Focusing on journals as the ‘cash cow’ of scientific publishers, it brings together debates about disruption and general tendencies in scientific publishing. It gives an introduction to the Semantic Web, Text Mining and Semantic Publishing, as well as various examples of product developments, company partnerships and acquisitions related to semantic technologies. Finally, different ways of acquiring semantic annotation data and the financial aspects of semantic enhancements are discussed.

Read the full article following this link: Scientific Publishing: Disruption and Semantic Build-Up

The article was also published last week in: LOGOS: The Journal of the World Book Community, Volume 20, pp. 184-198 (DOI: 10.1163/095796509X12777334632744).

The following are a few minor errors that could not be corrected before publication:

1) One reference is missing:

Shotton D., Portwin K., Klyne G., Miles A., 2009b. Adventures in semantic publishing: exemplar semantic enhancement of a research article.  PLoS Computational Biology 5: e1000361. Available at: http://dx.doi.org/10.1371/journal.pcbi.1000361.

2) p. 187, para. 3, the reference is wrong:

“Phillips (2009) in The Future of Journal Publishing quotes Brunelle’s (2006)…” should instead read “Phillips (2009) in Business models in journal publishing quotes Brunelle’s (2006)…”.

3) p. 191, penultimate paragraph: the same reference is given incorrectly three times; the passage should instead read:

“NetBase is licensing its technology to drive Elsevier’s illumin8 product, an R&D research support tool sold via a subscription model (Pollock 2008b). The key differentiator of NetBase is that by not relying on taxonomies “it scales across subject areas with no need for investment in domain expertise” (ibid). Pollock (ibid) wonders whether…”

4) p. 193, para. 3, should read:

“Although there are all sorts of workflow tools benefiting from semantic technologies, the focus in this article will be on tools for acquiring annotation data.”

Scientific Publishing: Disruption and Semantic Build-Up (Part 1)

Update 14 August 2010:
Read the full article following this link: Scientific Publishing: Disruption and Semantic Build-Up

Introduction

While there is ongoing debate about whether scientific publishing is being disrupted, it seems undisputed that profound changes are under way. In this first part of the article I will recall the debates about disruption and explore the changes and tendencies in scientific publishing. I will finish with a brief introduction to the Semantic Web and Semantic Publishing as one way for publishers to leverage emerging business opportunities. In the Recommendations section I will outline what the second part of this article will cover.

Disruptive Innovation

According to Clayton Christensen (2008), disruptive innovation makes a product simpler and more affordable. It combines an enabling technology with a business model that can deliver this solution more cost-effectively. Such innovations have disruptive effects on established companies because managers tend to compare the full cost of investing in a new business model with the marginal cost of leveraging what is already in place, which makes business model innovation look unattractive. New entrants, in contrast, have no such comparison to make and simply create what needs to be created (Christensen 2008).

Concerning the implications of scientific literature increasingly becoming available in digital formats, Bruck (2008) mentions P2P networks, Open Access / open archive publishing and intercommunity trading as “challenges” for publishers.

Cope and Kalantzis (2009), in Signs of epistemic disruption: transformation in the knowledge system of the academic journal, describe “disruptions of scholarly work”. For example, they suggest that pre-publications erode the significance of post-publication. In some areas, conference proceedings (because of their immediacy) and reports are becoming more important than journal articles, and authors and institutions insist that articles be published in their own institutional repositories or on personal websites, “legally or illegally, with or without reference to the publishing agreement they have signed” (ibid). Cope and Kalantzis identify further drivers of disruption: knowledge today is produced by a whole host of organisations, and increasingly it is produced within the networked interstices of the Social Web, where amateurs mingle with professionals.

David Bousfield (2009), Vice President and Lead Analyst at the research and advisory firm Outsell, names Open Access as the first of four “disruptive forces” for the STM market. He notes: “Springer’s purchase of BioMed Central and the launch of Nature’s Communications [journal] both represent significant landmarks in the adoption of this disruptive business model by for-profit publishers.” The Open Access “business model that has infected mainstream STM publishing is working its way through legal, tax, and regulatory content, and also permeates the co-creation of news and market research” (Stratigos et al. 2009). In addition, funding institutions increasingly demand an Open Access approach (Lunn 2010).

The commercial Open Access market “accounts for 3.3 per cent of the total journal publishing market, growing at 11.3 per cent per year”. Although the effect is significantly mitigated by other trends in R&D spending, Open Access will reduce revenues in the primary journal publishing market and, in the event of widespread take-up, will shrink its market value by an estimated 57 per cent (Pollock 2009).
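As a back-of-the-envelope reading of the growth figure quoted above (an illustration, not part of Pollock’s analysis), and assuming the 11.3 per cent annual rate held constant, the commercial Open Access market would double in size roughly every six and a half years:

```latex
T_{2} = \frac{\ln 2}{\ln(1 + 0.113)} \approx \frac{0.693}{0.107} \approx 6.5 \ \text{years}
```

Its 3.3 per cent share of the overall market would grow at that same pace only if the total journal publishing market stayed flat.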

Michael Nielsen (2009) in his article “Is scientific publishing about to be disrupted?” claims “that scientific publishing is in the early days of a major disruption” and that “those publishers that don’t become technology driven will die off”.

Michael Clarke (2010), asking why scientific publishing has not been disrupted already, examines the potential for disruption by listing five functions of journals. He asserts that beyond dissemination and registration, for which journals are no longer needed, there are three additional functions that journals serve which have developed over time: validation, filtration and designation. Regarding validation Clarke writes: “To date, no one has succeeded in developing a literature peer-review system independent of journal publication”. Concerning filtration, various new tools, instead of replacing journals, “rely on the filtration provided by journals” (ibid). Clarke continues:

“While there is the possibility that recent semantic technologies will be able to provide increasingly sophisticated filtering capabilities, these technologies are largely predicated on journal publishers providing semantic context to the content they publish. In other words, as more sophisticated filtering systems are developed – they tend to augment, not disrupt, the existing journal publication system.”

As funding and career advancement decisions are based on scientists’ publication record (designation), Clarke, at best, sees a shift away from journal toward article-based metrics. But even then, “change would likely be incremental rather than disruptive” and such a transition “would likely be measured not in years but in decades”. (ibid) Finally, he argues that the main reason why the latter three functions were not easily replaced is that they are not technology-driven but “cultural functions” and therefore not vulnerable to disruption.

Also concerning filtration as one of publishers’ value propositions, Hagenhoff (2006) acknowledges that progress in metadata, harvesting and Semantic Web technologies enables increasingly reliable selection and aggregation of research papers, so that commercial publishers are no longer needed. Similarly, Tim Berners-Lee (n.d.) explicitly describes the disruptive impact of semantic technologies on scientific publishing.

Bernard Lunn (2010) assumes that scientific publishing has not YET been disrupted and lists the key elements of the STM process according to the journal functions outlined by Clarke. First-copy costs are approximately 80 per cent, 96 per cent of articles are available electronically, and “Companies like Wiley, McGraw-Hill, Elsevier, Wolters Kluwer and Springer … [are] in good financial health” (ibid). Therefore, the Internet’s superiority in distribution is not the disruptive force in STM publishing. Nor, in Lunn’s view, are innovative technologies for registration, validation (peer review) and filtration the main drivers of disruption. In contrast to Clarke, Lunn believes that the element

“designation … where the researcher gets credit … may be the main impediment to disruptive change. … Journals have a power law distribution, like a network effect. The best journals attract the best articles, which have the biggest impact on academic reputation and so on. … But we see the same power law distribution in social networks. … As these peer networks do not require the intermediation of a journal brand, they are fundamentally disruptive.” (ibid)

Furthermore, Lunn writes, in some disciplines recognition within an Open Access system may start to have a serious impact on academic reputation (designation) (ibid).

Although there are good reasons to deploy semantic technologies predicated on journal publishers’ content, as Clarke suggests, there is no reason why a more efficient filtration system could not emerge that is independent of journals and publisher-provided metadata, acquiring metadata from other sources and in other ways, for example through contributions from authors and readers. With Open Access paving the way, it is also not evident that such a system would even need publishers’ acquiescence to leverage their assets. Semantic filtration technologies will not only “augment” the journal publishing system but will also have disruptive effects on journal publishers’ offerings.

In addition, given Julia Lane’s (2010) account of the flaws of existing metrics and her suggestions for “measuring ALL activities that make up academic productivity” to “make science metrics more scientific”, Clarke’s affirmation of funding criteria is not entirely convincing.

His notion that filtration and designation are not technology-driven but “cultural” functions is also unconvincing, as a culturally driven function may also be technology-driven. A culture mediated by technology is prone to disruption in the way it is mediated, and that mediation is part of journal publishers’ business.

Because scientific search tools are improving, more scientists will publish Open Access articles (Lunn 2010). It can be argued that publishers’ paid-content business models survive because publishers control the distribution of scientific literature, and they control it because distribution is bundled with their other value propositions, filtration and designation, which have not yet been undermined by new technologies. Hence, Open Access beyond author-fee models could take off as soon as Open Access content is available in formats that allow filtration and designation to be done by machines and by new hybrid solutions based on semantic technologies. Semantic technologies could thereby also unfold a disruptive potential with regard to the distribution side of publishers’ work, undermining their paid-content business models by driving the adoption of Open Access.

In summary, a significant decline in journal publishing revenues in the coming years is realistic, and advanced filtration technologies, social networking services and Open Access models can be identified as drivers of disruption. Accordingly, Lunn’s (2010) conclusion “that we are on the cusp of disruptive change and that it will be brought on by the implementation of social networking and semantic technology” might be a good starting point for further assessment.

Where the Value Goes

Phillips (2009), in Business models in journal publishing, quotes Brunelle’s (2006) report, which emphasises a “basic shift in business models that is mandated by a move from a journal economy of scarcity (print world) to a journal economy of plenty (online world)”, with completely new players flooding the market with free content. Semantic technologies allow grey literature to become more visible, while the increasing amount of freely available data will trigger higher demand for science services (Hagenhoff 2006).

At the same time, the peer-reviewed article is becoming less relevant.

“Grant allocating bodies and researchers themselves rely on the primary research output data rather than text as the main means for evaluation … The research article often is the gateway into a world of simulations, data analysis, modelling etc. … It has ‘links’ to similar databases, to bibliographic databases, has links to images, maps and structures … but it becomes less essential as standalone entity.” (Brown & Boulderstone 2008)

Reasons why journals will be less satisfying in future include the increasing speed of research, the static character of the journal, and the fact that it is a single and relatively isolated mode of communication (Morris 2009). In economics, “top authors are moving away from top journals altogether” (Ellison 2007). The concept of an “article within an issue within a journal becomes redundant. Instead users will ‘subscribe’ to those items that are specifically relevant to their needs irrespective of source” (Brown & Boulderstone 2008). Likewise, the proportion of pure data and the importance of data publishers such as WesternGeco have risen significantly during the last few years (ibid).

Clarke (2010), although opposing the notion that scientific publishing is being disrupted, acknowledges that “new technologies are opening the door for entirely new products and services built on top of – and adjacent to – the existing scientific publishing system”. Besides mobile technologies and Open Data standards, he lists semantic technologies as particularly promising. Citing Tenopir and King (2000), he explains:

“the cost of journals is small relative to the cost, as measured in the time of researchers, of reading and otherwise searching for information … Which is to say that the value to an institution of workflow applications powered by semantic and mobile technologies and interoperable linked data sets may exceed that of scientific journals. If such applications can save researchers … significant amounts of time, their institutions will be willing to pay for that time savings and its concatenate increase in productivity.”

Clarke also confirms that there will be “a downward pressure on journal pricing” and, referring to Nielsen, he emphasises “that acquiring expertise in information technology (and especially semantic technology) – as opposed to production technology – is of critical importance to scientific publishers”.

Bousfield (2009), too, emphasises a trend for publishers to

“move up the value chain. … Elsevier, Thomson Reuters, and Wolters Kluwer Health are rapidly diversifying their STM publishing divisions in order to add more value to their offerings … They are all moving away from traditional article publishing into areas that require enterprise scale content aggregation and analysis”.

Products based on data mining flourish with Open Access licences, and repository services such as “abstracting and indexing, semantic search and discovery tools, and new ways of presenting the scholarly article … all can add value and enable publishers to charge for their services” (Pollock 2009).

Scientific publishing is shifting from selling static pieces of content toward access-centred models for dynamic content in multiple formats, coupled with value-added services. It has also been noted that the boundaries between resources themselves and discovery services are becoming increasingly permeable (Brown & Boulderstone 2008). We

“will witness a strengthening in secondary information systems over primary publications, with a significant growth in A&I [abstracting and indexing] platforms over the next 3-5 years. Services such as Science Direct (Elsevier) and Web of Science (Thomson) are paving the way” (ibid).

In this context, the “extent of value add provided will determine price and market acceptability … [and] larger publishers in particular are looking at redefining their business and moving from a content-focus to a service orientation” (ibid). Publishers are becoming information solutions providers and scientific support service providers.

Whether through disruptive change or an incremental process, Open Access is on the rise, and value is migrating away from journals and content altogether toward technology (Nielsen 2009) and services related to the scientific process (Brown & Boulderstone 2008). David Shotton (2009) predicts that over the coming decade the value of raw text will decrease, while the value of semantic services that help readers find actionable data, interpret information and extract knowledge will increase.

Overall, there are four main areas of growth for scientific information markets: emerging markets (in particular the BRIC countries); mobile content and services; the non-library audience and the extensive-knowledge worker sector, i.e. people outside the institutional research environment (Brown & Boulderstone 2008); and, finally, peer networks and Social Semantic Web services. All four areas reflect a ‘journal economy of plenty’, the declining importance of the journal article and a shift of value toward workflow applications and value-added services related to scientific content, with semantic technologies playing a pivotal role.

Semantic Web

According to the World Wide Web Consortium (W3C), the main goal of the Semantic Web is to extend the principles of the Web from documents to data. Data should be related to one another so that they can be shared and reused across applications and so that possible new relationships can be revealed. To achieve this, the most important requirement is the ability to define and describe the relations between any two resources (W3C 2009). The Semantic Web is about common formats for integrating and combining data drawn from diverse sources, and about a language for recording how the data relate to real-world objects. “That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing” (W3C 2010). In other words, once developers agreed on a common set of definitions, the “markup” language behind each Web page would be cross-referenced into countless other databases (Shannon 2006).
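To make the idea of databases “connected not by wires but by being about the same thing” concrete, here is a minimal sketch in plain Python rather than a real RDF toolkit. It assumes two hypothetical datasets, a publisher’s article metadata and a separate biology database, written as (subject, predicate, object) triples; the example.org URIs and the `ex:` predicates are invented for illustration. Because both datasets use the same URI for the organism, a program starting from the article can walk into the second database without any explicit link between the two.

```python
from collections import defaultdict, deque

# Hypothetical shared identifiers (example.org URIs are placeholders).
ARTICLE = "http://example.org/article/42"
ORGANISM = "http://example.org/organism/leptospira"
DISEASE = "http://example.org/disease/leptospirosis"

# Dataset A (hypothetical): a publisher's article metadata as triples.
publisher_data = [
    (ARTICLE, "dc:title", "An example article"),
    (ARTICLE, "ex:mentions", ORGANISM),
]

# Dataset B (hypothetical): a separate biology database about the same organism.
biology_data = [
    (ORGANISM, "rdfs:label", "Leptospira"),
    (ORGANISM, "ex:causes", DISEASE),
]


def build_graph(*datasets):
    """Merge triple lists into one map: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Breadth-first walk from `start`, following every predicate to its object."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _predicate, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


if __name__ == "__main__":
    merged = build_graph(publisher_data, biology_data)
    # The disease URI is reachable only because both datasets share ORGANISM.
    print(DISEASE in reachable(merged, ARTICLE))                       # True
    print(DISEASE in reachable(build_graph(publisher_data), ARTICLE))  # False
```

In practice the same idea is expressed in RDF and queried with SPARQL; what makes the link possible is the shared identifier, not the storage technology.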

Semantic Publishing

To date, the Semantic Web is still in its infancy, and most publishers are only just starting to implement semantic technologies. To demonstrate their potential, David Shotton and his team handcrafted a semantically enhanced version of a scientific article (Reis et al. 2008). In his paper Semantic publishing: the coming revolution in scientific journal publishing, Shotton (2009) defines Semantic Publishing as

“anything that enhances the meaning of a published journal article, facilitates its automated discovery, enables its linking to semantically related articles, provides access to data within the article in actionable form, or facilitates integration of data between papers. Among other things, it involves enriching the article with appropriate metadata that are amenable to automated processing and analysis, allowing enhanced verifiability of published information and providing the capacity for automated discovery and summarization. These semantic enhancements increase the intrinsic value of journal articles, by increasing the ease by which information, understanding and knowledge can be extracted. They also enable the development of secondary services that can integrate information between such enhanced articles” (ibid).
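One item in this definition, linking an article to semantically related articles, can be sketched as follows. This is a hypothetical illustration, not Shotton’s method: it assumes each article has already been enriched with a set of concept identifiers (the `ex:` terms and the 10.1234 DOIs are placeholders) and treats two articles as related when their annotation sets overlap, measured with Jaccard similarity. The 0.25 threshold is an arbitrary choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    doi: str
    title: str
    # Semantic annotations: hypothetical ontology term identifiers.
    concepts: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of concepts shared by two annotation sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(target: Article, corpus: list[Article], threshold: float = 0.25):
    """Return (doi, score) pairs for other articles at or above `threshold`, best first."""
    scored = [
        (other.doi, jaccard(target.concepts, other.concepts))
        for other in corpus
        if other.doi != target.doi
    ]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Placeholder corpus; DOIs and concept IDs are invented for illustration.
corpus = [
    Article("10.1234/example.1", "Article A",
            {"ex:leptospirosis", "ex:urban-slum", "ex:rodent-reservoir"}),
    Article("10.1234/example.2", "Article B",
            {"ex:leptospirosis", "ex:rodent-reservoir"}),
    Article("10.1234/example.3", "Article C", {"ex:malaria"}),
]

if __name__ == "__main__":
    print(related(corpus[0], corpus))  # [('10.1234/example.2', 0.6666666666666666)]
```

Jaccard overlap is only one simple choice; a production service might weight concepts or exploit an ontology’s hierarchy. Either way, the link can be computed only because the article carries machine-readable annotations.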

Conclusion

Given the likelihood of diminishing journal publishing revenues under traditional business models and the ongoing migration of value toward science support services, Semantic Publishing, as a set of enhanced content features and value-added services, represents a major chance for scientific publishers to exploit additional and nascent business opportunities (ibid).

Recommendations

For further research, it is recommended to assess semantic technologies, as well as existing products and services for scientific information based on them. In particular, social and mobile semantic services should be examined because, besides emerging markets and non-library audiences, these are the fields in which sizeable new income streams can be expected. It is also recommended to examine partnerships and acquisition activities related to semantic technologies. Finally, the wider effects of Semantic Publishing on innovation and generativity in science are a topic worth exploring.

References

Berners-Lee, T., n.d. Scientific publishing on the ‘semantic web’. Available at: http://www.nature.com/nature/debates/e-access/Articles/bernerslee.htm [Accessed May 24, 2010].

Bousfield, D., 2009. Scientific, Technical & Medical Information: 2009 Market Forecast and Trends Report, Outsell, Inc.

Brown, D. & Boulderstone, R., 2008. The Impact of Electronic Publishing: The Future for Publishers and Librarians, München: K.G. Saur. Available at: http://books.google.de/books?hl=de&lr=&id=lpr0EV0JvzwC&oi=fnd&pg=PR15&dq=%22The+Impact+of+Electronic+Publishing+%22+autor:David+autor:J+autor:Brown&ots=0_KAZ1MN3N&sig=jg_RdLEHMo2wjcYDH-Lz1vTIrOk#v=onepage&q=&f=false.

Bruck, P., 2008. Multimedia and E-Content Trends: Implications for Academia. 1st ed., Wiesbaden: Vieweg Teubner.

Brunelle, B., 2006. Publishers Speak Up On Open Access: Big Promise, Small Uptake, Outsell, Inc.

Christensen, C.M., 2008. Reinventing Your Business Model. Harvard Business IdeaCast 122. Available at: http://itunes.apple.com/podcast/harvard-business-ideacast/id152022135 [Accessed February 23, 2010].

Clarke, M., 2010. Why Hasn’t Scientific Publishing Been Disrupted Already? The Scholarly Kitchen. Available at: http://scholarlykitchen.sspnet.org/2010/01/04/why-hasnt-scientific-publishing-been-disrupted-already/ [Accessed April 13, 2010].

Cope, B. & Kalantzis, M., 2009. Signs of epistemic disruption: transformation in the knowledge system of the academic journal. In: The Future of the Academic Journal, Oxford: Chandos.

Ellison, G., 2007. Is Peer Review in Decline? NBER Working Paper. Available at: http://www.nber.org/papers/w13272.pdf.

Hagenhoff, S., 2006. Internetökonomie der Medienbranche, Göttingen: Universitätsverlag Göttingen.

Lane, J., 2010. Let’s make science metrics more scientific. Nature, 464(7288), 488-489.

Lunn, B., 2010. Semantic Wave Hits STM Publishing, Part 1: Current Cash Cows. Semantic Web. Available at: http://www.semanticweb.com/features/semantic_wave_hits_stm_publishing_part_1_current_cash_cows_154355.asp [Accessed April 13, 2010].

Morris, S., 2009. ‘The tiger in the corner’: will journals matter to tomorrow’s scholars? In: The Future of the Academic Journal, Oxford: Chandos.

Nielsen, M., 2009. Is scientific publishing about to be disrupted? Michael Nielsen. Available at: http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/ [Accessed April 13, 2010].

Phillips, A., 2009. Business models in journal publishing. In: The Future of the Academic Journal. Oxford: Chandos.

Pollock, D., 2009. An Open Access Primer – Market Size and Trends, Outsell, Inc.

Reis, R.B. et al., 2008. Impact of Environment and Social Gradient on Leptospira Infection in Urban Slums. R. E. Gurtler, ed. PLoS Neglected Tropical Diseases, 2(4), e228.

Rotman Epps, S., 2009. Eight Models For Monetizing Digital Content, Forrester Research.

Shannon, V., 2006. A ‘more revolutionary’ Web. The New York Times. Available at: http://www.nytimes.com/2006/05/23/technology/23iht-web.html?_r=2 [Accessed April 13, 2010].

Shotton, D., 2009. Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing, 22(2), 85-94.

Stratigos, A.C., Strohlein, M. & Watson Healy, L., 2009. Information Industry Outlook 2010: A New Dawn, New Day, New Decade, Outsell, Inc.

Tenopir, C. & King, D.W., 2000. Towards Electronic Journals: Realities for Scientists, Librarians, and Publishers, Washington, DC: Special Libraries Association.

W3C, 2010. W3C Semantic Web Activity. Available at: http://www.w3.org/2001/sw/ [Accessed April 13, 2010].

W3C, 2009. W3C Semantic Web FAQ. Available at: http://www.w3.org/2001/sw/SW-FAQ#whatarebuildingblocks [Accessed April 13, 2010].
