Session 2010-12
Publications on the internet
Written evidence submitted by Reed Elsevier plc
1. Introduction
1.1 Reed Elsevier is a world-leading provider of professional information solutions in the Science, Medical, Legal, Risk and Business sectors. Our products include academic journals, books and databases, legal texts and analyses, business to business magazines and websites, information products for the banking and insurance sectors and trade exhibitions. Our professional customers use our products every day to advance science, improve medical outcomes, enable better decisions, enhance productivity, evaluate risk and gain insight. Our key brands include SciVerse ScienceDirect [1], SciVerse Scopus [2], the Lancet, Cell, Brain Navigator [3], Geofacets [4], New Scientist, XpertHR [5], Lexis®PSL [6] and Lexis.com [7].
1.2 A FTSE 100 company, we employ more than 30,000 people, of whom 4,600 are in the UK. We delivered £6.1bn of revenue in 2010, of which £5.6bn was from sales outside the UK; we are a significant exporter. We are also a major investor in technology; 86% of our science and technology revenues and 74% of our legal and professional information revenues are derived from electronic products and services. Alongside our editors, journalists and researchers, we currently employ some 4,500 IT professionals.
2. Summary of Submission
2.1 The Hargreaves Review of Intellectual Property was a welcome initiative highlighting the importance of IP to the UK economy. Our response to the Call for Evidence concentrated on copyright issues. We believe that strong copyright protection and enforcement are essential to investment and growth; we disagree with the assertion that IP needs to be weakened to support innovation. Over the last decade Reed Elsevier has invested some £2.02bn to combine technology and authoritative content to deliver enhanced functionality to the user. Some of these products are digital evolutions of existing print products; some are startups. Without IP protection these investments could not have been made, and the ensuing benefits for our academic and professional customers would not have been realized. We were therefore disappointed with the outcome of the Hargreaves Review and with the Government Response which, despite claiming to understand the importance of copyright, proceeded to propose unilaterally weakening it.
2.2 In this submission we focus in particular on the proposal for new UK and EU exceptions to copyright for text and data mining [8], which we view as unnecessary, premature and potentially damaging to the future competitiveness of the information industry.
2.3 Text mining is, at its simplest, machine-based bulk reading: processing large volumes of text in order to identify and extract relevant information and relationships.
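By way of illustration only, the following Python fragment sketches one very simple form of such bulk reading: counting how often pairs of terms of interest appear in the same document across a small collection. The sample abstracts, the term list and the function name are hypothetical, invented for this sketch, and are not drawn from any Elsevier product; real text mining systems rely on considerably more sophisticated language processing.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrences(documents, terms):
    """Count, for each pair of terms, how many documents mention both."""
    wanted = {t.lower() for t in terms}
    counts = Counter()
    for text in documents:
        # Reduce each document to its set of lower-case words.
        words = set(re.findall(r"[a-z]+", text.lower()))
        # Keep only the terms of interest that occur in this document.
        present = sorted(wanted & words)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical abstracts, invented for illustration only.
abstracts = [
    "Magnesium deficiency was observed in patients with migraine.",
    "Migraine frequency fell after magnesium supplementation.",
    "Aspirin reduced fever in the control group.",
]
pairs = cooccurrences(abstracts, ["magnesium", "migraine", "aspirin", "fever"])
```

In this toy collection the pair ("magnesium", "migraine") is found in two documents and ("aspirin", "fever") in one. The value of such a relationship count grows with the size of the collection being read.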
2.4 The Hargreaves Review paints a picture of publishers and other right holders blocking vital life-saving research through manipulation of out-of-date copyright laws. The Review omits to mention that many publishers, including Reed Elsevier, fully understand the value of data mining as an aid to speeding up and enhancing research and are actively supporting its development.
2.5 The issue is whether creating an exception for text mining will deliver the results that both the Hargreaves Review and the UK Government desire, and if so at what cost, or whether it will in fact have the opposite effect of stifling investment, damaging academic publishing, driving authors and publishers away from the UK and putting the UK out of step with international intellectual property treaties and EU law.
2.6 Hargreaves and the Government both state that too many decisions in the past have been supported by poor evidence [9], and call for decisions to be made in future on the basis of good evidence, balancing economic objectives and the needs of various groups [10]. It is noteworthy, therefore, that while supporting document T [11] of the Review cites several examples of the benefits of text mining to organizations interrogating their own internal or third party data, when it comes to evaluating why a blunt exception to publishers' copyright is the correct course, as opposed to normal licensing mechanisms, supporting document EE [12] of the Review contains no economic impact analysis of the recommendation.
2.7 Whilst we agree that technological solutions provide significant benefits to organizations, we do not consider that the conclusion to be drawn from such examples is that data mining as a whole should be made exempt from copyright laws. The text mining in the six case studies cited [13] as evidence of the benefits of text mining has, by definition, taken place without a copyright exemption being in place.
3. Text Mining Exception
Background
3.1 Reed Elsevier believes that text and data computer analytics are likely to become an increasingly important activity for many players in any area where large digital content sets have become available. Indeed, the Marc Weeber example [14] describes how he used automated text mining to carry out a preliminary machine-enabled read of some 40,000 search results, which enabled him to reduce his final reading list to 20 or 30 papers.
3.2 It should be noted that it is only after the original investments to produce high quality enriched content and text delivery technologies have been made in the first place, that further value can be extracted by data mining techniques. Reed Elsevier aims to manage its content in modern digital formats in ways that facilitate the easy access, use and re-use of that content. Our policy is to enable text mining as far as possible. We are working hard to develop options to make this technically possible. We have already facilitated this service across many sectors, including corporate pharmaceutical and academic chemistry customers, index service providers and mining technology application developers.
3.3 We have also mined our own content to develop important new productivity tools that serve science research leaders in universities, funding bodies, and governments. For example, SciVal Spotlight [15] is a highly innovative product that mines Scopus [16], the world’s largest database of research articles and citations. SciVal Spotlight’s proprietary algorithm uses a supercomputer to process terabytes of data to generate unique maps that identify areas of research in which universities or nations are especially prolific or influential. As such, they provide much needed quantitative and objective evidence to help UK universities and funding bodies identify their unique strengths as measured by global standards, and then to focus research investments on those areas of strength. In April 2011 the UK Department for Business, Innovation and Skills (BIS) commissioned Elsevier to assess the performance of the UK’s research base relative to other countries. The findings, which will be released by BIS in late September, will be used to inform the UK’s research and innovation policy and strategy for years to come.
Current model
3.4 Elsevier currently has two main categories of customer who engage in data mining [see attached confidential client list] [17]. The first group is primarily large research institutions or corporations seeking to roll out internal text mining solutions, such as Harvard University and major pharmaceutical companies. The motivation of this category is to secure competitive research advantage by increasing data processing efficiencies internally and unlocking new insights. The second category, smaller scale application developers, are either working in partnership with us to develop additional functionalities for our own content databases, or are creating separately identified commercial application offerings. Academics in the field of Bioinformatics also look to text mine our content to develop new tools for pure research purposes or on occasion as part of an academic qualification process.
3.5 Providing content that can be mined requires special infrastructure and dedicated support. Though it is possible theoretically to consider text mining as something done with respect to a single article, it typically occurs against a whole collection because it is more useful the bigger the collection. We have discovered that the text mining capacities and needs of our customers can vary considerably, so we have developed two delivery mechanisms to offer our customers.
3.6 The first option, ConSyn [18], is designed for users using very large quantities of full text from our ScienceDirect database of electronic journal and book content. We are shortly to launch the second, FTAPI (full text API) [19], which is recommended for users accessing content in small to medium quantities (up to 10,000 articles). ConSyn allows the download of large quantities of full text in machine readable XML format. Once an entitlement is set up, customers are free to select and download content according to their preferences. Customer profiles can be based on categories, lists of journals, publication year or ranges, and content "richness", e.g. full content, abstract only, or abstract and references. FTAPI enables users to search using ScienceDirect’s search syntax and then download articles one by one.
3.7 Customers are authorized to download the agreed required quantity of content from our server and text mine locally for their own research. Our terms provide that locally loaded content may be stored only for the period of use and not indefinitely, with limitations as to whom it may be made available, according to the needs of the research being undertaken. There may also be arrangements as to the quantity displayed. For example, to prove a trend from a large amount of content, the system often has to display snippets or larger sections of the underlying articles. Parties will agree what is necessary for the project in question.
These contractual limitations exist for two important reasons:
4. Platform stability
4.1 First, from a technology perspective, in order to text mine our content it needs to be copied wholesale and then transferred to the customer’s location, or crawled on our server. Current technology does not permit multiple-party crawling of our host server at the speeds that are necessary for efficient mining. Where contractual arrangements do exist for crawling our content on our host server, such as with Google indexing, this is done in a managed way that does not jeopardize service delivery to other users.
4.2 Text mining is a continuous experimental activity that typically requires multiple runs to achieve a desired result. We are working hard to respond to increasing customer demand for text mining with tools such as ConSyn and FTAPI mentioned above. These are accessed on a managed scale as part of our overall customer offering to ensure the underlying platforms can cope and to ensure we avoid system failure. The scale of our publishing output is such that, at any one time, Elsevier’s core scientific publishing platform, ScienceDirect, enables more than 9.5 million articles/book chapters from 2,500 journals and 11,000 books to be accessed by over 10 million users across 120 countries. ScienceDirect's operational service level runs at 99.6% availability, with around 1 million transactions per hour at peak times. To maintain this level of service is a major infrastructure challenge. Text mining is a complementary service we offer to our customers, but it is important that it does not technically compromise our core publishing operation or other online services. Broad uncontrolled crawling would be indistinguishable from a denial of service attack and would cause a shutdown of the system.
5. Data Security
5.1 Second, once the content has moved wholesale to an external location out of our control, there need to be contractual and technical protections in place to prevent onward data misappropriation. As well as protecting the platforms from which we deliver our content, it is self-evident that we need to protect the content itself from misappropriation. Contractual licenses enable content owners to validate the identity and research bona fides of potential text miners and limit the risk of misappropriation. Our terms stipulate that customers authorized to mine may not perform systematic, substantive extraction or harvesting in a way that would compete with the value of the original journal articles. Without such practical contractual legal protection the publisher’s core asset would be at serious risk of unauthorized redistribution.
6. Impact of Data Mining exception as proposed by Hargreaves
UK would become an unattractive location for copyright
6.1 Licenses are necessary to address legitimate technical and data security concerns about protecting our core publishing operation. A text mining exception which required us to give unlimited, unfettered access and right of reproduction to anyone who claimed a text mining purpose would be unmanageable and unpoliceable.
6.2 Furthermore, in addition to the practical, commercial and technical objections, the proposed UK interim exception would put the UK out of step with international intellectual property treaties and EU law. The UK would be alone in the European Union in allowing data mining without permission. Since right holders in other Member States will be able to rely on more robust protections against unauthorized reproduction under those States' implementation of EU copyright law, this will result in the UK becoming an unattractive location for copyright. It could discourage right holders from undertaking publishing operations in the UK and instead provide opportunities to competitor nations who enjoy more robust copyright protections.
Risk of adverse consequences for UK based readers
6.3 Right holders who do not want to risk their protected works becoming the target of unrestricted data mining without protection may seek to ensure such works are not accessible in the UK. As English law will apply regardless of the origin of the work, it is foreseeable that the owners of foreign copyrights could set up firewalls preventing access by users located in the UK. Whilst the UK is an important producer of high quality academic content, producing 6% of scientific articles published and accounting for 14% of the world’s most highly cited articles, UK e-sales represent only approximately 3-5% of total STM global e-revenue. Some content owners may choose to forgo the UK market to avoid the downside risks outlined above, thereby depriving UK users of access to valuable content.
6.4 In addition, if the proposed contractual override bar were to be implemented on top of the exception, there would not even be any safe harbour by which UK customers could agree to disapply the exception. This could have the unintended effect of limiting access even to those UK based customer users who had no interest in availing themselves of the data mining exception, because they could not by agreement contract out of the exception. This is a major restriction on the freedom of independent parties to contract as they wish.
Uncertain that UK developers would benefit the most
6.5 It will be argued that the exception could create opportunities for UK SME application developers and other technology players. Our experience to date, after opening up our ScienceDirect platform to application developers worldwide, is that of the 20 developers taking up the opportunity, 11 were from the US and the others were from China, India, Germany and Australia. None came from the UK. It is not a given that UK operators will be the main beneficiaries of any such growth opportunity. It could just as easily be developers from China, India or the US.
Unfair burden likely to breach international treaties
6.6 Due to current EU law, which the Government seeks to amend [20], the UK interim exception would apply only to text mining for non-commercial use. We do not believe this to be sustainable in practice. Due to the mass reproduction required in order to undertake data mining, it will be virtually impossible to distinguish between commercial and non-commercial purposes. We would contend that, because of the added value attributed to content derived from data mining, its use will almost always be commercial in nature at some point. Science is full of cases of pure research which ended up having a commercial impact - indeed the Government’s declared policy is to increase the quantum of publicly funded research which does eventually result in commercial benefit to UK plc. An additional factor will be the number of instances of mixed funding for research projects, where some of the interest is commercial and some purely academic.
6.7 It is unreasonable to expect publishers worldwide to be able to police after the fact who has text mined in the UK and for what purpose, with the only redress available being a case of copyright infringement through the courts post hoc. By seeking to broaden EU and UK law to give a copyright exception to commercial users, the UK Government appears to favour removing the right of UK publishers to consent to global technology companies profiting from the use of copyright work, without provision for fair compensation. To quote: "the amount of fair compensation provided would be zero" [21].
6.8 It is not clear that the proposed exception would satisfy the International Berne Convention test whereby exceptions must not conflict with the normal exploitation of the work nor unreasonably prejudice the rights of the author. If it is true that data mining is the new reading and the likely future mainstream method of consumption of information, then it is set to become 21st century "normal" exploitation and any exception would be contrary to the treaty.
Proposal is premature – No market failure
6.9 This highlights the fundamental problem with the exception proposal. Text mining is still in its infancy. No one yet knows how it will develop. But this is not a case of market failure – indeed as we have demonstrated above, quite the opposite. We are investing heavily to provide facilities for customers who want to text mine. But even the most technologically advanced publishers like Reed Elsevier are still feeling their way. Data mining poses considerable risks to data and platform security which need to be managed. For real progress, issues such as standardization of formats will also need to be addressed. The obstacles to further advances are not legal barriers that a blunt exception would solve. Indeed we have outlined some of the damaging unintended consequences that could result from such a step.
7. Conclusion
· Text mining is an exciting new development in improving discoverability of research. We share the Government’s objective in encouraging greater use of analytics to deliver innovative products and increase knowledge, and Reed Elsevier is actively investing in tools to enable this facility.
· However, we believe that there is a misconception as to the technical, practical and legal impact of the proposed text mining exception to copyright, which has led to incorrect conclusions as to the best way to achieve the objective.
· The proposals are premature, and in our view would be counterproductive, detract from the objective and dissuade existing investment. Far from delivering growth, they risk prejudicing the domestic professional publishing industry, currently a market leader in its sector, against foreign competitors. They should be reconsidered.
[1] SciVerse ScienceDirect - http://www.sciencedirect.com - is a full text scientific electronic database offering digital content from more than 2,500 peer reviewed journals and 14,000 books. There are currently more than 10 million articles available electronically, some dating back to the 1820s, accessed by over 10 million readers across 120 countries.
[2] SciVerse Scopus - http://www.info.sciverse.com/scopus - is the world’s largest abstract and citation database of peer-reviewed literature and quality web sources. It holds 42.5 million records, 22 million with references back to 1996 and 20 million back to 1823. It also references 24 million patents from 5 patent offices.
[3] Brain Navigator - http://www.brainnav.com - has revolutionised the way neuroscientists work, becoming a central hub for research. It is an online solution, based on Elsevier’s vast library of brain publications that enables neuroscientists to browse and compare atlas plates, species and diagrams using 3D models which facilitate the visualization of brain structures and improve their understanding. In 2009 it won a PROSE award for best e-product in Biological and Life Sciences.
[4] Geofacets - http://www.info.geofacets.com - is an innovative search and discovery platform that combines a wealth of geo-referenced maps and related data, intuitive search functions and proven Elsevier content into one easy-to-use web-based tool to provide geologists with the information they need to make accurate geological assessments, and to help oil, gas and other energy companies to improve their productivity, for example, by increasing their strike rates when drilling for oil.
[5] Xpert HR - http://www.xperthr.co.uk - is the UK's most comprehensive online source of legal compliance, good practice and benchmarking information made available to HR professionals.
[6] Lexis®PSL - http://www.lexispsl.co.uk - Content is compiled by an expert team of hundreds of solicitors and barristers who themselves have worked in every size and type of practice, who select, organise and link primary law, legal knowhow, precedents, forms and authoritative commentary for practising lawyers.
[7] Lexis.com - http://www.lexisnexis.com - LexisNexis® is a leading provider of information and business solutions to professionals in a variety of industries - legal, risk solutions, corporate, government, law enforcement, accounting and academic.
[8] Page 8 of the Government Response
[9] Hargreaves Review Chapter 2
[10] Government Response Paragraph 3
[11] Supporting doc T Hargreaves Review
[12] Supporting doc EE Hargreaves Review, 3rd and 29th pages (not numbered)
[13] Supporting doc T Hargreaves Review – Marc Weeber et al, Thomas C Wiegers et al, Leach S.M. et al, Varun K Gajendran et al, Fan W Wallace et al (Dow Jones Chemicals), Ananiadou S, et al.
[14] See footnote 13
[15] An Elsevier Product - SciVal Spotlight - is a customised web-based mapping tool that enables institutions or countries to evaluate their research performance and set/adjust strategies. It is designed to provide up-to-date information to support decisions about fund allocation, staffing, and which new research areas to pursue.
[16] An Elsevier Product - SciVerse Scopus - is the world’s largest abstract and citation database of peer-reviewed literature and quality web sources. It holds 42.5 million records, 22 million with references back to 1996 and 20 million back to 1823. It also references 24 million patents from 5 patent offices.
[17] Confidential Customer Information – Supplied to the Committee under separate cover.
[18] An Elsevier text mining tool
[19] An Elsevier text mining tool
[20] Paragraph 7, page 8 Government Response
[21] Government Response page 8, first bullet point