Business, Innovation & Skills

Supplementary written evidence submitted by the British Library

Text and Data Mining

Text and data mining relates to two activities: one is the mining of facts in material that libraries and organisations have already purchased; the other is the mining of publicly available factual information on the web. The points below focus mainly, but not exclusively, on purchased information.

Because text and data mining is not permitted, research that could be happening is not taking place.

Because of the vast number of publishers around the world, negotiating effective licensing terms for text and data mining is essentially an impossible task, and this is currently acting as a barrier to scientific and academic progress. For example, there are over 6,000 scientific publishers currently active, and the British Library alone has over 260,000 journal titles in its collections. There are, of course, billions of websites. The largest single UK database of scientific material is UK PubMedCentral, and none of its publisher content is available for text and data mining, other than content whose publication has been paid for in advance by the research sector (so-called “open access”).

The National Centre for Text Mining has told us that, given the difficulty of negotiating licences, the vast array of publishers to approach and the fact that academics are not lawyers, most scientists simply do not start the process. Data analysis, of course, takes place horizontally across a field of research, not in vertical silos of medical information produced by a single publisher.

American researchers and companies assert “fair use” in this area and Japan has introduced an exception for data analytics, recognising that the extraction of facts does not trade on the creative expression of a work.

Text and data mining involves analysis by researchers of material that has already been purchased by their institution. UK universities, for example, spend over £250 million a year on acquisitions, mainly journals, and annual purchases by the NHS would increase this figure significantly. An exception allowing further use of the material would give organisations greater justification to continue subscribing to content in a period of pressured budgets and rising prices. Universities and pharmaceutical companies have been subscribing to electronic journals since the advent of e-publishing in the mid-1990s. Continued access to scientific material is central to the reputation and scientific standing of a university or pharmaceutical company, and they would therefore not do anything to jeopardise it. It is important to draw a clear distinction between the behaviour of research institutions, universities, pharmaceutical companies, the NHS and similar bodies in using academic electronic resources, and that of individuals who share files illegally online.

The intent of text and data mining is to extract the facts within text and datasets in order to progress scientific and cultural understanding, not to make copies and distribute the material in its original form. By its very nature, text and data mining seeks relationships between facts expressed across disciplines, in material to which one has lawful access, and it does not result in anything that is substitutable for, or bears any resemblance to, the original published work. Copyright law is designed to protect creative expression, for example in a novel or a painting, and not to prevent researchers extracting facts to progress science. The purpose of text and data mining is not the appreciation or utilisation of creative expression but factual extraction, and the results of the process (no more than a hypothesis or proof of a relationship) should therefore not accidentally be covered by copyright law.

Text and data mining will also allow research that is not currently happening because of cost restrictions to take place. Research can be conducted at a reduced cost through text mining, because existing text can be searched instead of expensive laboratory experiments being set up.

Business and IP Support

The British Library Business & IP Centre supports innovators and entrepreneurs from that first spark of inspiration to successfully launching and developing a new business.

Entrepreneurs’ views on the IP process

In February 2011 the British Library ran an online survey to ask users of its Business & IP Centre their opinions of the IP process, innovation and their plans to grow. The survey showed that the biggest barriers that SMEs face in protecting and exploiting their IP are around finding affordable services, knowing where to go for advice and support, and the feeling that the IP system favours large companies.

Respondents were also asked what improvements or new services they would like to see in the future. They were most interested in having more affordable support and advice, a faster and simpler process to register IP and a unified patent application across the world, or at least across EU markets.

How the British Library Business & IP Centre has helped entrepreneurs with their IP

The online survey showed that over a third of respondents had used the British Library’s IP-related services. The chart below shows that the Centre and its services have made a real difference: they have helped people to understand IP, shown them how to do their own IP searches, and saved them time and money.

What can be done to overcome barriers?

Improving such services across the country and online would be a major step in supporting innovation in the UK. For example, the British Library has developed a number of IP learning tools that could be useful in an online learning environment. As the Business & IP Centre survey shows, start-ups and SMEs rated affordable support and advice services as the most needed change to the current SME landscape. The Centre meets this need by providing free access to its unrivalled business and intellectual property (IP) collections, workshops, networking events and 1:1 advice sessions. Since March 2006, the Business & IP Centre has welcomed over 250,000 people through its doors.

We are unique in that, as the national library of the UK, and along with the Intellectual Property Office, we hold the largest collection of published business and intellectual property information in the UK (if not the world). We make available thousands of market research reports, which give small businesses access to the same resources as those within a major multinational company. We make the link between IP and business information so that we can support users through the whole innovation cycle, from idea to market.

November 2011

Prepared 26th June 2012