Select Committee on Science and Technology Written Evidence


Memorandum from Dr Robert Cannon, Dr Nigel Goddard, and Dr Fred Howell, Axiope

  Summary:  Open Access to scientific publications will save public money, will facilitate scientific communication and will improve access to scientific research by the general public. But opening access to written publications is only the first step in using information technology to improve scientific productivity: the really exciting possibility is to open up access to the actual research data underlying publications, since this would allow significantly more value to be extracted from expensively gathered data sets. Opening access to the electronic research data on which publications are based is technically feasible and would yield dramatic productivity benefits to the scientific community and the nation. The prime example where public access to data has shown its effectiveness is the genomic databases, which have spawned entire new disciplines (eg, bioinformatics) and transformed all other biomedical fields. In order to extend this success to new fields, such as systems biology, functional genomics and systems neuroscience, it is essential to widen access to actual primary data (and not just the publications summarising the findings). Open access to data also offers benefits to commercial publishers, including a path forward for the traditional publishers that embraces the Open Access model. A policy of open access to data is widely supported by funding bodies but requires changes in the ways science is funded and evaluated before it can be achieved.

  1.  Drs Cannon, Goddard and Howell are researchers in Neuroinformatics, the combined field of Informatics and Neuroscience, at the University of Edinburgh. They are also co-founders of a software company, Axiope Limited, that is developing software to facilitate scientific communication and data sharing in accordance with the themes expressed here. We became concerned about access to scientific data because we were aware of many exciting potential research projects that were nevertheless difficult or even impossible to undertake because of the inability to access the results of publicly funded research.

  2.  Access to computers and to the Internet is dramatically changing the way people work. Many actions that used to be difficult, time consuming or expensive are now almost free. For example, it takes a few seconds, and costs almost nothing, to view previous written evidence to this committee. However, the existence of technology that can improve productivity does not necessarily imply that it will be used. This is particularly true within a scientific context where individual researchers are encouraged to be independent. Developments that are generally beneficial to the community may not actually take place in practice because they are not locally beneficial to individual researchers or organisations.

  3.  Science is in an excellent position to benefit from technological developments in network infrastructure and software but has been slow in making good use of them. Thanks to substantial investment in JANET (the Joint Academic NETwork), UK science has benefited from almost universal high speed internet (well in excess of domestic broadband) for the last 10 years. The reasons for the delay in realising the benefits of Internet technology are partly technical and partly cultural. Technical reasons include the availability of suitable software and the training of researchers in IT skills. The cultural reasons concern the motivation and reward systems in place in British and global science. Many organisations are working to address different aspects of the technical problems. For the full benefits of this technology to be realised, government action is required to adjust the conditions in which scientists operate in order to favour behaviours that are ultimately of benefit all round. Some public-spirited researchers are using the Web in an ad-hoc fashion to publish their data, and some journals allow "supplementary data" to be uploaded onto their sites, but there is no systematic approach to, or requirement for, making scientific data public.

  4.  As scientists ourselves who depend on the work of other researchers, we are keen to demonstrate that, whatever mechanism is in place for publishing scientific papers, the new technologies have enormous potential for improving scientific productivity. The need for many scientific studies goes beyond gaining access to the text and graphs that another researcher puts in a paper. There is a need to gain access to the work itself: to the primary data gathered during experiments from which the publication is derived. Normally, very little provision is made for the archiving, cataloguing or storage of this data. The result is that even the researchers who gathered it are unable to locate or reuse it just a few years after the publication of a paper. Researchers themselves are not happy with this situation, but under the existing system, where publishing selected results in high profile journals is paramount, it is very hard to justify the time or resources required to do anything else.

  5.  Nevertheless, with external prompting, some disciplines are moving in the direction of regarding data publication as standard, and the benefits are clear. For example, the availability of genomic datasets (not just papers about them) in public databases has spawned a huge growth in integrative and cross-disciplinary research. Increasingly, particularly in biology and medicine, new research depends on data from many different experimental techniques, often from different laboratories. For example, the benefits of having the raw data available for anyone to analyse have prompted the gene expression community to encourage major journals (including Nature, Science, The Lancet) to require all papers based on gene expression micro-array experiments to also make the raw data from their experiments freely available online. This pioneering effort has led to exhaustive recommendations for presenting and publishing results from micro-array experiments: the "MIAME" requirements. The consequence is that there is now the potential for extracting more value from experiments that have already been done. Furthermore, there appear to be no losers in the game: those who collect the initial data see it being used in ways they had never dreamed of, and the other users are able to do research that would have been impossible without publication of the data.

  6.  The principle that research data gathered at public expense should be made available for maximal use by other scientists has already been adopted by several public funding agencies. The National Institutes of Health in the United States has adopted a requirement[295], from 1 October 2003, that data be made publicly available in the case of grants in excess of $500,000 per annum. The NIH statement makes it clear that the obligation will be extended as methods and technologies become available to achieve this. The UK MRC has a policy on data sharing[296] that begins with the statement:

    "The MRC expects investigators supported by MRC funding to make their research data available in a timely and responsible manner to the scientific community for subsequent research with as few restrictions as possible."

  More recently, on behalf of a consortium comprising BBSRC, MRC, Wellcome Trust, JISC, DTI and NERC, the MRC has issued an invitation to tender for analyses of the data sharing landscape in the life sciences[297]. The same sentiments have been endorsed by the OECD[298], with a statement by the Dutch Minister of Education, Culture and Science including the following:

    "It is obvious that Open Access will be a necessary condition to realize the potential of research data as the floating capital of global science. Governments of OECD countries spend about $650 billion annually on research, expanded use of data sources could impressively increase the taxpayers' value of this expenditure."

  We note that the term "Open Access" is used here to refer to primary research data, not research publications.

  7.  Attitudes towards data sharing in the scientific community cover the whole range from those who regard it as scandalous that data is not routinely accessible, to those who regard the data they collect as their own personal property and resent any suggestion that it should be made available to other researchers. What is clear, however, is that preparing material for archiving and sharing incurs a cost in both time and materials. While the eventual benefits are great, the main beneficiaries are other scientists, not those who are responsible for doing the work. Therefore, to achieve the transition into a data-sharing culture, which is of greater benefit to everyone, individual scientists need immediate reasons to prepare and publish their data. One such inducement is the requirement by some journals that the data upon which a paper is based must be in a public archive before the paper can be published. Another possibility is that funding councils impose conditions on grants that researchers must publish the resulting data (as the NIH is beginning to do). A third possibility is to introduce a system that adequately rewards researchers for making data publicly available, perhaps with conditional grant extensions of funds to cover data publication costs.

  8.  Traditional scientific publishers have become content providers who face the challenge of the Open Access model, as put forward by, for example, BioMed Central and the Public Library of Science. We see significant opportunity for the traditional publishers to refocus their business on value added to the content provided, at no cost, by scientists. It is already the case that journals which may charge for access to publications do not charge for access to the data underlying those publications, in the cases where it is available. Typically these data are held in publicly funded databases (eg, GenBank), and it is clear that scientists will not agree to deposit their primary data exclusively in a commercial database. It is vital that the ownership of the primary research data remains in public hands and that the data remain publicly accessible. Traditional publishers can refocus their services on adding value through curation, cross-referencing and providing the best possible search and discovery facilities. There is a parallel with the Open Source movement in software development. Traditional software companies (including the largest players such as IBM) have found that they can increase their value by promoting Open Source and providing additional services. Scientific publishers need to make a similar transition.

  9.  Recommendations. The Government has a crucial role to play in catalysing the transition to a scientific culture in which research data is routinely stored in publicly accessible archives. It can do this by:

    (a)  providing funds and assistance for pioneer research groups to establish complete electronic archives of their results, and encouraging the administrators of these archives to co-operate with commercial entities that provide added-value services.

    (b)  acknowledging the importance of making data publicly available by rewarding data publication in terms of the Research Assessment Exercise and other evaluations.

    (c)  in due course mandating that all publicly funded research data should be publicly accessible within a specified time of acquisition (subject to patient confidentiality conditions, etc).

February 2004

295   NIH Data Sharing Policy

296   MRC policy

297   Joint council's initiative

298   OECD on Access to Research Data from Public Funding


© Parliamentary copyright 2004
Prepared 20 July 2004