Select Committee on Science and Technology Written Evidence

Letter from the Sanger Centre

  I am replying to the letter of 20 July addressed to Dr John Sulston regarding the House of Lords Enquiry into Human Genetic Databases. I am head of the Informatics Division at the Sanger Centre, and hence responsible for databases. The Sanger Centre is a genome research institute, funded primarily by a charity, the Wellcome Trust. It is operated by a charitable limited company, Genome Research Limited, which is legally regarded as being controlled by the Wellcome Trust.

1.  What current projects involve collecting genetic information on people in the UK? What other projects are about to start? Are there collections of material (eg tissue samples) that could be used to generate databases of DNA profiles?

  I am confining my answer to this question to databases built or primarily maintained at the Sanger Centre.

  Our major activity has been to contribute to the public-domain reference sequence of the human genome, of which we are committed to finishing a third. Almost all of the sequence is obtained from bacterial clones (BACs, PACs and cosmids) made elsewhere, each of which contains of the order of 100,000 bp (base pairs) of human DNA from anonymised donors.

  We have also been part of a consortium to identify SNPs (Single Nucleotide Polymorphisms), which are points in the genome where people differ genetically. In the near future we will extend these studies to linkage disequilibrium analyses, which look at the covariation of nearby SNPs, ie the extent to which differences that are close together in the genome occur together in individuals. To support this we hold DNA derived from cell lines from multiple individuals.
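  The covariation described above is commonly quantified with the disequilibrium coefficient D = p(AB) - p(A)p(B) and its normalised form r². The following minimal Python sketch computes both for two biallelic SNPs. It assumes phased haplotypes written as two-character strings, a hypothetical input format chosen only for illustration; it is not the Sanger Centre's analysis software.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic SNPs.

    Hypothetical input format (assumed for illustration): a list of phased
    haplotypes as 2-character strings, e.g. "AG", where the first character
    is the allele at SNP 1 and the second the allele at SNP 2.
    """
    n = len(haplotypes)
    # Pick the allele seen in the first haplotype at each SNP as "allele A"/"allele B".
    a = haplotypes[0][0]
    b = haplotypes[0][1]
    p_a = sum(h[0] == a for h in haplotypes) / n              # frequency of A at SNP 1
    p_b = sum(h[1] == b for h in haplotypes) / n              # frequency of B at SNP 2
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n  # A and B together
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # r² is undefined if a SNP is monomorphic
    return d, r2


print(linkage_disequilibrium(["AG"] * 4 + ["TC"] * 4))      # (0.25, 1.0)
print(linkage_disequilibrium(["AG", "AC", "TG", "TC"]))    # (0.0, 0.0)
```

When the alleles always travel together (only "AG" and "TC" observed), r² = 1, complete disequilibrium; when every combination is equally common, D = 0 and the two SNPs vary independently.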

  Another large-scale project, started during the last year, is to identify somatic mutations involved in cancer by examining material from tumours. (Somatic mutations are not inherited from a person's parents; they arise in the body of the person who develops the cancer.) We therefore hold collections of tumour tissue and derived cell lines to support this work.

  We also have a number of smaller projects, for example a study of the human MHC (Major Histocompatibility Complex) locus, which is one of the most variable regions of the human genome and is involved in the immune response and self-recognition.

  Any sample of human DNA or tissue could be used to generate a DNA profile. As described in (4) below, our primary means of protecting privacy is to anonymise samples so that results cannot be traced back to the original donor.

2.  Why are these genetic databases being assembled? How are these activities funded? What practical considerations will constrain developments? Are there alternative ways of fulfilling the objectives?

  The reference sequence of the human genome is being determined as a fundamental reference resource for research on human molecular biology and genetics, and it will underpin much future biomedical research worldwide. Our contribution to the sequence is funded by grants, primarily from the Wellcome Trust, with a relatively minor contribution from the Medical Research Council. By making the sequence freely available without any attached IPR restrictions, we aim to avoid constraints on future use. The human genome sequence is so fundamental that reading it, and its variation across the human population, will be important to a very wide range of applications, many of which cannot now be envisaged. Given this, alternative approaches such as relying on a private provider that places constraints on use are not appropriate, since it would not be desirable for any one body to control the resource.

  The work on SNPs and linkage disequilibrium will help provide a foundation for a detailed understanding of human population structure, and will contribute to future genetic disease mapping. It is funded by a consortium of pharmaceutical companies and the Wellcome Trust; the consortium is committed to making the results publicly available without constraints.

  The work on cancer variation will contribute to our understanding of cancer, and thus hopefully to its cure. In particular, it may help to distinguish between cancers that appear identical by other criteria. It is funded by the Wellcome Trust. Like our other projects, this work will aid fundamental biomedical science, which is a primary aim of the Wellcome Trust.

3.  What is the genetic information that is being collected? How is it being stored and protected?

  For the reference sequence, the primary information being collected is the sequence of nucleotides ("A"s, "G"s, "C"s and "T"s), which is copied to many sites around the world. The physical clones from which the sequence is obtained come from widely distributed libraries, and individual clones of interest can be obtained from a variety of sources.

  For SNP, linkage disequilibrium and cancer studies the information that is collected is the position and nature of variations from the reference sequence in individual (anonymised) samples. This information is again stored in computer files and databases, which at the appropriate time will be made publicly available.
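  As a simplified illustration of recording the position and nature of variations from the reference sequence, the sketch below lists single-base substitutions in a sample, then treats variants present in a tumour but absent from the same person's normal tissue as candidate somatic mutations. It assumes, hypothetically, that the sequences are already aligned base-for-base and that each sample is a single sequence; real studies must also handle alignment, insertions and deletions, the two copies of each chromosome, and sequencing error.

```python
def substitutions(reference, sample):
    """List (position, ref_base, alt_base) wherever `sample` differs from `reference`.

    Assumption for illustration: both sequences are aligned base-for-base
    (equal length, no insertions or deletions). Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s and s != "N"  # 'N' marks an unknown base, not a variant
    ]


def somatic_variants(reference, normal, tumour):
    """Variants seen in the tumour but not in the same person's normal tissue."""
    inherited = set(substitutions(reference, normal))  # germline differences
    return [v for v in substitutions(reference, tumour) if v not in inherited]


ref, normal, tumour = "ACGTACGT", "ACGTTCGT", "ACGTTCGA"
print(substitutions(ref, tumour))             # [(5, 'A', 'T'), (8, 'T', 'A')]
print(somatic_variants(ref, normal, tumour))  # [(8, 'T', 'A')]
```

Here the change at position 5 is also present in the normal tissue, so it is treated as inherited; only the change at position 8 is reported as somatic.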

4.  How do the organisations involved see their responsibilities regarding privacy; consent; future use; public accountability; and intellectual property rights?

  We believe that, for as many broad research resources as possible, the correct model is to anonymise the underlying samples so that they cannot be traced back to the original donors, and on that basis to obtain very broad consent for future use. This approach protects privacy without impeding future research. Since the Sanger Centre is involved primarily in basic research, this is the approach we prefer to take. Our human sequencing and human variation projects have so far always used anonymised sources.

  Of course, full anonymisation is not appropriate where a study might involve going back to the donor, for example to change the care of a patient involved in a trial, or where further information may be required later (as in prospective long-term studies). We have less experience with such cases, but cannot rule them out in future. In these cases specific, fully informed consent is appropriate, with further consent sought for any subsequent use not covered by the initial consent. Anonymisation should also be used as far as possible, so that access to information concerning the patient is minimised. Finally, all data that might lead to identification (on computer or on paper) must be kept secure.
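  One common way to combine the possibility of re-contact with minimal access to identities is keyed pseudonymisation: researchers see only a study code derived from the donor's identifier using a secret key held by a separate custodian. The sketch below uses HMAC-SHA256 from Python's standard library; the identifier format, the 16-character code length and the custodian arrangement are assumptions for illustration. This is pseudonymisation, not full anonymisation: anyone holding both the key and the list of donors can re-identify a sample.

```python
import hashlib
import hmac
import secrets


def pseudonym(donor_id: str, key: bytes) -> str:
    """Derive a stable study code from a donor identifier with HMAC-SHA256.

    Researchers store only the code. Linking a code back to a donor requires
    the secret key, assumed here to be held separately by a custodian.
    """
    digest = hmac.new(key, donor_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; the length is an arbitrary choice


key = secrets.token_bytes(32)          # hypothetical custodian's secret key
code = pseudonym("patient-0042", key)  # hypothetical identifier format
print(code)  # 16 hex characters; differs on every run because the key is random
```

When re-contact is justified, the custodian can recompute a donor's code to find the right patient, while the research database itself holds no names. Because the key is the only link, it must be protected as carefully as any identifying record.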

  As an institute funded by grants mainly from charity or public agencies we believe that we should be accountable for our activities and indeed we produce reports on the progress of our grant-funded activities. Furthermore, the results of our research are made available in a timely fashion by publication and/or distribution across the Internet.

  Concerning intellectual property rights (IPR), we believe that genomic sequence is a fundamental resource that is precompetitive and should be made available without IPR attached. However, we recognise that IPR is important when materials or data are close to being converted into products such as pharmaceuticals, where protection is needed for the substantial investment in product development.

  We expect that for some programmes at the Sanger Centre, where results tell us more about function and are closer to therapeutic application, we will take out IPR. But we also expect that most licensing terms would be non-exclusive, with exclusive licences only for material clearly close to creating a product that requires substantial resources to bring to market.

  Our view is that patenting a gene is reasonable where a significant function has been directly established; in that case the patent should cover the application of that function and its natural extensions, rather than all possible uses of the gene, including speculative and not-yet-envisaged ones, as granted in composition-of-matter patents. The reason for this position is that we remain largely ignorant: almost nothing is known about almost all genes. Granting rights over all possible applications of a gene, as has happened, discourages research by others in both the industrial and, increasingly, the academic sectors.

  Unfortunately there has been a "land grab" for rights to genes based on sequence in the absence of clear functional assignment. An example of how this can lead to surprising results is the CCR5 gene. Human Genome Sciences (HGS) filed a speculative composition of matter patent application on CCR5 based on its sequence and similarity to a broad class of receptors. Subsequently researchers at NIH established independently that CCR5 was a coreceptor for HIV, the AIDS virus, and an important therapeutic target. HGS updated their patent application and were awarded a patent with complete rights to all uses including for AIDS research. However almost all the investment of time and money, and the inventiveness, was made by the NIH researchers. This type of situation has a negative effect on research into gene function.

  We therefore encourage this Committee, and the Government, to argue for a narrow interpretation of patent rights over genes. This is currently a matter of debate at the EPO (European Patent Office). The main counter-argument is that the USPTO (US Patent and Trademark Office) is still granting broad rights, although it has recently tightened its criteria somewhat. We must of course maintain a competitive position with respect to the United States. Any pressure that can be brought to bear on the USPTO would be very positive.

  Finally, the IPR focus in genetic research is shifting from materials that can be patented to information in databases. Much of this information is secondary, derived by analysis and annotation that combine multiple sources. Here we believe there is a creative element, and copyright-type rights are appropriate, potentially allowing recovery of the costs of assembling the data collection. These rights should not extend to others who reach similar conclusions independently. There is a new European database copyright which may be of interest in this regard, but its effectiveness and use have not yet been fully established. Notwithstanding this position, for publicly or charitably funded research we believe that in most cases the most effective way to disseminate results is to write off the cost of producing the data and place the resulting database in the public domain without IPR constraints. This allows the results to be used as effectively as possible without complications, and is the approach we take for all the databases we generate at the Sanger Centre.

5.  How do they see their activities in the area of genetic databases developing in the future? What advances in sequencing, screening and database technology are they anticipating?

  We see primary genomic DNA sequence as being of very high value to biological research.

  Even though the human genome is mostly sequenced, we expect demand to continue to grow in the medium term (five years), primarily for sequencing other organisms of research or economic interest.

  We expect the demand for genetic information on human sequence variation to grow enormously on the same time scale, partly to study the contribution of inherited genetic variation to disease, and partly to study the somatic variation (variation within an individual's body) that underlies cancer. We are directly involved in studies of both types, which we expect to grow. Similar studies will be required to analyse genetic traits in organisms of economic importance, which will aid traditional breeding and provide information for potential genetic modification, and in model organisms, which will improve understanding of the basic biology that informs medical research.

  Under the influence of these forces, automation and efficiency will continue to improve, reducing costs progressively. We do not foresee a dramatic reduction in costs per unit of information on the five year timescale, but internationally we expect capacity to continue to increase.

  Demands on data management, however, will grow dramatically. First, the amount of primary data is increasing faster than Moore's law for the growth of computing power (a two-fold increase every 18 months, which has held for the last 30 years). Second, much of the information of interest will be secondary, derived from the primary information, and in many cases it grows faster than linearly with the primary data; ie if the amount of sequence doubles, the amount of secondary information more than doubles.
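  The arithmetic behind these two points can be made concrete. Doubling every 18 months over 30 years is 20 doublings, a factor of 2^20 (about a million). For the second point, one illustrative kind of secondary information, assumed here only as an example, is all-against-all comparison of sequences, whose count grows as n(n-1)/2; doubling the number of sequences then roughly quadruples the comparisons. The Python sketch below computes both.

```python
def moore_factor(years: float, doubling_months: float = 18.0) -> float:
    """Growth factor after `years` if capacity doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


def pairwise_comparisons(n: int) -> int:
    """Distinct pairs among n sequences, n(n-1)/2 (an assumed example of secondary data)."""
    return n * (n - 1) // 2


print(moore_factor(30))  # 20 doublings: 1048576.0
print(pairwise_comparisons(1000), pairwise_comparisons(2000))  # 499500 1999000
```

Doubling the input from 1,000 to 2,000 sequences multiplies the number of comparisons by about four, which is why secondary information can outpace even fast-growing primary data.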

6.  What lessons should be learnt from genetic database initiatives in other countries?

  For the reasons given at the end of the last section, it is now even more important to give adequate support to central resources for the management and presentation of genetic information to the research and development community. We should learn from the USA, which has funded a central national facility, NCBI (National Center for Biotechnology Information), that has become a world leader in the management of biological information. However, there is a serious economic and political risk in allowing one country to take sole charge of such an important resource. It is also important for there to be serious competition to NCBI, to maintain quality and responsiveness in a changing field. In our view UK interests are best served by strongly supporting the EBI (European Bioinformatics Institute, sited in England at Hinxton, next door to the Sanger Centre) as the natural partner and competitor to NCBI. We encourage the British Government to give full support to substantially increased funding of the EBI through both EC and EMBL (European Molecular Biology Laboratory) channels (the NCBI budget is currently around $30 million per year, more than twice that of the EBI). In addition to this action on a European scale (but sited in Britain), there is also a need for a good national computing network and infrastructure to deliver the relevant information to biologists' desktops.

Richard Durbin
Head of Informatics

3 October 2000



© Parliamentary copyright 2000