Select Committee on Science and Technology Fourth Report

CHAPTER 3: Setting the scene

3.1 This Chapter clears the ground for the discussion later in this Report. In conjunction with Appendix 7, it first provides a grounding in genetics and associated science. We then say what we mean by a 'human genetic database', and consider what regulation or guidance might be appropriate.

Introduction to genetics

3.2 As background to our discussion of human genetic databases and associated issues, Appendix 7 gives an introduction to some of the main features of genetics relevant to this Report, together with a glossary. Among that detailed material, the key points are summarised in Box 1.

Box 1

Key points in genetics

(see also Appendix 7 and, for the emboldened words, its concluding glossary)

1. Each individual (apart from identical twins) has a unique genetic profile - a mixture of the genetic profiles of their two parents.

2. This individual genetic profile is, in turn, determined by the unique ordering of just four different molecules (bases), paired together in the famous 'double helix' of DNA. The DNA forms very long strings - some 3 billion base pairs in total. The entirety of each individual's DNA is their genome.

3. Each of the body's many cells contains a copy of the genome, mainly in the cell nucleus. The DNA forms chromosomes of which humans have 23 pairs. The 23rd pair determines the sex of the individual.

4. Except for sex differences, the variation between the genomes of different people is extremely small - about 0.1 per cent between unrelated individuals. The amount in common is so extensive that it is possible to discuss a general, characteristically human sequence of DNA - the human genome.

5. DNA controls the operation of the body's cells by determining the various proteins (molecules which are vital for cell growth and function) produced within them. In this way, DNA influences not only overall physical characteristics but also many of the processes that help determine good health or illness.

6. The parts of the DNA sequence which govern protein production are called genes. There are thought to be 30,000-40,000 genes in the human genome, occupying only about 3 per cent of the total DNA. The role of most of the remaining 97 per cent of the DNA (the so-called 'junk DNA') is not yet understood.

7. A small number of genes comprising a small percentage of the DNA are contained in the mitochondria, tiny but important structures outside the nucleus which play a major role in energy production. In contrast to other genes, these are inherited only from the mother.

8. While an individual's cells contain full copies of that person's genome, in different cells different genes are active (expressed). As a result, there will be different products in each cell type, giving them their separate functions in the body.

9. Variations between individuals' genomes come in two important types: alterations to the sequence at specific locations along the genome, referred to as single nucleotide polymorphisms, or SNPs; and variable numbers of repeating base sequences between particular locations on the genome, referred to as short tandem repeats, or STRs.

10. The sequencing of the human genome conducted over the past few years has resulted in a general map of human DNA. This map is being annotated by adding information about the different variations (alleles) that have been observed at certain locations, what function in the body particular regions of the genome influence, and the significance of observed variations.

11. Different alleles at important locations within the genome, particularly within genes, can change an individual's susceptibility to various diseases. In some cases genetic factors make the onset of disease inevitable, in others they merely make it more likely. For many common diseases, environmental and lifestyle factors combine with genetic susceptibility in determining whether symptoms appear, at what age and with what severity.
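
The two types of variation described in point 9 can be illustrated with a minimal Python sketch. The short sequences, the repeat unit "GATA" and the function names are invented for illustration and are not drawn from the evidence; real analyses work on far longer sequences and must first align them.

```python
import re


def snp_positions(reference: str, sample: str) -> list[int]:
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [i for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# SNP: a single base differs at one location (illustrative sequences).
reference = "ACGTTGCA"
sample = "ACGTCGCA"
print(snp_positions(reference, sample))  # [4]

# STR: two people carry different numbers of the same repeating unit.
person_a = "TT" + "GATA" * 5 + "CC"
person_b = "TT" + "GATA" * 8 + "CC"
print(longest_repeat_run(person_a, "GATA"))  # 5
print(longest_repeat_run(person_b, "GATA"))  # 8
```

In this sketch a SNP appears as a difference at one position, while an STR appears as a difference in the count of a repeated unit between the same flanking sequences.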

What is a genetic database?


3.3 For the purposes of our call for evidence (see Appendix 2), we defined human genetic databases as:

"collections of genetic sequence information, or of human tissue from which such information might be derived, that are or could be linked to named individuals".

3.4 We included in this definition not only genetic data but also human tissue - and this needs explicitly to include other biological samples (such as blood, saliva and other body fluids[12]). For the foreseeable future, it is inevitable that only a very small proportion of each individual's genetic data will be available on databases. Recourse to the original sample will be required from time to time to obtain further genetic information, giving rise to the ethical considerations discussed in Chapter 7.


3.5 The written evidence we received (and published in November 2000[13]) gave details of the wide range of different projects that fell within our classification of human genetic databases. The main features of the overall picture given by that evidence are summarised in Box 2.

Box 2

Principal features of existing databases

1. All the databases described in our evidence were medical, with the exception of the National DNA Database (maintained by the Forensic Science Service).

2. The databases were largely maintained by university medical researchers or doctors, usually with funding from the MRC, The Wellcome Trust, the NHS, and medical research charities. Commercial pharmaceutical companies also maintained genetic databases.

3. The collections held tissue or biological samples (such as blood), sometimes with small amounts of sequenced DNA - no-one was yet collecting large databases of sequenced DNA data on individuals.

4. Any sequenced DNA data was restricted to narrow regions of particular chromosomes, especially genes.

5. The medical genetic databases had one of three main purposes:

·  to identify genes and genetic variations that were involved in a particular disease[14];

·  to investigate gene/environment interactions; or

·  to evaluate medical treatments.

6. The samples were almost always collected from patients who suffered from the particular disease of interest (e.g. the ADLIB project, p[15] 14). Using such data for research into unrelated diseases or conditions would not usually be worthwhile: the sample populations were not representative of wider society or of other patient groups.

7. Studies which looked at links in population samples between environmental and genetic factors in disease tended to have wider potential applications. The population samples needed to be more generally representative - e.g. the Avon Longitudinal Study of Parents and Children (p 3). The wide range of diseases and factors that might be studied, and the inclusion of environmental information (and possibly more social identifiers), perhaps meant that consent and data protection were more prominent in these projects than in studies using small-scale, single disease databases.

3.6 The occurrence of human diseases is influenced to varying degrees by environmental and genetic factors. Some diseases (such as Huntington's Disease) are wholly genetic in origin while others (such as radiation sickness) are wholly - or virtually wholly - environmental. However, most disease is the result of some interplay between environmental and genetic factors. Examples are given in Box 3.

Box 3

Environmental and genetic influences on disease

Wholly environmental: Chemical poisoning; Radiation sickness

Interplay of environmental and genetic factors: Heart disease; Breast cancer

Wholly genetic: Huntington's disease



3.7 Human genetic databases are also being developed in other countries. We received helpful evidence on practices elsewhere from the Science and Technology Sections of the British Embassies in France (P[16] 87), Japan (p 16) and the United States (p 26). The Department of Health provided a useful summary of the situation in other countries in its written evidence (p 32).

3.8 At our Imperial College seminar (see Appendix 4), Dr Jørgen Olsen of the Danish Institute of Cancer Epidemiology outlined the Danish experience, and pointed us to a useful summary[17]. In common with other Nordic countries, there was a long tradition of linking health and socio-demographic data across different data sets. Denmark maintained a Central Population Register which, for each individual, contained a 10-digit personal identification number, current and former addresses, a list of immediate family, and other information which could readily be used to link health and genetic information.
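
The kind of linkage described above, in which one personal identification number ties together separate data sets, can be sketched in Python as follows. This is an illustration only: the identification numbers, field names and record contents are fictitious assumptions and do not describe the actual structure of the Danish register.

```python
# Fictitious data sets, each keyed by a 10-digit personal identification number.
population_register = {
    "0101700001": {"address": "Example Street 1", "family": ["0101450002"]},
    "0101450002": {"address": "Example Street 1", "family": ["0101700001"]},
}
health_records = {
    "0101700001": {"diagnosis": "asthma"},
}
genetic_records = {
    "0101700001": {"marker_x": "A/G"},
    "0101450002": {"marker_x": "G/G"},
}


def link_records(pid: str) -> dict:
    """Gather whatever each data set holds for one identification number."""
    linked = {"pid": pid}
    for name, dataset in (
        ("register", population_register),
        ("health", health_records),
        ("genetic", genetic_records),
    ):
        linked[name] = dataset.get(pid)  # None when a data set has no entry
    return linked


print(link_records("0101700001"))
```

Because every data set shares the same key, a single lookup is enough to assemble a combined record. This is also what makes such data readily attributable to a named individual.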

3.9 Also at that seminar, Dr Lars Järup from Imperial College (and the Karolinska Institute in Stockholm) pointed us to the Swedish experience[18] where large collections of biological material, which could be linked to health and lifestyle information of the donors, had been assembled over the past 15-30 years.

3.10 There has been wide international interest in Iceland, where deCODE Genetics had purchased exclusive access to data on the Icelandic population[19]. Sir George Radda, Chief Executive of the MRC, pointed us to another population-based study in Estonia (Q[20] 73).


3.11 Under the Data Protection Act 1998, personal data are defined as:

"data which relate to a living individual who can be identified from those data or from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller."

3.12 In its written evidence (p 70), the Office of the Data Protection Commissioner pointed out that genetic test results that could be linked to a living individual would constitute personal data subject to the Act's provisions. Indeed, such information would, being about an individual's health or condition, be deemed 'sensitive personal data' under the data protection principles set out in Schedule 1 to the Act and repeated in Box 4.

3.13 The Office of the Data Protection Commissioner further pointed out that the protection afforded by the Act applied to all personal data. There was thus no distinction between genetic and other information that constituted personal data within the meaning of the Act. It did, however, raise the question of whether, given the particular sensitivity of genetic information, there should be such a distinction and, indeed, a separate body to regulate genetic information (p 70).

Box 4

Data protection principles

(These principles are taken from Schedule 1 to the Data Protection Act 1998, which is "the Act" mentioned below.)

1. Personal data shall be processed fairly and lawfully and, in particular, shall not be processed unless:

   a) at least one of the conditions in Schedule 2 to the Act is met[21]; and

   b) in the case of sensitive personal data, at least one of the conditions in Schedule 3 to the Act is also met[22].

2. Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.

3. Personal data shall be adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed.

4. Personal data shall be accurate and, where necessary, kept up to date.

5. Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes.

6. Personal data shall be processed in accordance with the rights of data subjects under the Act.

7. Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of data and against accidental loss or destruction of, or damage to, personal data.

8. Personal data shall not be transferred to a country or territory outside the European Economic Area unless that country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data.
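
The two-part test in limbs (a) and (b) of the first principle can be expressed as a short Python sketch. It is a simplified illustration of the logic as summarised in Box 4, not a statement of the law: the function name and the way conditions are labelled are assumptions, and the sketch does not address the separate requirement that processing be fair and lawful.

```python
def schedule_conditions_met(
    is_sensitive: bool,
    schedule2_conditions: set[str],
    schedule3_conditions: set[str],
) -> bool:
    """Check limbs (a) and (b) of the first principle as summarised in Box 4."""
    # Limb (a): at least one Schedule 2 condition must be met for any personal data.
    if not schedule2_conditions:
        return False
    # Limb (b): sensitive personal data also need at least one Schedule 3 condition.
    if is_sensitive and not schedule3_conditions:
        return False
    return True


# Genetic test results linked to a living individual are treated here as sensitive.
print(schedule_conditions_met(True, {"Sch 2 para 5(d)"}, {"Sch 3 para 8"}))  # True
print(schedule_conditions_met(True, {"Sch 2 para 5(d)"}, set()))  # False
print(schedule_conditions_met(False, {"Sch 2 para 5(d)"}, set()))  # True
```

The condition labels in the example are those mentioned in footnotes 21 and 22. Any other applicable condition from the relevant Schedule would satisfy the same limb.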


3.14 Our starting point was an examination of the issues arising from human genetic databases and the principles that might inform any new regulatory arrangements. The more we considered the evidence received, the clearer it became that regulation of human genetic databases per se was neither necessary nor feasible.

3.15 Concerns about human genetic databases relate to uses and possible abuses of the individual data which they contain rather than the databases themselves. We note in paragraph 3.11 that personal genetic data are already covered by the provisions of the Data Protection Act. In our view these are adequate for the purpose. While genetic data may be particularly sensitive, we do not feel they are uniquely so. Indeed, there are other sorts of data - concerning not just health - which individuals might regard as far more sensitive.

3.16 As to feasibility, any regulatory framework would be impossibly cumbersome. Any regulation would need to cover not only genetic data but also the human tissue and other biological samples from which such data could be derived. That would bring into scope a number of different types of collection across almost all clinical medicine, such as:

(a)  pathological samples taken for diagnostic purposes and kept, together with means of identifying the patient, in hospital and other medical laboratories for later reference as necessary;

(b)  biological samples retained for medical research purposes; and

(c)  other collections kept by individuals for teaching or reference purposes.

3.17 Against that background, we recommend that the HGC and Government should conclude that the primary means of regulating human genetic databases should continue to be the Data Protection Act 1998 and that, except as recommended in paragraph 7.58, no additional protection is required for personal genetic data.

12   Such fluids are referred to as "serum".

13   Human Genetic Databases: written evidence received up to 31 October 2000, Session 1999-2000, HL Paper 115.

14   The Department of Health's memorandum gave a list of the types of diseases (p 37).

15   As noted on page 6, this refers to a page in the earlier volume of evidence.

16   As noted on page 6, 'P' refers to a page in the further evidence in the second part of this volume - while 'p' refers to a page in the earlier volume of evidence.

17   Frank L. "Where an entire country is a cohort", Science 2000; 287: 2398-2399.

18   Nilsson A, Rose J. "Sweden takes steps to protect tissue banks", Science 1999; 286: 894; and Abbott A. "Sweden sets ethical standards for use of genetic biobanks", Nature 1999; 400: 3.

19   Gulcher JR, Stefansson K. "The Icelandic healthcare database and informed consent", New Engl J Med 2000; 342: 1827-1830; and Annas GJ. "Rules for research on human genetic variation - lessons from Iceland", New Engl J Med 2000; 342: 1830-1833.

20   As noted on page 6, 'Q' refers to a question in the minutes of evidence in the second part of this volume.

21   Paragraph 5(d) of Schedule 2 covers processing "for the exercise of any other functions of a public nature exercised in the public interest by any person".

22   Paragraphs 8(1) & (2) of Schedule 3 cover processing "for medical purposes" defined as "the purposes of preventative medicine, medical diagnosis, medical research, the provision of care and treatment and the management of healthcare services".


© Parliamentary copyright 2001