Select Committee on Science and Technology Fourth Report


Introduction to genetics and glossary


Characteristics of genetic information

1. Genetic information possesses two related but separable characteristics that underlie many of the recurrent themes of our inquiry:

    • its central role in determining how our bodies function, governed by the chemical processes that take place in cells, and
    • its ability to identify us as individuals and as members of families, a consequence of how genetic information is mixed during sexual reproduction.

Proteins and DNA

2. Of particular importance for biochemistry (which describes how the many different processes that take place within living creatures operate at a molecular level) are protein molecules. In addition to being vital components of the body itself, proteins control the function and regulation of the body's cells and tissues. Most disease processes manifest themselves with changes in cell proteins, and virtually all drugs are either proteins themselves or act by binding to proteins.

3. Proteins are made up of one or more chains of a class of chemicals called amino acids. These chains fold into various three-dimensional shapes which, together with their chemical composition, largely determine their function in the body. The shape and chemical activity are determined by the arrangement of the amino acid units. In turn, these are determined by genetic information, stored in the body's cell nuclei[75] in long molecules of deoxyribonucleic acid - DNA. Thus DNA, by controlling the production of different proteins, governs the operation of the body's cells and so determines many of the processes that create good health or illness. It is for this reason that the study of DNA generates such excitement in medical research, and offers such hope for the future identification and treatment of disease.

4. In sexual reproduction, the genetic information of the two parents is mixed to form a unique version of DNA in each of their offspring[76]. Much of what makes members of a sexually-reproducing species distinct individuals, especially in terms of physical characteristics, can be ascribed to each having slightly different versions of DNA, and hence different combinations of proteins that go to make up their bodies.

5. Using the uniqueness of people's DNA to identify individuals is a well-established technique, particularly the DNA profiling used in forensic science. However, DNA samples are already much more than a means of identifying an individual. Increasingly, they can be used to indicate a range of characteristics - including susceptibility to different diseases and how drugs are metabolised. Since each person's DNA is some combination of their parents' DNA, first degree relatives have a large proportion of DNA sequence in common. Genetic information relating to one person may therefore have implications for other family members.

DNA: composition, structure and copying

6. DNA is a molecule made up of a long series of chemical sub-units called nucleotides. There are only four different nucleotides in DNA, identified by the initial letter of the chemical bases they contain:

    • A - adenine;
    • C - cytosine;
    • G - guanine; and
    • T - thymine.

7. The DNA in each human cell nucleus is around 3 billion nucleotides long. Each of the bases is linked not only with its neighbours in the long strands, but also across to another base in a parallel strand, making DNA take on a ladder-like structure. In the DNA molecule, this ladder is twisted into the famous "double helix". The 3 billion "base pairs" do not form a single continuous chain, but coil up into separate sections, called chromosomes.

8. Humans have 23 pairs of chromosomes, one of each pair from each parent. The 23rd pair is different from the others[77] as it determines an individual's sex. An offspring always receives an X chromosome from its mother, but may receive either an X or a Y from its father. Individuals with XX as the 23rd pair are female, while those with XY are male.

9. Although the bases within each strand can be in any order, the cross-links between strands are limited:

    • base A will cross-link only to base T; and
    • base C will cross-link only to base G.

This means that the two strands in a molecule of DNA are complementary: knowing the sequence of one enables the other to be described.
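The pairing rule can be illustrated with a minimal Python sketch. The function name and the example sequence are invented for illustration, and the sketch pairs bases position by position without modelling any other property of the strands.

```python
# Pairing rules from the text: A cross-links only to T, and C only to G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the base-by-base partner of a strand written with A, C, G and T."""
    return "".join(PAIRING[base] for base in strand.upper())


example = "ATTGCAC"  # made-up sequence, for illustration only
partner = complementary_strand(example)

print(example)  # ATTGCAC
print(partner)  # TAACGTG
```

Applying the function to its own output returns the original strand, which is the sense in which knowing one strand is enough to describe the other.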

10. This complementarity is vital in allowing the DNA molecules to be copied, as is necessary every time a cell divides to form new tissue. The links between the two strands are hydrogen bonds - weak bonds, which are very sensitive to the chemical conditions surrounding the molecule. When cells divide, small changes to the cell chemistry cause the hydrogen bonds to break, and the DNA molecule splits into its two component strands. Each half of the DNA molecule then picks up more bases to reassemble its complementary strand, thus making two complete versions of the whole molecule. It is through this process that copies of exactly the same DNA sequence, all 3 billion or so bases, are found in each of the cells of the human body.

11. A similar process also works on a smaller scale in the manufacture of proteins. The instructions to make a particular protein are found in relatively short sections of the DNA sequence - the sections that are called genes. Extracting and copying these limited sections produces short strands of nucleotides which are then transported to a part of the cell (the ribosomes) where amino acids are assembled into proteins. The ordering of the bases in the gene section of the DNA "codes for" (i.e. determines) the sequence of amino acids that is assembled, and hence the exact type of protein that results.
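How a sequence of bases "codes for" a sequence of amino acids can be sketched as follows. This is a simplified illustration only: it reads the DNA bases directly in groups of three (the "codons" described in the glossary), skips the intermediate copying step, and uses a deliberately tiny subset of the genetic code. The function name and the example gene are hypothetical.

```python
# A small subset of the standard genetic code (DNA codon -> amino acid).
# The full code has 64 codons; only a handful are listed for illustration.
CODON_TABLE = {
    "ATG": "Met",  # methionine, also the usual start signal
    "TTT": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "AAA": "Lys",  # lysine
    "TGG": "Trp",  # tryptophan
    "TAA": None,   # stop signal: no amino acid is added
}


def translate(coding_sequence: str) -> list[str]:
    """Read bases three at a time and return the amino acids they code for."""
    amino_acids = []
    for start in range(0, len(coding_sequence) - 2, 3):
        codon = coding_sequence[start:start + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this illustrative subset")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop signal reached
            break
        amino_acids.append(amino_acid)
    return amino_acids


toy_gene = "ATGTTTGGCAAATAA"  # invented sequence, not a real gene
print(translate(toy_gene))  # ['Met', 'Phe', 'Gly', 'Lys']
```

Changing a single base in `toy_gene` can change which amino acid is produced at that position, which is why the ordering of the bases matters.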

Genes and gene 'expression'

12. The complete sequence of base pairs of any individual is called a genome. In humans, this predominantly constitutes the 3 billion base pairs found in the chromosomes of the cell nucleus. While each person has a unique genome, the general features of it are common to all humans.

13. Individual genes - the parts of the DNA sequence that contain the information to make (or "code for") proteins - are usually only a few thousand bases long. The most recent work on the human genome suggests that it contains between 30,000 and 40,000 genes. This is a substantial reduction from earlier estimates of about 100,000 genes. The lower numbers suggest more complicated interactions between genes than had previously been anticipated[78].

14. Cells in different parts of the body, although each containing the full complement of DNA, have different characteristics and functions. This is because not all genes are active in all cells. The process which determines which genes are switched on ("expressed") in which cells and under what circumstances is complex. Many factors are thought to be involved, including the parts of the DNA sequence which are not genes - the non-coding DNA. It may therefore be highly misleading to refer to such non-coding material as "junk DNA"[79].

15. The ratio of coding to non-coding DNA varies widely between species. Some 97 per cent of human genetic material is non-coding DNA. At the opposite end of the spectrum is the puffer fish, whose DNA is almost all "genes". The significance of this wide variation is not yet understood[80].

16. Many human genes are also found in other species. For example, humans and mice have 95 per cent of their genes in common, and many human disease genes can be found in yeast[81]. A particular sequence of bases in genes will code for the same protein even in different species. Thus, a gene identified in a mouse as producing a particular protein will, if found in humans, also code for the same protein there.

17. What do not transfer between species, however, are the subtle effects of how genes are regulated. Which particular genes are active in any one cell affects the mix of proteins made in that cell. The way that the different proteins interact determines the cell's precise biological function.

Genetic variation or polymorphisms

18. The small variations between the sequences of bases in the genomes of different people are called polymorphisms. As illustrated in Box 9, they include:

    • a change in a base pair at a particular point along the genome, called a single nucleotide polymorphism (SNP);
    • a difference in the number of bases between given points along the genome, called a length polymorphism.

Box 9

Types of polymorphism
Single nucleotide polymorphism (SNP)
[e.g. a T-C substitution, forcing an A-G change in the other strand]
Length polymorphism

short tandem repeats (STRs) or variable number tandem repeats (VNTRs)
[8 TA repeats]
[4 TA repeats]
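The two types of polymorphism shown in Box 9 can be illustrated with a short Python sketch. The sequences are invented to mirror the box (a T-C substitution, and 8 versus 4 TA repeats), and the function names are hypothetical. Real analyses work on far longer sequences and must first align them.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the longest run of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("the repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        run, pos = 0, start
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


# Single nucleotide polymorphism: one base differs (T in one person, C in another).
person_1 = "GGATTCAG"
person_2 = "GGACTCAG"
print(snp_positions(person_1, person_2))  # [3]

# Length polymorphism: the same region holds a different number of TA repeats.
allele_long = "GC" + "TA" * 8 + "GC"
allele_short = "GC" + "TA" * 4 + "GC"
print(count_tandem_repeats(allele_long, "TA"))   # 8
print(count_tandem_repeats(allele_short, "TA"))  # 4
```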

19. Current best estimates are that variation in the genome between individual men or women is no more than one in a thousand, or 0.1 per cent. There are at least 500,000 common sites of DNA variation within the human genome. The variation between individuals at each of these sites is small, and many may have no significant effect. The various forms of DNA sequence found in a particular region of the genome are called alleles.
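To give a sense of scale, the figures in this paragraph can be combined with the approximate genome length of 3 billion bases in a few lines of Python. This is arithmetic on the rounded figures quoted in the text, not a measurement, and the spacing calculation assumes (purely for illustration) that the common sites are spread evenly along the genome.

```python
GENOME_LENGTH_BASES = 3_000_000_000  # approximate figure used in the text
COMMON_VARIATION_SITES = 500_000     # "at least 500,000 common sites"

# "No more than one in a thousand" bases differ between two individuals.
max_differing_bases = GENOME_LENGTH_BASES // 1000

# Illustrative assumption: common sites are evenly spaced along the genome.
average_spacing_bases = GENOME_LENGTH_BASES // COMMON_VARIATION_SITES

print(f"{max_differing_bases:,}")    # 3,000,000
print(f"{average_spacing_bases:,}")  # 6,000
```

On these figures, two people may differ at up to about 3 million positions, even though 99.9 per cent of their sequence is the same.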

Sources of genetic variation.

20. A small amount of the variation between human genomes is caused by mutations to the DNA. These can arise either through very occasional mistakes in the copying of DNA during cell division or through damage to DNA (e.g. by harmful chemicals or ionising radiation).

21. However, the vast majority of the variation between humans is determined by the recombination of different mixtures of DNA from two parents that occurs in sexual reproduction. The probability that this process will produce more than one human with precisely the same sequence of 3 billion bases in their DNA is infinitesimal[82].
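A rough sense of why this probability is so small can be had from a deliberately simplified calculation. It assumes that each parent passes on one whole chromosome from each of their 23 pairs, chosen independently. That assumption is made here for illustration and is not a description given in this Appendix. Any further mixing within chromosomes would make the true number of combinations much larger still.

```python
CHROMOSOME_PAIRS = 23

# Simplifying assumption: one whole chromosome is chosen from each pair.
combinations_per_parent = 2 ** CHROMOSOME_PAIRS
combinations_per_couple = combinations_per_parent ** 2

# Chance that two children of the same couple inherit the identical
# combination of whole chromosomes, under the assumption above.
probability_same = 1 / combinations_per_couple

print(f"{combinations_per_parent:,}")  # 8,388,608
print(f"{combinations_per_couple:,}")  # 70,368,744,177,664
print(probability_same)
```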

Effects of genetic variation.

22. Polymorphisms that occur in the coding regions of the genome (genes) may create slightly different proteins within the body. Some of these can create benign differences between individuals - such as different eye colour and other aspects of appearance. Other polymorphisms, however, can affect people's health. It also seems likely that polymorphisms in the non-coding regions of the genome may have effects as well, perhaps by changing how the genes are expressed.

23. Some diseases can be caused by a polymorphism in just one of the many thousand human genes. These single gene disorders are currently incurable - and many, such as Huntington's disease, have devastating effects.

24. In many other cases, polymorphisms can increase susceptibility to a particular disease, without making it inevitable that the disease will develop. Some forms of cancer and heart disease are thought to be in this category. Disease susceptibility may be increased with a single polymorphism and increased still further, or perhaps offset, by polymorphisms elsewhere in the genome.

25. Susceptibility to particular diseases, like single gene disorders, may also be inherited. Other factors, such as poor diet, lack of exercise, or exposure to particular chemicals, may need to be in place to trigger the actual disease. Avoiding those factors may prevent the disease occurring - or at least defer its onset.

26. Thus both genetic and environmental factors are involved in what individuals look like and how their bodies' biochemistry functions. The observable characteristics of an individual which arise from these factors are referred to as their "phenotype". The much narrower "genotype" applies solely to the genetic make-up of an organism, or a group of organisms with the same genetic constitution.

27. Differences in DNA also affect how people respond to drugs. Polymorphisms may thus change not only the susceptibility to a disease but also the efficacy of a particular treatment.

28. The possibility of different coding and non-coding regions contributing to or offsetting susceptibility, and the huge number of environmental and lifestyle factors that could interact with genetic ones, combine to make the interpretation of exactly how genetic variation influences many common diseases a daunting task. However, this is clearly an important direction for medical research.

Sequencing and interpreting human genomes

29. The starting point for scientific investigations of genetic information is to identify particular regions of interest along the length of the genome. The initial studies concentrated on individual genes, since these were clearly involved in body function, but the interest in how different genes interact, and how non-coding regions affect the working of genes, has led to more complex analysis of larger regions of the genome.

30. To provide a proper grounding for all such investigations, the Human Genome Project was started. Its aim has been to create a generalised map of the genome as an underpinning of all research into human genetics. This has involved determining the sequence of all the 3 billion bases in the human genome, and identifying the number and location of genes.

31. As our Inquiry was drawing to a close, draft versions of the human genome were published[83]. Further work is in hand to finalise the map as a complete and authoritative reference. At present, the map is not complete, and contains only limited information about variations - different alleles (alternative forms) of particular genes are recorded as annotations to the genome. Many individual genomes (genotypes) will need to be sequenced before it is possible to tell what all the variations and commonalities are.


Adenine (A): A nitrogenous base, one member of the base pair AT (adenine and thymine).

Allele: An alternative form of a gene or genetic marker present on one or other of a pair of chromosomes.

Amino acid: The chemical building blocks that make up proteins. There are 20 different amino acids, and their sequence in a protein is determined by the relevant genetic code.

Annotation: Labelling the DNA sequence with information about location, variations, species similarities, protein product and structure etc.

Autosome: A chromosome that is not a sex chromosome. Humans have 22 pairs of autosomes, numbered 1 to 22 based on their size.

Base: In genetics, 'base' denotes the nitrogen-containing (nitrogenous) chemical compounds that make up DNA - adenine, cytosine, guanine and thymine. In chemistry generally, bases include a much larger group of chemicals (including the four found in DNA) which share the common property of bonding to hydrogen ions when in solution.

Base pair: A pair of complementary nitrogenous bases (adenine and thymine AT or guanine and cytosine GC) held together by hydrogen bonds.

Cell: The basic structural unit of all living organisms. While some organisms are made up of only one or a few cells, humans are made up of billions of cells, each containing billions of DNA base pairs.

Chromosome: Structures in the cell nucleus, made up of DNA and proteins, which contain the genetic information in a linear arrangement. Human cells have 23 pairs of chromosomes, one of each pair inherited from each parent.

Coding region: That part of DNA, located in the genes, that determines the structure of a protein, or produces an intermediate product involved in cell function.

Cytosine (C): A nitrogenous base, one member of the base pair GC (guanine and cytosine).

DNA (deoxyribonucleic acid): The molecule which contains genetic information and makes up our genes. The DNA molecule consists of two complementary nucleotide chains containing the bases adenine (A), thymine (T), guanine (G) and cytosine (C), held in a double stranded helix by bonds between base pairs - A linked with T or G linked with C.

DNA sequence: The order of base pairs in a DNA molecule.

Enzyme: Proteins produced by cells to speed up specific biochemical reactions.

Expression: A gene is said to be expressed when it is active and responsible for the production of a protein, or of products which alter cell function.

Gene: The basic unit of heredity. Genes are ordered sequences of DNA base pairs, located in specific positions on chromosomes. Genes contain the information for producing proteins.

Gene therapy: Correction of a genetic defect by gene manipulation or by inserting a functioning gene into the cells of an individual.

Genetic code: The information contained in the DNA sequence that determines the amino acid sequence in protein synthesis. The genetic code is read in triplets of bases called codons.

Genetic marker: A unique physical location on a chromosome, which may be experimentally identified and its inheritance pattern monitored. Genetic markers may be genes or DNA segments with no known function.

Genome: The complete genetic material of an organism. The human genome contains 3 billion base pairs of DNA organised into 23 chromosomes, with a small amount of DNA in the mitochondria.

Genomics: The study and analysis of the genome.

Genotype: The set of alleles that an individual possesses.

Guanine (G): A nitrogenous base, one member of the base pair GC (guanine and cytosine).

Junk DNA: A misleading term for about 97% of human DNA which does not yet have any known function.

Locus (loci in the plural): The location of a gene or DNA segment on a chromosome.

Mitochondria: Small energy-producing structures outside the nucleus of every cell which contain small amounts of DNA with several genes, mainly coding for mitochondrial proteins. Mitochondria come from the egg and are thus inherited from the mother alone. More than 50 inherited metabolic diseases are known to be caused by defects in mitochondrial DNA.

Non-coding region: That part of DNA which does not produce any known product or determine a protein.

Nucleotide: A subunit of DNA. Each nucleotide consists of a nitrogenous base (adenine, thymine, guanine or cytosine), a sugar molecule (deoxyribose), and a phosphate group.

Nucleus (nuclei in the plural): The structure in the cell which controls much of its function. It contains all the genetic material except the tiny amount in the mitochondria.

Pharmacogenetics: The study of genetic variation applied to inter-individual variability in drug response.

Pharmacogenomics: The study of differential gene expression applied to drug discovery and optimisation.

Phenotype: The observable physical and biochemical characteristics of an organism, determined by an interaction between genotype and environment.

Polymorphism: A segment of DNA that has more than one form (allele), each of which occurs at a frequency of at least 1%. Polymorphisms are a natural part of genetic variation. A polymorphism in a gene may or may not affect its function.

Protein: A molecule made up of amino acids linked together in a specific order determined by genetic information. Proteins are required for the structure, function and regulation of the body's cells and tissues.

Sex chromosomes: The chromosomes that determine the sex of an individual - designated X and Y chromosomes in humans. Females have two X chromosomes and males have one X and one Y chromosome.

SNP (single nucleotide polymorphism): A single base pair variation at a particular genetic locus. SNPs are abundant in the genome and form part of the natural genetic variation. They can be detected using microarray technology and are useful for genetic mapping and association studies.

STR (short tandem repeats): A short tandemly repeated DNA sequence, usually 2-6 nucleotides in length. STRs are found abundantly throughout the genome, and the number of repeats at a locus may vary between individuals. They are useful in forensic applications.

Thymine (T): A nitrogenous base, one member of the base pair AT (adenine and thymine).

75   Plus a small but important amount stored outside the nucleus, in tiny bodies called mitochondria (see the glossary at the end of this Appendix).

76   Except in the case of identical twins, where two genetically identical individuals develop from the one fertilised egg.

77   Pairs 1-22 are numbered according to size, and are sometimes referred to as "autosomes".

78   Professor Bell: QQ 334 & 337.

79   Professor Bell: Q 334.

80   Professor Bell: Q 335.

81   Sir George Radda: Q 65.

82   Except, of course, in the special case of identical twins.

83   International Human Genome Sequencing Consortium. "Initial sequencing and analysis of the human genome." Nature 2001; 409: 860-921 and J Craig Venter et al. "The sequence of the human genome." Science 2001; 291: 1304-1351.

© Parliamentary copyright 2001