APPENDIX 7
Introduction to genetics and glossary
INTRODUCTION TO GENETICS
Characteristics of genetic information
1. Genetic information possesses two related but
separable characteristics that underlie many of the recurrent
themes of our inquiry:
- its central role in determining how our bodies
function, governed by the chemical processes that take place in
cells, and
- its ability to identify us as individuals and
as members of families, a consequence of how genetic information
is mixed during sexual reproduction.
Proteins and DNA
2. Of particular importance for biochemistry (which
describes how the many different processes that take place within
living creatures operate at a molecular level) are protein molecules.
In addition to being vital components of the body itself, proteins
control the function and regulation of the body's cells and tissues.
Most disease processes manifest themselves with changes in cell
proteins, and virtually all drugs are either proteins themselves
or act by binding to proteins.
3. Proteins are made up of one or more chains of
a class of chemicals called amino acids. These chains fold into
various three-dimensional shapes which, together with their chemical
composition, largely determine their function in the body. The
shape and chemical activity are determined by the arrangement
of the amino acid units. In turn, these are determined by genetic
information, stored in the body's cell nuclei[75]
in long molecules of deoxyribonucleic acid - DNA. Thus DNA, by
controlling the production of different proteins, governs the
operation of the body's cells and so determines many of the processes
that create good health or illness. It is for this reason that
the study of DNA generates such excitement in medical research,
and offers such hope for the future identification and treatment
of disease.
4. In sexual reproduction, the genetic information
of the two parents is mixed to form a unique version of DNA in
each of their offspring[76].
Much of what makes members of a sexually-reproducing species distinct
individuals, especially in terms of physical characteristics,
can be ascribed to each having slightly different versions of
DNA, and hence different combinations of proteins that go to make
up their bodies.
5. Using the uniqueness
of people's DNA to identify individuals is a well established
technique, particularly the DNA profiling used in forensic science.
However, DNA samples are already much more than a means of identifying
an individual. Increasingly, they can be used to indicate a range
of characteristics - including susceptibility to different diseases
and how drugs are metabolised. Since each person's DNA is some
combination of their parents' DNA, first degree relatives have
a large proportion of DNA sequence in common. Genetic information
relating to one person may therefore have implications for other
family members.
DNA: composition, structure and copying
6. DNA is a molecule made up of a long series of
chemical sub-units called nucleotides. There are only four different
nucleotides in DNA, identified by the initial letter of the chemical
bases they contain:
- A - adenine;
- C - cytosine;
- G - guanine; and
- T - thymine.
7. The DNA in each human cell nucleus is around
3 billion nucleotides long. Each of the bases is linked not only
with its neighbours in the long strands, but also across to another
base in a parallel strand, making DNA take on a ladder-like structure.
In the DNA molecule, this ladder is twisted into the famous "double
helix". The 3 billion "base pairs" do not form
a single continuous chain, but coil up into separate sections,
called chromosomes.
8. Humans have 23 pairs of chromosomes, one of each
pair from each parent. The 23rd pair is different from the others[77]
as it determines an individual's sex. An offspring always receives
an X chromosome from its mother, but may receive either an X or
a Y from its father. Individuals with XX in the 23rd chromosome
are female, while those with XY are male.
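The four possible combinations can be listed with a short sketch. The names `mother`, `father` and `offspring_sex` are illustrative only, and the sketch assumes the simple XX/XY rule stated above.

```python
from itertools import product

# Sex chromosomes carried by each parent (illustrative names).
mother = ("X", "X")
father = ("X", "Y")

def offspring_sex(from_father: str) -> str:
    """The mother always contributes an X, so the father's chromosome decides."""
    return "female" if from_father == "X" else "male"

# Every pairing of one maternal and one paternal sex chromosome.
outcomes = [(m + f, offspring_sex(f)) for m, f in product(mother, father)]
print(outcomes)
# [('XX', 'female'), ('XY', 'male'), ('XX', 'female'), ('XY', 'male')]
```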
9. Although the bases within each strand can be
in any order, the cross-links between strands are limited:
- base A will cross-link only to base T; and
- base C will cross-link only to base G.
This means that the two strands in a molecule of
DNA are complementary: knowing the sequence of one enables the
other to be described.
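As a minimal sketch of this rule, the partner strand can be derived base by base. The function name is illustrative, and the sketch ignores strand direction, writing the partner in the same left-to-right order as Box 9 does.

```python
# Pairing rules from the text: A pairs only with T, C pairs only with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_strand(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand)

top = "ATTCCTTGGTATC"  # the first sequence shown in Box 9
print(complementary_strand(top))  # TAAGGAACCATAG
```

Applying the function twice returns the original strand, which is the sense in which knowing one strand is enough to describe the other.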
10. This complementarity is vital in allowing the
DNA molecules to be copied, as is necessary every time a cell
divides to form new tissue. The links between the two strands
are hydrogen bonds - weak bonds, which are very sensitive to the
chemical conditions surrounding the molecule. When cells divide,
small changes to the cell chemistry cause the hydrogen bonds to
break, and the DNA molecule splits into its two component strands.
Each half of the DNA molecule then picks up more bases to reassemble
its complementary strand, thus making two complete versions of
the whole molecule. It is through this process that copies of
exactly the same DNA sequence, all 3 billion or so bases, are
found in each of the cells of the human body.
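The copying step can be sketched as follows, under heavy simplification: a molecule is modelled as a pair of strings, and the cell machinery that actually performs the copying is not modelled at all. The names `rebuild_partner` and `replicate` are illustrative.

```python
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def rebuild_partner(strand: str) -> str:
    """Assemble the complementary strand for a single separated strand."""
    return "".join(PAIRING[base] for base in strand)

def replicate(molecule: tuple) -> tuple:
    """Split a (strand, partner) molecule and rebuild each half into a full copy."""
    first, second = molecule
    copy_one = (first, rebuild_partner(first))
    copy_two = (rebuild_partner(second), second)
    return copy_one, copy_two

# A short toy molecule built from four TA repeats, as in Box 9.
original = ("G" + "TA" * 4 + "C", "C" + "AT" * 4 + "G")
copy_one, copy_two = replicate(original)
print(copy_one == original and copy_two == original)  # True
```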
11. A similar process also works on a smaller scale
in the manufacture of proteins. The instructions to make a particular
protein are found in relatively short sections of the DNA sequence
- the sections that are called genes. Extracting and copying these
limited sections produces short strands of nucleotides (messenger
RNA) which are then transported to a part of the cell (the
ribosomes) where amino acids are assembled into proteins. The
ordering of the bases in the gene section of the DNA "codes for"
(i.e. determines) the sequence in which the amino acids are joined,
and hence the exact type of protein that results.
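The way a base sequence "codes for" an amino acid sequence can be sketched as below. The bases are read in groups of three (the codons described in the glossary). The table is only a six-entry excerpt of the standard genetic code, the toy gene is invented for illustration, and the intermediate copying step is skipped by reading the DNA directly.

```python
# Excerpt of the standard genetic code, written as DNA triplets (codons).
CODON_TABLE = {
    "ATG": "Met", "TTT": "Phe", "GGC": "Gly", "AAA": "Lys", "TGG": "Trp",
    "TAA": None,  # a stop signal: no amino acid is added
}

def translate(gene: str) -> list:
    """Read the gene three bases at a time and list the amino acids coded for."""
    amino_acids = []
    for start in range(0, len(gene) - 2, 3):
        residue = CODON_TABLE[gene[start:start + 3]]
        if residue is None:  # stop codon reached
            break
        amino_acids.append(residue)
    return amino_acids

toy_gene = "ATGTTTGGCAAATGGTAA"  # hypothetical sequence, not a real gene
print(translate(toy_gene))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```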
Genes and gene 'expression'
12. The complete sequence
of base pairs of any individual is called a genome. In humans,
this predominantly constitutes the 3 billion base pairs found
in the chromosomes of the cell nucleus.
While each person has a unique genome, the general features of
it are common to all humans.
13. Individual genes - the parts of the DNA sequence
that contain the information to make (or "code for")
proteins - are usually only a few thousand bases long. The most
recent work suggests that the human genome contains
between 30,000 and 40,000 genes. This is a substantial reduction
from earlier estimates of about 100,000 genes. The lower numbers
suggest more complicated interactions between genes than had previously
been anticipated[78].
14. Cells in different parts of the body, although
each containing the full complement of DNA, have different characteristics
and functions. This is because not all genes are active in all
cells. The process which determines which genes are switched on
("expressed") in which cells and under what circumstances
is complex. Many factors are thought to be involved, including
the parts of the DNA sequence which are not genes - the non-coding
DNA. It may therefore be highly misleading to refer to such non-coding
material as "junk DNA"[79].
15. The ratio of coding to non-coding DNA varies
widely between species. Some 97 per cent of human genetic material
is non-coding DNA. At the opposite end of the spectrum is the
puffer fish, whose DNA is almost all "genes". The significance
of this wide variation is not yet understood[80].
16. Many human genes are also found in other species.
For example, humans and mice have 95 per cent of their genes in
common, and many human disease genes can be found in yeast[81].
A particular sequence of bases in genes will code for the same
protein even in different species. Thus, a gene identified in
a mouse as producing a particular protein will, if found in humans,
also code for the same protein there.
17. What do not transfer between species, however,
are the subtle effects of how genes are regulated. Which particular
genes are active in any one cell affects the mix of proteins made
in that cell. The way that the different proteins interact determines
the cell's precise biological function.
Genetic variation or polymorphisms
18. The small variations between the sequences of
bases in the genomes of different people are called polymorphisms.
As illustrated in Box 9, they include:
- a change in a base pair at a particular point
along the genome, called a single nucleotide polymorphism (SNP);
- a difference in the number of bases between given
points along the genome, called a length polymorphism.
Box 9
Types of polymorphism
Single nucleotide polymorphism (SNP)
    ATTCCTTGGTATC         ATTCCTCGGTATC
    TAAGGAACCATAG         TAAGGAGCCATAG
    [e.g. a T-C substitution, forcing an A-G change in the other strand]
Length polymorphism: short tandem repeats (STRs) or variable number tandem repeats (VNTRs)
    GTATATATATATATATAC    GTATATATAC
    CATATATATATATATATG    CATATATATG
    [8 TA repeats]        [4 TA repeats]
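Both kinds of polymorphism in Box 9 can be found by simple string comparison, as in the sketch below. The function names are illustrative, and the sketch assumes sequences that are already aligned. This is not how laboratory profiling is carried out.

```python
import re

def snp_positions(seq_a: str, seq_b: str) -> list:
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def longest_repeat_run(sequence: str, unit: str) -> int:
    """Count the repeats in the longest unbroken run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# SNP example from Box 9: a single T-C substitution.
print(snp_positions("ATTCCTTGGTATC", "ATTCCTCGGTATC"))  # [6]

# Length polymorphism from Box 9, built from 8 and 4 TA repeats.
long_allele = "G" + "TA" * 8 + "C"
short_allele = "G" + "TA" * 4 + "C"
print(longest_repeat_run(long_allele, "TA"))   # 8
print(longest_repeat_run(short_allele, "TA"))  # 4
```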
19. Current best estimates are that variation in the genome between
individual men or women is no more than one in a thousand, or
0.1 per cent. There are at least 500,000 common sites of DNA variation
within the human genome. The variation between individuals at
each of these sites is small, and many may have no significant
effect. The various forms of DNA sequence found in a particular
region of the genome are called alleles.
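To give a sense of scale, the one-in-a-thousand figure can be applied to the 3 billion base pairs mentioned earlier. This is rough arithmetic on the rounded figures in the text, not a measured count.

```python
GENOME_LENGTH = 3_000_000_000  # base pairs, the rounded figure used in the text
VARIATION_ONE_IN = 1_000       # "one in a thousand", i.e. 0.1 per cent

# Upper bound on the number of positions at which two genomes differ.
max_differing_bases = GENOME_LENGTH // VARIATION_ONE_IN
print(f"{max_differing_bases:,}")  # 3,000,000
```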
Sources of genetic variation.
20. A small amount of the variation between human genomes is caused
by mutations to the DNA. These can arise either through very occasional
mistakes in the copying of DNA during cell division or through
damage to DNA (e.g. by harmful chemicals or ionising radiation).
21. However, the vast majority of the variation between humans
is determined by the recombination of different mixtures of DNA
from two parents that occurs in sexual reproduction. The probability
that this process will produce more than one human with precisely
the same sequence of 3 billion bases in their DNA is infinitesimal[82].
Effects of genetic variation.
22. Polymorphisms that occur in the coding regions of the genome
(genes) may create slightly different proteins within the body.
Some of these can create benign differences between individuals
- such as different eye colour and other aspects of appearance.
Other polymorphisms, however, can affect people's health. It also
seems likely that polymorphisms in the non-coding regions of the
genome may have effects as well, perhaps by changing how the genes
are expressed.
23. Some diseases can be caused by a polymorphism in just one
of the many thousands of human genes. These single gene disorders
are currently incurable - and many, such as Huntington's disease,
have devastating effects.
24. In many other cases, polymorphisms can increase susceptibility
to a particular disease, without making it inevitable that the
disease will develop. Some forms of cancer and heart disease are
thought to be in this category. Disease susceptibility may be
increased with a single polymorphism and increased still further,
or perhaps offset, by polymorphisms elsewhere in the genome.
25. Susceptibility to particular diseases, like single gene disorders,
may also be inherited. Other factors, such as poor diet, lack
of exercise, or exposure to particular chemicals, may need to
be in place to trigger the actual disease. Avoiding those factors
may prevent the disease occurring - or at least defer its onset.
26. Thus both genetic and environmental factors are involved in
what individuals look like and how their bodies' biochemistry
functions. The observable characteristics of an individual which
arise from these factors are referred to as their "phenotype".
The much narrower "genotype" applies solely to the genetic
make-up of an organism, or a group of organisms with the same
genetic constitution.
27. Differences in DNA also affect how people respond to drugs.
Polymorphisms may thus change not only the susceptibility to a
disease but also the efficacy of a particular treatment.
28. The possibility of different coding and non-coding regions
contributing to or offsetting susceptibility, and the huge number
of environmental and lifestyle factors that could interact with
genetic ones, combine to make the interpretation of exactly how
genetic variation influences many common diseases a daunting task.
However, this is clearly an important direction for medical research.
Sequencing and interpreting human genomes
29. The starting point for scientific investigations of genetic
information is to identify particular regions of interest along
the length of the genome. The initial studies concentrated on
individual genes, since these were clearly involved in body function,
but the interest in how different genes interact, and how non-coding
regions affect the working of genes, has led to more complex analysis
of larger regions of the genome.
30. To provide a proper grounding for all such investigations,
the Human Genome Project was started. Its aim has been to create
a generalised map of the genome as an underpinning of all research
into human genetics. This has involved determining the sequence
of all the 3 billion bases in the human genome, and identifying
the number and location of genes.
31. As our Inquiry was drawing to a close, draft versions of the
human genome were published[83].
Further work is in hand to finalise the map as a complete and
authoritative reference. At present, the map is not complete,
and contains only limited information about variations - different
alleles (alternative forms) of particular genes are recorded as
annotations to the genome. Many individual genomes (genotypes)
will need to be sequenced before it is possible to tell what all
the variations and commonalities are.
GENETICS GLOSSARY
Adenine (A): A nitrogenous base, one member of the base
pair AT (adenine and thymine).
Allele: An alternative form of a gene or genetic marker
present on one or other of a pair of chromosomes.
Amino acid: The chemical building blocks that make up proteins.
There are 20 different amino acids, and their sequence in a protein
is determined by the relevant genetic code.
Annotation: Labelling the DNA sequence with information
about location, variations, species similarities, protein product
and structure etc.
Autosome: A chromosome that is not a sex chromosome. Humans
have 22 pairs of autosomes, numbered 1 to 22 based on their size.
Base: In genetics, 'base' denotes the nitrogen-containing
(nitrogenous) chemical compounds that make up DNA - adenine, cytosine,
guanine and thymine. In chemistry generally, bases include a much
larger group of chemicals (including the four found in DNA) which
share the common property of bonding to hydrogen ions when in
solution.
Base pair: A pair of complementary nitrogenous bases (adenine
and thymine AT or guanine and cytosine GC) held together by hydrogen
bonds.
Cell: The basic structural unit of all living organisms.
While some organisms are made up of only one or a few cells, humans
are made up of billions of cells, each containing billions of
DNA base pairs.
Chromosome: Structures in the cell nucleus which are made
up of DNA and proteins containing the genetic information in a
linear fashion. Human cells have 23 pairs of chromosomes, one
of each pair inherited from each parent.
Coding region: That part of DNA, located in the genes,
that determines the structure of a protein, or produces an intermediate
product involved in cell function.
Cytosine (C): A nitrogenous base, one member of the base
pair GC (guanine and cytosine).
DNA (deoxyribonucleic acid): The molecule which contains
genetic information and makes up our genes. The DNA molecule consists
of two complementary nucleotide chains containing the bases adenine
(A), thymine (T), guanine (G) and cytosine (C), held in a double
stranded helix by bonds between base pairs - A linked with T or
G linked with C.
DNA sequence: The order of base pairs in a DNA molecule.
Enzyme: A protein produced by cells to speed up a specific
biochemical reaction.
Expression: A gene is said to be expressed when it is active
and responsible for the production of a protein, or products which
alter cell function.
Gene: The basic unit of heredity. Genes are ordered sequences
of DNA base pairs, located in specific positions on chromosomes.
Genes contain the information for producing proteins.
Gene therapy: Correction of a genetic defect by gene manipulation
or by inserting a functioning gene into the cells of an individual.
Genetic code: The information contained in the DNA sequence
that determines the amino acid sequence in protein synthesis.
The genetic code is read in triplets of bases called codons.
Genetic marker: A unique physical location on a chromosome,
which may be experimentally identified and its inheritance pattern
monitored. Genetic markers may be genes or DNA segments with no
known function.
Genome: The complete genetic material of an organism. The
human genome contains 3 billion base pairs of DNA organised into
23 chromosomes, with a small amount of DNA in the mitochondria.
Genomics: The study and analysis of genomes.
Genotype: The set of alleles that an individual possesses.
Guanine (G): A nitrogenous base, one member of the base
pair GC (guanine and cytosine).
Junk DNA: A misleading term for about 97% of human DNA
which does not yet have any known function.
Locus (loci in the plural): The location of a gene or DNA
segment on a chromosome.
Mitochondria: Small energy-producing structures outside
the nucleus of every cell which contain small amounts of DNA with
several genes, mainly coding for mitochondrial proteins. Mitochondria
come from the egg and are thus inherited from the mother alone.
More than 50 inherited metabolic diseases are known to be caused
by defects in mitochondrial DNA.
Non-coding region: That part of DNA which does not produce
any known product or determine a protein.
Nucleotide: A subunit of DNA. Each nucleotide consists
of a nitrogenous base (adenine, thymine, guanine or cytosine),
a sugar molecule (deoxyribose), and a phosphate group.
Nucleus (nuclei in the plural): The structure in the cell
which controls much of its function. It contains all the genetic
material except the tiny amount in the mitochondria.
Pharmacogenetics: The study of genetic variation applied
to inter-individual variability in drug response.
Pharmacogenomics: The study of differential gene expression
applied to drug discovery and optimisation.
Phenotype: The observable physical and biochemical characteristics
of an organism, determined by an interaction between genotype
and environment.
Polymorphism: A segment of DNA that has more than one form
(allele), each of which occurs at a frequency of at least 1%.
Polymorphisms are a natural part of genetic variation. A polymorphism
in a gene may or may not affect its function.
Protein: A molecule made up of amino acids linked together
in a specific order determined by genetic information. Proteins
are required for the structure, function and regulation of the
body's cells and tissues.
Sex chromosomes: The chromosomes that determine the sex
of an individual - designated X and Y chromosomes in humans. Females
have two X chromosomes and males have one X and one Y chromosome.
SNP (single nucleotide polymorphism): A single base pair
variation at a particular genetic locus. SNPs are abundant in
the genome and form part of the natural genetic variation. They
can be detected using microarray technology and are useful for
genetic mapping and association studies.
STR (short tandem repeats): A short tandemly repeated DNA
sequence, usually 2-6 nucleotides in length. STRs are found abundantly
throughout the genome, and the number of repeats at a locus may
vary between individuals. STRs are useful in forensic applications.
Thymine (T): A nitrogenous base, one member of the base
pair AT (adenine and thymine).
75 Plus a small but important amount stored outside
the nucleus, in tiny bodies called mitochondria (see the glossary
at the end of this Appendix).
76 Except in the case of identical twins, where two genetically identical
individuals develop from the one fertilised egg.
77 Pairs 1-22 are numbered according to size, and are sometimes referred
to as "autosomes".
78 Professor Bell: QQ 334 & 337.
79 Professor Bell: Q 334.
80 Professor Bell: Q 335.
81 Sir George Radda: Q 65.
82 Except, of course, in the special case of identical twins.
83 International Human Genome Sequencing Consortium. "Initial
sequencing and analysis of the human genome." Nature
2001; 409: 860-921 and J Craig Venter et al. "The
sequence of the human genome." Science 2001; 291:
1304-1351.