Annex A
Notes on Statistical Aspects of the Badger Culling Trial
1. PRELIMINARIES
The randomised trial, which is but one of the
responsibilities of the ISG (chaired by Professor John Bourne),
was recommended by the Krebs Report (1997) to resolve the longstanding
issue of the role of badgers in bovine TB. The power calculations,
which are the usual sort of approximate basis for determining
the scale of effort appropriate, were done at the time of the
Krebs Report and led to a randomised block design of 10 triplets,
each comprising three roughly circular areas randomised between three
treatments: survey-only, reactive culling and proactive culling. It was
estimated that about five years of observation would be needed
to get the required sensitivity. At the moment work on eight triplets
is in progress and it is anticipated that by 2001 all 10 will
be in commission.
Among the other aspects of ISG work is a case-control
study of husbandry methods embedded within the trial. George Gettinby
has played a major role in this. It is not discussed here.
The trial is contentious largely, although not
entirely, because the animal rights fraternity seem convinced
that badgers are irrelevant to bovine TB and oppose any culling
and because, by and large, farmers are convinced of the role of
badgers and are thus uneasy at a long wait for an answer and recommendations.
2. APPROPRIATE SCALE OF EFFORT
The role of power calculations is to ensure
that the scale of effort is neither so limited as to be incapable
of leading to useful answers nor exorbitant. While mathematically
a precise answer is obtained once a specification of objectives
is given quantitatively, in realistic terms this specification
is quite arbitrary and the recommended scale of effort is no more
than a rough guide, although a very valuable one. The Krebs calculations
were based on the assumption that the residual error, after eliminating
inter-triplet variation and regression on baseline covariates,
will be Poisson. This is likely to be optimistic, although probably
not by much. The projected timespan is based on a cautious assessment
of future breakdown rates and thus probably pessimistic.
Some concern about the power calculations has been raised,
in particular by the Agricultural Select Committee,
and has been expressed in the national press. Much, if
not all, of the discussion seems to be based on a total misunderstanding
of the role of the power calculations. The precision achieved
in the trial will be determined by the data obtained, totally
independently of the correctness or otherwise of the power calculations.
Nor is the schematic analysis on which the power calculation is
based at all like the very careful and detailed analysis to be
made of the real data.
There are a number of considerations that bear
on the scale of effort, the number of triplets and the time extent
of the trial.
First, to some extent, the number of triplets
and the years of observation are interchangeable in that to a
crude approximation precision will be determined by the total
numbers of breakdowns observed in the three treatment arms of
the trial. But this is only approximately true. It is a well established
principle (Yates and Cochran, 1938), and indeed just common sense,
that very high precision in just one site would be a very insecure
basis for a broad practical recommendation or a sound scientific
conclusion. Range of validity demands replication across sites
(triplets). In an extreme case of heterogeneous response patterns
across triplets (triplet × treatment interaction) the most
cautious analysis would be purely randomisation-based. Reduction
of the number of triplets, say to eight, would reduce the randomisation
set in a particular comparison to such a level that power would
be drastically reduced.
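The effect of the number of triplets on a purely randomisation-based analysis can be illustrated by a small sketch. It assumes, for illustration only, that a single contrast between two treatments is tested by a paired sign-flip randomisation test, so that the randomisation set has 2^n members for n triplets. The function names and the differences supplied are hypothetical and are not trial data.

```python
from itertools import product


def randomisation_set_size(n_triplets):
    """Number of sign patterns when each triplet contributes one paired difference."""
    return 2 ** n_triplets


def smallest_two_sided_p(n_triplets):
    """Smallest attainable two-sided p-value: only the observed pattern and its
    mirror image are as extreme as the most extreme possible outcome."""
    return 2 / randomisation_set_size(n_triplets)


def sign_flip_p_value(differences):
    """Exact two-sided sign-flip p-value for paired differences (one per triplet).

    Enumerates every assignment of +/- signs and counts those whose absolute
    total is at least as large as the observed absolute total.
    """
    observed = abs(sum(differences))
    count = 0
    for signs in product((1, -1), repeat=len(differences)):
        total = sum(s * d for s, d in zip(signs, differences))
        if abs(total) >= observed - 1e-12:  # small tolerance for rounding
            count += 1
    return count / randomisation_set_size(len(differences))


# Hypothetical case: every triplet shows a difference in the same direction.
print(smallest_two_sided_p(10), smallest_two_sided_p(8))
print(sign_flip_p_value([0.3] * 8))
```

On this assumption the smallest attainable two-sided significance level is 2/1024 (about 0.002) with 10 triplets and 2/256 (about 0.008) with eight, which gives some idea of how quickly the randomisation set shrinks.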
A different although somewhat related point
is that it might become necessary to estimate the error of a particular
contrast, say surveyonly versus reactive, from the interaction
with triplets and this would leave degrees of freedom of error
of one fewer than the number of triplets, or perhaps twice that.
Ten triplets is from this viewpoint somewhat minimal.
It might be tempting to suggest more triplets
and a shorter time span. However, more triplets would provide
diminishing returns (Krebs et al 1997) and the benefits would
be questionable. The trial is also massively demanding logistically
in terms of its pressures on field workers, and in terms of cost.
There are also welfare considerations in terms of the number of
badgers sacrificed. There are sound hopes that work in 10 triplets
will be operational by 2001 but to go for more, with the additional
concern of compromising the quality of data collected, seems totally
out of the question.
From several points of view an alternative and
preferable view of calculations of the scale of effort is not
in terms of statistical significance but in terms of precision
of estimation. This is in line with a general preference for estimation
over significance testing. In particular, some assessment of the
size of a reduction of breakdown rates by culling, should there
be one, will be essential for a rational policy recommendation.
See, for example, Cox and Reid (2000, section 8.1 and p 222).
With the same approximations used by Krebs the present trial design
leads to a fractional standard error of 7 per cent in the comparison
of two rates.
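The way a fractional standard error of this kind arises can be sketched as follows. The sketch assumes Poisson variation and equal exposure in the two arms, so that the log of the ratio of two counts N1 and N2 has standard error approximately sqrt(1/N1 + 1/N2). The function names are hypothetical, and the counts are back-calculated purely for illustration; they are not figures taken from the Krebs calculations.

```python
import math


def fractional_se_of_rate_ratio(n1, n2):
    """Approximate standard error of log(n2 / n1) for two Poisson counts.

    For small values this is also the fractional (relative) standard error
    of the estimated ratio of the two rates.
    """
    return math.sqrt(1.0 / n1 + 1.0 / n2)


def breakdowns_per_arm_for(target):
    """Smallest equal count per arm giving a fractional standard error <= target.

    With n1 = n2 = n the standard error is sqrt(2 / n), so n >= 2 / target**2.
    """
    return math.ceil(2.0 / target ** 2)


def approximate_95_limits(n1, n2):
    """Approximate 95 per cent confidence limits for the rate ratio n2 / n1."""
    se = fractional_se_of_rate_ratio(n1, n2)
    ratio = n2 / n1
    return ratio * math.exp(-1.96 * se), ratio * math.exp(1.96 * se)


n = breakdowns_per_arm_for(0.07)
print(n, fractional_se_of_rate_ratio(n, n))
```

Under these simplifying assumptions a 7 per cent fractional standard error corresponds to roughly 400 breakdowns in each of the two arms being compared.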
3. PRIMARY ANALYSIS
The trial will be a rich source of data in particular
to badger ecologists. These notes concentrate on the methods to
be used in the primary comparisons of the three treatments. Such
a primary analysis will consist of a regression of log number
of breakdowns per trial area onto the form:
triplet effect + treatment effect,
with adjustment for regression on baseline variables
such as log geographic area, log number of holdings, log number
of herds. A supplementary analysis might include an initial measure
of badger activity at survey although it would be important not
to interpret any associated effect causally.
Following conventional wisdom, if an appreciable
interaction term arises, a rational explanation would be found
if possible. Otherwise the interaction would be treated as an
extra source of random variability, ie of overdispersion relative
to the Poisson distribution.
The analysis can be done either by maximum likelihood
as a generalised linear model or by empirically weighted least
squares, ie if N is a count, by assigning log(N) a variance of
1/N in a standard regression calculation. The two are identical
to the first order of asymptotic theory. The latter may be more
flexible if extended versions of the model are needed.
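The agreement between the two methods can be illustrated in the simplest case of a single contrast between two treatments, ignoring baseline covariates. In that case the empirically weighted least squares estimate reduces to a weighted mean of the within-triplet log ratios, and the Poisson maximum likelihood estimate for the additive model reduces to the log ratio of the treatment totals. The sketch below is a simplification made for illustration, not the analysis planned for the trial; the counts are invented and the function names are hypothetical.

```python
import math

# Invented breakdown counts for 10 triplets (NOT trial data).
SURVEY_ONLY = [14, 9, 12, 20, 7, 11, 16, 10, 13, 8]
PROACTIVE = [9, 7, 8, 13, 6, 7, 12, 6, 10, 5]


def weighted_least_squares_contrast(control, treated):
    """Weighted least squares estimate of the log rate ratio treated/control.

    Each log count is given variance 1/N, so the within-triplet log ratio has
    variance 1/N_control + 1/N_treated and weight equal to its reciprocal.
    All counts must be positive. Returns (estimate, standard error, dispersion),
    where dispersion is the weighted residual mean square on (triplets - 1)
    degrees of freedom; values well above 1 would suggest overdispersion
    relative to the Poisson distribution.
    """
    diffs = [math.log(t / c) for c, t in zip(control, treated)]
    weights = [1.0 / (1.0 / c + 1.0 / t) for c, t in zip(control, treated)]
    total_weight = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, diffs)) / total_weight
    standard_error = 1.0 / math.sqrt(total_weight)
    residual_ss = sum(w * (d - estimate) ** 2 for w, d in zip(weights, diffs))
    dispersion = residual_ss / (len(diffs) - 1)
    return estimate, standard_error, dispersion


def maximum_likelihood_contrast(control, treated):
    """Poisson maximum likelihood estimate for the same additive model.

    With triplet and treatment main effects only, the fitted treatment
    contrast is the log ratio of the treatment totals.
    """
    total_c, total_t = sum(control), sum(treated)
    estimate = math.log(total_t / total_c)
    standard_error = math.sqrt(1.0 / total_c + 1.0 / total_t)
    return estimate, standard_error


print(weighted_least_squares_contrast(SURVEY_ONLY, PROACTIVE))
print(maximum_likelihood_contrast(SURVEY_ONLY, PROACTIVE))
```

With these invented counts the two estimates and their standard errors agree closely, as the first-order theory suggests. The weighted form has the practical advantage that extra terms, such as baseline covariates, can be added within a standard regression calculation.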
There are many aspects that this does not address
which will need attention later. For example, as is appropriate
for primary comparisons, the above treats the trial area as a
unit of study. Yet there is some information within a trial area
arising from examining those holdings which do and do not have
breakdowns. Also there is the issue of the year-to-year variation
within a trial area. Is there evidence of an increasing or decreasing
effectiveness of any effects found?
4. INTERIM ANALYSIS
It was agreed at an early stage that an interim
analysis would be done after about 100 breakdowns had accumulated
in trial areas and this point has just recently been reached.
MAFF have a general policy, which we fully support,
of making data public but we have argued very strongly that this
cannot apply to the detailed breakdown data or the badger TB prevalence
data from the trial without potentially catastrophic effects on
the whole enterprise. Thus it is likely that at some point suggestive,
potentially important but in fact wholly indecisive effects will
appear. However strong the "health warning" that might
be put on such data, the potential for destroying cooperation
in the trial, which is of course voluntary, seems clear. We believe
this point is accepted, albeit reluctantly in some quarters.
While there is every prospect that the trial
will need to run for several years and probably for the initially
projected period there is at least some possibility that clear
conclusions about some parts of the trial will emerge earlier.
There is a large but somewhat controversial
statistical literature on early stopping of trials but this largely
centres on significance testing of effects which are likely to
be stable in time. The methods involve setting rather rigid rules about
when and how many interim analyses are allowed and when a trial
should stop early, although no doubt they are rarely applied in
so mechanical a way. We do not think these approaches helpful
here.
The reasons are:
— the possibility that effects are
not constant in time and that there is appreciable inter-triplet
variation in effect means that conclusions from a short time period
and a small number of triplets, even if in some sense nominally
significant, would not be a secure basis for a conclusion;
— if the objective is regarded, as
we believe it should, as primarily that of estimating the magnitude
of relative reductions via confidence limits, then the need for
various detailed specifications (spending error rate and all that)
disappears;
the rule
"continue the investigation, do interim
analyses from time to time and stop when and only when the required
precision is achieved"
is entirely appropriate (Anscombe, 1953). Note
that this is legitimate from any of the main approaches to statistical
inference.
At some point ISG will have to discuss what
level of precision is suitable.
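The stopping rule quoted above can be sketched in a few lines. The sketch assumes that precision is measured by the approximate fractional standard error of a ratio of two Poisson counts, sqrt(1/N1 + 1/N2), and that the cumulative counts at each interim analysis are available. The function name, the target value and the counts are hypothetical and chosen only for illustration.

```python
import math


def first_analysis_meeting_precision(cumulative_counts, target):
    """Apply the rule 'stop when and only when the required precision is achieved'.

    cumulative_counts: list of (n1, n2) cumulative breakdown counts in the two
    arms being compared, one pair per interim analysis, in time order.
    target: required fractional standard error of the estimated rate ratio.

    Returns (index of the first analysis meeting the target, achieved standard
    error), or None if the target has not yet been reached.
    """
    for index, (n1, n2) in enumerate(cumulative_counts):
        if n1 <= 0 or n2 <= 0:
            continue  # precision cannot be assessed without events in both arms
        standard_error = math.sqrt(1.0 / n1 + 1.0 / n2)
        if standard_error <= target:
            return index, standard_error
    return None


# Invented cumulative counts at five successive interim analyses (NOT trial data).
analyses = [(30, 20), (70, 50), (120, 85), (180, 130), (250, 175)]
print(first_analysis_meeting_precision(analyses, 0.10))
print(first_analysis_meeting_precision(analyses, 0.05))
```

The rule depends only on the precision achieved and not on the direction or size of the estimated effect. This is why no allowance for repeated significance testing enters the calculation.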
5. SOME FURTHER POINTS
The trial should be regarded as having several
objectives.
One is to provide a firm scientifically based
answer to the question of the role, if any, of badgers in bovine
TB. While this may sound at first like testing a null hypothesis
of no proactive effect, in fact it hardly makes sense other than
as a question of estimation. If there is a proactive effect, it will be
necessary to know whether it is 20 per cent, 50 per cent or 80
per cent reduction or whatever.
Assuming some effect of culling is found, the
second objective is to provide a basis for a culling policy. The
ISG has based its approach on sustainability so all or any of
the three approaches could be included in future policy. Any such
strategy as:
"if and only if the projected breakdown
rate in an area (a county perhaps) exceeds some threshold rT,
institute (or allow or encourage) reactive culling over a distance
d from the affected farm"
would require for further analysis a reasonably
precise estimate of the reduction in breakdown rate to be anticipated.
This could be fed into an economic analysis to determine suitable
values of rT and d.
REFERENCES
Anscombe FJ (1953). Sequential estimation (with
discussion). J R Statist Soc B 15, 1-29.
Cox DR and Reid N (2000). The theory of the
design of experiments. Boca Raton and London: Chapman &
Hall/CRC Press.
Krebs J, Anderson R, Clutton-Brock T, Morrison
I, Young D, Donnelly C, Frost S and Woodroffe R (1997). Bovine
tuberculosis in cattle and badgers. Ministry of Agriculture, Fisheries
and Food.
Yates F and Cochran WG (1938). The design and
analysis of series of replicated field trials. J Agric Sci
28, 556-580.
