Memorandum submitted by Professor David
Marsland, Brunel University (SM 6)
RESEARCH METHOD
1.1 In the current important phase of its work,
the Committee seems to be concerning itself with some awkwardly
academic and technical issues in research methodology.
1.2 I have enclosed a copy of a paper of mine
("Methodological Inadequacies in British Social Science")[3]
in case it may be helpful to Committee members in their later
deliberations.
1.3 I would in particular draw attention to
the emphasis I give in the paper to:
(a) Data quality, whose main
dimensions are validity, reliability, and sensitivity.
Much of the data which social scientists use is of scandalously
poor quality. Hence misleading descriptions, false explanations,
and mistaken evaluations of the social world.
(b) The unavoidability of interpretation.
However good one's data, and however good one's research design
and data analysis, if the interpretative phase of reading off
a story for an audience is short on disciplined objectivity,
mistaken conclusions will inevitably result. Sloppy interpretation,
like poor data quality, produces phoney description, explanation,
and evaluation.
METHODS OF EVALUATION
2.1 More specifically, the Committee's considerations
are about methods of evaluation. Evaluation research is
just one species of research: practical research designed
to investigate the effects and effectiveness of systems, policies,
procedures, etcetera, and to improve them.
2.2 I have enclosed a copy of part of a book
which I am just now finishing ("Research Methods for Health
Care Professionals")[1], in case it may be useful to the Committee
or its officers.
2.3 Here I would particularly draw your attention
to:
(a) The necessity, if policy is to be
amended rationally, of striking a sensible balance between
high quality research (conceptualisation, design, data
collection, analysis, and interpretation) on the one hand and
realistic practicality on the other.
(b) Identification of appropriate and
well-measured criteria of effectiveness, and sticking to
them.
(c) The grave danger of ideological bias, with
evaluation routinely prone to mis-use for either critical or exculpatory
purposes. Evaluation research is not intended either to prove
that "we are right" or to demonstrate that "they
are wrong", but to test effects and effectiveness as objectively
as possible.
(d) The importance, even given the difficulty
of elucidating cause and effect, of focusing on effects
rather than general conditions. Without this sharp focus, little
can be done to ensure that policy innovation and policy development
are handled coherently.
(e) The value in this context of an experimental
approach to policy development. Innovations should be introduced
gradually and in variant forms to allow for comparative investigation
of their effects.
WELFARE REFORM SUCCESS MEASURES
Turning now to the Green Paper and the proposed
success measures, I have a number of, mainly critical, comments
to make. I should preface this by saying that I am pleased to
see this work going on. It is an important part of the difficult
task of rationalising social policy which has been going forward
gradually for many years.
3.1 Evaluation of complex social processes and
products is challenging and difficult. It needs time, thought,
and expertise. My reading of the documentation suggests to me
that, as happened earlier with evaluation of educational
policy and even more so with health care, there is a danger
that the work is being rushed, with too much mere
consultation with interested parties, and too little in-depth,
reflective thought and use of research expertise. The outcomes
of welfare policy are considerably more subtle, complex, and contentious
than those of educational or health policy, so even more care
is needed in assessing them.
3.2 It might be better to commission from
a number of competing research agencies a coherent, fully-argued
model of welfare evaluation, from basic concepts to the
nuts and bolts of measurement. Better still, commission two
distinct agencies and test out which model works best.
3.3 The documentation is less than clear about
what is to be evaluated. If the object of evaluation is
welfare reform, then the criteria should all be precise comparative
measures vis-à-vis the unreformed system. If, on the other
hand, the aim is to evaluate the reformed system, criteria of
success must be stipulated in terms of the purposes of the reforms.
3.4 I take it that the eight "principles"
itemised in the annex on success measures are the purposes and
objectives of reform. They seem to me, as an evaluator, too vague
and arbitrary to be evaluated properly. They need to be thought
through, argued, and sharpened up. This should not
be done by specifying the measures to be used: this mistaken
procedure, often found in poor evaluations, makes for a futile,
vicious, circular argument. First the purposes of reform need
to be clarified, argued, and fully defined in detailed, concrete,
practical terms. Then, and only then, one can go on to selecting
and developing indicators. The first stage requires a fully-argued
expert working-paper. A brief ad hoc note is wholly inadequate.
THE PRINCIPLES
Detailed observations on the eight principles
would take more time than I have had available, but I have some
brief comments to make. In general, the problem with them is not,
as some have argued, that they are apple-pie obvious, but that
they are mostly vague and in some cases vacuous. There are also
some key omissions: for example, to maximise self-reliance,
to strengthen real families, to reduce expenditure on
welfare, to avoid helping those whose behaviour is unacceptable
by comparison with the deserving, and to encourage people to think
more in terms of duties than rights.
4.1 As formulated, it does not specify what
the Welfare State as such should be doing about employment, or
therefore make any explicit test of the effects of welfare on
employment feasible. As a matter of principle, the Welfare State
ought to be doing nothing to discourage work: this
would be a real change!
4.2 It is no part of the private sector's proper
function to ensure this happens. In any case what does "partnership"
mean here?
4.3 Why the whole community? Who says we all
need either services or cash supplied by the Welfare State? Why
"high quality"? Why not "excellent quality",
or by contrast "adequate quality"?
4.4 Who is to judge "dignity"? How
widely and loosely is the term "disabled" being used?
4.5 Very awkwardly formulated. The phrase "scourge
of child poverty" sounds Old Labour, if not older still.
4.6 Both "social exclusion" and "poverty"
are vacuous concepts. Their indicators are therefore bound to
be arbitrary.
4.7 Needs re-writing.
4.8 Good. I would divide this into three separate
principles (flexibility, efficiency, and ease of use) and
thus give more weight in the evaluation to this crucial domain.
THE INDICATORS
Again I can only comment briefly, with some
general observations and others specific to particular indicators.
5.1 Quantitative, or at least plausibly qualitative,
thresholds will need to be fixed throughout. For example,
the "reduction" of unemployment specified in Indicator
1.1 could be as low as 1 and still pass! We shall need at least
to say "significantly" or "substantially",
or better "so many thousands per month on average over so
many years". In this respect we need to be at least as clear
and bold with welfare indicators as we are in education and health
care.
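The point about unqualified thresholds can be made concrete in a minimal sketch (all figures and the `passes` function are invented for illustration, not drawn from the Green Paper):

```python
# Illustrative sketch only: hypothetical baseline/current figures showing why
# a bare "reduction" criterion is too weak a pass condition for an indicator.

def passes(baseline, current, min_reduction):
    """An indicator passes only if the fall meets a pre-set minimum reduction."""
    return (baseline - current) >= min_reduction

# A reduction of just 1 satisfies an unqualified "reduction" criterion...
print(passes(baseline=1_500_000, current=1_499_999, min_reduction=1))
# ...but fails once a substantial threshold is fixed in advance.
print(passes(baseline=1_500_000, current=1_499_999, min_reduction=50_000))
```

The threshold must be fixed before the results are in; otherwise it can always be chosen after the fact to guarantee a pass.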
5.2 We need to be able to justify the number
of indicators for each principle, and to indicate how we would
interpret conflicting messages delivered by success on,
say, two out of four indicators and failure on the other two. In my
view, it is better to have fewer rather than more, including
only well-measured indicators which unarguably represent the principle
in question. The indicators should properly be selected by
empirical research and appropriate statistical analysis rather
than by political judgment. We could also work out statistically
how best to combine selected indicators.
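One way such a statistical combination might look, sketched with invented indicator names, figures, and weights (in practice the weights would come from empirical analysis such as factor analysis, not from political judgment):

```python
# Hypothetical sketch: combining two conflicting indicators for one principle
# into a single composite score. All data and weights are invented.
from statistics import mean, pstdev

def standardise(series):
    """Convert to z-scores so indicators on different scales are comparable."""
    m, s = mean(series), pstdev(series)
    return [(x - m) / s for x in series]

def composite(indicator_series, weights):
    """Weighted sum of standardised indicators, year by year."""
    z = [standardise(s) for s in indicator_series]
    return [sum(w * zi[t] for w, zi in zip(weights, z))
            for t in range(len(z[0]))]

# Two hypothetical indicators over five years, deliberately in conflict:
employment_rate = [71.0, 71.5, 72.0, 72.4, 73.0]   # improving (%)
claimant_count  = [4.1, 4.3, 4.2, 4.4, 4.5]        # worsening (millions)
# Negative weight because a rising claimant count counts against success.
score = composite([employment_rate, claimant_count], weights=[0.5, -0.5])
```

The composite makes the "two out of four up, two down" problem explicit: conflicting indicators largely cancel, and the weighting scheme, which must be justified in advance, decides the verdict.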
5.3 A major weakness of much official evaluation
is that the demand for (usually good) news makes it too short
term. Annual change is often merely random turbulence. Two-year,
or better three-year, reports would be more meaningful. In any
event, if trends are what matter, indicators must be chosen
and set for the long term. Thus, for example, all the indicators
for Principle 1 may be too modest for the future.
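The effect of reporting over longer periods can be sketched as follows (the annual figures are invented to show a gentle trend overlaid with year-to-year noise):

```python
# Illustrative sketch (data invented): a three-year moving average damps the
# annual "random turbulence", so the underlying trend shows through.

def moving_average(series, window=3):
    """Mean of each consecutive `window`-year run of annual figures."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Annual figures: upward trend plus noise, so some years dip below the last.
annual = [100, 103, 101, 106, 104, 109, 107, 112]
smoothed = moving_average(annual, window=3)
# The smoothed series rises in every step even though the raw one does not.
```

Annual snapshots of this series would report "failure" in the dip years; the three-year averages report the trend correctly throughout.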
5.4 There is a case for including at least
one indicator which assesses consumer satisfaction (like 8.1)
for each principle. This type of indicator is more meaningful
than others (except prices) to the general public, and less easily
misunderstood, misinterpreted, or manipulated by spin doctors
than some other measures. It also has the advantage of being tied
closely as an "effect" to a "cause" within
the Welfare State's control, or at least influence. By contrast,
too many of the chosen indicators could move this way or that
as much despite as because of the Welfare State's influence.
5.5 Add to the indicators for Principle 1 job
vacancies in specific skill categories. Indicators 1.2 and
1.3 are absurdly modest.
5.6 Indicator 2.1 is vacuous: such
a guarantee could exist without affecting reality any more than
the "rights" in Stalin's constitution. Why should the
improved confidence sought in 2.4 be restricted to the private
sector? If the Government intends to maintain a state pension,
people need to be confident of that too!
5.7 Indicators 3.1, 3.2 and 3.3 are all meaningless
without specific targets. I doubt if the Welfare State as such
is responsible for much of the massive improvement in health,
education, and housing which has been ongoing for a hundred years
or more. Better to have more focused indicators within the
control of the Welfare State, in relation to the quality
of performance in these areas by state employees.
5.8 Indicators 4.1, 4.2, 4.3, and 4.4 all seem
to me to reflect incoherent thinking underlying Principle 4. 4.3
in particular is a logical mess. 4.4 emphasises the rights
side of the equation too much: why no reference here
to reduction of fraud, for example? Why not include reduction
of the number of disabled among the criteria?
5.9 Indicators 5.1, 5.2, 5.3, and 5.4 also reflect
incoherence in the Principle they are supposed to measure. Does
5.1 mean an increase in the proportion of funding going
to families with children compared with families without? Indicators
5.1 and 5.2 may contradict each other: if more money
is given to families with children, there is arguably less incentive
for the adults involved to seek work.
5.10 6.1, 6.2, and 6.3 are arbitrary indicators
of a meaningless concept. Why these and not others? 6.1 is
a glib rag-bag: while truancy must be reduced, exclusion
is a completely distinct phenomenon, and not necessarily to be
deprecated. What matters is how we treat excluded pupils, rather
than how many there are. 6.3 is a Sunday School aspiration which
is probably as meaningless as it is infeasible.
5.11 Indicators 7.1 and 7.2 will require much
development work if they are to be coherently assessed. Client
satisfaction measures perhaps have a part to play.
5.12 Indicator 8.2 is surely a part of 8.1 and
not separate. 8.3, 8.4, and 8.5 are very important measures, whose
progress, or otherwise, will be watched with interest.
I would build in some specific comparisons in this section
with private sector provider benchmarks.
CONCLUSION
It seems to me that in this phase of its work
the Committee is taking on a most important task. If it is handled
well, it could make a very valuable contribution to the quality
of public administration across the board. All the public services
of a truly modern society stand in need of coherent, rational
evaluation. A powerful lead for other spheres of government, beyond
even the wide swathe of welfare, could be provided by focusing
carefully on two crucial issues:
6.1 Getting the logic of evaluation sorted
out once and for all.
6.2 Securing reliance on up-to-date statistical
analysis in the identification, measurement, and combination
of indicators.
June 1998
[3] See Ev p. 29 para. 4.5.