Accountability through sampling
172. Whilst the use of saturation testing, that is,
the testing of each child in a given cohort, is generally agreed
to be an appropriate means of ascertaining and certifying individual
pupil and, to a certain extent, school achievement,
there is rather more argument about whether saturation testing
is an appropriate method of testing local and national performance
and monitoring the effects of changes in policy.
173. Witnesses have argued for the decoupling of
measures of pupil attainment from accountability and monitoring
measures in order to remove the need for central collection of
individual pupil performance data, thereby removing the high stakes
for the school.
Implicit in this argument is the hope that, once the stakes are
removed, the school can get on with the business of teaching children
a full and rounded curriculum without fear of recrimination and
the children will benefit from the education to which they should,
in any event, be entitled.
That the tests would, presumably, remain high-stakes for the
individual child has largely been ignored in the evidence we have received.
174. Nevertheless, decoupling accountability and
monitoring from a testing system which is primarily designed to
measure pupil attainment may have a number of desirable consequences
in relation to the issues discussed in this chapter. To summarise
the arguments which have been put to us, the incentives for schools
and teachers to teach to the test would be reduced considerably.
Likewise, schools and teachers may be more inclined to withdraw
from the disproportionate focus on the core subjects of English,
mathematics and science, important as these are, and give some
more attention to other subjects, replacing some of the lost variety
in the curriculum. Within the core subjects, teachers may feel
more at liberty to take a more creative approach to their teaching
which may enhance the enjoyment, satisfaction and even attainment
of their pupils.
Less time spent on test preparation would reduce the perception
of the testing system as burdensome and, perhaps, result in reduced
stress and demotivation for pupils. Finally, there would be scope
for developing a system of accountability which is fairer to schools,
teachers and pupils alike and which can give some reassurance
to the public about the maintenance of assessment and performance
standards over time.
The IPPR warned, however, that as long as individual pupils sit
national, summative tests (albeit separated from the accountability
system), that data exists and can be compiled and presented in
school performance tables, whether or not the government chooses
to collate and publish those tables centrally. Much the same data
would be available as before.
175. It has been widely argued that national cohort
sample testing would be a less onerous and more appropriate means
of testing local and national performance and monitoring the effects
of changes in policy.
However, sample testing would not necessarily yield the type
of data currently used for individual school accountability. Presumably,
if accountability is to be decoupled from national tests designed
to measure pupil attainment, different tests or inspections will
be required or the concept of school accountability radically
176. Dr Ken Boston said the QCA had given advice
to the Government on sample testing, but that the Government was
more inclined to go in the direction of single-level tests (as
to which, see paragraphs 188-198 below), instead setting great
store by international sample tests such as PIRLS (Progress in
International Reading Literacy Study), PISA (Programme for International
Student Assessment) and TIMSS (Trends in International Mathematics
and Science Study).
He related that he had told the Government that:
[...] there are many purposes that would be
served better by different sorts of tests. Indeed, as you know,
some time ago I raised the issue of sample testing, on which the
Government were not keen for other reasons.
He considered that sample testing, using a standardised
test instrument, was the best way of meeting the purpose of discovering
national trends in children's performance standards over time.
If, on the other hand, the purpose was to compare the performance
of school against school, a sample test would not yield the necessary
data, but a full cohort test would.
He did not believe that Key Stage tests, single-level tests and
cohort sampling should be seen as mutually exclusive alternatives:
different tests are needed to serve different purposes.
177. The Minister, however, did not agree that alternatives
to the current Key Stage tests were workable in practice. He acknowledged
that some had argued in favour of sample testing to monitor national
performance, but thought that testing should also be able to demonstrate
a child's progress against a national comparator, as well as measuring
the performance of a particular school. He thought that the use
of teacher assessment for these purposes was problematic due to
the difficulty of assuring comparability of data. He concluded:
When I look at the matter and begin to unravel
the alternatives and think about how they would work in practice,
I find that the current SATs are much more straightforward - everybody
would understand it. They are used for a series of things, and
there might be some compromise involved, but the system is straightforward
and simple, and it shows what our priorities are and gives us
accountability at every level. I do not think that it is a mess
178. The methodology of sample testing is well-established
and is used, for example, in international comparison studies
such as PISA and TIMSS. It was also used in the UK from the mid-1970s
and through the 1980s by the Assessment of Performance Unit ("APU")
within the Department for Education. The APU used light sampling
of schools and light sampling of pupils within schools.
The GTC sets out a number of advantages to this approach, including
reduced burden of testing; anonymity of schools and students,
ensuring that the tests are low-stakes; wide curriculum coverage;
a range of assessment formats can be employed; test items can
be repeated over time; the system is relatively inexpensive; it
provides good evidence of performance trends; and it is a tried
and tested method. Limitations of the approach include the lack
of ratings for individual schools; lack of feedback for individual
schools; and certain technical complexities leading to difficulty
of interpretation of statistical results.
The NFER also pointed out some possible drawbacks with a sampling
system. Low-stakes assessment may not motivate pupils to try hard
and show what they can really do, resulting in a potential underestimate
of ability. In addition, there may be practical difficulties with
a system relying on voluntary participation of schools and pupils.
However, the NFER broadly supports a regular national monitoring
179. The AQA stated that:
[...] a light sampling survey method would
enable de-coupling of national assessment from a requirement to
deliver robust information on national educational standards.
This would enable testing to reflect curriculum change with precision,
to optimize the learning-focused functions of testing, and enable
constant innovation in the form of tests to optimize accessibility.
Some witnesses have been specific about what they
would like to see. The GTC, for example, advocates cohort sampling
involving a limited number of pupils in a limited number of schools,
using a matrix test structure to allow for multiple tests across
the sample to widen the breadth of the curriculum that is being
tested. Common questions in any two or more tests would allow
for pupils taking different tests to be compared on a common scale.
The tests would be administered by teachers, with external support
The NFER proposed a similar matrix design.
180. In this context, restoration of the former APU,
or something like it, has been a popular theme in evidence.
Cambridge Assessment has, however, pointed out a series of technical
and political issues which led to the demise of the original APU,
stating that its operation was fraught with difficulty. Whilst
Cambridge Assessment is in favour of the development of a light
sampling, matrix-based model for national monitoring of standards
over time, it counsels that this should be done with close attention
to the lessons learned from the former APU and from similar systems
181. We do not necessarily see the point in creating
a new body (or reinstating an old one) for its own sake, but we
do think that the body developing and administering sample testing
for national monitoring should be independent from government
and, for this reason, the proposed new development agency, for
example, would not be appropriate for this task.
As Professor Colin Richards said:
An independent body is needed to keep standards
under review and to devise a system for assessing performance
in relation to [...] standards over time - at a national
level, not at the level of the individual school.
182. In summary, the discussion in this chapter has
demonstrated that high-stakes testing, that is, testing where
the stakes are high for schools and teachers, can lead to distortion
of children's education experience where accountability is linked
to the same testing system which is designed to measure pupil attainment.
The full value of a creative, linked curriculum
which addresses the interests, needs and talents of all pupils
is not exploited because many schools seem to be afraid to innovate
when test scores might be affected (even if evidence shows they
might go up).
183. Whilst we do not doubt the Government's intentions
when it states that "The National Curriculum sets out a clear,
full and statutory entitlement to learning for all pupils, irrespective
of background or ability", we are persuaded that in practice
many children have not received their entitlement and many witnesses
believe that this is due to the demands of national testing.
184. We are persuaded that the current system
of national tests should be reformed in order to decouple the
multiple purposes of measuring pupil attainment, school and teacher
accountability and national monitoring. The negative impacts of
national testing arise more from the targets that schools are
expected to achieve and schools' responses to them than from the
185. School accountability should be separated
from this system of pupil testing, and we recommend that the Government
consult widely on methods of assuring school accountability which
do not impact on the right of children to a balanced education.
186. We recommend that the purpose of national
monitoring of the education system, particularly for policy formation,
is best served by sample testing to measure standards over time
and that cohort testing is neither appropriate nor, in our view,
desirable for this purpose. We recommend further that, in the
interests of public confidence, such sample testing should be
carried out by a body at arm's length from the Government and suggest
that it is a task either for the new regulator or a body answerable