Examination of Witnesses (Questions 80
MONDAY 17 DECEMBER 2007
Q80 Ms Butler: Do I take it that
you believe that the Government's direction of travel in stage
not age is the right direction to be travelling in?
Dr Boston: Yes. That is a very
important step forward, and I think that the single level tests
that are still in pilot stage have the prospect of combining both
a summative assessment and a formative assessment. They are across
a whole programme of study; it is not simply like my example of
testing a youngster on percentages and moving along. It will provide
us with progress data (summative data as they go through
it) as well as formative data as they go through the key
Chairman: Annette will lead us through
the next section on the purpose of testing and fitness for purpose,
which we have started, but we are going to continue.
Q81 Annette Brooke: Can we backtrack
slightly and look at degrees of error, certainly in validation?
We had the statistic in an earlier sitting that up to 30% of candidates
in public examinations are awarded the wrong grade. We can touch
on the issues of consistency of marking and actual mistakes in
adding scores together, but what sort of percentage error are
we looking at that is due simply to the nature of the design of
the test? It may be that a student hits a whole set of questions
and does not know the answer to those particular questions. In
other words, what aspects are there other than the obvious mis-marking
and adding-up errors?
Dr Boston: I cannot say that in
any test there will be this percentage of error, but there are
sources of error. The figure of 30% is a very high figure, which
I have heard before and it certainly pulls you up. What are the
controls we have over this? We have the nature of the mark scheme
and how precise and definitive that is, in terms of allocating
scores. We have performance around grade boundaries, where a score
might be just above or below a grade boundary. More and more information
is now being given by awarding bodies to candidates, including
the return of scripts at GCSE and A-level, if you want them, and
there is greater diagnosis of performance, particularly from Edexcel.
If there is error, the objective is to detect it and then resolve
it. The process of lodging an appeal after a result and having
that heard and the paper re-examined is a legitimate and important
part of the whole thing. We cannot say that the system works impeccably
unless there are such robust appeal processes and they are seen
Q82 Annette Brooke: Given that the
30% figure has been mentioned, surely that is something that you
have investigated fully and looked at the evidence for? Can we
really say that the Government are quite justified in being confident
in the test results that are finally published?
Dr Boston: Yes, I can certainly
say that we are confident in what is being published. However, it must
be said that there are various views of comparability which compound
all of this, and make people wonder whether the standards or grades
are being met. One of the most recent arguments about grade inflation
has been the work that Robert Coe has run from Durham, which has
been interesting work to look at. He has taken what he called
the test of developed ability, which was a notion of innate ability (developed
ability) in an individual, and he took the example of the
person getting A-levels and said that A-levels had become two
grades easier over the last 20 years, and that that was a problem.
Q83 Annette Brooke: I am not really
talking about grade inflation at the moment. I am actually talking
about fundamental errors and confidence in the system. I agree
that grade inflation is undermining confidence, but in this sitting
we are not concentrating on that.
Dr Boston: Error exists. As I
said before, this is a process of judgment. Error exists, and error
needs to be identified and rectified where it occurs. I am surprised
at the figure of 30%. We have been looking at the range of tests
and examinations for some time. We think that is a very high figure,
but whatever it is it needs to be capable of being identified
Q84 Annette Brooke: What is the primary
purpose for which Key Stage tests are designed? We were talking
about a variety of purposes. What is the No. 1 purpose?
Dr Boston: The No. 1 purpose is
to decide the level that a child has reached at the end of a key
Q85 Annette Brooke: Given that the
tests are also used for the purpose of assessing a school's performance
by parents, local authorities and the Government, and that we
want more teacher assessment (you said that yourself) and
full professional development, do you think it is reasonable to
ask teachers to judge pupils' performance when they themselves
and their schools are being judged by the results? Is there not
a major conflict here?
Dr Boston: The use of diagnostic
assessment and assessment of pupil performance, and training teachers
to have an understanding of standards and to be able to decide
where their children rest, where their achievement is, is very
sound. I have talked about teacher assessment before and see immense
value in it, and in the Institute of Educational Assessors in
moderation, but I am not signed up to the abolition of external
tests and to the elimination of external marking. I certainly
think that it has a place and that in any assessment system a
balance is needed between internal and external, but I certainly
would not sign up for a summative assessment process that did
not include a significant component of external marking.
Q86 Mr Chaytor: Of the 22 purposes
to which the assessment results can be put, you stressed what
you think are the most effective purposes, which are best served
by the current system. Which of the 22 are least well served by
the current arrangements?
Dr Boston: With regard to the
personal value of students' achievements and the formative assessment
to identify students' proximal learning needs and guide subsequent
teaching, the national curriculum tests are less effective than
the new tests (the single-level tests) will be. In respect
of student monitoring to decide whether students are making sufficient
progress in attainment in relation to targets, the single-level
tests will do that better; and no, these are not the tests to
deliver the diagnosis of learning difficulties. We could develop
better and simpler tests to identify the general educational needs
of students to transfer to new schools. We can use Key Stage tests
to segregate students into homogeneous groups or screening to
identify youngsters who differ significantly from their peers,
but we could simply design better ones as well. I will not go
through the 14; it is a matter of stripping down. The tests are
good at assessing institution performance; a standard test is
applied to all schools in the country to children of the same
age and it will give you at one level a measure of the performance
of that institution. You might want to moderate that when you
come to setting targets for that institution in terms of its
intake, but they are pretty good at that.
Q87 Mr Chaytor: In terms of institutional
performance, does it follow that the function of school choice
is effectively served by the current tests?
Dr Boston: It could be served
by a better test.
Q88 Mr Chaytor: You have been very
strong on the effectiveness of the tests and the importance of
full cohort testing. But full cohort testing is not the only way
of getting the information that the Government and the public
require. Why has the QCA been so resistant to techniques of light
Dr Boston: I do not think that
we have been resistant to it. In fact I think we were the first
people to start talking about it publicly. We offered advice to
the Government and the Government were not at that stage heading
in that direction. They were heading in the direction of the progress
tests, as they were then called, or the single level test, which
I think is fine. But one of the issues with the Key Stage tests
is that they are a full cohort test. There is a new test each
year. They take a long time to develop and then all the test items
can no longer be used again. The Government set great store by
sample tests such as PIRLS, PISA and TIMSS. In other countries
such as America, for example, the national assessment of educational
progress is a test of a statistically valid sample, which takes
the same test items each year. It is slightly changed, but it
is basically the same thing. It will give you an absolute measure
of whether standards on that test are rising or falling. It is
horses for courses. There are ways in which this can be organised.
The way that the Government are moving is to go for the single
level tests, which I strongly support. But we need to be wary,
if we are to have single level tests but phase out Key Stage tests,
that we do not saddle the single level tests with these 14 functions.
We should use the single level tests for some of the functions
and have other sorts of tests for other functions.
Q89 Chairman: If we have been using
all these tests for 14 different things all this time, is it legitimate
for people like us to say to you, well where was the QCA? Have
you been telling Ministers over all these years that this is a
ridiculous system of testing and that it is so wide that we are
picking out 14 different outcomes and that you need to divide
into four very specific groups (your corkscrew, your screwdriver
and so on)? Where have you been? Have you been telling the Government
this for a long time and they just would not listen?
Dr Boston: No. I do not think
that that would be fair for you to say. The discourse on what
assessment is about and how we do it is a public debate.
Q90 Chairman: I am sorry, but most
of my constituents do not believe that. Parents of children taking
tests believe that you are the person who looks after this sort
of stuff, and that if you do not like what is going on, you should
tell the Government that they should do something about it, and,
if it really came to it, that you would come out from your corner
and say that tests are not fair.
Dr Boston: I am certainly not
saying that the key stage tests are not fit for purpose. I am
saying that there are some purposes for which they are far fitter
than others. They can be used for these purposes. There is no
question about that. But for many of them there is a better way
to do it.
Q91 Chairman: That is what our expert
witnesses have been saying: there are too many tests. You have
not really answered that. We went to New Zealand and they said
that they would like to know more about our students, but that
to test at 7, 11, 14, 16, 17 and 18 we must be crazy. Why does
the QCA never seem to say anything about the number of tests and
the fact that the other expert witnesses say that those tests
are not fit for purpose?
Dr Boston: I do not believe that
there are too many tests, particularly in primary education. There
is undue pressure on preparation for the tests, but if we consider
the amount of time that is actually taken up by the testing process,
it is not high, and it is certainly higher in some other countries.
In secondary education it is far more intense. There is no question
about that. Our concern, or my concern, has not been
with the burden of assessment, as people sometimes refer to it,
but with the high stakes put on the assessments because, in the
case of Key Stage tests, they carry 14 different functions.
Q92 Chairman: That is what we keep
coming back to. Why have you not blown the whistle on those 14
different functions and said that they should not be used in that
Dr Boston: I provide advice to
Government. I am not out there as an independent commentator.
Q93 Chairman: Are you saying that
you have told the Government that they are not fit for purpose
for a long time and they have not reacted to that?
Dr Boston: No. I have never told
the Government that these tests are not fit for purpose because
I do not think that that is the case. I think that they are fit
for purpose. I have certainly said that there are many purposes
that would be served better by different sorts of tests. Indeed,
as you know, some time ago I raised the issue of sample testing,
on which the Government were not keen for other reasons.
Q94 Chairman: What about the other
point that we picked up on in the evidence, that people said
that because you have not blown the whistle on the tests, they
drive out the ability to teach a decent curriculum; that the teachers
are just teaching to the test and cannot explore the curriculum?
Dr Boston: Fundamentally, our
task has been to develop, deliver and build these tests and to
make sure that the results from them are valid. Although I admit
that there are some errors in them, we make sure that there are
processes for that error to be identified and for the problem
to be resolved. We have been extraordinarily forward in pushing
for the introduction of more technology and scanning, precisely
for reasons of improving the quality of marking.
Q95 Chairman: We visit examining
boards and often the progress and innovation comes from them,
not from you. I get the impression that you are running behind
Cambridge Assessment and Edexcel. They are teaching you how to
do that stuff.
Dr Boston: Edexcel, which was
the first to get into online scanning and marking in this country,
would not have got there without the very strong support that
it had from QCA, both publicly and through the Government. The
fundamental argument related to improvements in the quality of
marking. You will remember the fuss that occurred at the time
when the contract went to Edexcel, or Pearson, about
bringing in a private, overseas company to run marking when previously
it had been done by charities. The argument that we publicly and
strongly ran then was that that was the way forward. It was the
way to guarantee quality in marking and to eliminate problems
because second marking would take place alongside first marking
with the material coming up on the computer screen.
Chairman: We will drill down on that
in a minute if you do not mind. I want to call Stephen now to
talk about test targets and tables.
Q96 Stephen Williams: How do you
go about deciding what a child should know at each particular
stage in their life? Ten days ago we had the Key Stage 2 league
tables reporting that children by age 11 are meant to reach Level
4 across the subjects. How was it decided what the content of
Level 4 is and what the target is for an 11-year-old to get to
that level? What process is gone through to reach those two things?
Dr Boston: That is a very technical
question that I am sure someone behind me could answer if you
were prepared to let them, or they could slip me notes and I would
attempt to make a fist of it.
Chairman: It must be the latter and not
the former, otherwise Hansard will be driven up the wall,
so if you do not mind, we will be happy to give you some time
for someone to supply a note. Stephen, do you want to change the
drift of your questions until you get an answer on that?
Q97 Stephen Williams: I will ask
the question in broader terms because, in the introductory session,
Ken basically said that there was a constitutional nicety, a separation
of powers between a regulator of standards which ensured that
everyone had confidence that nothing was being politically manipulated.
Is it the QCA, all the advisers sitting behind you and those behind
them back at headquarters who decide what Level 4 is and how many
children should reach it by a given age, or is that box-ticking
mentality started in the Department and you are told to design
a curriculum to deliver that?
Dr Boston: I understand the question,
but I know that the experts sitting behind me will give a better
answer than I could, so can we move on and I shall take the question
in a moment?
Chairman: Lynda, did you want to come
in on that point?
Lynda Waltho: No, I wanted to follow
Chairman: Carry on, Stephen.
Q98 Stephen Williams: Ken, you said
that the primary purpose of all Key Stage tests was to assess
the individual performance of a child, yet what gets all the attention (I
refer to the August media frenzy) is the performance of a
school, which is an aggregation of the performance of all the
individual children. Do you think a fair outcome of the tests
is that schools should be held to account or do you think that
is a minor delivery of the system?
Dr Boston: No, I am firmly of
the view that schools should be held to account. I believe in
full cohort testing. I believe that full cohort testing and summative
assessment have a place and that holding schools to account for
what they achieve is important.
Q99 Stephen Williams: When you say
that you believe in full cohort testing, I can see that means
that you believe in testing every child, but do you also believe
that the publication of the aggregate results for every child
is fair on the school, or would a national sample be a better
way of measuring standards across the country?
Dr Boston: Yes, I certainly believe
in reporting the achievements of that school. These are the other