Select Committee on Children, Schools and Families Minutes of Evidence

Examination of Witnesses (Questions 80 - 99)



  Q80  Ms Butler: Do I take it that you believe that the Government's direction of travel in stage not age is the right direction to be travelling in?

  Dr Boston: Yes. That is a very important step forward, and I think that the single level tests that are still in pilot stage have the prospect of combining both a summative assessment and a formative assessment. They are across a whole programme of study; it is not simply like my example of testing a youngster on percentages and moving along. It will provide us with progress data—summative data as they go through it—as well as formative data as they go through the key stage itself.

  Chairman: Annette will lead us through the next section on the purpose of testing and fitness for purpose, which we have started, but we are going to continue.

  Q81  Annette Brooke: Can we backtrack slightly and look at degrees of error, certainly in validation? We had the statistic in an earlier sitting that up to 30% of candidates in public examinations are awarded the wrong grade. We can touch on the issues of consistency of marking and actual mistakes in adding scores together, but what sort of percentage error are we looking at that is due simply to the nature of the design of the test? It may be that a student hits a whole set of questions and does not know the answer to those particular questions. In other words, what aspects are there other than the obvious mis-marking and adding-up errors?

  Dr Boston: I cannot say that in any test there will be this percentage of error, but there are sources of error. The figure of 30% is a very high figure, which I have heard before and it certainly pulls you up. What are the controls we have over this? We have the nature of the mark scheme and how precise and definitive that is, in terms of allocating scores. We have performance around grade boundaries, where a score might be just above or below a grade boundary. More and more information is now being given by awarding bodies to candidates, including the return of scripts at GCSE and A-level, if you want them, and there is greater diagnosis of performance, particularly from Edexcel. If there is error, the objective is to detect it and then resolve it. The process of lodging an appeal after a result and having that heard and the paper re-examined is a legitimate and important part of the whole thing. We cannot say that the system works impeccably unless there are such robust appeal processes and they are seen to work.

  Q82  Annette Brooke: Given that the 30% figure has been mentioned, surely that is something that you have investigated fully and looked at the evidence for? Can we really say that the Government are quite justified in being confident in the test results that are finally published?

  Dr Boston: Yes, I can certainly say that we are confident in being published. However, it must be said that there are various views of comparability which compound all of this, and make people wonder whether the standards or grades are being met. One of the most recent arguments about grade inflation has been the work that Robert Coe has run from Durham, which has been interesting work to look at. He has taken what he called the test of developed ability, which was a notion of innate ability—developed ability—in an individual, and he took the example of the person getting A-levels and said that A-levels had become two grades easier over the last 20 years, and that that was a problem.

  Q83  Annette Brooke: I am not really talking about grade inflation at the moment. I am actually talking about fundamental errors and confidence in the system. I agree that grade inflation is undermining confidence, but in this sitting we are not concentrating on that.

  Dr Boston: Error exists. As I said before, this is a process of judgment. Error exists, and error needs to be identified and rectified where it occurs. I am surprised at the figure of 30%. We have been looking at the range of tests and examinations for some time. We think that is a very high figure, but whatever it is it needs to be capable of being identified and corrected.

  Q84  Annette Brooke: What is the primary purpose for which Key Stage tests are designed? We were talking about a variety of purposes. What is the No. 1 purpose?

  Dr Boston: The No. 1 purpose is to decide the level that a child has reached at the end of a key stage.

  Q85  Annette Brooke: Given that the tests are also used for the purpose of assessing a school's performance by parents, local authorities and the Government, and that we want more teacher assessment—you said that yourself—and full professional development, do you think it is reasonable to ask teachers to judge pupils' performance when they themselves and their schools are being judged by the results? Is there not a major conflict here?

  Dr Boston: The use of diagnostic assessment and assessment of pupil performance, and training teachers to have an understanding of standards and to be able to decide where their children rest, where their achievement is, is very sound. I have talked about teacher assessment before and see immense value in it, and in the Institute of Educational Assessors in moderation, but I am not signed up to the abolition of external tests and to the elimination of external marking. I certainly think that it has a place and that in any assessment system a balance is needed between internal and external, but I certainly would not sign up for a summative assessment process that did not include a significant component of external marking.

  Q86  Mr Chaytor: Of the 22 purposes to which the assessment results can be put, you stressed what you think are the most effective purposes, which are best served by the current system. Which of the 22 are least well served by the current arrangements?

  Dr Boston: With regard to the personal value of students' achievements and the formative assessment to identify students' proximal learning needs and guide subsequent teaching, the national curriculum tests are less effective than the new tests—the single-level tests—will be. In respect of student monitoring to decide whether students are making sufficient progress in attainment in relation to targets, the single-level tests will do that better; and no, these are not the tests to deliver the diagnosis of learning difficulties. We could develop better and simpler tests to identify the general educational needs of students to transfer to new schools. We can use Key Stage tests to segregate students into homogeneous groups or screening to identify youngsters who differ significantly from their peers, but we could simply design better ones as well. I will not go through the 14; it is a matter of stripping down. The tests are good at assessing institution performance; a standard test is applied to all schools in the country to children of the same age and it will give you at one level a measure of the performance of that institution. You might want to moderate that when you come to setting targets for that institution in terms of its intake, but they are pretty good at that.

  Q87  Mr Chaytor: In terms of institutional performance, does it follow that the function of school choice is effectively served by the current tests?

  Dr Boston: It could be served by a better test.

  Q88  Mr Chaytor: You have been very strong on the effectiveness of the tests and the importance of full cohort testing. But full cohort testing is not the only way of getting the information that the Government and the public require. Why has the QCA been so resistant to techniques of light sampling?

  Dr Boston: I do not think that we have been resistant to it. In fact I think we were the first people to start talking about it publicly. We offered advice to the Government and the Government were not at that stage heading in that direction. They were heading in the direction of the progress tests, as they were then called, or the single level test, which I think is fine. But one of the issues with the Key Stage tests is that they are a full cohort test. There is a new test each year. They take a long time to develop and then all the test items can no longer be used again. The Government set great store by sample tests such as PIRLS, PISA and TIMSS. In other countries such as America, for example, the national assessment of educational progress is a test of a statistically valid sample, which takes the same test items each year. It is slightly changed, but it is basically the same thing. It will give you an absolute measure of whether standards on that test are rising or falling. It is horses for courses. There are ways in which this can be organised. The way that the Government are moving is to go for the single level tests, which I strongly support. But we need to be wary, if we are to have single level tests but phase out Key Stage tests, that we do not saddle the single level tests with these 14 functions. We should use the single level tests for some of the functions and have other sorts of tests for other functions.

  Q89  Chairman: If we have been using all these tests for 14 different things all this time, is it legitimate for people like us to say to you, well where was the QCA? Have you been telling Ministers over all these years that this is a ridiculous system of testing and that it is so wide that we are picking out 14 different outcomes and that you need to divide into four very specific groups—your corkscrew, your screwdriver and so on? Where have you been? Have you been telling the Government this for a long time and they just would not listen?

  Dr Boston: No. I do not think that that would be fair for you to say. The discourse on what assessment is about and how we do it is a public debate.

  Q90  Chairman: I am sorry, but most of my constituents do not believe that. Parents of children taking tests believe that you are the person who looks after this sort of stuff, and that if you do not like what is going on, you should tell the Government that they should do something about it, and, if it really came to it, that you would come out from your corner and say that tests are not fair.

  Dr Boston: I am certainly not saying that the key stage tests are not fit for purpose. I am saying that there are some purposes for which they are far fitter than others. They can be used for these purposes. There is no question about that. But for many of them there is a better way to do it.

  Q91  Chairman: That is what our expert witnesses have been saying: there are too many tests. You have not really answered that. We went to New Zealand and they said that they would like to know more about our students, but that to test at 7, 11, 14, 16, 17 and 18 we must be crazy. Why does the QCA never seem to say anything about the number of tests and the fact that the other expert witnesses say that those tests are not fit for purpose?

  Dr Boston: I do not believe that there are too many tests, particularly in primary education. There is undue pressure on preparation for the tests, but if we consider the amount of time that is actually taken up by the testing process, it is not high, and it is certainly higher in some other countries. In secondary education it is far more intense. There is no question about that. Our concern—or my concern—has not been with the burden of assessment, as people sometimes refer to it, but with the high stakes put on the assessments because, in the case of Key Stage tests, they carry 14 different functions.

  Q92  Chairman: That is what we keep coming back to. Why have you not blown the whistle on those 14 different functions and said that they should not be used in that way?

  Dr Boston: I provide advice to Government. I am not out there as an independent commentator.

  Q93  Chairman: Are you saying that you have told the Government that they are not fit for purpose for a long time and they have not reacted to that?

  Dr Boston: No. I have never told the Government that these tests are not fit for purpose because I do not think that that is the case. I think that they are fit for purpose. I have certainly said that there are many purposes that would be served better by different sorts of tests. Indeed, as you know, some time ago I raised the issue of sample testing, on which the Government were not keen for other reasons.

  Q94  Chairman: What about the other point that we picked up on in the evidence—that people said that because you have not blown the whistle on the tests, they drive out the ability to teach a decent curriculum; that the teachers are just teaching to the test and cannot explore the curriculum?

  Dr Boston: Fundamentally, our task has been to develop, deliver and build these tests and to make sure that the results from them are valid. Although I admit that there are some errors in them, we make sure that there are processes for that error to be identified and for the problem to be resolved. We have been extraordinarily forward in pushing for the introduction of more technology and scanning, precisely for reasons of improving the quality of marking.

  Q95  Chairman: We visit examining boards and often the progress and innovation comes from them, not from you. I get the impression that you are running behind Cambridge Assessment and Edexcel. They are teaching you how to do that stuff.

  Dr Boston: Edexcel, which was the first to get into online scanning and marking in this country, would not have got there without the very strong support that it had from QCA, both publicly and through the Government. The fundamental argument related to improvements in the quality of marking. You will remember the fuss that occurred at the time when the contract went to Edexcel—or Pearson—about bringing in a private, overseas company to run marking when previously it had been done by charities. The argument that we publicly and strongly ran then was that that was the way forward. It was the way to guarantee quality in marking and to eliminate problems because second marking would take place alongside first marking with the material coming up on the computer screen.

  Chairman: We will drill down on that in a minute if you do not mind. I want to call Stephen now to talk about test targets and tables.

  Q96  Stephen Williams: How do you go about deciding what a child should know at each particular stage in their life? Ten days ago we had the Key Stage 2 league tables reporting that children by age 11 are meant to reach Level 4 across the subjects. How was it decided what the content of Level 4 is and what the target is for an 11-year-old to get to that level? What process is gone through to reach those two things?

  Dr Boston: That is a very technical question that I am sure someone behind me could answer if you were prepared to let them, or they could slip me notes and I would attempt to make a fist of it.

  Chairman: It must be the latter and not the former, otherwise Hansard will be driven up the wall, so if you do not mind, we will be happy to give you some time for someone to supply a note. Stephen, do you want to change the drift of your questions until you get an answer on that?

  Q97  Stephen Williams: I will ask the question in broader terms because, in the introductory session, Ken basically said that there was a constitutional nicety, a separation of powers between a regulator of standards which ensured that everyone had confidence that nothing was being politically manipulated. Is it the QCA, all the advisers sitting behind you and those behind them back at headquarters who decide what Level 4 is and how many children should reach it by a given age, or is that box-ticking mentality started in the Department and you are told to design a curriculum to deliver that?

  Dr Boston: I understand the question, but I know that the experts sitting behind me will give a better answer than I could, so can we move on and I shall take the question in a moment?

  Chairman: Lynda, did you want to come in on that point?

  Lynda Waltho: No, I wanted to follow on later.

  Chairman: Carry on, Stephen.

  Q98  Stephen Williams: Ken, you said that the primary purpose of all Key Stage tests was to assess the individual performance of a child, yet what gets all the attention—I refer to the August media frenzy—is the performance of a school, which is an aggregation of the performance of all the individual children. Do you think a fair outcome of the tests is that schools should be held to account or do you think that is a minor delivery of the system?

  Dr Boston: No, I am firmly of the view that schools should be held to account. I believe in full cohort testing. I believe that full cohort testing and summative assessment have a place and that holding schools to account for what they achieve is important.

  Q99  Stephen Williams: When you say that you believe in full cohort testing, I can see that means that you believe in testing every child, but do you also believe that the publication of the aggregate results for every child is fair on the school, or would a national sample be a better way of measuring standards across the country?

  Dr Boston: Yes, I certainly believe in reporting the achievements of that school. These are the other schools.

© Parliamentary copyright 2008
Prepared 13 May 2008