Select Committee on Children, Schools and Families Minutes of Evidence


Examination of Witnesses (Questions 100 - 119)

MONDAY 17 DECEMBER 2007

DR KEN BOSTON

  Q100  Stephen Williams: That is based on the aggregate of the results for each child in the school, so would you reject alternatives completely?

  Dr Boston: It depends on the purpose. If your purpose is to find out whether children are writing and reading as well as they did 10 years ago nationally, the best test is to give a sample of them virtually the same test as was given to a sample of them 10 years ago. That will tell you whether we have gone up or down. If you want to report on the performance of a school this year in relation to the school next door, the sample will clearly not do that, but the full cohort test will. It is again about purpose. Both purposes are legitimate. Some of the difficulty with the testing or examination programme, when looking at whether standards have changed significantly year on year or over a 20-year period, is that the curriculum, teaching methods and other things, such as class size, have changed. If you really want to know whether people are better at reading than they were 20 years ago, give them the same test.

  Q101  Stephen Williams: I see that you have had time to look at the note that has been passed to you.

  Dr Boston: This is Dr Horner's contribution in handwriting. The report of the Task Group on Assessment and Testing in 1988—while I was still elsewhere, Barry—decided on a 10-level scale and proposed a graph of progress that identified Level 4 at age 11. That report was then published, presumably. We are now on an eight-level scale—aren't we?—so that 10-level scale has been reduced. That does not fully answer your question, Stephen.

  Q102  Stephen Williams: No, four out of 10 is 40% and, on the same scale, four out of eight is 50%, so it seems to be a completely different target. Perhaps we are getting a bit too technical. I was trying to get at whether the Government were feeling the QCA's collar in respect of how we set the standards for children. Therefore, is it right that we have a separate regulator of standards for the future?

  Dr Boston: Barry, I should very much like to give you a written statement tomorrow in answer to this question.[5]

  Chairman: Okay.

  Q103  Stephen Williams: Before the note, I was going to mention the difference between how a child's performance is assessed and how a school's performance is assessed. You were going into how children's performance was assessed over time. Is there another way in which you can assess a school's effectiveness apart from the league table mentality that we have at the moment? Is there an alternative? After all, they do not have league tables in Wales or Scotland.

  Dr Boston: You can certainly assess a school's performance on the basis of teacher reporting, or by the school reporting its performance against a template of benchmarks, perhaps, as occurs in some other countries, and reporting its testing of its students against national averages in literacy, numeracy and so on. I know of cases where that occurs.

  Q104  Stephen Williams: You are a man of international experience. Do you think that anywhere else does it better than England—whether a state in Australia or anywhere else—without this sort of national frenzy every August, with people wondering whether things are going down the pan or the Government saying, "No, things have only ever got better"?

  Dr Boston: I think England is pretty rare in the way it does this in August, although I would not say unique.

  Q105  Chairman: Is that good or bad?

  Dr Boston: The annual debate about whether too many have passed and whether standards must have fallen is a very sterile debate and I would be glad to see the back of it. If it is right that this new regulator will lead to the end of that, it is a good thing. We are not so sure that it will. There are other, better ways of celebrating success and achievement—not questioning it.

  Q106  Stephen Williams: Do you think that any particular country does it a lot better than England?

  Dr Boston: No. I think that in other countries where the results come out there is less public criticism of youngsters on the basis that, because they have three A grades, the result must be worthless. Such criticism is a very bad thing. From my previous experience, I have a great interest in Aboriginal education. There were 200 years of Aboriginal education in Australia with absolutely no impact on the performance of Aboriginal kids until we introduced full cohort testing and reporting at school level. Then, suddenly, people took Aboriginal education seriously and it began to improve.

  Q107  Stephen Williams: If nobody else wants to come in on this section, I should like to ask one last question, going back to the start, about introducing the new regulator. We have only had the report, Confidence in Standards, from the Secretary of State today, and we have not been able to digest it fully yet. When were you consulted about the split in the QCA's responsibilities? Was it before, during or after the August round of exam results that we had a few months ago?

  Dr Boston: It was after.

  Q108  Fiona Mactaggart: You have been talking clearly about the difficulty of tests fulfilling 14 different purposes. The fact is that they fulfil some of those inadequately. You suggested that the best way to see whether children over time are able to achieve the same standard is through sampled testing. Do we do very much of that, and if not, why not?

  Dr Boston: Those tests will tell you whether performance on a particular task has improved over time. We do not do that as a country. We pay a lot of attention to PIRLS, PISA and the international maths and science study. In developing its Key Stage tests from year to year, the QCA does pre-test. Some of those pre-tests in schools, which youngsters think are just more practice tests, are trials of what we will use in 18 months' time. In them we often use anchor questions, which are the same questions that were asked a few years before or in consecutive years. They might be only slightly disguised or might not be changed at all. That is to help develop tests that maintain standards, so that Level 4 is Level 4 year on year. The boundary between the levels is set by the examiners. It might be 59 in one year and 61 in another, but they know that in their judgment that is a Level 4. They draw on those tests. We have not used the tests systematically enough to say, "We used these six questions for the past eight years and we know that students are getting better at reading or worse at writing," but that is the basis on which we develop and pre-test those emerging assessments.
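
  [Editorial note: the anchor-question mechanism Dr Boston describes can be sketched in code. The following is a minimal illustration only, assuming a crude mean-equating rule; the function, the anchor-to-total scaling factor and all marks are invented for the example and are not the QCA's actual method.]

    from statistics import mean

    def equate_boundary(boundary_old, totals_old, totals_new,
                        anchors_old, anchors_new, anchor_to_total_scale):
        """Crude mean-equating with anchor questions.

        The anchors are identical across years, so a change in anchor
        performance is read as a change in cohort ability; whatever change
        in whole-paper scores it does not explain is attributed to the
        paper's difficulty, and the level boundary moves with it so that
        Level 4 is Level 4 year on year.
        """
        ability_shift = anchor_to_total_scale * (mean(anchors_new) - mean(anchors_old))
        difficulty_shift = (mean(totals_new) - mean(totals_old)) - ability_shift
        return boundary_old + difficulty_shift

    # Equally able cohorts (anchor means match) but whole-paper scores fall
    # by two marks: the new paper is treated as harder and the boundary
    # drops from 61 to 59 -- the "59 in one year and 61 in another" effect.
    print(equate_boundary(61, [70.0, 72.0], [68.0, 70.0],
                          [8.0, 9.0], [8.0, 9.0], 5.0))  # -> 59.0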

  Q109  Fiona Mactaggart: I am struck by this. You are saying that we do it a bit to ensure the comparability of tests over time. We all accept that some of that kind of work is a necessary function of getting accurate summative tests, but there is a constant thread in the debate on assessment about whether standards have changed over time. I do not think that I properly understand why we have not bothered to invest what does not strike me as a very large amount of resource in producing that kind of sampling over time to see whether standards are improving or weakening, and where. We would then have a national formative assessment of where the strengths and weaknesses of our education system are over time. Do we have a mechanism that is designed to do that? If not, why not?

  Dr Boston: No, we do not. We use PIRLS and PISA, and the recent results confirmed what we already know: for example, in PIRLS, the line I talked about is not as steep as it was before. It has flattened off, but has not come to a plateau. The notion of a sampling programme is something that we have raised with Government. Some years ago, before I came into this job, there was the Assessment of Performance Unit, which did some of that work. That is no more. I do not know the background and the reasons why the work was not pursued, but it was work of this sort. It seems to me that we need to be thinking not of either/or. That is the message that I really want to get across. It is not a choice between Key Stage tests, single level tests or sample tests. If we want to serve those 22 legitimate purposes of testing—I am sure there are more—we need a number of tests that will deliver between them all those things, but which are designed so that, in Paul Newton's terms, the user inference and the design inference are very close indeed.

  Q110  Fiona Mactaggart: What I do not understand about the proposed new system is this: if we developed a wider range of tests to separate some of these functions more precisely, so that we get more accurate information rather than trying to infer information from tests that are designed to do something else, which is what we do at present, who would take the lead in developing the sample tests and introducing them? Would it be the QCA or the new regulatory authority? I have not had time to read through the document, but I do not understand whose job is what.

  Dr Boston: It would be the QCA, and it would do its work partly through stimulating the private sector market and the awarding bodies to work with it. Presumably the QCA would take the initiative on remit from the Government. That would be critical: the Government would decide that they wanted a set of new tests. We did not go out and invent single level tests. We were remitted to do them. We produced them at Government request, and with our very strong support. So the initiative would rest fundamentally with the Government, but the body that would lead on it would be the QCA, or whatever the QCA might end up being called some time in the future. The regulatory authority is to ensure that, once the product—the assessment—is there, it delivers on standards and maintains standards. The regulator is not a development authority; it is an authority to regulate products and ensure their quality once they are there.

  Q111  Fiona Mactaggart: When you were remitted to develop the concept of single level tests, were you remitted to develop a test that was a one-way street, rather than a test that could be re-administered? I gather that the National Foundation for Educational Research is concerned that this is just a single-pass test, and that someone who chooses when to take it might pass then but might not necessarily pass it a month later.

  Dr Boston: We were remitted to produce a test which would be taken as a one-off. Further down the track, if we get to a point, as I think we might, where single level tests are available virtually online, on demand, we would need to go to a data bank of test items. What we have at the moment is a Level 3 test or a Level 4 test. A judgment is then made, on the score you get, about whether you are secure at Level 4. That test is then finished with. The time may come, as with the Key Stage 3 ICT tests, when there is a computer in the corner on which you can take your Level 4 or Level 5 reading test at any stage. That would depend on a data bank. In that sense it is constantly renewable, if I understand the question correctly.
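
  [Editorial note: the "data bank of test items" Dr Boston envisages can be illustrated with a short sketch. The bank contents, the ten-item paper length and the function below are the editor's assumptions for illustration, not a description of any QCA system.]

    import random

    # An item bank keyed by national curriculum level; each entry stands in
    # for a stored question. Fifty items per level is an invented figure.
    ITEM_BANK = {
        4: [f"level4_item_{i:02d}" for i in range(50)],
        5: [f"level5_item_{i:02d}" for i in range(50)],
    }

    def assemble_test(level, n_items=10, seed=None):
        """Draw a fresh paper at the requested level. Because each sitting
        samples different items from the bank, the test is constantly
        renewable and could be taken online, on demand."""
        rng = random.Random(seed)
        return rng.sample(ITEM_BANK[level], n_items)

    # A pupil sits a Level 4 reading paper whenever the teacher judges
    # they are ready; a later sitting would draw different items.
    paper = assemble_test(4)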

  Q112  Fiona Mactaggart: It was not so much about whether it was renewable. If the child's teacher can choose the moment at which the child takes a single level test, and it is a propitious day for that particular child, the child may do well in the test and succeed, but it might still be rather a frail attainment. There is anxiety about whether that is a fully accurate picture of the child's capacity and general learning level, even though they can do it on a fair day with the wind behind them.

  Dr Boston: I am remiss in that I have not fully explained the relationship between Assessing Pupils' Progress and the tests. The APP programme is designed essentially to produce greater understanding among teachers of what is represented by a level—the profile of a Level 4 in reading, the profile of a Level 5 in reading and the difference between them. It sets out the different indicators that show a child is at either Level 4 or Level 5, and the child is then entered for the test. The test is meant to be confirmation that the teacher has made the judgment correctly.

  Sitting suspended for fire evacuation.

  On resuming—

  Chairman: Dr Boston, we are back in business, although only briefly. I suspect that we will have to call you or your team back at some stage, because this has been unfortunate. I will give a question to each member of the team, and you will answer speedily. I will start with David, followed by Stephen, then Fiona, and I will finish.

  Q113  Mr Chaytor: On maintenance of standards, will the new A* grade at A-level have the same pass rate in all subjects across all examining boards?

  Dr Boston: No.

  Q114  Mr Chaytor: Does the existing A-level threshold have the same pass rate in all subjects?

  Dr Boston: No.

  Q115  Mr Chaytor: Does that cause a problem?

  Dr Boston: No.

  Q116  Mr Chaytor: Will there not be a huge discrepancy between different subjects in different boards?

  Dr Boston: The A/B boundary is set by professional judgment. The reality is that subjects are different; there is no attempt to say that, for example, 10% must pass or gain an A grade in every subject. No country in the world achieves precise comparability between subjects in terms of standards. Australia tries to do so: it takes all the youngsters who take a certain combination of subjects—for example, English, geography and art—and, if a lot of the youngsters taking those three get higher grades in geography than in the other two subjects, it deflates the mean of geography. Some pretty hairy assumptions underlie that. Here, an A/B boundary is set by professional examiners, broadly at the level that a hard-working, well-taught student who has applied himself or herself fully would achieve on a syllabus or specification.
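
  [Editorial note: the Australian inter-subject scaling Dr Boston describes can be sketched as follows. This is a deliberately simplified illustration; the marks, the uniform deflation and the function are the editor's assumptions, not any board's actual moderation procedure.]

    from statistics import mean

    def scale_subject(results, subject):
        """Deflate (or inflate) one subject's marks by the average amount
        its candidates out- or under-perform their own marks in the other
        subjects they take -- the cross-subject comparison described above."""
        gaps = []
        for marks in results.values():
            if subject in marks and len(marks) > 1:
                others = [m for s, m in marks.items() if s != subject]
                gaps.append(marks[subject] - mean(others))
        adjustment = mean(gaps)  # positive => subject looks lenient; deflate
        return {pupil: marks[subject] - adjustment
                for pupil, marks in results.items() if subject in marks}

    # Three pupils each take English, geography and art, and every one of
    # them scores higher in geography, so geography's marks are deflated.
    results = {
        "pupil_a": {"english": 62.0, "geography": 75.0, "art": 64.0},
        "pupil_b": {"english": 55.0, "geography": 70.0, "art": 57.0},
        "pupil_c": {"english": 68.0, "geography": 80.0, "art": 66.0},
    }
    print(scale_subject(results, "geography"))
    # -> {'pupil_a': 62.0, 'pupil_b': 57.0, 'pupil_c': 67.0}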

  Q117  Mr Chaytor: Are the thresholds for subjects on examining boards matters of public record? That is, is the percentage score that triggers a B, an A or an A* on the record and available to pupils and parents?

  Dr Boston: The answer is no, I believe.

  Q118  Mr Chaytor: My next question is, should it be?

  Dr Boston: I would think not.

  Q119  Mr Chaytor: Why not?

  Dr Boston: The essential point is that you might have a harder paper one year than another, in which case the boundaries might change significantly. The point is not the numerical score at which the boundary is drawn; the fundamental point is the professional judgment of the examiners, who decide where the A/B boundary and the E/U boundary lie. They do that on the basis of their experience and past statistical evidence, using papers of similar demand.


5   Note by witness: In 1988 the Task Group on Assessment and Testing (TGAT) designed the assessment system for the national curriculum. This included the development of what was then a 10-level scale to cover the years of compulsory schooling. Level 4 was pitched as the reasonable expectation for the end of the primary phase, to ensure pupils could move on with confidence in their skills to tackle the secondary curriculum.

