House of Commons



Children, Schools and Families Committee



Testing and Assessment



Monday 17 December 2007


Evidence heard in Public Questions 55 - 127





This is an uncorrected transcript of evidence taken in public and reported to the House. The transcript has been placed on the internet on the authority of the Committee, and copies have been made available by the Vote Office for the use of Members and others.



Any public use of, or reference to, the contents should make clear that neither witnesses nor Members have had the opportunity to correct the record. The transcript is not yet an approved formal record of these proceedings.



Members who receive this for the purpose of correcting questions addressed by them to witnesses are asked to send corrections to the Committee Assistant.



Prospective witnesses may receive this in preparation for any written or oral evidence they may in due course give to the Committee.


Oral Evidence

Taken before the Children, Schools and Families Committee

on Monday 17 December 2007

Members present:

Mr. Barry Sheerman (Chairman)

Annette Brooke

Ms Dawn Butler

Mr. David Chaytor

Mr. John Heppell

Fiona Mactaggart

Lynda Waltho

Stephen Williams


Examination of Witness

Witness: Dr. Ken Boston, Chief Executive, Qualifications and Curriculum Authority (QCA), gave evidence.


Q55 Chairman: I welcome you, Dr. Ken Boston, to our deliberations. It is the first time that you have appeared before this Committee-we saw you in a previous Committee on a reasonably regular basis. It was good of you to come here at short notice, given that people-certainly those in Parliament-are close to the time when they disappear from London for their Christmas break. You were good enough to enable us to keep the momentum of our inquiry this side of Christmas, so that we can reach a conclusion early in the new year. We appreciate your taking the trouble to do that. This is an historic day for testing and assessment, although we did not plan it that way. We usually give witnesses a chance to say something at the start, after which we ask questions. Would you like to make a brief statement?

Dr. Boston: I should like to take a couple of minutes to make a statement. Thank you for giving me the opportunity to give evidence to the Select Committee. I shall give a brief preface on standards and national performance. In its regulatory capacity, it is the job of the Qualifications and Curriculum Authority to ensure that assessment standards are maintained year on year for national curriculum tests, GCSEs, GCEs and other qualifications.

The assessment standard is the height of the hurdle that is to be jumped in any examination or test-it is the degree of difficulty. Our regulatory task and the task of our division, the National Assessment Agency, which delivers the national curriculum tests, and the task of the awarding bodies, which deliver the general qualifications, is to keep the hurdle at the same height year on year.

The performance standard is different. It is the number of students who clear the hurdle in a particular year. When we say that standards are rising-as they are-we mean that increasing numbers are clearing the hurdle. I make that point at the start because the two uses of the word "standards" are critically important and have been the source of much confusion. In our capacity-other than regulation and the areas of curriculum, assessment and qualifications development-our role is to work with the Government to drive up performance standards and increase the number of those who clear the various hurdles. We are partners with the Government and other bodies in the national enterprise of raising performance standards overall.

The QCA has been absolutely scrupulous in ensuring that our regulatory decisions are not influenced by political considerations. In my time in the job, at least, Ministers and civil servants have been similarly principled in ensuring that they remain totally disengaged from the QCA's regulatory functions. However, there has always been a logical inconsistency in the body accountable for maintaining assessment standards reporting to Ministers whose job is to drive up performance standards. The Government's decision announced this morning to establish a new body from within the QCA to take over its regulatory responsibilities and report to Parliament, not Ministers, will resolve that difficulty and is therefore very welcome. At the same time, it will allow the QCA to become, in due course, a new organisation to focus on the role of curriculum and assessment, and qualifications, in raising national performance standards.

I would like to say a couple of words about national performance standards and how to drive them up. Performance standards are rising, but in England, as in school systems across much of the western world, the rate of improvement in educational performance has slowed in recent years. If you look at the graph of our performance and those of many other western nations, you will see that the lines are not moving up as steeply as they were a few years ago. In some countries, the graph has virtually reached a plateau.

There seems to be, internationally, a glass ceiling at about the 80% competence level: that is, at the level at which about eight in every 10 young people reach the agreed national benchmarks, such as level 4 at key stage 2. However, we are by no means unique. Fullan, Hill and others have shown that the conditions for breaking through that glass ceiling already exist, and that the difficulty here and elsewhere has not been in finding what to do, but in bringing together in the country's classrooms the things that need to be done.

There are three approaches to teaching and learning that, if brought together effectively within classrooms, will cause individual, school and national performances to move upwards more sharply, with national performance standards potentially rising to the 90% competence level and perhaps above that. The first of those is personalised learning, which is a term that I quite dislike, because it is commonly characterised as putting the learner in charge of the learning, with all the implications of the secret garden of curriculum that we have heard in the past, without the edge of challenge and discipline in grappling with difficulty, which are fundamental to all real learning.

Personalised learning is better described as highly focused teaching, where the teacher is firmly in charge of the process of instruction and designs it to stretch the individual beyond the level of what we might call the comfort zone. There is an educational theory of 30 years' standing underpinning that, which focuses on drawing the learner into new areas of learning that are beyond his reach at that point, but which, with effort and application, are achievable. As I have said, there is ample evidence over the past 30 years to show that that works. Personalised learning is deeply rooted in curriculum, but requires a three-dimensional curriculum that has depth, rather than a two-dimensional curriculum. It should be a deep, rich resource from which a teacher can draw bespoke material to take each young person to their next level of knowledge, skill and understanding.

The second component is systematic and precise measurement in the classroom of the current stage of learning to enable the teacher to shape the next stage for each child. If personalised learning is to drive up performance at individual, school and national levels, it needs to stand on a foundation of frequent, low-stakes assessment of individual performance. That testing needs to happen routinely and incidentally within the classroom as a matter of course. Some of it can be supported by technology, such as a child taking a 10-minute task on a computer to prove for himself and the teacher whether he has yet mastered, for example, percentages and can be challenged with something more demanding, or whether more work on percentages is needed to make him secure. We need to enable teachers to use more of that sort of assessment in schools. There is an immense professional thirst for it and, because youngsters come to see frequent and incidental assessment as integral to their learning and as hurdles to train for and take pleasure in leaping, in that sense they do take charge of their own learning.

The third and final component is professional learning for teachers to enable them to assess pupil performance better and to use the assessment information on each student to design and implement personalised instruction. Teachers need to be able to convert the formative assessment data into information that will enable them to make instructional decisions not at some time in the future-nor at the start of next year or at the end of the key stage-but tomorrow. That is when decisions on intervention need to be implemented.

In England, significant progress has been made on each of those three essential prerequisites for achieving further improvement in school and system performance by bringing them together in classrooms. The new secondary curriculum has been designed to support highly focused teaching in the sense that I have described. That will also be an objective of the forthcoming review of the primary curriculum and of our work with Sir Jim Rose in the context of the broad view of the curriculum in the children's plan. The children's plan puts £1.2 billion into supporting the personalisation of learning over the next three years.

The pilot single-level tests are also a significant step forward in providing information that has the additional potential to provide summative data on school and system performance. The tests represent a substantial investment in addition to the current key stage tests, which they are expected to replace in due course. There are also, of course, growing banks of test data produced by the QCA at the request of Government, such as the key stage 3 SATs, and other assessment instruments developed by the private sector, which will support assessment of separate components of programmes of study.

The assessment of pupil performance programme, which is now being rolled out nationally in both primary and secondary schools, goes to the heart of the teachers' professional learning in making instructional decisions based on assessment information. The Government are committing £150 million over the next three years for the development of staff in assessment for learning.

To conclude those additional remarks, let me say that at the moment I am pretty optimistic about the future. There seems to be a willingness across Government, the teaching profession and the broader public to engage in genuine discussion about the future of testing and assessment and to come out of the trenches to some extent. There seems also to be a real recognition of the importance of three things-personalised learning, formative assessment, and professional development for teachers-which are the essential keys to raising performance standards and the only way in which this country will drive itself through the glass ceiling at around 80 per cent.

Q56 Chairman: Thank you for that introduction, which was a pretty thorough look at the whole field. If we are going to get through all our questions in the time available, the questions and answers will have to be quick-fire.

I want to start by asking why all that was necessary. You gave evidence to the Committee not very long ago, when you seemed to be an extremely happy chief executive of the QCA. You did not say to us that there was a fundamental problem with the QCA structure and that if only the Government would listen there should be some fundamental changes. Nevertheless, fundamental changes are what we have here. Some of us who know the history and the origins of the changes, over the past 10 or 15 years, feel that we have kind of been here before. Why do you think that the changes have come about now?

Dr. Boston: Our private, but consistent, advice to Government has been that there is a perception that the regulatory decisions could be manipulated by Government, given the way in which we report to Ministers rather than to Parliament. That argument is strong, and we have made it again and again. The Government have accepted the argument in so far as it relates to the regulatory side of our work. The other side of our work will continue much as it is. I believe that that is a step forward.

Q57 Chairman: Do you understand that the regulatory part will be in parallel to what has been established as the relationship of Ofsted to Parliament?

Dr. Boston: I am not precisely sure what the governance arrangements will be, except that it will have its own board, its own chairman and its own chief executive-I do not think that anyone is sure yet and lawyers are looking at the matter. The issue of whether it is a non-ministerial department, or reports to Parliament in some other way, still needs to be worked through as part of the consultation process.

Q58 Chairman: When it was believed that Ofsted was responsible to and answerable to Parliament, there was a hard-fought battle to ensure that it did so through this Committee, or its predecessor Committee.

Dr. Boston: Yes.

Q59 Chairman: So, I assume that constitutionally, the parliamentary relationship will be mediated through a Select Committee.

Dr. Boston: That would be my assumption, but those matters are being considered within the Department, not the QCA.

Q60 Chairman: In broad terms, do you think that this morning's proposals are to be welcomed?

Dr. Boston: Yes.

Q61 Chairman: In their entirety-with no hesitation or qualification? I will not say the Australian equivalent of welcome, but you know what I mean.

Dr. Boston: With a modest, restrained British approach to things, Mr. Chairman, yes, these proposals are to be welcomed.

Q62 Chairman: Let us drill down a little. In this Committee, and the previous one, we did not see great public demand for these changes. Do you believe that the public were knocking on people's doors-they were certainly not knocking on my door-saying that they wanted a more independent relationship? Or is it that they were worried about standards? There was always a fuss in August when the results came out-the Daily Mail would always tell us that standards were going down and that there was grade inflation and much else. Is that what people are responding to? Is that what the Government have responded to-the furore that goes on in August?

Dr. Boston: Certainly, the Government have listened to and heard our concerns about the ambiguity present where there is a body that, among other things, is responsible for regulation and reports on the maintenance of assessment standards to a Government who are committed to driving up standards to meet particular targets. As I said, in reality, we have not been troubled by this. I do not think that anyone could point to an occasion when pressure has been put on the organisation by the Government or civil servants with regard to standards- certainly, I am totally unaware of it, and I am certain that it has never happened. However, if we consider one of the causes of the August debate to be that the separation of the regulator from Government is not perfectly clear, then that August debate might be diminished if the separation were made more apparent. Of course, there may be other issues in the August debate that are not resolved by that situation.

Q63 Chairman: As you know, August is a slow news time. They always bring the education correspondents back for August, so if they have to write about something, I am sure that they will do so. What is your view of the balance between the agency and the other body? How will it be handled, and how will the two organisations develop?

Dr. Boston: The Secretary of State has asked us to set up an interim regulatory authority. That should be done virtually immediately, and there should be as much distance between the regulatory body and the parent body-the QCA-as is possible by the summer examinations. Of course, the legislation will not be passed and take effect until 2009.

The way we are looking at setting up the interim arrangements is for the QCA board, which cannot be discharged of its regulatory responsibilities without a change in the Act, nevertheless to carry out those responsibilities, not through me as chief executive, but through Isabel Nisbet, the head of regulation and standards, who, it has been announced today, will be the acting chief executive of the new regulatory authority-Ofqual, or whatever shorthand we might finally use to describe it. That organisation will be operating in shadow form from April.

I will not be dealing personally with the awarding body chiefs on matters of standards and I will not be setting levels in relation to national curriculum tests, as I do at the moment. That will be done by David Gee as head of the NAA. I will be responsible for managing the affairs of the board. I will remain the accounting officer for the entire organisation, but the shadow regulator's funds will be ring-fenced. An interim board with an interim chairman will be established for the shadow regulator, and the proposal is that, to all intents and purposes, it should function as a separate body from about April. Not only will it function separately, but it will do so from Coventry, because many of them would otherwise be moving to our temporary premises in the old adult learning inspectorate.

Q64 Chairman: We must get on to the last thing.

We have dipped our toe into the area of testing and assessment. We have already had a lot of written evidence and we have had a seminar. People mostly wanted to talk not about the constitutional role of the two organisations, or the split between their roles-that was hardly mentioned-but about too much testing, grade inflation, and a range of things that concern parents, students and commentators. It seems that this is designed to take our eye off the ball, so that we can say, "Look, this is all right. We are making some big, grand, but complex changes out there," whereas most parents and students are worried about other things entirely, such as too much testing. Everywhere in the world they say that there are too many tests. Academics come before us and tell us that we test the wrong things or too many things. Those are the real issues, are they not?

Dr. Boston: Yes, they are. Certainly, during the interim period, we will not be taking our eyes off those balls.

Chairman: Let us get drilling now with David.

Q65 Mr. Chaytor: To pursue today's announcement a little further, what will it cost?

Dr. Boston: I do not have an answer to that, but we will be meeting to establish the shadow regulatory authority, for which we will need completely new front-of-house facilities.

From April, if you ring the regulatory authority, you will not want someone from the QCA answering the phone. The media will need to be different, as will the presentation and delivery. We are looking at that, with a view to presenting a budget bid to the DCSF for putting it in place.

Q66 Mr. Chaytor: Do you know at what stage your budget bid will be presented?

Dr. Boston: It will be presented within the next few weeks; by early January.

Q67 Mr. Chaytor: In your opening presentation, you put a lot of emphasis on the distinction between assessment standards and performance standards. In 1996, the QCA's predecessor and Ofsted published a report on assessment standards, saying that there had been no weakening in the previous 20 years. In 2007, can the QCA say that there has been no weakening in assessment standards in the previous 11 years?

Dr. Boston: Yes. I would also have to say that being able to say that is the product of vigilance and monitoring. Of course, when looking at standards, which are made by humans, and evidence produced by full-cohort papers-a new, different paper each year-judgments have to be made about the way in which one paper and performance equates with previous papers and performance, and so on.

Much of our work on maintenance of standards is looking back over a period of time. The reviews that we undertake of groups of subjects over a period of time indicate, from time to time, that in one area there might have been a drift, and that needs to be corrected. In a report earlier this year we looked at music, including elements of the music curriculum and music performance, and there appeared to have been a drift there over five years. That then needs to be corrected by altering criteria with awarding bodies.

It is a process of monitoring, review and adjustment, but taken in balance as a whole-as an overview of the situation-my answer clearly and unambiguously is yes.

Q68 Mr. Chaytor: But will today's announcement about the split of the QCA's functions in any way reduce the likelihood of drift in assessment standards over the next 10 or 20 years? Your argument seems to be that there has been some drift here and there, which is largely the inevitable result of human error and weakness of human judgment that has been corrected. But is there anything in the new structure that will stop that happening?

Dr. Boston: No. The new body-the regulatory authority-will use codes of practice similar to those we have used in the past. It will use monitoring processes with awarding bodies. It may choose to extend its work beyond the work we fundamentally do, which is at the front end of the qualification, developing the criteria and then accrediting the qualification submitted to meet those criteria, and at the end of the process, after the examination is running, looking at whether the code of practice has been applied in the awarding process.

As we move forward with regulation-since Isabel has been with the organisation, she has driven this very hard-we need to be regulating more on the basis of the assessment of risk and going into particular points through the process, rather than focusing initially at the start and, finally, at the end.

Q69 Mr. Chaytor: But all of those issues could have been grasped by the QCA in its present form. Is not that the case?

Dr. Boston: That is true.

Q70 Mr. Chaytor: There is nothing about the new form of regulator that will give an enhanced guarantee of no reduction in assessment standards.

Dr. Boston: It is precisely the same style.

Q71 Mr. Chaytor: What I am trying to get at is this: is the conclusion, therefore, that the only argument for change is to somehow deal with the annual two-weeks-in-August hysteria in the tabloid press?

Dr. Boston: Well, I would not describe it as dealing with the two weeks of hysteria, because while the basis for that might be diminished I am not sure that it is going to go away. The basis of the separation that is occurring is, as I see it, the logical one: a regulatory authority should not be reporting to the political party that is currently trying to drive up standards.

Q72 Mr. Chaytor: In terms of structural change within the QCA, will the existing structure of the organisation adapt itself neatly to a division into the two new functions or will this require a major overhaul?

Dr. Boston: No. This will require some major separation of the organisation. The regulation and standards division is clearly at the core of regulation, although not all that it does will go to the new regulatory authority. There are other elements in our curriculum division and in the qualifications and skills division, where regulatory work is done. The re-accreditation of A-levels, for example, which is essentially regulatory, is done through the qualifications division as a 14 to 19 qualification. We have to unpick those functions and make provision for that work to transfer to the regulator.

Q73 Mr. Chaytor: Within the QCA as it stands, there are three main divisions. The structure of the organisation is based on three main areas.

Dr. Boston: There are four: regulation, qualifications and skills, curriculum and the NAA, which is the operational arm that delivers the national curriculum tests and the modernisation agenda.

Q74 Mr. Chaytor: In terms of assessment standards and performance, there is a blurring of these two functions across the four divisions.

Dr. Boston: Yes, organisationally there is a bit of a blurring. This is meant to clarify it. Regulation and standards-or Ofqual, or whatever we end up calling it in shorthand-sitting in Coventry, will be purely to do with assessment standards and nothing else.

Q75 Chairman: We have a QCA. You are the experts on the curriculum. The Government have just announced yet another inquiry into curriculum, not by you, but by Jim Rose. What is he doing being pulled into that? You are the competent body. You know more about this than Jim Rose. Why are you not doing it? I would be sulking if I were you.

Dr. Boston: The intention announced by Government is that the inquiry will be led by Jim Rose, but that we will work with him as the chief source of advice on evidence and as the body organising and managing a consultation, which presumably will be very widespread. We need to take this out and get genuine consultation with the professionals.

Q76 Chairman: Have they appointed Jim Rose because he is more of a political fixer than you?

Dr. Boston: I have no comment on that, Mr. Chairman.

Q77 Chairman: Some of us on the predecessor Committee were not too keen on the Rose report. He went totally overboard on synthetic phonics, but we hope that he will do a better job with you on the curriculum.

Dr. Boston: He is certainly a very valued member of our board, and I believe that we will be able to work together very effectively to achieve this. Finally, of course, it will be his advice that goes to the Government. There is no question about that, but we will provide the horsepower in shaping that advice and carrying out the consultation.

Q78 Ms Butler: We are all aiming for the same goal: to ensure that our children are very well educated. We also want to ensure that schools are properly evaluated. In your opinion, are there any other ways in which the effects of national policy on the state schooling system could be effectively evaluated? Do you have any ideas or opinions on how it could be further improved?

Dr. Boston: I am not quite sure that I get the question. Do you mean methods other than the current assessment system?

Q79 Ms Butler: Other than the current system and how it works.

Dr. Boston: That question takes us fundamentally to the issue of the fitness for purpose of assessments. What are we assessing and why? That is the area in which the paper that Paul Newton from the QCA prepared for the Select Committee is very helpful. The current key stage tests are absolutely fit for the purpose for which they were designed. That is for cohort testing in reading, writing, maths and science for our children at two points in their careers and for reporting on the levels of achievement. They are assessments that are developed over two and a quarter years, and are pre-tested. They are run through teacher panels, pre-tested again, and run through teacher panels again. The mark scheme is developed over a period of time. In terms of the way in which they are put together, if your purpose is cohort testing in these dimensions, these are the Rolls-Royce. You are not going to get better; they are fit for purpose.

The issue arises with any assessment when, having achieved an assessment that is fit for one purpose, you strap other purposes on to it. As Paul's paper shows, there are 22 purposes currently being served by current assessments, and 14 of those are in some way being served by key stage test assessments. Some of those purposes are very close to what is the design purpose, the essential function-the design inference, as Paul calls it. Some of the user inferences-the purposes to which they are put-are much more distant.

One of the things that attracts me to the single-level tests is that the Government are now looking at a new suite of tests that will have not only the summative role-potentially, when you add up what children have achieved at the end of the key stage, you get similar data to the summative data that you get now-but potentially a formative and developmental role, because they are taken during the key stage, and they will potentially have less impact on preparation for the test, because you are not preparing everyone to take the test at a particular time. You are building children up to take the test when they are ready.

My judgment is that, given that there are so many legitimate purposes of testing-Paul Newton lists 22-it would be absurd to have 22 different sorts of tests in our schools. However, one serving 14 purposes is stretching it too far. Three or four serving three or four purposes each might get the tests closer to what they were designed to do. To take a very simple analogy, Barry, if you want to cut paper or cloth, you have scissors; if you want to slice up an apple, you have a knife; if you want to turn a screw, you have a screwdriver; if you want to open a bottle, you have a corkscrew. To some extent, we are not building tests here, we are building Swiss army knives, and when you put all of these functions on one test, there is the risk that you do not perform any of those functions as well as you might. What we need to do is not to batten a whole lot of functions on to a test, but to restrict it to three or four prime functions that we believe it is capable of delivering well.

Q80 Ms Butler: Do I take it that you believe that the Government's direction of travel on stage not age is the right direction to be travelling in?

Dr. Boston: Yes. That is a very important step forward, and I think that the single level tests that are still in pilot stage have the prospect of combining both a summative assessment and a formative assessment. They are across a whole programme of study; it is not simply like my example of testing a youngster on percentages and moving along. It will provide us with progress data-summative data as they go through it-as well as formative data as they go through the key stage itself.

Chairman: Annette will lead us through the next section on the purposes of testing and fitness for purpose, which we have started, but which we are going to continue.

Q81 Annette Brooke: Can we backtrack slightly and look at degrees of error, certainly in validation? We had the statistic in an earlier sitting that up to 30% of candidates in public examinations are awarded the wrong grade. We can touch on the issues of consistency of marking and actual mistakes in adding scores together, but what sort of percentage error are we looking at that is due simply to the nature of the design of the test? It may be that a student hits a whole set of questions and does not know the answer to those particular questions. In other words, what aspects are there other than the obvious mis-marking and adding-up errors?

Dr. Boston: I cannot say that in any test there will be this percentage of error, but there are sources of error. The figure of 30 per cent. is a very high figure, which I have heard before and it certainly pulls you up. What are the controls we have over this? We have the nature of the mark scheme and how precise and definitive that is, in terms of allocating scores. We have performance around grade boundaries, where a score might be just above or below a grade boundary. More and more information is now being given by awarding bodies to candidates, including the return of scripts at GCSE and A-level, if you want them, and there is greater diagnosis of performance, particularly from Edexcel. If there is error, the objective is to detect it and then resolve it. The process of lodging an appeal after a result and having that heard and the paper re-examined is a legitimate and important part of the whole thing. We cannot say that the system works impeccably unless there are such robust appeal processes and they are seen to work.

Q82 Annette Brooke: Given that the 30% figure has been mentioned, surely that is something that you have investigated fully and looked at the evidence for? Can we really say that the Government are quite justified in being confident in the test results that are finally published?

Dr. Boston: Yes, I can certainly say that we are confident in the results that are published. However, it must be said that there are various views of comparability which compound all of this, and make people wonder whether the standards or grades are being met. One of the most recent arguments about grade inflation has been the work that Robert Coe has run from Durham, which has been interesting to look at. He has taken what he called the test of developed ability, which was a notion of innate ability-developed ability-in an individual, and he took the example of the person getting A-levels and said that A-levels had become two grades easier over the last 20 years, and that that was a problem.

Q83 Annette Brooke: I am not really talking about grade inflation at the moment. I am actually talking about fundamental errors and confidence in the system. I agree that grade inflation is undermining confidence, but in this sitting we are not concentrating on that.

Dr. Boston: Error exists. As I said before, this is a process of judgment. Error exists, and error needs to be identified and rectified where it occurs. I am surprised at the figure of 30%. We have been looking at the range of tests and examinations for some time. We think that it is a very high figure, but whatever it is, it needs to be capable of being identified and corrected.

Q84 Annette Brooke: What is the primary purpose for which key stage tests are designed? We were talking about a variety of purposes. What is the No. 1 purpose?

Dr. Boston: The No. 1 purpose is to decide the level that a child has reached at the end of a key stage.

Q85 Annette Brooke: Given that the tests are also used for the purpose of assessing a school's performance by parents, local authorities and the Government, and that we want more teacher assessment-you said that yourself-and full professional development, do you think it is reasonable to ask teachers to judge pupils' performance when they themselves and their schools are being judged by the results? Is there not a major conflict here?

Dr. Boston: The use of diagnostic assessment and assessment of pupil performance, and training teachers to have an understanding of standards and to be able to decide where their children sit-where their achievement is-is very sound. I have talked about teacher assessment before and see immense value in it, and in the role of the Institute of Educational Assessors in moderation, but I have not signed up to the abolition of external tests and to the elimination of external marking. External marking certainly has a place, and in any assessment system a balance is needed between internal and external, but I certainly would not sign up for a summative assessment process that did not include a significant component of external marking.

Q86 Mr. Chaytor: Of the 22 purposes to which the assessment results can be put, you stressed what you think are the most effective purposes, which are best served by the current system. Which of the 22 are least well served by the current arrangements?

Dr. Boston: With regard to the personal value of students' achievements and the formative assessment to identify students' proximal learning needs and guide subsequent teaching, the national curriculum tests are less effective than the new tests-the single-level tests-will be.

In respect of student monitoring to decide whether students are making sufficient progress in attainment in relation to targets, the single-level tests will do that better but, no, these are not the tests to deliver the diagnosis of learning difficulties. We could develop better and simpler tests to identify the general educational needs of students who transfer to new schools. We can use key stage tests to segregate students into homogeneous groups or to screen for youngsters who differ significantly from their peers, but we could simply design better ones as well. I will not go through all 14; it is a matter of stripping down. The tests are good at assessing institution performance; a standard test is applied in all schools in the country to children of the same age, and it will give you at one level a measure of the performance of that institution. You might want to moderate that when you come to setting targets for that institution in terms of its intake, but they are pretty good at that.

Q87 Mr. Chaytor: In terms of institutional performance, does it follow that the function of school choice is effectively served by the current tests?

Dr. Boston: It could be served by a better test.

Q88 Mr. Chaytor: You have been very strong on the effectiveness of the tests and the importance of full cohort testing. But full cohort testing is not the only way of getting the information that the Government and the public require. Why has the QCA been so resistant to techniques of light sampling?

Dr. Boston: I do not think that we have been resistant to it. In fact, I think we were the first people to start talking about it publicly. We offered advice to the Government, and the Government were not at that stage heading in that direction. They were heading in the direction of the progress tests, as they were then called, or the single level tests, which I think is fine. But one of the issues with the key stage tests is that they are a full cohort test. There is a new test each year. They take a long time to develop, and then the test items cannot be used again. The Government set great store by sample tests such as PIRLS, PISA and TIMSS. In other countries, such as America, the National Assessment of Educational Progress is a test of a statistically valid sample, which takes the same test items each year. It is slightly changed, but it is basically the same thing. It will give you an absolute measure of whether standards on that test are rising or falling.

It is horses for courses. There are ways in which this can be organised. The way that the Government are moving is to go for the single level tests, which I strongly support. But we need to be wary, if we are to have single level tests but phase out key stage tests, that we do not saddle the single level tests with these 14 functions and that we use the single level tests for some of the functions and have other sorts of tests for other functions.

Q89 Chairman: If we have been using all these tests for 14 different things all this time, is it legitimate for people like us to say to you, well, where was the QCA? Have you been telling Ministers over all these years that this is a ridiculous system of testing, that it is so wide that we are picking out 14 different outcomes, and that you need to divide it into four very specific groups-your corkscrew, your screwdriver and so on? Where have you been? Have you been telling the Government this for a long time and they just would not listen?

Dr. Boston: No. I do not think that that would be fair for me to say. The discourse on what assessment is about and how we do it is a public debate.

Q90 Chairman: I am sorry, but most of my constituents do not believe that. Parents of children taking tests believe that you are the person who looks after this sort of stuff, and that if you do not like what is going on, you should tell the Government that they should do something about it, and, if it really came to it, that you would come out from your corner and say that tests are not fair.

Dr. Boston: I am certainly not saying that the key stage tests are not fit for purpose. I am saying that there are some purposes for which they are far fitter than others. They can be used for these purposes. There is no question about that. But for many of them there is a better way to do it.

Q91 Chairman: That is what our expert witnesses have been saying: there are too many tests. You have not really answered that. We went to New Zealand and they said that they would like to know more about our students, but that to test at 7, 11, 14, 16, 17 and 18 we must be crazy. Why does the QCA never seem to say anything about the number of tests and the fact that the other expert witnesses say that those tests are not fit for purpose?

Dr. Boston: I do not believe that there are too many tests, particularly in primary education. There is undue pressure on preparation for the tests, but if we consider the amount of time that is actually taken up by the testing process, it is not high, and it is certainly higher in some other countries. In secondary education it is far more intense. There is no question about that. Our concern-or my concern-has not been with the burden of assessment, as people sometimes refer to it, but with the high stakes put on the assessments because, in the case of key stage tests, they carry 14 different functions.

Q92 Chairman: That is what we keep coming back to. Why have you not blown the whistle on those 14 different functions and said that they should not be used in that way?

Dr. Boston: I provide advice to Government. I am not out there as an independent commentator.

Q93 Chairman: Are you saying that you have told the Government that they are not fit for purpose for a long time and they have not reacted to that?

Dr. Boston: No. I have never told the Government that these tests are not fit for purpose because I do not think that that is the case. I think that they are fit for purpose. I have certainly said that there are many purposes that would be served better by different sorts of tests. Indeed, as you know, some time ago I raised the issue of sample testing, on which the Government were not keen for other reasons.

Q94 Chairman: What about the other point that we picked up on in the evidence-that people said that because you have not blown the whistle on the tests, they drive out the ability to teach a decent curriculum; that the teachers are just teaching to the test and cannot explore the curriculum?

Dr. Boston: Fundamentally, our task has been to develop, deliver and build these tests and to make sure that the results from them are valid. Although I admit that there are some errors in them, we make sure that there are processes for that error to be identified and for the problem to be resolved. We have been extraordinarily forward in pushing for the introduction of more technology and scanning, precisely for reasons of improving the quality of marking.

Q95 Chairman: We visit examining boards, and often the progress and innovation come from them, not from you. I get the impression that you are running behind Cambridge Assessment and Edexcel. They are teaching you how to do that stuff.

Dr. Boston: Edexcel, which was the first to get into online scanning and marking in this country, would not have got there without the very strong support that it had from QCA, both publicly and through the Government. The fundamental argument related to improvements in the quality of marking. You will remember the fuss that occurred at the time when the contract went to Edexcel-or Pearson-about bringing in a private, overseas company to run marking when previously it had been done by charities. The argument that we publicly and strongly ran then was that that was the way forward. It was the way to guarantee quality in marking and to eliminate problems because second marking would take place alongside first marking with the material coming up on the computer screen.

Chairman: We will drill down on that in a minute if you do not mind. I want to call Stephen now to talk about test targets and tables.

Q96 Stephen Williams: How do you go about deciding what a child should know at each particular stage in their life? Ten days ago we had the key stage 2 league tables reporting that children by age 11 are meant to reach level 4 across the subjects. How was it decided what the content of level 4 is and what the target is for an 11-year-old to get to that level? What process is gone through to reach those two things?

Dr. Boston: That is a very technical question that I am sure someone behind me could answer if you were prepared to let them, or they could slip me notes and I would attempt to make a fist of it.

Chairman: It must be the latter and not the former, otherwise Hansard will be driven up the wall, so if you do not mind, we will be happy to give you some time for someone to supply a note. Stephen, do you want to change the drift of your questions until you get an answer on that?

Q97 Stephen Williams: I will ask the question in broader terms because, in the introductory session, Ken basically said that there was a constitutional nicety-a separation of powers involving a regulator of standards-which ensured that everyone had confidence that nothing was being politically manipulated. Is it the QCA-all the advisers sitting behind you and those behind them back at headquarters-that decides what level 4 is and how many children should reach it by a given age, or does that box-ticking mentality start in the Department, with you told to design a curriculum to deliver it?

Dr. Boston: I understand the question, but I know that the experts sitting behind me will give a better answer than I could, so can we move on and I shall take the question in a moment?

Chairman: Lynda, did you want to come in on that point?

Lynda Waltho: No, I wanted to follow on later.

Chairman: Carry on, Stephen.

Q98 Stephen Williams: Ken, you said that the primary purpose of all key stage tests was to assess the individual performance of a child, yet what gets all the attention-I refer to the August media frenzy-is the performance of a school, which is an aggregation of the performance of all the individual children. Do you think a fair outcome of the tests is that schools should be held to account or do you think that is a minor delivery of the system?

Dr. Boston: No, I am firmly of the view that schools should be held to account. I believe in full cohort testing. I believe that full cohort testing and summative assessment have a place and that holding schools to account for what they achieve is important.

Q99 Stephen Williams: When you say that you believe in full cohort testing, I can see that means that you believe in testing every child, but do you also believe that the publication of the aggregate results for every child is fair on the school, or would a national sample be a better way of measuring standards across the country?

Dr. Boston: Yes, I certainly believe in reporting the achievements of that school alongside those of other schools.

Q100 Stephen Williams: That is based on the aggregate of the results for each child in the school, so would you reject alternatives completely?

Dr. Boston: It depends on the purpose. If your purpose is to find out whether children are writing and reading as well as they did 10 years ago nationally, the best test is to give a sample of them virtually the same test as was given to a sample of them 10 years ago. That will tell you whether we have gone up or down. If you want to report on the performance of a school this year in relation to the school next door, the sample will clearly not do that, but the full cohort test will. It is again about purpose. Both purposes are legitimate and some of the difficulties with the testing programme or the examination programme when looking at whether standards have changed significantly year on year or over a 20-year period, are that the curriculum, teaching methods and other things such as class size have changed. If you really want to know whether people are better at reading than they were 20 years ago, give them the same test.

Q101 Stephen Williams: I see that you have had time to look at the note that has been passed to you.

Dr. Boston: This is Dr. Horner's contribution in handwriting. The report of the Task Group on Assessment and Testing in 1989-while I was still elsewhere, Barry-decided on a 10-point scale and proposed a graph of progress that identified level 4 at age 11. That report was then published, presumably. We are now on an eight-point scale-aren't we?-so that 10-point scale has been reduced. That does not fully answer your question, Stephen.

Q102 Stephen Williams: No, four out of 10 is 40% and, on the same scale, four out of eight is 50%, so it seems to be a completely different target. Perhaps we are getting a bit too technical. I was trying to get at whether the Government were feeling the QCA's collar in respect of how we set the standards for children. Therefore, is it right that we have a separate regulator of standards for the future?

Dr. Boston: Barry, I should very much like to give you a written statement tomorrow in answer to this question.

Chairman: Okay.

Q103 Stephen Williams: Before the note, I was going to mention the difference between how a child's performance is assessed and how a school's performance is assessed. You were going into how children's performance was assessed over time. Is there another way in which you can assess a school's effectiveness apart from the league table mentality that we have at the moment? Is there an alternative? After all, they do not have league tables in Wales or Scotland.

Dr. Boston: You can certainly assess the performance on the basis of teacher reporting, as against the school reporting its performance against a template of benchmarks, perhaps, as occurs in some other countries, and reporting its testing of its students against national averages in literacy, numeracy and so on. I know of cases where that occurs.

Q104 Stephen Williams: You are a man of international experience. Do you think that anywhere else does it better than England-whether a state in Australia or anywhere else-without this sort of national frenzy every August, with people wondering whether things are going down the pan or the Government saying, "No, things have only ever got better"?

Dr. Boston: I think England is pretty rare in the way it does this in August, although I would not say unique.

Q105 Chairman: Is that good or bad?

Dr. Boston: The annual debate about whether too many have passed and whether standards must have fallen is a very sterile debate and I would be glad to see the back of it. If it is right that this new regulator will lead to the end of that, it is a good thing. We are not so sure that it will. There are other, better, ways of celebrating success and achievement-not questioning it.

Q106 Stephen Williams: Do you think that any particular country does it a lot better than England?

Dr. Boston: No. I think that in other countries where the results come out there is less public criticism of youngsters on the basis that, because they have three A grades, the result must be worthless. Such criticism is a very bad thing.

From my previous experience, I have a great interest in Aboriginal education. There was 200 years of Aboriginal education in Australia with absolutely no impact on the performance of Aboriginal kids until we introduced full cohort testing and reporting at school level. Then, suddenly, people took Aboriginal education seriously and it began to improve.

Q107 Stephen Williams: If nobody else wants to come in on this section, I should like to ask one last question, going back to the start, about introducing the new regulator. We have only had the report, "Confidence in Standards", from the Secretary of State today, and we have not been able to digest it fully yet. When were you consulted about the split in the QCA's responsibilities? Was it before, during or after the August round of exam results that we had a few months ago?

Dr. Boston: It was after.

Q108 Fiona Mactaggart: You have been talking clearly about the difficulty of tests fulfilling 14 different purposes. The fact is that they fulfil some of those inadequately. You suggested that the best way to see if children over time are able to achieve the same standard is through sampled testing. Do we do very much of that, and if not, why not?

Dr. Boston: Those tests will tell you whether performance on a particular task has improved over time. We do not do that as a country. We pay a lot of attention to PIRLS and PISA and to TIMSS, the international maths and science study.

In developing its key stage tests from year to year, the QCA does pre-test. Those pre-tests in schools, which youngsters think are just more practice tests, are trials of what we will use in 18 months' time. In them we often use anchor questions, which are the same questions that have been asked a few years before or in consecutive years. They might be only slightly disguised or might not be changed at all. That is to help develop tests that maintain standards, so that level 4 is level 4 year on year. The boundary between the levels is set by the examiners. It might be 59 in one year and 61 in another, but they know that in their judgment that is a level 4. They draw on those tests.

We have not used the tests systematically enough to say, "We used these six questions for the past eight years and we know that students are getting better at reading or worse at writing," but that is the basis on which we develop and pre-test those emerging assessments.

Q109 Fiona Mactaggart: I am struck by this. You are saying that we do it a bit to ensure the comparability of tests over time. We all accept that some of that kind of work is a necessary function of getting accurate summative tests, but there is a constant thread in the debate about assessment concerning whether standards have changed over time. I do not think that I properly understand why we have not bothered to invest what does not strike me as a very large amount of resource in producing that kind of sampling over time to see whether standards are improving or weakening, and where. We would then have a national formative assessment of where the strengths and weaknesses of our education system are over time. Do we have a mechanism that is designed to do that? If not, why not?

Dr. Boston: No, we do not. We use PIRLS and PISA and in the recent results, they confirmed what we already know; for example, at PIRLS level, the line I talked about is not as steep as it was before. It has flattened off, but has not come to a plateau. The notion of a sampling programme is something that we have raised with Government. Some years ago, before I came into this job, there was the Assessment of Performance Unit, which did some of that work. That is no more. I do not know the background and the reasons why the work was not pursued, but it was work of this sort. It would seem to me that we need to be thinking not of either/or. That is the message that I really want to get across. We are not thinking of key stage tests or single level tests or sample tests. If we want to serve those 22 legitimate purposes of testing-I am sure there are more-we need a number of tests that will deliver between them all those things, but which are designed so that they are very close to what Paul Newton calls the design inference, where the user inference and the design inference are very close indeed.

Q110 Fiona Mactaggart: What I do not understand about the proposed new system is that if we developed a wider range of tests to separate some of these functions more precisely so that we get more accurate information rather than trying to infer information from tests that are designed to do something else, which is what we do at present, who would take the lead in developing the sample tests and introducing them? Would it be the QCA or the new regulatory authority? I have not had time to read through the document, but I do not understand whose job is what.

Dr. Boston: It would be the QCA, and it would do its work partly through stimulating the private sector market and the awarding bodies to work with it. Presumably the QCA would take the initiative on remit from the Government. That would be critical: the Government would decide that they wanted a set of new tests. We did not go out and invent single level tests. We were remitted to do them. We produced them at Government request, and with our very strong support. So the initiative would rest fundamentally with the Government, but the body that would lead on it would be the QCA, or whatever the QCA might end up being called some time in the future. The regulatory authority is to ensure that, once the product-the assessment-is there, it delivers on standards and maintains standards. The regulator is not a development authority; it is an authority to regulate products and ensure their quality once they are there.

Q111 Fiona Mactaggart: When you were remitted to develop the concept of single level tests, were you remitted to develop a test that was a one-way street, rather than a test that could be re-administered? I gather that the National Foundation for Educational Research is concerned about the fact that this is just a single pass test and that someone who chooses when they do it might pass then but might not necessarily pass it a month later.

Dr. Boston: We were remitted to produce a test which would be taken as a one-off. Further down the track, if we get to a point, as I think we might, where single level tests are available virtually online, on demand, we would need to go to a data bank of test items. What we have at the moment is a level 3 test or a level 4 test. A judgment is then made on the score you get about whether you are secure in level 4. That test is then finished with. The time may come in the future, as with key stage 3 ICT tests, where there is a computer in the corner on which you can take at any stage your level 4 or level 5 reading test. That would depend on a data bank. In that sense it is constantly renewable, if I understand the question correctly.

Q112 Fiona Mactaggart: It was not so much about whether it was renewable. If the teacher of the child can choose the moment at which the child takes a single level test and it is a propitious day for that particular child, the child may do well in the test and succeed, but it might still be rather a frail attainment. There is anxiety about whether that is a fully accurate picture of the child's capacity and the general learning level even though they can do it on a fair day with wind behind them.

Dr. Boston: I am remiss in that I have not fully explained the relationship between the assessment of pupil performance and the tests. The APP programme is designed essentially to produce greater understanding among teachers about what is represented by a level-the profile of a level 4 in reading, the profile of a level 5 in reading and the difference between them. It represents the different indicators that show a child is either at level 4 or level 5, and the child is then entered for the test. The test is meant to be confirmation that the teacher has made the judgment correctly.


Sitting suspended for fire evacuation.

On resuming-

Chairman: Dr. Boston, we are back in business, although only briefly. I suspect that we will have to call you or your team back at some stage, because this has been unfortunate. I will give a question to each member of the team, and you will answer speedily. I will start with David, followed by Stephen, then Fiona, and I will finish.

Q113 Mr. Chaytor: On maintenance of standards, will the new A* grade at A-level have the same pass rate in all subjects across all examining boards?

Dr. Boston: No.

Q114 Mr. Chaytor: Does the existing A-level threshold have the same pass rate in all subjects?

Dr. Boston: No.

Q115 Mr. Chaytor: Does that cause a problem?

Dr. Boston: No.

Q116 Mr. Chaytor: Will there not be a huge discrepancy between different subjects in different boards?

Dr. Boston: The A/B boundary is set by professional judgment. The reality is that subjects are different; there is no attempt to say that, for example, 10% must pass or have an A grade in every subject. No country in the world achieves precise comparability between subjects in terms of standards. Australia tries to do so: it takes all the youngsters who get a certain grade in, for example, English, geography and art, and if it finds that a lot of the youngsters taking those three subjects are getting higher grades in geography than in the other two, it deflates the mean of geography. Some pretty hairy assumptions underlie that. Here, an A/B boundary is set by professional examiners broadly at the level that a hard-working, well-taught student who has applied himself or herself fully would achieve on a syllabus or specification.

Q117 Mr. Chaytor: Are the thresholds for subjects on examining boards matters of public record? That is, is the percentage score that triggers a B, an A or an A* on the record and available to pupils and parents?

Dr. Boston: The answer is no, I believe.

Q118 Mr. Chaytor: My next question is, should it be?

Dr. Boston: I would think not.

Q119 Mr. Chaytor: Why not?

Dr. Boston: The essential point is that you might have a harder paper one year than another, in which case the boundaries might change significantly. The point is not the numerical score where the boundary is drawn. The fundamental point is the professional judgment of the examiners, who decide where the A/B boundary is and where the E/U boundary is. They do that on the basis of their experience and past statistical evidence using papers of similar demand.

Q120 Fiona Mactaggart: Does the fact that schools are held accountable through tests that are really designed to be summative tests of children's achievement mean that teachers teach a less-rounded curriculum?

Dr. Boston: My only reaction to that is absolutely anecdotal. We have a network of 1,000 schools to which we relate intensively, and I have been told by people at the QCA who work closely with schools, and from what I hear from professional bodies, head teachers and so on, that their answer to that question is frequently yes. I do not run a school, and I do not have first-hand evidence of that, but all the evidence that I hear in my position is about the narrowing of the curriculum that results from these tests. Presumably, there may be some better approach to that with the single-level tests. I have also spoken to many head teachers who are probably the exception to the rule and say, basically, the objective is good educational nutrition for these youngsters, and if they have got that they will pass the tests. That is a better way than simply narrowly training them to take the assessment.

Q121 Fiona Mactaggart: I am sure that they are right. However, because of a lack of self-confidence and other things among many teachers, such teachers are not in the majority, I suspect. Would it be possible for you to devise a test? I have listened to you speak about testing. Your commitment is to using testing to improve the quality of education for children, yet there seems to be some evidence that in one respect testing in Britain is narrowing the quality of education for our children. Could you devise a separate way of holding schools accountable, one which could avoid that difficulty, so that that function is dealt with differently from the way in which we assess children's attainment?

Dr. Boston: Holding them accountable for what?

Q122 Fiona Mactaggart: For the quality of teaching. At the moment, they are held accountable by the attainments of the children through examinations.

Dr. Boston: I see the desirability of the aim, but at the moment I cannot neatly and glibly say, "Yes, we could do this, this and this." I see the point of the question.

Q123 Fiona Mactaggart: In the meantime, is there anything you can do to reduce the burden of testing in terms of the rest of the curriculum?

Dr. Boston: Apart from providing advice to Government on assessment reform, I cannot see a way in which, within the ambit of the QCA itself, we could be catalytic in producing that change.

Q124 Stephen Williams: Perhaps I can go back to the subject of what was called the historic day-I assume that that reference was to the announcement on confidence and standards that was made earlier today. In my earlier question to Ken, I asked him when he was consulted about the split, and about the setting up of the new organisation. Have you been consulted on the structure? I have been reading chapter 2 during our sitting, which does not make it clear whether there will be a sort of Ofsted, with a chief inspector and a board. I think that I heard you refer to a board-is that right?

Dr. Boston: We have certainly been consulted, and our advice has been sought on where we might go from here now that the Government have made the decision to go ahead and now that consultation has happened. The intention, as I understand it-I thought that it was set out in the document-was that there should be a non-departmental body with its own board and its own chief executive. I have no detail beyond that at this stage. We have been consulted and have been asked for quite detailed advice on how we might set up shadow arrangements-I described our proposals on that earlier. They have still to be accepted by Government, but they seem to be an intelligent way forward.

Q125 Stephen Williams: If we assume that there will be a board-I cannot see that in the document, but I have only skim read it so far-what sort of people should be on it? In relation to A-levels, do you agree that it would be sensible for universities to be represented on the board, given that roughly 90% of children who achieve A-level standards now continue to higher education?

Dr. Boston: The regulator will of course be responsible for all qualifications-not just the general ones but vocational and adult ones, too. The regulator will clearly have a role in devising new approaches to the recognition of awarding bodies, including the post-Leitch recognition of employers as both awarding bodies and training providers. The board of the new body would, I think, need to consist of higher education representatives, business representatives and teaching profession representatives. It would probably be pretty similar in composition to the current QCA board.

Q126 Chairman: We shall have to finish now, but is it right that you have a choice as to which way you jump? Can you choose which organisation you opt for?

Dr. Boston: No, I will continue as chief executive of the QCA.

Q127 Chairman: I have one last question. When we pushed you today, you tended to say, "But I'm a regulator." In a sense, therefore, some of your answers have persuaded me that the reforms are right. When I asked you why you did not push for the reforms or take a certain course in advising the Government, you showed a certain unhappiness. The indication was that there was a functional stress between the two roles. Is that right?

Dr. Boston: There is a stress, yes. I am not an independent commentator on education. I certainly have a responsibility under the current legislation to be absolutely separate from the Government and from everyone on the maintenance and regulation of standards. My position has always been that the minute any Government attempted to interfere with that, I would be the first to declare it publicly.

On issues such as the curriculum and provision of qualifications, the current role is to advise the Government. We do not have the capacity to go out and say that we are simply going to introduce a new form of testing in two years' time. Those decisions are for the Government-they always have been, and they always will be. There has been tension, and you have exposed it cleverly in our discussion.


Chairman: Ken Boston, it has been a pleasure to have you here. I am sorry that we were disrupted and that there is unfinished business that perhaps, when you return from Australia, we can revisit with you. Thank you to all those who have attended. I wish a happy Christmas to everyone.