HOUSE OF COMMONS




Monday 10 December 2007


Testing and Assessment



Evidence heard in Public Questions 1 - 54





This is an uncorrected transcript of evidence taken in public and reported to the House. The transcript has been placed on the internet on the authority of the Committee, and copies have been made available by the Vote Office for the use of Members and others.



Any public use of, or reference to, the contents should make clear that neither witnesses nor Members have had the opportunity to correct the record. The transcript is not yet an approved formal record of these proceedings.



Members who receive this for the purpose of correcting questions addressed by them to witnesses are asked to send corrections to the Committee Assistant.



Prospective witnesses may receive this in preparation for any written or oral evidence they may in due course give to the Committee.


Oral Evidence

Taken before the Children, Schools and Families Committee

on Monday 10 December 2007

Members present:

Mr. Barry Sheerman, in the Chair

Ms Dawn Butler

Mr. David Chaytor

Mrs. Sharon Hodgson

Fiona Mactaggart

Mr. Andy Slaughter

Lynda Waltho

Stephen Williams


Examination of Witnesses

Witnesses: Professor Sir Michael Barber, Expert Partner, Global Public Sector Practice, McKinsey and Company, and Professor Peter Tymms, Director of Curriculum, Evaluation and Management, School of Education, Durham University.


Q1 Chairman: May I welcome Professor Sir Michael Barber and Professor Peter Tymms to the first evidence session of the new Committee? We have been busily building the team, seminaring and deciding our priorities for investigation, but this is our first proper session, so thank you very much for being able to appear before us at reasonably short notice.

Both of you will know that our predecessor Committee started an inquiry into testing and assessment. It was a quite different Committee, but with its interest in schools, it decided to embark on a serious investigation into testing and assessment. It managed to tie up with a nice little bow almost every other area through 11 different reports in the previous Parliament, but it could not conclude this one. It troubled people to the extent that copious volumes of written evidence had come to the Committee, and it would seem wrong if we did not make such an important issue our first topic, pick up that written evidence, slightly modify and expand the terms of reference and get on with it. So, thank you very much for being here.

You are key people in this inquiry: first, Michael, because of your association with testing and assessment, through which many of us have known you for a long time, right back to your National Union of Teachers days; and secondly, Professor Tymms, through your career in a number of institutions, where we have known you, and known and admired your work.

We generally give witnesses a couple of minutes to make some introductory remarks. You know what you have been invited to talk about. If you would like to have a couple of minutes-not too long, although a couple of minutes is probably a bit short-to get us started, then I shall start the questioning. Peter, you were here first, so we shall take you first.

Professor Tymms: I am director of a centre at the University of Durham which monitors the progress of children in order to give schools-not anybody else-good information. It provides us with a tremendous database from which to view other issues, which means that I have taken an interest in all the different assessments-key stage and so on. We have concluded that standards in reading have stayed constant for a long time, but that in mathematics they have risen since about 1995. Those are the headlines on testing.

On the introduction of new policies, I am keen to say-I might return to this-that there is a need for good trials. If we try something new, we should get it working before we move it out to the rest of the public. I am very keen for new ways of operating to be properly evaluated before they are rolled out, and then to be tracked effectively. We have been missing that.

Chairman: Thank you.

Sir Michael Barber: Thank you very much for your invitation, Chairman.

I shall comment on the story of standards in primary schools, which I see in four phases. The first came between 1988 and 1996, when the then Conservative Government put in place the national curriculum, national assessment, Ofsted inspections, league tables and the devolution of resources to schools. There were lots of ups and downs in that story, but nevertheless that framework was established.

Secondly, there was the phase with which I was associated-Government policy under David Blunkett who was the then Secretary of State for Education and Employment-during which there was a focus on what we called standards, rather than on structures. A big investment in teachers' skills, through the national literacy and numeracy strategies, led to rises in the national test results. I have always accepted that some of that was down to teaching and the tests, but a lot of it was down to real improvements evidenced by Ofsted data and international comparisons.

In the third phase, between 2000 and 2005, the Government were focused largely on long-term, underpinning and structural reforms, including of the teaching profession, of secondary education and the introduction of the children's agenda, at which stage results plateaued. Things got harder, too, because we had picked the low-hanging fruit, as it were. I think that we should have stayed much more focused on literacy and numeracy, in addition to the other things that we did. That was my error. Now there is an opportunity to make real progress on literacy and numeracy as a result of the Rose review last year and the new emphasis on phonics. By the way, I completely agree with Peter on the pilots and progression. If all those things are put together, I could envisage a fourth stage, during which we can begin to make progress.

In summary, we have gone from being below average, on international comparisons, to above average-we are above France, Scotland and the EU average. However, we have a long way to go and significant improvements to make. If we want to be world class, we must do more.

Q2 Chairman: Thank you for those introductory remarks.

I remember taking the Committee to New Zealand where people wanted to be able to assess more carefully the progress of students and were looking at what we had done. I recall their horror when it was suggested that they might adopt our system. They said, "We want to know how our young people are doing, but we do not want to go to the extent that you have, of testing at so many ages." Are you sympathetic to that point of view? Do you think that we over-test?

Sir Michael Barber: Personally, I do not think that we over-test in primary schools-if that is what you are talking about. Primary school children take literacy and numeracy tests aged seven and externally-set and marked literacy, numeracy and science tests aged 11. That is a relatively small number of tests during a six-year primary school career. The information provided by the tests is fundamental to understanding how the system is working and to looking for strategies for future improvements. I do not think that we over-test at all.

Q3 Chairman: Even if that adds up to ages seven, 11, 14, 16, 17 and 18?

Sir Michael Barber: I focused my answer on primary schools. There is a separate debate to be had about secondary examinations and tests at ages 14, 16, 17 and 18. However, at primary level, we conduct the bare minimum of testing if we want to give parents, the system, schools and teachers the information that they need, at different levels, in order to drive through future improvements. One of the benefits of 10 years, or so, of national assessments is that this system has better information with which to make decisions than many others around the world.

Professor Tymms: I do not think that testing at seven and 11 is too much testing. However, if you have a system in which you take those tests, put them into league tables and send Ofsted inspectors in to hold people accountable, schools will test a lot more. So we probably do have too much testing in the top end of primary schools, but that is not statutory testing. It is the preparation for the statutory testing, so it is a consequence of what is happening.

Of course, we do need the kind of information that those tests were designed to get at. You mentioned the need to know what our children are doing and their levels. If we wanted to know the reading standards of 11-year-olds in this country, we could probably find out by assessing 2,000 pupils picked at random. We do not have to assess 600,000 pupils. One purpose is to know what the levels are, which could be done with a sampling procedure, with the same tests every year, which would be secret and run by professionals going out and getting the data. There is another kind of information, for teachers about their pupils, which they could get by their own internal tests or other tests if they wanted, and another kind of information for parents. There is an interface: how do they get that information? Do they go to the schools, or do they read it in their newspapers? Do they know about their own pupils? Those layers of information, and how to get them, provide the complex background to the answer to your question.

There is too much testing, but not because of a single test at 11-for goodness' sake, children can do that. I think that I was tested every two weeks when I was about eight years old, and I quite enjoyed them. Not all children do, but the possibility of that exists. We need good information in the system for parents, teachers and Parliament, and we need to know it nationally, but we do not necessarily have to do the sort of testing that we currently have to get that information. There are different purposes and reasons for doing it. I guess that I can expand on that as you need.
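Professor Tymms's point that a random sample of roughly 2,000 pupils would suffice to establish national reading standards can be illustrated with a standard margin-of-error calculation. This is an editorial sketch, not part of the evidence: the 80% pass rate is an illustrative figure, while 2,000 and 600,000 are the cohort sizes mentioned in the testimony.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of n pupils."""
    return z * math.sqrt(p * (1 - p) / n)

# Suppose roughly 80% of 11-year-olds reach the expected reading level.
p = 0.80
for n in (2_000, 600_000):
    # -> about ±1.8% for n = 2,000 and ±0.1% for n = 600,000
    print(f"n = {n:>7,}: estimate {p:.0%} ± {margin_of_error(p, n):.1%}")
```

A sample of 2,000 pins the national figure down to within about two percentage points, which is why, for the purpose of national monitoring alone, a census of 600,000 adds little.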

Q4 Chairman: But Michael is known to believe-I am not setting you against each other-in the notion that testing would drive up standards. It was the "engine", was it not? I am not misquoting you, am I?

Sir Michael Barber: It is not a misquote, but it is not a complete view of what I believe. I believe that, in order to drive up standards, we need a combination of challenge and support. Assessment and Ofsted inspection provide the challenge in the system, and then we need serious investment in teachers and their skills, pay and conditions. I am in favour of assessment, being able to benchmark schools and the information that that provides to heads, teachers and parents. I agree with Peter that there may in addition be an advantage to sampling techniques, probably linked with the international benchmarks to assess the performance of the whole system.

Q5 Chairman: I have slightly misquoted you: testing was "the engine to drive performance", I think you said.

Sir Michael Barber: But I am saying that the accountability system on its own is not enough. You need investment in teachers' skills, which is what the national literacy and numeracy strategies did. They gave teachers the skills and wherewithal to understand how to teach reading, writing and mathematics. The evidence of that is powerful. Only recently, the effective pre-school and primary education research programme, which Pam Sammons and others run, has shown clearly the benefits in student outcomes if teachers teach the last part of the literacy hour well-the plenary. Detailed pedagogical skills need to be developed by teachers, which needs an investment. Obviously, you also need to pay teachers well, ensure that the system is recruiting enough teachers and devolve money to the schools. I am strongly in favour of the challenge that comes from an accountability system, along with the wherewithal for heads and teachers to get the job done in schools-not one or the other, but both.

Q6 Chairman: Any comment on that, Peter?

Professor Tymms: There is an assumption here that standards have risen and that the national literacy strategy made a difference. In fact, over those years, reading hardly shifted at all. I perhaps need to back that up, because there are a lot of different sets of data. Somebody can claim one thing, somebody can claim another and so on. Is this an appropriate moment to go into that?

Chairman: Yes, indeed.

Professor Tymms: Okay. From 1995 to 2000, we saw a massive rise in the statutory test data at the end of primary school. Results were below 50% and got up towards 80%. From about 2000 onwards, they were pretty flat. That looks like a massive rise in standards, and then it was too difficult because we had got to the top end, all our efforts had gone and so on. In fact, in 1998 or thereabouts, I was looking at our test data-we use the same test every year with the same groups of pupils-and did not see any shift in reading standards. The key stage assessments use a new test every year, and one must decide what mark corresponds to level 4. That is harder. The percentage reaching level 4 rose year on year with a new test each year, but scores did not rise on a static test, and that raised a question. At the same time, Hawker was working at the Qualifications and Curriculum Authority, and said in The Times Educational Supplement that if results continued to rise, we would need an independent investigation.

Around that time, QCA decided internally that it would investigate further. It commissioned Cambridge Assessment under Massey to take the tests from 1995 and 1999, and to go to a place that had not been practising the tests-Northern Ireland. It took equivalent samples of pupils and gave the 1995 and 1999 tests to them. If those tests were measuring a level 4 of the same standard, the same proportion should have got level 4, but they did not. Far more got level 4 with the later test, so the standards were not equivalent, and that was the clear finding of the Massey study.

Massey did a follow-up study in which he compared the 2000 and 1995 tests, and found rises in maths, which were not as big as the tests suggested, but nevertheless were rises. He found that writing scores had increased, but called the rise in reading skills illusory. Additionally, several local education authorities collected independent data on reading, using the same test across the whole LEA year after year, and there was practically no shift in reading scores, but there was a rise in maths scores. I was able to look at 11 separate studies, which all told the same story: over that period there was probably a slight rise at most-about one tenth of a standard deviation-which might have been achieved simply because children had practised tests, but there was no underlying rise. In maths, there was an underlying rise.

There are two things going on. One is that children get better at tests if they practise them. Prior to national testing, they were doing practically no tests-it was necessary to go back to the time of the 11-plus for that. We saw a rise because of practising tests, and we saw an additional rise because standards were not being set correctly by the School Curriculum and Assessment Authority and then QCA between 1995 and 2000. Then there was teaching to the test. After 2000, QCA got its act together and set standards correctly. It now has a proper system in place, and standards are flat. There are small rises, and we must treat them with interest, but with a pinch of salt.

Let us suppose that it is decided in committee that level 4 is anything above 30 marks. If it were decided that it was one mark higher than that, the level 4 percentage might go up by 2% or 3%, and that would make national headlines, but that would be due to errors of measurement. The discussion in the committee is about three or four points around that point. However many pupils take the test-there may be 600,000-the accuracy in any one year depends on where the cut mark is set, and it was set incorrectly between 1995 and 2000. The assumption that standards were going up because we were introducing accountability, because we had testing, because we had Ofsted, and because we had the 500 initiatives that the Labour party put in place without evaluation shortly after coming to office, was based on a misjudgment about standards. Maths, yes; reading, no; writing, yes.
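The sensitivity to the cut mark that Professor Tymms describes can be seen in a toy simulation. The score distribution below is invented purely for illustration (marks out of 50, centred near the cut region); only the 600,000-pupil cohort size comes from the evidence.

```python
from random import Random

rng = Random(42)

# Invented score distribution: 600,000 pupils, marks out of 50.
scores = [min(50, max(0, round(rng.gauss(37, 9)))) for _ in range(600_000)]

def pct_at_level4(cut_mark: int) -> float:
    """Percentage of pupils scoring above the cut mark."""
    return 100 * sum(s > cut_mark for s in scores) / len(scores)

# Moving the cut mark by a single point shifts the headline figure
# by a few percentage points, despite the huge cohort.
for cut in (30, 31, 32):
    print(f"cut mark {cut}: {pct_at_level4(cut):.1f}% at level 4")
```

With a distribution like this, each extra mark on the threshold moves roughly 3% of the cohort across the level 4 boundary, which is exactly the scale of movement that makes national headlines.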

Sir Michael Barber: This is, as evidenced by Peter's comments, a complicated area, and I accept that completely. First, the national literacy and numeracy strategies are effectively a major investment in teachers' skills and their capacity to teach in classrooms. That is a long-term investment; it is not just about this year's, next year's or last year's test results. It is a long-term investment in the teaching profession's capacity, and it is well worth making because for decades before that primary school teachers were criticised for not teaching reading, writing and maths properly, but no one had invested in their skills and understanding of best practices.

Secondly, there is a debate about extent, but we seem to be in agreement on maths and writing. When I was in the delivery unit after I left the Department for Education and Employment, I learned that it is dangerous to rely on one set of data. When looking at reading standards, it is right to look at several sets of data. One is the national curriculum test results, which tell an important story. Of course, there is an element of teaching to the test, but an element of teaching to a good test is not necessarily a bad thing, although overdoing it is. I always accepted that in debate with head teachers and teachers during that time.

The second thing is that Ofsted records a very significant improvement in teachers' skills over that period of time. If teachers improve their skills in teaching reading, writing and mathematics, you would expect the results to go up. The third data set that I would put in that linked argument is that international comparisons-most importantly, the progress in international reading literacy study, or PIRLS-showed that England in 2001 did very well on international comparisons in reading.

In 1999 came the first accusations that the test results were not real. Jim Rose led a review involving representatives of all the parties represented on this Committee, which found no evidence whatever of any tampering with the tests. In addition, people in other countries have taken the kinds of things we did in that phase of the reform and replicated, adapted or built on them-Ontario being the best example-and they, too, have had improvements in reading, writing and maths.

To summarise, although we might disagree about the extent of improvement, I think we agree that there has been significant improvement in maths and writing, which are very important. We are debating whether there has been improvement in reading. I think the combination of data sets that I have just set out suggests that there has been significant improvement in reading. I would be the first to say that it is not enough and that we have further to go in all three areas; nevertheless, we have made real progress.

My final point is that over that period, there has, as far as I can make out, been no significant change in reading and writing in Scotland, where there was no literacy strategy. The results in international comparisons in Lothian, Scotland tick along roughly at the same position.

Q7 Chairman: There has been a sharp drop in recent PIRLS. Does that mean we are going backwards?

Sir Michael Barber: Actually, I think it means that other countries have improved faster over that period. As I said in my opening statement, between 2001 and 2005, the Government were focused on some serious, long-term, underpinning reforms-most importantly, in my view, for the long run, solving the teacher recruitment shortage and bringing some very good new people into the teaching profession. That will have benefits for decades to come, but there was a loss of focus on literacy and numeracy at that point. Personally, I wish I had pressed harder on that at the time, but that is what you are seeing-the PIRLS data follows the same patterns as the national curriculum tests.

Q8 Chairman: I want to shift on because colleagues will get restless, but Peter was shaking his head, so I shall have to ask you to comment, Peter.

Professor Tymms: I must comment on several of those points. Take PIRLS for starters, in 2001 and in 2006, when it apparently went back. Michael's comment was that we did not look good the second time because other countries did better than us. Certainly, some countries did better, but, in fact, PIRLS is standardised and uses Rasch models so that the same marks mean the same thing from one cycle to the next, and our marks dropped back. It was not just other people getting better; we actually got worse.

But I want to persuade you that PIRLS in 2001 got it wrong and made us look better than we were and that the level has remained static. The reason for that is that for those international tests to work properly, the students who are tested must be a representative sample of the country. The PIRLS committee defines how to collect those pupils. We went out, in this country, to collect the pupils to do it and asked the schools to do the tests, but about half of the schools did not want to do it and refused to play ball. The second wave of schools were asked and only some of them complied, and then a third wave were asked. If you look at the 2001 PIRLS data, you will see two asterisks by England, because our sampling procedure was not right. If you are the head of a school and you are asked to do the tests, but your kids are not reading too well that year, you will say no, whereas if they are doing really well, you will say, "Oh yes, I'll go for it." So we had a bias in the data. We got people who really wanted to play ball, and it made us look better than we were.

The next time, when schools were paid to do the tests-some held out and got quite a lot of money-we got a proper representative sample and found our proper place, which shows that our standards are just, sort of, in the middle for reading. The blip previously, which was crowed about a lot, was a mistake in the data.
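The selection bias Professor Tymms describes-stronger schools being more willing to take part-can be sketched in a few lines. Everything here is invented for illustration: the 3,000 schools, the score scale and the participation rule are assumptions, not PIRLS figures.

```python
from random import Random

rng = Random(0)

# Invented data: each school's true mean reading score, scaled to mean 100.
schools = [rng.gauss(100, 10) for _ in range(3_000)]

# Participation rule: stronger schools are more likely to agree to be tested.
def agrees(score: float) -> bool:
    p = min(1.0, max(0.0, 0.5 + 0.02 * (score - 100)))
    return rng.random() < p

volunteer_sample = [s for s in schools if agrees(s)]

population_mean = sum(schools) / len(schools)
volunteer_mean = sum(volunteer_sample) / len(volunteer_sample)
print(f"population mean: {population_mean:.1f}")
print(f"volunteer-sample mean: {volunteer_mean:.1f}")
```

Even a mild tilt in who agrees to take part inflates the apparent national mean by several points here, which is the mechanism behind the asterisks on England's 2001 entry.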

Q9 Chairman: So, it was quite an awkward mistake in some ways, if it was a mistake. It is interesting that under PIRLS-we will shift on, before I get a rebellion here-most of the big countries like us, such as Germany and France, are about the same. Okay, Finland and some smaller countries such as Taiwan and Korea will always be high up there, but countries with big populations-in Europe, places such as France and Germany that are, in a sense, like Great Britain-are at around the same position.

Professor Tymms: I would point to a different pattern in the data, which relates not to size but to language. Translating reading tests into other languages is problematic to begin with: can one say that reading levels are the same? You pay your money and take your choice. But a long tail of underachievement in reading will also be found in all the other countries where English is spoken. You will find it in Australia and even in Singapore, which has a largely Chinese population but reads in English, and in Canada and America. That is because English is a difficult language to learn to read, whereas Finnish is much more regular in the way that its sounds are written on the page. If you are going to be born dyslexic, do not be born in a country where people speak English, because it will really be a problem. Be born in another country such as Germany or Italy. I make that general point.

Sir Michael Barber: Peter has made an important point. I would like to add two other things. First, other European countries look at our reforms in education over the past 10 years and are impressed by them. I have had conversations with people from several of the countries that we have talked about, and on this set of PIRLS we were actually significantly above the EU average. We were above France and just behind Germany.

The long tail of underachievement is a real issue. Personally, I think that the places to look for English-speaking populations that do really well on reading, writing and, indeed, generally are the Canadian provinces. Some of their practices are very impressive. That is one place I would urge you to look if you are thinking about the future.

Chairman: Thank you for those opening responses.

Q10 Fiona Mactaggart: You talk a lot about whether our assessment system accurately assesses standards over time, but that is only one purpose of assessment. I wonder whether our national assessment system is fit for purpose as a tool for assessment for learning. I am concerned about the fact that we have examinations at seven. I am not sure that they help teachers as much as they should. Could you give your views on whether standard assessment tests-SATs-in primary and secondary education help teachers use assessment for learning?

Professor Tymms: They were not designed to do that. A test taken at the end of primary school is clearly not meant to help children in primary schools because they are about to leave and go to secondary schools, which often ignore the information and do their own tests as soon as students come in because they do not believe what the primary schools say they have done. Unfortunately, that is the way of the world. It happens when students who have A-levels in mathematics go to university: they are immediately tested in mathematics again. Even if you take pre-school, all the information passed from the pre-school to the reception teacher is often ignored, as the reception teacher does their own assessment.

The tests are certainly not being used as assessment for learning, other than that the practice tests used in the run-up to the statutory tests might be used in that way. They might be used as assessment for learning a little bit at age seven, but an infant school certainly would not use them in that way, because it would be passing its kids on to the junior school. The tests are not intended to do that kind of thing, so they cannot be and are not used in that way. They are meant to hold schools to account and to produce information for parents.

If we want assessment for learning, we must do something different. Many schools and teachers do that kind of thing off their own bat. There are other ways to assess. For example, there are diagnostic and confirmatory assessments. We could go into that kind of thing, but they are not assessments for learning.

Sir Michael Barber: You made an aside about tests or exams at seven. It is important for the system and, indeed, teachers in schools, to know early on whether children are learning to read and write and do mathematics, because if intervention is needed to support a child in getting on track with their cohort, the sooner you know that they have a problem, the easier it is to fix it.

One purpose of national curriculum tests is to provide accountability and to provide information for parents, as Peter rightly said, and it is absolutely right that that should be the case. However, in addition to that, over a period of time the tests have taught teachers what the levels are. The basis of assessment for learning is for the teacher and, obviously, the student or pupil to be able to understand what level they are working at and what they need to do next to get to the next level. If it had not been for the national curriculum and the national tests, I doubt very much whether the quality of those conversations would be as good as they are. The key to assessment for learning is investment in teachers' skills to do that, so that they are constantly focused-not just individually, but in teams with their colleagues-on improving the quality of their teaching, working out what they must do to get the next child up to the next level and therefore constantly improving their pedagogy, which is the essence of the whole issue.

Q11 Fiona Mactaggart: The interesting thing is that your view, Peter, is that the real function of those tests is to hold schools to account, rather than as assessments for learning. I was speaking to a head teacher on Friday, who said to me, "Fiona, I just wish all primary schools were all through, because then we wouldn't have inflated test results for 7-year-olds coming out of infant schools." Her analysis was that in infant schools, for which key stage 1 SATs were summative results, there was a tendency towards grade inflation, which undermines your point, Michael. I agree that you need to know to intervene early, but if the accountability function militates against accuracy of assessment for learning, how do you square it?

Sir Michael Barber: First, the key stage 1 results are not under the same accountability pressures as those for key stages 2 or 4. Secondly, I would not have moved away from externally set and marked tests for key stage 1, because if you consider the evidence in the work of Pam Sammons and others, objective tests marked externally to the school are more likely than teacher-assessed tests in the school to provide a drive for equity. If that had been done, I doubt that the issue you just raised would have occurred.

Professor Tymms: The assessment for learning is really interesting. The evidence is that if we give back to pupils information on how to get better, but we do not give them grades, they are likely to get better. Feeding back grades, marks and levels as well countermands-undermines-that effect. That is very clear in the randomised trials and in the meta-analysis by Black and Wiliam in the "black box" report. The feedback to pupils on how to get better is vital, but it is undermined in other ways.

The other point that Michael raised about identifying special needs early is also crucial. The key stage assessments will not identify special needs or identify them early; they are too late and not precise enough. If, for example, a child is likely to have trouble reading, they can exhibit it when they are four or five years old through a phonological problem, which can be assessed diagnostically at an early stage. A child later on who has, for example, a decoding or word-recognition problem-or who can decode but does not understand or make sense of the text, despite being able to bark the words-can also be diagnosed. Diagnostic assessments can be put in place, but they are different from the summative assessments at the key stages. There are horses for courses, and we must be careful about how we aim to use them.

Q12 Fiona Mactaggart: So, if the assessments do not necessarily do what we want, how else could we assess the impact of national policies on schools? How can we test what the Government policies, national curriculum or improvements in teacher training do? How do we know?

Professor Tymms: We need a series of different systems; we should not have a one-size-fits-all test. We need an independent body, charged with monitoring standards over time, which would use a sampling procedure in the same way as the NAEP does in the United States, as the NAA used to in the United Kingdom and as other Governments do in their countries. The procedure would become impervious to small changes in the curriculum, because it would have a bank of data against which it would check issues over time, so that we might track them and receive regular information about a variety of them, including not only attainment but attitudes, aspirations, vocabulary and so on.

I would ensure that teachers had available to them good diagnostic assessments of the type that I described. I would also ensure that there was a full understanding of assessment for learning among the pupils, and I would continue to have national tests at the age of 11, but I would not put the results in league tables. In fact, I would ensure that there were laws to prevent that sort of thing from happening.

Q13 Fiona Mactaggart: Would you have to keep them secret from parents?

Professor Tymms: No. Parents would be allowed to go to a school and ask for the results, but I would not make the results the subject of newspaper reports, with everyone looking at them in a sort of voyeuristic way. There are real problems with those tables, which are actually undermining the quality and the good impact that assessment data can have. We are forcing teachers to be unprofessional. League tables are an enemy of improvement in our educational system, but good data is not. We need good data. We need to know the standards and variations across time, but we do not need a voyeuristic way of operating and pressure that makes teachers behave unprofessionally.

Sir Michael Barber: At the risk of ruining Peter's reputation, I agree with a lot of that, and I want to say a few things about it. First, as I understand it, a new regulator is due to be set up. An announcement was made a couple of months ago by Ed Balls; I am not sure where that has got to, but the announcement was made precisely in response to the issues that Peter has raised. Personally, I have no doubt about the professionalism of the QCA in the past decade. It has done a good job, but it is important that standards are not just maintained but seen to be maintained. The new regulator will help with that once it is up and running.

Secondly, on monitoring standards over time, as I said earlier, particularly now that international benchmarking has become so important not just here but around the world, I would like the regulator to use samples connected with those benchmarks and help to solve the problems of getting schools to participate in samples, which Peter mentioned. That would be extremely helpful.

I agree completely with Peter about investing in teachers' skills and giving them the diagnostic skills to make them expert in assessment for learning. When I debate the Programme for International Student Assessment results with Andreas Schleicher, who runs PISA-he is an outstanding person, and it may be worth your interviewing him-he says that virtually no country in the world implements more of the policies that would be expected to work according to the PISA data than England, but that that has not yet translated into consistent quality, classroom by classroom. That is the big challenge, and what Peter recommended would help to achieve it.

Like Peter, I would keep tests at 11. On league tables, the issue-and I have this debate with head teachers a lot-is that unless a law is passed, which I do not see as terribly likely, there are only two options for the schools system. One is that the Government, in consultation with stakeholders, design and publish league tables. The other is that one of the newspapers does it for them. That is what happened in Holland and it is happening, too, in Toronto and in Finland. It happens with universities. If you talk to university vice-chancellors, you find that they are in despair because various newspapers and organisations are publishing league tables of university performance over which they have no leverage. The data will be out there-this is an era of freedom of information, so there is a choice between the Government doing it or somebody else doing it for them. If I were a head teacher, I would rather have the Government do it-at least you can have a debate with them-than have the Daily Mail or another newspaper publish my league tables for me.

Professor Tymms: Can I pick up on that? I wish to make two points about league tables. First, we publish the percentage of children who attain level 4 or above, so if a school wants to go up the league tables it puts its effort into the pupils who might just get a level 4 instead of a level 3. It puts its efforts into the borderline pupils, and it does not worry about the child who may go to Cambridge one day and has been reading for years, or the child with special needs who is nowhere near level 4. Neither is going to show up on the indicator, so we are using a corrupting indicator in our league tables.

Secondly, if you look at the positions of primary and secondary schools in the league tables, you will find that secondary schools are pretty solid in their positions year on year, but primary schools jump up and down. That is not because of varying teachers but because of varying statistics. If a school has only 11 pupils and one gets a level 4 instead of a level 3, the school is suddenly up by almost 10 percentage points and jumps massively in the table. There is a massive fluctuation, because we produce league tables for tiny numbers of pupils. On the value added, we can include only children who are there from key stage 1 to key stage 2, which, where there is turbulence in a school, often leaves very few pupils. We should not publish for tiny numbers.

The Royal Statistical Society recommends always quoting a measure of uncertainty or error, which is never done in those tables. We have 20,000 primary schools, and if the Government did not produce tables that the newspapers could just pick up and print, it would take a pretty hard-working journalist to persuade each school to give the press its data. It would be possible to make laws saying that you cannot publish tables. Parliament makes laws saying that you should not have your expenses scrutinised, so why can we not produce a law that says that schools' results should not be scrutinised?
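The small-numbers point can be made concrete. The sketch below, in Python with hypothetical figures rather than real school data, shows how a single pupil's result moves an 11-pupil cohort's headline percentage, and one possible uncertainty measure of the kind the Royal Statistical Society recommends quoting-here a Wilson 95% confidence interval for a proportion:

```python
import math

# Illustrative sketch (hypothetical numbers, not from the evidence):
# the headline percentage for a small cohort, and the uncertainty that
# would accompany it if an error measure were quoted.

def pct_level4(passed, cohort):
    """Percentage of the cohort reaching level 4."""
    return 100.0 * passed / cohort

def wilson_interval(passed, cohort, z=1.96):
    """95% Wilson confidence interval for a proportion, in percent."""
    p = passed / cohort
    denom = 1 + z**2 / cohort
    centre = (p + z**2 / (2 * cohort)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / cohort + z**2 / (4 * cohort**2))
    return 100 * (centre - half), 100 * (centre + half)

# An 11-pupil cohort: one extra pupil at level 4 shifts the headline
# figure by about 9 percentage points.
print(pct_level4(7, 11))   # roughly 63.6
print(pct_level4(8, 11))   # roughly 72.7

# The interval is so wide that the two years are statistically
# indistinguishable, which is the point about tiny cohorts.
print(wilson_interval(7, 11))
```

With only 11 pupils the interval spans tens of percentage points, so a year-on-year jump of one pupil's result tells us almost nothing about the school.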

Q14 Mr. Slaughter: You said a few moments ago, Sir Michael, that one of the purposes of national testing at seven and 11 was to identify children who are in difficulties. That sounds counter-intuitive. Would you not expect teachers to know that anyway? If testing has a role, is it not in assessing the needs of individual children, just as testing is used, for example, to assess the needs of people with a hearing problem? Otherwise, is it not likely to lead to buck-passing? If we test everybody, it almost becomes the responsibility of the state or someone else to ensure that everyone reaches a higher level. Given the length of time that we have had testing, how far has that become true? Stories in newspapers report the reverse, saying that a substantial minority of children still move on to secondary school without those skills.

Sir Michael Barber: I am not arguing that national curriculum tests alone will solve every child's problems. I agree strongly with what Peter said about teachers developing the diagnostic skills to diagnose such things. We want all teachers-I shall focus on primary schools-to be able to teach reading, writing, mathematics, and some other things, well, and then develop over time the skills needed to deal with individuals who fall behind. It is very good to see Government initiatives, such as the Every Child a Reader initiative, that pick up children who fall behind. I am in favour of all that. You need good diagnosis, which incidentally is one of the features of the Finnish education system that makes it so good-they diagnose these things early.

The national curriculum tests have spread understanding among teachers of what the levels are and of what being good at reading, writing and mathematics looks like. They also enable the system to identify not just individual students but groups of students who have fallen behind. The system has great data about particular groups of students or schools that are falling behind, which enables it to make informed decisions about where to target efforts. My point is not just about individual students, therefore, but about groups of students and variations within the cohort.

I shall comment on the point about league tables. In the end, the data will out-this is an era of freedom of information. We can have a perfectly valid debate about whether level 4 is the right indicator. However, the percentage achieving level 5 went up very rapidly during the early phase of the national literacy strategy, which suggests that good teaching is good teaching is good teaching. That was a result of the combination of the accountability system and the big investment in teachers' skills.

Q15 Lynda Waltho: In evidence so far, we have heard that the testing regime serves a large number of purposes-specifically, end of key stage, school accountability, assuring standards over time and assessment for learning. I am getting the feeling that there is not a lot of confidence that at least two of those are being achieved. What about the others? Can the system fulfil any of those purposes? Is it working? Is it fit for purpose? I do not have the impression that it is. As a former teacher and a parent, I found the regime useful in all of those areas at some point, but what is your assessment of its capabilities across that range?

Professor Tymms: I do not think that it is being used at all for assessment for learning, and I do not think that it can be, except incidentally. It provides a level against which teachers can set their pupils. If a teacher in a high-achieving school had to judge her pupils herself, she would probably underestimate them, because she would judge them against the other pupils she knows. The reverse would probably happen in a low-achieving school. Standardised levels from national tests give the firm ground on which a teacher can make a judgment. That is a good thing. It is there and it is being used. It gets information to parents, but it has its downsides.

I do not think that testing is good at monitoring standards over time. We are saying, "Take this test, and we will hold you to account for the results and put them in league tables. We will send in an Ofsted inspector, and we will also ask you to assess your pupils and send us the results". That is an inherently problematic system.

Another inherently problematic thing is having qualifications and curriculum in the same body-the QCA. Somebody should design the curriculum and somebody should assess it, but they should be separate bodies.

That is an unhealthy way to operate a system. If we want to know what standards are over time, we are far better off with an independent body. If we change the curriculum-we read in The Times that that will happen, and we hear it regularly-and introduce an oral test, suddenly level 4 will not mean the same thing, because a different curriculum will be assessed. We cannot monitor standards over time, but by having an independent body charged with monitoring standards not just against the national curriculum but against an international concept of mathematics or reading, we can track things over time. We must do different things.

I come back to the need to understand the special needs of the child and to pick out the child who already has a serious problem. Teachers can assess their children pretty well, but they cannot be expert in all the special needs-varieties of dyslexia, dyscalculia, attention-deficit hyperactivity disorder and so on-nor should they be expected to be. However, they might spot a problem with a child who needs to be assessed in different ways, so tools to help the teacher help the child-to identify special needs, and things falling back or not going quite right to begin with-would make sense. Computerised diagnostic assessments-bespoke tests in which the child listens to the computer through headphones and is asked questions according to how they respond-are the way of the future, but they cannot be the way of the future for statutory assessments, which require a new test every year to maintain security.

Q16 Lynda Waltho: There would be more tests then.

Professor Tymms: Different types, and probably less testing. We have more testing if we have league tables. It is the league tables that are our enemy.

Sir Michael Barber: I think that, on the whole, the national curriculum tests are beneficial. I have a lot of confidence in them, and I am always cautious in advising anybody or any education system to move too rapidly in changing assessment or qualifications, as that involves a lot of risk. Nevertheless, one should not stick with things for all time. I think that they have been good tests and that they have been good for accountability purposes. Along with the supports that I mentioned earlier, they have helped to drive improvement in the system.

I agree with Peter about the need for an independent body to monitor standards over time-that is absolutely right. The proposal that is currently being piloted in 400 or 500 schools-progression pilots in which children are tested when they are ready for level tests-is very promising, but it is all in the detail. If that works, it could be beneficial in making sure that children at all stages and ages are making progress. The data show that, at present, there is a bit of drop-off in progress for years 3 and 4, but we would be able to move away from that if we had testing-when-ready tests. There is a lot of promise in them, but, as with any shift in the testing and assessment system, it is all about getting the detail right.

Q17 Chairman: We can come back to your last point. You mentioned a comment by Professor Schleicher.

Sir Michael Barber: I do not think that Andreas Schleicher is a professor, but he would be a very worthy one.

Q18 Chairman: Can you guide us to what you were quoting from?

Sir Michael Barber: I was quoting from a conversation with him. Before using his comments in the Committee, I checked that he was happy to be quoted on the record. You can put the quote on the record. He is quite happy to be quoted along the lines that I gave.

Q19 Lynda Waltho: You both discussed whether league tables were an enemy or a friend. It seems that you have completely different ideas. I agree with you, Sir Michael. I think that it is likely that the newspapers will develop their own league tables. If they do league tables about what we spend on our breakfast at the House of Commons, they will do league tables for school results, believe me. Would it not be better if the Government set out explicitly the full range of purposes for league tables; in effect, if they explained the results better? Would that make a difference, or am I just being a bit naive?

Professor Tymms: It would be interesting to try, but I do not know. If I buy something, I never bother reading the instructions until I get stuck. I would guess that most people would not read the small print; they would just look down the league tables to find out who is at the top and who is at the bottom. When the league tables come out every year, the major headlines that we see are whether boys have done better than girls, or vice versa, or that one type of school has come top. It is the same old thing time and again, despite great efforts to steer journalists in a different direction. I despair of league tables, but it would certainly be worth trying to provide more information. I think that the Royal Statistical Society's recommendation not to give out numbers unless we include the uncertainties around them is a very proper thing to do, but it is probably a bit late. The cat is out of the bag, and people are looking at the league tables. Even if there is more information, people will concentrate on the headline figures.

Sir Michael Barber: You can always look at how you can improve a data system like that and explain it better. I agree about that. I have been a strong advocate of league tables-and not only in relation to schools-because they put issues out in public and force the system to address those problems. League tables, not just in education, have had that benefit. Going back some time, I remember lots of conversations with people running local education authorities. They would know that a school was poor, and it would drift along being poor. That was known behind closed doors, but nothing was done about it. Once you put the data out in public, you have to focus the system on solving those problems. One reason why we have made real progress as a system, in the past 10 to 15 years, in dealing with school failure-going back well before 1997-is that data are out in the open. That forces the system to address those problems.

Professor Tymms: Why has it not got better then?

Sir Michael Barber: It has got significantly better. We have far fewer seriously underperforming schools than we had before.

Chairman: We do not usually allow one witness to question another, but never mind. You can bat it back.

Sir Michael Barber: It was a fair question.

Q20 Mr. Chaytor: Looking at tables and accountability, may I ask you a question, Michael? In response to a remark from Peter, you said that it is important not to rely on a single data set, but is not that exactly the flaw of our system of league tables? Whatever the level, whether in primary or secondary school, the headline is the single data set. Is there any other public institution or system of accountability for public services in Britain that relies on a single data set, other than that which we have in schools? Do we use a single data set for hospitals, police authorities or primary care trusts?

Sir Michael Barber: My remark about not relying on a single data set was in reference to measuring progress over time. That is why I referred to several sets when we debated what had happened to literacy in the past decade or more. That is what I meant. You would triangulate the data sets. I think that league tables based on national tests are perfectly respectable and fit for that purpose. As I said in answer to Lynda Waltho, it is not the case that you cannot improve them; you can have a debate about how to improve them. In the schools system, we do not rely purely on tests and league tables to assess the quality of schools. We also have Ofsted inspection, which considers the leadership and management of schools, the ethos within them and the quality of teaching as well as the standards that are achieved. That is important because it creates a more rounded picture of what schools are for.

Q21 Mr. Chaytor: But in terms of accountability to parents, which is the most significant-the five As to Cs score, the percentage at level 4, or the Ofsted report? The report is a broader document, but it is also dominated by results-perhaps increasingly so?

Sir Michael Barber: It takes account of results, but it does not add anything new to them. However, it looks at what is going on inside the school that delivers those results. Some of the things that I mentioned, such as the quality of leadership and management, are lead indicators of what will happen to results. With stronger leadership and better-quality teaching, in time the results will improve. I strongly support Ofsted inspection for that reason. There are things that you can do to improve it all the time. That is part of the task of the new chief inspector, whom I understand you will interview soon. You can debate that with her. As I understand it-and you will know from your constituents-parents consider performance in published test results, but they also examine Ofsted reports and take great interest in them when they come round. Of course, they appear only once every three years, as opposed to every year.

Q22 Mr. Chaytor: May I ask both of you, but perhaps Peter first, what is the relationship between the single data set of test results and pupil intake? We can all agree that the quality of teaching is essential to improvement, but is there received wisdom that such-and-such a percentage of the outcome is determined by the input?

Professor Tymms: A league table position is largely determined by the intake of pupils to that school. It might vary depending on how you analyse it, but if you had measures of pupils on intake, that would certainly explain more than 50% of the variance in the results, and maybe up to 70%. The amount that is due to the quality of teaching is typically quoted as being about 10 to 15% of the variance in secondary schools, after intake is taken into account, which means that we are down to about 5 to 7% of the variation in the league tables being due to the quality of the school-maybe less, once everything is taken into account. In primary schools it is slightly more, but it is still dominated by the intake.
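The arithmetic behind those figures can be sketched as follows (illustrative numbers only, chosen to match the percentages quoted, not from any dataset):

```python
# Back-of-envelope check of the variance decomposition described above:
# if intake explains a given share of the variance in raw results, and
# teaching quality explains a share of what remains, the school's own
# contribution to league-table variation is the product of the two.

def school_share(intake_share, teaching_share_of_residual):
    """Fraction of total variance attributable to teaching quality."""
    return (1 - intake_share) * teaching_share_of_residual

# With intake explaining 50% of the variance and teaching 10-15% of
# the residual, the school accounts for roughly 5-7.5% of the total.
print(school_share(0.50, 0.10))
print(school_share(0.50, 0.15))

# At the upper end of the intake estimate (70%), even less is left.
print(school_share(0.70, 0.15))
```

On these assumptions, well over 90% of a school's league-table position reflects who walks in the door rather than what the school does, which is the thrust of the evidence.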

What we see in the league table is dominated by the intake, so we talk about a school at the bottom end of the league, but if we put all the schools in the table, a lot of schools at the bottom would be special schools, as they have children with severe learning problems. We need to know what the intake is and the progress made, and therefore the value added, in order to make sense of the figures. A lot of mistakes were made through judgments that schools at the bottom of league tables were bad, because that was not taken into account. It is quite difficult to take that into account, but we are moving forward. That is why the earlier measures are so important. Of course, once there is teacher judgment, you can no longer rely on outcome measures, as they are not objective tests and teachers might do things to boost their positions. The data become suspect.

Q23 Mr. Chaytor: Would you accept that figure of 50 to 70%?

Sir Michael Barber: It varies from one system to another, but home background is clearly a major influence on outcomes. Nobody is debating that. We recently published a report examining some of the best-performing systems in the world, which achieve much higher consistency in the quality of teaching, and therefore in the quality of outcomes, than ours. They seem to be better at overcoming the disadvantage that children bring into a school. It is important stuff-what do those systems do? I am summarising a substantial report, but first, they select great people into teaching. Even in the 21st century, when young people have many options, they are still getting great people into teaching. We have done reasonably well on that in the past decade, but nobody can be complacent. Secondly, they train them really well, focusing on the quality of classroom teaching. Thirdly, they do the sort of things that Peter and I have been talking about-they ensure that the processes in the schools, assessment for learning and others, mean that each teacher constantly improves their skills and their ability to deliver great lessons for their students. Fourthly, they have systems that do not write off any student, as we were talking about earlier. They care, they spot early when children are falling behind, and they pick them up and catch them up.

We could do all that. If we did-some schools do it brilliantly-we would reduce the impact of home background on the outcomes that students achieve. That is what we must do, and publishing the data puts that issue on the agenda in a way that nothing else would.

Q24 Mr. Chaytor: If there is a general consensus that the relationship between home background and pupil intake is the dominant explanation of a score in the league table, is there not a dynamic built into the system that there will always be failing schools? From day one of the league tables, a certain number of schools were at the bottom of the pile. The existence of the league table reinforces the sense of failure in those schools and there is almost a spiral of decline. Is that not an inevitable consequence of a league table system based on a single data set?

Professor Tymms: Yes, I think that you are quite right. For example, you will find that fewer people apply for headships in schools at the bottom of the league table. Such schools have great difficulty appointing heads-they might have to appoint ordinary teachers-whereas there are enormous numbers of applications to schools at the top of the league table. Those schools have the pick of the bunch, which provides positive reinforcement, while others get worse and worse. It is the Matthew effect in operation-"For whosoever hath, to him shall be given". That is a real concern.

On the international differences between schools, it is right to say that some countries have enormous variations between schools and that others have very little variation. In our country, there is a large variation-we have private schools and some very tough schools. However, if you go to the United States or to China-bizarrely-you will find much greater variations, largely because their schools are funded by local taxes, which means that if you live in a poor area, you have a poor school and poorly-paid teachers. We have that a bit in this country owing to the private system.

A nice league table came out in the Educational Researcher, looking at the qualifications of teachers in schools according to affluence and deprivation. In this country, you will typically find that the more affluent the school, the higher the qualifications and the greater the experience of the teachers. That trend is much more dramatic in some countries, but in others it is actually reversed-they put their apparently better teachers into tougher schools in order to reverse that situation. We do not do that kind of thing here; we do not even think that it is possible. We have a serious discrepancy, however, between those at the top and those at the bottom. We know about that on an individual pupil basis, but it exists on a school basis as well, which is reflected in the league tables.

Sir Michael Barber: I agree with what Peter said about the US. You might suppose that schools would enter a spiral of decline, but that is not what happens or what the data show. The number of schools achieving less than 30% five As to Cs has dropped dramatically from more than 600 to about 50-I cannot remember the data exactly, but they are available. By putting the data in the open, resources have been targeted to those schools, so programmes such as the Excellence in Cities programme, have helped struggling schools to improve. We have seen bigger improvements in some of those areas than in other parts of the country.

You could reinforce that further. I am interested in what they have done in New York city recently with their new accountability system, under which a school gets double value for moving forward a student in the bottom third of the performance distribution. You could provide greater incentives for moving forward students in the bottom third. Programmes such as the Teach First initiative and the Excellence in Cities programme have got good teachers and head teachers into disadvantaged schools. One of the reasons for that has been the fact that the data are out in the open.

Professor Tymms: I cannot let that go. The advice that we are hearing on payment by results is so misguided. If teachers can get more money for their schools according to the number of pupils they push forward, we have a problem: a system in which teachers are paid according to their pupils' progress. That is an unhealthy system to advocate. It gives schools more money because they push more pupils forward, but they are the very ones producing those results. Again, you strain professionality by going down that route.

Sir Michael Barber: May I correct that? You need to allocate resources in that way in order to bring equity. I am not advocating anything other than that. The Excellence in Cities programme gives money to schools and areas because they suffer from disadvantages compared with the average. The resources are there to bring greater equity. I am not sure what Peter was commenting on, but I was not making the point that he disagreed with.

Q25 Chairman: Peter, would you not want to reward specialist teachers who are charged with, and do better with, the most difficult students?

Professor Tymms: It is a very difficult problem. It would be attractive to say that people doing better should be paid more and promoted. However, schools have promotion systems already that reward those teachers. We should not pay them according to their year's results or tell them, "If your pupils get level 4s we will give you more money." They are the very teachers invigilating those pupils. They are the ones opening those papers and giving out the results. Making that direct link would strain professionality too much.

Furthermore, we are talking about one or two pupils getting an extra result in one year compared with the previous year. That is too close to the bone. It is not the way to go. We need to distance ourselves from that direct link with pupils' marks on papers, and from rewarding head teachers for moving up the league tables. Let us consider the percentage of five As to Cs in secondary schools. Of course, many more students now achieve that and many more schools reach it, but largely because students are entered for a few more tests. That is largely what happened, and largely what caused the improvement. The underlying quality of improvement is not there to be shown. Many students who would not previously have been entered for GCSEs now are, but that does not mean that standards have changed. We must be careful how we define schools that are doing badly and those that are doing well.

Q26 Ms Butler: On that point, do you think that the contextual value added data play a role in how we weight pupils who have done better after coming in at the lower end of the spectrum?

Sir Michael Barber: I think that contextual value added data is important, because it helps us to understand the system in a way that cannot be done without it, so I am strongly in favour of it. The quality of the data in our system is now better than it has ever been, and compares very well internationally. The ability to do value added analysis on individual pupil level data, which we now have in the national system, is a huge benefit.

We need contextual value added data as well as raw data, because when students reach the age of 16, they may go into the labour market with everyone else, so it is not enough to take account just of value added. People need to reach a basic standard that gives them access, hopefully, to higher education or to work. I am in favour of the raw results being used and thought about to drive action, but I am also in favour of contextual value added data being available so that we can understand what impact policies and schools are having on the system. It is helpful to understand the system, but it is not enough on its own to drive equity in outcomes.

Professor Tymms: Yes, value added is vital and helps us to understand, but the way in which it is calculated is important. Contextual value added is one way of calculating it, but we must be careful. For example, when looking at the progress made by children in maths and reading from key stage 1 to key stage 2, we ask what children normally get at key stage 2 given their key stage 1 results. If a child did better than most children with the same starting point, that is essentially the value added. In a broader value added system, we might also take account of children's home background, ethnicity, age and so on. There we must be careful. For example, in the system children from a poor background do not do well, so if such children fall by the wayside and do less well on average when progressing from key stage 1 to key stage 2, our value added system, which takes that into account, assumes that that is all right. In fact, it may be the system that is making them fall by the wayside, and we are excusing bad performance. Contextual value added, which tries to take everything into account, brushes that under the carpet; we must expose it and see what is happening. There are different ways of looking at value added, and in Durham we always give schools several ways of looking at it, so that they can see it each way. That is important.
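The simple value-added calculation described here can be sketched in a few lines of Python (the cohort data and levels are entirely hypothetical, for illustration only):

```python
from collections import defaultdict

# Minimal sketch of the value-added idea: a pupil's value added is
# their key stage 2 result minus the average result for pupils with
# the same key stage 1 starting point. Hypothetical (ks1, ks2) pairs
# stand in for a national cohort.
national = [(2, 4), (2, 4), (2, 5), (2, 3), (3, 5), (3, 5), (3, 4)]

# Average key stage 2 outcome for each key stage 1 starting point.
by_start = defaultdict(list)
for ks1, ks2 in national:
    by_start[ks1].append(ks2)
expected = {ks1: sum(v) / len(v) for ks1, v in by_start.items()}

def value_added(ks1, ks2):
    """Positive if the pupil did better than average for their start."""
    return ks2 - expected[ks1]

print(value_added(2, 5))  # above expectation for a level-2 starter
print(value_added(3, 4))  # below expectation for a level-3 starter
```

A contextual version would adjust `expected` for home background and so on, which is exactly the step the evidence warns can "excuse bad performance" by building low expectations into the baseline.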

In the United States, a couple of great researchers, Doug Willms and Steve Raudenbush, talk about two types of value added: type A and type B. Parents want to know how their child will progress at a school. They want to know pupils' scores at the beginning and later, so that they know what is likely to happen in that school. That is type A value added. An administrator might ask how well the school is doing, given its circumstances. We know that pupils progress less well in schools in tough areas, so various schools should be looked at to see how well they are doing given that. Those are different types of value added. A system that says there is only one type of value added-contextual value added-is misleading, because we need much more information. We can get that information, and it can improve the system. Good information for parents, for administrators and for the country is vital.

Sir Michael Barber: For the record, I agree totally. That is one reason why national curriculum assessment for all students is an important part of being able to generate such data.

Q27 Mr. Chaytor: May I pursue one more issue? On the choice and setting of targets at key stage 2, level 4 is seen as the point below which children have failed.

However, am I not right in thinking that when the key stage system was established in 1988, level 4 was chosen as the average level of performance? My question is twofold. First, will there come a point at which the failure threshold will have to move up to level 5? Secondly, what does the research suggest about the impact on children's enjoyment of learning and on their motivation when they start their secondary school career knowing that they have failed and that they have been labelled by the local newspaper as having failed? What is the link between targets and enjoyment and motivation?

Professor Tymms: They are really good questions, so I shall do my best to answer them.

First, on the targets, of course we have had a shift in standards, so that level 4 is not the level 4 with which we started. That does not make too much sense. Further, we should think about targets in terms of the value-added approach: you see where the children were and where they are likely to go, not in terms of level 4 being good and below level 4 being bad. For some pupils, level 3 is a great result and a real success; for others, level 4 is a dreadful fallback from where they were. So, when thinking about where we expect to go, we must think in those terms-about progress, rather than about absolute levels. A teacher or a school should be held to account only for the progress that their children make, not for the level that they attain. We must keep that in mind.

The targets that are imposed are not the best ones; we should use targets that come from within. In the research into targets and whether, if I set myself a target, I do better, it is clear that targets really work on relatively simple tasks-such as chopping down trees and washing dishes. On complex tasks, such as teaching and running a school, targets do not work, and that is where ownership comes in. We have got ourselves in a bit of a tizz over the targets.

The research into fear of failure and so on is a complicated area. It is clear that young children, as they go through life, are predestined to fail in some things and succeed in others. In a sense, they expect that to happen and then say, "Try harder and I'll do better." They are resilient in terms of a little failure and a little success. However, we do not want to slap down children who have done remarkably well to get to a level 3 from where they started. It is an error to label them as failures, and it is also problematic to label their school as a failure, because they feel that in themselves.

I have not seen research into the exact issue that you described, but I reviewed research into the feelings of children towards reading over the years. In our data, we saw that they stayed fairly constant over time, but other data suggest that children are less positive towards books than they used to be. We know that when they get older, they get less positive, which is a feature of education in general, and we know that boys more than girls become less positive as they get older, so by the time primary school finishes, there is a set of disaffected boys moving on to secondary school. They do not like school. If asked "Do you like school?", they say no. "Do you look forward to school?" "No." "Do you like your teachers?" "No." They then go on to a secondary school that has to start with the kids from where they are, and that is a pretty tough job.

We must worry about these things, and any national monitoring system should examine attitudes, self-esteem, welfare and physical growth-all the issues coming out of "Every Child Matters". We do not have that yet.

Q28 Chairman: May I take you back to the first part of David's question and to the question before that? We pushed you on why you are so resistant to payment by results-for getting good achievement out of young people who are less easy to teach. We have had a system for years whereby, as I understand it, if you were the high mistress of St. Paul's in the City or of King Edward's boys or girls school, you had a wonderful group of highly motivated kids who had passed all sorts of examinations to get in. If you did not get wonderful results out of them, serious questions would be asked. The people teaching such groups have always received the best pay, but you are making anti-Freud-I mean David Freud-points. You would not incentivise somebody who did a really good job of taking the most difficult youngsters and bringing them up further than you would expect. Why are you so resistant to that?

Professor Tymms: I am resistant to the direct link between the marks of those kids and the pay of their teachers. I am not against reward; I am not against paying teachers for good results; and I am not against getting good teachers in and rewarding them, or paying teachers more if they are working in tough circumstances and doing a good job. But a broader decision needs to be made by the head, or perhaps by others, to say, "This teacher is doing well and is worthy of good pay." It is the direct link to the marks that I worry about. That is where the devil lies.

Sir Michael Barber: I shall come to David's question. However, I think that, within the framework set for national pay and conditions, head teachers should make the decisions about who to reward. I think that for the system to do that from outside for individual teachers is complicated and likely to be damaging. However-I think I am agreeing with Peter here-whole-school rewards for making real progress, particularly in disadvantaged areas, would be wholly positive. Obviously, you have to get the detail right of how that works.

On David's question, I agree with Peter that the system should get into measuring some of these wider outcomes, including enjoyment, motivation, and so on. I think that that is something that Ofsted inspection could do better in future. Ofsted inspection has been beneficial, but you could do more of that and use it to get into some of those issues, as indeed some systems are now thinking about-for example, in Victoria, Australia.

I have written a book about Government targets called "Instruction to Deliver". You could look at the arguments for and against and the mistakes that were made, but you could also look at the benefits from really good targets that focus on the essence of the business, so I shall not go into all that here. A good target can inject real ambition into a system.

However, I should really like to address the level 4 question. When I look at the 21st century, I see a labour market that is going to demand very high skills, not just in terms of reading, writing and mathematics, but in respect of rounded human beings able to work in teams and so on. I see a very demanding labour market for the young people coming through. The rest of their lives, too, will be very demanding: there are a lot of challenges in the 21st century. It is absolutely right that we are demanding more of our system than when the levels in the national curriculum were founded in 1988.

Level 4 was chosen for the end of primary school because it stands for reading and writing well, not just for basic reading and writing. A child who gets level 3 can read perfectly well if you put a book in front of them, but reading and writing well is what gives you access to the secondary curriculum, and that is what we have got to keep focused on. Sometimes I have the feeling-I know that some teachers and heads feel like this, because we have had this debate-that the Government imposed all these targets. However, the truth is that the targets, or in effect the demands, are placed by the 21st century: the Government are a mediator of those, and sometimes they get it right and sometimes they get it wrong. But we would be betraying our young people if we did not set out for them the demands of the future that they are going into. Therefore, we should be trying to get a school system that can match up to and meet those standards.

Q29 Mr. Chaytor: Looking at key stage 4, is Warwick Mansell, in his book on testing and assessment, right to be scandalised by the extent of teacher intervention in the production of GCSE coursework?

Professor Tymms: I do not know enough about this.

Sir Michael Barber: I have not read Warwick Mansell's book.

Chairman: We always like it when witnesses say, "I don't know." It is the people who give us an opinion on everything, even when they do not know, that we do not like. We are grateful for that.

Stephen wants to do a postscript on this section and move on to the next section.

Q30 Stephen Williams: Perhaps our witnesses could never be politicians.

Just a quick supplementary to David's line of questions, particularly to Sir Michael, who seems to be the main enthusiast for league tables. Just to be clear, is it Sir Michael's preference that, if league tables are going to exist, it would be better if the Government designed them, included all the variables on the tables, and published them like that? Is that basically what you would recommend?

Sir Michael Barber: If I have understood the question correctly-

Stephen Williams: At the moment, newspapers create league tables. The Evening Standard printed a league table, which I read on Thursday morning in London, and the Bristol Evening Post, which I saw when I got home in the afternoon, had a completely different league table, which was much better because it included free school meals, special educational needs students and the number of people entered, and was measuring level 4 rather than level 5, which is what the Evening Standard seemed to be concerned about. So we had two completely different league tables at either end of the railway line. Would it be better if the Government said that they were the league tables and that that is what should be published?

Sir Michael Barber: I apologise for my misunderstanding. The Government should put the data out in formats that can vary over time, and that is what has been happening. When the data is out there, individual newspapers can vary it. I was warning against the Government saying that they would not publish league tables at all, only for the data to get out there anyway and newspapers to make up their own sets of league tables, as happens in some countries and, indeed, in relation to higher education now. The fact that the Government are debating what should be in the league tables-which are, after all, public information that sets the standard for the system and gives parents information, along with the various stakeholders-is right. Once the information is out there, newspapers can do what they choose.

Q31 Stephen Williams: I went to school in South Wales and, even though I do not like league tables, my natural curiosity leads me to want to know how Mountain Ash comprehensive school does in the league tables, but I cannot find out. There are no league tables in Wales, so even though pupils sit the same public examinations as in England, there are no league tables. Does it necessarily follow that newspapers will create them if the data are not published?

Sir Michael Barber: Obviously, we shall see over time, but that is what has been happening around the world. One of the things that the Programme for International Student Assessment report says is that there is a trend towards published public information about school performance. Indeed, that is associated with positive things in the PISA results.

Chairman: Let us look at grade inflation.

Q32 Stephen Williams: Every August, we go through the season of the three sets of SATs. Key stage results are published, as are A-levels and GCSEs. Different sections of the national media and commentators bemoan the declining standards compared with the time when they sat their examinations and so on. Is it the opinion of either of you that there really has been grade inflation at GCSE and A-level?

Professor Tymms: I shall respond by quoting Dr. Robert Coe's analysis of the data, which I can provide for the Committee, if necessary. We used our data in the Curriculum, Evaluation and Management Centre to examine matters. The way in which we analysed matters was to take data based on general developed ability, say, two years before GCSE and then look at the grades that the student gained at GCSE.

Q33 Stephen Williams: Key stage 3 through to GCSE.

Professor Tymms: It was two years before. There is an assessment at the beginning of year 10 and then we look at the grades that were achieved. We can do that over many years. We take pupils with a particular level of ability and see what grades they get. Generally, we find pretty flat lines at GCSE. Standards appear to have been maintained at GCSE over several years. There is a little fluctuation in some subjects, some of which apparently get easier while some apparently get a bit harder. However, the headline is pretty well standard.

A2-level tells us quite a different story. If we use the same ability test, at the beginning of the A2-level course, and look at the grades, we find that pupils of a particular ability are getting higher and higher grades and have been for many years. In fact, if we went back some years, a D in mathematics might be the equivalent of getting a B now. That is quite a big jump. The biggest change is in mathematics; it is less in other subjects, and there are big differences between subjects.

It is a complicated subject, but we were talking about fitness for purpose. If we consider the purpose of A-level in selection for university, we see that Durham University's law department is inundated with students with straight As. The position is similar at Oxford and Cambridge, so to distinguish between them a market is created for producing tests for the selection of students. The A-levels should have been doing that. We have a problem with the levels at A-level. So many students are getting As that we now need to distinguish between them.

Q34 Chairman: But only 20,000 students get three straight As out of all the people who take A-level. That must put matters into perspective.

Professor Tymms: Yes, but if you went back you would find that 30% used to fail A-level and get below an E. Now just a few per cent. fail A-levels outright. There has been a dramatic shift.

Stephen Williams: The 20,000 straight As would be enough to fill up all the departments at the top universities in the country.

Chairman: I am sorry, but it depends on what you call top universities.

Q35 Stephen Williams: Professor Tymms is saying that he accepts that there is grade inflation at A-level. How many people got a 2.1 at Durham 20 years ago compared with how many people get a 2.1 now?

Professor Tymms: There has been grade inflation there, but I do not know specifically about Durham University. I know about Harvard University.

Q36 Stephen Williams: Universities moan about the entry standards at A-level, but when I looked at it, lo and behold, I saw that the number of people getting 2.1s and firsts has gone up, because no one wants a 2.2 any more.

Professor Tymms: I am not going to defend that.

Sir Michael Barber: Peter probably knows better than me the data on A-levels. I just want to make one general point at the beginning. I believe that the kids coming out of our schools now are the best educated generation in history, and that owes a lot to the reforms and investment of the past 10 to 20 years. The kids do not get the credit that they deserve for that. They get run down a lot in the media, and that is a big problem. I very strongly believe that today's kids are the best educated generation in history. However, that is not to say that that is good or equitable enough; I would like it to be better. I talked about the challenges of the 21st century, but I am very pleased that this generation is the best educated in history because of the problems facing not just this country but the planet generally over the next 10 to 20 years. That requires a well educated generation.

My second point goes back to what we were saying before. Having a new independent exams regulator, as proposed by Ed Balls, will really help in this area. I hope that that will come to pass. Thirdly, the arrangements for doing A-level exams-retaking modules and so on-enable more young people to succeed. That may be one of the factors why Peter-and he may want to comment on this-sees what he is seeing.

On GCSEs, I am glad to hear what Peter has to say. I believe-and I got into trouble for this in my first few months in the Department in 1997-that in the very early years of GCSEs, between 1988 and 1990, there was an element of grade inflation. There is an account of this debate in my book. The progressive changes made by the QCA since then have tightened things up and held the standard rather well.

Q37 Stephen Williams: I was going to ask about the variables. I am sure that the National Union of Teachers and other teaching unions would say that we have the best qualified teaching profession that we have ever had, and that the quality of teaching is very high. However, is it also because the structure of the exams has changed? The modular system has been mentioned and the fact that you can retake modules. Therefore, can we really compare results now with those 10, 15 or 20 years ago, which the newspapers tend to do?

Professor Tymms: I recommend that the Committee talks to Dr. Robert Coe, who has specifically studied the subject. I can just talk in general about it. There are several factors that might explain what has happened. Lots of things have changed here, so a direct comparison is not straightforward. However, modularisation has happened and there are more students. If you have more students, you want to aim your grades at the students in front of you; that is a natural thing to do. Yes, we wanted more people to go to university, so we have had to lower A-level standards in order to get them there. So there is a natural logic to this.

I worry about the standards of mathematics and physics for students at the top end. I would look at the amount of syllabus content that is being covered and talk to mathematicians, physicists and chemists about what is actually happening. We need more scientists, and more scientists at a very high level. We need more people motivated to study science. There is a tendency to think that if we make those exams easier and give more good grades, we will get more people studying them. Actually, some of the bright kids are challenged by really hard subjects, and to make them easier is not helpful. It is a complicated situation, and attracting more people to science is perhaps outside our scope here.

Q38 Stephen Williams: Given that grades have gone up, and that is a given fact, does that mean that the standards themselves have been debased?

Professor Tymms: No, it does not automatically mean that. You need to look at this in more detail in order to check that. I am telling you that students with the same ability are getting higher grades, so you could argue that there has been better teaching between then and now, and that might indeed be the case, but we need to look at the standard setting and see what we mean by equivalent standards.

This is a complicated area which evolves. No doubt the Committee will have heard of the Flynn effect. If you take non-verbal ability measures across the western world for the past 25 to 50 years, you will see that they have been rising steadily. People appear to be getting taller and cleverer. They are more able to do things than they have ever done before. The same is not true for verbal skills. We also have the anti-Flynn effect. You will see a decrease in the Piagetian levels of children just finishing primary school-Michael Shayer's work on that is very important. Why has that happened? Is it because we have taken away the Piagetian work in the early parts of primary school, which is now not focusing on that early development through play and so on? It is difficult to know, but these are general patterns that we are seeing across the western world.

Sir Michael Barber: I can definitely say that my memory is not improving over time, but I just want to raise three general points. One is that I think that the quality of teaching and the quality of the teachers that we are recruiting have improved significantly. Secondly, I think that young people are more motivated than they were 20 or 30 years ago. A lot of people in those days expected to get jobs in unskilled and semi-skilled work and did not need to try hard in school. Thirdly, there is the challenge for the future-we need to think about how we as a culture prepare ourselves for the 21st century, as I described. There is an element in our culture that assumes that, if more children are passing exams, standards must have got worse. We must guard against that. We need a culture in which business, universities, parents and the school system say that more and more children can achieve high standards. That is what we need, and that is what we want to see in the 21st century.

Q39 Stephen Williams: One final question. Is it the Flynn or the Finn effect?

Professor Tymms: Flynn.

Q40 Stephen Williams: I heard about it on "Start the Week" this morning, and someone was pouring cold water on it, saying that, extrapolated backwards, it implies that the Victorians were stupid, when clearly they were not. If grades have been inflated, and if it is accepted that roughly 90% of those who pass A-levels now go to university rather than straight into work, as was the case when I took them, are A-levels fit for purpose?

Professor Tymms: You really need to ask what the purpose is. If the purpose is straight selection to university, there is a problem at the top end with that differentiation. We need more differentiation, and if we do not get that right, other systems will come in-people will produce their own American SATs for selection to university, or a new law test. That will undermine the purpose of A-levels, which have been a very good motivator in our colleges and sixth forms. There are some great teachers working in that area, and it would undermine that.

There is another question about whether A-levels are fit for purpose. Do they prepare students well for their next stage of study? Again, it is quite complicated. AQA's research committee has been investigating whether that is the case. It has gone to the law departments and psychology departments to find out whether they believe that law and psychology A-levels and so on are useful. There is another issue out there. There are never straightforward answers, but we need to ask the questions. Are the students going on to university actually able to do those kinds of thing? People are always complaining about maths and reading, so we see four-year courses instead of three-year courses because students apparently have not done enough maths. If you are just asking straight whether they are fit for purpose, I do not think that they are fit for purpose at the top end for selection, but for the rest they do pretty well.

I should add one other thing about A-level standards. It has to do with the setting of standards over time. I talked earlier about setting standards for key stage assessments over time. The way that it is done for key stage 2, for example, is multifarious. There are lots of ways to maintain the standards over time, but one way is to take the students who do the key stage assessment this year and give a proportion of them next year's test secretly to see how they do-pre-testing next year's test with this year's students and seeing what levels they are given. It is not a perfect system, but it is an interesting way to do it. A-levels and GCSEs get no pre-testing. All the standard-setting is done afterwards on the basis of statistical relationships. No items used last year are used this year. In something like the Programme for International Student Assessment, they do the tests, release half the items and keep some so that they can be used next year to standardise next year's test. It is the same with the Progress in International Reading Literacy Study. A-levels and GCSEs do not have any pre-testing, which may be an issue that needs to be faced up to. Most of the systems in the world have pre-testing.

Chairman: I am aware that we have two sections to complete this evening, and some of us want to hear Ed Balls in another place later. Sir Michael.

Sir Michael Barber: I will be brief. In an era when we are moving towards everybody staying compulsorily in full-time or part-time education until 18, which I believe to be absolutely right, A-levels are clearly not the whole answer to the challenge. To pick up on the point about fitness for purpose, we need to get apprenticeships working well. I spent Friday afternoon with some apprentices at the Rolls-Royce plant in Derby-a fascinating conversation. We need to get the new diplomas to work well. We should make the international baccalaureate available. I am in favour of developing a range of possible qualifications for young people, so that we have qualifications fit for the whole cohort, all of them have something to aim for, and all of them go into the labour market with qualifications that have real value.

Q41 Chairman: If we want young people to stay on until 18, the natural school leaving age for learning and skills progression, what is the point of having a major exam at 16? Is it not becoming redundant?

Sir Michael Barber: When the full 14-to-19 programme is working well, the debate will change. I do not think that we are there yet, but I agree that that might well be part of the debate, absolutely.

Q42 Mrs. Hodgson: I would like to move on to models of assessment, but I have a bit of a cold, so you must excuse my deep voice.

I understand that, at the moment, the Government are doing about 500 pilots in schools on "Making Good Progress". I understand that currently the main purposes of assessment are listed as points one to four. I just wanted to say something about point four: assessment for learning, improving both learning and teaching. I know that this Committee has heard my views on the personalised teaching agenda, and I know that "Making Good Progress" emphasises more informal teacher assessment and personalisation in teaching.

Regarding personalisation of teaching, should it not be specialisation in teaching? I say that because it touches on one of the things that I am concerned about, as the Chairman is well aware. Earlier, Sir Michael, you said, "The sooner you know the problem, the easier it is to fix it." So you probably can guess where I am going. I wonder why, when you were advising the Department for Education and Employment on the literacy hour and numeracy hour, you did not suggest that, when children are identified with, say, dyslexia, there should be specialist dyslexia teachers in every school to work with those children? So, getting back to the models of assessment and bearing my particular interest in mind, do you think that the current key stage tests remain the appropriate model of assessment and, if they are not, what alternatives would you suggest?

Sir Michael Barber: First of all, by the way, when I worked in the Department for Education and Employment on the literacy and numeracy hours and all of that, I had detailed conversations with the Dyslexia Institute and the British Dyslexia Association. Ken Follett, who is very actively involved in that world, was somebody whom I talked to often, and incidentally I still do talk to him. I think that what you say is right: once you get really good teaching in literacy consistently across the cohort, most children will make progress, and then the ones that have a problem, whether it is dyslexia or something else, will be easier to identify. I think that the problem, if you go back before the literacy and numeracy strategies, was that children who had a problem got muddled up in the cohort, because nobody had invested in teachers' skills to teach reading, writing and mathematics in the way that they are now generally able to do. So I completely agree with your point.

Whether you use the word "personalisation" or "specialisation", I believe very strongly that, as soon as a child is identified as having a problem such as dyslexia, there needs to be specialist people available to advise and help. Importantly, they need to advise the child on how to catch up with the cohort and not sink further behind the cohort. That is really important.

I think that the progression pilots that you referred to, which the Government are running now, will effectively involve testing when ready; when the teacher thinks that a child is ready to go to the next level, they will use a single level test. That system has a lot of potential and we talked about it earlier in the Committee. I have been an advocate of just-in-time testing since the mid-1990s, when I published a book called "The Learning Game", but they have to get the detail right. That is why I think that it is important that this type of testing is being piloted.

Professor Tymms: I have talked about the present system, so I will not add to what I have said about that. Let me just pick up on the teacher judgment and the single level test, because I read about that in The Times today and had read some previous material in the tender documents finalising the test design. I just wonder if I have got it right. Apparently, under this system the teachers will make judgments, then the pupils will do the tests, and that information will be used to feed into the information going into league tables and so on. However, we have now cut back the role of the test, which provided the security, and we are relying on the teacher judgment-yet the teachers will be judged by their own judgments. Surely that cannot be the way that the system will operate. That is one thing that puzzles me here.

The second thing is that, if we are going to have a single test to do that, we know that, at the moment, the tests at, say, key stage 2, which I regard as good, reliable, valid tests, have pretty big margins of error when it comes to assessing the particular level of a child. Therefore, by focusing on a single level, the new tests will be even less accurate than that. That is worrying in terms of the quality of the data, so I would be keen to see the results of the trials that are being done and whether the system is viable and produces good, reliable data on those students. I also noted that it suggests two tests a year for a pupil, rather than one, which seems a strange route to take.

Thinking more broadly about the personalised and specialised learning, I have some sympathy with what you are saying about the specialised learning, but I also have sympathy for the personalised learning. With regard to the assessment that we use currently for children just starting school, there are some children whose vocabulary levels are extremely low, most are pretty good for children of that age and some children at the top are quite exceptional-some of them start school with higher language levels than some of the 11-year-olds leaving primary school. The teacher of such a class has to deal with that group year on year with that phenomenal range in mathematics, language and reading, and that is mixed-ability teaching, which means that you have to do something different with different children in the same class.

There are other models: I mentioned the computerised diagnostic assessment earlier. In fact, in Northern Ireland, from this term, all 900 primary schools will do, not science tests, but computerised diagnostic assessments that will give information to the teacher on the strengths and weaknesses of individual children, so that they can improve with that feedback. Therefore, there is a different model operating there, and we could look at how those things are operating differently.

Q43 Mrs. Hodgson: With regard to what alternative you would suggest, what jumped out at me was that "Making Good Progress" has been called a one-way ratchet, because the teacher will decide when the child is ready for that test. A child might consistently get bad results, but if they are retested on a good day the ratchet will go up. There is never a chance, however, for the child to be levelled down, so it could just be that they had a good test on a good day. The system therefore needs to produce high levels of certainty so that misclassification is minimised, but retesting of doubtful cases does not happen. I have not got the full details of "Making Good Progress", so I do not know whether there are any alternatives available to the new single level tests.

Professor Tymms: Yes, within our centre we run the performance indicators in primary schools project for schools. Many schools do the test with the children every year, and we look at year-on-year progress; schools get graphs of that progress. Computerised diagnostic assessments would do the same-there are plenty of systems out there. This is just one system, and I really think that we need to look at the trials and the statistics to see how well the new tests perform. We need to monitor the progress of children and spot when they are falling by the wayside.

Sir Michael Barber: Clearly, there are alternative systems. The technical details of the progression pilots need to be worked through to ensure that the problems that you and Peter have drawn attention to do not occur. I think that there is a lot of promise in them, but the detail will be crucial, as I have said consistently. I know that Committees are criticised for travelling, so maybe you could do this by reading papers or by video conference, but if I were you, I would look at what is being done in New York City, Hong Kong, where the secondary curriculum is being completely reorganised, and Ontario, where the literacy and numeracy programme, which was originally modelled on ours, is being built on and taken forward. These examples all have implications.

Q44 Mrs. Hodgson: You mentioned personalised learning. I went on a delegation to Sweden that looked at the free school model that is used there, and I was very interested in how they really do focus on personalised learning, as they stream the children according to ability, not age. You might have one nine-year-old who was in with 11-year-olds for numeracy, but in with seven-year-olds for literacy. The children are mixed up according to their ability, which is very interesting.

Professor Tymms: In Bob Slavin's "Success for All" programme, he points to the good research evidence for bringing together children with the same reading age at some time in the week. So that is an interesting way forward.

Sir Michael Barber: I agree with that.

Chairman: Dawn has waited extremely patiently to ask about the unintended consequences of testing.

Q45 Ms Butler: Sir Michael, you mentioned the need for us basically to be future-proof, and I completely agree: we have to make sure that we teach young people for the future, and the Government are right still to focus on maths, English and science as the core subjects. My first question is about testing. Professor Tymms, you said that it was not the testing, but the pre-testing that was the problem for the younger kids. You then said that there was no pre-testing for GCSEs and A-levels. What are the effects of that amount of testing on children, teachers and schools?

Professor Tymms: I am using "pre-testing" with two different meanings, so I must clarify that. What I meant in relation to setting standards was that the exam-awarding bodies did not pre-test the GCSE tests before they gave them out for real. What I meant in relation to primary schools was that the schools themselves take past papers and get their kids to redo them. Of course, that happens at GCSE as well-pupils will have mocks and practise this and that. The teachers do lots of previous work, but the pre-test is done at key stage assessments by QCA or whoever is employed to do it; it does not happen at A-level and the rest in the standard setting. That just clarifies the point.

Q46 Ms Butler: Wonderful. So what do you think the effects of that amount of testing are on children, teachers and schools?

Professor Tymms: They are multifarious. When you set up a system, you never quite know what is going to happen, and there are lots of unexpected consequences. We have to worry about the focus and the narrowing of the curriculum. Of course, we want to get reading, writing and maths right, but we also want drama and physical activity-we want to keep the children physically active-and there is evidence that that has decreased. In fact, in 2002, with Andy Wiggins, I did a survey comparing Scottish schools and English schools and found evidence of the narrowing of the curriculum, a blame culture in the classroom and so on. We need to watch such things to see what is happening-we need to track and monitor the monitoring. There are unintended consequences, including a focus on borderline children, which is an unhealthy thing. There is a focus on the ones who are likely to get the A*s to C or the children who are not going to get level 4. Little clubs are therefore set up to work on the borderline children, rather than the child with special needs. Lots of peculiar things go on as a result.

Sir Michael Barber: When I worked in the delivery unit, we looked at a lot of targets and data sets, and people predicted perverse or unintended consequences. We used to say, "Obviously, you should just predict the ones you think will happen and then we'll check." If you focused on street crime, for example, the police would predict that other crimes would get worse. In fact, that is not what happened, but it is always worth checking those things. On the level boundaries, we found that although the target was about level 4, the percentage achieving level 5 rose very rapidly, even though that was not the borderline at stake. Good teaching is good teaching, just as good policing is good policing.

I would like to say two other things. Literacy and numeracy underpin the whole curriculum, and unless you get them right in primary school, young people will be held back in all kinds of ways, including in drama and all the other things that really matter. The second thing that I want to say is that, on the whole, the schools that do best academically do best in a wider set of outcomes, because they are well-run institutions teaching well and doing everything properly. That is not a perfect fit, but it is generally the case. It is absolutely right to focus on literacy and numeracy, but of course you also want the wider curriculum for young people.

Q47 Ms Butler: That leads me to my next question. Would the performance and so on of schools be improved if we used a separate mechanism, such as reforming Ofsted inspections? You talked about Ofsted looking at all the different variations such as the leadership of schools and so on. Would improving Ofsted inspections improve schools and their overall performance?

Sir Michael Barber: Peter may want to come in, because he has had strong views for many years on Ofsted, but I think that Ofsted should constantly keep its inspection process under review. Since Ofsted was set up in its current form, it has been a positive influence on the schools system over the past 10 to 15 years, but it can always get better. As implied in your question, it should be the institution that looks at those wider things, including the ethos of the school, which matters so much, and its comments on them should get you in beneath, below and around the data from the tests.

My view is that all processes, including leadership training, professional development for teachers and Ofsted, should focus in the next decade on achieving a consistent quality of classroom teaching. I quoted Andreas Schleicher, who said that, in England, we are doing more of the right things than any other system in the world, but we have not yet had the impact on consistent classroom quality, so I should like to see Ofsted, professional development and leadership development all focusing on that, because it is the central challenge for our schools system.

Professor Tymms: Just before Ofsted changed to its present system, a paper was published by Newcastle University-by Shaw, Doug Newton and others-in which the authors compared the GCSE results of a school shortly after an Ofsted inspection with what it normally achieved. They showed that immediately after the inspection, its results were worse, which is interesting, considering the amount of money that was spent just to frighten the teachers. After that, Doug Newton was called in by Gordon Brown for an interview, and shortly afterwards the money for Ofsted was reduced and we went to the cheaper form of inspection.

We need a thorough examination of Ofsted's impact on schools. What is it actually doing? That is exactly your question, but rather than give an opinion, we should deliberately examine it to see what the impact is by looking at schools before and after they have inspections, and tracking them statistically across the country, because it is not clear that inspections are improving schools, although they might be. Neither is it clear that they are damaging schools, but they might be. We need to see that kind of evidence. It is a lot of money and there is a particular theory behind it.

Another point that links into that is the view of what matters in the educational system. Michael has been saying that teachers matter, and I agree absolutely. He has also emphasised the importance of heads, but it is not so clear to me that heads are key with regard to reading and maths. In fact, what we have in schools are loosely coupled organisations: the head must influence this or that, and there is the teacher in the classroom. When I undertook a recent examination of 600 secondary schools and 600 primary schools, and looked at their value-added scores and how they changed when the head changed, I could find no evidence for such change at all. Actually, the teacher is the key. The head is vital for other things, such as the morale of staff, the building of new buildings and the design of the curriculum-appointing good staff is one vital thing that the head does-but we need to think about structure. We need to monitor things continuously and always ask what is the impact of what we are paying our money for. Ofsted is one of those things.

Sir Michael Barber: We can get caught up in metaphors, but the way I see it is that the head teacher's role is like the conductor of an orchestra. They do not play a single instrument, but if they do their bit, everybody else plays better. That is probably what we are trying to do with head teachers, particularly in our devolved system in which heads are given a lot of discretion.

Q48 Chairman: You have both been in this game for quite some time. A week is a long time in politics, and 10 years is an awfully long time in politics. If you could go back to when you started, what would you do differently, not only to drive up standards-one of you said that the standards are in the heart, rather than just the head-but to increase the ability of children to excel within themselves?

Sir Michael Barber: In the book I mentioned earlier, "Instruction to Deliver", which was published in the summer, I own up to a whole range of mistakes. One reason for my looking slightly quizzical when you asked that question is that I was thinking, "How long have you got?" I could spend the next hour or so talking about this, but I know that you have other things to do.

Chairman: We have the book to refer to.

Sir Michael Barber: First, something in which I was personally involved that I would see as a mistake took place in 2000. After the big jumps in numeracy and literacy that we have been debating, there was a general tendency, of which I was a part, to consider that primary school improvement had happened and that it was then all about secondary schools. That took the focus off, but we were really only at the beginning of seeing that improvement through. Secondly-this is a detail, but it is important, looking back-in the 2000 spending review, we set a new target for primary school literacy, aiming to raise it from 80 to 85%. I think that that was a mistake because we had not reached the 80% target. It was demoralising. I, personally, regret not negotiating more vigorously at the time.

If you look in my book you will find a whole list of things that I got wrong. Overall, I am very proud of the contribution that I have been able to make to improving the education system over the last decade. While we could have been bolder and we could have achieved more, I am absolutely confident-I think the data confirm this-that we have the best-educated generation in history. There is much more to do to prepare for the 21st century, but it has been a great experience.

Q49 Chairman: Something quite interesting that you said earlier was that it is not we who are making these demands-it is the world. It is the competitive global economy and so on. Many countries seem to be responding to that task, not by using testing and assessment and the path that you or the Government have chosen, but by choosing very different ways. People tell the Committee that the curriculum is too narrow, that people teach to the test and that children no longer get the chance to explore a whole range of activities and subjects as they used to do. What do you say to people who say that?

Sir Michael Barber: Two things. One is that I am certainly not arguing-and that may now be my fate in history-that testing and assessment are the single lever to drive improving standards. They are part of a whole system. The crucial elements are combining the challenge that comes from the testing and accountability system with serious support, investment in teachers' skills and, as Peter said, giving teachers the capacity to do the job. It is the combination that I believe in. Systems that have pressure without support generally do not succeed, and systems that have support without pressure do not succeed either. It is getting the combination right that is the key, particularly when you want to change things. Some systems-Finland is an example-recruit good people into teaching from high up the graduate distribution, and they train them well. Their standards are already established at the top of the world league tables, and that has got into teachers' heads, so they need less testing. If you are going to try to change things, the combination of challenge and support is most likely to get you there.

Q50 Chairman: Peter, what should they have done that they did not do?

Professor Tymms: First, they should have taken notice of the research evidence of what works. I do not mean surveys, or what is merely associated with what works, but what changes were made and where we saw the difference. In particular, I would go for randomised controlled trials. In reading, for example, there is a wealth of knowledge. We know a great deal about reading and how to help children with it. That knowledge was more or less ignored when we were making changes, so evidence is important, and in the light of that I would go to the experts.

When the School Curriculum and Assessment Authority and its precursor, the School Examinations and Assessment Council, were set up, that was done without any test experts at all. It is only now, after the QCA has been put in place, that people are available who really know about tests and the way forward. Now, the standards are being set properly. When it was done earlier, they would buy some people in and reckon that it could be sorted out. We need experts. When Estelle Morris spoke to the British Educational Research Association meeting a little while ago, she said that while she was Secretary of State she took almost no notice of the research that was around. I find that extremely worrying.

We need to take notice of the research, rather than surveys and statements such as "This person is doing better," or "My father said this and therefore it is good for me." We should look at what has been done in randomised controlled trials that have been shown to work. Before we put in new systems we need to trial them and check that they work. When the national literacy strategy was going to be rolled out, a trial was running, which was stopped before the strategy was ready. Everybody had to do something that had not been trialled. Later, an evaluation was made post hoc, when everybody was doing the same thing and it was too late. We need to compare this and compare that. That is really important. There is real knowledge out there. We can evaluate things, and when we put in new systems, we need to track them over time. We need, too, to get good experts.

Above all, we need good teachers. I absolutely agree: we need good teachers and we need to trust them. Perhaps we need to free up the curriculum, and perhaps teachers should experiment with it. To find new ways of working, we have to go outside England. Why can we not allow people to look at new ways of working, assessment and so on? They are pretty good people, those teachers. We absolutely rely on them and we should rely on them more.

Q51 Chairman: When the previous Committee looked at the issue of teaching children to read, we came up with two major recommendations. We tried to follow evidence-based policy, and the evidence suggests that if you take any systematic way of teaching children to read, it works. We also said that it was to do with the quality of the teachers. We found that there is very little evidence that anyone ever trained our teachers to teach children to read on any basis at all. The Government then rushed off-influenced by a former member of this Committee, I believe-to set up a Committee that recommended synthetic phonics, which had been trialled only in Clackmannanshire. We were a little disappointed that our recommendations were not fully taken on board.

Sir Michael Barber: Chairman, I cannot help noticing the imbalance in your questions. You asked me what mistakes I have made and then asked Peter what mistakes I have made as well. I wish that you had asked him what mistakes he has made, but since you did not-

Q52 Chairman: What mistakes has he made?

Sir Michael Barber: You should ask him. However, since I have managed to get the floor, I think that basing policy on evidence is very important. I talk a lot about evidence-informed policy, and I believe that the programmes that we have been talking about are among the most evidence-informed policies ever; we have seldom had better evidence on which to base them.

Another issue that arises when you are involved in government is how long you have got. Looking at the data that we had on primary reading standards prior to 1996 and looking at the challenges of the 21st century-Peter and I are broadly in agreement about this-something had to be done. We took the evidence that was available. There is a great report by Professor Roger Beard-he is now at the Institute of Education-which summarises the evidence base for the literacy strategy. We worked very hard to take all the evidence into account. I have been honest about mistakes that I made, but overall it was one of the most evidence-informed policies ever. Its replication around the world, with variations, demonstrates that the same results can be achieved elsewhere.

Q53 Mrs. Hodgson: On the point about good teachers, I have recently returned from Singapore where, as in your example of Finland, teachers are recruited from the top 10% of the cohort of university graduates. The Government offer whatever incentives they have to. They also headhunt teachers-they spot them. The education officers monitor graduates. They go up to them and say, "Have you thought about becoming a teacher?"

The teaching profession is held in much higher regard, and is revered as it was here 50 or 60 years ago. The pay reflects that. Teachers are paid a lot better. There is an incentive, because if students are bright and go into teaching, they might be sent to the UK, where their teaching is funded. They then go back and teach in Singapore. It is interesting that we are not at that stage.

Sir Michael Barber: That is one of the examples that we use in our recently published report, "How the World's Best-Performing School Systems Come Out on Top". We looked at systems on several continents, including the one in Singapore. What you say is absolutely right, with the exception that they do not pay teachers more than here. However, they pay them reasonably well.

If you talk to the Singaporean Education Minister, as perhaps you did, you find that they are constantly looking for ways to motivate young people to go into teaching in the future. We have done reasonably well on that over the last few years, but we have a long way to go and can never be complacent about ensuring that we secure really good entrants into the teaching profession, both straight out of university and among mature people who have gone into other lines of work and then switch to teaching.

Q54 Chairman: Thank you, Sir Michael and Professor Tymms. It has been a really good sitting-a marathon sitting. I am sorry that we have kept you so long, but it has been so absorbing and interesting: we have enjoyed it immensely. I am sorry that we were not an all-party Committee today. It is a great pity that you did not have a slightly broader range of questions, but you did have a fair range. It was two-party, but not all-party. Will you remain in contact with us? If we want to come back and ask you some other questions about the evidence that you have given, will you be accessible?

Sir Michael Barber: Absolutely.

Professor Tymms: Sure.

Chairman: I am glad that we are not paying the full consultancy fee for today. Thank you very much for coming.