Select Committee on Children, Schools and Families Minutes of Evidence

Examination of Witnesses (Questions 1 - 19)



  Q1  Chairman: May I welcome Professor Sir Michael Barber and Professor Peter Tymms to the first evidence session of the new Committee? We have been busily building the team, seminaring and deciding our priorities for investigation, but this is our first proper session, so thank you very much for being able to appear before us at reasonably short notice. Both of you will know that our predecessor Committee started an inquiry into testing and assessment. It was a quite different Committee, but with its interest in schools, it decided to embark on a serious investigation into testing and assessment. It managed to tie up with a nice little bow almost every other area through 11 different reports in the previous Parliament, but it could not conclude this one. It troubled people to the extent that copious volumes of written evidence had come to the Committee, and it would seem wrong if we did not make such an important issue our first topic, pick up that written evidence, slightly modify and expand the terms of reference and get on with it. So, thank you very much for being here. You are key people in this inquiry: first, Michael, because of your association with testing and assessment, through which many of us have known you for a long time, right back to your National Union of Teachers days; and secondly, Professor Tymms, through your career in a number of institutions, where we have known you, and known and admired your work. We generally give witnesses a couple of minutes to make some introductory remarks. You know what you have been invited to talk about. If you would like to have a couple of minutes—not too long, although a couple of minutes is probably a bit short—to get us started, then I shall start the questioning. Peter, you were here first, so we shall take you first.

  Professor Tymms: I am director of a centre at the University of Durham which monitors the progress of children in order to give schools—not anybody else—good information. It provides us with a tremendous database from which to view other issues, meaning that I have taken an interest in all the different assessments—key stage and so on. They have concluded that standards in reading have stayed constant for a long time, but that in mathematics, they have risen since about 1995. Those are the headlines on testing. On the introduction of new policies, I am keen to say—I might return to this—that there is a need for good trials. If we try something new, we should get it working before we move it out to the rest of the public. I am very keen for new ways of operating to be properly evaluated before they are rolled out, and then to be tracked effectively. We have been missing that.

  Chairman: Thank you.

  Sir Michael Barber: Thank you very much for your invitation, Chairman. I shall comment on the story of standards in primary schools, which I see in four phases. The first came between 1988 and 1996, when the then Conservative Government put in place the national curriculum, national assessment, Ofsted inspections, league tables and the devolution of resources to schools. There were lots of ups and downs in that story, but nevertheless that framework was established. Secondly, there was the phase with which I was associated—Government policy under David Blunkett who was the then Secretary of State for Education and Employment—during which there was a focus on what we called standards, rather than on structures. A big investment in teachers' skills, through the national literacy and numeracy strategies, led to rises in the national test results. I have always accepted that some of that was down to teaching to the tests, but a lot of it was down to real improvements evidenced by Ofsted data and international comparisons. In the third phase, between 2000 and 2005, the Government were focused largely on long-term, underpinning and structural reforms, including of the teaching profession, of secondary education and the introduction of the children's agenda, at which stage results plateaued. Things got harder, too, because we had picked the low-hanging fruit, as it were. I think that we should have stayed much more focused on literacy and numeracy, in addition to the other things that we did. That was my error. Now there is an opportunity to make real progress on literacy and numeracy as a result of the Rose review last year and the new emphasis on phonics. By the way, I completely agree with Peter on the pilots and progression. If all those things are put together, I could envisage a fourth stage, during which we can begin to make progress.
In summary, we have gone from being below average, on international comparisons, to above average—we are above France, Scotland and the EU average. However, we have a long way to go and significant improvements to make. If we want to be world class, we must do more.

  Q2  Chairman: Thank you for those introductory remarks. I remember taking the Committee to New Zealand where people wanted to be able to assess more carefully the progress of students and were looking at what we had done. I recall their horror when it was suggested that they might adopt our system. They said, "We want to know how our young people are doing, but we do not want to go to the extent that you are of testing at so many ages." Are you sympathetic to that point of view? Do you think that we over-test?

  Sir Michael Barber: Personally, I do not think that we over-test in primary schools—if that is what you are talking about. Primary school children take literacy and numeracy tests aged seven and externally-set and marked literacy, numeracy and science tests aged 11. That is a relatively small number of tests during a six-year primary school career. The information provided by the tests is fundamental to understanding how the system is working and to looking for strategies for future improvements. I do not think that we over-test at all.

  Q3  Chairman: Even if that adds up to ages seven, 11, 14, 16, 17 and 18?

  Sir Michael Barber: I focused my answer on primary schools. There is a separate debate to be had about secondary examinations and tests at ages 14, 16, 17 and 18. However, at primary level, we conduct the bare minimum of testing if we want to give parents, the system, schools and teachers the information that they need, at different levels, in order to drive through future improvements. One of the benefits of 10 years, or so, of national assessments is that this system has better information with which to make decisions than many others around the world.

  Professor Tymms: I do not think that testing at seven and 11 is too much testing. However, if you have a system in which you take those tests, put them into league tables and send Ofsted inspectors in to hold people accountable, schools will test a lot more. So we probably do have too much testing in the top end of primary schools, but that is not statutory testing. It is the preparation for the statutory testing, so it is a consequence of what is happening. Of course, we do need the kind of information that those tests were designed to get at. You mentioned the need to know what our children are doing and their levels. If we wanted to know the reading standards of 11-year-olds in this country, we could probably find out by assessing 2,000 pupils picked at random. We do not have to assess 600,000 pupils. One purpose is to know what the levels are, which could be done with a sampling procedure, with the same tests every year, which would be secret and run by professionals going out and getting the data. There is another kind of information, for teachers about their pupils, which they could get by their own internal tests or other tests if they wanted, and another kind of information for parents. There is an interface: how do they get that information? Do they go to the schools, or do they read it in their newspapers? Do they know about their own pupils? Those layers of information, and how to get them, provide the complex background to the answer to your question. There is too much testing, but not because of a single test at 11—for goodness' sake, children can do that. I think that I was tested every two weeks when I was about eight years old, and I quite enjoyed them. Not all children do, but the possibility of that exists. We need good information in the system for parents, teachers and Parliament, and we need to know it nationally, but we do not necessarily have to do the sort of testing that we currently have to get that information. 
There are different purposes and reasons for doing it. I guess that I can expand on that as you need.
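
Editorial illustration (not part of the evidence): the sampling point above can be made concrete with the standard margin-of-error formula for a proportion. This is a minimal sketch under stated assumptions: it assumes a simple random sample of pupils (ignoring any clustering by school) and a purely hypothetical figure of 80% of pupils reaching the expected level. Neither assumption comes from the witness.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of n pupils."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical share of pupils at the expected level (an assumption).
ASSUMED_SHARE = 0.8

# Compare a 2,000-pupil sample with testing roughly the whole cohort.
for n in (2_000, 600_000):
    moe = margin_of_error(ASSUMED_SHARE, n)
    print(f"n={n:>7}: +/- {100 * moe:.2f} percentage points")
```

Under these assumptions the 2,000-pupil sample pins the national figure down to within roughly two percentage points, which is the sense in which a sample can serve the monitoring purpose without testing every pupil.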

  Q4  Chairman: But Michael is known to believe—I am not setting you against each other—in the notion that testing would drive up standards. It was the "engine", was it not? I am not misquoting you, am I?

  Sir Michael Barber: It is not a misquote, but it is not a complete view of what I believe. I believe that, in order to drive up standards, we need a combination of challenge and support. Assessment and Ofsted inspection provide the challenge in the system, and then we need serious investment in teachers and their skills, pay and conditions. I am in favour of assessment, being able to benchmark schools and the information that that provides to heads, teachers and parents. I agree with Peter that there may in addition be an advantage to sampling techniques, probably linked with the international benchmarks to assess the performance of the whole system.

  Q5  Chairman: I have slightly misquoted you: testing was "the engine to drive performance", I think you said.

  Sir Michael Barber: But I am saying that the accountability system on its own is not enough. You need investment in teachers' skills, which is what the national literacy and numeracy strategies did. They gave teachers the skills and wherewithal to understand how to teach reading, writing and mathematics. The evidence of that is powerful. Only recently, the effective pre-school and primary education research programme, which Pam Sammons and others run, has shown clearly the benefits in student outcomes if teachers teach the last part of the literacy hour well—the plenary. Detailed pedagogical skills need to be developed by teachers, which needs an investment. Obviously, you also need to pay teachers well, ensure that the system is recruiting enough teachers and devolve money to the schools. I am strongly in favour of the challenge that comes from an accountability system, along with the wherewithal for heads and teachers to get the job done in schools—not one or the other, but both.

  Q6  Chairman: Any comment on that, Peter?

  Professor Tymms: There is an assumption here that standards have risen and that the national literacy strategy made a difference. In fact, over those years, reading hardly shifted at all. I perhaps need to back that up, because there are a lot of different sets of data. Somebody can claim one thing, somebody can claim another and so on. Is this an appropriate moment to go into that?

  Chairman: Yes, indeed.

  Professor Tymms: Okay. From 1995 to 2000, we saw a massive rise in the statutory test data at the end of primary school. They were below 50% and got up towards 80%. From about 2000 onwards, they were pretty flat. That looks like a massive rise in standards, and then it was too difficult because we had got to the top end, all our efforts had gone and so on. In fact, in 1998 or thereabouts, I was looking at our test data—we use the same test every year with the same groups of pupils—and did not see any shift in reading standards. The key stage assessments use a new test every year, and one must decide what mark corresponds to Level 4. That is harder. Test scores rose year on year as a percentage of Level 4 with a new test, but did not rise with a static test, and that raised a question. At the same time, Hawker was working at the Qualifications and Curriculum Authority, and said in The Times Educational Supplement that if results continued to rise, we would need an independent investigation. Around that time, QCA decided internally that it would investigate further. It commissioned Cambridge Assessment under Massey to take the tests from 1996 and 1999, and to go to a place that had not been practising the tests—Northern Ireland. It took equivalent samples of pupils and gave the 1996 and 1999 tests to them. If those tests were measuring a Level 4 of the same standard, the same proportion should have got Level 4, but they did not. Far more got Level 4 with the later test, so the standards were not equivalent, and that was fully supported in the Massey study. Massey did a follow-up study in which he compared the 2000 and 1996 tests, and found rises in maths, which were not as big as the tests suggested, but nevertheless were rises. He found that writing scores had increased, but called the rise in reading skills illusory. 
Additionally, several local education authorities collected independent data on reading, using the same test across the whole LA year after year, and there was practically no shift in reading scores, but there was a rise in maths scores. I was able to look at 11 separate studies, which all told the same story: over that period there was probably a slight to nothing rise—about one 10th of a standard deviation—which might have been achieved if children had practised tests, but there was no underlying rise. In maths, there was an underlying rise. There are two things going on. One is that children get better at tests if they practise them. Prior to national testing, they were doing practically no tests—it was necessary to go back to the time of the 11-plus for that. We saw a rise because of practising tests, and we saw an additional rise because standards were not being set correctly by the School Curriculum and Assessment Authority and then QCA between 1995 and 2000. Then there was teaching to the test. After 2000, QCA got its act together and set standards correctly. It now has a proper system in place, and standards are flat. There are small rises, and we must treat them with interest, but with a pinch of salt. Let us suppose that it is decided in committee that Level 4 is anything above 30 marks. If it were decided that it was one mark higher than that, the Level 4 percentage might go up by 2% or 3%, and that would make national headlines, but that would be due to errors of measurement. The discussion in the Committee is about three or four points around that point. The accuracy in one year, although there may be 600,000 pupils, is dependent on the cut mark, which is clear and was set incorrectly between 1995 and 2000. 
The assumption that standards were going up because we were introducing accountability, because we had testing, because we had Ofsted, and because we had the 500 initiatives that the Labour party put in place without evaluation shortly after coming to office, was based on a misjudgment about standards. Maths, yes; reading, no; writing, yes.
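
Editorial illustration (not part of the evidence): the sensitivity of the Level 4 percentage to the cut mark can be sketched with a simple model. The mark distribution here is an assumption made only for illustration: marks are treated as roughly normal with a hypothetical mean of 40 and standard deviation of 12. Only the cut mark of 30 comes from the witness's example.

```python
from statistics import NormalDist

# Hypothetical distribution of test marks (assumptions, not source data).
ASSUMED_MEAN = 40.0
ASSUMED_SD = 12.0
marks = NormalDist(mu=ASSUMED_MEAN, sigma=ASSUMED_SD)


def share_at_or_above(cut):
    """Share of pupils whose mark is at or above the cut mark,
    using a continuous normal approximation."""
    return 1.0 - marks.cdf(cut)


# Effect of moving the Level 4 cut mark by a single mark.
shift = share_at_or_above(30) - share_at_or_above(31)
print(f"Cut at 30: {100 * share_at_or_above(30):.1f}% at Level 4")
print(f"Cut at 31: {100 * share_at_or_above(31):.1f}% at Level 4")
print(f"One-mark shift moves the share by {100 * shift:.1f} points")
```

With these assumed figures, a one-mark change in the cut moves the headline share by between two and three percentage points, however many pupils sit the test, because the cut mark affects every pupil near the boundary at once.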

  Sir Michael Barber: This is, as evidenced by Peter's comments, a complicated area, and I accept that completely. First, the national literacy and numeracy strategies are effectively a major investment in teachers' skills and their capacity to teach in classrooms. That is a long-term investment; it is not just about this year's, next year's or last year's test results. It is a long-term investment in the teaching profession's capacity, and it is well worth making because for decades before that primary school teachers were criticised for not teaching reading, writing and maths properly, but no one had invested in their skills and understanding of best practices. Secondly, there is a debate about extent, but we seem to be in agreement on maths and writing. When I was in the delivery unit after I left the Department for Education and Employment, I learned that it is dangerous to rely on one set of data. When looking at reading standards, it is right to look at several sets of data. One is the national curriculum test results, which tell an important story. Of course, there is an element of teaching to the test, but an element of teaching to a good test is not necessarily a bad thing, although overdoing it is. I always accepted that in debate with head teachers and teachers during that time. The second thing is that Ofsted records a very significant improvement in teachers' skills over that period of time. If teachers improve their skills in teaching reading, writing and mathematics, you would expect the results to go up. The third data set that I would put in that linked argument is that international comparisons—most importantly, the progress in international reading literacy study, or PIRLS[1]—showed that England in 2001 did very well on international comparisons in reading. In 1999 came the first accusations that the test results were not real.
Jim Rose led a review involving representatives of all the parties represented on this Committee, which found no evidence whatever of any tampering with the tests. In addition, people in other countries have taken the kinds of things we did in that phase of the reform and replicated, adapted or built on them—Ontario being the best example—and they, too, have had improvements in reading, writing and maths. To summarise, although we might disagree about the extent of improvement, I think we agree that there has been significant improvement in maths and writing, which are very important. We are debating whether there has been improvement in reading. I think the combination of data sets that I have just set out suggests that there has been significant improvement in reading. I would be the first to say that it is not enough and that we have further to go in all three areas; nevertheless, we have made real progress. My final point is that over that period, there has, as far as I can make out, been no significant change in reading and writing in Scotland, where there was no literacy strategy. The results in international comparisons indicate that Scotland ticks along roughly at the same position.

  Q7  Chairman: There has been a sharp drop in recent PIRLS. Does that mean we are going backwards?

  Sir Michael Barber: Actually, I think it means that other countries have improved faster over that period. As I said in my opening statement, between 2001 and 2005, the Government were focused on some serious, long-term, underpinning reforms—most importantly, in my view, for the long run, solving the teacher recruitment shortage and bringing some very good new people into the teaching profession. That will have benefits for decades to come, but there was a loss of focus on literacy and numeracy at that point. Personally, I wish I had pressed harder on that at the time, but that is what you are seeing—the PIRLS data follows the same patterns as the national curriculum tests.

  Q8  Chairman: I want to shift on because colleagues will get restless, but Peter was shaking his head, so I shall have to ask you to comment, Peter.

  Professor Tymms: I must comment on several of those points. Take PIRLS, for starters, in 2001, and in 2006, when it apparently went back. Michael's comment was that we did not look good the second time because other countries went better than us. Certainly, some countries went better, but, in fact, PIRLS is standardised and uses Rasch models to get the same marks meaning the same thing, and our marks dropped back there. It was not just other people getting better; we actually got worse. But I want to persuade you that PIRLS in 2001 got it wrong and made us look better than we were and that the level has remained static. The reason for that is that for those international tests to work properly, the students who are tested must be a representative sample of the country. The PIRLS committee defines how to collect those pupils. We went out, in this country, to collect the pupils to do it and asked the schools to do the tests, but about half of the schools did not want to do it and refused to play ball. The second wave of schools were asked and only some of them complied, and then a third wave were asked. If you look at the 2001 PIRLS data, you will see two asterisks by England, because our sampling procedure was not right. If you are the head of a school and you are asked to do the tests, but your kids are not reading too well that year, you will say no, whereas if they are doing really well, you will say, "Oh yes, I'll go for it." So we had a bias in the data. We got people who really wanted to play ball, and it made us look better than we were. The next year, when schools were paid to do the tests—some held out and got quite a lot of money—we got a proper representative sample and found our proper place, which shows that our standards are just, sort of, in the middle for reading. The blip previously, which was crowed about a lot, was a mistake in the data.

  Q9  Chairman: So, it was quite an awkward mistake in some ways, if it was a mistake. It is interesting that under PIRLS—we will shift on, before I get a rebellion here—most of the big countries like us, such as Germany and France, are about the same. Okay, Finland and some smaller countries such as Taiwan and Korea will always be high up there, but countries with big populations—in Europe, places such as France and Germany that are, in a sense, like Great Britain—are at around the same position.

  Professor Tymms: I would point to a different pattern in the data which relates not to size but to the language that is chosen. Translating the results of reading tests in other languages is problematic to begin with. Can one say that reading levels are the same? You pay when you take your choice. But a long tail of underachievement in reading will also be found in all the other countries where English is spoken. You will find it in Australia and even in Singapore, which is largely a Chinese population but reading in English, and in Canada and America. That is because English is a difficult language to learn to read, whereas Finnish is much more regular in the way that it is written on to the page. If you are going to be born dyslexic, do not be born in a country where people speak English, because it will really be a problem. Be born in another country such as Germany or Italy. I make that general point.

  Sir Michael Barber: Peter has made an important point. I would like to add two other things. First, other European countries look at our reforms in education over the past 10 years and are impressed by them. I have had conversations with people from several of the countries that we have talked about, and on this set of PIRLS we were actually significantly above the EU average. We were above France and just behind Germany. The long tail of underachievement is a real issue. Personally, I think that the places to look for English-speaking populations that do really well on reading, writing and, indeed, generally are the Canadian provinces. Some of their practices are very impressive. That is one place I would urge you to look if you are thinking about the future.

  Chairman: Thank you for those opening responses.

  Q10  Fiona Mactaggart: You talk a lot about whether our assessment system accurately assesses standards over time, but that is only one purpose of assessment. I wonder whether our national assessment system is fit for purpose as a tool for assessment for learning. I am concerned about the fact that we have examinations at seven. I am not sure that they help teachers as much as they should. Could you give your views on whether Standard Assessment Tests—SATs—in primary and secondary education help teachers use assessment for learning?

  Professor Tymms: They were not designed to do that. A test taken at the end of primary school is clearly not meant to help children in primary schools because they are about to leave and go to secondary schools, which often ignore the information and do their own tests as soon as students come in because they do not believe what the primary schools say they have done. Unfortunately, that is the way of the world. It happens when children who have A-levels in mathematics go to university. They are immediately tested in mathematics. Even if you take pre-school, all the information passed from the pre-school to the reception teacher is often ignored, as the reception teacher does their own assessment. The tests are certainly not being used as assessment for learning, other than that the practice for the tests and other tests that might be used leading up to a test might be used in that way. They might be used as assessment for learning a little bit at age seven, but an infant school certainly would not use them in that way because it would be passing its kids on to the junior school. The tests are not intended to do that kind of thing, so they cannot be and are not used in that way. They are meant to hold schools to account and in order to produce information for parents. If we want assessment for learning, we must do something different. Many schools and teachers do that kind of thing off their own bat. There are other ways to assess. For example, there are diagnostic and confirmatory assessments. We could go into that kind of thing, but they are not assessments for learning.

  Sir Michael Barber: You made an aside about tests or exams at seven. It is important for the system and, indeed, teachers in schools, to know early on whether children are learning to read and write and do mathematics, because if intervention is needed to support a child in getting on track with their cohort, the sooner you know that they have a problem, the easier it is to fix it. One purpose of national curriculum tests is to provide accountability and to provide information for parents, as Peter rightly said, and it is absolutely right that that should be the case. However, in addition to that, over a period of time the tests have taught teachers what the levels are. The basis of assessment for learning is for the teacher and, obviously, the student or pupil to be able to understand what level they are working at and what they need to do next to get to the next level. If it had not been for the national curriculum and the national tests, I doubt very much whether the quality of those conversations would be as good as they are. The key to assessment for learning is investment in teachers' skills to do that, so that they are constantly focused—not just individually, but in teams with their colleagues—on improving the quality of their teaching, working out what they must do to get the next child up to the next level and therefore constantly improving their pedagogy, which is the essence of the whole issue.

  Q11  Fiona Mactaggart: The interesting thing is that your view, Peter, is that the real function of those tests is to hold schools to account, rather than as assessments for learning. I was speaking to a head teacher on Friday, who said to me, "Fiona, I just wish all primary schools were all through, because then we wouldn't have inflated test results for 7-year-olds coming out of infant schools." Her analysis was that in infant schools, for which Key Stage 1 SATs were summative results, there was a tendency towards grade inflation, which undermines your point, Michael. I agree that you need to know to intervene early, but if the accountability function militates against accuracy of assessment for learning, how do you square it?

  Sir Michael Barber: First, the Key Stage 1 results are not under the same accountability pressures as those for Key Stages 2 or 4. Secondly, I would not have moved away from externally set and marked tests for Key Stage 1, because if you consider the evidence in the work of Pam Sammons and others, objective tests marked externally to the school are more likely than teacher-assessed tests in the school to provide a drive for equity. If that had been done, I doubt that the issue you just raised would have occurred.

  Professor Tymms: The assessment for learning is really interesting. The evidence is that if we give back to pupils information on how to get better, but we do not give them grades, they are likely to get better. Putting in the grades, marks or levels and feeding back countermands—undermines—the feedback. That is very clear in the randomised trials and in the meta-analysis by Black and Wiliam in Inside the Black Box. The feedback to pupils on how to get better is vital, but it is undermined in other ways. The other point that Michael raised about identifying special needs early is also crucial. The key stage assessments will not identify special needs or identify them early; they are too late and not precise enough. If, for example, a child is likely to have trouble reading, they can exhibit it when they are four or five years old through a phonological problem, which can be assessed diagnostically at an early stage. A child later on who has, for example, a decoding or a word-recognition problem, or who can do both but does not understand or make sense of the text despite being able to bark the words, can also be diagnosed. Diagnostic assessments can be put in place, but they are different from the summative assessments at the key stages. There are horses for courses, and we must be careful about how we aim to use them.

  Q12  Fiona Mactaggart: So, if the assessments do not necessarily do what we want, how else could we assess the impact of national policies on schools? How can we test what the Government policies, national curriculum or improvements in teacher training do? How do we know?

  Professor Tymms: We need a series of different systems; we should not have a one-size-fits-all test. We need an independent body, charged with monitoring standards over time, which would use a sampling procedure in the same way as the NAEP does in the United States, as the APU used to in England and as other governments do in their countries. The procedure would become impervious to small changes in the curriculum, because it would have a bank of data against which it would check issues over time, so that we might track them and receive regular information about a variety of them, including not only attainment but attitudes, aspirations, vocabulary and so on. I would ensure that teachers had available to them good diagnostic assessments of the type that I described. I would also ensure that there was a full understanding of assessment for learning among the pupils, and I would continue to have national tests at the age of 11, but I would not put the results in league tables. In fact, I would ensure that there were laws to prevent that sort of thing from happening.

  Q13  Fiona Mactaggart: Would you have to keep them secret from parents?

  Professor Tymms: No. Parents would be allowed to go to a school and ask for the results, but I would not make the results the subject of newspaper reports, with everyone looking at them in a sort of voyeuristic way. There are real problems with those tables, which are actually undermining the quality and the good impact that assessment data can have. We are forcing teachers to be unprofessional. League tables are an enemy of improvement in our educational system, but good data is not. We need good data. We need to know the standards and variations across time, but we do not need a voyeuristic way of operating and pressure that makes teachers behave unprofessionally.

  Sir Michael Barber: At the risk of ruining Peter's reputation, I agree with a lot of that, and I want to say a few things about it. First, as I understand it, a new regulator is due to be set up. An announcement was made a couple of months ago by Ed Balls: I am not sure where that has got to, but the announcement was made in precise response to the issues that Peter has raised. Personally, I have no doubt about the professionalism of the QCA in the past decade. It has done a good job, but it is important that standards are not just maintained but seen to be maintained. The new regulator will help with that once it is up and running. Secondly, on monitoring standards over time, as I said earlier, particularly now that international benchmarking has become so important not just here but around the world, I would like the regulator to use samples connected with those benchmarks and help to solve the problems of getting schools to participate in samples, which Peter mentioned. That would be extremely helpful. I agree completely with Peter about investing in teachers' skills and giving them the diagnostic skills to make them expert in assessment for learning. When I debate the programme for international student assessment results with Andreas Schleicher, who runs PISA—he is an outstanding person and it may be worth your interviewing him—he says that virtually no country in the world implements more of the policies that would be expected to work according to the PISA data than England, but that that has not yet translated into consistent quality, classroom by classroom. That is the big challenge, and what Peter recommended would help to achieve it. Like Peter, I would keep tests at 11. On league tables, the issue—and I have this debate with head teachers a lot—is that unless a law is passed, which I do not see as terribly likely, there are only two options for the schools system. 
One is that the Government, in consultation with stakeholders, designs and publishes league tables. The other is that one of the newspapers does it for them. That is what happened in Holland and it is happening, too, in Toronto and in Finland. It happens with universities. If you talk to university vice-chancellors, you find that they are in despair because various newspapers and organisations are publishing league tables of university performance over which they have no leverage. The data will be out there—this is an era of freedom of information, so there is a choice between the Government doing it or somebody else doing it for them. If I were a head teacher, I would rather have the Government do it—at least you can have a debate with them—than have the Daily Mail or another newspaper publish my league tables for me.

  Professor Tymms: Can I pick up on that? I wish to make two points about league tables. First, we publish the percentage of children who attain a Level 4 and above, so if a school wants to go up the league tables it puts its effort into the pupils who might just get a Level 4 or a Level 3. It puts its efforts into the borderline pupils, and it does not worry about the child who may go to Cambridge one day and has been reading for years, or the child with special needs who is nowhere near Level 4. That is not going to show up on the indicator, so we are using a corrupting indicator in our league tables. Secondly, if you look at the positions of primary and secondary schools in the league tables, you will find that secondary schools are pretty solid in their positions year on year, but primary schools jump up and down. That is not because of varying teachers but because of varying statistics. If a school has only 11 pupils and one gets a Level 4 instead of a Level 3, the school is suddenly up by almost 10% and jumps massively. There is a massive fluctuation, because we produce league tables for tiny numbers of pupils. We can include only children who are there from Key Stage 1 to Key Stage 2 on the value added, which often means there is turbulence in a school. We should not publish for tiny numbers. The Royal Statistical Society recommends always quoting a measure of uncertainty or error, which is never done in those tables. We have 20,000 primary schools, and if the Government did not produce tables that the newspapers could just pick up and put in, it would require a pretty hard-working journalist to persuade them to give the press their data. It would be possible to make laws saying that you cannot publish tables. Parliament makes laws saying that you should not have your expenses scrutinised, so why can we not produce a law that says that schools' results should not be scrutinised?
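
  Illustration (not part of the oral evidence): the small-cohort arithmetic in the answer above can be checked directly. This is a minimal sketch, assuming the headline indicator is the percentage of a cohort at Level 4 or above and treating each pupil's result as an independent draw, which is a simplifying assumption and not something stated in evidence. The function names, the 80% pass rate and the cohort sizes other than 11 are hypothetical.

```python
import math


def one_pupil_swing(cohort_size: int) -> float:
    """Percentage-point change in the headline figure when a single
    pupil moves across the Level 4 threshold."""
    return 100.0 / cohort_size


def standard_error(p: float, cohort_size: int) -> float:
    """Binomial standard error, in percentage points, of an observed
    proportion p in a cohort of the given size (assumes independence)."""
    return 100.0 * math.sqrt(p * (1.0 - p) / cohort_size)


# Compare a very small school with larger (hypothetical) cohorts.
for n in (11, 30, 200):
    swing = one_pupil_swing(n)
    se = standard_error(0.8, n)  # 0.8 is an assumed, illustrative pass rate
    print(f"cohort={n:>3}  one-pupil swing={swing:.1f} pts  std error={se:.1f} pts")
```

  With a cohort of 11, one pupil's result moves the figure by about 9 percentage points, which is consistent with the "almost 10%" quoted in the answer; the swing shrinks quickly as the cohort grows.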

  Q14  Mr Slaughter: You said a few moments ago, Sir Michael, that one of the purposes of national testing at seven and 11 was to identify children who are in difficulties. That sounds counter-intuitive. Would you not expect teachers to know that anyway? If testing has a role, is it not in assessing the needs of individual children, just as testing is used, for example, to assess the needs of people with a hearing problem? Otherwise, it is likely to lead to buck passing. If we test everybody, it almost becomes the responsibility of the state or someone else to ensure that everyone reaches a higher level. Given the length of time that we have had testing, how far has that become true? Stories in newspapers report the reverse, and say that a substantial minority of children still move on to secondary school without those skills.

  Sir Michael Barber: I am not arguing that national curriculum tests alone will solve every child's problems. I agree strongly with what Peter said about teachers developing the diagnostic skills to diagnose such things. We want all teachers—I shall focus on primary schools—to be able to teach reading, writing, mathematics, and some other things, well, and then develop over time the skills needed to deal with individuals who fall behind. It is very good to see Government initiatives, such as the Every Child a Reader initiative, that pick up children who fall behind. I am in favour of all that. You need good diagnosis, which incidentally is one of the features of the Finnish education system that makes it so good—they diagnose these things early.

  The national curriculum tests have spread understanding among teachers of what the levels are and of what being good at reading, writing and mathematics looks like. They also enable the system to identify not just individual students, but groups of students, who have fallen behind. The system has great data about particular groups of students or schools that are falling behind, which enables it to make informed decisions about where to target efforts. My point is not just about individual students, therefore, but about groups of students or variations within the cohort. I shall comment on the point about league tables. In the end, the data will out—this is an era of freedom of information. We can have a perfectly valid debate about whether Level 4 is the right indicator. However, the percentage achieving Level 5 went up very rapidly during the early phase of the national literacy strategy, which suggests that good teaching is good teaching is good teaching. That was a result of the combination of the accountability system and the big investment in teachers' skills.

  Q15  Lynda Waltho: In evidence so far, we have heard that the testing regime serves a large number of purposes—specifically, end of key stage, school accountability, assuring standards over time and assessment for learning. I am getting the feeling that there is not a lot of confidence that at least two of those are being achieved. What about the others? Can the system fulfil any of those purposes? Is it working? Is it fit for purpose? I do not have the impression that it is. As a former teacher and a parent, I found the regime useful in all of those areas at some point, but what is your assessment of its capabilities across that range?

  Professor Tymms: I do not think that it is being used at all for assessment for learning. And I do not think that it can be, except where it is used incidentally. It provides a level against which teachers can set their pupils. If a teacher in a high-achieving school could judge her pupils, she would probably underestimate them because she would base her judgment on those she knows. The reverse would probably happen in a low-achieving school. Standardised levels for national tests give the firm ground on which a teacher can make a judgment. That is a good thing. It is there and it is being used. It gets information to parents, but it has its downsides. I do not think that testing is good at monitoring standards over time. We are saying, "Take this test, and we will hold you to account for the results and put them in league tables. We will send in an Ofsted inspector and ask you to assess your pupils and send us the results". That is an inherently problematic system. It is a little difficult. Another inherently problematic thing is having qualifications and curriculum in the same body—the QCA. Somebody should design the curriculum and somebody should assess it, but they should be separate bodies. That is an unhealthy way to operate a system. If we want to know what standards are over time, we are far better off with an independent body. If we change the curriculum—we read in The Times that that will happen, and we hear it regularly—and introduce an oral test, suddenly Level 4 will not mean the same thing, because a different curriculum will be assessed. We cannot monitor standards over time, but by having an independent body charged with monitoring standards not just against the national curriculum but against an international concept of mathematics or reading, we can track things over time. We must do different things. I come back to the need to understand the special needs of the child and pick out the child who already has a serious problem. 
Teachers can assess their children pretty well, but they cannot be expert in all the special needs—varieties of dyslexia, dyscalculia, attention-deficit hyperactivity disorder and so on—nor should they be expected to be. However, they might spot a problem with a child who needs to be assessed in different ways, so tools to help the teacher help the child and identify special needs and things falling back or not going quite right to begin with would make sense. Computerised diagnostic assessments with bespoke tests, in which the child uses headphones to listen to the computer and is asked questions according to how they respond, have to be the way of the future, but they cannot be the way of the future for statutory assessments, which require a new test every year to maintain security.

  Q16  Lynda Waltho: There would be more tests then.

  Professor Tymms: Different types, and probably less testing. We have more testing if we have league tables. It is the league tables that are our enemy.

  Sir Michael Barber: I think that, on the whole, the national curriculum tests are beneficial. I have a lot of confidence in them, and I am always cautious in advising anybody or any education system to move too rapidly in changing assessment or qualifications, as that involves a lot of risk. Nevertheless, one should not stick with things for all time. I think that they have been good tests and that they have been good for accountability purposes. Along with the supports that I mentioned earlier, they have helped to drive improvement in the system. I agree with Peter about the need for an independent body to monitor standards over time—that is absolutely right. The proposal that is currently being piloted in 400 or 500 schools—progression pilots in which children are tested when they are ready for level tests—is very promising, but it is all in the detail. If that works, it could be beneficial in making sure that children at all stages and ages are making progress. The data show that, at present, there is a bit of drop-off in progress for years 3 and 4, but we would be able to move away from that if we had testing-when-ready tests. There is a lot of promise in them, but, as with any shift in the testing and assessment system, it is all about getting the detail right.

  Q17  Chairman: We can come back to your last point. You mentioned a comment by Professor Schleicher.

  Sir Michael Barber: I do not think that Andreas Schleicher is a professor, but he would be a very worthy one.

  Q18  Chairman: Can you guide us to what you were quoting from?

  Sir Michael Barber: I was quoting from a conversation with him. Before using his comments in the Committee, I checked that he was happy to be quoted on the record. You can put the quote on the record. He is quite happy to be quoted along the lines that I gave.

  Q19  Lynda Waltho: You both discussed whether league tables were an enemy or a friend. It seems that you have completely different ideas. I agree with you, Sir Michael. I think that it is likely that the newspapers will develop their own league tables. If they do league tables about what we spend on our breakfast at the House of Commons, they will do league tables for school results, believe me. Would it not be better if the Government set out explicitly the full range of purposes for league tables; in effect, if they explained the results better? Would that make a difference, or am I just being a bit naive?

  Professor Tymms: It would be interesting to try, but I do not know. If I buy something, I never bother reading the instructions until I get stuck. I would guess that most people would just look down the league tables, skipping the small print and reading the headlines to find out who is at the top and who is at the bottom. When the league tables come out every year, the major headlines that we see are whether boys have done better than girls, or vice versa, or that one type of school has come top. It is the same old thing time and again, despite great efforts to steer journalists in a different direction. I despair of league tables, but it would certainly be worth trying to provide more information. I think that the Royal Statistical Society's recommendation not to give out numbers unless we include the uncertainties around them is a very proper thing to do, but it is probably a bit late. The cat is out of the bag, and people are looking at the league tables. Even if there is more information, people will concentrate on the headline figures.

  Sir Michael Barber: You can always look at how you can improve a data system like that and explain it better. I agree about that. I have been a strong advocate of league tables—and not only in relation to schools—because they put issues out in public and force the system to address those problems. League tables, not just in education, have had that benefit. Going back some time, I remember lots of conversations with people running local education authorities. They would know that a school was poor, and it would drift along being poor. That was known behind closed doors, but nothing was done about it. Once you put the data out in public, you have to focus the system on solving those problems. One reason why we have made real progress as a system, in the past 10 to 15 years, in dealing with school failure—going back well before 1997—is that data are out in the open. That forces the system to address those problems.

  Professor Tymms: Why has it not got better then?

  Sir Michael Barber: It has got significantly better. We have far fewer seriously underperforming schools than we had before.

  Chairman: We do not usually allow one witness to question another, but never mind. You can bat it back.

  Sir Michael Barber: It was a fair question.

1   Progress in International Reading Literacy Study


© Parliamentary copyright 2008
Prepared 13 May 2008