The role and performance of Ofsted

Memorandum submitted by Rebecca Allen, Institute of Education, University of London and Simon Burgess, CMPO, University of Bristol

Introduction

1. This submission relates to the following issues highlighted by the Select Committee:

· "The performance of Ofsted in carrying out its work;

· The impact of the inspection process on school improvement"

2. We offer two pieces of evidence:

· We report some simple statistics on the nature of schools failing their Ofsted inspections

· We provide a preliminary report on some work in progress on the causal impact on a school’s progress of failing their Ofsted inspection.

3. This evidence is based on the authors’ analysis of data from the National Pupil Database (NPD) and Ofsted judgment data. The analysis covers secondary schools only, from 2003 to 2009 inclusive, a window determined by the availability of the NPD data. A few details on the data are provided at the end of this note.

Summary

4. Our preliminary results show that there is a positive causal effect of failing an Ofsted inspection on the school’s subsequent GCSE performance. This effect is statistically and substantively significant, but it appears to be transient. It peaks at two years after the inspection but has disappeared by four years after.

5. We show that schools in affluent neighbourhoods with few poor pupils and high-scoring intakes are only rarely judged to be failing by Ofsted. By contrast, schools with less able intakes and greater poverty have much higher failure rates; the bulk of schools judged to be failing look like this. There are two possible interpretations. Either Ofsted is truly measuring the quality of teaching and learning in a school, in which case these results show that the least effective schools are disproportionately to be found in poor neighbourhoods, serving poor, low-ability pupils. Alternatively, Ofsted judgments may be strongly influenced by the actual grades achieved, rather than the quality of teaching per se. In that case, it would be important to consider what an Ofsted inspection adds to the information available through test score outcomes themselves.

Performance of Ofsted in carrying out its work

6. We consider the social profile of schools by Ofsted judgment, analysing the distribution of judgments by the intake ability profile of schools, the level of poverty of the students in schools, and the level of neighbourhood poverty around schools.

7. This is informative on Ofsted’s performance because it provides some information on the outcomes of the process.

8. In Table 1 we tabulate the characteristics of schools judged to be failing in comparison to other schools inspected by Ofsted in 2009. In the first column, we see that the schools judged to be failing have similar average Key Stage 2 (KS2) test scores to the next category up, but distinctly lower KS2 scores than the two higher categories. In column 2 we report the average percentage of students eligible for Free School Meals (FSM). There is a strong pattern: the schools judged to be failing have more poor students. Finally, in column 3 we report the neighbourhood poverty rates of schools, specifically the mean IDACI score of students in the schools, by Ofsted judgment. Again, students in failing schools tend to live in poorer neighbourhoods than students at schools deemed to be excellent.

9. We examine trends in these comparisons over time in Figures 1 to 3. While there is some variation from year to year, the gap in mean characteristics between schools deemed to be failing and those deemed excellent remains largely constant.

10. We now cut the data the other way and compute the percentage of schools inspected that are judged to be failing, by the schools’ characteristics. The results are in Table 2. Column 1 shows that 9.7% of schools with student intake ability in the lowest quintile were judged to be failing, while only 2.3% of schools with the highest ability students were so judged. There is a similar gap looking at the poverty rate in schools: 8.3% of the poorest schools were failed relative to 2.2% of the least poor schools. This pattern is also reflected in the final column, looking at the schools’ neighbourhood poverty rates.
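The quintile analysis above can be sketched in a few lines. This is an illustrative example only, not the authors' code: the DataFrame, its column names and the simulated values are invented for demonstration, and only the mechanics (splitting schools into quintiles and computing a failure rate per quintile) correspond to the analysis described.

```python
# Hypothetical sketch of the quintile analysis in paragraph 10.
# Data are simulated; column names are invented for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 668                                       # number of inspections in 2009
schools = pd.DataFrame({
    "ks2_intake": rng.normal(0, 1, n),        # mean KS2 score of the intake
    "failed": rng.random(n) < 0.06,           # True if judged to be failing
})

# Split schools into five equal-sized quintiles of intake ability
schools["ks2_quintile"] = pd.qcut(
    schools["ks2_intake"], 5,
    labels=["Lowest", "2", "3", "4", "Highest"])

# Failure rate (%) and number of inspections per quintile
summary = schools.groupby("ks2_quintile", observed=True)["failed"].agg(
    fail_rate_pct=lambda s: 100 * s.mean(),
    n_inspections="size")
print(summary)
```

The same grouping, applied to FSM eligibility and IDACI scores instead of KS2 intake, would reproduce the other columns of Table 2.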

11. We examine whether these patterns have changed over time. Figures 4 to 6 display the trends for the lowest and highest quintiles of KS2 scores, school poverty and the poverty of the schools’ neighbourhoods. The failure rate of the most affluent schools and of those with high intake ability remains constant and low throughout this period, at around 2% to 3%. The failure rates of schools in the highest poverty quintile and the lowest intake ability quintile are more variable (all show a spike in 2006) but are uniformly much higher, averaging around 12%.

12. To summarise, our results show that schools in affluent neighbourhoods with few poor pupils and high scoring intakes are only rarely judged to be failing by Ofsted. By contrast, schools with less able intakes and greater poverty have much higher failure rates. The bulk of schools judged to be failing look like this. There are two interpretations of this:

a. Ofsted is truly measuring the quality of teaching and learning in a school. In this case, these results show that the least effective schools are disproportionately to be found in poor neighbourhoods, serving poor, low ability pupils.

b. Ofsted judgments are strongly influenced by actual grades achieved, rather than the quality of teaching per se. In this case, it would be important to consider what an Ofsted inspection adds to the information available through test score outcomes themselves.

13. Both of these interpretations are likely to be true in part. One of the criteria Ofsted uses is the level of test scores achieved by the school. There are also a number of reasons why teaching effectiveness may be lower in schools in poorer neighbourhoods. For example, it may be that more effective teachers and headteachers are to be found more often in the more affluent schools.

The impact of the inspection process on school improvement

14. In on-going work, we are analysing the consequences for subsequent GCSE exam performance of failing an Ofsted inspection. This directly addresses the question of the impact of Ofsted on school improvement. This paper will be finished and available in November.

15. There are a number of statistical difficulties in establishing the impact of failing an Ofsted inspection. First, we need to distinguish the true causal impact of failure from simple mean reversion: the least effective schools in one year are almost bound to improve a little in the following year. This is not part of the effect of Ofsted on school improvement and needs to be taken out of the estimate. Second, the schools highlighted by Ofsted are necessarily going to be poorly performing, and are likely to remain quite poorly performing. We therefore need to take account of their circumstances and look for any improvement given those circumstances.
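The mean reversion problem can be made concrete with a small simulation. This is an illustrative sketch, not taken from the paper: measured performance is modelled as a fixed school quality plus transient noise, and schools selected on a bad year "recover" in the next year with no intervention at all.

```python
# Illustrative simulation of mean reversion (not from the paper).
# Measured score = fixed school quality + transient noise; selecting
# the lowest scorers in year 1 guarantees apparent improvement in year 2.
import numpy as np

rng = np.random.default_rng(1)
n = 3000
quality = rng.normal(0, 1, n)             # fixed school effectiveness
year1 = quality + rng.normal(0, 1, n)     # noisy measured performance
year2 = quality + rng.normal(0, 1, n)     # fresh noise, no intervention

bottom = year1 < np.quantile(year1, 0.10)  # "failing" schools in year 1
improvement = year2[bottom].mean() - year1[bottom].mean()
print(f"Average improvement of bottom decile: {improvement:.2f}")
```

The bottom decile improves substantially on average even though nothing was done to those schools, which is why any estimate of the Ofsted effect must net this out.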

16. To deal with these issues, we need to compare schools that have just failed their Ofsted inspection with very similar schools that just passed. Accordingly, we adopt a Regression Discontinuity Design (RDD), a well-established identification technique in economics. The idea is to use the schools that only just passed their inspection as a "control" group for the failed schools. Details of the statistical procedure and the definition of the assignment variable for the discontinuity will be provided in the forthcoming paper. We also focus on changes in GCSE performance, not levels. This approach takes account of the fixed but unobservable characteristics of the schools, such as teacher effectiveness and the school’s environment, and also a few observable but changing factors, such as the characteristics of the student intake.
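The logic of an RDD can be sketched on simulated data. This is not the authors' estimator: their assignment variable and bandwidth are not yet published, so everything below (the uniform assignment score, the cutoff at zero, the bandwidth, the true effect of 0.30) is invented. The sketch shows only the core idea: fit a local linear regression on each side of the cutoff and read off the jump.

```python
# Illustrative RDD sketch on simulated data (assignment variable,
# cutoff, bandwidth and effect size are all invented for demonstration).
import numpy as np

rng = np.random.default_rng(2)
n = 2000
score = rng.uniform(-1, 1, n)        # assignment variable, cutoff at 0
failed = score < 0                   # schools below the cutoff "fail"
treatment_effect = 0.30
# Outcome: change in GCSE performance, smooth in score, jump at the cutoff
gcse_change = 0.5 * score + treatment_effect * failed + rng.normal(0, 0.3, n)

# Local linear regression on each side, within a narrow bandwidth h
h = 0.25
left = (score >= -h) & (score < 0)
right = (score >= 0) & (score <= h)
fit_left = np.polyfit(score[left], gcse_change[left], 1)
fit_right = np.polyfit(score[right], gcse_change[right], 1)
# Estimated jump at the cutoff = difference of the two fitted intercepts
rdd_estimate = np.polyval(fit_left, 0.0) - np.polyval(fit_right, 0.0)
print(f"Estimated effect of failing: {rdd_estimate:.2f}")
```

Because only schools very close to the cutoff enter the comparison, fixed differences between clearly good and clearly bad schools drop out of the estimate.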

17. Before reporting our results, it is worth considering that the outcome could be negative, zero or positive. It could be negative if there is a major impact on staff morale, if key highly effective staff leave, or if dealing with the process diverts resources and time from teaching. It could be positive if the failure (re-)focuses attention on the right things, and provides increased information and motivation.

18. Our preliminary results show that there is a positive causal effect of failing an Ofsted inspection on the school’s subsequent GCSE performance. This effect is statistically and substantively significant, but it appears to be transient. It peaks at two years after the inspection but has disappeared by four years after. This is the average effect; we are now investigating any heterogeneity in this effect.

19. This seems to us to be a reasonable and credible outcome. The effect is positive, not negative, on average, which is obviously an encouraging report of the impact of Ofsted on school improvement. On the other hand, failing an Ofsted inspection, and its aftermath, is not a highly resourced, high-impact intervention, so it is unlikely to have a permanently transformative effect on a school.

20. We have also conducted a preliminary analysis of the question of whether the contemporaneous year 11 cohort suffers in the year of the Ofsted inspection. We find a rather small negative effect.

Further Data Details

21. The number of categories used by Ofsted varies year by year over this period, so we have amalgamated them into four groups, with the bottom category being "judged to be failing".

22. The first set of results reported here adopts a straightforward analysis of the inspections data. We have not adjusted for the fact that schools deemed to be failing are inspected more often than those deemed to be excellent. This is likely to have two offsetting effects on the first part of our results. On the one hand, the sample will contain more schools strongly at risk of failing, and these are likely to be schools in poor neighbourhoods. On the other hand, schools having been deemed to be failing are more likely to have improved (either through mean reversion or through the impact of the Ofsted judgment) and so are less likely to fail next time around.

October 2010
Tables and Figures

Table 1: Characteristics of Secondary Schools by Ofsted Judgment, 2009

School means:

Ofsted judgment   KS2 test score of cohort (1)   Eligibility for FSM (%)   Neighbourhood Poverty (IDACI)   Number of inspections
Excellent          0.276                           9.5                      0.193                           151
2                  0.052                          12.2                      0.225                           283
3                 -0.099                          14.6                      0.250                           192
Fail              -0.073                          16.7                      0.267                            42
Total              0.051                          12.5                      0.227                           668

1. KS2 score: normalised within-year to mean zero and standard deviation 1.

Table 2: Schools judged to be failing, by school characteristics, 2009

% of schools judged to be failing:

Quintile of school characteristic   KS2 test score of cohort (1)   Eligibility for FSM (%)   Neighbourhood Poverty (IDACI)   Number of inspections
Lowest                               9.70                           2.24                      2.24                            134
2                                    7.46                           5.92                      6.72                            134
3                                    7.52                           7.58                      6.77                            133
4                                    4.48                           7.46                      8.21                            134
Highest                              2.26                           8.27                      7.52                            133
Total                                6.29                           6.29                      6.29                            668

1. KS2 score: normalised within-year to mean zero and standard deviation 1.


Figure 1: Mean KS2 Test Score of Schools judged to be Failing (F) and Excellent (E)

Figure 2: Poverty Rate of Schools judged to be Failing (F) and Excellent (E)


Figure 3: Neighbourhood Poverty Rate of Schools judged to be Failing (F) and Excellent (E)

Figure 4: Percentage of Schools judged to be failing by school KS2 Test Score


Figure 5: Percentage of Schools judged to be failing by school poverty rate

Figure 6: Percentage of Schools judged to be failing by neighbourhood poverty rate