61.As discussed in the previous chapter, the assessment and accountability systems are inherently linked at primary school. Key Stage 2 assessments are carried out in order to hold schools and teachers to account for the attainment and progress made by their pupils. Professor Harvey Goldstein from the University of Bristol described the situation:
The problem is at the moment the accountability component dominates everything else and it distorts the curriculum, it distorts learning, it distorts children’s behaviour.
62.In primary schools in England, Key Stage 2 results and progress measures are used in a number of different ways to hold schools and teachers to account.
63.With such high-stakes use of data it is unsurprising that there are negative consequences for teaching and learning, as described in Chapter 5. Throughout the inquiry we heard calls for the stakes to be lowered at primary school.
64.In England, performance tables are published each year with data on pupil attainment and progress for reading, writing and maths. Publishing these results gives information to parents and other stakeholders, but also has drawbacks. The limitations were discussed by Professor Harvey Goldstein:
If you are looking at the differential performance comparatively across schools, for those who come in with very low achievement or very high achievement or, for example, for different ethnic groups or for boys versus girls, then you begin to start dealing in comparison with very, very small numbers. [ … ] So there is an inherent difficulty relating to this whole uncertainty associated with small numbers. [ … ] It should be way back in the background, of use as backing up or indicating where there may be issues, but not as the primary source for making judgements about schools.
65.Russell Hobby suggested that the Department should “look at a rolling average across three years or so, particularly when you are talking about 10 or 12 pupils in a sample.” This suggestion was echoed by Catherine Kirkup from the National Foundation for Educational Research, who told us:
I would retain the national testing but I think, as others have suggested, there is too much focus on one year’s results and I would like a move to rolling averages and trends so that you can look at how a school is performing over time but still look at the overall attainment of all schools.
66.Many of the negative effects of assessment are in fact caused by the use of results in the accountability system rather than the assessment system itself. Key Stage 2 results are used to hold schools to account at a system level, to parents and by Ofsted, and results are linked to teachers’ pay and performance. We recognise the importance of holding schools to account, but this high-stakes system alone does not improve teaching and learning at primary school.
67.The Government should change what is reported in performance tables to help lower the stakes associated with them and to reduce the problems that arise from using data from a small number of pupils. We recommend publishing a rolling three-year average of Key Stage 2 results instead of results from a single cohort. Yearly cohort-level data should still be available to schools for use in their own internal monitoring.
68.The increased focus on pupil progress is a positive step to make performance tables fairer for schools with more challenging intakes. However, there are still issues with how progress is measured. Currently, Key Stage 1 data, which is collected after pupils have already been in school for three years, is used as a baseline for pupils. The Government is consulting on plans to introduce a baseline measure in reception in order to measure progress more effectively. The plans include:
69.The shift is intended to improve the accuracy of the progress measure and to ensure that it fairly depicts a pupil’s progress throughout the whole of primary school. Measuring progress from Key Stage 1 has been shown to lead to ‘gaming’ of results to increase progress scores. Education Datalab stated in its written evidence:
the replacement of the Key Stage 1 externally marked test with teacher assessment in 2003 led to primary schools depressing their scores, knowing it would be used as a baseline for Key Stage Two value-added measures. This did not happen in infant schools where Key Stage One is the outcome metric.
70.However, introducing a baseline measure for pupils at the beginning of primary school also has challenges. The Government has not outlined how it will ensure that any new baseline measure will not be subject to the same ‘gaming’ that Key Stage 1 results were. There are also other factors to take into account when deciding whether a baseline measure should be introduced in reception:
71.We heard mixed opinions about the introduction of the baseline measure throughout the inquiry. Many early years practitioners are understandably sceptical about the introduction of a test at an age before ‘formal’ schooling has started. ‘Better Without Baseline’, a group of early years organisations and teaching unions who express concern about baseline assessment, argued:
it is crucial that this should not have a negative and distorting effect on the Early Years Foundation Stage, which differs from the national curriculum for sound reasons relating to children’s development.
72.In 2016, the Government carried out a pilot of three baseline measures: one used only teacher observation and the other two used a combination of tests and observation. NFER, which provided one of the pilot baseline measures in 2016, believes that there would be significant benefits in introducing a baseline to better measure pupil progress, although it accepts that this may be difficult to achieve. It also calls for any accountability measure to be used alongside a diagnostic tool, like the Early Years Foundation Stage Profile, to gain a more detailed picture of children’s development.
73.However, Dr Mary James, former Professor of Education Research at the University of Cambridge, described unresolved issues with introducing such a baseline, such as how to measure small cohorts and how to account for children who move or join schools part-way through primary school. Dr James also questioned what early years education should be for:
is it just preparation for secondary schools at the age of four? This is where the early years specialists will come down and say, “We are about children’s development, socially, physically, as well as cognitively” and so forth. To narrow it down to preparation for spelling, punctuation and grammar is completely distorting [ … ]
74.Professor Dominic Wyse argued that the focus should be on improving teacher assessment, and not on the introduction of a formal test. The Minister told us he was “open minded” as to whether the assessment was a formal test or an observational model. The consultation proposal states “this assessment would need to be appropriately teacher mediated, given the age of the children.” However, it does not give detail about the nature of the test.
75.The consultation suggests pupils should sit the test at the beginning of the second half term, “after pupils have been given enough time to settle into primary school”. Tim Oates suggested that the assessment could be carried out at different points in the year, as “5/6 year olds take time to settle into school, this can affect their ability to complete tests, and affect their scores, compromising dependability”. It has also been suggested that the measurement could take place in year one, rather than the reception year.
76.We welcome the increased focus on progress in performance measures and the Government’s commitment to introduce an improved baseline measure. However, in its consultation document, the Government fails to appreciate the potentially harmful consequences of introducing in reception a baseline measure that is used for school accountability.
77.The Government must conduct a thorough evaluation of the potential benefits and harmful consequences of introducing any baseline measure, including impacts on pupil wellbeing and on teaching and learning, and must involve early years experts and practitioners in that evaluation. The primary purpose of a measure of children at age 4 should be to serve as a diagnostic tool that helps early years practitioners identify the individual needs of pupils, and it should only be carried out through teacher assessment. We welcome the Government’s commitment that no data from a baseline will be used to judge individual pupils or schools.
78.We heard a range of ideas for how accountability measures could be made less high-stakes. A major change would be to replace school performance tables with a system of national sampling. This would remove the pressure on individual schools but still provide the Government with data on the overall performance of the primary education system for different groups of pupils. This approach is currently used for monitoring science performance at primary school. Professor Harvey Goldstein suggested:
If you want a monitoring system of testing for the whole of education, you can do that by sampling. [ … ] You do not need to test every single student several times. The more you have good, formative testing that is used by the teachers to understand where pupils are and what they need to know the better.
79.However, in order to hold individual schools to account for the performance of pupils at Key Stage 2, statutory testing is arguably the best method, as stated by NFER. There are also ways to improve how performance data is used, and what data is published, to lower the stakes associated with performance tables, such as publishing three-year averages of results.
80.In 2016, the Government raised the expected standard at primary schools, in turn increasing the pressure on schools to achieve higher results. Setting a more difficult target with a short lead-in time for many pupils will not automatically achieve higher standards, as described by Binks Neate-Evans from the Headteachers’ Roundtable:
Children have gone through their entire primary career and then we have the goalposts in February to say, “That is what you are shooting for.” It wasn’t manageable.
81.Dr Mary James was involved with the curriculum review. She told us that “as soon as we say [pupils] have to get 100 then that is what teachers will drill to”. This was not the original aim of the curriculum review, which was to encourage more ‘mastery’ of concepts at primary school. We received evidence suggesting the Government should remove the expected standard threshold completely as it “encourages excessive focus on students at the margin of meeting the standard”.
82.Alongside performance data, Ofsted plays an important role in holding schools to account and giving parents and other stakeholders more detailed information about a school. However, Ofsted has been criticised in the past for focusing too heavily on Key Stage 2 data when making its judgements. Dr Mary James suggested that the Ofsted inspection process should be strengthened by basing it more on qualitative judgements, such as judgements of teaching and learning in the classroom.
83.Professor Dominic Wyse agreed that “expert judgement as part of rigorous and perceptive inspection (including observations of teaching) should be a major means for judging school effectiveness”. Professor Harvey Goldstein also supported a change in the Ofsted model:
What you really want is an independent judgement of what is going on inside the school, which you can then put together with the statistical information. [ … ] [Ofsted judgements] confound the measurement the inspectors make when they go into schools, judging classrooms and teachers and so on, with the statistical evidence that is measuring something different. It would be much better and provide much more useful information if those were completely separate.
84.For future reforms, the Government should carefully consider the impact of setting thresholds for schools with short lead-in times. We agree with the Government’s aim of raising standards at primary school but think that setting extremely challenging targets only leaves many students feeling they have failed, when in a previous year they would have succeeded. Expected standards should be raised over a much longer time period to give schools a chance to adjust to new expectations.
85.We recommend a thorough review of how Ofsted inspectors use Key Stage 2 data to inform their judgements and whether inspectors rely too heavily on data over observation. This could include a pilot of inspections where data is only considered following the inspection.
69 “New advice to help schools set performance related pay”, Department for Education press release, first published 16 April 2013, updated 29 April 2013
70 For example, Michael Wilson () para 2.3, Simon Nixon () para 10, Sandwell School Improvement Team () para 1.2, Q84
74 Department for Education and Standards & Testing Agency, Primary assessment in England Government consultation, launch date 30 March 2017, closing date 22 June 2017, p 15–20
75 Education Datalab () para 10
76 Better Without Baseline () para 3
77 The three providers of the reception baseline assessments during the pilot that ran during the 2015 to 2016 academic year were the Centre for Evaluation and Monitoring, Durham University; Early Excellence; and the National Foundation for Educational Research.
78 National Foundation for Educational Research ()
84 Department for Education and Standards & Testing Agency, Primary assessment in England Government consultation, launch date 30 March 2017, closing date 22 June 2017, p 16
85 Ibid. p 17
86 Tim Oates () para 10
88 National Foundation for Educational Research () para 2a
92 Education Datalab () para 21
94 Professor Dominic Wyse () para 5
28 April 2017