Memorandum submitted by The Mathematical Association


General Issues


o A centrally run system of testing and assessment is important for the credibility of qualifications, and for the moderation of teacher assessment. Educational institutions also need to be accountable to stakeholders and funders (eg the taxpayer), and a national system of assessment is one (though only one) measure of satisfactory performance. It is therefore important for individuals, at least at critical point(s), for end users of education such as employers, and for stakeholders in the education system.

o Other systems have been well explored, eg in the Tomlinson report.

o Many of our members believe they now have evidence that 'high stakes assessment' is distorting the curriculum and reducing creativity: in particular, the committee is referred to the small-scale evidence submitted to QCA and others by the MA, ATM and NANAMIC: Impact of Assessment on Learning and Teaching Mathematics (front page at

o QCA's effectiveness with respect to mathematics has been reduced by the disbandment of the maths team; this situation has recently been addressed by the appointment of a maths specialist within the leading team. We are concerned about the apparent lack of accountability to, or even cognisance of, the professional community: witness the recent drives to push through 2-tier GCSE, or 3-6-9-12 etc GCE Maths qualifications at the behest of exam boards and in the face of deep-seated unease among the community.

o We also have grave concerns about the development of some current initiatives devolved to within the exam boards: we see little evidence that they have the necessary depth or breadth of expertise for the development of significant changes in mathematics education, and already see signs that assessments are driving the new initiatives (development of Functional Maths qualifications, of changes to GCSE Mathematics, and of a second Mathematics GCSE), rather than being driven by sound mathematics education philosophy and research.


Our evidence suggests that we are increasingly seeing the growth in value of what we can assess, rather than the assessment of what we value: this is a profoundly worrying trend which must be reversed if we are to build our capacity for good mathematics education in our classrooms. We feel it is exacerbated by the existence of a number of commercially-driven exam boards; we deplore the increasing availability of board-endorsed or marketed study materials which undoubtedly compromise the integrity of the system, and in particular we query the need for more than one exam board operating any given qualification. (We have more detailed proposals on this point, should they be thought useful.)




National Key Stage Tests


o How effective are they? In mathematics, the Key Stage tests in themselves are good assessments comprising well-developed questions which probe understanding; that is, when used well, they support good mathematics learning. Broadly speaking, they rank students correctly and are useful as diagnostic tools for pinpointing areas of relative strength and weakness. In other words, they have the potential to be used as good instruments for assessment for learning.

o Do they adequately reflect levels of performance? At Key Stages 1, 2 and 3 the tests increasingly inflate the levels 'achieved', even without coaching immediately prior to the test. For example, on a level 4-6 paper at Key Stage 3 the marks are broadly divided equally between the 3 levels concerned, yet it is only necessary to answer most of the level 4 and 5 questions correctly to be 'awarded' a level 6. This causes misunderstanding among all concerned, and is a nonsense.

o Changes over time: The IPPR report, "Assessment and Testing" 1, presents evidence to suggest that improvements in National Curriculum Levels overstate underlying improvements in attainment.


"Although the two are not directly comparable, improvements in TIMSS (Trends in International Mathematics and Science Study) are thus much less impressive than the measured improvements in key stage test results. The Statistics Commission considered these issues in 2005 and concluded that: 'The Commission believes that it has been established that (a) the improvement in Key Stage 2 test scores between 1995 and 2000 substantially overstates the improvement in standards in English primary schools over that period, but (b) there was nevertheless some rise in standards.' (Statistics Commission 2005: 4). Looking at the secondary phase, the percentages of pupils attaining the benchmark at Key Stage 3 and Key Stage 4 have continued to rise although progress on international attainment measures has stalled. Evidence from TIMSS for Key Stage 3 (Year 9) does not show any significant change in performance between 1995 and 2003 (Ruddock et al 2004). Analysis of the international study PISA (Programme for International Student Assessment) shows that for a given score at Key Stage 3 or Key Stage 4, pupils attained on average a higher PISA score in 2000 than in 2003 (Micklewright and Schnepf 2006)." 2


o Cause and Effect: We know of no evidence showing that the introduction of high stakes testing has in itself raised standards: our members are overwhelmingly of the opinion that where progress has been made, particularly with non-specialist or inexperienced teachers, it has been supported by the use of the National Strategies, especially where these have been applied intelligently.

o Coaching for the test, now occupying inflated teaching time and effort in almost all schools for which we have information at each Key Stage, is not constructive: short-term 'teaching how to' is no substitute for longterm teaching of understanding and relationships within and beyond mathematics as part of a broad and balanced curriculum. It is interesting that in Wales, where testing is no longer 'high stakes', apparent (but, according to work samples, not real) attainment has decreased, but healthy practice, such as cross-phase moderation, is now being adopted.

o Such testing is marginally effective in concentrating students' minds, but in terms of longterm learning the current practices are destructive, and counterproductive in terms of sustained attainment. Neither are they effective in holding schools accountable for performance: feeder schools with very similar results can produce students with widely varying usable mathematics skills and understanding by the following September. In general only the most confident and competent schools or teachers are able to withstand the perceived pressure to warp teaching according to the tests. In general, teacher assessment appears to have little value to stakeholders, or even to leadership teams in schools.

o We fail to see how the proposals in 'Making Good Progress' would address these concerns: at worst, the difficulties will be exacerbated, with a greater proportion of time devoted to coaching for tests rather than building for longterm robustness and fluency. It will be possible to hothouse students to 'achieve' at higher levels because only a comparatively small range of skills is being tested at any one time, but this is well-known by effective teachers to be counterproductive in the medium term. Further, recent work by Dylan Wiliam has cast doubt on the accuracy of results at KS2 and KS3, estimating that about 32% of KS2 results and 43% of KS3 results are at least one level out. 3

o A move to single-level tests would strongly encourage a move to single-level teaching in order to prepare students for them. This has not been common since the early days of the National Curriculum due to the fragmentation of learning it can lead to. Such a move would be counter to the proposed changes to KS3 and 4 Programmes of Study which allow for an increase, rather than a reduction, in curricular freedom and personalisation; it would undermine existing good practice.

o Increasing the frequency of high-stakes testing for schools will increase the pressure on students to perform. In our view, it is unethical to put children under such pressure when the results are of more importance to the school than they are to individual students.

o We do not believe in any case that it is possible reliably to discriminate at a certain level with a single test: 'levelness' is an amalgam of skills, concepts and knowledge, and the variation with which students 'achieve' a given level at present suggests they each develop in different directions at different speeds. 'Levels' in a given area of mathematics are far more constructively employed as discriminators for progression, that is, for teacher and student assessment for learning. Our concern is not with single-level tests in themselves but with the way in which they are currently used. There have been examples of single-level tests that have worked reasonably well, for example, GAIM and the MEI National Curriculum scheme for GCSE, where they are used 'en passant' as part of the development of a broad palette.

o We see no argument to keep these tests as 'high stakes' assessment in terms of league tables: Northern Ireland and Scotland have well-respected systems without resorting to such, and in Wales the system is developing well without them. As in-school components of formative assessment, alongside teacher assessment, they are valuable. Until GCSE we see no need for blanket formal high-stakes summative assessment.

o We feel the levels are broadly age-appropriate, given that 'the average' should mean perhaps 60% can achieve meaningfully (and sustainably) at or above that level. But achievement should mean broad 'mastery': at present, critical, typically more challenging, areas for understanding and progression are skimped at every stage in the race to 'achieve' more highly. For example, notions of proportionality, and a robust fluency with numbers such that algebra is but a trivial generalisation of understanding, are often under-developed.



Testing and Assessment at 16 and after:


o Is testing and assessment in 'summative' tests fit for purpose? GCEs are not good indicators for successful progression in cognate subjects: the correlation to degree class is low. Nor are they good school-leaving certificates giving evidence of learners' achievements and strengths: GCEs tend to focus on a very narrow skill range and so give only a partial picture. At GCSE especially, grades are given at such low mark thresholds that they can hardly be said to celebrate learners' achievements, nor, since they do not require mastery, are they of consistent use to end-users.

o Additionally, particularly 'high stakes' qualifications such as Mathematics GCSE are now skewing even provision in the secondary and college curricula, with increasing resources concentrated on borderline C/D students at the expense of those more or less able, and often for short term gain rather than confident mastery of subject material: in other words, coaching for the test rather than teaching for longterm understanding of fundamentals.

o Are the changes to coursework due to come into effect in 2009 reasonable? Changes to coursework (and other GCSE changes in mathematics) have been rushed in without sufficient trialling of their effects on teaching and learning: there is no indication that replacement papers will assess effectively those skills engendered by proper coursework, although in the best classrooms there will now be more time available to properly develop those skills.

o Is holding formal summative tests at ages 16, 17 and 18 imposing too great a burden on students? Too much of the year is taken up with examinations and direct preparation for them. The move to reduce the number of modules will help by cutting the number of assessments and reducing the use of the January sitting. (Just testing at 16, 17 and 18 would be a step forward: at the moment, many students are tested at 14.5, 15, 15.5 and 16 (for modular GCSEs), then at 16.5, 17, 17.5 and 18 for GCEs.)

o If so, what changes should be made? At GCSE there are already too many changes in the pipeline for teachers easily to be able to make good use of them for students: these, and the change to 4 modules at GCE, must first be given time to embed. We must avoid the current destructive practice of making a set of changes then planning the next set before the previous ones have been implemented or evaluated. Longterm, efforts must be made to reduce the present fragmented and time-consuming burden of external summative assessment.

o To what extent is frequent, modular assessment altering both the scope of teaching and the style of teaching? In the worst cases, it has allowed many teachers to teach in a dull way, using the stick of impending examinations to motivate learners rather than inspiring and enthusing them. It has introduced a narrow focus on imparting to learners mark-winning behaviours rather than teaching them a coherent understanding of the subject. Links between different parts of the subject are not examined because the topics are in different modules, and as they are not examined they are not taught. Modular assessment has also contributed to the paucity of teaching of multi-stage problem-solving and of the synthesis and communication of arguments. The widespread perception is that best practice in teaching and learning is mutually exclusive with optimising module results.

o How does the national assessment system interact with university entrance? There is a tension between how far the system is there to reflect achievement to date, and how far it is to act as a selection instrument for progression, where progression can be in a wide variety of directions. In particular, GCE has long since ceased to be primarily a selection instrument for universities. Attempts at bolt-ons, like A* grades or 'stretch and challenge', are unlikely to be effective in meeting that use, although they may serve to undermine other uses. In mathematics especially, it is difficult to see how one could set a paper which was accessible across the population now undertaking GCE studies yet provided opportunity for the most able in the subject to shine.

o What does it mean for a national system of testing and assessment that universities are setting entrance tests as individual institutions? The increase in the number of these tests makes it clear that the national system is not meeting the needs of at least some end-users. Some suggest that a more constructive development might be a coordinated set of papers in each subject, rather than students being tested repeatedly by each of their five university choices. Perhaps we should see the role of GCE as being part of a leaving certificate which wraps up and attests the learner's achievement at school/college, and use separate instruments for assessing suitability for progression to higher education in particular subjects, though of course this adds to the 'burden of assessment'. If we did this in maths, we would probably need a suite of papers of increasing difficulty: what would excite Anglia Ruskin in a candidate would not be the same as what would make Trinity, Cambridge sit up and take note, although GCEs probably remain appropriate for selection for some institutions. The key thing would be that students would have to sit at most one extra set of papers, which would suffice for all their university applications in a subject (and they would do this at a sensible time of year). GCE is being asked to do a job additional to the one for which it was introduced; in making a fairish job of serving that new role it no longer fulfils its original role as well as it did.


June 2007



1: Assessment and Testing: Making space for teaching and learning, IPPR, December 2006

2: Assessment and Testing: Making space for teaching and learning, IPPR, December 2006

3: 'The Reliability of Assessments' in Assessment and Learning, P Black and D Wiliam, Gardner J (ed), London: Sage, 2006