Education Committee

Written evidence submitted by Dr Ian Jones, Mathematics Education Centre, Loughborough University

Executive Summary

1. Current exams use short response items that enable a high marking reliability but fail to capture authentic evidence of the conceptual knowledge and higher order thinking skills we value and need. So long as this remains the case, any attempt to increase the accuracy of setting papers and marking scripts will result in yet shorter and more predictable items that move us even further away from assessing deep understanding. Fortunately, tightening up awarding bodies’ existing procedures is not the only way to increase accuracy. A new assessment method called Adaptive Comparative Judgement (ACJ) enables the use of authentic evidence and yet achieves an even higher reliability than traditional short response items. ACJ would therefore enable us to ensure accuracy in setting papers and marking scripts in a way that is beneficial rather than detrimental to how we teach and assess our children.

Brief Introduction

2. I am a Senior Research Fellow in Mathematics Education at Loughborough University, currently funded by a Royal Society Shuttleworth Education Research Fellowship and the Nuffield Foundation. My central research interest is in analysing the limitations of current GCSE mathematics exams and testing alternative models of assessment. Prior to this I worked on numerous mathematics education projects and was a school teacher for 10 years.

Factual Information

3. Exams offer a time-honoured and efficient method for gathering evidence about students’ learning and achievement. However, current exams—and mathematics GCSE papers in particular—have been criticised for being too narrow and predictable. They fail to gather evidence of the very conceptual knowledge and higher order thinking skills that our country values and needs. They fail because they comprise short response items that require only the rote recall and application of facts and procedures.

4. Exams comprise short response items because traditionally that is the only way to ensure a high marking reliability (ie that each student gets the grade he or she deserves). This is because assessment practices are dominated by traditional testing theories—namely Classical Testing Theory, Item Response Theory and Generalisability Theory—which assume and require that knowledge and skills be fragmented into atomistic items that can be objectively scored. So long as traditional testing theories continue to dominate assessment then the only way to increase accuracy will be to design papers that are even more fragmented and even more removed from assessing the very conceptual knowledge and higher order thinking skills we value and need.

Recommendation for Action

5. A new method has emerged that offers an alternative to traditional testing theories. The method, called Adaptive Comparative Judgement (see paragraph 6 for details), is based on a long-standing and well-established psychological principle, Thurstone’s Law of Comparative Judgement, and is now viable for large-scale assessment due to recent technological innovations. Unlike traditional testing theories, ACJ is based on expert judgements of authentic evidence of students’ conceptual knowledge and higher order thinking skills. Moreover, it surpasses traditional methods in terms of reliability and validity. In other words, we can now assess the learning we value and need more accurately than we currently assess the learning we do not value or need.


6. Assessment using Adaptive Comparative Judgement (ACJ) involves presenting examiners with pairs of candidates’ work and asking them to decide which of the two demonstrates a better understanding of the subject. Thurstone’s Law of Comparative Judgement states that people are very reliable when deciding which of a pair of objects has more of a given quality such as “understanding of the subject”, and ACJ exploits this principle to construct stable and robust rank orders of candidates’ work. The normal procedures for assigning grades to candidates can then be undertaken, and as with traditional methods this can be criterion referenced or norm referenced as preferred. Previous studies have demonstrated that ACJ (i) is highly reliable and robust; (ii) works with evidence of conceptual knowledge and higher order thinking skills; and (iii) can be as time and cost efficient as current practices.
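
By way of illustration only, the step from paired judgements to a rank order might be sketched as follows. This is not the software used in any ACJ study. It assumes a Bradley-Terry model (a logistic relative of Thurstone's model) fitted by a simple iterative update, and it omits the adaptive selection of pairs altogether. The function name rank_scripts, the (winner, loser) input format and the stabilising "phantom script" are all assumptions made for this sketch.

```python
from collections import defaultdict


def rank_scripts(judgements, n_iter=500):
    """Order scripts from strongest to weakest given (winner, loser) judgements.

    Illustrative Bradley-Terry fit; names and input format are hypothetical.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)  # number of judgements each script won
    met = defaultdict(int)   # met[(a, b)]: how often a and b were compared
    for winner, loser in judgements:
        wins[winner] += 1
        met[(winner, loser)] += 1
        met[(loser, winner)] += 1

    # Every script starts with the same estimated strength.
    strength = {s: 1.0 for s in scripts}
    for _ in range(n_iter):
        updated = {}
        for s in scripts:
            # Assumption: each script also has one virtual win and one virtual
            # loss against a phantom script of strength 1. This keeps the
            # estimates finite when a script wins or loses every judgement.
            denom = 2.0 / (strength[s] + 1.0)
            for t in scripts:
                if t != s and met[(s, t)]:
                    denom += met[(s, t)] / (strength[s] + strength[t])
            updated[s] = (wins[s] + 1.0) / denom
        strength = updated

    return sorted(scripts, key=lambda s: strength[s], reverse=True)
```

Calling rank_scripts with a list such as [("A", "B"), ("A", "C"), ("B", "C")] returns the script labels ordered from strongest to weakest. Grade boundaries would then be applied to that order by the normal awarding procedures.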

November 2011

Prepared 2nd July 2012