Education Committee

Written evidence submitted by the Assessment and Qualifications Alliance (AQA)

1. About AQA

1.1 AQA is an education charity and the leading provider of GCSEs and A-levels. We offer a portfolio of general qualifications (65 A-levels; 64 GCSEs) to suit different teacher and student needs.

1.2 Examination fees1 are overseen by AQA’s Council, an independent body of trustees nominated by education stakeholder groups (school leaders, teachers’ unions, employers and Higher Education). Fees are at a flat rate for almost all subjects.2 Although some subjects aren’t commercially viable at this price (eg smaller languages and sciences), we believe in offering teachers choice based on their students’ needs.

1.3 Any surplus is reinvested to develop the next round of specifications and associated education services.

2. Multiple Awarding Bodies (ABs)

2.1 Having multiple ABs3 promotes several benefits: innovation in assessment and service; risk reduction; choice; accountability to teachers and students; and the embedding of qualification development in the education community.

2.2 ABs thrive through innovation in response to market needs. Recent examples include the provision of: on-line data, giving sophisticated diagnostic feedback to teachers and students; electronic as well as paper results; the AQA baccalaureate with its on-line diary tool;4 and on-screen testing. A monopoly position discourages innovation, which would be viewed merely as risk; in a competitive market there are strong incentives. ABs take no risks in the awarding of grades but take managed risks in striving to improve ancillary services to teachers.

2.3 Competition reduces risk by driving up quality. Further, multiple ABs allow risk to be shared and spread. In the unlikely event of operational failure, not all students would be affected and another AB could intervene. The upheaval associated with changing systems or mergers poses a significant threat to the safe delivery of examinations.5

2.4 Dissatisfied teachers can switch between ABs, thus driving inferior provision out of the market. Teachers can switch easily, especially when specifications are revised (normally every five years). There is no “tie in” and standardised administrative arrangements facilitate switching.6 This creates strong incentives to reduce costs and improve service.

2.5 While some fundamental aspects of general qualifications (eg articulation of their primary purpose) sit best with government, the subtlety of content and approach sits best closer to the education community. Historically, ABs have grown out of Higher Education and subject communities,7 partly explaining the relatively high levels of satisfaction with GCSEs and A-levels amongst teachers and other stakeholders.8

2.6 There is an enduring collaborative relationship between the education community and ABs. ABs support grassroots developments, such as new qualifications9 and new specifications,10 thus enriching the assessment system. They take different perspectives on content depending on their relationships with different learned bodies, universities and expert groups, while the regulatory requirement to demonstrate equivalence of demand ensures comparability of rigour. Thus, teachers can select approaches tailored to the needs and interests of their students.11 Moreover, ABs support the development of subject-specific initiatives which nurture a dynamic subject curriculum. For example, AQA supports the Partnership in English Group, bringing together diverse education and arts bodies12 for the benefit of English teaching.

3. Franchising as an Alternative

3.1 Franchising is often introduced into monopolistic markets to create a proxy for competition. The benefits of introducing franchising into a market that already benefits from competition are unclear. While franchising has advantages over a single-AB system, it has none over the current system and would make the market less efficient. Significant disadvantages of franchising ABs to offer qualifications in particular subjects13 are outlined below.

3.2 AB expertise consists of subject and operations knowledge. Franchising would require this expertise to move between franchisees or be re-established from a potentially low level as each franchise was granted. Experience from national curriculum testing14 indicates that, where a contract moves between franchisees, either disruption ensues as staff are transferred or substantial TUPE/redundancy costs are incurred, and significant delivery risk is introduced. Contracted markers and subject experts can move around more easily, although not all would move, and recruiting Senior Examiners is difficult and costly.15 Over time this would increase running costs and diminish the stock of assessment expertise available. Additionally, on each tendering process, the incumbent franchisee would be at an advantage over competitor franchisees in not having to bear these costs.

3.3 Currently, small organisations can enter the market, building up entries for their qualifications over time as they gain experience. However, as evident in national curriculum testing in 2008, entry into a substantial assessment market involves heavy start-up costs and risk of reputational damage. Developing, administering and awarding qualifications require a complex, interdependent structure, which doesn’t flex easily according to contracts won and lost. The need to flex would, therefore, leave the system open to very high levels of risk. Costly contingencies, such as those put in place for national curriculum testing, would be needed to mitigate the risk of delivery failure.

3.4 Bidding and contracting costs would be substantial and would need to be weighed against any likely savings. These costs arise from the need for public and explicit processes, legal teams to negotiate contracts, and contract management to ensure obligations are met. They would be likely to act as a barrier to small providers entering the market, and the cost of establishing such a system would be high.

3.5 Moreover, franchising wouldn't necessarily drive quality during the contract's lifetime. It could support a culture of suppliers meeting contractual requirements but no more. Franchisees would have little incentive to innovate over the comparatively long term of the franchise, up to the point of considering whether to re-tender. Competitive tendering can also encourage potential franchisees to downplay estimated costs or overstate the intended level of quality. Once the franchise has been awarded and the agreement entered into, the true cost and quality of the contract emerge, leaving the franchisor to choose between remaining tied into a lengthy, unattractive agreement and incurring the considerable expense of re-tendering.

3.6 Finally, in the current multi-AB system, teachers make the decision as to which AB to purchase services from. In franchising, this choice would be removed from teachers and given to a centralised government body with less direct knowledge of teacher and student needs, risking a loss of oversight and accountability.

4. Quality of AB Processes

4.1 While any process contains room for improvement, the quality of assessment at GCSE and A-level will be improved not by changing AB structure, but by increased research and innovation, including attention to international comparisons and hence competitiveness.

4.2 AQA’s question paper production, marking and awarding processes are documented, transparent, and follow the Regulators’ Code of Practice.16 There is, however, public concern that ABs compete by lowering standards and are delivering lower-quality products because they are focused on other activities, eg publishing. We shall address these concerns below.

5. Question Papers

5.1 In 2011 AQA produced 1,365 question papers and mark-schemes, involving over 1,000 Senior Examiners and advisers.

5.2 These are subject to a detailed, auditable process of revision overseen by the Subject Manager. Production involves at least four subject experts, including practising teachers and lecturers: Chair, Chief (who will also be a Principal Examiner (PE)), Reviser and Scrutineer.17 Extensive training and peer support are provided to the Senior Examining teams.18 Including the Subject Manager and a Proof Reader, at least six people proof-read and check every paper.19

5.3 The PE produces a draft paper and documents how it meets AQA’s quality assurance criteria. It then goes to a Reviser, whose comments go to the Question Paper Evaluation Committee (QPEC) comprising the subject’s Senior Examining team. The QPEC rigorously reviews each paper. Following the meeting, a Scrutineer checks and works through the paper. The Chief and Chair give final approval, and further quality assurance checks are made before printing. External printers are appointed following due diligence and enter into a contract covering all aspects of quality and security.

5.4 Post examination, statisticians produce detailed analyses of paper effectiveness (including measures of reliability, question difficulty and discrimination).20 These analyses, together with teachers’ comments and Senior Examiners’ experience of how students answered the questions, are used to inform the setting of subsequent papers.21 Hence, an effective cycle of feedback and continuous improvement is established. Despite claims to the contrary, evidence suggests that question papers are high quality and not unduly predictable.22

5.5 In 2011, two AQA papers contained printing errors and three questions contained significant errors that impeded students’ ability to respond (0.01% of the total number of questions set). AQA’s investigation into these errors concluded that there was no systemic failure.23 Paper production is susceptible to human error which we work hard to minimise. Where errors occur, we have effective methods to ensure that students are not disadvantaged. These include examiner allowances, marking and awarding adjustments and special consideration provisions.24 We are also investigating the possible benefits of the innovative use of technology in this area, including producing papers using a secure on-line system and a question-authoring approach.25 While AQA pre-tests some questions, experience shows that routine pre-testing isn’t the most effective, economical approach to error-identification. Errors can be reduced by the proper functioning of the scrutiny process. We are reinforcing this by providing enhanced clarity regarding the Scrutineer’s role, and by strengthening the links between the Scrutineer and the other members of the Senior Examining team.

6. Marking

6.1 Marking ranks scripts so that grade boundaries can be applied. Despite contrary perceptions, the marking reliability of general qualifications compares favourably with qualifications internationally.26 This is so even for assessments requiring relatively subjective judgements, eg English.27 Marking reliability could be improved by requiring less extended writing from students, but this would undermine assessment validity.28

6.2 Overall responsibility for marking, which is conducted by teams of examiners, lies with the PE. Examiners must normally have a relevant degree or equivalent, and at least three terms’ teaching experience.29

6.3 Following the examination, the PE identifies and marks a sample of scripts for quality assurance purposes. These are used to train examiners to mark at the same standard as the PE (standardisation) and as comparators for monitoring examiners’ ongoing marking.

6.4 Standardisation takes place on-line where possible, ensuring communication of the correct standard directly from PE to examiner.30 Where standardisation is face-to-face, the PE trains all examiners except where entries are large. Here the PE standardises assistant PEs/Team Leaders. They use materials marked by the PE to ensure the standard is communicated accurately.

6.5 On-screen marking is done question-by-question (rather than script-by-script) which improves reliability.31 Monitoring is through questions which the PE has already marked (“seeds”) being included in examiners’ allocations. Examiners are unaware which questions are seeds, allowing marking to be monitored throughout the process, in real time, identifying any deterioration in quality. Monitoring of paper-based marking is through sampling. AQA is rolling out on-screen marking for its examinations, supported by a programme of research evaluation.32

6.6 If an examiner’s marking is consistently severe or lenient and they cannot be re-standardised, their marks are adjusted. If, however, the examiner is inconsistent, they will be stopped and the scripts re-marked.

6.7 If there is doubt about an examiner’s marking, a partial or full re-mark will be undertaken by Senior Examiners. All examiners are classified based on the quality of their marking and administration. Reappointment is based on that classification.

6.8 Teachers may request a re-mark should they be dissatisfied with a student’s mark. While the number of re-mark requests increases year-on-year, evidence suggests this is due to the increasingly high-stakes nature of the qualifications, not deterioration in marking quality.33 Surveys of the users of general qualifications suggest high levels of satisfaction with the reliability of the marking and grading process.34

7. Grading

7.1 After marking, grade boundaries are set to compensate for small year-on-year fluctuations in difficulty (eg a slightly more difficult paper requires slightly lower grade boundaries).35, 36 Boundaries are set by expert Senior Examiners, supported by technical staff, using statistical and judgemental evidence. This blend of evidence is essential to maintaining standards over time.37

7.2 Statistical analyses measure how the general ability of a cohort of students compares with that of the previous year. For example, in setting A-level boundaries, the average GCSE performance of the students is compared with that of the previous year’s students to help predict likely outcomes. Predictions take into account any year-on-year fluctuations in national GCSE results and are made across large cohorts, making them highly reliable. Crucially, predictions are based on national outcomes and therefore align AB standards.38 Other forms of technical evidence used include comparisons of the outcomes of schools and colleges common to the two years, re-sitting rates and the performance of re-sitters, and teachers’ estimated grades.

7.3 The analyses provide a starting place for Senior Examiners to scrutinise the students’ work. Examiners review the students’ work on the previous year’s grade boundary, compare it with the performance of this year’s students, and so select the most appropriate grade boundary, taking into account any changes in the demand of the question papers. The complexity of this task makes the statistical input essential.39

7.4 Aspects of student performance are carefully documented, particularly if it is believed that there has been a fall or rise in performance unsupported by the statistical evidence. Indeed, changes in outcomes that deviate from statistical predictions by more than 1% trigger an even more robust investigation of both student performance and the reliability of the statistics. Occasionally this involves cross-reference to other ABs’ awarding experiences; an advantage of multiple ABs is the opportunity for illuminating comparison.40

7.5 Joint Council for Qualifications post-award analyses identify any potential misalignment of inter-AB standards. If misalignment is suggested, research is undertaken before the next award so as to understand the causes and achieve future alignment.41 This research uses increasingly sophisticated student-level data, methods and analyses. Hence, the belief that teachers gain advantage by switching to “easier” ABs is unfounded;42 differences in demand are compensated for in the setting of boundaries, which is continually reviewed.

7.6 The progressively high-stakes nature of general qualifications43 partly explains the year-on-year increases44 in results, which amount to a handful of extra students in each school exceeding the grade boundary each year. Teachers focus intensively on supporting borderline students,45 aided by the increased availability of mark-schemes, past papers, information and support, and transparency as to the skills and knowledge required and how to demonstrate them. Examinations focus on tightly defined sets of skills and knowledge. As education becomes increasingly centred on passing examinations, outcomes go up while other measures of learning, such as those measured by international surveys, go down.46

8. Commercial Activities

8.1 AQA works with numerous subject and assessment experts who wish to develop and share expertise through high quality training, resources and support to teachers. AQA’s commercial activities extend to resources considered to be educationally robust, but optional. Resources considered fundamental to teaching a specification are made freely available. The majority of AQA resources are free.

8.2 We offer training courses to support the development of knowledge and expertise specifically in the context of specifications, and more generally across the key stages. The former are free when specifications change or for schools new to a specification.47 Some courses include feedback on examination performance and are designed to improve subject teaching with a strong focus on practical strategies. However, when Senior Examiners are used as trainers, they are contractually prohibited from using any material that would confer unfair advantage. Teachers’ feedback indicates that they find this support invaluable.

8.3 Other free support includes: teaching aids, eg schemes of work and resource lists; examples of marked work with examiners’ reports on student performance; personalised support from coursework advisers; and local network meetings for examinations officers. AQA also provides schools with subject advisers to give specification advice. For example, recognising the high proportion of non-specialists within the mathematics teaching community,48 we provide every school with a dedicated teaching and learning adviser. Teachers use this free service to ensure students are taught the right content at the right level, and to explore innovative ways of delivering the subject.

8.4 AQA doesn’t publish textbooks. It works with a range of publishers to quality-assure endorsed resources for specifications, which helps restrain the most misleading market provision. Endorsement acts as a Kite-mark; quality assurance ensures the text accurately interprets the specification and assessment arrangements. Specifications are, however, sufficiently detailed that there is no need to buy texts to understand what is required. Examiners are leading subject experts and usually practising teachers, so are well-placed to act as authors. However, contractually, they mustn’t make use of their association with AQA for commercial purposes or identify themselves as AQA examiners in texts. In our experience teachers and students value a mixture of texts, some written for specifications and containing examination advice, and some more general.

8.5 AQA offers free Teacher Resource Banks (eg All About Maths) which can be used to supplement texts. Further, through Teachit49 we provide an on-line forum for teachers to share best practice. Teachit offers on-line support materials produced by teachers for teachers. PDFs which provide content equivalent to textbook material are free; only interactive materials are chargeable. These materials are never AB-specific. They reduce the pressure on schools to purchase new texts after significant specification revisions or when changing AB, and so are cost effective.

9. Conclusion

9.1 A different AB structure wouldn’t solve the perceived problems with assessment quality and cost. A multiple-AB structure which encourages competition and innovation is most likely to improve quality and efficiency. Indeed, evidence suggests that a different AB structure would produce problems of its own. However, there are opportunities for improvement and for working with schools and colleges to eliminate unnecessary costs. We would of course be keen to discuss these and any related issues.

November 2011

1 AQA’s fees for a GCSE in 2011-12 are £28.10 or £28.15, and £75.10 for a four unit A-level; understanding the financial pressure on school budgets, the last increase in fees was below inflation at 1.2%, compared to the current RPI of c 5%.

2 For a very small number of subjects costs are significantly higher, and this is reflected in the fee.

3 The implications of a single awarding body structure for England, while the Welsh awarding body WJEC can offer qualifications in England, need consideration.

4 A record of students’ extra-curricular achievements.

5 See: McCaig, C (2003). School Exams: Leavers in Panic. Parliamentary Affairs, 56(3), 471-48.

6 With a single AB structure, boycotting would be a last resort for teachers anxious to signal dissatisfaction, which would of course have a negative impact on pupils.

7 See: AQA. (2003). Setting the Standard: A Century of Public Examining by AQA and its Parent Boards. Nottinghamshire: Linney Print.

8 See: Ipsos MORI. (2010). Perceptions of A-levels and GCSEs—Wave 8. England: Ofqual.

9 For example, Foundation Certificates in Secondary Education which promote uptake and progression to GCSE in subjects such as Modern Foreign Languages.

10 For example, the Nuffield Foundation 21st Century Science suite of GCSEs.

11 There is very little hard evidence of teachers choosing ABs because they are “easy”, as is sometimes claimed; see for example: Malacova, E & Bell, J (2006). Changing Boards: investigating the effects of centres changing their specifications for English GCSE. The Curriculum Journal, 17(1), 27-35.

12 For example, the Book Trust, the National Association of Writers in English, and the Queen’s English Society.

13 To be attractive, small entry subjects (e.g. A-level Latin) would need to be bundled with large entry subjects (e.g. A-level English Literature), and ideally GCSEs and A-levels in the same subject would be bundled so as best to support progression. Nonetheless, this would undermine a cross-curricular approach and learning from good practice in other subject areas.

14 Also known as SATs (Standard Assessment Tests).

15 New Senior Examiners also require significant training in assessment techniques.

16 Ofqual, DCELLS & CCEA. (2011). GCSE, GCE, principal learning and project code of practice.

17 See the Ofqual Code of Practice for definitions of these roles.

18 See AQA support materials: AQA. (2011). Writing and revising question papers and mark-schemes for GCSE and A-level specifications; Chamberlain, S (2009a). The essentials of good question writing; Chamberlain, S (2009b). A guide to writing examination questions: theoretical approaches and practical solutions; Pinot de Moira, A. (2011). Effective discrimination in mark-schemes; Spalding, V (2010). Structure and formatting examination papers: examiners’ views of good practice; Spalding, V (2009). Is an exam paper greater than the sum of its parts? A literature review of question paper structure and presentation.

19 Senior Examiners are on one-year contracts, ensuring that any issues around quality can be addressed promptly.

20 Stockford, I, Eason, S, and Taylor, R (2010). Question Paper Functioning Reports. AQA report.

21 As recommended by: Pollitt, A, Ahmed, A, Baird, J, Tognolini, J, and Davidson, M (2008). Improving the quality of GCSE assessment. QCA report.

22 Ofqual. (2008). Predictability studies report: A study of GCE and GCSE examinations.

23 The full report can be found at:


25 Rather than a single PE being responsible for an entire paper, a question authoring approach draws on numerous experts to produce a bank of questions from which papers can be constructed. The overall quality of a paper still rests with a named individual and they are free to focus attention on ensuring that quality, as they are not the sole source of all questions.

26 Meadows, M, and Billington, L (2005). A Review of Literature on Marking Reliability. Report produced for the National Assessment Agency.

27 Fowles, D (2009). How reliable is marking in GCSE English? English in Education, 43(1), 50-57.

28 Newton, P, and Meadows, M (2011). Marking quality within test and examination systems. Assessment in Education: Principles, Policy and Practice, 18(3), 213-216.

29 These requirements are considered more than adequate. See: Meadows, M, and Billington, L (2007). The Effect of Marker Background and Training on the Quality of Marking in GCSE English. Report produced for the National Assessment Agency.

30 Chamberlain, S, and Taylor, R (2010). Online or face to face? An experimental study of examiner training. British Journal of Educational Technology, 42(4), 665-675.

31 Pinot de Moira, A. (2009). Marking reliability & mark tolerances: Deriving business rules for the CMI+ marking of long answer questions. AQA report.

32 See for example: Whitehouse, C. (2010). Reliability of on-screen marking of essays. AQA report.

33 See for example: Taylor, R (2007). The impact of e-marking on enquiries after results. AQA report; and relating to NCTs: Newton, P (2009). National Curriculum Test reviews: Trends over time: 2000-7. Ofqual report.

34 Chamberlain, S. (2010). Public perceptions of reliability. Report produced for Ofqual.

35 Pollitt, A, Ahmed, A, and Crisp, V (2007). The demands of examination syllabuses and question papers. In P Newton, J Baird, H Goldstein, H Patrick, and P Tymms (Eds) (2007). Techniques for monitoring the comparability of examination standards. London: QCA.

36 Full details of the awarding process are set out in AQA’s “A basic guide to standard setting”:

37 See: Newton, P, Baird, J, Goldstein, H, Patrick, H and Tymms, P (Eds) (2007). Techniques for monitoring the comparability of examination standards. London: QCA.

38 Benton, T, and Lin, Y (2011). Investigating the relationship between A-level results and prior attainment at GCSE. Report produced for Ofqual.

39 Stringer, N S (2011). Setting and maintaining GCSE and GCE grading standards: the case for contextualised cohort-referencing. Research Papers in Education, 1-20. Good, F J, and Cresswell, M J (1988). Grade awarding judgements in differentiated examinations. British Educational Research Journal, 14(3), 263-281.

40 In 2008 a new suite of GCSE Science specifications was awarded for the first time. Inter-AB differences in outcomes were the first indication that standards had not been maintained across all specifications. This would have remained undetected for longer in a single AB arrangement.

41 Examples include: Taylor, M (2011). GCSE modern foreign languages review: Setting standards in the new specifications. JCQ STAG Report; Keeling, H, and Spalding, V (2010). Level 3 Extended Project Qualification (EPQ) cross-moderation study. AQA Report; Taylor, M (2008). Report on the inter-AB (AQA and Edexcel) GCSE Italian meeting. AQA Report.

42 Teachers often compare the raw results of ABs without understanding the extent to which the general ability of each AB entry varies.

43 Boyle, B, and Charles, M (2011). Re-defining assessment: the struggle to ensure a balance between accountability and comparability based on a “testocracy” and the development of humanistic individuals through assessment. CADMO: An International Journal of Educational Research, 19(1), 55-65.

44 Such increases are not peculiar to England; see for example: Wikström, C and Wikström, M (2005). Grade inflation and school competition: an empirical analysis based on the Swedish upper secondary schools. Economics of Education Review, 24(3), 309-322.

45 Richmond, T, and Freedman, S (2009). Rising Marks, Falling Standards: An investigation into literacy, numeracy and science in primary and secondary schools. Policy Exchange.

46 See for example: Hodgen, J, Brown, M, Küchemann, D, and Coe, R (2010). Mathematical attainment of English secondary school students: a 30-year comparison. BERA symposium.

47 For example, over the last year we have run more than 140 meetings for GCSE Science teachers and invested more than £500,000.

48 Howe, R (2011). School workforce in England, November 2010 (provisional). DfE report.

49 Teachit is part of the AQA family:

Prepared 2nd July 2012