National curriculum assessments go through a high-quality, robust test development process that takes place over three years. The process is designed to ensure that the tests are valid and reliable measures of the national curriculum. It consists of five phases:
(1) Question development
Test frameworks are developed for all national curriculum assessments. They explain the purpose and structure of the tests and set out exactly what could be assessed from the curriculum. Each test framework includes a content domain, a cognitive domain and a test specification. The test specification sets out the minimum and maximum marks for different areas of the curriculum, so that each test covers the subject with appropriate breadth and depth. The test frameworks are shared with curriculum and education experts, including teachers, throughout development.
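Because the test specification works as a set of mark ranges per curriculum area, checking a draft test against it is a simple range check. The sketch below is illustrative only: the `AreaSpec` class, the `check_specification` function, the area names and all mark ranges are hypothetical and are not taken from any published test framework.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AreaSpec:
    """Hypothetical test-specification entry: allowed mark range for one curriculum area."""
    area: str
    min_marks: int
    max_marks: int


def check_specification(marks_by_area: dict[str, int], spec: list[AreaSpec]) -> list[str]:
    """Return a list of problems; an empty list means every area is within its range."""
    problems = []
    for entry in spec:
        # An area missing from the draft counts as zero marks.
        marks = marks_by_area.get(entry.area, 0)
        if marks < entry.min_marks:
            problems.append(f"{entry.area}: {marks} marks is below the minimum of {entry.min_marks}")
        elif marks > entry.max_marks:
            problems.append(f"{entry.area}: {marks} marks is above the maximum of {entry.max_marks}")
    return problems


# Hypothetical specification and draft test, for illustration only.
spec = [AreaSpec("number", 20, 30), AreaSpec("geometry", 5, 10), AreaSpec("statistics", 3, 6)]
draft = {"number": 32, "geometry": 8, "statistics": 4}
print(check_specification(draft, spec))  # ['number: 32 marks is above the maximum of 30']
```

In practice a specification would also cover the cognitive domain, so a real check would span more than one dimension; this sketch shows only the content-area ranges.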
In the first stage of the test development process, questions (known as items) and mark schemes are written by test development experts on the basis of these test frameworks. These questions are subject to a small-scale trial, after which they are refined.
(2) Expert review and item validation trial with a nationally representative sample of schools
During the next stage of the process, all items undergo expert review by three panels. A range of stakeholders, including teachers, headteachers, curriculum experts and inclusion experts, review each question and give feedback on whether it is suitable for the age group, whether it tests the desired construct and whether it has any accessibility issues. Comments from all three panels are combined and each question is amended accordingly. Any item that is not deemed suitable is withdrawn from the process.
All items are then trialled with a nationally representative sample of schools to determine how they function technically, and whether they perform as they were intended to. Each item is trialled with approximately 300 pupils. Following the item validation trial, the data is internally reviewed with curriculum advisers, further amendments are made if required, and any unsuitable items are removed.
(3) Expert review and technical pre-test with a nationally representative sample of schools
Items which have been successful at the previous stage are then collated into booklets for further extensive trialling and review.
These booklets undergo expert review by teachers, headteachers, curriculum experts and inclusion experts. As with the item-level expert review, experts give feedback on how suitable the items and the overall tests are for the particular age group, whether they test the desired construct and whether there are any issues with accessibility.
The booklets are then trialled more widely with a nationally representative sample of schools, in what is known as a technical pre-test. The technical pre-test is designed to gather reliable statistical data, and each item is trialled with approximately 1,000 pupils. Following the technical pre-test, the data is reviewed internally, further amendments are made if required and any unsuitable items are removed. Only items that are performing well are taken forward for potential selection for the live tests. During both the item validation trial and the technical pre-test, administrators and teachers complete questionnaires about the materials, including reports on pupils' experience. This information is considered alongside the item performance data.
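The text does not say which statistics are used to judge whether an item is "performing well". As one common approach from classical test theory, the sketch below computes each item's facility (mean mark as a proportion of the maximum) and its corrected item-total correlation (how well the item score tracks the pupil's score on the other items). The function names, the acceptance thresholds and the tiny data set are all hypothetical; real trials involve around 300 or 1,000 pupils per item, as described above.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def item_statistics(scores, max_marks):
    """scores[p][i] is pupil p's mark on item i; max_marks[i] is item i's maximum mark.

    Returns one (facility, discrimination) pair per item. Discrimination here is the
    corrected item-total correlation: item score versus total on the remaining items.
    """
    stats = []
    for i, max_mark in enumerate(max_marks):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        stats.append((mean(item) / max_mark, pearson(item, rest)))
    return stats


def select_items(names, stats, min_fac=0.2, max_fac=0.9, min_disc=0.2):
    """Keep items whose statistics fall inside the (hypothetical) acceptance thresholds."""
    return [name for name, (fac, disc) in zip(names, stats)
            if min_fac <= fac <= max_fac and disc >= min_disc]


# Hypothetical trial data: 6 pupils x 3 one-mark items.
names = ["A", "B", "C"]
scores = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
]
stats = item_statistics(scores, [1, 1, 1])
print(select_items(names, stats))  # ['B', 'C']
```

Item A is answered correctly by every pupil, so it tells us nothing about differences in attainment and is dropped; items B and C each have a facility of 0.5 and a positive correlation with the rest of the test. Statistics like these would be one input alongside expert review and the questionnaire feedback, not a replacement for them.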
(4) Live test construction
Following these two rounds of trialling and expert review, the live tests are constructed. The proposed live test is reviewed again by teachers, expert reviewers, inclusion specialists and curriculum advisers before it is confirmed. The mark schemes are finalised from the earlier versions developed alongside the items. Modified versions of the tests, including large print and braille versions, are also developed for pupils with access requirements. The live tests are then administered by schools.
(5) Standard setting or maintenance
Standard setting is an internationally recognised process in which teachers use their professional judgement to determine the mark threshold for the expected standard, drawing on the performance descriptors and on information from the live administration of the tests. The 2016 tests were the first to assess a new national curriculum in which expectations of pupil achievement had been raised; a new expected standard therefore needed to be set. An invitation to express interest was sent to all schools that were statutorily required to take the tests. Each subject had two standard setting panels with between 20 and 30 teachers in each, and across all subjects and key stages 343 teacher panellists took part.
Once the standard was set, a conversion table from raw scores to scaled scores was created. In all subsequent years, the standard that was initially set must be maintained, using an internationally recognised psychometric process.
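The text does not describe how the conversion tables are derived, and the psychometric process used to maintain the standard is not shown here. The sketch below only illustrates how such a table is used once it exists. It assumes, as in the key stage 2 tests, that the expected standard corresponds to a scaled score of 100; the two tables, the 10-mark test and the function names are hypothetical. It shows why a fixed scaled score can represent the same standard even when the raw mark needed to reach it differs between years.

```python
# Hypothetical conversion tables for a 10-mark test (values are illustrative only).
# In year 1 the raw mark chosen at standard setting (6) maps to scaled score 100.
TABLE_YEAR1 = {0: 80, 1: 84, 2: 88, 3: 91, 4: 94, 5: 97, 6: 100, 7: 104, 8: 108, 9: 114, 10: 120}
# Suppose the year-2 test turned out slightly harder, so a lower raw mark (5) maps to 100.
TABLE_YEAR2 = {0: 80, 1: 85, 2: 89, 3: 93, 4: 96, 5: 100, 6: 103, 7: 107, 8: 111, 9: 116, 10: 120}

EXPECTED_STANDARD = 100  # assumed scaled score for the expected standard


def to_scaled(table: dict[int, int], raw: int) -> int:
    """Look up the scaled score for a raw mark, rejecting marks outside the table."""
    if raw not in table:
        raise ValueError(f"raw score {raw} is outside the conversion table")
    return table[raw]


def threshold_raw(table: dict[int, int], standard: int = EXPECTED_STANDARD) -> int:
    """Lowest raw mark whose scaled score reaches the given standard."""
    return min(raw for raw, scaled in table.items() if scaled >= standard)


print(threshold_raw(TABLE_YEAR1), threshold_raw(TABLE_YEAR2))  # 6 5
print(to_scaled(TABLE_YEAR1, 5), to_scaled(TABLE_YEAR2, 5))    # 97 100
```

The same raw mark of 5 falls just below the expected standard in the first year but meets it in the second, because the scaled score, not the raw mark, carries the standard from year to year.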
28 November 2017