Select Committee on Children, Schools and Families Minutes of Evidence


APPENDIX 3

NATIONAL MONITORING BY COHORT SAMPLING: HOW IT WORKS

  An approach to national monitoring that uses cohort sampling has numerous advantages over testing whole cohorts of students. The techniques of cohort sampling are well established and are used in international comparisons of student performance such as the PISA and TIMSS projects. Cohort sampling was also used in this country from the mid-1970s through the 1980s by the Assessment of Performance Unit (APU) within the DfES. An explanation of the approach used by the APU will serve to illustrate the workings of national monitoring by cohort sampling.

  The APU was set up within the DfES in 1975. Its brief was to promote the development of methods for assessing and monitoring the achievement of children at school, and to identify the incidence of underachievement.

  The actual monitoring was contracted out. The National Foundation for Educational Research (NFER) was contracted to carry out the monitoring of mathematics, language and foreign languages; a consortium from Leeds University and King's College London was contracted to monitor science; whilst Goldsmiths College was contracted to monitor technology. Surveys of samples of students aged 11 began in 1978 and continued until 1988; surveys of students aged 13 ran from 1980 to 1985; and surveys of students aged 15 ran from 1978 to 1988. Table 1 gives the subject details and the specific dates of the APU surveys.

Table 1

APU SURVEYS BY SUBJECT, DATE AND AGE OF STUDENTS


Subject                Age 11           Age 13      Age 15

Mathematics            1978-82, 1987    -           1978-82, 1987
Language               1979-83, 1988    -           1979-83, 1988
Science                1980-84          1980-84     1980-84
Foreign Languages      -                1983-85     -
Design & Technology    -                -           1988


  The approach of the APU was to use a light sampling of schools and a light sampling of pupils within schools. Thus, in the case of the mathematics surveys in England, a sample of 10,000 students (about 1.5% of the population) was used. Each student was given a written test (students did not all take the same written test), and sub-samples of 2,000-3,000 were also given other assessments, such as attitude questionnaires or practical mathematics tests. A linking and scaling structure was built into the written tests so that all students could be placed on a common scale. The structure is a cartwheel design in which each group of common items appears in two tests. Table 2 illustrates this structure.
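A light two-stage sample of this kind can be sketched as follows. The school and pupil counts below are purely illustrative, not the APU's actual sampling frame:

```python
import random

def two_stage_sample(schools, n_schools, n_pupils, seed=0):
    """Light two-stage sample: first draw schools at random,
    then draw pupils at random within each chosen school.

    `schools` maps a school id to its pupil roll (a list of pupil ids).
    """
    rng = random.Random(seed)
    chosen_schools = rng.sample(list(schools), n_schools)
    sample = {}
    for school in chosen_schools:
        roll = schools[school]
        sample[school] = rng.sample(roll, min(n_pupils, len(roll)))
    return sample

# Illustrative frame: 100 schools of 120 pupils each.
frame = {f"school{s:03d}": [f"pupil{s:03d}_{p:03d}" for p in range(120)]
         for s in range(100)}

# Draw 10 schools and 25 pupils from each: 250 pupils in total,
# so no single school carries more than a light testing burden.
sample = two_stage_sample(frame, n_schools=10, n_pupils=25)
total = sum(len(pupils) for pupils in sample.values())
```

Because only a small fraction of pupils is drawn from each selected school, the burden on any one school stays small even when the national sample is large.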

Table 2

LINKING STRUCTURE OF WRITTEN TESTS


Group of test items    Test 1    Test 2    Test 3    Test 4    Test 5    Test 6

A                      A         A
B                                B         B
C                                          C         C
D                                                    D         D
E                                                              E         E
F                      F                                                 F


  With reference to Table 2, although each student takes just one of the six tests, the chain of common item groups linking pairs of tests means that the performance of students across all six tests can be placed on a common scale.

  This design allows a wider coverage of the curriculum to be assessed than is possible with any single test, without placing an undue burden on individual schools and students. Furthermore, the approach enables students' performance to be monitored in areas of the curriculum in which it is impracticable to test a whole cohort, such as practical mathematics; this can be achieved by setting assessments in these areas for small sub-samples of students.

THE ADVANTAGES

  The approach of cohort sampling combined with a linking and scaling structure for the tests offers numerous advantages for national monitoring.

    1.  Because the approach uses a light sampling of schools and a light sampling of students within schools, it reduces the testing burden on schools and students compared with the present regime.

    2.  Within this approach, schools and students are anonymous; the testing is low stakes and should therefore have minimal adverse impact on the curriculum.

    3.  A wide coverage of the curriculum can be tested.

    4.  A range of assessment formats can be used; for example, practical aspects of the curriculum can be assessed.

    5.  Test items can be used repeatedly over time.

    6.  Items can be replaced without the need to develop whole new tests.

    7.  It is relatively inexpensive.

    8.  The outcomes give a good indication of trends in performance.

    9.  It is a tried and tested method that has been used in this country and is still used in international comparative surveys of performance.

THE DISADVANTAGES

  There are some limitations to this approach.

    1.  It does not give ratings for individual schools.

    2.  With light sampling of pupils, it is difficult to give feedback to individual schools.

    3.  The linking and scaling are based on Item Response Theory (IRT), the statistics of which can be difficult to interpret. A simple reporting scale would need to be developed that is understood and adhered to by all. Examples of how this might be achieved can be seen in international assessment projects such as TIMSS, PISA and PIRLS.
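  The kind of simple reporting scale mentioned in point 3 can be illustrated with the Rasch (one-parameter IRT) model. The mean of 500 and standard deviation of 100 below follow the convention used by projects such as PISA, and are given purely as an example:

```python
import math

def rasch_probability(ability, difficulty):
    """Probability, under the Rasch model, that a student of the given
    ability (in logits) answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def to_reporting_scale(logit, mean=500.0, sd=100.0):
    """Map a logit ability onto a simple reporting scale, assuming the
    logit distribution has been standardised to mean 0 and sd 1."""
    return mean + sd * logit

# A student one logit above an item's difficulty answers it correctly
# about 73% of the time.
p = rasch_probability(ability=1.0, difficulty=0.0)

# On the illustrative scale, the same student scores 600.
score = to_reporting_scale(1.0)
```

Reporting in scale points rather than logits lets the public follow trends without needing to interpret the underlying IRT statistics.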

June 2007







© Parliamentary copyright 2008
Prepared 13 May 2008