APPENDIX 3
NATIONAL MONITORING BY COHORT SAMPLING: HOW
IT WORKS
An approach to national monitoring that uses
cohort sampling has numerous advantages compared with testing
whole cohorts of students. The techniques of cohort sampling
are well established and are used in international comparisons
of student performance such as the PISA and TIMSS projects.
Cohort sampling was also used in this country from the mid-1970s
through the 1980s by the Assessment of Performance Unit (APU)
within DfES. An explanation of the approach used by the APU will
serve to illustrate the workings of national monitoring by cohort
sampling.
The APU was set up within DfES in 1975. Its
brief was to promote the development of methods of assessing and
monitoring the achievement of children at school and to identify
the incidence of underachievement.
The actual monitoring was contracted out. The
National Foundation for Educational Research (NFER) was contracted
to carry out the monitoring of mathematics, language and foreign
languages; a consortium from Leeds University and King's College
London was contracted to monitor science; whilst Goldsmiths College
was contracted to monitor technology. Surveys of samples of students
aged 11 began in 1978 and continued until 1988. Surveys of students
aged 13 ran from 1980 until 1985, and surveys of students aged 15
ran from 1978 until 1988. Table 1 gives the subject details and
the specific dates of the APU surveys.
Table 1
APU SURVEYS BY SUBJECT, DATE AND AGE OF STUDENTS

Subject             | Age 11        | Age 13  | Age 15
--------------------|---------------|---------|--------------
Mathematics         | 1978-82, 1987 |         | 1978-82, 1987
Language            | 1979-83, 1988 |         | 1979-83, 1988
Science             | 1980-84       | 1980-84 | 1980-84
Foreign Languages   |               | 1983-85 |
Design & Technology |               |         | 1988
The approach of the APU was to have a light sampling of schools
and a light sampling of pupils within schools. Thus, in the case
of the mathematics surveys in England, a sample of 10,000 students
(about 1.5% of the population) was used. Each student was given a
written test (students did not all have the same written test),
and sub-samples of 2,000-3,000 were also given other assessments
such as attitude questionnaires or practical mathematics tests.
A linking and scaling structure was built into the written tests
so that all students could be placed on a common scale. The
structure is a cartwheel design in which each group of common
items appears in two of the tests. Table 2 illustrates this
structure.
Table 2
LINKING STRUCTURE OF WRITTEN TESTS

Group of test items | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6
--------------------|--------|--------|--------|--------|--------|-------
A                   | A      |        |        |        |        | A
B                   | B      | B      |        |        |        |
C                   |        | C      | C      |        |        |
D                   |        |        | D      | D      |        |
E                   |        |        |        | E      | E      |
F                   |        |        |        |        | F      | F
With reference to Table 2, although each student takes just
one of the tests, the common items shared between each pair of
adjacent tests mean that the performance of students across the
whole six tests can be placed on a common scale.
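The mechanics of this linking can be illustrated in code. The sketch
below is a hypothetical demonstration, not the APU's actual procedure:
it encodes the six-test cartwheel of Table 2, simulates mean scores on
each item group using invented difficulties and sample abilities, and
chains simple mean equating from test to test to place every sample on
the scale of Test 1. The real surveys used IRT scaling rather than mean
equating.

    import random

    random.seed(1)

    # The cartwheel of Table 2: six tests, each carrying two of the six
    # item groups, with every pair of adjacent tests sharing one group.
    TESTS = {1: ("A", "B"), 2: ("B", "C"), 3: ("C", "D"),
             4: ("D", "E"), 5: ("E", "F"), 6: ("F", "A")}

    # Invented group difficulties and mean abilities of each test's sample.
    DIFFICULTY = {"A": 0.0, "B": 1.5, "C": -0.5, "D": 2.0, "E": 0.5, "F": -1.0}
    SAMPLE_ABILITY = {1: 50.0, 2: 51.0, 3: 49.5, 4: 50.5, 5: 48.0, 6: 52.0}

    def observed_mean(test, group, n=2000):
        """Simulated mean score of one test's sample on one item group."""
        mu = SAMPLE_ABILITY[test] - DIFFICULTY[group]
        return sum(random.gauss(mu, 5) for _ in range(n)) / n

    means = {(t, g): observed_mean(t, g) for t, gs in TESTS.items() for g in gs}

    # Chain the links: the group shared by tests t and t+1 has the same
    # difficulty in both, so the difference in its observed means estimates
    # the ability difference between the two samples.
    offset = {1: 0.0}                      # everything relative to Test 1
    for t in range(1, 6):
        shared = TESTS[t][1]               # second group of t opens test t+1
        offset[t + 1] = offset[t] + means[(t + 1, shared)] - means[(t, shared)]

    for t in sorted(offset):
        true = SAMPLE_ABILITY[t] - SAMPLE_ABILITY[1]
        print(f"Test {t}: estimated offset {offset[t]:+5.2f} (true {true:+5.2f})")

Because Test 6 also shares group A with Test 1, the chain closes into a
wheel, which gives the design a built-in consistency check on the linking.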
It is by this design that wider coverage of the curriculum
can be assessed than is possible with any single test, and this
can be achieved without placing an undue burden on individual schools
and students. Furthermore, the approach enables students' performance
to be monitored in those areas of the curriculum in which it is
impracticable to test a whole cohort, such as practical mathematics.
This can be achieved by setting assessments in these areas for small
sub-samples of students.
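A light two-stage draw of this kind is straightforward to express in
code. The sketch below is purely illustrative: the sampling frame,
school sizes and sampling fractions are all invented, and a real survey
would also stratify schools by type and region and apply sampling
weights.

    import random

    random.seed(2)

    # Invented sampling frame: 400 schools, each with a cohort of pupils.
    frame = {f"school_{i:03d}": [f"pupil_{i:03d}_{j:03d}"
                                 for j in range(random.randint(20, 120))]
             for i in range(400)}

    def two_stage_sample(frame, n_schools, pupils_per_school):
        """Stage 1: a light sample of schools.
        Stage 2: a light sample of pupils within each selected school."""
        schools = random.sample(sorted(frame), n_schools)
        return {s: random.sample(frame[s], min(pupils_per_school, len(frame[s])))
                for s in schools}

    sample = two_stage_sample(frame, n_schools=60, pupils_per_school=25)
    pupils = [p for chosen in sample.values() for p in chosen]
    cohort = sum(len(v) for v in frame.values())
    print(f"{len(sample)} schools, {len(pupils)} pupils "
          f"({100 * len(pupils) / cohort:.1f}% of a cohort of {cohort})")

    # Practical assessments go to a small sub-sample only, echoing the
    # APU pattern of 2,000-3,000 practical tests within a 10,000 sample.
    practical = random.sample(pupils, k=len(pupils) // 4)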
THE ADVANTAGES
The approach of cohort sampling combined with a linking and
scaling structure for the tests offers numerous advantages for
national monitoring.
1. As the approach involves a light sampling of schools and
a light sampling of students within schools, it reduces the testing
burden on schools and students compared with the present regime.
2. Within this approach, schools and students have anonymity;
the testing is low stakes and thus should have minimal adverse
impact upon the curriculum.
3. It is possible to test a wide coverage of the curriculum.
4. It is possible to have a range of assessment formats;
for example, practical aspects of the curriculum can be assessed.
5. Test items can be used repeatedly over time.
6. Items can be replaced without the need to develop whole
new tests.
7. It is relatively inexpensive.
8. The outcomes give a good indication of trends in performance.
9. It is a tried and tested method that has been used
in this country and is still being used in surveys of performance
for international comparisons.
THE DISADVANTAGES
There are some limitations to this approach.
1. It does not give ratings for individual schools.
2. With light sampling of pupils, it is difficult to give
feedback to individual schools.
3. The linking and scaling are based on Item Response Theory
(IRT), the statistics of which can be difficult to interpret.
A simple reporting scale would need to be developed that is adhered
to and understood by all. An example of how this might be achieved
can be seen in international assessment projects such as TIMSS,
PISA and PIRLS.
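PISA and TIMSS, for instance, report IRT results on a scale anchored
to a mean of 500 and a standard deviation of 100 in a baseline survey,
which is far easier to communicate than raw logits. A minimal sketch of
such a rescaling, using invented ability estimates, follows.

    import statistics

    # Invented ability estimates in logits, as an IRT scaling might produce.
    logits = [-1.8, -0.9, -0.4, 0.0, 0.3, 0.7, 1.1, 1.6, 2.2]

    # Anchor the reporting scale in the baseline year: mean 500, SD 100.
    # Later surveys reuse these two constants, so movement on the reported
    # scale reflects genuine trends rather than rescaling.
    mean, sd = statistics.mean(logits), statistics.stdev(logits)
    reported = [round(500 + 100 * (x - mean) / sd) for x in logits]
    print(reported)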
June 2007