Memorandum submitted by Cambridge Assessment
Cambridge Assessment is Europe's largest assessment
agency and plays a leading role in researching, developing and
delivering assessment across the globe. It is a department of
the University of Cambridge and a not-for-profit organisation
with a turnover of around £175 million. The Group employs
around 1,400 people and contracts some 15,000 examiners each year.
Cambridge Assessment's portfolio of activities
includes world-leading research, ground-breaking new developments
and career enhancement for assessment professionals. Public examinations
and tests are delivered around the globe through our three highly
respected examining bodies.
The assessment providers in the Group include:
University of Cambridge English for Speakers
of Other Languages (Cambridge ESOL)
Tests and qualifications from Cambridge ESOL
are taken by over 1.75 million people in 135 countries. Cambridge
ESOL's Teaching Awards provide a route into the English Language
Teaching profession for new teachers and first class career development
opportunities for experienced teachers. Cambridge ESOL works with
a number of governments in the field of language and immigration.
University of Cambridge International Examinations
(CIE)
CIE is the world's largest provider of international
qualifications for 14-19 year-olds. CIE qualifications are available
in over 150 countries. CIE works directly with a number of governments
to provide qualifications, training and system renewal.
The Oxford Cambridge and RSA awarding body (OCR)
provides general academic qualifications and vocational qualifications
for learners of all ages through 13,000 schools, colleges and
other institutions. It is one of the three main awarding bodies
for school qualifications in England.
The Assessment Research and Development division
(ARD) supports development and evaluation work across the Cambridge
Assessment group and administers a range of admissions tests for
entry to Higher Education. The ARD includes the Psychometrics
Centre, a provider and developer of psychometric tests.
A VIEW OF THE SCOPE OF THE INQUIRY
1. At Cambridge Assessment we recognise that it is vital not to approach assessment on a piecemeal basis.
The education system is exactly that: a system. Later experiences
of learners are conditioned by earlier ones; different elements
of the system may be experienced by learners as contrasting and
contradictory; discontinuities between elements in the system
(eg transition from primary to secondary education) may be very
challenging to learners.
2. Whilst understanding the system as a
system is important, we believe that the current focus on 14-19
developments (particularly the development of the Diplomas and
post-Tomlinson strategy) can all too readily take attention away
from the serious problems which are present in 5-14 national assessment.
3. Our evidence tends to focus on assessment
issues. This is central to our organisation's functions and expertise.
However, we are most anxious to ensure that assessment serves
key functions in terms of supporting effective learning (formative
functions) and progression (summative functions). Both should
be supported by effective assessment.
4. We welcome the framing of the Committee's
terms of reference for this Inquiry, which make it clear that
it intends to treat these two areas as substantially discrete.
Cambridge Assessment's qualifications deliverers (exam boards),
OCR and University of Cambridge International Examinations, have
tendered evidence separately to this submission. They have looked
chiefly at 14-19 qualifications.
5. This submission falls into two sections. The first sets out Cambridge Assessment's views on the national assessment framework (for children aged 5-14). These are informed by, but not necessarily limited to, the work which we carried out throughout 2006 in partnership with the Institute for Public Policy Research (IPPR) and the substantial expertise in the Group of those who have worked on national tests.
6. The second section is on University Entrance Tests. Cambridge Assessment has been involved in the development of these for nearly a decade and uses a research base that stretches back even further. At first the tests were limited to Cambridge University, but over the last four years their use has grown to include many other institutions. That they are administered under Cambridge Assessment's auspices (as opposed to those of one of our exam boards) reflects their roots within our research faculty and the non-statutory nature of the tests themselves.
SECTION 1
1. NATIONAL ASSESSMENT ARRANGEMENTS
7. In this section we have sought to outline the problems that have built up around the national assessment arrangements. We have then gone on to discuss the changes proposed in our work with the IPPR. We also discuss the problems that appear to be inherent in the "Making Progress" model that the Government is committed to trialling. Our conclusion is that there is a window of opportunity before us at the present time (just one of the reasons that the Committee's Inquiry is so timely) and that the Government should not close it with the dangerous haste that it seems bent on. There is a range of options, and to pursue only one would be a serious mistake.
8. We have included two Annexes:
an overview of the evidence on national
assessment dealing with questions ranging from "teaching
to the test" to "measurement error"; and
a brief discussion of why the sometimes
mooted return of the APU might not deliver all the objectives
desired of it.
2. DIAGNOSIS OF THE CHALLENGE: CRITIQUE AND REVISION OF NATIONAL ASSESSMENT ARRANGEMENTS
9. It is important to note that Cambridge
Assessment is highly supportive of the principle of a National
Curriculum and related national assessment. The concept of "entitlement"
at the heart of the National Curriculum has been vital to raising
achievement overall; raising the attainment of specific groups
(eg females in respect of maths and science); and ensuring breadth
and balance. We recognise that enormous effort has been put in,
by officials and developers, to improving the tests in successive
years. We support the general sentiment of the Rose Review (that the system has some strong characteristics) but it is clear that deep structural problems have built up over time.
10. Whilst being concerned over these problems,
Cambridge Assessment is committed to the key functions supported
by national assessment: provision of information for formative
and diagnostic purposes to pupils, teachers and parents; information
on national standards, and accountability at school level. We
return to these key functions in more detail below. However, Cambridge
Assessment is critical of the way in which national assessment
has been progressively and successively elaborated into a system
which appears to be yielding too many serious and systemic problems.
Accumulating problems in National Assessment: a vessel full to bursting point?
11. There are two particularly significant
problems in the highly sensitive area of technical development
of national assessment arrangements. Firstly, previous statements
by agencies, departments and Government have exaggerated the technical
rigour of national assessment. Thus any attempt to describe its technical character more accurately runs the risk of undermining both departments and ministers: " . . . if you're saying this now, how is it that you said that, two years ago . . .". This prevents rational debate of problems and scientifically-founded development of arrangements.
Secondly, as each critique has become public, the
tendency is to breathe a sigh of relief as the press storm abates;
each report is literally or metaphorically placed in a locked
cupboard and forgotten.
12. In contrast, we have attempted here to take all relevant evidence and integrate it, synthesising it in such a way that underlying problems and tendencies can accurately be appraised, with the intention of ensuring effective evaluation and refinement of systems.
14. Put simply, if a minister asks a sensible
question: " . . . are attainment standards in English going
up or down and by how much? . . ." there is no means of delivering
a valid and sound response to that question using current arrangements.
This is a serious problem for policy formation and system management.
It is not a position which obtains in systems which use independent
light sampling methods such as the US NAEP (National Assessment
of Educational Progress).
Functions
15. Current national curriculum assessment
arrangements within England have attracted increasing criticism
in respect of the extent to which they are carrying too many purposes
(Brooks R & Tough S; Bell J et al; Daugherty R et al). Since 1988 a substantial set of overt and tacit functions has been added. The original purposes specified in the TGAT Report (Task Group on Assessment and Testing) comprised:
1. formative (diagnostic for pupils; diagnostic
for teachers);
2. summative (feedback for pupils and parents);
3. evaluative (providing information at LEA
and school level); and
4. informative (providing information on
educational standards at system level).
16. The following have been added, as increasingly
elaborated uses of the flow of detailed data from national assessment:
departmental accountability;
apportionment of funds;
inspection patterns and actions;
upwards pressure on standards/target
setting;
structuring of educational markets
and school choice;
emphasis of specific curriculum elements
and approaches;
detailed tracking of individual attainment,
strengths and weaknesses; and
quantification of progress.
17. Unsurprisingly, many educationalists
have expressed the view that the current tests carry too many
functions and that the underlying management processes are too
elaborated. To carry this broad range of functions, the system
of assessing every child at the end of each Key Stage is dependent
on maintaining test standards over time in a way which is in fact
not practical.
18. If you want to measure change, don't change the measure. But the nation does (and should) change and update the National Curriculum regularly. Whenever there is change (and sometimes radical overhaul) the maintenance of test standards becomes a particularly acute problem. It does, of course, remain a constant problem in areas such as English Literature, when one could be pretesting a test on Macbeth, to be taken in 2008, while the pupils sitting the pretest are currently studying As You Like It. There are remedies to some of the problems this creates: switching to different sampling processes; announcing radical recalibration; or switching to low-stakes sampling of children's performance, using a NAEP-style or a modernised APU-style model (Assessment of Performance Unit; see Annexe 2).
19. Attempting to use national assessment to measure trends over time has produced some of the most intense tensions amongst the set of functions now attached to national testing. Stability in the instruments is one of the strongest recommendations emerging from projects designed to monitor standards over time. Running counter to this, QCA and the DfES have, in line with commitments to high quality educational provision, the standards agenda and responses from review and evaluation processes, sought to optimise the National Curriculum by successive revision of content, increasing the "accessibility of tests", and ensuring tight linkage of the tests to specific curriculum content.
20. These are laudable aims, and the emphasis on the diagnostic function of the data from tests has been increasing in recent innovations in testing arrangements. However, pursuit of these aims has led to repeated revision rather than stability in the tests. The Massey Report suggested that if maintenance of standards over time remained a key operational aim, then stability in the test content was imperative (Massey A et al). In the face of these tensions, a light sampling survey method would enable de-coupling of national assessment from the requirement to deliver robust information on national educational standards. This would enable testing to reflect curriculum change with precision, to optimise the learning-focussed functions of testing, and to enable constant innovation in the form of tests to optimise accessibility.
21. It is therefore clear that the current
functions of national testing arrangements are in acute and chronic
tension. Using the pragmatic argument that "every policy
should have a policy instrument" we conclude that national
arrangements should indeed support school accountability and improvement,
report to parents and monitor national standards but that a change
of arrangements is required to achieve this. A range of approaches is necessary to deliver these functions, and we outline some viable options below.
3. ALTERNATIVE APPROACHES TO NATIONAL ASSESSMENT (KS1, KS2, KS3)
Objectives
22. There is a need to conceptualise a number
of possible models for consideration in an attempt to address
the problems of "multipurpose testing". It is vital
to note that we present here three alternatives. We do this to
show that there are credible alternatives for delivering on the
key objectives of national assessment: it is simply not the case that there is only one way of moving forward.
23. We believe the aims should be to:
reduce the assessment burden on schools;
provide formative assessment for
teaching and learning;
provide information for school accountability;
and
provide information on national standards.
24. In order to secure widespread support
within the education community (including parents) a firm re-statement
of educational purpose (values) and a commitment to high degrees
of validity is essential. It is not enough to initiate changes
merely because of concerns about the defects of existing arrangements.
We do not here outline values and validity in detail, but recognise
that this is a vital precondition of designing revised arrangements,
putting them in place, and monitoring their operation. It is important
that a full discussion of these matters precedes any executive
decision regarding revised arrangements.
Alternative models for national assessment
Model 1: Validity in monitoring plus accountability to school level
25. The aim of this approach is to collect
data using a national monitoring survey and to use this data for
monitoring standards over time as well as for moderation of teacher
assessment. This would enable school performance to be measured
for accountability purposes and would involve a special kind of
criterion referencing known as domain referencing.
26. Question banks would be created based
on the curriculum with each measure focusing on a defined domain.
A sample of questions would be taken from the bank and divided
into many small testlets (smaller than the current KS tests).
These would then be randomly allocated to each candidate in a
school. Every question is therefore attempted by thousands of
candidates so the summary statistics are very accurate and there
are summary statistics on a large sample of questions. This means
that for a particular year it is known, for example, that on average
candidates can obtain 50% of the marks in domain Y.
27. The following year it might be found
that they obtain 55% of the marks in that domain. This therefore
measures the change and no judgement about relative year on year
test difficulty is required. Neither is there a need for a complex statistical model for analysing the data, although some modelling would be required to calculate the standard errors of the reported statistics; with the correct design, however, these errors would be negligible. It would be possible to use
a preliminary survey to link domains to existing levels and the
issue of changing items over time could be solved by chaining
and making comparisons based on common items between years. Although
each testlet would be an unreliable measure in itself, it would
be possible to assign levels to marks using a statistical method
once an overall analysis had been carried out. The average of
the testlet scores would be a good measure of a school's performance
given that there are sufficient candidates in the school. The
appropriate number of candidates would need to be investigated.
28. The survey data could also be used to
moderate teacher assessment by asking the teacher to rank order
the candidates and to assign a level to each of them. Teacher
assessment levels would then be compared with testlet levels and
the differences calculated. It would not be expected that the
differences should be zero, but rather that the need for moderation
should be determined by whether the differences cancel out or
not. Work would need to be done to establish the levels of tolerance
and the rules for applying this process would need to be agreed.
The school could have the option of accepting the statistical
moderation or going through a more formal moderation process.
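A minimal sketch of how such a statistical moderation check might operate is given below. The level data and the tolerance are invented; as noted above, the actual tolerances and rules would need to be agreed.

    # Hypothetical moderation check: do teacher levels and testlet levels
    # differ systematically, or do the differences cancel out?
    teacher_levels = [4, 3, 5, 4, 4, 3, 5, 4, 2, 4]   # invented data
    testlet_levels = [4, 3, 4, 4, 5, 3, 5, 3, 2, 4]   # invented data

    diffs = [t - s for t, s in zip(teacher_levels, testlet_levels)]
    mean_diff = sum(diffs) / len(diffs)

    TOLERANCE = 0.25   # invented; the agreed tolerance would come from research

    if abs(mean_diff) <= TOLERANCE:
        print(f"Mean difference {mean_diff:+.2f}: differences cancel out; "
              "accept teacher assessment as it stands")
    else:
        print(f"Mean difference {mean_diff:+.2f}: systematic gap; "
              "apply statistical moderation or a formal moderation process")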
29. There would be a number of potential
advantages related to this model. Validity would be increased
as there would be greater curriculum coverage. The data would
be more appropriate for the investigation of standards over time.
The test development process would be less expensive as items
could be re-used through an item bank, including past items from
national curriculum tests. There would also be fewer problems
with security related to "whole tests". No awarding
meetings would be needed as the outcomes would be automatic and
not judgemental. Since candidates would not be able to prepare
for a specific paper the negative wash-back and narrowing of the
curriculum would be eliminated (ie the potential elimination of
"teaching to the test"). There would also be less pressure
on the individual student since the tests would be low stakes.
30. Given that there are enough students in a school, the differences in question difficulty and pupil-question interaction would average out to zero, leaving only the mean of the pupil effects. From the data it would be possible to generate a range of reports, eg equipercentiles and domain profiles.
Reporting of domain profiles would address an issue raised by
Tymms (2004) that "the official results deal with whole areas
of the curriculum but the data suggests that standards have changed
differently in different sub-areas".
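The sketch below illustrates the kind of domain-profile report this would permit, using invented sub-domain figures; it shows how standards can move differently in different sub-areas even when the whole-domain figure barely moves.

    # Invented pooled survey results (proportion of marks obtained),
    # broken down by sub-domain of mathematics.
    profile_2006 = {"number": 0.62, "algebra": 0.48, "shape": 0.55, "data": 0.51}
    profile_2007 = {"number": 0.66, "algebra": 0.44, "shape": 0.56, "data": 0.52}

    def overall(profile):
        """Whole-domain figure: the mean across sub-domains."""
        return sum(profile.values()) / len(profile)

    print(f"Whole domain: {overall(profile_2006):.1%} -> {overall(profile_2007):.1%}")
    for sub in profile_2006:
        change = profile_2007[sub] - profile_2006[sub]
        print(f"  {sub:8s} {profile_2006[sub]:.1%} -> {profile_2007[sub]:.1%} ({change:+.1%})")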
31. Work would need to be done to overcome
a number of potential disadvantages of the model. Transparency
and perception would be important and stakeholders would need
to be able to understand the model sufficiently to have confidence
in the outcomes. This would be a particularly sensitive issue
as students could be expected to take tests that prove to be too
difficult or too easy for them. Some stratification of the tests
according to difficulty and ability would alleviate this problem.
There is an assumption that teachers can rank order students (Lamming
D) and this would need to be explored. Applying the model to English
would need further thought in order to accommodate the variations
in task type and skills assessed that arise in that subject area.
32. Eventually the model would offer the possibility of reducing the assessment burden, but the burden would be comparatively greater for the primary phase. Although security problems could be alleviated by using item banking, the impact of item re-use would need to be considered. Having items in the public domain would be a novel situation: almost no other important test in the UK (the driving test excepted) operates this way.
33. Discussion and research would be needed
in a number of areas:
scale and scope eg number and age
of candidates, regularity and timing of tests;
formal development of the statistical model;
simulation of data (based on APU
science data initially);
stratification of tests / students;
and
pilots and trials of any proposed
system.
Model 2: Validity in monitoring plus a switch to "school-improvement inspection"
34. Whilst the processes for equating standards
over time have been enhanced since the production of the Massey
Report, there remain significant issues relating to:
teacher confidence in test outcomes;
evidence of negative wash-back into
learning approaches;
over-interpretation of data at pupil
group level; inferences of improvement or deterioration of performance
not being robust due to small group size;
ambiguity in policy regarding borderlining;
no provision to implement the Massey recommendations on keeping tests stable for five years and then "recalibrating" national standards; and
no publication of error figures for national tests.
35. In the face of these problems, it is
attractive to adopt a low-stakes, matrix-based, light sampling
survey of schools and pupils in order to offer intelligence to
Government on underlying educational standards. With a matrix
model underpinning the sampling frame, far wider coverage of the
curriculum can be offered than with current national testing arrangements.
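The coverage arithmetic of a matrix design can be shown in a few lines. The figures below are invented; the point is that each sampled pupil sits only a short test while the survey as a whole covers the full item pool.

    # Matrix sampling: split the item pool into blocks; each sampled pupil
    # sits only a few blocks, yet the survey covers the whole pool.
    ITEM_POOL = 300          # items needed to cover the curriculum (invented)
    BLOCK_SIZE = 15          # items per block
    BLOCKS = ITEM_POOL // BLOCK_SIZE       # 20 blocks in all
    BLOCKS_PER_PUPIL = 2                   # a 30-item sitting per pupil
    SAMPLED_PUPILS = 12000                 # light national sample (invented)

    responses_per_block = SAMPLED_PUPILS * BLOCKS_PER_PUPIL / BLOCKS
    print(f"{BLOCKS} blocks cover all {ITEM_POOL} items")
    print(f"each pupil answers only {BLOCKS_PER_PUPIL * BLOCK_SIZE} items")
    print(f"each block is still sat by ~{responses_per_block:.0f} pupils")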
36. However, if used as a replacement for national testing of every child at the end of KS1, 2 and 3, then key functions of the existing system would not be delivered:
reporting to parents on the progress of every child at the end of each key stage; and
school accountability measures.
37. In a system with a light sampling model
for monitoring national standards, the first of these functions
could be delivered through (i) moderated teacher assessment, combined
with (ii) internal testing, or tests provided by external agencies
and/or grouped schools arrangements. The DfES prototype work on
assessment for learning could form national guidelines for (i)
the overall purpose and framework for school assessment, and (ii)
model processes. This framework of assessment policy would be
central to the inspection framework used in school inspection.
38. The intention would be to give sensitive
feedback to learners and parents, with the prime function of highlighting
to parents how best to support their child's learning. Moderated
teacher assessment has been proven to facilitate staff development
and effective pedagogic practice. Arrangements could operate on
a local or regional level, allowing transfer of practice from
school to school.
39. The second of these functions could
be delivered through a change in the Ofsted inspection model.
A new framework would be required since the current framework
is heavily dependent on national test data, with all the attendant
problems of the error in the data and instability of standards
over time. Inspection could operate through a new balance of regional/area inspection services and national inspection: inspection teams operating on a regional/area basis could be designated as "school improvement teams". To avoid competition between national
and regional inspection, national inspections would be joint activities
led by the national inspection service.
40. These revised arrangements would lead
to increased frequency of inspection (including short-notice inspection)
for individual schools and increased emphasis on advice and support
to schools in respect of development and curriculum innovation.
Inspection would continue to focus on creating high expectations,
meeting learner needs, and ensuring progression and development.
Model 3: Adaptive, on-demand testing using IT-based tests
41. In 2002, Bennett outlined a new world
of adaptive, on-demand tests which could be delivered through
machines. He suggests that "the incorporation of technology
into assessment is inevitable because, as technology becomes intertwined
with what and how students learn, the means we use to document
achievement must keep pace". Bennett (2001) identifies a
challenge, "to figure out how to design and deliver embedded
assessment that provides instructional support and that globally
summarises learning accomplishment". He is optimistic that
"as we move assessment closer to instruction, we should eventually
be able to adapt to the interests of the learner and to the particular
strengths and weaknesses evident at any particular juncture .
. .". This is aligned to the commitments of Government to
encourage rates of progression based on individual attainment
and pace of learning rather than age-related testing.
42. In the Government's five year strategy
for education and children's services (DfES, 2004) principles
for reform included "personalisation and choice as well as
flexibility and independence". The White Paper on 14-19 Education
and Skills (2005) stated, "Our intention is to create an
education system tailored to the needs of the individual pupil,
in which young people are stretched to achieve, are more able
to take qualifications as soon as they are ready, rather than
at fixed times . . ." and "to provide a tailored programme
for each young person and intensive personal guidance and support".
These intentions are equally important in the context of national
testing systems.
43. The process relies on item-banking,
combining items in individual test sessions to feed to students
a set of questions appropriate to their stage of learning and
to their individual level of attainment. Frequent low-stakes assessments
could allow coverage of the curriculum over a school year. Partial
repetition in tests, whilst they are "homing in" on
an appropriate testing level, would be useful as a means of checking
the extent to which pupils have really mastered and retained knowledge
and understanding.
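The "homing in" behaviour can be sketched with a simple staircase rule. A full system would use a calibrated IRT model rather than this crude update; the abilities, difficulties and step sizes here are invented.

    import math
    import random

    random.seed(7)

    def adaptive_session(true_ability, n_items=12):
        """Crude adaptive session: raise the difficulty after a correct
        answer, lower it after an incorrect one, with shrinking steps,
        so the test homes in on the pupil's level."""
        difficulty, step = 0.0, 1.0       # start mid-scale
        for _ in range(n_items):
            # Rasch-style probability of a correct response to this item
            p = 1 / (1 + math.exp(-(true_ability - difficulty)))
            correct = random.random() < p
            difficulty += step if correct else -step
            step = max(0.25, step * 0.8)  # smaller steps as we home in
        return difficulty                 # ends near the pupil's ability

    print(f"Estimated level on an arbitrary scale: {adaptive_session(1.5):+.2f}")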
44. Pupils would be awarded a level at the
end of each key stage based on performance on groups of questions
to which a level has been assigned. More advantageously, levels
could be awarded in the middle of the key stage as in the revised
Welsh national assessment arrangements.
45. Since tests are individualised, adaptivity
helps with security, with manageability, and with reducing the
"stakes", moving away from large groups of students
taking a test on a single occasion. Cloned items further help
security. This is where an item on a topic can include different
number values on a set of variables, allowing the same basic question
to be systematically changed on different test administrations,
thus preventing memorisation of responses. A simple example of
cloning is where a calculation using ratio can use a 3:2 ratio in one item version and a 5:3 ratio in another. The calibration of the bank would be crucial, with item parameters carefully set and research undertaken to ensure that cloning does not lead to significant variations in item difficulty.
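A sketch of a clone generator for the ratio item described above follows; the names, number ranges and wording are invented, and a real bank would constrain them so that clones stay at equivalent difficulty.

    import random

    def clone_ratio_item(rng):
        """Produce one clone of the basic ratio question by varying its
        numbers, so responses cannot be memorised between sittings."""
        a, b = rng.choice([(3, 2), (5, 3), (4, 3), (7, 2)])   # invented variants
        total = (a + b) * rng.randint(2, 9)   # keeps the answer whole
        question = (f"Sweets are shared between Ann and Ben in the ratio "
                    f"{a}:{b}. If there are {total} sweets, how many does "
                    f"Ann get?")
        return question, total * a // (a + b)

    rng = random.Random(42)
    for _ in range(2):
        question, answer = clone_ratio_item(rng)
        print(question, "->", answer)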
46. Reporting on national standards for policy purposes could be delivered through periodic reporting of groups of cognate items. As pupils across the country take the tests, and once a critical, nationally representative sample has been reached for a given group of items, the pooled results would be lodged as the national report of standards in that area. This would involve grouping key items in the bank (eg on understanding 2D representation of 3D objects), accumulating pupils' performance data on an annual basis (or more or less frequently, as deemed appropriate), and reporting on the basis of key elements of maths, English etc.
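A sketch of this accumulation rule is given below, with an invented threshold standing in for whatever sample size would count as nationally representative.

    import random
    from collections import defaultdict

    SAMPLE_THRESHOLD = 5000   # invented: responses needed before a cognate
                              # group counts as a representative sample

    tallies = defaultdict(lambda: [0, 0])   # group -> [attempts, correct]

    def record(group, correct):
        """Accumulate one pupil response; lodge the national report for
        the group once the threshold sample is reached."""
        stats = tallies[group]
        stats[0] += 1
        stats[1] += int(correct)
        if stats[0] == SAMPLE_THRESHOLD:
            print(f"National report, '{group}': "
                  f"{stats[1] / stats[0]:.1%} correct over {stats[0]} responses")

    random.seed(3)
    for _ in range(6000):    # invented stream of test responses
        record("2D representation of 3D objects", random.random() < 0.58)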
47. This "cognate grouping" approach
would tend to reduce the stakes of national assessment, thus gauging
more accurately underlying national standards of attainment. This
would alleviate the problem identified by Tymms (2004) that "the
test data are used in a very high-stakes fashion and the pressure
created makes it hard to interpret that data. Teaching test technique
must surely have contributed to some of the rise, as must teaching
to the test".
48. Data could be linked to other cognate
groupings, eg those who are good at X are also good at Y and poor
on Z. Also, performance could be linked across subjects.
49. There are issues of reductivism in this model, as there could be a danger to validity and curriculum coverage as a result of moving to test forms which are "bankable", work on-screen and are machine-markable. Using the Cambridge taxonomy of assessment items is one means of monitoring intended and unintended drift. It is certainly not the case that these testing technologies can only utilise the simplest multiple-choice (MC) items. MC items are used as part of high-level professional assessment, eg in the medical and finance arenas, where well-designed items can be used for assessing how learners integrate knowledge to solve complex problems.
50. However, it is certainly true that,
at the current stage of development, this type of approach to
delivering assessment cannot handle the full range of items which
are currently used in national testing and national qualifications.
The limitation on the range of item types means that this form
of testing is best used as a component in a national assessment
model, and not the sole vehicle for all functions in the system.
51. School accountability could be delivered through this system using either (i) a school accumulation model, where the school automatically accumulates performance data from the adaptive tests in a school data record which is submitted automatically once the accumulated sample reaches an appropriate level in each or all key subject areas, or (ii) the school improvement model outlined in Model 2 above.
52. There are significant problems of capacity
and readiness in schools, as evidenced through the problems being
encountered by the KS3 ICT test project which has successively
failed to meet take-up targets. It remains to be seen whether these can be swiftly overcome or whether they are structural problems, eg schools adopting very different IT network solutions and arranging IT in inflexible ways. However, it is very important to note that current arrangements remain based on "test sessions" of large groups of pupils, rather than true on-demand, adaptive tests. True on-demand arrangements could greatly relieve the pressures on infrastructure in schools, since sessions would be arranged for individuals or small groups on a "when ready" basis.
53. There are technical issues of validity
and comparability to be considered. The facility of a test is more than the sum of the facilities of the individual items which make it up. However, this is an area of intense technical
development in the assessment community, with new understanding
and theorisations of assessment emerging rapidly.
54. There are issues of pedagogy. Can schools
and teachers actually manage a process where children progress
at different rates based on on-demand testing? How do learners
and teachers judge when a child is ready? Will the model lead
to higher expectations for all students, or self-fulfilling patterns
of poor performance amongst some student groups? These, and many more important questions, indicate that the assessment model should be tied to appropriate learning and management strategies, and is thus not a neutral technology, independent of learning.
Overall
55. Each of the models addresses the difficulties
of multipurpose testing. However, each model also presents challenges
to be considered and overcome. The Statistics Commission (2005)
commented that "there is no real alternative at present to
using statutory tests for setting targets for aggregate standards".
The task is to find such an alternative. The real challenge is
to provide school accountability data without contaminating the
process of gathering data on national standards and individual
student performance. All three models have their advantages and could lead to increased validity and reliability in national assessment arrangements and, crucially, to the flow of reliable information on underlying educational standards, something which is seriously compromised in current arrangements.
4. NEW PROGRESS TESTS: SERIOUS TECHNICAL PROBLEMS
56. As a possible line of development for new arrangements, the DfES has recently announced pilots of new test arrangements, to be trialled in 10 authorities. Cambridge Assessment has reviewed the proposals and, along with many others in the assessment community, considers that the design is seriously flawed. The deficiencies are significant enough to compromise the new model's capacity to deliver on the key functions of national assessment, ie information on attainment standards at system level; feedback to parents, pupils and teachers; and provision of school accountability.
57. Cambridge Assessment's response to the
DfES consultation document on the progress tests covered the subject
in some detail and we reproduce it below for the Select Committee:
i We welcome the developing debate on the
function and utility of national assessment arrangements. We applaud
the focus on development of arrangements which best support the
wide range of learning and assessment needs amongst those in compulsory
schooling.
ii As specialists in assessment, we have
focused our comments on the technical issues associated with the
proposals on testing. However, it is vital to note that Cambridge
Assessment considers fitness for purpose and a beneficial linkage
between learning and assessment to be at the heart of sound assessment
practice.
iii We consider effective piloting, with
adequate ethical safeguards for participants, to be essential
to design and implementation of high quality assessment arrangements.
It is essential that evaluation method, time-frames, and steering
and reporting arrangements all enable the outcomes of piloting
to be fed into operational systems. There is inadequate detail
in the document to determine whether appropriate arrangements
are in place.
iv We remain concerned over conflicting public
statements regarding the possible status of the new tests (TES
30 March), which make it very unclear as to whether existing testing
arrangements will co-exist alongside new arrangements, or whether
one will be replaced by the other. This level of confusion is
not helpful.
v We see three functions as being essential
to national assessment arrangements:
Intelligence on national standards, for the policy process.
Information on individual pupil performance, for the learner, for parents, for teachers.
Data on school performance, for accountability arrangements.
We do not feel that the new model will meet
these as effectively as other possible models. We would welcome
discussions on alternatives.
vi We believe that, by themselves, the new
test arrangements will not provide robust information on underlying
standards in the education system. With entry to single-level
tests dependent on teachers' decisions, teachers in different
institutions and at different times are likely to deploy different
approaches to entry. This is likely to be very volatile, and the effects are unlikely always to cancel out. This is likely to contaminate the national data in very new ways, compared with existing testing arrangements. There are no obvious remedies to this problem within the proposed arrangements, either in the form of guidance or regulation.
vii Teachers are likely to come under peculiar
pressures, from institutions wishing to optimise performance-table
position, from parents of individual children etc. This is an
entirely different scenario to the "all to be tested and
then a level emerges" character of current arrangements.
Tiering invokes a similar effect, though not one as all-pervasive as here.
viii Although advanced as "on-demand"
testing, the regime is not an "on-demand" regime, and
it is misleading to promote it as such. It provides one extra
test session per year.
ix The frequency of testing is likely to
increase the extent to which testing dominates teaching time.
This is not a problem where the majority of washback effects from
testing are demonstrably beneficial; we believe that other features
of the tests mean that washback effects are likely to be detrimental.
It is not clear what kind of differentiation in teaching will
flow back from the tests. Ofsted and other research shows differentiation
to be one of the least developed areas of teaching practices.
We are concerned that the "grade D" problem (neglect of those not capable of getting a C and of those who will certainly gain a C) will emerge in a very complex form in the new arrangements.
x The tests may become MORE high stakes for
learners. Labelling such as " . . . you're doing Level 2
for the third time! . . ." may emerge and be very pernicious.
Jean Rudduck's work shows such labelling to be endemic and problematic.
xi We are unclear regarding the impact on those learners who fail a test by a small margin: they will wait six months to be re-tested. Do teachers judge that they should "lose six months of teaching time" to get them up to the required level, or just carry on with no special support? If special support is given, what is the child not doing which they previously would have done? This is a key issue with groups such as less able boys: they will need to take time out of things which they are good at and which can bolster their "learning identities". Those who are a "near miss" will need to know: the document does not make clear whether learners will just "get a level back", will get a mark, or will get an item-performance breakdown.
xii Testing arrangements are likely to become much more complex: security issues, mistakes (such as the wrong test for a child) etc are likely to gain in significance.
xiii The length of the tests may be an improvement over existing tests, but further investigative work must be done to establish whether this is indeed the case. 45-minute tests may, or may not, sample more from each subject domain at an appropriate level, compared with existing national tests. This is an empirical question which needs to be examined. Lower sampling would reduce the reliability of the tests. Compounding this, the issue of pass marks must be addressed: compensation within the tests raises not only reliability questions but also washback effects into formative assessment. People who pass may still need to address key areas of learning in a key stage, if compensation and pass marks combine disadvantageously. The length of the tests and the need to cover the domain will tend to drive tests to a limited set of item types, raising validity issues. This in turn affects standards maintenance: if items are clustered around a text and the text is changed (remembering that test frequency is increased by 100%), then all the items are no longer usable. This represents a dramatic escalation of burden in test development. Constructing and maintaining the bank of items will be very demanding.
xiv If a high pass mark is set (and the facility of items tuned to this) there will be little evidence of what a child cannot do. Optimising the formative feedback element (including feedback for high attainers) in the face of demands for high domain coverage, reasonable facility, and accessibility (recognisable stimulus material etc) will be very demanding for test designers. Level-setting procedures are not clear. The regime requires a very fast turnaround in results, not least to set in place and deliver learning for a "re-take" in the next test session (as well as keeping up with the pace of progression through the National Curriculum content). This implies objective tests. However, some difficult factors then combine. The entry will be a volatile mix of takers and re-takers.
xv While calibration data will exist for the items, random error will increase due to the volatility of entry, feeding into problems in the reliability of the item data in the bank. Put crudely, with no awarding processes (as present in existing national tests) there will be a loss of control over the overall test data, and thus reliability and standards over time will become increasingly problematic. As one possible solution, we recommend the development of parallel tests rather than successively different tests. Pre-tests and anchor tests become absolutely vital, and the purpose and function of these must be explained clearly to the public and the teaching profession. More information on this can be provided.
xvi Having the same tests for different key stages (as stated by officials) is problematic. There is different content in different stages (see English in particular). QCA has undertaken previous work on "does a Level 4 mean something different in different key stages"; the conclusion was that it did.
xvii The 10-hour training/learning sessions are likely to be narrowly devoted to the tests. This may communicate strong messages in the system regarding the importance of drilling and "surface learning", exactly the opposite of what the DfES is supporting in other policy documents. Although superficially in line with "personalisation", it may instil dysfunctional learning styles.
xviii We applaud the sensitivity of the analysis
emerging from the DfES in respect of the different populations
of learners who are failing to attain target levels. We also support
the original Standards Unit's commitment to a culture of high
expectations, combined with high support. However, this level
of sensitivity of analysis is not reflected in the blanket expectation
that every child should improve by two levels.
xix We do not support "payment by results" approaches: in almost any form these have successively been found wanting. Undue pressure is exerted on tests and test administration, and maladministration issues escalate.
xx In the face of the considerable challenge
of developing a system which meets the demanding criteria which
we associate with the operation of robust national assessment,
we would welcome an opportunity to contribute to further discussions
on the shape of enhanced national arrangements.
5. THE WAY FORWARD FOR NATIONAL ASSESSMENT
58. What is needed is a new look at options, and at both the technical and political space for manoeuvre. Cambridge Assessment has not only attempted to assemble the evidence but has produced a "3 option" paper which outlines possible approaches to confront the very real problems outlined above. We commend a thoroughgoing review of the evidence: not a "single person review" like "Dearing" or "Tomlinson", but a more managed appraisal of options and a sober analysis of the benefits and deficits of alternatives. For this, we believe that a set of clear criteria should be used to drive the next phase of development:
technically-robust arrangements should
be developed;
the arrangements should be consistent
with stated functions;
insights from trialling should be fed into fully operational arrangements;
unintended consequences should be identified and remedied;
full support from all levels of the system should be secured in respect of revised arrangements;
a number of models should be explored at the same time, in carefully designed programmes; in other words, there should be parallel rather than serial development, trialling and evaluation; and
appropriate ethical safeguards and
experimental protocols should be put in place during development
and trialling.
59. It is, of course, vital to consider not only the form of revised arrangements which would better deliver the purposes of national assessment, but also the methods and time frame for developing those arrangements, as well as the means of securing genuine societal and system support.
60. The last two elements listed above are critical to this: currently, there are no plans for trialling more than one revised model for national testing. However, a cursory glance at the education research field shows that there is a range of contrasting approaches to delivering the key functions of national testing, many of which may well be presented to this Inquiry. It would therefore seem important to trial more than one model rather than "put all eggs in one basket" or take forward only modifications of existing arrangements.
61. It is unclear whether adequate safeguards have been put in place to protect learners exposed to revised national assessment arrangements. Cambridge Assessment recommends, in line with the standards being developed by the Government's Social Research Unit, that new protocols should be developed as a matter of urgency for the trialling of revised arrangements.
|