Select Committee on Education and Employment Appendices to the Minutes of Evidence


Memorandum from Professor Paul Black and Professor Dylan Wiliam (HE 73)



  Whilst it is widely used, the SAT is a focus of public controversy and professional concern in the USA.

  Private test preparation agencies can enhance SAT scores by ad hoc training programmes, which are open to those who can afford their fees.

  SAT results show considerable bias against those from low-income groups and/or from ethnic minorities.

  SAT scores are weak predictors of college success, and are less effective for this purpose than school grades, despite the wide variety of school systems across the USA.

  A significant number of universities and colleges in the USA have abandoned the use of SAT scores for selection, on the grounds that the pressures on schools for ad hoc preparation for those who are to take it distort the education of their future students.

  2.  Following the contributions of Professor Wiliam to the discussions of the Committee on Tuesday, 18 July, we have noted the interest of the Committee in the possible use of the USA's SAT as a selection test for tertiary education in this country. We offer this background note as a contribution to the Committee's consideration of the SAT, particularly because we are aware that the use of the SAT is a matter of controversy in the USA. Whilst both of us have interest and expertise in the field of testing and assessment, Professor Black has extensive first-hand knowledge of the situation in the USA, being a visiting professor at Stanford University, a contributor to education studies of the National Academy of Sciences, through work on their Board of Testing and Assessment, and in their preparation of three documents of significance for national policy: the USA National Science Standards, a new addendum to those standards on Classroom Assessment, and a forthcoming study on the Cognitive Foundations of Assessment.

  3.  The issues raised in this note are based on research and professional literature, as set out in a list of references at the end. The issues are further illustrated by two extracts from the press, one being an article in the New York Times (10 July 2000) and the other an editorial debate in USA Today (11 July 2000), copies of which are also appended.[29]

  4.  The SAT was founded in 1926 under the name of Scholastic Aptitude Test. It was adapted from the IQ tests first used extensively for selection purposes by the USA military in World War I. It was composed then, and has been ever since, solely of multiple choice items. Some of those who developed it had a vision of a range of such tests being used as tools for social engineering on a large scale, and were closely associated with the eugenics movement which burgeoned at that time (see Hanson 1993, Lemann 1999). The SAT, set up for the College Entrance Examination Board, is produced and administered by the Educational Testing Service (ETS), a private "not-for-profit" organisation.

  5.  Early impetus was given by the president of Harvard, J B Conant, whose aim was more restricted and less alarming—to select, for the privileged universities, the most "intelligent" applicants in a way that would negate the effects of the privileged educational and social backgrounds enjoyed by some. The SAT subsequently changed character, particularly after World War II, into a programme to promote equality of access for all. The size of the USA, the difficulties of communication, the multiplicity of curriculum influences in over 40 states and the heterogeneous background of a country still welcoming large numbers of immigrants, meant that test methods familiar elsewhere, notably in European countries, were unsuitable.

  6.  Despite its remarkable expansion and the lead that it gave to the growth of a major test industry in the USA (private test agencies provide several hundred million multiple choice tests each year), the SAT has attracted a range of serious criticisms which call in question the viability of selection for higher education based on the measurement of "intelligence". Four of these attack the basic claims upon which the SAT is founded, as follows.

  7.  One claim that was essential to its reputation was that the previous experiences and education of candidates would not affect the measurement, so that no amount of coaching could enhance one's score on the test. This claim was severely dented very early in its history by private agencies who convinced the public that they could raise the scores of SAT candidates by ad hoc drilling with questions similar to those used in the test: as a consequence, those who can afford the fees for private preparation can enhance their SAT scores. The claim that such enhancement was not possible was abandoned by ETS in 1979: there is well researched evidence that ad hoc test preparation does yield significant score increases (Bond, 1989) and the recent editorial debate in USA Today (attached herewith)[30] illustrates the current public interest in the challenge to this claim.

  8.  A second claim is that tests should be free from bias, in that inequalities associated with irrelevant effects of the family origin, gender, race and so on of candidates will not affect their scores. A great deal of effort has been invested in exploring this problem in order to alleviate its effects, but it cannot be claimed that bias has been eliminated. There is a vast history of legislative battles in the USA over the problems of alleged bias in standardised tests, and the SAT has endured its share of these (Cole & Moss 1989, Heubert & Hauser, 1999). The results of the 1999 tests for college-bound students show, for example, that the mean SAT score for white students is much greater than that for African American or Black students (1024 points against 856 points), and that the score for those with family incomes over $70,000 a year (about 1070 points) is greater than for those with incomes under $20,000 a year (about 900 points) (see College Board 1999). Thus the evidence shows that privileged candidates do secure, on average, higher SAT scores than those less privileged. A difference in marks in A-level of a size comparable to a SAT score difference of 200 points would produce a change of more than one A-level grade.

  9.  A third claim is that the IQ test, or the numerical and verbal parts of the SAT, measure well-defined, underlying and central components of human capacity and potential. Ironically, Brigham, the inventor of the SAT, who came to be one of its harshest critics, foresaw challenges to this claim when he wrote in 1929:

    The more I work in this field the more I am convinced that psychologists have sinned greatly in sliding easily from the name of the test to the function or trait measured.

    (Quoted by Lemann 1999, p 33).

  Most psychologists do not now accept the notion of the single unitary trait that the IQ claims to measure and argue for more complex measures of human thinking (Gardner, 1993; Sternberg, 1997). It should be noted that the title Scholastic Aptitude Test was changed by ETS to Scholastic Achievement Test, and subsequently the test has come to be called the SAT without connection of the initials to particular words. Recently it seems that ETS has said that it is a test of mental dexterity (see the recent New York Times article attached herewith[31]); this concept is not recognised in the field of psychology of learning, and it is doubtful whether it has any clear meaning.

  10.  It may be that, despite the ambiguity about what exactly is measured, the SAT does measure a set of aspects of the complex of human thinking which are relevant for prediction of achievement in a particular sphere, notably tertiary level study, and may function usefully because it reflects a relevant combination of these aspects. This leads into the fourth claim, namely that the SAT is a good predictive measure. The ETS has to be able to show evidence of strong correlation between the SAT scores of college applicants and their subsequent performance, in order to convince tertiary institutions to require applicants to take the SAT, thereby requiring them to pay the fees which are the main source of income for the ETS.

  11.  Correlations between the SAT score and the performance of college students at the end of their first year are of the order of 0.5, which means that the SAT scores account for about 25 per cent of the variance of college results. What a correlation of 0.5 means in practice is that if, for example, we have to choose between two candidates, then if we know nothing about them, we can only choose at random and then have a 50 per cent chance of choosing the one better qualified. If we know the SAT scores, we can then choose the one with the higher score; in that case the chance of having chosen the one more likely to succeed will have risen to 67 per cent. If the intrinsically better candidate were from a group known to be disadvantaged by the SAT, we might, by using the SAT as a basis for choice, end up selecting the better candidate in less than 50 per cent of the cases.
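
  The 25 per cent and 67 per cent figures in paragraph 11 can be checked with a short calculation. The sketch below rests on an assumption that the memorandum does not state: that test scores and later performance are jointly normally distributed with correlation r. Under that assumption the chance that the higher-scoring of two randomly chosen candidates is also the better performer is 1/2 + arcsin(r)/pi. The function names are illustrative only.

```python
import math


def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by a predictor with correlation r."""
    return r * r


def chance_of_picking_better(r: float) -> float:
    """Probability that the higher-scoring of two candidates also performs better.

    Assumes scores and outcomes are bivariate normal with correlation r
    (an assumption made for this illustration, not taken from the memorandum).
    """
    return 0.5 + math.asin(r) / math.pi


r = 0.5
print(f"variance explained: {variance_explained(r):.0%}")              # 25%
print(f"chance of picking the better candidate: "
      f"{chance_of_picking_better(r):.0%}")                            # 67%
```

  With r = 0 the second function returns 0.5, which corresponds to choosing at random.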

  12.  Thus a correlation of 0.5 is not very impressive, and has usually been less than the correlation obtained with school grades (which can account for about 33 per cent of the variance). The SAT scores do however add to the power of the school grades: the optimum combination of school grades and SAT can give correlations of more than 0.6, so accounting for about 40 per cent of the variance (Morgan, 1989)[32]. Some now argue that the cost and the undesirable effects of the SATs cannot be justified given that they do not add a great deal to the predictive power of school grades (Crouse & Trusheim 1988). A British attempt to use a version of the SAT to check its predictive value for degree results against that of the UK's A-level examinations showed that it was no better, and that it added very little predictive power when added to the A-level results (Choppin & Orr, 1976).
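
  The relation between the correlations and the variance shares quoted in paragraph 12 is simply that the share of variance equals the square of the correlation. The sketch below uses only the approximate figures given in the text; the variable and function names are illustrative.

```python
import math

# Approximate shares of variance quoted in the text.
sat_only = 0.5 ** 2        # SAT alone, from a correlation of 0.5
grades_only = 0.33         # school grades alone
grades_plus_sat = 0.40     # optimum combination of grades and SAT


def correlation_from_share(share: float) -> float:
    """Correlation implied by a given share of variance explained."""
    return math.sqrt(share)


# Extra variance accounted for when SAT scores are added to school grades.
increment = grades_plus_sat - grades_only

print(f"SAT alone:    r = {correlation_from_share(sat_only):.2f}")
print(f"grades alone: r = {correlation_from_share(grades_only):.2f}")
print(f"grades + SAT: r = {correlation_from_share(grades_plus_sat):.2f}")
print(f"extra variance explained by adding SAT: {increment:.0%}")
```

  On these figures, adding the SAT to school grades raises the variance accounted for by about seven percentage points.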

  13.  Alongside these challenges to the basic justifications of the SATs and similar tests, the last 20 years have seen the emergence in the USA of increasingly severe criticism from teachers and educational researchers who deplore the narrowing and atomisation of learning that follows from the intensive training given to help students increase their SAT scores (see eg Gifford & O'Connor, 1992, Linn 2000). The flavour of this current concern is strongly conveyed in a prophetic piece written by Brigham in 1938, 12 years after he had helped to invent the test:

    If the unhappy day ever comes when teachers point their students towards these newer examinations, and the present weak and restricted procedures get a grip on education, then we may look for the inevitable distortion of education in terms of tests. And that means that mathematics will be completely departmentalised and broken into disintegrated bits, that the science will become highly verbalised and that computation, manipulation and thinking in terms other than verbal will be minimised, that languages will be taught for linguistic skills only without reference to literary values, that English will be taught for reading alone, and that practice and drill in writing of English will disappear.

    (Quoted by Lemann 1999, pp 40-41)

  14.  The SAT always relied for its viability on convincing tertiary institutions of its value for their selection processes, so that as many institutions as possible would require applicants to take the test—and to pay the fees involved. The recent article in the New York Times that is attached herewith reports on a trend for colleges, including some of the most prestigious private colleges, to abandon the requirement that applicants take the SAT, on the grounds that it adds little in predictive validity to school grades, whilst distorting the educational preparation of future students. It must be noted that these decisions are being made in a country which has no national curriculum, and no national systems of testing and certification, such matters being in the hands of the—now 50—states for them to handle in their own, and diverse, ways.

Professor Paul Black and Professor Dylan Wiliam

July 2000


  BOND, L (1989) The Effects of Special Preparation on Measures of Scholastic Ability pp 429-444 in Linn, R L (ed) Educational Measurement (3rd edn) (New York, Macmillan).

  CHOPPIN, B & ORR, L (1976) Aptitude Testing at 18+ (Windsor, NFER Publishing Co Ltd).

  COLE, N S & MOSS, P A (1989) Bias in Test Use pp 201-219 in Linn, R L (ed) Educational Measurement (3rd edn) (New York, Macmillan).

  COLLEGE BOARD (1999) College Board Seniors National Report 1999 (from FairTest web-site).

  CROUSE, J & TRUSHEIM, D (1988) The case against the SAT (Chicago, University of Chicago Press).

  GARDNER, H (1993) Multiple Intelligences: The Theory in Practice (New York, Basic Books).

  GIFFORD, B R & O'CONNOR, M C (eds) (1992) Changing Assessments: Alternative Views of Aptitude, Achievement and Instruction (Boston MA, Kluwer).

  HANSON, F A (1993) Testing Testing: Social Consequences of the Examined Life (Berkeley CA, University of California Press).

  HEUBERT, J P & HAUSER, R M (eds) (1999) High Stakes: Testing for Tracking, Promotion and Graduation (Washington DC, National Academy Press).

  LEMANN, N (1999) The Big Test (New York, Farrar, Straus & Giroux).

  LINN, R L (2000) Assessments and Accountability. Educational Researcher. 29(2) pp 4-16.

  MORGAN, R (1989) Analyses of the Predictive Validity of the SAT and High School Grades From 1976 to 1985. College Board Report No 89-7 (New York, College Entrance Examination Board).

  PELLEGRINO, J W, BAXTER, G P & GLASER, R (1999) Addressing the "Two Disciplines" problem: Linking Theories of Cognition with Assessment and Instructional Practice. Review of Research in Education. 24 pp 307-353.

  STERNBERG, R J (1997) Thinking Styles (Cambridge, Cambridge University Press).

29   Not Printed.

30   Not Printed.

31   Not Printed.

32   The correlation coefficients quoted here were (in Morgan's analyses) adjusted to allow for attenuation of range, ie whereas the data available are, of necessity, limited to those who passed the admission hurdles set by the tests, a theoretical model is used to estimate what the correlation would have been if all had been admitted to college regardless of their test score.
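
  The kind of adjustment described in footnote 32 can be illustrated with one widely used textbook correction for direct restriction of range (often called Thorndike's Case 2). Whether Morgan's analyses used this particular model is not stated in the memorandum, so the sketch below is an assumption offered for illustration, and the input values are hypothetical rather than taken from any data.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Estimate the correlation in the full applicant pool.

    r_restricted: correlation observed among admitted students only.
    sd_ratio: standard deviation of test scores among all applicants divided
              by the standard deviation among admitted students (1.0 or more
              when admission has narrowed the range of scores).

    Uses the standard correction for direct restriction of range; this is an
    illustrative choice, not necessarily the model used in the cited report.
    """
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(1 - r_restricted ** 2 + (r_restricted * sd_ratio) ** 2)
    return numerator / denominator


# Hypothetical inputs, chosen only to show the direction of the adjustment.
observed = 0.35
ratio = 1.5
print(f"observed r = {observed:.2f}, "
      f"adjusted r = {correct_for_range_restriction(observed, ratio):.2f}")
```

  When the ratio is 1 (no narrowing of the score range) the function returns the observed correlation unchanged; for ratios above 1 the adjusted correlation is larger than the observed one.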


© Parliamentary copyright 2001
Prepared 8 February 2001