Peer review

Written evidence submitted by Donald Gillies, Emeritus Professor, University College London (PR 22)

1. In my academic career, I started as a graduate student in Professor Sir Karl Popper’s department at the London School of Economics in 1966, and retired as Professor in University College London in 2009. During this time, I continuously carried out research in history and philosophy of science and mathematics – publishing 9 books and numerous articles on the subject. I also edited a leading academic journal in the field (The British Journal for the Philosophy of Science). My research, and my practical experience with the peer review system, as editor, author, and reviewer, gradually convinced me that peer review is highly defective. Historically there are innumerable examples of what we now regard as major advances in science and mathematics which were judged at the time by the researcher’s peers to have no value. Philosophically one can explain why this was so. The net effect of an extensive use of peer review is to stifle innovation, and hold up the progress of science. So my recommendation would be to eliminate the use of peer review as much as possible. In this submission I will summarise my main arguments against peer review. I also believe that it would be easy to eliminate most of the present use of peer review. However, I cannot, for reasons of space, give here the details of how this might be done. They are to be found in my 2008 book, Part 3, pp. 63-130.

Defects in Peer Review

2. Peer review means that the value of a researcher’s work is judged by a group of researchers working in the same field – the ‘peers’ of the given researcher. There is, however, a major problem with peer review. A study of history shows that it can in some cases go very wrong. The majority of contemporary researchers in a field can judge as worthless a piece of research which is later, with the benefit of historical perspective, seen as constituting a major advance. In my book (Gillies 2008), I consider in detail several examples of major research advances which were judged by contemporary researchers to be valueless. Here I will briefly mention two examples.

3. The first is Frege’s introduction of modern mathematical logic, which has become an essential tool for computers. Frege published his new system of logic in 1879 in a short monograph entitled: Begriffsschrift (Concept Writing). Historians of logic give very favourable judgements on this work. For example Bochenski in 1962 wrote (Gillies, 2008, p. 15):

‘Among all these logicians, Gottlob Frege holds a unique place. His Begriffsschrift can only be compared with one other work in the whole history of logic, the Prior Analytics of Aristotle.’

In the same year William and Martha Kneale wrote (Gillies, 2008, p. 15):

‘Frege’s Begriffsschrift is the first really comprehensive system of formal logic. … 1879 is the most important date in the history of the subject.’

4. These then are the judgements of later historians, but what were the views of Frege’s peers, i.e. contemporary researchers in the field of logic? We know what these were because there were 6 reviews of the Begriffsschrift, and the general consensus of these reviewers, who included some of the best logicians of the day, was that the Begriffsschrift contained nothing of any value. For example Hoppe wrote (Gillies, 2008, p. 16):

‘ … we doubt that anything has been gained by the invented formula language … .’

Venn wrote (Gillies, 2008, p. 18):

‘ … it does not seem to me that Dr. Frege’s scheme can for a moment compare with that of Boole. … Dr Frege’s system … seems to me cumbrous and inconvenient.’

5. The second example is Semmelweis’ research into the causes of puerperal fever which led him to recommend the introduction of antiseptic precautions in hospitals, such as washing the hands with antiseptic.

6. The hand washing recommended by Semmelweis is now absolutely standard in hospitals. Medical staff have to wash their hands in antiseptic soap (hibiscrub), and there is also a gelatinous substance (alcogel) which is squirted on to the hand. Naturally a doctor’s hands must be sterilised in this way before examining any patient – exactly as Semmelweis recommended. Recently new regulations have been introduced in hospitals in the UK requiring visitors also to wash their hands in disinfectant.

7. This then is the modern point of view, but how did Semmelweis’s contemporaries react to his new theory of the cause of puerperal fever and the practical recommendations based on it? The short answer is that Semmelweis’s reception by his contemporaries was almost exactly the same as Frege’s. Semmelweis did manage to persuade one or two doctors of the truth of his findings, but the vast majority of the medical profession rejected his theory and ignored the practical recommendations based upon it. Here I will only mention one typical reaction. After Semmelweis had made his discovery in 1848, he and some of his friends in Vienna wrote about it to the directors of several maternity hospitals. Simpson of Edinburgh replied somewhat rudely to this letter saying that its authors obviously had not studied the obstetrical literature in English. Simpson was of course a very important figure in the medical world of the time. He had introduced the use of chloroform for operations, and had recommended its use as a pain-killer in childbirth. His response to Semmelweis and his friends is very similar in character to Venn’s review of Frege’s Begriffsschrift.

8. Many further examples of this sort could easily be given, and I will describe one recent example in the next section. To make matters worse, what the study of history shows is that peer reviews most often go wrong for the really important research advances. Suppose a researcher makes a small, but competent, advance of a routine kind. Peer reviews in such circumstances will usually be able to give his or her work a reasonable evaluation. When, however, a researcher makes an advance which is later seen as a key innovation and a major breakthrough, peer review may very well judge it to be absurd and of no value. How is this strange phenomenon to be explained? It is to the credit of the philosophy of science that it can provide a very convincing explanation.

Explanation of the Defects in Peer Review. Paradigms and Research Programmes

9. So how is it possible for peer reviews to go so wrong, and to judge as worthless what are later seen as major advances in the subject? At first it may seem paradoxical that this should occur. After all, the peers, who do the reviewing, are all experts in the field and active researchers. Surely they, of all people, should be able to recognise good research when they see it. Despite the apparent strangeness of this situation, the reasons why it occurs can in fact be quite well explained using ideas from the philosophy of science, more specifically using Kuhn’s paradigms, and Lakatos’ research programmes.

10. According to Kuhn, in a scientific revolution a paradigm which previously dominated the field is replaced by a new paradigm. It is a consequence of Kuhn’s theory that any researcher who puts forward a revolutionary new view is likely to be judged harshly by his peers. This explains why the big innovations of figures like Frege and Semmelweis were judged so harshly by their contemporaries, and also why the contemporary criticisms of their views now seem to us so absurd. The work of such figures appears to us to be a major advance because we have been trained in, and implicitly accept, the new paradigm, while to the original peer reviewers it appeared to be absurd because they had been trained in and implicitly accepted the old paradigm.

11. However, the failure of peer review need not be exclusively associated with scientific revolutions, and paradigm shifts. It can occur in what Kuhn calls ‘normal science’ as well. To see this, let us suppose that research is being carried out on some problem and that four different research programmes have been proposed to solve it. We can further suppose that all four of the programmes are compatible with the dominant paradigm, so that we are not dealing with revolutionary science. It may be almost impossible to say at the beginning which of the four programmes is going to lead to success. Suppose it turns out to be programme number 3. Let us suppose further (which indeed is often the case) that initially programme 3 attracts many fewer researchers than programmes 1, 2 & 4. Now it is characteristic of most researchers that they think their own approach to the problem is the correct one, and that other approaches are misguided. If a peer review is conducted by a committee whose researchers are a random sample of those working on the problem, then the majority will be working on programmes 1, 2 & 4, and are therefore very likely to give a negative judgement on programme 3. As the result of the recommendation of such a peer review, funding might be withdrawn from programme 3, and the solution of the problem might remain undiscovered for a long time.
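The committee argument in this paragraph can be illustrated with a small calculation. What follows is only a sketch under stated assumptions: the headcounts (100 researchers in total, 10 of them on programme 3) and the committee size of 5 are hypothetical figures chosen for illustration, not data from any real field. It computes the chance that researchers from the minority programme form a majority of a committee drawn at random from all those working on the problem.

```python
from math import comb


def prob_programme_majority(total, on_programme, committee_size):
    """Probability that researchers from one programme form a strict majority
    of a committee drawn at random, without replacement, from all researchers
    working on the problem (a hypergeometric tail probability)."""
    needed = committee_size // 2 + 1  # smallest number of seats that is a majority
    favourable = sum(
        comb(on_programme, k) * comb(total - on_programme, committee_size - k)
        for k in range(needed, committee_size + 1)
    )
    return favourable / comb(total, committee_size)


# Hypothetical headcounts: 100 researchers, of whom 10 work on programme 3.
p = prob_programme_majority(total=100, on_programme=10, committee_size=5)
print(f"Chance that programme 3 holds a majority of a 5-person committee: {p:.4f}")
```

On these assumed figures the minority programme almost never commands a majority of the committee, which is the point of the argument: a random sample of peers reproduces the existing distribution of opinion.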

12. An important principle emerges from this, namely that research assessment based on peer review, is likely to concentrate funding on the most popular, or mainstream, research programmes, while withdrawing funding from, and sometimes closing down altogether, minority research programmes on which few researchers are working. Actually the Kuhnian examples of revolutionary science and paradigm shifts are only a special case of this principle. In a scientific revolution, the scientists who introduce a new research programme based on a new paradigm, or who are among the first to start working on it, will almost certainly be a minority within the research community of the time.

13. Now sometimes the mainstream research programme cracks the problems being tackled. Sometimes, however, it is a very minority research programme only adopted by a few researchers which leads to the major advances. A recent example of this is the discovery that a form of cervical cancer is caused by a preceding infection by the papilloma virus. In 2008, zur Hausen was awarded the Nobel prize for this discovery. In the research which led to the discovery, however, the majority of researchers favoured the view that the causal agent for cervical cancer was a herpes virus and not a papilloma virus. Zur Hausen was one of the very few who favoured the papilloma virus. The dominance of the herpes virus approach is shown by the fact that, in December 1972, there was an international conference of researchers in the area at Key Biscayne in Florida, which had the title: Herpesvirus and Cervical Cancer. Zur Hausen attended this conference and made some criticisms of the herpes virus approach (cf. Goodheart, 1973, p. 1417). It is reported that the audience listened to zur Hausen in stony silence (McIntyre, 2005, p. 35). The summary of the conference written by George Klein (Klein, 1973) does not mention zur Hausen. Clearly at that time, peer reviews of zur Hausen’s research programme would not have been very favourable, although in the long run zur Hausen proved to be right.

Type 1 and Type 2 Errors. Throwing away the Pink Diamonds

14. Let us next examine the effects of an extensive use of peer review for research evaluation such as occurs today. To do so, it will be useful to make a distinction which is analogous to one made in the theory of statistical tests. Statistical tests are said to be liable to two types of error (Type 1 error, and Type 2 error). A Type 1 error occurs if the test leads to the rejection of a hypothesis which is in fact true. A Type 2 error occurs if the test leads to the confirmation of a hypothesis which is in fact false. Analogously we could say that a research evaluation commits a Type 1 error if it leads to funding being withdrawn from a researcher or research programme which would have obtained excellent results had it been continued. A research evaluation commits a Type 2 error if it leads to funding being continued for a researcher or research programme which obtains no good results however long it goes on. This distinction leads to the following general criticism of research evaluation based on peer review. Research evaluation based on peer review concentrates exclusively on eliminating Type 2 errors. The idea is to make research more cost effective by withdrawing funds from bad researchers and giving them to good researchers. No thought is devoted to the possibility of making a Type 1 error, the error, that is, of withdrawing funding from researchers who would have made important advances if their research had been supported. Yet the history of science shows that Type 1 errors are much more serious than Type 2 errors. The case of Semmelweis is a very striking example. The fact that his line of research was not recognised and supported by the medical community meant that, for twenty years after his investigation, thousands of patients lost their lives and there was a general crisis in the whole hospital system.

16. In comparison with Type 1 errors, Type 2 errors are much less serious. The worst that can happen is that some government money is spent with nothing to show for it. Moreover Type 2 errors are inevitable from the very nature of research. We can see this by considering again the example involving competing research programmes, introduced in paragraph 11. Suppose research is required on some problem, and there are four different approaches to its solution which lead to four different research programmes. It may be almost impossible to say at the beginning which of the four programmes is going to lead to success. Suppose it turns out to be research programme number 3. The researchers on programmes 1, 2 & 4 may be just as competent and hard-working as those on programme 3, but, because their efforts are being made in the wrong direction, they will lead nowhere. Suppose programme 3 is cancelled in order to save money (Type 1 error), then all the money spent on research on the problem will lead nowhere. It will be a total loss. On the other hand, if a further programme (5), which also leads nowhere, is funded alongside the original four (Type 2 error), the costs will be a bit higher but a successful result will still be obtained. This shows why Type 1 errors are much more serious than Type 2 errors, and why funding bodies should make sure that some funding at least is given to every research school and approach rather than concentrating on the hopeless task of trying to foresee which approach will in the long run prove successful.
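The comparison of costs in this paragraph can be set out as a small worked example. It is a sketch only: the assumption that every programme costs the same single unit of funding is hypothetical and is made purely to keep the arithmetic simple, and the function name is introduced here for illustration.

```python
def evaluate_funding(funded, successful_programme=3, cost_per_programme=1.0):
    """Return (total cost, whether the problem is solved) for a set of funded
    programmes. The equal unit cost per programme is an illustrative assumption."""
    total_cost = cost_per_programme * len(funded)
    problem_solved = successful_programme in funded
    return total_cost, problem_solved


# Type 1 error: programme 3 is cancelled to save money.
type_1 = evaluate_funding({1, 2, 4})
# Type 2 error: an extra unsuccessful programme (5) is funded as well.
type_2 = evaluate_funding({1, 2, 3, 4, 5})

print("Programme 3 cancelled:", type_1)
print("Programme 5 also funded:", type_2)
```

In the first case the whole of the money spent is lost, since the problem remains unsolved. In the second the bill is somewhat larger, but the successful programme is among those supported.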

17. It is sometimes difficult to keep in mind the exact distinction between a Type 1 and a Type 2 error. This task will, I think, be made easier by introducing an analogy. Suppose we have a system to separate flawed diamonds, which have little value, from clear diamonds which are valuable. This system, let us suppose, works very efficiently in eliminating worthless flawed diamonds, but then it turns out to have a crucial defect. As well as eliminating the flawed diamonds, it eliminates the pink diamonds, and pink diamonds have a value between five hundred and a thousand times greater than that of the ordinary (white) clear diamonds. Once our system had been found to have this defect by diamond producers, they would hastily stop using it. My claim is that research evaluation based on peer review has exactly the same defect. It is liable to throw away the pink diamonds. The pink diamonds in this case are characters like Frege, Semmelweis, zur Hausen, or, more generally, researchers working on minority research programmes, which are unpopular for the moment, but destined to yield brilliant results in the future. Such people are at risk, in research evaluation based on peer review, of being unable to publish, or of having their funding or research time reduced, or cut off completely, thereby holding up the progress of knowledge.

Support from Nobel Prize Winners

18. My criticisms of peer review are based mainly on the study of the history of science and mathematics, but they are fortunately supported by many Nobel Prize winners. A notable case is Sir James Black who won his Nobel Prize for the discovery of two of the most successful blockbuster medicines that the pharmaceutical industry ever developed. He read my 2008 book when it came out, and we discussed it in March 2009. He told me that he thought he could not have made his discoveries in the present system dominated by peer review. In fact he had earlier declared: "The anonymous peer review process is the enemy of scientific creativity. … Peer reviewers go for orthodoxy." (Black, 2009)

Declaration of Interests

19. I have no commercial interests relating to the question of peer review.

References

Black, Sir James (2009) Interview by Andrew Jack (An acute talent for innovation), Financial Times, 2 February 2009.

Gillies, D. (2008) How Should Research be Organised? College Publications.

Goodheart, C.R. (1973) Summary of informal discussion on general aspects of herpesviruses, Cancer Research, 33(6), p. 1417.

Klein, G. (1973) Summary of Papers Delivered at the Conference on Herpesvirus and Cervical Cancer (Key Biscayne, Florida), Cancer Research, 33(June 1973), pp. 1557-1563.

McIntyre, P. (2005) Finding the viral link: the story of Harald zur Hausen, Cancer World, July-August, pp. 32-37.

Professor Donald Gillies

3 March 2011