10.The Concordat to Support Research Integrity lists the core elements of research integrity as:
11.Our inquiry focused on understanding and categorising the problem of research or researchers falling short of these expectations—that is, where there is a lack of research integrity. Within this, the written evidence we received encouraged us to distinguish between:
12.We were also encouraged to distinguish between the integrity of researchers and the integrity of the research itself. Indeed, while there have been high-profile examples of deliberate fraud, there are also examples of researchers correcting their own honest errors in order to protect the integrity of the published research base. Professors Lewandowsky and Bishop told us that “errors are unavoidable in any human endeavour”, and that while “in the past, discovery of an error in a scientist’s work was regarded as a source of shame”, it is now recognised that “the sign of a good scientist is one who promptly recognises errors and corrects the public record; this is increasingly seen as a sign of integrity that actually can enhance a person’s reputation”.
Box 1: Questionable Research Practices
Questionable Research Practices (QRPs) include design, analytic, or reporting practices that could be employed with the purpose of presenting biased evidence in favour of an assertion. Examples include selectively reporting hypotheses with a preference for those that are statistically significant, and ‘cherry picking’ data. Other typical QRPs might include rounding down a ‘p value’ (a measure of statistical significance) in order to report a significant result. These practices can occur with or without an intent to deceive; for instance, deliberately excluding an outlier from an analysis could change the conclusions, but this could be done for sound methodological reasons and be reported transparently, or could be employed for the express purpose of turning a non-significant result into a significant one.
We were pointed specifically to the practices of ‘p-hacking’ and ‘HARKing’ as examples of QRPs. P-hacking refers to the practice of running multiple tests, looking for a statistic that surpasses the threshold for statistical significance, and reporting only this. The problem is that by running multiple analyses, a researcher will increase the likelihood of finding a statistically significant result by chance alone. For example, if a researcher were studying the relationship between a gene and a set of 20 different personality questionnaires (all filled in by multiple participants) and did not adjust their significance threshold to take into account the fact that they were running so many tests, it would be expected that at least one of the personality questionnaires would have a statistically significant relationship to the gene at the 0.05 level, even if in reality there is no relationship. If the measures are independent, the likelihood that none of them will reach the 0.05 level of significance is 0.95^N, where N is the number of measures, so the likelihood that at least one will do so by chance alone is 1 − 0.95^N. So with 10 measures, there is a 40% chance that at least one measure will be ‘significant’; with 20 measures this rises to 64%. There are various ways of correcting for this issue of multiple comparisons.
P-hacking is often coupled with HARKing, i.e. hypothesising after the results are known—here, the researcher invents a plausible-sounding explanation for the result that was obtained, after the data have been inspected.
Sources: Banks et al, Journal of Business Psychology, 2016; Academy of Medical Sciences, Reproducibility and reliability of biomedical research: improving research practice (October 2015), p20
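The multiple-comparisons arithmetic in Box 1 can be checked with a short calculation. The following is a minimal Python sketch, assuming that the tests are independent and that each uses the same 0.05 threshold; the function names are illustrative and do not come from the cited sources. It also applies the Bonferroni adjustment, which is one common way of correcting for multiple comparisons and is chosen here only for its simplicity.

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests independent tests is
    'significant' at level alpha when no real relationship exists."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 10, 20):
        uncorrected = chance_of_false_positive(n)
        corrected = chance_of_false_positive(n, bonferroni_threshold(n))
        print(f"{n:>2} tests: uncorrected {uncorrected:.0%}, "
              f"Bonferroni-corrected {corrected:.1%}")
```

Under these assumptions the uncorrected chance of at least one spurious 'significant' result is about 40% for 10 tests and 64% for 20 tests, matching the figures in the box, while the corrected chance stays just below 5% in each case.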
13.A further threat to the integrity of research is ‘publication bias’—the tendency for positive results to be published over negative or inconclusive findings, and therefore for some research outcomes to go unpublished altogether. Dr Ben Goldacre argued that this distortion of the published evidence base was a more significant issue than academic misconduct:
Fraud is not the most important issue. The culture of incomplete and inaccurate reporting of research has greater impact on patients and society […] [Clinical] Trials are large expensive research projects used to generate knowledge that is then used, in clinical practice, to make vitally important decisions; and yet trials are commonly left unreported, or misreported. This is a waste of money, and distorts the evidence underpinning medical practice.
Indeed, the Concordat to Support Research Integrity (see Chapter 3) notes that “refusing to publish negative research findings” is “harmful to the reputation and quality of UK research, and to the research record”. Publication bias arises particularly in relation to clinical trials, and the drivers of non-publication of results can include commercial interests and the relative value placed on “flashy breakthrough-type results” over other outcomes (see Chapter 4). Our predecessor Committee examined the issue of clinical trials transparency in 2013 and, given the significance of this topic as a public health issue, we will publish a separate report on clinical trials transparency and medical research later this year, drawing on the evidence we received during this inquiry.
14.Universities UK told us that “data and evidence on the scope and prevalence of research misconduct is limited”, and urged caution in attempting to assess the extent of the research integrity ‘problem’. Similarly, Professor Bishop said that “we have a very poor idea of how much [misconduct] is actually going on” and that “it is all very indirect evidence”. The available data comes from five sources, which are discussed below:
15.The written evidence we received frequently cited a 2009 meta-analysis of international surveys as an insight into the prevalence of research misconduct. This analysis found that globally about 2% of scientists had falsified data at least once in their career, with around a third admitting to other questionable research practices. The main survey-based insight into UK research integrity comes from work by the Nuffield Council on Bioethics in 2014. UK researchers were asked about the temptations or pressures to compromise research integrity and standards, rather than the extent to which these occurred. Nevertheless, 58% of respondents reported that they were aware of scientists feeling tempted to compromise on research integrity, with 26% themselves feeling tempted or under pressure.
16.Earlier this year, a survey conducted by the journal Nature found that, of the 2,632 ‘non-PI’ researchers responding (i.e. those who are not the ‘principal investigator’ for a research project), just 43% felt that their research group ‘never’ or ‘rarely’ condones research practices that ‘cut corners’, such as valuing speed over quality, or fundability over accuracy. The survey also identified a small subgroup of 376 non-PI researchers (one in seven) who were consistently negative about their lab culture, 70% of whom said that in the previous 12 months they had ‘often’ or ‘occasionally’ felt pressured to produce a particular result.
17.Problems with research can lead to a journal article being retracted (see Chapter 5), and the rate at which articles are retracted is increasing. A 2012 study of journal retractions around the world found that the number of retractions per year had increased by a factor of 19 between 2001 and 2010. Even after adjusting for the growth in the published literature during the period, there had been an 11-fold increase in the retraction rate.
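The relationship between the two figures above can be illustrated with simple arithmetic. This is a sketch only, assuming that the 'retraction rate' means the number of retractions divided by the number of papers published; the cited study may have made its adjustment differently, so the implied growth figure below is illustrative and is not a number reported in the evidence.

```python
def implied_literature_growth(count_fold: float, rate_fold: float) -> float:
    """Growth in the number of published papers implied by two fold-changes.

    Assumes rate = retractions / papers, so that
    rate_fold = count_fold / papers_fold and hence
    papers_fold = count_fold / rate_fold.
    """
    return count_fold / rate_fold


# 19-fold rise in retraction counts, 11-fold rise in the retraction rate
growth = implied_literature_growth(count_fold=19, rate_fold=11)
print(f"Implied growth in published papers: about {growth:.1f}-fold")
```

On that assumption, growth in the literature would account for a factor of under two, so most of the rise in retraction numbers would remain after the adjustment.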
18.We received many submissions which discussed how to interpret the increase in retractions. Retraction Watch, an American website documenting retractions and corrections, told us that the rise in retractions reflected several factors, including “a greater willingness of journals to withdraw problematic papers; a growing reliance on software tools to detect plagiarism; and more attention to manipulated or otherwise inappropriate images”. Similarly, a 2013 academic study of retractions concluded that:
The rising number of retractions is most likely to be caused by a growing propensity to retract flawed and fraudulent papers, and there is little evidence of an increase in the prevalence of misconduct. Statistics on retractions and findings of misconduct are best used to make inferences about weaknesses in the system of scientific self-correction.
19.Some of our witnesses suggested that an increase in journal article retractions should be seen as a positive indicator of increased detection of problems, and noted that the reasons for retracting a paper included honest error. As Dr Elizabeth Moylan, representing the Committee on Publication Ethics, put it:
If something has gone wrong with the experiment, or, oops, the sampling was wrong and not quite what somebody anticipated, the publisher has a duty of care to make that correction or retraction as they see fit. It is not necessarily all bad. The way research operates is inherently messy; mistakes happen. We have to be comfortable with that. We are all human. How we fix it and make that transparent is the key.
20.Nevertheless, Retraction Watch noted that “we cannot rule out the possibility that scientists are more willing to commit misconduct”. Dr Ivan Oransky, co-founder of Retraction Watch, told us that around half of retractions were due to misconduct, and that there had been around 400 journal article retractions in the UK since 1977. He told us that the retraction rate was “doubling every few years”, but that this was also the case globally. Retraction Watch calculated that in the UK there were 0.75 retractions per billion US dollars spent on research, compared with 0.44/$bn globally. This suggests that retractions are a rare event in the context of research spending. The higher UK rate could reflect a lower cost of research programmes or a higher number of papers per grant rather than a higher propensity for retractions.
21.A 2016 study of ‘image manipulation’ in journal articles in the fields of microbiology and immunology, cancer biology, and general biology revealed an increasing trend in this practice. Researchers reviewed over 20,000 biomedical research papers published in 40 scientific journals from 1995 to 2014 and found that “3.8% of published papers contained problematic figures, with at least half [of those] exhibiting features suggestive of deliberate manipulation”. The study also found that “the prevalence of papers with problematic images has risen markedly during the past decade”.
22.We asked Damian Pattinson, of Research Square, about what this might mean in the context of research integrity. He explained that altering an image was a common practice, which could be done for legitimate reasons, but that this needed to be made clear in the research paper:
Authors tend to touch up images fairly frequently. It is very rarely deliberate misconduct, but the rate at which figures are tweaked a little bit to make them look nicer is remarkably high. […] The vast majority of it is just adjusting the contrast a bit to try to make something a bit clearer. […] If you are looking at a field of cells and the pieces you are interested in are at the far sides of the picture, you might try to condense the middle, for example. It is not deliberately misleading, but […] if you have chopped out the middle, for example, you do not know whether that middle had something they wanted to hide, or whether it was just a blank space they wanted to cover up. […] As long as you say what you have done in any kind of manipulation you have performed, it is reasonable. General practice now is that you either make it clear with a line or you state in your legend exactly what has happened.
23.The ability to recreate research findings could provide some indication of the reliability of the research. A submission from the medical journal the BMJ outlined the difference between ‘replication’ and ‘reproducibility’ in this context:
Replication—where multiple investigators aim to yield the same findings using independent data, analytical methods, laboratories, and instruments—does remain the gold standard in laboratory and other non-human research […] Reproducibility means that independent investigators do not try to rerun a whole study, but they subject the original data to their own analyses and interpretations. This can verify the published findings but can also—crucially—extend and add to them through new research, hence making the most of the money and effort that researchers, clinicians, patients, funders, and the public put into the original study.
The journal also explained that the extent to which replication was possible varied by discipline and research methods:
In clinical and public health research, replication is often impossible and the best we can hope for is that research is reproducible. […] In principle, replication is also desirable in epidemiological studies (such as large population-based observational studies) particularly when they affect health policy or regulators’ decisions about drugs and other treatments. But the long term, complex, and extensive nature of such research means that replication would all too often require many years and considerable new funding. For studies with patients and populations reproducibility is a much more attainable and affordable standard.
24.According to a 2016 survey conducted by the journal Nature, more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and 52% of researchers believe that there is “a significant reproducibility crisis”. On the other hand, “less than 31% think that failure to reproduce published results means that the result is probably wrong, and most say that they still trust the published literature”.
25.Our witnesses had a range of views on how to interpret the reproducibility ‘crisis’ in the context of research integrity. Professor Lewandowsky and Professor Bishop explained that “irreproducible research may reflect a lack of integrity of researchers, for example through manipulation or invention of data to achieve a particular result, or cherry-picking of results or literature analyses to hide unwanted or uninteresting results”, but that irreproducible results may also be produced by honest researchers “because they are poorly trained or are using methods that they do not fully understand”. Catriona Fennell, representing the Publishers Association, advised that:
It is important to make a distinction between reproducibility to some extent and research integrity; it would be damaging if there was a perception that, because your work could not be replicated, you did something unethical. That could be the case for a very small percentage, but normally it is not. It could be for another reason outside your control; it could be an antibody that was not as stable as you would like. Some of it could be very much improved with education and more focus on transparent methodology.
We return to the issue of reproducibility in Chapter 4, in the context of ensuring research methods are adequately described in order to assist researchers looking to reproduce the same results.
26.One of the recommendations of the Concordat to Support Research Integrity is that employers of researchers should produce an annual statement on the number of misconduct investigations, although not all universities comply with this (see Chapter 3). We reviewed the available statements for 2015/16, along with other information provided in response to our survey of UUK members, and found that 51 universities undertook no investigations that year. This figure may reflect in part the fact that universities vary in size and in the volume of research activity they undertake. Reflecting what we were told about journal article retractions, Sheffield Hallam University suggested that “rises in allegations and cases of research misconduct are healthy signs of research communities that appreciate the importance of research integrity and that are beginning to police themselves”, and that “given human fallibility, universities and other research institutions where there are consistent nil returns are of more concern”. A lack of published information makes it difficult to assess exactly how many UUK members consistently report ‘nil returns’ over several years, although 20 universities told us that they had not undertaken any investigations since the Concordat was signed.
27.Dr Elizabeth Wager estimated that “most research institutions should expect at least one investigation every year, and those with many thousands of researchers should expect to perform several. Lack of investigations should not inspire confidence, but may indicate that institutions prefer not to address issues properly and would rather ignore them”. Indeed, Dr Peter Wilmshurst, who himself has acted as a whistleblower in several cases, argued that “Universities and journals are […] as likely to admit the full magnitude of research misconduct as church leaders are to confess the extent of child abuse by priests”. Similarly, James Parry, the Chief Executive of the UK Research Integrity Office (see Chapter 4), warned that:
If an institution reported year after year that it never had any allegations, I would be somewhat sceptical about whether that figure was accurate. […] If the number is zero consistently, you may need to look at your practices and overall research culture.
Dr Patrick Vallance, the Government’s Chief Scientific Adviser, agreed:
Would I be concerned if universities over a long period reported zero [investigations]? I think it would be odd. It is unusual to have nothing at all.
28.The available data on misconduct investigations suggest that serious research misconduct is rare, but it is impossible to be certain without better data. There is a mismatch between the number of investigations and the scale of reported temptations to compromise on research standards, the ‘reproducibility crisis’ in some disciplines, the growth in journal article retraction rates, and trends in image manipulation. We hope that most researchers will never succumb to the temptations to compromise on research standards, and some of these trends may be the product of increased detection and correction of honest errors. Nevertheless, it is worrying that there seem to be so few formal research misconduct investigations conducted by universities. Increases in the number of investigations should be seen as a healthy sign of more active self-regulation. Further work is needed to determine the scale of the problem.
16 Office of Research Integrity (accessed 10 May 2018)
17 [Professor Dorothy Bishop]
18 Professor Stephan Lewandowsky and Professor Dorothy Bishop, para 20
19 Academy of Medical Sciences, Reproducibility and reliability of biomedical research: improving research practice (October 2015), p20
20 Dr Ben Goldacre
22 [Professor Dame Ottoline Leyser]
25 Fanelli, D., PLoS One, 29 May 2009
27 Richard Van Noorden, Nature, 557, 294–296 (2018)
28 Richard Van Noorden, Nature, 557, 294–296 (2018)
29 Grieneisen, M.L. and Zhang, M., PLoS One, October 2012
30 Retraction Watch and The Center For Scientific Integrity
31 Fanelli, D., PLoS Medicine, 2013
32 [Dr Wager]
33 Universities UK, para 25
38 Bik EM, Casadevall A, Fang FC, mBio (2016)
40 BMJ
41 BMJ
42 Monya Baker, Nature, 2016
43 Monya Baker, Nature, 2016
44 Professor Stephan Lewandowsky and Professor Dorothy Bishop, paras 13–14
46 See Annex 1, and ()
47 Sheffield Hallam University, para 1
48 Dr Elizabeth Wager, para 2.3
49 Dr Peter Wilmshurst, para 6
Published: 11 July 2018