Memorandum submitted by Professor Darrel Ince (CRU 34)
I am Professor of Computing at the Open University
and the author of 18 books and over 100 papers on software topics.
My submission to the Committee is an expanded version of an article
that I wrote for the Guardian, published on 5 February 2010.
1. First a disclosure: I am not a fan of
computer modelling. However, most of the modelling work that has
been carried out is in a sense irrelevant in that there is plenty
of evidence that the earth is changing and that a potential result
of this could be cataclysm. Because of the high stakes I support
some of the efforts to bring our planet back to what it was 40
years ago.
2. My favourite quote about science is by
Karl Popper, almost certainly the most influential philosopher
of science to this day:
"Every intellectual has a very special responsibility.
He has the privilege and opportunity of studying. In return, he
owes it to his fellow men (or 'to society') to represent the results
of his study as simply, clearly and modestly as he can. The worst
thing that intellectuals can do - the cardinal sin - is
to try to set themselves up as great prophets vis-a-vis their
fellow men and to impress them with puzzling philosophies. Anyone
who cannot speak simply and clearly should say nothing and continue
to work until he can do so."
3. This is one of the reasons why I feel
strongly about one or two of the issues you will be considering.
4. One of the spin-offs from the emails
that were leaked from the Climate Research Unit at the University
of East Anglia is the light that was shone on the role of program
code in climate research. There is a particularly revealing set
of emails that were produced by a programmer at UEA known as Harry
ReadMe. The emails indicate someone struggling with undocumented,
baroque code and missing data which forms part of one of the three
major climate databases used by researchers throughout the world.
5. A number of climate scientists have refused
to publish their computer programs; what I want to suggest is
that this is both unscientific behaviour and, equally importantly,
ignores a major problem: that scientific software has a poor
reputation for errors.
6. There is enough evidence for us to regard
a lot of scientific software with worry. For example, Professor
Les Hatton, an international expert in software testing based
at the Universities of Kent and Kingston, carried out an extensive
analysis of several million lines of scientific code. He showed
that the software had an unacceptably high level of detectable
inconsistencies. For example, interface inconsistencies between
software modules occurred at the rate of one in every seven interfaces
on average in the programming language Fortran, and one in every
37 interfaces in the language C. This is hugely worrying when
you realise that just one error - just one - will often
invalidate a computer program. What he also discovered, even more
worryingly, is that the accuracy of results declined from six
significant figures to one significant figure during the running
of programs.
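By way of illustration only (this is not taken from Professor Hatton's
study, and the function names and readings below are hypothetical), the
following short Python sketch shows one well-known way in which
significant figures can be lost while a program runs. It computes the
sample variance of four readings whose true variance is 30, first with
a textbook one-pass formula that subtracts two very large, nearly equal
numbers, and then with a two-pass method that avoids that subtraction.

```python
# Illustrative sketch only: hypothetical readings, not data from any study.

def naive_variance(values):
    """One-pass textbook formula: (sum of squares - n * mean^2) / (n - 1).

    The subtraction of two huge, nearly equal numbers discards most of
    the significant figures held in a double-precision float.
    """
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for x in values:
        total += x
        total_sq += x * x
    mean = total / n
    return (total_sq - n * mean * mean) / (n - 1)


def two_pass_variance(values):
    """Two-pass method: compute the mean first, then sum squared deviations."""
    n = len(values)
    total = 0.0
    for x in values:
        total += x
    mean = total / n
    sq_dev = 0.0
    for x in values:
        sq_dev += (x - mean) ** 2
    return sq_dev / (n - 1)


# Four readings that differ only in their last two digits.
readings = [1_000_000_004.0, 1_000_000_007.0,
            1_000_000_013.0, 1_000_000_016.0]

if __name__ == "__main__":
    print("one-pass formula:", naive_variance(readings))
    print("two-pass formula:", two_pass_variance(readings))
```

Both functions implement the same mathematics, yet on these readings
only the two-pass version returns the true value of 30; the one-pass
version returns a figure that bears no relation to it. Nothing in the
one-pass code looks wrong on inspection, which is why this kind of
error is hard to find unless the code is available for others to check.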
7. Hatton and other researchers' work indicates
that scientific software is often of poor quality. What is staggering
about the research that has been done is that it examines scientific
software that is commercial: produced by software engineers who
have to undergo a regime of thorough testing, quality assurance
and a change control discipline known as configuration management.
Scientific software developed in our universities and research
institutes is often produced by scientists with no training in
software engineering and with no quality mechanisms in place and
so, no doubt, the occurrence of errors will be even higher. The
Climate Research Unit Harry ReadMe files are a graphic indication
of such working conditions.
8. Computer code is also at the heart of
a scientific issue. One of the key features of science is deniability:
if you erect a theory and if anyone produces evidence that it
is wrong then it falls. This is how science works: by openness,
by publishing minute details of an experiment, some mathematical
equations or a simulation; by doing this you embrace deniability.
This does not seem to have happened in climate research. Researchers
have refused to release their computer programs, even though
they are still in existence and not subject to commercial agreements.
One example is Professor Mann's initial refusal to give up the code
that was used to construct the hockey stick model that demonstrated
that human-made global warming is a unique artefact of the last
few decades (he has now released all his code).
9. The situation is by no means bad across
academia: most academics release code and data. Also, a number
of journals, for example those in the area of economics and econometrics,
insist on an author lodging both the data and the programs with
the journal before publication. There is also an object lesson
in a landmark piece of mathematics: the proof of the four colour
conjecture by Appel and Haken. They showed that in a map the regions
can be coloured using at most four colours so that no two adjacent
regions have the same colour. Their proof was controversial in
that instead of an elegant mathematical exposition they partly
used a computer program. Their work was criticised for inelegance,
but it was correct and the computer program was published for
checking.
10. The problem of large-scale scientific
computing and the publication of data is being addressed by organisations
and individuals that have signed up to the idea of the fourth
paradigm. This was the idea of Jim Gray, a senior researcher at
Microsoft, who identified the problem well before the Climategate
affair. There is now a lot of R and D work going into mechanisms
whereby the web can be used as a repository for scientific publications
and more importantly the computer programs and the huge amount
of data that they use and generate. A number of workers are even
devising systems that show the progress of a scientific idea from
first thoughts to the final published papers. The problems with
climate research will no doubt provide an impetus for this work
to be accelerated.
11. I believe that, if you are publishing
research articles that use computer programs, if you want to claim
that you are engaging in science, and if the programs are in your
possession and you will not release them, then you are not a
scientist; I would
also regard any papers based on the software as null and void.
There are of course some exceptions which would apply both now
and in the past and would excuse many of those who have refused
to release code and will in the future refuse: for example, a
scientist may have a commercial agreement with some body for the
whole software, or part of the code is commercial; another issue
which complicated Prof Mann's position is that of intellectual
property rights. Another issue is the fact that developing software
is hard to do and considerable effort goes into it. There should
be a period in which it is not released so that a researcher can
make the most of his or her efforts by, for example, publishing more
papers. Steve Schneider of MIT has suggested two years.
12. There are a number of ways that this
can be enforced: by journals insisting that code and data be lodged
with them; by the research councils insisting, as a condition
of granting research funds, that all data and software be lodged
somewhere, with a failure to do so resulting in no further
funding until they are lodged; and by our universities making it
a clause in an academic's terms and conditions that data and software
should be lodged.
13. I would be happy to meet the committee.
February 2010