Memorandum submitted by the School of
Computer Science, Cardiff University
This submission from the School of Computer
Science at Cardiff University is made by staff who are members
of the School's Knowledge and Information Systems Research Group,
whose work provides design research and practical software and
database resources to support the Species 2000 Catalogue of Life,
a UK-led international programme constructing and making available,
in digital form, the first comprehensive listing of all the world's
species of biological organisms (animals, plants, fungi and micro-organisms).
The edition of the Catalogue of Life to be released in April 2008
documents 1.1 million of the estimated 1.8 million known species.
The Catalogue is an essential tool for organising biodiversity
data, improving retrieval and minimising the loss of data which
can occur because of the necessary changes in the names and classification
of organisms as knowledge improves, and is an essential core component
of international biodiversity knowledge organisations such as
GBIF (Global Biodiversity Information Facility) and the Encyclopedia
of Life.
As non-taxonomists ourselves, we wish to emphasise
some of the ways in which systematics and taxonomy research in
the UK is linked to other kinds of research, and to the sources
of information used by scientists and professionals in other disciplines.
We have become associated with the Species 2000 Catalogue of Life
and its Secretariat at the University of Reading through such
linkages. Richard White is a member of the Species 2000 Project
Team (the Executive). He is the Convenor, and Andrew Jones is
a member, of the Species 2000 Information Systems Group, which
oversees the technical computing aspects of the Catalogue of Life,
such as its adherence to international standards to enhance its
interoperability with other information and knowledge systems.
Alex Gray is a Director of Species 2000 which, although an international
co-operative programme, is registered as a UK not-for-profit organisation
in order to handle matters of finance and ownership.
Response to question 9
There are two interpretations of the phrase
"web-based taxonomy", deriving from the dual meaning
of "taxonomy" as both the science and process of carrying
out taxonomic revisions and also the result of carrying out these
processes on a particular group of organisms, which usually results
in revised and improved classifications. Making a mechanism for
carrying out taxonomic revisions accessible on the Web to those
actually performing them (taxonomists and other providers of the
information they use) is different from, and more challenging
than, delivering the results of a taxonomic revision on the
Web (to scientists, professionals and the general public). It
is important to make this distinction clear in the context of
what is meant by "web-based taxonomy". We will refer
to the two activities as "web-based revision" and "web-based
delivery" respectively.
CARRYING OUT TAXONOMIC REVISIONS ON THE WEB
Web-based revision is in its infancy, and working
taxonomists are not all convinced of its value. But research in
progress shows how it could be done. It has many parallels with
performing other complex collaborative tasks on the Web. It will
be able to make use of principles and practice being developed
in other disciplines, especially in commerce and education. Unlike
the use of web-based blogs, wikis and the like to create what are
essentially simple documents collaboratively, the process of taxonomic revision
requires rigorous recording of "provenance" (the originators,
dates and details of data values, analyses, decisions and changes)
and the ability to back-track to substantiate or reverse past
decisions. The NERC CATE project is beginning to tackle these
issues.
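To illustrate what recording provenance and back-tracking might involve, the following is a minimal sketch in Python. It is not drawn from CATE or from any Species 2000 software: the class names, field names and taxon names are all hypothetical, chosen only to show the idea that every decision is stored with its originator, date and reason, and that a reversal is itself recorded as a new decision, never as a deletion.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One recorded decision: who made it, when, why, and what it changed."""
    author: str
    when: date
    taxon_id: str
    old_name: Optional[str]  # None for the first name recorded for a taxon
    new_name: str
    reason: str


class RevisionLog:
    """Append-only history of name decisions (hypothetical design)."""

    def __init__(self) -> None:
        self._changes: List[Change] = []
        self._current: Dict[str, str] = {}

    def record(self, author: str, when: date, taxon_id: str,
               new_name: str, reason: str) -> Change:
        """Store a decision together with its provenance."""
        change = Change(author, when, taxon_id,
                        self._current.get(taxon_id), new_name, reason)
        self._changes.append(change)
        self._current[taxon_id] = new_name
        return change

    def current_name(self, taxon_id: str) -> Optional[str]:
        return self._current.get(taxon_id)

    def history(self, taxon_id: str) -> List[Change]:
        """All decisions for one taxon, oldest first, to substantiate the current state."""
        return [c for c in self._changes if c.taxon_id == taxon_id]

    def revert_last(self, author: str, when: date, taxon_id: str,
                    reason: str) -> Change:
        """Reverse the latest decision by appending a new one; nothing is erased."""
        past = self.history(taxon_id)
        if not past or past[-1].old_name is None:
            raise ValueError("no earlier decision to return to")
        return self.record(author, when, taxon_id, past[-1].old_name, reason)


# Invented placeholder names and people, for illustration only.
log = RevisionLog()
log.record("A. Editor", date(2008, 1, 10), "T1", "Exampleus albus", "initial entry")
log.record("B. Reviser", date(2008, 1, 20), "T1", "Exampleus niger", "proposed renaming")
log.revert_last("A. Editor", date(2008, 2, 1), "T1", "renaming not substantiated")
```

Because the log is append-only, the reversal in the last line leaves the earlier proposal and its author visible in the history, which is the property that distinguishes this kind of record from a simple collaboratively edited document.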
These requirements can in turn be addressed,
for example, by a suite of techniques collectively known as "virtual
organisation" facilities, which are being developed in collaborations
between computer scientists and business and commercial organisations.
The overall goal is to allow partners to discover each other and
work together in a secure Internet environment to achieve more
through their collaboration than they could have achieved separately.
There is much here of potential mutual benefit to taxonomic practitioners
and computer scientists, and this is one of the reasons for the
joint activities of our group with those who are creating and
distributing taxonomic products such as the Catalogue of Life.
In Cardiff, we are involved in initiatives and programmes which
will put in place elements of a system which may make web-based
taxonomic revision widely available in the future.
DELIVERING TAXONOMIC OUTPUTS ON THE WEB
At its simplest, web-based delivery of taxonomic
results is a much easier task, and many organisations, projects
and individuals are doing this already. Web pages are much easier
for scientists, other professional users and the general public
to find than the printed publications in which taxonomic revisions
and classifications are traditionally published. The Species 2000
Catalogue of Life is delivering taxonomic outputs (the Catalogue
of Life itself) on the Web, at http://www.sp2000.org.
However, there is a translation and packaging
process which is necessary if user communities are to make full
use of the results of taxonomic revisions. This point also addresses
questions 2 and 3 in the request for submissions. What most users
want to use is not the taxonomic revisions themselves but improved
and reliable resources based on them: outputs and services such
as stable nomenclature, checklists and improved classifications
which can be used as the framework for assembling information.
What is important to them is a stable framework of classification
and nomenclature organised and made available on top of the foundation
established by the taxonomic revisions. These resources are often
not created by the taxonomists themselves, but by organisations
such as Species 2000 who understand the need for them and the
data, information and knowledge they will help to organise.
In delivering these outputs and services, the
Catalogue of Life supports an increasing variety of user communities,
and also demonstrates the need for continuing taxonomic and computing
activities to complete them. It provides a consensus view of the
taxonomic outputs which makes them easier to use, by effectively
filtering out the "noise" in the process of delivering
taxonomic summary outputs to the users, so that they need not
consider individually every revision and name change or worry
about whether it is accepted by all taxonomists before they adopt
it.
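As a simple illustration of how a consensus checklist spares users from tracking individual name changes, the sketch below (in Python) maps every known name to its currently accepted name and uses that mapping to gather records filed under different names. The checklist contents, the function names and the record format are all assumptions made for the example; the names are invented placeholders, and this is not the Catalogue of Life's actual data model or interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical checklist: each known name points at the accepted name.
CHECKLIST: Dict[str, str] = {
    "Exampleus albus": "Exampleus albus",     # accepted name
    "Exampleus candidus": "Exampleus albus",  # synonym
    "Oldgenus albus": "Exampleus albus",      # earlier combination
    "Exampleus niger": "Exampleus niger",     # accepted name
}


def accepted_name(name: str, checklist: Dict[str, str]) -> Optional[str]:
    """Return the accepted name for any listed name, or None if it is unknown."""
    normalised = " ".join(name.split())  # collapse stray whitespace
    return checklist.get(normalised)


def group_by_accepted(
    records: Iterable[Tuple[str, str]], checklist: Dict[str, str]
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Gather (name, data) records under accepted names; keep unknown names aside."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    unresolved: List[Tuple[str, str]] = []
    for name, data in records:
        accepted = accepted_name(name, checklist)
        if accepted is None:
            unresolved.append((name, data))  # not lost, just flagged for attention
        else:
            grouped[accepted].append(data)
    return dict(grouped), unresolved


records = [
    ("Exampleus candidus", "specimen 1"),
    ("Oldgenus albus", "specimen 2"),
    ("Exampleus  niger", "specimen 3"),
    ("Unknownus rex", "specimen 4"),
]
grouped, unresolved = group_by_accepted(records, CHECKLIST)
```

Records filed under a synonym or an earlier combination end up alongside those filed under the accepted name, while a name missing from the checklist is set aside and not silently discarded.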
Despite the vital role of the Species 2000 Catalogue
of Life in helping to organise biodiversity knowledge, it currently
receives little funding for either its data content (filling in
the taxonomic groups which still lack reliable checklists) or
its computing infrastructure (improved techniques and the software
to implement them can accelerate its completion and increase its
usability for many users).
Response to question 10
Both of our interpretations of "web-based
taxonomy" involve processes which encourage the full exposure
to scrutiny that always tends to improve quality, reliability
and user-friendliness. If the work is web-based, every step in the processes
of both taxonomic revision and delivery can be made open and accessible
to scrutiny. Taxonomy should not be seen as an impenetrable process
of preparation carried out by experts before their conclusions
are finally revealed. What makes information useful is not hiding
it away until it is deemed to be complete and finished, but providing
the right access methods to give taxonomists and users the views
of the data that they individually want and can understand, even
while the taxonomists are still working on it. After all, many
taxonomic revisions take a long time, but the data which is being
used by the taxonomists and their preliminary conclusions may
be of use to users, who may even be able to add to them. The intermediate
layer of "resources" between the taxonomic revisions
and the knowledge layers that user communities are building, described
in the previous section, can be seen as a set of tools to provide
the views that the users need.
Openness encourages the development of these
tools. Standards for the various levels of data and computing
interoperability are important to encourage diversity and innovation
in tool development, rather than dependence on one supplier of
tools, and diversity of use of the tools and data will lead to
broad knowledge development.
International organisations, especially GBIF
and the Biological Information Standards organisation (TDWG),
have a vision for the delivery of taxonomic information to users
as part of a complete, organic, dynamic, distributed information
system. This will facilitate the growth of both interpretations
of web-based taxonomy. They have activities and plans for assisting
with the growth of such a system, involving the Catalogue of Life,
and are also encouraging the development of open standards for
data and information exchange and the use of software tools to
create and deliver the resources that users need. This was very
clear at the recent European EDIT project's Symposium "Future
Trends of Taxonomy" and General Meeting in January 2008,
both in the talks and in the corridors. There is a clear and timely
opportunity for the UK to maintain and demonstrate its lead in
these areas, with relatively small amounts of additional funding.
4 February 2008