Examination of Witnesses (Questions 1 - 19)
WEDNESDAY 8 NOVEMBER 2000
MR IAIN FINDLAY, MR ANDY MOONIE, MR ANGUS MACCORMICK, MR LAURENCE KING AND MR NICK EWING
Chairman
1. Gentlemen, can I welcome you most warmly
to the Committee and ask you if you would identify yourselves for
the record?
(Mr Findlay) I am Iain Findlay, National Officer of
the Institution of Professionals, Managers and Specialists.
(Mr Moonie) I am Andy Moonie, I am Branch
Secretary of the Telecoms Branch of the Institution of Professionals,
Managers and Specialists.
(Mr MacCormick) I am Angus MacCormick, I am Branch
Secretary of Air Traffic Control Officers, of the Institution
of Professionals, Managers and Specialists.
(Mr King) I am Laurence King, I am Assistant Branch
Secretary of Air Traffic Control Officers of the Institution of
Professionals, Managers and Specialists.
(Mr Ewing) I am Nick Ewing from the Institution of
Professionals, Managers and Specialists.
2. Thank you. Gentlemen, those of you who have
been before know that these rooms do rather eat sound. Although
the microphones are in front of you they are to record what you
say, they do not project your voices. I am afraid you are going
to have to break a habit of a lifetime and raise your modulated
tones. Can I ask you first, Mr Findlay, do you have any general
remarks you would like to make?
(Mr Findlay) We have some general remarks, and I will
give a copy to the Clerk at the end. NATS operate systems which
are both safety critical and safety related. Increasingly, as
technology and automation advance, air traffic systems will come
to be classed as safety critical rather than safety related. This
requires a new approach, both for the design and technical management
of those systems and the training needs of the staff who maintain
and support these systems. Of particular concern to us is the
concentration of expertise on the current LATCC system on small
numbers of people. The loss of key experienced personnel, either
to new career opportunities within the company, or elsewhere,
causes a potential risk to the continued ability to maintain operational
services. We are currently aware of difficulties in recruiting
and retaining engineering staff, especially given the buoyant
nature of the computing and telecoms jobs market. The desire of
staff to work on new technology, rather than the legacy systems
that we have, and the wish for a long-term career make continuing
employment at West Drayton unattractive for operational engineers.
In order to protect its ability to maintain these services NATS
must consider some new ways to address this problem. It should
be said, however, that when the recovery from failures has come
about this was a demonstration of both the professionalism and
dedication of staff in all areas. Indeed it is a tribute to the
NATS staff that such failures remain a remarkably rare occurrence.
The failure of the NATS computer system which carries out flight
data and radar data processing can present a safety issue, although
there are alternative systems available to provide services, such
as radar data processing and display. The hardware of the NATS
system was significantly upgraded in the late 1980s after it had
become unreliable. The central processors are in the process of
a further upgrade to more modern replacements. The combination
of flight plan and radar data processing which a NATS system provides
enables the tracking of aircraft and flight progress estimates
to be determined. Automatic data links between the NATS system
at LATCC, airports and other air traffic centres, both domestic
and international, provide a mechanism to exchange flight plan
data and notification of aircraft movements. These automatic methods
allow current ATC system capacity to be attained. Manual methods
used as a fallback during system failure are both time-consuming
and increase the risk of error. In addition, the tools to detect
potential separation loss and alert controllers, such as conflict
alert, are in the NATS computer system and they are, therefore, unavailable
at a time when, of course, they would be most beneficial. In an
environment with high and growing traffic levels system failures
of any type during peak hours present increased risks to the normal
handling of traffic in a safe and expeditious manner. Although
aircraft departures can be suspended or reduced to alleviate this
immediately following failure there is an impact on the workload
of all operational staff. These impacts on the control of workload
increase the likelihood, however small, that an ATC error may
occur with possibly disastrous consequences. In particular, problems
were experienced with aircraft call sign allocations during the
June failure. I am sure that the Committee want to ask us questions
on that.
3. The matters you are referring to are tremendously
important and are of real concern to the Committee. We will want
to go into some detail on some of the points you have already
made. If everybody agrees then we can take it that one voice will
be sufficient. I will leave it to you to designate which of your
colleagues you want to do the work. Tell me about this National
Airspace System, why did it fail on 17th June?
(Mr Moonie) As far as I can tell, I am not a particular
expert on that system, I speak for engineers as a whole, we understand
that it was some software problem. I am sure my colleagues could
go into that in more detail.
4. Mr Moonie, I need a lot more voice. I do
not mind you not understanding it but it is very wearing if I
do not.
(Mr Moonie) We understand it is a combination of software
errors and multiple adaptation errors and restarts to the processors.
After a number of these occurrences
Mr Olner: I am not getting this, it is a bit
quick and it is still a little bit soft.
Chairman
5. Nice simple words, Mr Moonie. How did the
thing collapse and what did it do?
(Mr Moonie) There were software errors, which are
what we call restarts to the processors.
6. Had they not been in the system before?
(Mr Moonie) That is not an easy question for me to
answer as I do not work on that system. We are in an environment
where the systems are being modified continually to cope with
both the increase of traffic and in order to be able to bring
on the introduction of Swanwick. In order to be able to do that
they need to modify the current system. With the volume of
change happening at the moment we understand that that can lead
to some of these errors occurring.
7. You thought it might be possible. You thought
before 17th June there might be the possibility of a collapse,
is that true?
(Mr Moonie) I would not have said that. There is always
a possibility of software errors in the complex systems we have
nowadays. Our members working in NATS spend a lot of time testing
modifications. It is such a complex area. There is no testing
of a system like real life testing.
8. Your instinct would tell you that no system
is totally perfect. Because of the extra pressure being put on
it and because of the changes that were taking place it was quite
possible a failure would take place.
(Mr Moonie) There is always a possibility.
9. Were particular precautions taken in case
that should happen?
(Mr Moonie) I could not answer that.
10. What did this do to the staff, this particular
failure on 17th June, were the pressures that were imposed on
them then acceptable?
(Mr Moonie) It was a failure which occurred at the
weekend, outside normal office hours. Although we had the frontline
maintenance personnel there they did not have as much support
as they would have had if it was during Monday to Friday office
hours. Therefore, there was a certain amount of pressure on them.
11. Just a moment, you are not telling me, I
hope I am misunderstanding, the back-up systems are not there
over the weekend because everyone goes home on a Friday night?
I really will stop flying. I am not imaginative but some things
do put the fear of God in me.
(Mr Moonie) There are engineering personnel who work
a 24-hour roster system to cover and respond to system failures, and
they did that. During Monday to Friday the people who are particularly
expert in their systems would have been available, because that
is their normal working hours, and that would be an extra resource.
12. You are saying it would be the software
people that would not be available.
(Mr Moonie) Yes. The difficulty you have when you
have a major problem of the nature that occurred in June was that
everybody, whether they are within West Drayton or outside, would
be telephoning up to say, "What is the problem with the system,
when is it going to be restored?" It is very difficult to
deal with all of that pressure with the continuous ringing of
telephones and getting on with fixing the system.
13. It also has an effect on the speed at which
you work, if you are slowing down because you are going back to
older systems that rely on people writing things down on a paper
based system?
(Mr King) I think you are right, Madam Chairman, as
that system failed, I think it was a progressive failure, you
are unable as an air traffic controller to continue with your
normal operation. You get intermittent or even loss of strip production,
which is absolutely vital. In air traffic control you have to
have that information when you need it.
14. It becomes a cumulative matter, you are
already slower because you are using a paper based system and
the sheer bulk of work building up means that the system is less
able to cope as you go on.
(Mr King) Absolutely. One of the other things you
can possibly lose as well is call signs on radar screens, which
is a very difficult thing. When you only get the raw squawk, which
is the numbers from the aircraft, that can cause a rapid increase
in the work load for air traffic controllers. When you start to
lose the strip production
15. What is a raw squawk?
(Mr King) It is just the code that is sent away from
the aircraft. In normal operations for air traffic controllers
the computer converts that into the call sign. It is much more
difficult to try and find what the squawk is and then find what
the call sign is. It is much easier to have the call sign on the
radar screen itself. As the computer gradually fails you need
to hand write strips, which is obviously a much longer process
than the computer printing that information. There is a lot more
telephone calls to make to pass information because the computer
itself is no longer passing information on to adjacent sectors.
That does cause a considerable increase in work load. There are
immediate attempts then to restrict the number of aircraft within
the system. Because of the rapid increase of work load you cannot
continue at the same pace as you were working at before. It does
become more pressurised and stressful during these problems. The
staff do absolutely all they can do to maximise safety, first
of all, and then try and rebuild the service levels.
16. I am sure you understand, we make no criticism
of the staff nor are any of our questions based on the assumption
that staff were not doing their very best under very difficult
conditions. We need to know how, why and when. Would you say that
British Airways are wrong when they say that part of your problem
is that you do not have any understanding of the commercial pressures
that are put upon airlines and pilots? Some of your decisions
and your working practices do not respond to their particular
needs.
(Mr Findlay) Can I respond to that? I think the problem
is that the airlines always think that their particular needs
are first and foremost. The particular needs are, in fact, for
safety; that is the only reason that NATS exists, for safety reasons.
Airlines do not always see that they should not come first in
any queue.
Chairman: That is a very unusual attitude in
human beings, I have never encountered it before!
Mr Olner
17. It concerns me somewhat, are you really
saying to me that NATS will introduce new software without it
being thoroughly tested, without the old software that was being
used in place? Do you not run things in tandem?
(Mr Findlay) Things are run in tandem. The software
is so complex that at various points in time the new software
has been tested as much as it can be but there is a conflict somewhere
in the old software, but it could be way down the line.
18. I am struggling to grasp this. Is this because
one of your guys has pressed the wrong button? I am a computer
person who struggles with it. We all press the wrong button and
we think, goodness me, it is gone now. It is my fault, not the
computer's fault. If you have a proven system that has been working
and an updated system is brought in, how long do you keep the
old system in as a back-up until you have gone through all of
the glitches?
(Mr Moonie) It does depend on which computer system
you are changing to, because by nature of the design of the system
it would depend on how you do software updates. It certainly is
the case that they do keep old versions of the software available.
In terms of the testing of new software the general process is
that when you decide you need to make a change, the first thing
you have to do is specify what the change is there to achieve.
19. Who determines that? Who determines that
the software is going to be changed? Is it you, the controllers,
who say, "We want something better", or is it management
saying, "This is what you are going to have because we have
been convinced that it is better"?
(Mr Moonie) The change is either determined because
there is a requirement to have new features or new systems introduced
but also software change is introduced to fix previously identified
problems. Some of those problems may be of a serious nature and
some of them may be of a minor nature. In terms of the testing,
the starting point is when you design a change you first want
to prove that the change does that which you designed it for,
then you have to consider in which ways this could go wrong.
You have to consider putting in what you might call rogue inputs
or failure modes. We can spend a lot of time thinking up ways
in which to test this. There is always a possibility of a particular
combination of events occurring that we had not thought up when
we designed our testing, and that could lead to the situation
we were in.