Select Committee on Environment, Transport and Regional Affairs Minutes of Evidence


Examination of Witnesses (Questions 1 - 19)

WEDNESDAY 8 NOVEMBER 2000

MR IAIN FINDLAY, MR ANDY MOONIE, MR ANGUS MACCORMICK, MR LAURENCE KING AND MR NICK EWING

Chairman

  1. Gentlemen, can I welcome you most warmly to the Committee and ask you if you would identify yourselves for the record?
  (Mr Findlay) I am Iain Findlay, National Officer of the Institution of Professionals, Managers and Specialists.

  (Mr Moonie) I am Andy Moonie, I am Branch Secretary of the Telecoms Branch of the Institution of Professionals, Managers and Specialists.
  (Mr MacCormick) I am Angus MacCormick, I am Branch Secretary of Air Traffic Control Officers, of the Institution of Professionals, Managers and Specialists.
  (Mr King) I am Laurence King, I am Assistant Branch Secretary of Air Traffic Control Officers of the Institution of Professionals, Managers and Specialists.
  (Mr Ewing) I am Nick Ewing from the Institution of Professionals, Managers and Specialists.

  2. Thank you. Gentlemen, those of you who have been before know that these rooms do rather eat sound. Although the microphones are in front of you they are to record what you say, they do not project your voices. I am afraid you are going to have to break a habit of a lifetime and raise your modulated tones. Can I ask you first, Mr Findlay, do you have any general remarks you would like to make?
  (Mr Findlay) We have some general remarks, and I will give a copy to the Clerk at the end. NATS operate systems which are both safety critical and safety related. Increasingly, as technology and automation advance, air traffic systems will come to be classed as safety critical rather than safety related. This requires a new approach, both for the design and technical management of those systems and the training needs of the staff who maintain and support these systems. Of particular concern to us is the concentration of expertise on the current LATCC system in small numbers of people. The loss of key experienced personnel, either to new career opportunities within the company, or elsewhere, causes a potential risk to the continued ability to maintain operational services. We are currently aware of difficulties in recruiting and retaining engineering staff, especially given the buoyant nature of the computing and telecoms jobs market. The desire of staff to work on new technology, rather than the legacy systems that we have, and the wish for a long-term career make continuing employment at West Drayton unattractive for operational engineers. In order to protect its ability to maintain these services NATS must consider some new ways to address this problem. It should be said, however, that when the recovery from failures has come about this was a demonstration of both the professionalism and dedication of staff in all areas. Indeed it is a tribute to the NATS staff that such failures remain a remarkably rare occurrence. The failure of the NATS computer system which carries out flight data and radar data processing can present a safety issue, although there are alternative systems available to provide services, such as radar data processing and display. The hardware of the NATS system was significantly upgraded in the late 1980s after it had become unreliable. The central processors are in the process of a further upgrade to more modern replacements.
The combination of flight plan and radar data processing which a NATS system provides enables the tracking of aircraft and flight progress estimates to be determined. Automatic data links between the NATS system at LATCC, airports and other air traffic centres, both domestic and international, provide a mechanism to exchange flight plan data and notification of aircraft movements. These automatic methods allow current ATC system capacity to be attained. Manual methods used as a fallback during system failure are both time-consuming and increase the risk of error. In addition, the tools to detect potential separation loss and alert controllers, such as conflict alert, are in the NATS computer system and they are, therefore, unavailable at a time when, of course, they would be most beneficial. In an environment with high and growing traffic levels system failures of any type during peak hours present increased risks to the normal handling of traffic in a safe and expeditious manner. Although aircraft departures can be suspended or reduced to alleviate this immediately following failure there is an impact on the workload of all operational staff. These impacts on the control of workload increase the likelihood, however small, that an ATC error may occur with possibly disastrous consequences. In particular, problems were experienced with aircraft call sign allocations during the June failure. I am sure that the Committee want to ask us questions on that.

  3. The matters you are referring to are tremendously important and are of real concern to the Committee. We will want to go into some detail on some of the points you have already made. If everybody agrees then we can take it that one voice will be sufficient. I will leave it to you to designate which of your colleagues you want to do the work. Tell me about this National Airspace System, why did it fail on 17th June?
  (Mr Moonie) As far as I can tell, I am not a particular expert on that system, I speak for engineers as a whole, we understand that it was some software problem. I am sure my colleagues could go into that in more detail.

  4. Mr Moonie, I need a lot more voice. I do not mind you not understanding it but it is very wearing if I do not.
  (Mr Moonie) We understand it is a combination of software errors and multiple adaptation errors and restarts to the processors. After a number of these occurrences—

  Mr Olner: I am not getting this, it is a bit quick and it is still a little bit soft.

Chairman

  5. Nice simple words, Mr Moonie. How did the thing collapse and what did it do?
  (Mr Moonie) There were software errors, which are what we call restarts to the processors.

  6. Had they not been in the system before?
  (Mr Moonie) That is not an easy question for me to answer as I do not work on that system. We are in an environment where the systems are being modified continually to cope with both the increase of traffic and in order to be able to bring on the introduction of Swanwick. In order to be able to do that they need to modify the current system. With the volume of change happening at the moment we understand that that can lead to some of these errors occurring.

  7. You thought it might be possible. You thought before 17th June there might be the possibility of a collapse, is that true?
  (Mr Moonie) I would not have said that. There is always a possibility of software errors in the complex systems we have nowadays. Our members working in NATS spend a lot of time testing modifications. It is such a complex area. There is no testing of a system like real life testing.

  8. Your instinct would tell you that no system is totally perfect. Because of the extra pressure being put on it and because of the changes that were taking place it was quite possible a failure would take place.
  (Mr Moonie) There is always a possibility.

  9. Were particular precautions taken in case that should happen?
  (Mr Moonie) I could not answer that.

  10. What did this do to the staff, this particular failure on 17th June, were the pressures that were imposed on them then acceptable?
  (Mr Moonie) It was a failure which occurred at the weekend, outside normal office hours. Although we had the frontline maintenance personnel there they did not have as much support as they would have had if it was during Monday to Friday office hours. Therefore, there was a certain amount of pressure on them.

  11. Just a moment, you are not telling me, I hope I am misunderstanding, the back-up systems are not there over the weekend because everyone goes home on a Friday night? I really will stop flying. I am not imaginative but some things do put the fear of God in me.
  (Mr Moonie) There are engineering personnel who work a 24-hour roster system to cover and respond to system failures, and they did that. During Monday to Friday the people who are particularly expert in their systems would have been available, because that is their normal working hours, and that would be an extra resource.

  12. You are saying it would be the software people that would not be available.
  (Mr Moonie) Yes. The difficulty you have when you have a major problem of the nature that occurred in June was that everybody, whether they are within West Drayton or outside, would be telephoning up to say, "What is the problem with the system, when is it going to be restored?" It is very difficult to deal with all of that pressure with the continuous ringing of telephones and getting on with fixing the system.

  13. It also has an effect on the speed at which you work, if you are slowing down because you are going back to older systems that rely on people writing things down on a paper based system?
  (Mr King) I think you are right, Madam Chairman, as that system failed, I think it was a progressive failure, you are unable as an air traffic controller to continue with your normal operation. You get intermittent or even loss of strip production, which is absolutely vital. In air traffic control you have to have that information when you need it.

  14. It becomes a cumulative matter, you are already slower because you are using a paper based system and the sheer bulk of work building up means that the system is less able to cope as you go on.
  (Mr King) Absolutely. One of the other things you can possibly lose as well is call signs on radar screens, which is a very difficult thing. When you only get the raw squawk, which is the numbers from the aircraft, that can cause a rapid increase in the work load for air traffic controllers. When you start to lose the strip production—

  15. What is a raw squawk?
  (Mr King) It is just the code that is sent away from the aircraft. In normal operations for air traffic controllers the computer converts that into the call sign. It is much more difficult to try and find what the squawk is and then find what the call sign is. It is much easier to have the call sign on the radar screen itself. As the computer gradually fails you need to hand write strips, which is obviously a much longer process than the computer printing that information. There is a lot more telephone calls to make to pass information because the computer itself is no longer passing information on to adjacent sectors. That does cause a considerable increase in work load. There are immediate attempts then to restrict the number of aircraft within the system. Because of the rapid increase of work load you cannot continue at the same pace as you were working at before. It does become more pressurised and stressful during these problems. The staff do absolutely all they can do to maximise safety, first of all, and then try and rebuild the service levels.

  16. I am sure you understand, we make no criticism of the staff nor are any of our questions based on the assumption that staff were not doing their very best under very difficult conditions. We need to know how, why and when. Would you say that British Airways are wrong when they say that part of your problem is that you do not have any understanding of the commercial pressures that are put upon airlines and pilots? Some of your decisions and your working practices do not respond to their particular needs.
  (Mr Findlay) Can I respond to that? I think the problem is that the airlines always think that their particular needs are first and foremost. The particular needs are, in fact, for safety; that is the only reason that NATS exists, for safety reasons. Airlines do not always see that they should not come first in any queue.

  Chairman: That is a very unusual attitude in human beings, I have never encountered it before!

Mr Olner

  17. It concerns me somewhat, are you really saying to me that NATS will introduce new software without it being thoroughly tested, without the old software that was being used in place? Do you not run things in tandem?
  (Mr Findlay) Things are run in tandem. The software is so complex that at various points in time the new software has been tested as much as it can be but there is a conflict somewhere in the old software, but it could be way down the line.

  18. I am struggling to grasp this. Is this because one of your guys has pressed the wrong button? I am a computer person who struggles with it. We all press the wrong button and we think, goodness me, it is gone now. It is my fault, not the computer's fault. If you have a proven system that has been working and an updated system is brought in, how long do you keep the old system in as a back-up until you have gone through all of the glitches?
  (Mr Moonie) It does depend on which computer system you are changing to, because by nature of the design of the system it would depend on how you do software updates. It certainly is the case that they do keep old versions of the software available. In terms of the testing of new software the general process is that when you decide you need to make a change, the first thing you have to do is specify what the change is there to achieve.

  19. Who determines that? Who determines that the software is going to be changed? Is it you, the controllers, who say, "We want something better", or is it management saying, "This is what you are going to have because we have been convinced that it is better"?
  (Mr Moonie) The change is either determined because there is a requirement to have new features or new systems introduced but also software change is introduced to fix previously identified problems. Some of those problems may be of a serious nature and some of them may be of a minor nature. In terms of the testing, the starting point is when you design a change you first want to prove that the change does that which you designed it for, then you have to consider which ways this could go wrong. You have to consider putting in what you might call rogue inputs or failure modes. We can spend a lot of time thinking up ways in which to test this. There is always a possibility of a particular combination of events occurring that we had not thought up when we designed our testing, and that could lead to the situation we were in.


 

© Parliamentary copyright 2001
Prepared 17 January 2001