IT failures in the Financial Services Sector

5 Operational resilience and incident management

Firms’ management of operational resilience


157.Improving operational resilience will require a level of investment by financial services firms. As previously described in Chapter 3, PwC commented that “Since the financial crisis profitability in the banking sector in the UK (and globally) has been depressed” which resulted in “reduced expenditure on technology upgrades and other important infrastructure improvements”.199 A lack of investment can increase the risk of incidents. PwC added that “Many financial services have complex legacy technology without ongoing investment to upgrade or replace these systems the risk of issues increases over time”.200

158.Some firms explained that they are increasing their investment in operational resilience capability. For example, Graham Bastin, Head of Operational Resilience at Barclays, highlighted that they “have probably spent upwards of £1 billion over the last three years”.201 PwC explained that it expects investment to increase, and that “The Q4 2018 results of our latest PwC & CBI Financial Services survey found spending on IT, which was already strong in the previous quarters, is expected to increase further”.202

159.Whether or not investment in operational resilience is sufficient at present, in its recent report TheCityUK commented that “in contrast to the investments already made across financial services to address financial resilience, the cost of achieving operational resilience will be small”.203

160.We heard that the level of investment in technology following the financial crisis has been affected by cost-cutting by financial services firms. Whilst some firms argued that they have invested in technology, many consumers would be disappointed that cost control has affected important investment in firms’ IT and operational resilience. Given the profits generated by the financial services sector, this is not an acceptable position.

Industry skills and experience

161.The increasing reliance on technology in the financial services sector, and the complexity of firms’ IT architecture, have created greater demand for technical skills in the sector. Marcus Scott, TheCityUK, told us that the volume of skills available is an issue as:

One of the biggest challenges now is that our industry is competing with the rest of the economy for, more or less, the same skillset. This is only anecdotal, but one of our members, which was a bank, lost a team of web developers to [ … ] [a] takeaway company.204

162.The demand for skills in the financial services sector may require firms to look further afield, to other sectors, to recruit staff. Sarah Isted, PwC, explained that firms might need to bring in “people from outside financial services to bring a different perspective and particularly to bring the customer view to it”.205

163.In response to the increasing demand for skills that Barclays were facing, Graham Bastin, Head of Operational Resilience, explained that it had “set up a technology campus in the north-west of England” where there are now 5,000 people.206 This included hiring about 600 apprentices as Barclays “saw the need to bring in talent where we could do the knowledge transfer between some of the older technologies and move towards digital and mobile [ … ] and give those people a career path where they would stay with us for a long period of time”.207

164.There is also the need for operational resilience skillsets and experience on the boards and senior management teams of financial services firms. Sarah Isted told us that “Given it is a newer area [ … ] making sure you have the right people with the right skills both to do the work and to review and oversee it will be critical”.208

165.During oral evidence in January 2019, Sam Woods, Deputy Governor for Prudential Regulation and Chief Executive Officer of the PRA, was asked whether the PRA had rejected candidates at interviews for Senior Managers Regime functions because they lacked operational resilience experience. He replied that:

Coming to cyber and [operations] in particular, I am not aware and I do not think we have rejected anyone on those grounds alone. However, we have made a point to a number of boards that we think they need to build up their expertise in this area, as indeed many other institutions and we ourselves are building it up. That is a concern about the degree of experience at the top of these institutions in that particular field.209

166.Firms face challenges in hiring skilled and experienced staff to manage technology-related risks, and we were encouraged to hear about some of the programmes that firms are investing in to train and develop staff. The financial services industry should work with universities and further education providers to develop the skills it needs. There is an opportunity for firms to develop their own talent, and to recruit from a broad and diverse pool to improve their operational resilience capability.

167.Given the PRA’s concern about the level of operational resilience experience on the boards of some financial services firms, we expect the Regulators to ensure that firms are focussed on recruiting the right skills and experience for their boards and senior management and that they are developing diverse pipelines of talent for the future.

Industry collaboration

Collaboration and information sharing

168.There is a level of coordination within the financial services industry, through which firms share experience to improve sector resilience. The Regulators explained the importance of this collaboration:

We firmly believe that strengthening operational resilience requires collaboration. Regulators, firms, FMIs and technology providers should continue to work together to address the opportunities and risks presented by technology with respect to operational resilience, as part of the wider co-operation and collaboration on operational resilience being advocated by the Authorities.210

169.Financial services sector firms described how they collaborate with others within the industry. Asked whether firms work together to deal with incidents, Ian Lundberg, Chief Officer, Senior Vice President, Client Services Europe, Visa, told us that “From a Visa perspective, the answer is yes. The network is connecting the issuers to the acquirers, so we do work together. Across the board, we need to work collectively”.211 Anne Boden, CEO of Starling Bank, told us that “when we had a Visa incident, UK Finance gave us advice, and we collaborated in that environment to ensure that all customers had the right information”.212

170.We also heard of the role of coordinating bodies and working groups to facilitate collaboration. Ian Lundberg, Visa, outlined that:

UK Finance [ … ] has got roundtables on incident management communications. We are involved in them, with a number of other members of UK Finance. There is a cross-market operational working group led by both the Bank and UK Finance, and we participate in that. Also, the British Standards Institution is in the process of putting together an ISO on operational resilience.213

171.Even so, evidence was presented to us that there is scope for more collaboration. In its recent cyber and technology survey the FCA highlighted that firms are not contributing as much as they could be.

Similarly, the Regulators explained that the Cross-Market Operational Resilience Group215 “concluded that there is a need for greater co-ordination and more rapid information sharing”216 during incidents. The Regulators also explained the Bank of England’s role in co-chairing the Cross-Market Operational Resilience Group, and that they “recognise that we can go further”.217

172.We received evidence from a number of respondents highlighting that the most collaborative approach was taken to cyber risk. Graham Bastin, Barclays, thought that there was the most collaboration on cyber risk and that “We could do more along those lines in some of the other areas of resilience”.218 Visa also supported this approach, explaining that:

There are a number of lessons to learn from the UK’s cyber strategy, such as the industry’s collaboration across the public and private sector with the National Cyber Security Centre (NCSC), that can be adopted in the field of operational resilience.219

173.Internationally there are examples of private sector collaboration to facilitate improvements in the operational resilience of the sector. One commonly cited example is Sheltered Harbor. PwC explained that “The Sheltered Harbor initiative in the US, where institutions backup critical customer account data each night in an encrypted, separate data centre is an example of an initiative that could be explored in the UK”.220 Regarding Sheltered Harbor and industry collaboration, Lyndon Nelson, Deputy Chief Executive Officer and Executive Director for Regulatory Operations and Supervisory Risk Specialists, PRA, commented that industry needs to collaborate because, whilst the Central Bank has tools to use for financial resilience, in the case of operational resilience “if a firm the size of Barclays or HSBC said that our retail banking system isn’t working, there is nothing the Central Bank can do”.221

174.There are benefits to industry taking a collaborative approach, sharing information and working together to improve the resilience of the sector. Cross-industry bodies such as UK Finance and TheCityUK should work with industry to identify and facilitate further areas of collaboration.

175.In their response to this report, we expect the Regulators to set out their plans to build on their existing work facilitating industry collaboration. This should include encouraging the participation of firms of all sizes, and highlighting where they think industry could go further. Where firms are reluctant to collaborate due to competitive pressures or commercial interest, for example where a firm improves its own security but withholds best practice in order to gain a commercial advantage, there is a role for the Regulators to encourage collaboration.

176.It is not acceptable for customers to be at risk of severe operational disruption to their banking services for an indefinite period, and for there to be no way for the Regulators to help them because, as we have heard, there is “nothing the central bank can do”. If the industry is unwilling or unable to prevent such disruption collectively, for example by creating critical data backups and operational plans to mitigate against the consequences of cyber attack, then the Regulators must act. In the absence of market initiative, the Regulators should take stronger action to foster market solutions, or to enforce regulatory ones, to mitigate the risks of severe operational disruption.

Sector exercises

177.In preparation for incidents, many firms222 run exercises to practise scenarios that could occur and rehearse responses. Visa emphasised the importance of financial institutions working together with other providers in the “ecosystem” as part of such scenario testing. Visa explained that it is:

Critical to consider and plan [ … ] which approach to use in different scenarios, including the roles and responsibilities of different parties and how best to communicate with end-users. We would also support and encourage cross-industry planning to develop and formalise these arrangements, coupled with ongoing joint scenario testing and table-top exercises.223

178.The Regulators have a role to play in setting up sector exercises. Lyndon Nelson, PRA, told us that they have run at least 10 exercises.224 He described one scenario whereby they simulated a cyber-incident spreading from parts of the G7, which was:

Testing protocols, communications, how we would deal with issues and how do we inform people about the tools that we would use. [ … ] We essentially ran the same scenario in what we call our simex—simulation exercise. [ … ] About 70 banks, insurance companies and other companies, and FMIs were involved in the simex. [ … ] We also do desktops, so we did a desktop with the US Treasury Secretary, where we go through some of the issues that principals—the Chancellor was there—might face.225

Graham Bastin, Barclays, explained that they are currently working with the PRA on a future industry-wide payments stress test.226

179.Following sector exercises, the PRA has shared lessons learnt. Lyndon Nelson, PRA, told us that:

We issue a report. [ … ] That will have within it a number of work programmes—we will look at one on data integrity, and one on what we would do if a major institution was incapacitated. We are looking at communications. [ … ] We will obviously talk to individual firms as supervisors, and about how they would deal with an incident and what actions they need to take.227

180.Sector exercises are a valuable tool for improving the industry’s preparedness for incidents and identifying any potential areas of weakness. Such exercises can provide the opportunity for firms to rehearse responses to incidents and share best practice.

181.The Regulators should continue to facilitate sector exercises and should seek, in collaboration with industry and industry bodies, to expand the programme, in particular where new risks are identified, and where it is reasonably practical to include a wider range of firms. The Regulators should ensure that lessons learnt reports are shared with industry promptly after exercises.

Firms’ incident management

Best practice in incident management

182.It is widely accepted that operational incidents will happen irrespective of how much a firm invests in prevention, and the Regulators have stated they believe that “disruptions and failures will inevitably occur”.228 This means that firms need to prepare in advance for how they would respond to an incident or multiple simultaneous incidents. For example, firms may have in place Business Continuity Plans to guide incident responses, including processes for convening crisis leadership teams. Alison Barker, Director of Specialist Supervision at the FCA, told us:

Although we want firms to focus on prevention of disruption and plan properly, [ … ] we also have strong messages around, “Be prepared. Make sure you understand how you will respond and recover from an incident. [ … ].” That is because firms that are not well prepared cause more disruption to consumers.229

183.When firms experience incidents, the severity, length and impact on customers is highly dependent on the ability of the firm to manage the incident. The Regulators highlighted that the key factors assisting in successful incident recovery were “strong communication and coordination; senior management involvement, visibility, and strategic direction; and a controlled management of fixes.”230 However, they also brought to our attention that they “recognise that more can and should be done to share best practice in relation to change and incident management”.231

184.Where there have been significant IT failures, it has been apparent in many cases that adequate recovery plans did not exist, resulting in significant disruption. Lyndon Nelson, PRA, emphasised the importance of firms having alternative arrangements:

In some of the instances we have seen, there really was not an adequate plan B. [ … ] If the hypothetical service—somebody getting their mortgage granted or their deposit paid—is primarily delivered through a computer system that is now out, what is the plan B.232

185.Banks described how alternative channels for delivering services to customers can help them minimise the impact of an incident on their customers. Barclays told us that its multi-channel strategy:

Ensures customers have alternative access to our services, in the event of unavoidable outages that affect their preferred channel. Very specifically, these channels (mobile, online, phone, etc.) are supported on different technology systems to ensure we can continue to service our customers through one channel in the event of difficulty in another.233

186.Firms agreed that there was a need to prioritise bringing services back online after an incident, to ensure that the most critical services were available for customers.234 Graham Bastin, Barclays, explained that:

“Get my balance” is probably the most used and the most important service to our customers, along with “Make a payment” and so on, so the level of resilience we put around those services is higher than that for some of the other services, such as “Change of address”, which is less time critical.235

187.Firms are right to adopt a ‘when not if’ mindset on operational incidents. Given this, and the impact on customers when incidents occur, it is vital that firms have robust procedures in place to be followed in the event of an incident and a viable ‘Plan B’. The Regulators should ensure that assessing the adequacy of firms’ incident management procedures, and the evidence that those procedures have been exercised, forms a fundamental part of their supervisory engagement. To drive up standards, the Regulators, or industry bodies, should issue best practice guidance against which firms can assess their own procedures.

Customer communications

188.When communications are poorly handled or untimely, incidents can escalate. Poor communications have affected firms in recent high-profile incidents. PwC explained that “In some cases the level of distress is exacerbated by a lack of clear communication from the firm to its customers or when customers cannot reach a member of staff”.236 For example, Visa was criticised for the handling of communications during its June 2018 incident, and was subsequently subject to specific directions from the Payment Systems Regulator.237 Similarly, the FCA criticised the quality of TSB’s communications following its IT migration. Andrew Bailey wrote to us saying:

The FCA has been dissatisfied with TSB’s communications with its customers and we have had concerns that TSB was not being open and transparent about the issues experienced. [ … ] TSB referred to “the vast majority” of customers being able to access their online accounts, at a time when there was a successful first-time login rate of only 50 per cent on the web channel.238

189.A number of respondents highlighted the value of proactive communications. The Regulators explained how some firms approach communications during an incident:

Some firms take a proactive approach, communicating with customers clearly through all available channels (applications, web pages, call centre messages, email, signing up for text updates) to keep them informed or advise them of the alternative channels that are continuing to operate and have capacity. [ … ] Effective communication early on allows customers to understand and manage the consequences of an incident.239

190.Graham Bastin, Barclays, emphasised the value of communicating with customers in the event of an incident. He told us that:

If you send a text or some other kind of alert, even when a feature on the digital mobile app is unavailable, or if you signpost that there is going to be a planned disruption at a weekend, which we have done fairly recently, that is really welcomed.240

191.However, UK Finance highlighted a trade-off firms face when communicating with customers:

At the outset of an incident, the cause and impacted parties are not always fully clear and there is a balance between early and accurate communication. For example, there is a risk of broadcasting to all customers about a service issue when only some may be affected. This could generate a spike in calls to contact centres by concerned customers, adversely impacting a firm’s ability to help those actually affected.241

Therefore, UK Finance argued that:

While there may be common principles that can be drawn up concerning operational incidents, the form and timing of customer interaction should remain a matter of proprietorial judgement—in the full knowledge of the potential for customer detriment and, in extreme circumstances, reputational damage for the firm.242

192.There may be misleading information circulated during incidents—whether deliberate or accidental—which can heighten their impact, and could be exploited by malicious actors. Customers need to be able to trust the information they receive when an incident is happening. Equifax commented that:

In our experience, many consumers look for official government information to validate the authenticity of any communications they receive from a company in such circumstances. Throughout our incident, the National Cyber Security Centre (NCSC) website provided the most useful and understandable information to which we could signpost consumers.243

Given this, Equifax suggested a solution:

We believe a central portal could help UK consumers to verify information about an incident that they may have received directly via email, phone or letter. [ … ] At present there are a number of organisations that provide such information, but this is often disparate, incomplete and lacking profile or trust among consumers. We would welcome the Treasury Committee’s consideration of whether a visible Central Government-led portal could help signpost, reassure and better equip consumers.244

193.Lyndon Nelson, PRA, explained that senior figures in the Regulators and Government would communicate externally if necessary. In response to a question from us on who would reassure the population, he told us that “In the exercises that we run, we absolutely contemplate asking the Governor, the Chancellor or whoever to do these things, because that is what we think is the right response at the time”.245

194.Poor customer communications can exacerbate the impact of an operational incident, and previous high-profile outages have demonstrated this all too clearly. Clear, timely and accurate communications must ensure that customers are aware of the incident and that they receive advice on remediation timelines and alternative access. Customers have the right to this information.

195.While accuracy of communications is important in order to avoid misinformation, firms should not unnecessarily delay or withhold information, even where reports of an incident may risk their reputation. It should not be left to a firm’s discretion as to whether to communicate to customers or not. If in rare circumstances there is a valid reason not to inform customers, this should require regulatory permission, and must not cause greater harm to customers.

196.Customers need to be able to trust the information they receive during an IT incident from a financial services provider. Where communications are ineffective, or in major incidents where there is the need for a central source of trusted information, the Regulators should step in, which might include circulating information via a centralised portal.

Customer complaints and compensation

197.A number of firms explained in evidence to us that they compensate customers affected by IT incidents. RBS wrote that “Any customer who advises us that they have been affected by an incident will not be left out of pocket”.246 Additionally, the FCA explained that “there are clear rules setting out timescales that we expect firms to follow when a complaint is received. For most complaints about payment services, firms have 15 days to resolve a complaint, and up to eight weeks to resolve all other complaints”.247

198.While firms described thorough compensation processes for when customers had been impacted by an incident, in practice, customers have struggled to make contact with firms and have had to wait for prolonged periods of time for compensation to be paid. For example, TSB reported on 27 July 2018 that only 37 per cent of the complaints related to its IT migration (in April 2018) had been resolved.248 In February 2019 TSB announced that they had resolved “90 per cent (181,000) of the 204,000 customer complaints received since migration”.249

199.We are shocked to hear of the time taken for some customers to have complaints answered following an IT failure. This is an unacceptable position for customers and could lead to greater harm. Firms must act swiftly and fairly in responding to complaints and awarding compensation where customers have experienced harm or financial loss as a result of an IT incident. Given increasing demand on complaints teams following an incident, firms must be able to quickly scale up their capability. The FCA must ensure that firms are resolving complaints and awarding any compensation quickly and take action where this is not the case.

199 PwC (OPR0008)

200 PwC (OPR0008)

202 PwC (OPR0008)

209 Treasury Committee: Oral evidence: The work of the Prudential Regulation Authority, HC 704, 23 January 2019 [Q165]

210 Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

215 The Cross-Market Operational Resilience Group (CMORG) “is chaired by the Bank and the industry, and provides a platform for co-ordinating and promoting work both aimed at strengthening the resilience of the financial sector and improving its ability to respond to operational incidents.” Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

216 Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

217 Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

219 Visa (OPR0007)

220 PwC (OPR0008)

222 Equifax (OPR0006), Barclays (OPR0009), RBS (OPR0004)

223 Visa (OPR0007)

228 Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

230 Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

231 Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

233 Barclays (OPR0009)

236 PwC (OPR0008)

237 Payment Systems Regulator, PSR PS19/3, Specific Direction 9, May 2019

239 Financial Conduct Authority, Bank of England and Prudential Regulation Authority (OPR0012)

241 UK Finance (OPR0005)

242 UK Finance (OPR0005)

243 Equifax (OPR0006)

244 Equifax (OPR0006)

246 RBS (OPR0004)

Published: 28 October 2019