ITRS Group: How Could a Greater Focus on Operational Resilience Have Reduced The High Number of Outages Across Financial Services in 2020?
- 07.04.2021 08:26 pm
Over the past few years, operational resilience has been rising up the priority list for financial institutions and regulators alike, with the pandemic only accelerating this process further. Despite significant strides being made, the market volatility that has defined the past year has shone a light on how much work firms still have to do to become operationally resilient.
Firms across financial services suffered outages in 2020, whether they were exchanges, asset managers, or retail trading platforms. The most recent major outage, however, and one that has raised many questions on Wall Street and across the globe, was the Fed outage.
For four hours, the Federal Reserve systems that execute millions of transactions a day were disrupted. The Fed has been tight-lipped about what caused the problem, but given the scale of the outage – it affected both the automated clearinghouse system FedACH and the Fedwire Funds interbank transfer service – we can assume that this wasn’t simply a failure within an isolated system but an issue with a core part of the Fed’s IT infrastructure.
However, we don’t need to know exactly what went wrong to understand how the Fed could have got it right. Central to any strategy to minimise outages has to be comprehensive IT monitoring. IT monitoring gives a firm internal oversight of its IT estate, meaning it can quickly identify issues and solve them before they cause outages.
As companies further enhance their digital services, their estates grow ever more complex. This means they will be using more and more monitoring tools, since these solutions are typically tailored to certain systems or processes. This can result in firms not having a total overview of their entire estate, but rather small snapshots, meaning that if a problem occurs in one system, they cannot track its effect across their entire estate. The solution is a single monitoring tool that compiles all of the different tools into a unified view. If a problem then occurs, the IT manager can identify the source, the underlying cause and the affected areas, enabling the problem to be solved faster.
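To make the idea of a unified view concrete, here is a minimal sketch in Python. The tool names, alert fields and incident data are all illustrative assumptions, not any specific monitoring product: the point is simply that merging per-tool alert feeds into one timeline lets an operator see which system alerted first, and so where to start looking.

```python
from dataclasses import dataclass

# Hypothetical alert record; the fields are illustrative, not any
# specific monitoring product's schema.
@dataclass
class Alert:
    tool: str       # which monitoring tool raised the alert
    system: str     # which system the alert concerns
    timestamp: int  # seconds since the incident window opened
    message: str

def unified_view(feeds):
    """Merge alerts from many separate tools into one timeline,
    oldest first, so cause and effect can be traced across the estate."""
    merged = [alert for feed in feeds for alert in feed]
    return sorted(merged, key=lambda a: a.timestamp)

def likely_source(timeline):
    """With one merged timeline, the earliest alerting system is the
    best first guess at the underlying cause."""
    return timeline[0].system if timeline else None

# Example: three tools each see only their own part of one incident.
network_tool = [Alert("net-mon", "core-switch", 3, "packet loss")]
db_tool      = [Alert("db-mon",  "ledger-db",   6, "slow queries")]
app_tool     = [Alert("app-mon", "payments-api", 9, "latency spike")]

timeline = unified_view([network_tool, db_tool, app_tool])
print(likely_source(timeline))  # core-switch: the alert the silos would hide
```

Viewed tool by tool, each team sees only its own symptom; viewed as one timeline, the network alert clearly precedes the database and application alerts.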
In addition to complete system oversight, capacity planning must also be a priority for CIOs. Many of the outages we’ve seen this year have come from companies offering new digital services without knowing how much traffic they can handle in a given timeframe. At a basic level, capacity planning allows firms to identify what a system can handle. At a more advanced level, it can identify specific pinch points as well as model future scenarios to see how the system would handle them.
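Both levels of capacity planning described above can be sketched in a few lines of Python. The stage names and throughput figures are invented for illustration: end-to-end capacity is bounded by the slowest stage, that stage is the pinch point, and comparing capacity against a forecast load models a future scenario.

```python
# Hypothetical transaction pipeline: each stage's throughput limit
# (transactions per second) is an illustrative made-up figure.
stage_capacity = {
    "gateway":    5000,
    "validation": 3000,
    "clearing":   1200,  # the pinch point in this example
    "settlement": 2500,
}

def system_capacity(stages):
    """Basic level: end-to-end throughput is bounded by the slowest stage."""
    return min(stages.values())

def pinch_point(stages):
    """Advanced level: name the stage that caps overall throughput."""
    return min(stages, key=stages.get)

def headroom(stages, forecast_load):
    """Scenario modelling: positive means spare capacity,
    negative means the forecast load would overwhelm the system."""
    return system_capacity(stages) - forecast_load

print(system_capacity(stage_capacity))   # 1200 tx/sec end to end
print(pinch_point(stage_capacity))       # clearing
print(headroom(stage_capacity, 1500))    # -300: this scenario breaks us
```

A real model would account for queuing and burst behaviour rather than a single static limit per stage, but even this toy version shows how a firm can learn its breaking point before customers do.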
Firms are no longer able to simply apologise for an outage and move on. Not only are regulators cracking down, but customers are more willing than ever to switch. It’s time to get ahead of the curve now or risk falling behind permanently.