Building Operational Resiliency in Payments

  • Atul Verma, Senior Payments Architect at Icon Solutions

  • 05.07.2022 12:00 pm
  • #payments

Although IT resiliency has long been a common theme and practice for financial institutions, the rapid digitalisation of financial services is underscoring its importance.

Over time, the financial system has become progressively more connected and, in turn, the risk of operational disruption more acute. Because such disruption threatens financial stability, resiliency has become a key focus for regulators. Most recently, new rules in the EU and UK will soon require financial institutions to take a more prescriptive approach to operational resiliency: understanding how they deliver services to their clients, which operational risks those services face, and how prepared they are to manage them when disruption strikes.

The problem for banks is that while modernising payments architecture is operationally disruptive, it is key to meeting growing customer needs. Equally, outsourcing services or relying on third-party providers can enable agility but it also has the potential to create Service Level Agreement (SLA) challenges. And although API convergence, Open Banking, and 24x7 system availability are opportunities to embrace innovation and connect with customers, they demand higher levels of IT resilience than ever before.

As banks lift and shift the legacy systems and applications that process payments to respond to the demands of the digital economy, what are the key considerations when it comes to their resiliency frameworks?

Developing a payments strategy

Before embarking on a digital transformation programme, banks need to really understand the tapestry of their existing payment systems and how any changes could impact resilience. This requires a clear vision and roadmap for legacy payment applications. While developing a strategy can be a tricky equation, as challenges around cost and complexity will mean tactical changes along the way, having a clear roadmap in place from the outset will make it easier for banks to analyse, estimate and mitigate risks.

Determining ‘High Availability’ requirements

Functional and non-functional requirements are usually well documented during the design and development phase of a payment application. Operational ones, on the other hand, tend to receive less attention. Since incumbent banks and financial institutions already have legacy systems, BaU (business-as-usual) operations and support processes in place, it is important to take account of the ‘as-is’ functions and inputs from these areas. In fact, a well-captured operational requirement is a key driver for ensuring ‘high availability’.

Designing a highly available payments system requires an assessment of all interfacing applications, their complexity and their affinity with the business. This in turn helps to determine SLAs. As payment processing systems are highly modular in design, it also helps to assess the requirements for each application and then categorise them into a critical graph to define the highly available environment that is needed. This in turn makes it possible to fine-tune the payment application and set the priority of execution and further processing, for example, Order Management → Payment Execution → Gateway → Scheme. 
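As a rough illustration of how such a dependency graph can drive both processing order and SLA thinking, the sketch below orders the four stages named above and estimates the availability of the whole chain. The availability figures are hypothetical placeholders, not real targets, and the calculation assumes the stages fail independently and that a payment needs every stage to be up.

```python
from graphlib import TopologicalSorter
from math import prod

# Hypothetical availability targets per stage (illustrative values only).
AVAILABILITY = {
    "Order Management": 0.999,
    "Payment Execution": 0.9999,
    "Gateway": 0.9999,
    "Scheme": 0.9995,
}

# Each key depends on the stages in its set (its upstream predecessors).
DEPENDS_ON = {
    "Payment Execution": {"Order Management"},
    "Gateway": {"Payment Execution"},
    "Scheme": {"Gateway"},
}


def processing_order(depends_on):
    """Return stages in an order where every predecessor comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def serial_availability(stages, availability):
    """Availability of a chain in which every stage must be up.

    Assumes independent failures, so the individual values multiply.
    """
    return prod(availability[stage] for stage in stages)


order = processing_order(DEPENDS_ON)
chain = serial_availability(order, AVAILABILITY)

print(" -> ".join(order))
print(f"end-to-end availability: {chain:.4%}")
```

The point of the exercise is that the chain is always less available than its weakest stage, so the SLA for each interfacing application has to be set with the end-to-end target in mind.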

Governance and risk management

In the wake of the pandemic, banks are building flexibility into their products and services to adapt quickly to changing customer needs and market dynamics. This is moving resiliency beyond the traditional parameters of fault-tolerance, technical failure and fail-overs, to include processes and people. It is also emphasising the important role technical authorities play in ascertaining the resiliency of payment applications before they move into production. Every business needs IT to support its goals, and the design and development of payment applications must be aligned with the overall strategy.

Furthermore, payments have high-risk areas which should be understood, assessed, monitored and communicated to governance boards early in the design phase. Any unidentified risk may affect the operational resiliency of the application, so actions and controls should be assessed regularly, and a strategy should be in place for any known or accepted gaps.

Service and incident management

Banks’ payment processing environments are a complex patchwork of systems and integrated applications, some of which are operated outside of a bank’s own network, usually through a cloud service or third-party vendor. When any critical application is hosted on a shared resource or server, capacity planning is an important tool for avoiding critical issues caused by a lack or misconfiguration of resources. Having SLAs in place with such third parties is therefore paramount for maintaining the quality of service.

Incident management is another key consideration. Payment applications are typically designed for high availability, usually with ‘zero’ RTO (recovery time objective) and RPO (recovery point objective) requirements, and so incident management plays a crucial role in fixing production issues. Although banks have traditionally focused incident monitoring on infrastructure health, monitoring and alerts must be enabled at the application, transaction, infrastructure and network levels of the payments stream. This is particularly important for low-latency applications that must meet the requirements of the UK’s Faster Payments Service (FPS) and other real-time payment schemes around the world. It can also provide valuable insight into trends over time, which can be used to proactively avoid SLA breaches and incidents in the future.
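One way to picture transaction-level monitoring that warns before an SLA is breached is sketched below. It is a minimal example under stated assumptions: the function names, the 95th-percentile measure, the 80% warning ratio and the latency figures are all hypothetical, and none of them is taken from FPS or any other scheme's rules.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Ceiling of pct * n / 100, done in integers to avoid float rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def sla_status(latencies_ms, sla_ms, warn_ratio=0.8, pct=95):
    """Classify a window of transaction latencies against an SLA limit.

    Returns "breach" if the chosen percentile exceeds the limit,
    "warning" if it exceeds warn_ratio of the limit, otherwise "ok".
    """
    observed = percentile(latencies_ms, pct)
    if observed > sla_ms:
        return "breach", observed
    if observed > warn_ratio * sla_ms:
        return "warning", observed
    return "ok", observed


# Hypothetical window of end-to-end latencies in milliseconds.
window = [100] * 10 + [900] * 10
status, observed = sla_status(window, sla_ms=1000)
print(status, observed)
```

The "warning" band is what turns monitoring into a proactive tool: it surfaces a degrading trend while there is still headroom, rather than only after the limit has been crossed.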
