Everyone suffers from having too much information. There is no shame in that. There is only so much you can deal with at any one time, and under bombardment we become confused. This is not a new phenomenon. When it comes to decision making, as the age-old phrase has it, we can’t see the wood for the trees.
In fact, recognition is the first step to recovery. It’s the way we deal with that information, storing it efficiently and processing it accurately, that is the mark of progress.
Two schools of thought have evolved, by stages, about how we deal with masses of data. These methods, data warehousing with ETL and data virtualisation, both have their pros and cons.
The first is the older generation of technology, although it is still extensively used. The handling of data, which becomes both more vital and more complicated by the day, is still a work in progress. Data virtualisation is the more recent iteration and incorporates some of the lessons learned from previous efforts.
Though data warehousing delivered much in broad-brush strokes, it didn’t give the full picture that everyone needs if they are to work together. Indeed, it gave a picture that was never quite complete, nor fully up to date, and was thus open to different interpretations by different departments. To put that in context, a bank’s traders might have been singing from one song sheet while the compliance department had a different version of the truth. Neither is necessarily wrong; they have simply been briefed differently about the state of the assets within their organisation. This can lead to conflict within the company.
In a bank, for example, data warehousing could lead to complications if used today, because it is relatively unwieldy to adapt and thus less responsive to change. It was fine for a simpler time, when financial services were less complex and less competitive.
Traditionally, the bank's products were arranged along separate lines of business: Fixed Income, Equities, Futures and Options, Brokerage and so on. Since each had its own organisation, systems and services, they could be analysed separately in distinct warehouses. So the relative lack of a unified view of bank-wide data was less of a limitation, as queries ran against each line’s own sales, research, positions management, risk, books and records, clearing, settlements, payments and reconciliations.
Each line of business had its own controlled distribution of its data. However, the increasing complexity of business meant there was ‘too much information’ for even the data warehouse to handle. There are ever more customers, products and mergers. As if that were not complicated enough, there is a huge amount of crossover for clients, who are increasingly asking for product interactions that straddle different lines of business.
Banks needed a new system that could give both unified and comprehensive versions of events, and at a much faster rate.
Data virtualisation improves on the traditional approach because the extract, transform, and load (ETL) process that feeds a data warehouse never gave the inquirer instant, real-time access to the source system for the data. It couldn’t, and it wasn’t designed to work that way.
This led to data errors. Worse still, it meant a massive workload of extracting data from one place and loading it into another. Further, this secondary copy then had to be managed, audited and secured: a waste of resources.
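To make the ETL pattern concrete, here is a minimal sketch of a scheduled load job. All table names, column names and the currency-normalising transform are invented for illustration; the point is that the warehouse only ever holds a copy taken at load time, so anything booked afterwards is invisible until the next run.

```python
import sqlite3
from datetime import datetime, timezone

def etl_load(source_conn, warehouse_conn):
    """Illustrative ETL job: extract trades from a source system,
    apply a simple transform, and load a copy into the warehouse.
    Names here are hypothetical, not any particular bank's schema."""
    # Extract: read everything from the source system.
    rows = source_conn.execute(
        "SELECT trade_id, amount, currency FROM trades"
    ).fetchall()
    # Transform: normalise currency codes before loading.
    cleaned = [(tid, amt, ccy.upper()) for tid, amt, ccy in rows]
    # Load: write a timestamped secondary copy into the warehouse.
    warehouse_conn.execute(
        "CREATE TABLE IF NOT EXISTS trades_copy "
        "(trade_id INTEGER, amount REAL, currency TEXT, loaded_at TEXT)"
    )
    loaded_at = datetime.now(timezone.utc).isoformat()
    warehouse_conn.executemany(
        "INSERT INTO trades_copy VALUES (?, ?, ?, ?)",
        [(tid, amt, ccy, loaded_at) for tid, amt, ccy in cleaned],
    )
    warehouse_conn.commit()
```

A trade inserted into the source after `etl_load` runs simply does not exist in `trades_copy`; the copy is stale until someone schedules and pays for the next load, and it must be secured and audited in its own right.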
Neither did data warehousing give a single version of the truth, even though it could have done. Data virtualisation can bridge data across all the original data sources as well as any data warehouses, across lines of business. The beauty is that it doesn’t need a whole new integrated physical data system. The existing data infrastructure can continue performing its core functions, while the data virtualisation layer uses the data from those sources.
So data virtualisation becomes a complement to all the existing data sources. It sits on top and, by improving communications, makes the organisation’s data more available for use.
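The layer-on-top idea can be sketched in a few lines. In this toy example, a virtualisation layer holds live connections to the existing line-of-business systems and answers each query by reading them at request time, so nothing is copied into a second store. The line-of-business names, the `positions` schema and the query are assumptions made purely for the sketch.

```python
import sqlite3

class VirtualView:
    """Toy data-virtualisation layer: one logical entry point that
    federates queries across live source systems on demand."""

    def __init__(self, sources):
        # sources: mapping of line-of-business name -> open DB connection.
        # The sources keep performing their core functions untouched.
        self.sources = sources

    def positions_for(self, client_id):
        """Aggregate one client's positions across every line of
        business into a single logical result set."""
        combined = []
        for lob, conn in self.sources.items():
            for instrument, qty in conn.execute(
                "SELECT instrument, quantity FROM positions WHERE client_id = ?",
                (client_id,),
            ):
                combined.append(
                    {"line_of_business": lob,
                     "instrument": instrument,
                     "quantity": qty}
                )
        return combined
```

Because each call reads the sources directly, a position booked in any line of business is visible on the very next query, with no load job, no secondary copy to manage, and one unified answer whichever department asks.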
Data virtualisation is aimed at producing quick, timely insights from multiple sources without having to embark on a major data project of the kind the ETL/warehouse alternative requires. In the old days, when staff sought client reference data, they would have to go to each line of business separately, requesting data and tracking their request until it was delivered.
This led to a sort of black market of second-hand data, with distribution teams emerging who would take original copies of data from the systems of record and add their own information for their own local needs. They were acting almost as brokers. This practice increased risk and led to audit problems over data quality. There was a terrible price to pay, through lost productivity, compliance infractions and very high data integration costs.
The great thing about data virtualisation is that it can bring everything together, drawing data from all sources and straddling every line of business.
From multiple sources, data is aggregated into a single logical entity which can be read by everyone, whether they are coming in from a portal, a report, an application or a search.
What does all this mean in terms of the ‘cold hard cash’ alluded to in The Wolf of Wall Street?
Typically, a data virtualisation project can deliver a 300 per cent return on investment. The cost of data distribution can fall by 83 per cent, and data quality can improve by 25 per cent. Given the value of financial data, that is a substantial dividend indeed.