Blockchain and Big Data: A great marriage
- Carlo R.W. de Meijer, Economist and Researcher at De Meijer Independent Financial Services Advisory (MIFSA)
- 05.02.2019 07:15 am
Blockchain and Big Data are among the emerging technologies that are high on many companies’ agendas. Both are expected to radically transform the way businesses and organizations are run in the coming years. Because they have long developed separately, one might assume at first sight that these technologies are mutually exclusive. But that idea is rapidly changing.
There are growing expectations that distributed ledgers will help enterprises finally get to grips with Big Data, which has so far struggled with a number of challenges. Both are powerful on their own, but combined they may bring a large number of opportunities. Some even say that blockchain and Big Data are made for one another.
“Big Data is an incredibly profitable business, with revenues expected to grow to $203 billion by 2020. The data within the blockchain is predicted to be worth trillions of dollars as it continues to make its way into banking, micropayments, remittances, and other financial services. In fact, the blockchain ledger could be worth up to 20% of the total big data market by 2030, producing up to $100 billion in annual revenue.” Chris Neimeth, COO of NYC Data Science Academy.
In this blog I will look at what the intersection of these two innovations may bring. Could blockchain be the solution for the existing Big Data issues and challenges?
Big data and data science/analytics: present challenges
Big Data is one of the fastest growing sectors in the world. Every business wants insight into the usage patterns of its consumers. To that end, massive datasets are analysed using advanced statistical models and data mining. These Big Data sets will become even more prevalent over the coming years.
“It’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analysed for insights that lead to better decisions and strategic business moves.” Data Analytics Company SAS
“Data analytics has become the key to corporate competitive advantage because of its role in identifying emerging market trends. In turn, companies can use this information to make quicker and better decisions that help them drive profitability”. EY
The rise of Big Data has presented a slew of issues for both big businesses and everyday consumers. With the growth in data, good analytics is becoming all the more difficult. Major problems in data management and analytics include so-called dirty data, inaccessible data, and privacy issues. And as Big Data increases in size and the web of connected devices explodes, more of companies’ data is exposed to potential security breaches.
With the advent of Big Data, data quality management is both more important and more challenging than ever. Companies that deal with large datasets should ensure that the data are clean and secure, have not been modified, and come from an authentic source. They have to make sure that the latest version is synchronized among all of the data centres in real time. The data should also be accessible. For most, however, data silos are still a major issue and a full company-wide digital transformation is still more concept than reality.
Blockchain and Big Data: two sides of the same coin
The main question is: how, if at all, do the two technologies relate to each other? Although blockchain has not been explored extensively in the context of Big Data management and analytics, both technologies could and should be seen as two sides of the same coin.
While blockchain is focused on recording and validating data (data integrity), data science analyses data for actionable insight, making predictions from large amounts of data (prediction). While blockchain is changing data management, the latter is transforming the nature of transactions. Or said in another way: “If Big Data is the quantity, blockchain is the quality”.
What may Blockchain bring?
Securing and interpreting such large amounts of information is not an easy task. Blockchain could be seen as an ideal solution to address many of the challenges of Big Data management and analytics. Blockchain technology has three essential properties: Decentralization, Immutability, and Integrity.
Decentralized
The biggest advantage of blockchain is that it is decentralized. Data no longer has to be brought together in one central place; instead it may be analysed in a decentralized way, “right off the edges of individual devices”. No single person or company controls data entry or its integrity; instead, the integrity of the blockchain is verified continuously by every computer on the network.
Through its decentralized system, blockchain technology ensures the security and privacy of data. Through a decentralized consensus algorithm and cryptography, blockchain validates data, making it almost impossible to manipulate because of the huge amount of computing power that would be required.
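To give a feel for where that computing power goes, here is a minimal sketch of proof-of-work, the consensus mechanism used by Bitcoin-style chains (other blockchains use different mechanisms). The function name `mine`, the sample transaction string and the difficulty value are all illustrative assumptions, not part of any real platform.

```python
import hashlib

def mine(data: str, difficulty: int) -> tuple[int, str]:
    """Search for a nonce so that sha256(data + nonce) starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Toy difficulty: each extra leading zero multiplies the expected work by 16.
nonce, digest = mine("alice pays bob 5", difficulty=3)
print(nonce, digest)
```

Anyone who wanted to rewrite a recorded entry would have to redo this search for that block and for the blocks built on top of it, which is what makes manipulation so expensive at realistic difficulty levels.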
Distributed
Blockchain is a distributed database or ledger system that records economic transactions such that they cannot easily be manipulated. Each “block” in the ledger is cryptographically linked to the block before it, so every block implicitly commits to the entire history that precedes it, making it extremely hard for anyone to change the records unnoticed.
The distributed network system also means the same transaction is shared network-wide, making it secure by design. The network architecture makes it very hard to forge. Because the blockchain is secure, it can help to prevent hacking and data leaks.
The process is also much more transparent: nothing can be changed without the approval of the network’s nodes, and everyone can see what changes are being made. This makes it much more difficult for mistakes and fraudulent transactions to occur.
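A common way to make earlier entries tamper-evident is to have each block store a hash of the block before it. The sketch below illustrates that idea only; the block layout and the helper names (`block_hash`, `append_block`, `is_valid`) are assumptions made for this example, not the format of any particular blockchain.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's full contents in a stable (sorted-key) serialization."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, data: str) -> None:
    """Add a block that stores the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})

def is_valid(chain: list) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
for entry in ["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"]:
    append_block(chain, entry)

valid_before = is_valid(chain)            # True: links are intact
chain[0]["data"] = "alice pays bob 500"   # tamper with an early entry
valid_after = is_valid(chain)             # False: block 1 no longer matches
```

Changing an old entry breaks the link held by the next block, so an attacker would have to rewrite every later block as well, on every copy of the ledger.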
Immutable
Furthermore, validated data generated via blockchain technology comes structured and complete, and it is immutable. Once a transaction has been recorded, it can never be changed. Information remains in the same state for as long as the network exists.
Immutability of data is of utmost importance to the corporations which are concerned with Big Data. If the data set that needs to be scrutinized is modified in any way, the resulting analysis is bound to be of little value.
Where Blockchain can help Big Data
The combination of blockchain technology and Big Data could deliver a number of interesting opportunities. There are several ways blockchain may help Big Data management, especially data analytics. The technology has the most potential in improving data quality: data captured and validated by big businesses on the blockchain will become far more valuable for companies.
“If there is a ‘sweet spot’ for blockchain, it will likely be the ability to turn insights and questions into assets. Blockchains will give you greater confidence in the integrity of the data you see. Immutable entries, consensus-driven timestamping, audit trails, and certainty about the origin of data (e.g. a sensor or a kiosk) are all areas where you will see improvement as blockchain technology becomes more mainstream.” VentureBeat
Ensuring Trust (Data Integrity)
The control of so-called dirty data (or erroneous information) is an area where blockchain can positively impact the data analytics field. Blockchain provides a seamless way to ensure data integrity and maintain audit trails, since it ascertains the origin of data through its linked chains.
Blockchain ensures trust in data by maintaining a decentralized ledger. Data recorded on the blockchain are trustworthy because they must have gone through a verification process that ensures their quality. Data integrity is ensured when details of the origin and interactions concerning a data block are stored on the blockchain and automatically verified (or validated) before the data can be acted upon. It also provides for transparency, since activities and transactions that take place on the blockchain network can be traced.
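As a rough illustration of checking origin and integrity before data is acted upon, the sketch below registers a dataset’s fingerprint together with its source and verifies it again before analysis. The in-memory `ledger` dictionary merely stands in for on-chain storage, and all names (`fingerprint`, `register`, `verify`, the sensor and dataset identifiers) are hypothetical.

```python
import hashlib

# Stand-in for on-chain storage: dataset id -> origin and content digest.
ledger: dict[str, dict[str, str]] = {}

def fingerprint(rows: list[str]) -> str:
    """Compute one SHA-256 digest over all rows of a dataset."""
    h = hashlib.sha256()
    for row in rows:
        h.update(row.encode())
        h.update(b"\n")
    return h.hexdigest()

def register(dataset_id: str, source: str, rows: list[str]) -> None:
    """Record where the data came from and what it looked like at that moment."""
    ledger[dataset_id] = {"source": source, "digest": fingerprint(rows)}

def verify(dataset_id: str, rows: list[str]) -> bool:
    """Return True only if the data still matches the registered fingerprint."""
    record = ledger.get(dataset_id)
    return record is not None and record["digest"] == fingerprint(rows)

readings = ["2019-02-05,21.4", "2019-02-06,21.9"]
register("sensor-42/feb", source="sensor-42", rows=readings)

ok_original = verify("sensor-42/feb", readings)
ok_modified = verify("sensor-42/feb", ["2019-02-05,21.4", "2019-02-06,99.9"])
```

An analytics pipeline could refuse to run on any dataset for which `verify` returns False, which is one simple way to keep dirty or altered data out of the analysis.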
Manage Data Sharing
A blockchain-based Big Data system would allow providers to share records with any other sector with an interest without the exponential increase in risk factors that comes with a network of different data silos. In this regard, data obtained from studies can be stored in a blockchain network. This way, project teams do not repeat data analysis already carried out by other teams or wrongfully reuse data that has already been used. Also, a blockchain platform can help data scientists monetize their work, for example by trading analysis outcomes stored on the platform.
Preventing Malicious Activities
Because blockchain uses consensus algorithms to verify transactions, it is very difficult for a single node to pose a threat to the data network. Because the network is so distributed, it is almost impossible for a single party to generate enough computational power to alter the validation criteria and allow unwanted data into the system. To alter the blockchain rules, a majority of nodes must be pooled together to create a consensus. This makes it an almost impossible task for cybercriminals to access and manipulate data on a large scale.
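The majority requirement can be sketched with a simple voting model. This is a deliberately simplified assumption: real consensus algorithms weigh computing power or stake rather than counting one vote per node, and the validation rule and function names here are invented for illustration.

```python
def is_well_formed(record: dict) -> bool:
    """Toy validation rule: a record needs a sender and a positive amount."""
    return "sender" in record and record.get("amount", 0) > 0

def node_vote(record: dict, honest: bool) -> bool:
    """An honest node reports the true verdict; a dishonest one inverts it."""
    verdict = is_well_formed(record)
    return verdict if honest else not verdict

def network_accepts(record: dict, honesty: list[bool]) -> bool:
    """Accept the record only if a strict majority of nodes votes for it."""
    votes = [node_vote(record, honest) for honest in honesty]
    return sum(votes) > len(votes) / 2

bad_record = {"sender": "mallory", "amount": -100}

# One dishonest node out of five cannot push the bad record through.
one_attacker = network_accepts(bad_record, [False, True, True, True, True])
# Only when attackers control a majority does the bad record get accepted.
three_attackers = network_accepts(bad_record, [False, False, False, True, True])
```

In this model a lone attacker is simply outvoted, which mirrors the point that control of a majority of the network is needed before unwanted data can get in.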
Predictive Analysis
Blockchain data (just like other types of data) can be analysed to reveal valuable insights into behaviours, trends and so on. As such, they can be used to predict with good accuracy future outcomes that matter to businesses, such as customer preferences, customer lifetime value, dynamic prices, and churn rates. With blockchain, banks and other organizations that require real-time analysis of data at large scale can observe changes in data in real time, making it possible to make quick decisions.
What is more, blockchain provides structured data gathered from individuals or individual devices. And due to the distributed nature of blockchain and the huge computational power available through it, data scientists even in smaller organizations can undertake extensive predictive analysis tasks. This is, however, not limited to business insights as almost any event can be predicted with the right data analysis whether it is social sentiments or investment markers.
What benefits may it bring?
The advantages of integrating blockchain with Big Data are many. Their adoption in combination is capable of bringing unmatched results for enterprises “of all size”. There is a wide range of benefits that blockchain technology can offer data analytics systems. Blockchain will make Big Data more valuable because it ensures data quality, accessibility and security. More quality data with more insights means more value. This enables better management of the huge volumes and variety of information that keep flowing in for businesses.
Enhanced data quality
The technology brings innovations to data storage. By substituting the traditional storage methods with blockchain, businesses can enhance data quality as it is complete and structured. Moreover, the integration of blockchain into a Big Data analytics solution strengthens its core by eliminating its weak points. This increases accuracy and facilitates comprehensive analysis to deliver rich and reliable insights for the business. When the data captured by big business is more secure, more trusted, and more accessible, it will become easier for companies to make decisions based on their insights.
Strengthen data security
Perhaps the biggest benefit that this technology offers Big Data analytics relates to the security of the information that resides within the blockchain. The system is decentralized, which means that no single person holds control and it cannot be altered without the approval of everyone involved. This brings transparency to the entire system and reduces the risk of fraudulent activities.
Fraud prevention
In the financial services industry, Big Data has not yet solved the difficulties of detecting fraud and assessing risk. This is primarily because existing detection and assessment methods depend on historical data. Blockchain technology allows financial institutions to check every transaction in real time. That is, instead of analysing the records of fraud that has already happened, banks are able to identify risky or fraudulent transactions on the fly and prevent the fraud entirely. If financial institutions can harness blockchain as a means of conducting transactions, they will finally be able to evaluate risk and identify suspicious patterns in real time. This will help to protect banks and their customers from fraud.
Streamlining of Big Data
By storing data on a decentralised ledger, Big Data companies could process all data much more quickly and efficiently than with centralised solutions. It will also speed up the transaction process (making it almost instantaneous) and reduce the cost of money transfers by eliminating the barriers of security and risk checks involved.
Facilitate data access
Another way in which blockchain can power up Big Data and analytics is by streamlining access to data. Users across various departments within an organization can be made part of the blockchain, where they can reach the data required for the analysis process. This smooths the work process and shortens the cycle time of data access and analysis.
By storing the database in a blockchain, a single, immutable source of information is created, available to all those who are authorised to get particular information. Under blockchain, one could require multiple authorized ‘signatures’ or permissions from other parties in a network to access records. This will ensure that everyone gets precisely the information needed to perform their role.
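The idea of requiring several permissions before a record is released can be sketched as an M-of-N approval check. This is a simplification: approvals are modelled as plain names rather than cryptographic signatures, and the function name, party names and threshold are assumptions made for the example.

```python
def can_access(approvals: set[str], authorised: set[str], required: int) -> bool:
    """Grant access only if enough distinct authorised parties have approved."""
    valid_approvals = approvals & authorised  # ignore approvals from outsiders
    return len(valid_approvals) >= required

authorised_parties = {"finance", "compliance", "data-office"}

# Two of the three authorised parties approve: access is granted.
granted = can_access({"finance", "compliance"}, authorised_parties, required=2)

# One authorised approval plus one outsider is not enough.
denied = can_access({"finance", "intern"}, authorised_parties, required=2)
```

In a real deployment each approval would be a digital signature that the network verifies, but the threshold logic stays the same.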
Real time insights
A whole new field is emerging around accessing huge amounts of data and gleaning insights from it, in near real-time. The marriage of blockchain and Big Data will help enterprises by making real-time analytics much more achievable and reliable.
“Since the blockchain has a database record for every single transaction, it provides a way for institutions to mine for patterns in real-time.” Abhinav Vankat of Noah Data
Extending blockchain to other areas such as AI, new data analytics, and specialized forms of data intelligence will make it possible to meet these real-time needs in the future.
Cost savings
As a result, the adoption of blockchain technology as part of Big Data analytical models can bring down the costs of storage to a considerable extent. Immutable information becomes a business asset, as it delivers insights that can be used over a long span of time to support long-term business decisions.
Healthcare as an example
One of the sectors where blockchain technology and Big Data could work together perfectly is healthcare, where patient data are not 100% hack-proof. Data analysis is very relevant in the healthcare industry for tracking patient treatment and equipment flow. Careless handling of patient records could have serious consequences such as incorrect diagnoses, wrong treatment methods, lost test results, wrong medicines etc. A big concern is that two different practitioners may not have access to the same, up-to-date information about the same patient, and as a result could prescribe conflicting treatment or, more seriously, drugs that could cause a lethal interaction.
Placing healthcare databases, including all patient information, on the blockchain would create a single, unchangeable resource for practitioners to use when treating a patient. Physicians would be able to get instant access, and patients would get a lot more control over how their sensitive information is used. The most significant benefit the blockchain could offer healthcare is security. Under blockchain, even a doctor would require multiple authorized “signatures” or permissions from other parts of a network to access patient records. That information could also be shared – privately and securely – with any other stakeholders who could benefit from access.
“A blockchain-based healthcare system would also allow providers to share records with justice departments, insurers, employers and any other sector with an interest in people’s health without the exponential increase in risk factors that comes with stretching a network thin; after all, a multi-department system is only as secure as the defences at its weakest point”. Daniel Smyth for Big Data Made Simple.
Blockchain Big Data platforms
The blockchain industry is already very active in this field. Blockchain developers are building decentralized data marketplaces that are now starting to emerge. These marketplaces are platforms that use the peer-to-peer connectivity possible through blockchain technology to link data sellers with data buyers.
The common goal of these platforms is to offer services that store data on decentralised networks instead of central servers, so that third parties have no access to the data stored in the network. Some of these platforms even offer services that allow users to rent out their unused storage capacity in exchange for regular money or cryptocurrencies.
Below is a selection of networks that are making use of some sophisticated developments in blockchain technology to address the challenges of decentralizing Big Data. These are, however, just a few of the many blockchain platforms looking to capitalize on the massive need for companies to improve the data sets they analyse and leverage for their products and services, as well as to put the power back in their hands.
Streamr
One example is Big Data marketplace Streamr that collects data from both individual users and IoT devices. In an increasingly connected world, IoT devices hold vast quantities of data about how we use our home electronics. Streamr leverages "sharding," a process by which a blockchain ledger is broken up into smaller pieces so that each node on the network doesn't need to bear the weight of the entire database, to create a fast network that includes all of this data.
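The general idea behind sharding can be sketched as follows. This is not Streamr’s actual design, only a generic hash-based partitioning scheme; the shard count, device identifiers and function name are assumptions made for illustration.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard deterministically, using a hash of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 4
shards: dict[int, list[str]] = {i: [] for i in range(NUM_SHARDS)}

# Spread 100 hypothetical IoT devices over the shards.
for device_id in (f"device-{n}" for n in range(100)):
    shards[shard_for(device_id, NUM_SHARDS)].append(device_id)

# Each node only has to hold the shard(s) it is responsible for.
for shard_id, members in shards.items():
    print(shard_id, len(members))
```

Because the mapping is deterministic, any participant can work out which shard holds a given device’s data without consulting a central index.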
ReBloc
Another example is ReBloc, a data marketplace for the real estate sector. This sector currently suffers from a lack of transparency and trust in its data. Real estate transactions usually depend on many different parties, including insurers, land registries, surveyors, mortgage companies, etc. Therefore, trusted data is critical to a real estate sale.
ReBloc uses a validation protocol to ensure the accuracy and trustworthiness of the data on its platform. Each data transaction is done through a smart contract, and before data is released to the interested user, it is run through a validation protocol that compares it across other data sets to judge its accuracy. Once it successfully passes the validation protocol, the buyer can trust that the data is valid, regardless of who provided it. The data are automatically released to the buyer and payment is sent to the vendor.
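ReBloc’s actual validation protocol is not described here, so the following is only a hypothetical sketch of what “comparing a value across other data sets” could look like: a submitted figure is accepted if it lies within a tolerance of the median of independent reference values. The function name, the 10% tolerance and the sample prices are all assumptions.

```python
from statistics import median

def passes_validation(submitted: float, references: list[float],
                      tolerance: float = 0.10) -> bool:
    """Accept a value if it is within `tolerance` of the reference median."""
    if not references:
        return False  # nothing to compare against, so do not release the data
    mid = median(references)
    return abs(submitted - mid) <= tolerance * abs(mid)

# Hypothetical valuations of the same property from three other data sets.
reference_prices = [300_000, 305_000, 298_000]

plausible = passes_validation(310_000, reference_prices)
implausible = passes_validation(450_000, reference_prices)
```

In a marketplace setting, a smart contract could release the data and trigger payment only when such a check succeeds.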
Endor Protocol
While the real estate market generally depends on fixed data sets for sales, other businesses using Big Data don't use it with specific goals in mind. This creates a further challenge - how can companies ensure that they are using the right data to ask the right questions?
The Endor Protocol, a blockchain-based AI toolset (developed by a group of MIT alumni), uses data pulled from online human-based sources and artificial intelligence to generate crowd-based wisdom. It applies a discipline called "social physics" to generate answers to future-based questions.
Predictive analytics has huge potential where companies are developing and launching new products. For example, a marketing officer could use Endor to find out which kind of consumers are most likely to buy a product. Knowing this information could help optimize spending on advertising campaigns, as ads could be targeted to the consumers who are most likely to buy the new product.
Storj
And there is Storj, a blockchain project that allows users to store data in the cloud in a completely decentralized way. It is similar to Dropbox, only that there is no central entity that manages the data. You basically pay directly the people who rent out their excess storage and network capacity. For example, you could put all cryptocurrency transactions in a graph database, and then analyze transaction patterns, or research the origins of funds in a certain account.
With more and more of these blockchain platforms looking to operate in the background, with names like Arweave, O2OPAY, Datum and DataWallet, the amount of quality Big Data and the subsequent analytics are going to drastically impact the way companies do business all around the world.
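The graph analysis mentioned above can be illustrated without a graph database. The sketch below uses an in-memory adjacency list and a breadth-first search to find every account from which funds could have reached a given account. The transactions and account names are made up, and the function is an illustration rather than any platform’s API.

```python
from collections import defaultdict, deque

# Hypothetical transactions: (sender, receiver, amount).
transactions = [
    ("exchange", "alice", 10),
    ("alice", "bob", 4),
    ("mixer", "bob", 3),
    ("bob", "carol", 5),
]

# Build a reverse adjacency list: receiver -> accounts that paid it.
incoming = defaultdict(list)
for sender, receiver, _amount in transactions:
    incoming[receiver].append(sender)

def upstream_sources(account: str) -> set[str]:
    """Walk the graph backwards to collect all direct and indirect funders."""
    seen: set[str] = set()
    queue = deque([account])
    while queue:
        current = queue.popleft()
        for sender in incoming.get(current, []):
            if sender not in seen:
                seen.add(sender)
                queue.append(sender)
    return seen

carol_sources = upstream_sources("carol")
```

On a real ledger the same traversal would run over millions of transactions, which is where a dedicated graph database becomes useful.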
Forward thinking
While blockchain technology is still relatively new, it is starting to have an impact on how Big Data is being processed and analysed. From the examples above, it is clear that developments in blockchain technology are proving its ability to handle the challenges of decentralizing Big Data.
Blockchain has the potential to fundamentally change the way that Big Data is treated and analysed, with enhanced security and data quality just some of the benefits afforded to businesses using this technology. As a result, the potential of Big Data analytics to positively change business operations may become even more compelling.
It is likely that we will see further progress in the partnership of Big Data analytics and blockchain as developments in this space continue. As the technology matures and there are more innovations around it, more concrete use cases will be identified and explored to benefit Big Data management and data analysis. As more data is collected in real-time it will be interesting to see how the blockchain will continue to revolutionise different industries and bring better data privacy.
So, blockchain and Big Data may become a great marriage.