Moving From A Data Revolution To Data Evolution
- Daniel van der Woude, Product Owner at AllianceBlock
- 06.01.2022 12:45 pm #data
The best ideas are formed through the meeting of minds. Everybody loves the story of a scientist who suddenly has a eureka moment. Yet the truth is, most innovators come up with great ideas by talking to other people. This is why we have flexible workspaces or start-up hubs where people can mingle and collaborate. When we meet at the coffee machine or chat over a beer after work, the cells in our brain literally form new networks, gather information, and allow different ideas to arise. The same is true when it comes to data. The whole picture is often more impactful than the sum of its parts.
When we apply machine learning algorithms to vast amounts of data, the same principle that exists in human interactions applies: the algorithms often surface surprising connections. They give us a new idea, a valuable insight that we might not have thought about before.
The last decade has seen exponential growth in the amount of data that we produce and consume. When I started at university 10 years ago, I was really proud of my laptop, which had 250 GB of hard disk space. Nowadays, that’s about the same storage capacity as an average iPhone. It’s little surprise then that last year the entire digital universe reached about 44 zettabytes (that’s a 44 with 21 zeros behind it!). To put that into context, there are now about 40 times more bytes of data than there are stars in the observable universe.
The proliferation of data has fueled incredible innovation across most industries, from enabling the discovery of new medical drugs to making self-driving cars a reality. Despite this, much of the data that currently exists sits in silos, meaning we are unable to connect the dots. The expansion of the data economy is driven by faster computing power and increasingly cheap storage space. However, we are still early in the adoption phase. So if connecting different data sets can offer so much in terms of innovation, why haven’t we tried to make data widely available yet?
Many organizations struggle with making data available internally, let alone externally. The more data you have, the more complex your architecture becomes. Making data clean and accessible is a huge challenge, while siloed organizational structures mean that it can be hard to connect the right data sources with the right people. Sometimes organizations reject or overlook data insights if they do not align with business objectives, meaning that what could be useful information for a wider group is not made available.
Organizations that possess a lot of data may find value in being the sole keepers of that information. They may not want to share it with others because gathering and processing data about their users gives them a competitive advantage that they can capitalize on. At the moment, many institutions treat their data as if they held a patent on it, seldom sharing it outside of their organization or charging a hefty fee if they do. To tackle these issues, we need to continue to innovate and cooperate. Companies should not construct barriers, but instead make sure that information can flow freely inside and outside of their organizational structures.
Creating platforms where data can be explored and connected by anyone, inside or outside an organization, will also be key. People should be able to freely explore data and get rewarded if they discover something interesting and unexpected. Some corporations are already making strong efforts in this regard by sponsoring competitions for data scientists, and we need more initiatives like this.
In the end, a shift in mindset needs to take place among the biggest actors within the data space. This might still be far away, as the discussion about who owns your data has only really just begun. However, the current move towards Web3 and a more user-centric, secure data ownership model provides reason for hope. Blockchain enables us to better track how much data is actually used by the end user without the interference of any trusted third party. Imagine that instead of paying a hefty fee for a subscription to a dataset that is only partly interesting to you, you could pay per use, paying only for the data that you actually consume.
The good news is that we are taking steps in the right direction. Governments, institutions, and corporations are setting up different data platforms and making more of their data available. Privacy issues and institutional rigidity remain ongoing obstacles, but with the right measures in place, machine learning algorithms and free-flowing data will help create even more eureka moments in the future.