Data Scientist Skills Gap Crisis: a Glimmer of Hope in the Cloud
- Clifford McDowell, CEO & Founder at Doorda
- 01.11.2022 05:00 pm #cloud
The British government has acknowledged that we are in the grip of a significant data scientist skills gap in the UK – and announced the National Data Strategy to help alleviate this in the coming years – but meanwhile, innovation is urgently required to accelerate insight and increase productivity.
Almost half of businesses (48%) are recruiting for roles that require hard data skills, but just under half (46%) have struggled to recruit for these roles over the last two years, according to Quantifying the UK Data Skills Gap, a policy paper published by the Department for Digital, Culture, Media & Sport last year.
Meanwhile, data scientists find themselves in the bizarre position of being highly sought after for their skill sets and highly paid, yet wasting much of their working lives on grunt work – inefficiently searching for useful data rather than carrying out sophisticated analysis.
A recent report by IDC found that searching for and preparing data took up a total of 44% of data scientists’ work time. However, there is now a potential solution on the horizon with the advent of cloud-based consumable and reliable data services.
Urgently required: focused, reliable, consumable data
Modern organisations use data to fuel analysis, provide insight and drive efficiencies and digital automation. Analysts and data scientists are continually looking for valuable data that can be combined with their own data to improve outcomes.
More than 1,500 official bodies publish vast amounts of useful, GDPR-compliant data, but it is difficult to know what is available, the data is sometimes hard to find, and it is almost impossible to use.
Data scientists have been spending many hours researching and scraping public data scattered across the internet, and then repeating the process several times a year to keep it up to date. Because the data is often used in ad-hoc projects, the same scientists, having put in a lot of work and effort the first time, have to re-learn all those skills and re-find the data many months later.
Many data companies are good at analysing or visualising data, but not at engineering it.
Never mind “big data”, a phrase we do not often hear these days. The current thinking is about data focus: from the information available, the key is to focus on the data that implies, or can be used to derive, what you need to know. Fortunately, the data industry is quickly evolving to meet this challenge, offering cloud-based data feeds that data scientists can tap into on demand. The datasets are refreshed in real time, in alignment with the publishers, and can be easily interrogated online.
New data depth
Access to reliable, real-time data can help empower data scientists. They no longer need to focus on low-level data preparation, which frees up resources for analysing high-quality data and feeding it into their predictive models and AI tools.
This is already causing a revolution in the depth of information available to companies across multiple sectors – financial services, insurance, property, marketing, and any sector where data analysis is required.
For example, for a loan company attempting to analyse a business that wants to borrow money, the public information typically available about British companies on Companies House is from the previous year’s balance sheet. But now there are many more proxy indicators, clusters of data points, being made available in real time to data scientists that, when combined with more traditional information, provide a far deeper, up-to-date picture of the fundamental health of a business.
If the business has moved premises several times recently, that could be a cause for concern. Health and safety fines or low hygiene ratings could also indicate that it does not even have enough money to look after its staff correctly, and unfair dismissal tribunal rulings against it would be a further warning sign. All of this helps form an overall picture of the behavioural indicators of the company rather than just considering its bottom line.
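To make the idea concrete, here is a minimal Python sketch of how such behavioural indicators might be turned into simple warning flags. It is an illustration only: the `CompanySignals` fields, the thresholds and the `warning_flags` helper are all hypothetical, not a description of any real data service or lender's model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompanySignals:
    """Hypothetical proxy indicators for one company; every field is illustrative."""
    name: str
    premises_moves_recent: int             # changes of address in a recent period
    health_safety_fines: int               # number of fines on record
    lowest_hygiene_rating: Optional[int]   # assumed 0 (worst) to 5 (best); None if not rated
    unfair_dismissal_rulings: int          # tribunal rulings against the company


def warning_flags(c: CompanySignals) -> List[str]:
    """Return human-readable flags; the thresholds are arbitrary assumptions."""
    flags = []
    if c.premises_moves_recent >= 2:
        flags.append("frequent premises moves")
    if c.health_safety_fines > 0:
        flags.append("health and safety fines")
    if c.lowest_hygiene_rating is not None and c.lowest_hygiene_rating <= 2:
        flags.append("low hygiene rating")
    if c.unfair_dismissal_rulings > 0:
        flags.append("unfair dismissal rulings")
    return flags


# Invented example company, used only to show the call.
example = CompanySignals("Example Ltd", 3, 1, None, 0)
print(example.name, warning_flags(example))
```

In practice a lender would weigh flags like these alongside the traditional balance sheet information rather than treat any one of them as decisive.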
We have also seen the emergence of detailed geodemographics used by marketing companies. GDPR prevents such companies from targeting individuals, but now they can access data on areas where, for example, income levels are high, combined with anonymised data about the health and age of people living there. This would allow a marketing company to choose where to target their next advertising campaign.
Finding and linking this data used to involve visiting hundreds of locations online, downloading it, linking it together, running it by the legal team and hosting it before data teams could start consuming it and building it into models. Now it is available for consumption in the cloud within minutes instead of weeks or months.
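As a rough sketch of that kind of area-level selection, the Python below filters a handful of invented areas by income and age. The figures, the field names and the `select_target_areas` helper are assumptions made up for illustration; the point is only that the filter works on aggregated area statistics, never on individuals.

```python
from typing import Dict, List, Tuple

# Invented area-level aggregates; no individual-level records are involved.
AREAS: List[Dict] = [
    {"area": "Area A", "median_income": 52000, "median_age": 44},
    {"area": "Area B", "median_income": 31000, "median_age": 37},
    {"area": "Area C", "median_income": 47000, "median_age": 61},
]


def select_target_areas(
    areas: List[Dict], min_income: int, age_range: Tuple[int, int]
) -> List[str]:
    """Return the names of areas meeting the income and age criteria."""
    low, high = age_range
    return [
        a["area"]
        for a in areas
        if a["median_income"] >= min_income and low <= a["median_age"] <= high
    ]


print(select_target_areas(AREAS, min_income=45000, age_range=(40, 65)))
```

A real campaign would combine many more variables, but the shape of the query stays the same: criteria go in, a shortlist of areas comes out.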
The future: refinement
The next step will be greater data refinement – drilling down further to offer ever-deeper consumable and reliable data. If the government succeeds in its drive to plug the data scientist skills gap, this, combined with newly available cloud-based data engineering services, could significantly increase productivity across multiple UK industries and rapidly boost Britain’s ailing economy.