Using Google Cloud and machine learning to improve fraud detection

Holly Godwin

PR Executive at Ocado Technology

07.03.2018 12:00 pm

As the world’s largest online-only supermarket, our systems handle millions of events every minute as our customers navigate our website and apps, add items to their trolleys, choose delivery slots, and check out their orders.

These interactions generate petabytes of data, which accumulate in our data lake on Google Cloud. One challenge facing any retailer operating online is isolating and recognizing, in a smart and efficient way, the rare incidents classified as fraud.

For those unfamiliar with online fraud, it typically covers any instance where an order is delivered but not paid for. Fraud can happen as a result of a genuine mistake (a customer entering the wrong personal details or using an expired card accidentally) but, occasionally, it can also be the result of malicious intent. If left unchecked, fraud can propagate to other systems and companies and affect our customer service.

Therefore, we needed a clever way of predicting and recognizing these incidents among millions of other normal events. The answer to this complex challenge was to use the cloud and machine learning (ML). Our data science team had already successfully deployed many ML projects into production, so it made sense to design our own solution using the experience and competencies we had gained elsewhere in the business.

In addition to augmenting our contact center, machine learning pervades our end-to-end e-commerce, fulfilment and logistics platform. For example, ML already powers the way we recommend products on our webshop and the way we generate search results, which are designed to avoid suggesting meat to vegetarians or products containing gluten to celiacs.

The motivation behind using ML for fraud detection was twofold: speed and adaptability. Machines can detect patterns far more quickly than humans can. Also, as fraudsters change their tactics, machines can learn the new patterns much faster.



Traditionally, fraud detection agents are employed to make judgement calls on whether a certain interaction is likely to be fraud. Decisions are based largely on intuition and can leave companies playing a cat-and-mouse game with fraudsters. For example, if fraud agents notice a correlation between baskets containing an unusually large order of alcohol and confirmed instances of fraud, they might then continue to look out for this trend in future. However, once fraudsters pick up on this, a new trend may start for, say, household goods, and so the game of catch-up continues.

A machine learning model, on the other hand, can learn and adapt far more quickly, evolving based on the current environment and even predicting future trends; the model can also look at many more factors than a human or a fixed rule-based engine can. The work of fraud agents is then made more manageable, as they no longer have to frantically analyze thousands of data points to establish fraud. Instead, they simply perform a final check to confirm whether they should cancel the order, based on the prediction made by the model; it’s a perfect case of humans and machines working together in harmony.

However, just because we could improve our fraud detection process with an ML model didn’t make it easy to implement. Confirmed fraud cases are incredibly rare; given a typical fraud rate of one in every thousand orders (0.1%), even a machine learning model that is 99.9% accurate could still miss many, or in the worst case all, instances of fraud.

Therefore, our fraud detection ML model had to be incredibly accurate.
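To make the point about rare fraud concrete, here is a minimal sketch in Python. The data is purely illustrative (10,000 synthetic orders at the 0.1% fraud rate quoted above), and the "model" is a hypothetical one that never flags anything; none of this is our production code. It shows that such a model reaches 99.9% accuracy while catching no fraud at all, which is why accuracy on its own is a weak yardstick here.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all orders that were labelled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual fraud cases that were caught."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative data: 10,000 orders at a 0.1% fraud rate gives 10 fraud cases.
orders = 10_000
labels = [1] * 10 + [0] * (orders - 10)

# Hypothetical "model" that labels every order as genuine.
never_flag = [0] * orders

print(accuracy(labels, never_flag))  # 0.999
print(recall(labels, never_flag))    # 0.0
```

In other words, the share of fraud cases caught, and the share of flagged orders that really are fraud, say far more about a fraud model than overall accuracy does.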

So, how did we do it? From the data we had collected from past orders, including cases of fraud, we created a list of features which included the number of past deliveries, the cost of baskets, and other information. The more features we included in the training data, the more reliable the model could be, so we made sure that we were providing our model with as much information as possible (and we will continue to add more as time goes on).
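As a rough illustration of what turning past orders into features can look like, the sketch below derives the two features named above (number of past deliveries and basket cost) for a new order. The `Order` record, its field names, and the `build_features` helper are all hypothetical, chosen for this example only; the real feature list is longer and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical order record; field names are assumptions for this sketch.
    customer_id: str
    basket_cost: float
    delivered: bool


def build_features(history, new_order):
    """Build a feature dict for new_order from the same customer's past orders."""
    past = [o for o in history if o.customer_id == new_order.customer_id]
    return {
        "past_deliveries": sum(1 for o in past if o.delivered),
        "basket_cost": new_order.basket_cost,
    }


history = [
    Order("alice", 54.20, True),
    Order("alice", 61.75, True),
    Order("bob", 23.10, False),
]
incoming = Order("alice", 310.00, False)

print(build_features(history, incoming))
# {'past_deliveries': 2, 'basket_cost': 310.0}
```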

After collating our data, we had to decide upon an algorithm capable of learning from the information. Eventually we implemented a deep neural network on TensorFlow, as it was precise and easy to deploy into production. Using TensorFlow was a natural choice: we had already moved to Google Cloud for data analytics, so TensorFlow worked well alongside our data stored on the Google Cloud Platform. It also made our model scalable and transferable, which has in turn empowered our developers.

In order to brainstorm ideas and improve our proof-of-concept model, we hosted hackdays where our multidisciplinary team of data scientists and software engineers explored new features, tested new models, analysed and visualised new data and explored monitoring. The main goal of getting everyone in a room was to manipulate our data in order to gain insights into the problem and provide more information for our model. These hackdays allowed us to focus solely on the task at hand and took our model to the next level.

We now have a model that makes predictions in real time and provides the likelihood of fraud as a probability, using the following process:

  • The customer order information is stored and analysed using BigQuery.
  • The information is then processed using Dataflow, where the data is normalised (a process whereby numerical features are re-scaled around the origin and categorical features are transformed into sequences of integers). This reformatting is necessary for many ML algorithms, including deep neural networks built using TensorFlow.
  • Dataflow is also used to transfer the data from BigQuery to Datastore and Cloud Storage.
  • Datastore provides fast data access, allowing pre-computed features to be accessed when running real-time predictions.
  • Cloud Storage efficiently stores data using a file system.
  • Cloud ML consumes the data from Cloud Storage and produces models as APIs.
  • The Ocado fraud detection model, powered by TensorFlow, reads the data from Datastore and, using the Cloud ML APIs, makes real-time predictions.
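The normalisation step in the list above can be sketched in a few lines of plain Python. This is an illustration rather than our Dataflow code: it assumes that "re-scaled around the origin" means standardising to zero mean and unit variance, and the `payment_types` feature and both helper names are hypothetical.

```python
from statistics import mean, pstdev


def standardise(values):
    """Re-scale numbers to zero mean and unit variance (centred on the origin)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def integer_encode(categories):
    """Map each distinct category to an integer, in order of first appearance."""
    vocab = {}
    codes = []
    for c in categories:
        if c not in vocab:
            vocab[c] = len(vocab)
        codes.append(vocab[c])
    return codes, vocab


# Illustrative inputs only; "payment_types" is a made-up categorical feature.
basket_costs = [20.0, 40.0, 60.0]
payment_types = ["card", "voucher", "card"]

scaled = standardise(basket_costs)
codes, vocab = integer_encode(payment_types)

print(scaled)  # values centred on 0, with the middle basket at exactly 0.0
print(codes)   # [0, 1, 0]
```

In a real pipeline the mean, standard deviation and vocabulary would be computed once on the training data and reused unchanged at prediction time, so that the model sees consistently scaled inputs.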



The model has been a great success, improving Ocado’s precision in detecting fraud by a factor of 15. However, we are keen to continue improving. We are now tackling our next challenges: investigating algorithms that could allow us to explain our predictions in more detail, assessing whether we can transfer learnings from one retailer to another, and considering what tools could help us to streamline our process.

Looking back, this project wouldn’t have been possible without the close collaboration between many technology and retail teams in Ocado: the ML model has been consuming data from our order management, payments, CRM, and e-commerce teams. We have been using the tools developed by our data engineering, data platform, and machine learning services teams. We also relied on the data governance team to help us set up the Google Cloud projects and interact with our colleagues in the fraud team.

In such a fast-changing industry we are always trying to stay ahead of the game, exploring the latest technologies and thinking of creative ways to implement our cutting-edge ideas.

Have you found any innovative uses for machine learning models using TensorFlow?

