Using Google Cloud and machine learning to improve fraud detection

Holly Godwin

PR Executive at Ocado Technology

07.03.2018 12:00 pm

As the world’s largest online-only supermarket, our systems handle millions of events every minute as our customers navigate our website and apps, add items to their trolleys, choose delivery slots, and check out their orders.

These interactions result in petabytes of data accumulating in our data lake, which is stored in Google Cloud. One challenge facing any retailer operating online is isolating and recognizing the rare incidents classified as fraud in a smart and efficient way.

For those unfamiliar with online fraud, it typically covers any instance where an order is delivered but not paid for. Fraud can happen as a result of a genuine mistake (a customer entering the wrong personal details or using an expired card accidentally) but, occasionally, it can also be the result of malicious intent. If left unchecked, fraud can propagate to other systems and companies and affect our customer service.

Therefore, we needed a clever way of predicting and recognizing these incidents among millions of other normal events. The answer to this complex challenge was to use the cloud and machine learning (ML). Our data science team had already successfully deployed many ML projects into production, so it made sense to design our own solution using the experience and competencies we had gained elsewhere in the business.

In addition to augmenting our contact center, machine learning pervades our end-to-end e-commerce, fulfilment and logistics platform. For example, ML already powers the way we recommend products on our webshop and the way we generate search results, which are designed to avoid suggesting meat to vegetarians or products containing gluten to celiacs.

The motivation behind using ML for fraud detection was twofold: speed and adaptability. Machines can detect patterns far more quickly than humans can. Also, as fraudsters change their tactics, machines can learn the new patterns much more quickly.



Traditionally, fraud detection agents are employed to make judgement calls on whether a certain interaction is likely to be fraud or not. Decisions are based largely on intuition and can leave companies playing a cat-and-mouse game with fraudsters. For example, if fraud agents notice a correlation between baskets containing an unusually large order of alcohol and confirmed instances of fraud, they might then continue to look out for this trend in future. However, once fraudsters pick up on this, a new trend may start for, say, household goods, and so the game of catch-up continues.

A machine learning model, on the other hand, can learn and adapt far more quickly, evolving based on the current environment and even predicting future trends; the model can also look at many more factors than a human or a fixed rule-based engine can. The work of fraud agents is then made more manageable, as they no longer have to frantically analyze thousands of data points to establish fraud. Instead, they simply perform a final check to confirm whether they should cancel the order or not based on the prediction made by the model; it’s a perfect case of humans and machines working together in harmony.

However, just because we could improve our fraud detection process with an ML model didn’t make it easy to implement. Confirmed fraud cases are incredibly rare; given a typical fraud rate of one in every thousand orders (0.1%), a machine learning model that is only 99.9% accurate could still miss several instances of fraud. Indeed, a model that simply labelled every order as genuine would reach 99.9% accuracy while catching no fraud at all.

Therefore, our fraud detection ML model had to be incredibly accurate.
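The arithmetic behind this point can be checked with a small sketch. The data below is purely illustrative (10,000 made-up orders at the 0.1% fraud rate quoted above), and the "always genuine" baseline is a hypothetical stand-in, not the model described in this post.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives for binary labels (1 = fraud)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all orders that were labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of actual fraud cases that were caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative data only: 10 fraudulent orders among 10,000 (0.1%).
labels = [1] * 10 + [0] * 9990
always_genuine = [0] * len(labels)  # trivial baseline that never flags fraud

tp, fp, fn, tn = confusion_counts(labels, always_genuine)
baseline_accuracy = accuracy(tp, fp, fn, tn)
baseline_recall = recall(tp, fn)
print(f"accuracy={baseline_accuracy:.3f} recall={baseline_recall:.3f}")
```

With classes this imbalanced, accuracy alone says very little, which is why measures such as precision and recall are usually more informative for fraud detection.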

So, how did we do it? From the data we had collected from past orders, including cases of fraud, we created a list of features which included the number of past deliveries, the cost of baskets, and other information. The more features we included in the training data, the more reliable the model could be, so we made sure that we were providing our model with as much information as possible (and we will continue to add more as time goes on).
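As a rough illustration of turning past orders into features, here is a minimal sketch. The record layout (`basket_cost`, `delivered`) and the function name `build_features` are assumptions made for this example; the post does not describe the actual schema or the full feature list.

```python
from statistics import mean


def build_features(past_orders, current_basket_cost):
    """Derive a few simple features from a customer's order history.

    The field names used here are hypothetical, chosen only for illustration.
    """
    delivered = [order for order in past_orders if order["delivered"]]
    costs = [order["basket_cost"] for order in delivered]
    return {
        "past_deliveries": len(delivered),
        "mean_past_basket_cost": mean(costs) if costs else 0.0,
        "current_basket_cost": current_basket_cost,
    }


# Made-up history for a single customer.
history = [
    {"basket_cost": 60.0, "delivered": True},
    {"basket_cost": 80.0, "delivered": True},
    {"basket_cost": 40.0, "delivered": False},
]
features = build_features(history, current_basket_cost=250.0)
print(features)
```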

After collating our data, we had to decide upon an algorithm capable of learning from the information. Eventually, we implemented a deep neural network on TensorFlow, as it was precise and easy to deploy into production. Using TensorFlow was a natural choice: we had already made the move over to Google Cloud for data analytics, so using TensorFlow alongside our data stored on the Google Cloud Platform worked well. It also made our model scalable and transferable, which has in turn empowered our developers.
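To give a feel for how a neural network turns a feature vector into a fraud probability, here is a minimal forward-pass sketch. It is written in plain Python rather than TensorFlow so that it has no dependencies, and the layer sizes and weights are arbitrary, untrained numbers chosen for illustration; they are not taken from the production model.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b), one row of W per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def fraud_probability(features, params):
    """Forward pass: one hidden ReLU layer, then a single sigmoid output unit."""
    hidden = dense(features, params["w1"], params["b1"], relu)
    (prob,) = dense(hidden, params["w2"], params["b2"], sigmoid)
    return prob


# Hypothetical, untrained parameters for a 3-input, 2-hidden-unit network.
params = {
    "w1": [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
    "b1": [0.0, 0.1],
    "w2": [[1.2, -0.7]],
    "b2": [-0.5],
}
probability = fraud_probability([0.3, -1.1, 0.9], params)
print(f"fraud probability: {probability:.3f}")
```

The sigmoid on the final unit is what keeps the output between 0 and 1, so it can be read as a likelihood of fraud. A real model would have more layers and units, and its weights would be learned from the training data.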

In order to brainstorm ideas and improve our proof-of-concept model, we hosted hackdays where our multidisciplinary team of data scientists and software engineers explored new features, tested new models, analysed and visualised new data and explored monitoring. The main goal of getting everyone in a room was to manipulate our data in order to gain insights into the problem and provide more information for our model. These hackdays allowed us to focus solely on the task at hand and took our model to the next level.

We now have a model that predicts results in real-time and provides the likelihood of fraud as a probability, using the following process:

  • The customer order information is stored and analysed using BigQuery.
  • The information is then processed using Dataflow, where the data is normalised (a process whereby numerical features are re-scaled around the origin, and categorical features are transformed into a sequence of integers). This reformatting is necessary for many ML algorithms, including deep neural networks built using TensorFlow.
  • Dataflow is also used to transfer the data from BigQuery to Datastore and Cloud Storage.
  • Datastore provides fast data access, allowing for pre-computed features to be accessed when running real time predictions.
  • Cloud Storage efficiently stores data using a file system.
  • Cloud ML consumes the data from Cloud Storage and produces models as APIs.
  • The Ocado fraud detection model, powered by TensorFlow, then reads the data from Datastore and then, using the Cloud ML APIs, makes real-time predictions.
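The normalisation step in the list above can be sketched as follows. This is a plain-Python illustration rather than Dataflow code, and it assumes that "re-scaled around the origin" means standardising to zero mean and unit variance; the helper names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardise(values):
    """Centre values on zero and scale them to unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def build_vocabulary(categories):
    """Give each distinct category an integer id, in order of first appearance."""
    vocab = {}
    for category in categories:
        vocab.setdefault(category, len(vocab))
    return vocab


def encode(categories, vocab, unknown_id=-1):
    """Map categories to their ids; unseen categories get unknown_id."""
    return [vocab.get(category, unknown_id) for category in categories]


# Made-up sample values for one numerical and one categorical feature.
basket_costs = [20.0, 40.0, 60.0]
payment_types = ["card", "voucher", "card"]

scaled = standardise(basket_costs)
vocab = build_vocabulary(payment_types)
encoded = encode(payment_types, vocab)
print(scaled, encoded)
```

In practice, the mean, standard deviation, and vocabulary would be computed once on the training data and then reused unchanged at prediction time, so that the model sees inputs on the same scale in both settings.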



The model has been a great success, improving Ocado’s precision in detecting fraud by a factor of 15. However, we are keen to continue improving. We are now tackling our next challenges: investigating algorithms that could allow us to explain our predictions in more detail, assessing whether we can transfer learnings from one retailer to another, and considering what tools could help us to streamline our process.

Looking back, this project wouldn’t have been possible without the close collaboration between many technology and retail teams in Ocado: the ML model has been consuming data from our order management, payments, CRM, and e-commerce teams. We have been using the tools developed by our data engineering, data platform, and machine learning services teams. We also relied on the data governance team to help us set up the Google Cloud projects and interact with our colleagues in the fraud team.

In such a fast-changing industry, we are always trying to stay ahead of the game, exploring the latest technologies and thinking of creative ways to implement our cutting-edge ideas.

Have you found any innovative uses for machine learning models using TensorFlow?

