Using Google Cloud and machine learning to improve fraud detection

Holly Godwin

PR Executive at Ocado Technology

07.03.2018 12:00 pm

As the world’s largest online-only supermarket, we run systems that handle millions of events every minute as our customers navigate our website and apps, add items to their trolleys, choose delivery slots, and check out their orders.

These interactions generate petabytes of data, which accumulate in our data lake on Google Cloud. One challenge facing any online retailer is isolating and recognizing the rare incidents classified as fraud in a smart and efficient way.

For those unfamiliar with online fraud, it typically covers any instance where an order is delivered but not paid for. Fraud can happen as a result of a genuine mistake (a customer entering the wrong personal details or using an expired card accidentally) but, occasionally, it can also be the result of malicious intent. If left unchecked, fraud can propagate to other systems and companies and affect our customer service.

Therefore, we needed a clever way of predicting and recognizing these incidents among millions of normal events. The answer to this complex challenge was to use the cloud and machine learning (ML). Our data science team had already successfully deployed many ML projects into production, so it made sense to design our own solution using the experience and competencies we had gained elsewhere in the business.

In addition to augmenting our contact center, machine learning pervades our end-to-end e-commerce, fulfilment and logistics platform. For example, ML already powers the way we recommend products on our webshop and how we generate search results, which are designed to avoid suggesting meat to vegetarians or products containing gluten to celiacs.

The motivation behind using ML for fraud detection was twofold: speed and adaptability. Machines are fundamentally faster than humans at detecting patterns in large volumes of data. Also, as fraudsters change their tactics, machines can learn the new patterns much more quickly.

Traditionally, fraud detection agents are employed to make judgement calls on whether a given interaction is likely to be fraud. Decisions are based largely on intuition and can leave companies playing a cat-and-mouse game with fraudsters. For example, if fraud agents notice a correlation between baskets containing an unusually large amount of alcohol and confirmed instances of fraud, they might continue to look out for this trend in future. However, once fraudsters pick up on this, a new trend may start with, say, household goods, and so the game of catch-up continues.

A machine learning model, on the other hand, can learn and adapt far more quickly, evolving with the current environment and even anticipating future trends; it can also weigh many more factors than a human or a fixed rule-based engine can. The work of fraud agents then becomes more manageable, as they no longer have to frantically analyze thousands of data points to establish fraud. Instead, they simply perform a final check, based on the model’s prediction, to confirm whether they should cancel the order; it’s a perfect case of humans and machines working together in harmony.
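To make that hand-off concrete, here is a minimal sketch of how model scores might route orders to agents for the final check. The `ScoredOrder` type, the `build_review_queue` helper and the 0.5 cut-off are hypothetical illustrations, not Ocado’s system.

```python
from dataclasses import dataclass


@dataclass
class ScoredOrder:
    order_id: str
    fraud_probability: float  # model output in [0, 1]


# Hypothetical cut-off: orders scoring at or above this go to a fraud agent.
REVIEW_THRESHOLD = 0.5


def build_review_queue(orders, threshold=REVIEW_THRESHOLD):
    """Return the orders that need a human check, most suspicious first.

    Orders below the threshold proceed automatically; an agent makes the
    final cancel/keep decision only for the flagged ones.
    """
    flagged = [o for o in orders if o.fraud_probability >= threshold]
    return sorted(flagged, key=lambda o: o.fraud_probability, reverse=True)


orders = [
    ScoredOrder("A100", 0.02),
    ScoredOrder("A101", 0.91),
    ScoredOrder("A102", 0.47),
    ScoredOrder("A103", 0.63),
]
queue = build_review_queue(orders)
print([o.order_id for o in queue])  # ['A101', 'A103']
```

Where the cut-off sits is a trade-off: lowering it sends more orders to agents (catching more fraud but adding workload), while raising it does the opposite.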

However, the fact that an ML model could improve our fraud detection process didn’t make it easy to implement. Confirmed fraud cases are incredibly rare: at a typical fraud rate of one in every thousand orders (0.1%), a model that simply labelled every order as legitimate would already be 99.9% accurate while catching no fraud at all.

Therefore, overall accuracy was not enough: our fraud detection model had to be highly precise at picking out the rare fraudulent orders themselves.
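The arithmetic behind the 0.1% example is easy to check. The sketch below evaluates a trivial “model” that never flags any order against 100,000 synthetic orders at that fraud rate (illustrative data, not Ocado’s). It shows why precision (how many flagged orders are really fraud) and recall (how many fraud cases get flagged) say far more than raw accuracy when positives are this rare.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = fraud, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Precision: of the orders flagged, how many were really fraud?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the real fraud cases, how many were flagged?
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# 100,000 synthetic orders at a 0.1% fraud rate: 100 of them are fraudulent.
y_true = [1] * 100 + [0] * 99_900

# A "model" that never flags anything.
never_flag = [0] * len(y_true)
print(evaluate(y_true, never_flag))  # (0.999, 0.0, 0.0)
```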

So, how did we do it? From the data we had collected on past orders, including cases of fraud, we created a list of features that included the number of past deliveries, the cost of baskets, and other information. The more informative features we included in the training data, the more reliable the model could be, so we made sure we were providing our model with as much information as possible (and we will continue to add more as time goes on).
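As an illustration of this kind of feature engineering, the sketch below turns a customer’s order history into a flat set of features. The `PastOrder` type and all feature names, including the ratio of the current basket to the customer’s average, are hypothetical examples inspired by the features mentioned above rather than Ocado’s actual feature set.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PastOrder:
    basket_cost: float  # total value of the basket, in pounds
    delivered: bool     # whether the order was actually delivered


def order_features(history, current_basket_cost):
    """Turn a customer's order history into a flat feature dict (hypothetical features)."""
    delivered = [o for o in history if o.delivered]
    avg_cost = mean(o.basket_cost for o in delivered) if delivered else 0.0
    return {
        "num_past_deliveries": len(delivered),
        "avg_past_basket_cost": avg_cost,
        "current_basket_cost": current_basket_cost,
        # How unusual is this basket relative to the customer's norm?
        # 0.0 is used as a placeholder when there is no delivered history.
        "cost_vs_avg_ratio": current_basket_cost / avg_cost if avg_cost else 0.0,
    }


history = [PastOrder(80.0, True), PastOrder(120.0, True), PastOrder(95.0, False)]
print(order_features(history, 400.0))
```

Here the current basket is four times the customer’s usual spend, the kind of deviation a model can weigh alongside many other signals.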

After collating our data, we had to choose an algorithm capable of learning from it. We eventually implemented a deep neural network in TensorFlow, as the approach was precise and easy to deploy into production. TensorFlow was a natural choice: we had already moved our data analytics to Google Cloud, so running TensorFlow alongside our data stored on Google Cloud Platform worked well. It also made our model scalable and transferable, which has in turn empowered our developers.
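For readers unfamiliar with deep neural networks, the sketch below shows what such a model computes at prediction time: stacked dense layers with ReLU activations, ending in a single sigmoid unit that outputs a fraud probability. It is written in plain Python for transparency, uses invented weights, and omits training entirely; the real model is built and trained with TensorFlow, whose API is not reproduced here.

```python
import math


def relu(xs):
    return [max(0.0, x) for x in xs]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(z):
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fraud_probability(features, layers):
    """Forward pass: hidden ReLU layers, then a single sigmoid output unit."""
    h = features
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    out_w, out_b = layers[-1]
    (logit,) = dense(h, out_w, out_b)
    return sigmoid(logit)


# Made-up weights for 3 inputs -> 4 hidden -> 2 hidden -> 1 output.
# Real weights would come from training on labelled orders.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.6, 0.1, 0.9], [0.2, 0.2, 0.2]],
     [0.0, 0.1, -0.1, 0.0]),
    ([[0.7, -0.3, 0.5, 0.1], [-0.4, 0.6, 0.2, 0.3]], [0.05, -0.05]),
    ([[1.2, -0.8]], [-0.3]),
]
p = fraud_probability([0.4, -1.1, 2.0], layers)
print(f"{p:.3f}")
```

The sigmoid output is what lets the model report the likelihood of fraud as a probability rather than a bare yes/no.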

To brainstorm ideas and improve our proof-of-concept model, we hosted hackdays where our multidisciplinary team of data scientists and software engineers explored new features, tested new models, analysed and visualised new data, and investigated monitoring. The main goal of getting everyone in one room was to dig into our data, gain insights into the problem, and provide more information for our model. These hackdays allowed us to focus solely on the task at hand and took our model to the next level.

We now have a model that makes predictions in real time and expresses the likelihood of fraud as a probability, using the following process:

  • The customer order information is stored and analysed using BigQuery.
  • The information is then processed using Dataflow, where the data is normalised (a process whereby numerical features are re-scaled to be centred on zero and categorical features are mapped to integer indices). This reformatting is necessary for many ML algorithms, including deep neural networks built using TensorFlow.
  • Dataflow is also used to transfer the data from BigQuery to Datastore and Cloud Storage.
  • Datastore provides fast data access, allowing for pre-computed features to be accessed when running real time predictions.
  • Cloud Storage efficiently stores the data as files (objects in storage buckets).
  • Cloud ML consumes the data from Cloud Storage to train models and serves them as APIs.
  • The Ocado fraud detection model, powered by TensorFlow, then reads the data from Datastore and makes real-time predictions using the Cloud ML APIs.
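The normalisation step in the process above can be sketched as follows. Z-score scaling (subtract the mean, divide by the standard deviation) is one common way to centre numerical features on zero, and a lookup table is one common way to map categories to integers; the article does not say which exact scheme Ocado’s Dataflow pipeline uses, so treat both as assumptions. The example features (basket cost and delivery slot) and all function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_numeric_scaler(values):
    """Learn the mean and standard deviation of a numerical feature."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid dividing by zero for constant columns
    return mu, sigma


def scale(value, scaler):
    """Re-scale so that the fitted data is centred on 0 with unit spread."""
    mu, sigma = scaler
    return (value - mu) / sigma


def fit_vocabulary(categories):
    """Map each distinct category to an integer index, in order of first appearance.

    Index 0 is reserved for categories not seen during fitting.
    """
    vocab = {}
    for c in categories:
        if c not in vocab:
            vocab[c] = len(vocab) + 1
    return vocab


def encode(category, vocab):
    return vocab.get(category, 0)


basket_costs = [40.0, 60.0, 80.0, 100.0]
delivery_slots = ["evening", "morning", "evening", "afternoon"]

cost_scaler = fit_numeric_scaler(basket_costs)
slot_vocab = fit_vocabulary(delivery_slots)

print([round(scale(c, cost_scaler), 3) for c in basket_costs])
print([encode(s, slot_vocab) for s in delivery_slots + ["late_night"]])  # [1, 2, 1, 3, 0]
```

Whatever scheme is used, the statistics and vocabulary learned from the training data must be applied unchanged when normalising features at prediction time; otherwise the model sees inputs on a different scale from the ones it was trained on.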

The model has been a great success, improving Ocado’s precision in detecting fraud by a factor of 15. However, we are keen to keep improving. We are now tackling our next challenges: investigating algorithms that could let us explain our predictions in more detail, assessing whether we can transfer learnings from one retailer to another, and considering which tools could help us streamline our process.

Looking back, this project wouldn’t have been possible without the close collaboration between many technology and retail teams in Ocado: the ML model has been consuming data from our order management, payments, CRM, and e-commerce teams. We have been using the tools developed by our data engineering, data platform, and machine learning services teams. We also relied on the data governance team to help us set up the Google Cloud projects and interact with our colleagues in the fraud team.

In such a fast-changing industry we are always trying to stay ahead of the game, exploring the latest technologies and thinking of creative ways to implement our cutting-edge ideas.

Have you found any innovative uses for machine learning models using TensorFlow?
