Using Google Cloud and machine learning to improve fraud detection

Holly Godwin

PR Executive at Ocado Technology

07.03.2018 12:00 pm

As the world’s largest online-only supermarket, our systems handle millions of events every minute as our customers navigate our website and apps, add items to their trolleys, choose delivery slots, and check out their orders.

These interactions result in petabytes of data collecting in our data lake stored in Google Cloud. One challenge facing any retailer operating online is isolating and recognizing the rare incidents classified as fraud in a smart and efficient way.

For those unfamiliar with online fraud, it typically covers any instance where an order is delivered but not paid for. Fraud can happen as a result of a genuine mistake (a customer entering the wrong personal details or using an expired card accidentally) but, occasionally, it can also be the result of malicious intent. If left unchecked, fraud can propagate to other systems and companies and affect our customer service.

Therefore, we needed a clever way of predicting and recognizing these incidents among millions of other normal events. The answer to this complex challenge was to use the cloud and machine learning (ML). Our data science team had already successfully deployed many ML projects into production so it made sense to design our own solution using the experience and competencies we had gained from elsewhere in the business.

In addition to augmenting our contact center, machine learning pervades our end-to-end e-commerce, fulfilment and logistics platform. For example, ML already powers the way we recommend products on our webshop and the way we generate search results, which are designed to avoid suggesting meat to vegetarians or products containing gluten to celiacs.

The motivation behind using ML for fraud detection was twofold: speed and adaptability. Machines are fundamentally more capable than humans of detecting patterns quickly. Also, as fraudsters change their tactics, machines can learn the new patterns much more quickly.

 

 

Traditionally, fraud detection agents are employed to make judgement calls on whether a certain interaction is likely to be fraud or not. Decisions are based largely on intuition and can leave companies playing a cat-and-mouse game with fraudsters. For example, if fraud agents notice a correlation between baskets containing an unusually large order of alcohol and confirmed instances of fraud, they might then continue to look out for this trend in future. However, once fraudsters pick up on this, a new trend may start for, say, household goods, and so the game of catch-up continues.

A machine learning model, on the other hand, can learn and adapt far more quickly, evolving based on the current environment and even predicting future trends; the model can also look at many more factors than a human or a fixed rule-based engine can. The work of fraud agents is then made more manageable, as they no longer have to frantically analyze thousands of data points to establish fraud. Instead, they simply perform a final check to confirm whether they should cancel the order or not based on the prediction made by the model; it’s a perfect case of humans and machines working together in harmony.

However, just because we could improve our fraud detection process with an ML model didn’t make it easy to implement. Confirmed fraud cases are incredibly rare; given a typical fraud rate of one in every thousand orders (0.1%), a machine learning model that is only 99.9% accurate could still miss most or all instances of fraud. A model that labelled every order as legitimate would reach that accuracy while catching none.

Therefore, our fraud detection ML model had to be incredibly accurate.
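To make the arithmetic concrete, here is a minimal sketch in Python. The order volume and the "flag nothing" baseline are hypothetical illustrations, not figures or models from Ocado; only the 0.1% fraud rate comes from the text above. It shows why accuracy alone says little at this fraud rate, and why a measure such as recall (the share of fraud cases actually caught) is more informative.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Recall: fraction of the real fraud cases that the model flagged.
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, recall


# Hypothetical volume: 100,000 orders at a fraud rate of 1 in 1,000.
total_orders = 100_000
fraud_orders = total_orders // 1000
legit_orders = total_orders - fraud_orders

# Baseline "model" that never flags anything as fraud:
# every legitimate order is a true negative, every fraud a false negative.
accuracy, recall = confusion_metrics(tp=0, fp=0, tn=legit_orders, fn=fraud_orders)

print(f"accuracy = {accuracy:.1%}")  # accuracy = 99.9%
print(f"recall   = {recall:.1%}")    # recall   = 0.0%
```

In this sketch the baseline reaches 99.9% accuracy while missing all 100 fraudulent orders, which is why a useful model has to be judged on how it handles the rare class.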

So, how did we do it? From the data we had collected from past orders, including cases of fraud, we created a list of features which included the number of past deliveries, the cost of baskets, and other information. The more features we included in the training data, the more reliable the model could be, so we made sure that we were providing our model with as much information as possible (and we will continue to add more as time goes on).

After collating our data, we then had to decide upon an algorithm capable of learning from the information. Eventually, we implemented a deep neural network on TensorFlow, as it was precise and easy to deploy into production. TensorFlow was a natural choice: we had already moved to Google Cloud for data analytics, so it worked well alongside our data stored on the Google Cloud Platform. It also made our model scalable and transferable, which has in turn empowered our developers.

In order to brainstorm ideas and improve our proof-of-concept model, we hosted hackdays where our multidisciplinary team of data scientists and software engineers explored new features, tested new models, analysed and visualised new data and explored monitoring. The main goal of getting everyone in a room was to manipulate our data in order to gain insights into the problem and provide more information for our model. These hackdays allowed us to focus solely on the task at hand and took our model to the next level.

We now have a model that predicts results in real-time and provides the likelihood of fraud as a probability, using the following process:

  • The customer order information is stored and analysed using BigQuery.
  • The information is then processed using Dataflow, where the data is normalised (a process whereby numerical features are re-scaled around the origin, and categorical features are transformed into a sequence of integers). This reformatting is necessary for many ML algorithms, including deep neural networks built using TensorFlow.
  • Dataflow is also used to transfer the data from BigQuery to Datastore and Cloud Storage.
  • Datastore provides fast data access, allowing for pre-computed features to be accessed when running real time predictions.
  • Cloud Storage efficiently stores data using a file system.
  • Cloud ML consumes the data from Cloud Storage and produces models as APIs.
  • The Ocado fraud detection model, powered by TensorFlow, then reads the data from Datastore and then, using the Cloud ML APIs, makes real-time predictions.
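The normalisation step described in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not Ocado's Dataflow code: "re-scaled around the origin" is interpreted here as standardisation (zero mean, unit variance), the categorical encoding assigns integers in order of first appearance, and the `payment_type` feature and all sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardise(values):
    """Re-scale numbers to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def encode_categories(values):
    """Map each distinct category to an integer, in order of first appearance."""
    vocabulary = {}
    for v in values:
        vocabulary.setdefault(v, len(vocabulary))
    return [vocabulary[v] for v in values], vocabulary


# Hypothetical sample of three orders.
basket_cost = [20.0, 40.0, 60.0]
payment_type = ["card", "voucher", "card"]

scaled_cost = standardise(basket_cost)
payment_ids, payment_vocab = encode_categories(payment_type)

print(scaled_cost)
print(payment_ids, payment_vocab)
```

In a real-time setting the mean, standard deviation and vocabulary would need to be computed once on the training data and reused at prediction time, so that a new order is transformed exactly as the training examples were.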

 

 

The model has been a great success, improving the precision of Ocado’s fraud detection by a factor of 15. However, we are keen to continue improving. We are now tackling our next challenges: investigating algorithms that could allow us to explain our predictions in more detail, assessing whether we can transfer learnings from one retailer to another, and considering what tools could help us to streamline our process.

Looking back, this project wouldn’t have been possible without the close collaboration between many technology and retail teams in Ocado: the ML model has been consuming data from our order management, payments, CRM, and e-commerce teams. We have been using the tools developed by our data engineering, data platform, and machine learning services teams. We also relied on the data governance team to help us set up the Google Cloud projects and interact with our colleagues in the fraud team.

In such a fast-changing industry, we are always trying to stay ahead of the game, exploring the latest technologies and thinking of creative ways to implement our cutting-edge ideas.

Have you found any innovative uses for machine learning models using TensorFlow?

 
