Migrating a legacy predictive model to GCP

Our Scenario and Current State

In this example scenario, our company is an online retailer. We have had a data science team since 2013, and one of its first projects was a customer churn prediction model. The churn model predicts which customers will stop buying from us in the next week. We email a special retention offer to customers who are predicted likely to churn.

We track the profitability of our churn prevention campaign with target and control groups. The process has been profitable since it was first set up in 2013.

Some readers may be wondering how we can migrate this process to GCP. Others may be wondering why touch it at all. If it is not broken, why migrate it? This article will answer both of those questions.

Diagram of the current architecture

The rush to send the retention offers starts on Friday morning. The churn model process “lives” on the laptop of one data scientist, Jake. Jake comes in at 9:00 AM and starts the process of querying the data warehouse and training the machine learning model. Jake must put all his work on hold while he is “running the churn model”. The model training and cross-validation are usually finished by about 11:00 AM. Jake spends 15 minutes checking the outputs and then emails the CSV file with account numbers and churn scores to the CRM team.

The retention executive, Thomas, sets up the target and control groups in MS Excel and uploads four separate CSV files into the CRM system. The CRM system sends retention emails.
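The spreadsheet step Thomas performs amounts to a randomized assignment of the scored accounts into target and control groups. A minimal sketch in Python, assuming a simple 90/10 target/control split (the actual ratio, and how the four CSV files are broken down, are not specified here):

```python
import csv
import random

def split_target_control(scored_accounts, control_fraction=0.1, seed=42):
    """Randomly assign scored accounts to target and control groups.

    scored_accounts: list of (account_number, churn_score) tuples.
    Returns (target, control) lists. The 10% control fraction and the
    fixed seed are illustrative assumptions, not the company's settings.
    """
    rng = random.Random(seed)
    shuffled = scored_accounts[:]
    rng.shuffle(shuffled)
    n_control = int(len(shuffled) * control_fraction)
    return shuffled[n_control:], shuffled[:n_control]

def write_group_csv(path, rows):
    """Write one group to a CSV file for upload into the CRM."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["account_number", "churn_score"])
        writer.writerows(rows)

# Example with 1,000 hypothetical scored accounts
accounts = [(f"ACC{i:05d}", round(random.Random(i).random(), 3))
            for i in range(1000)]
target, control = split_target_control(accounts)
```

The fixed seed makes the assignment reproducible, which matters when the Monday uplift report has to match the groups that were actually contacted.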

The CRM tracks who took up the retention offer. On Monday morning, Thomas extracts lists of who took up the retention offer, who did not, and who made a purchase regardless. Thomas then calculates the uplift from the churn model in a spreadsheet and emails it around the company.

Sometimes Jake takes annual leave. Sometimes he takes personal leave. He cannot be in the office every Friday to “run” the churn model. Jake’s team members are there to pick up the slack: they all know his password. The password on Jake’s laptop has not changed since the creator of the churn model left the company in 2014 and passed the laptop down to someone who previously held Jake’s role. One of Jake’s team members, Anita, even has Jake’s password on a post-it note stuck to her monitor. The post-it note is five years old; it was “passed down” to Anita by a previous employee. The company uses SSO, and Jake uses the same password for his Employee Self Service. Jake’s laptop was supposed to be decommissioned last year, but it is running a critical business process.

Luckily, the company has already started its cloud journey. Now it’s time to migrate this process to the Google Cloud Platform.


Intermediate State – Lift, Shift and Automate

Diagram of the intermediate state

The outcomes for the intermediate state are:

  • Automate the process known as “running the model” and unblock the process owner while it is “running”
  • Remove the dependency on Jake’s laptop and move the process to the cloud
  • Remove security risks
  • Automate the data loads into and out of the CRM
  • Automate the retention campaign uplift report
  • Share the retention campaign uplift report as a dashboard and get it out of the email system

The company has already migrated its data warehouse to Google BigQuery. We will create a GCE VM that is set up to “run” the predictive model; the VM can be suspended for most of the week. We will set up a Cloud Composer DAG to execute shortly after the data warehouse ETL processes have completed in the early hours of Friday morning. The DAG will start up our VM, which queries the data warehouse, trains and cross-validates the churn model, and outputs the predictions and model diagnostics into Cloud Storage.
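The orchestration above can be sketched as a Cloud Composer (Apache Airflow) DAG. This is a configuration sketch, not a tested pipeline: the project, zone, instance name, schedule, and task IDs are all assumptions, and the operator imports follow the Google provider package for Airflow 2.

```python
import pendulum
from airflow import DAG
from airflow.providers.google.cloud.operators.compute import (
    ComputeEngineStartInstanceOperator,
    ComputeEngineStopInstanceOperator,
)

# All identifiers below (project, zone, instance, schedule) are
# illustrative assumptions for this sketch.
PROJECT_ID = "our-retail-project"
ZONE = "us-central1-a"
INSTANCE = "churn-model-vm"

with DAG(
    dag_id="churn_model_weekly",
    schedule="0 4 * * 5",  # early Friday morning, after the DW ETL finishes
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    start_vm = ComputeEngineStartInstanceOperator(
        task_id="start_churn_vm",
        project_id=PROJECT_ID,
        zone=ZONE,
        resource_id=INSTANCE,
    )
    # On boot, a startup script on the VM queries BigQuery, trains and
    # cross-validates the churn model, and writes predictions and
    # diagnostics to Cloud Storage. A sensor or completion marker would
    # signal when it is safe to stop the VM again.
    stop_vm = ComputeEngineStopInstanceOperator(
        task_id="stop_churn_vm",
        project_id=PROJECT_ID,
        zone=ZONE,
        resource_id=INSTANCE,
    )
    start_vm >> stop_vm
```

Keeping the VM stopped outside this window is what makes the lift-and-shift cheap: we pay for compute only during the Friday-morning run.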

Our data scientist, Jake, can then check the model diagnostics that were written to Cloud Storage. If there are no problems, he can flick a switch to load the contact lists into the CRM. On Monday morning, another scheduled Cloud Composer DAG can extract the retention offer uptake data out of the CRM into Cloud Storage, then load it into BigQuery. We can perform the campaign uplift calculation in BigQuery and present the results in a dashboard. No need for spreadsheets in an email ever again.
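The uplift calculation itself is simple arithmetic: the purchase rate in the target group minus the purchase rate in the control group. A minimal sketch in Python (in production this would be a GROUP BY over the same counts in BigQuery; the figures below are invented for illustration):

```python
def campaign_uplift(target_buyers, target_size, control_buyers, control_size):
    """Uplift = purchase rate of the target group minus that of the control.

    A positive uplift means the retention offer drove incremental
    purchases beyond what would have happened anyway.
    """
    target_rate = target_buyers / target_size
    control_rate = control_buyers / control_size
    return target_rate - control_rate

# Invented example figures: 900 targeted customers, 126 bought (14%);
# 100 control customers, 9 bought (9%).
uplift = campaign_uplift(126, 900, 9, 100)
# 0.14 - 0.09 = 0.05, i.e. a five-percentage-point uplift
```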


Long Term Future State – Redesign the whole process

In the long term, we want to build a real-time “next best action” recommender system. The next best action could be a retention offer, a cross-sell recommendation, or doing nothing. The system should act on real-time data and message the customer in real time. Building such a system is a big project that requires many divisions in the company to work together. We will save the details for the future.

Migrate and de-risk your legacy processes. Reach out to The Migration Company today!
