Using Multiworld Testing to find the perfect offer

Deciding which offers to send to which customers isn’t simple. There’s a huge number of variables to consider, and the more relevant information you account for, the better your results are likely to be.

There’s the customer – who they are, where they are, what they like, what they’ve bought and what they’re likely to buy. There’s the marketing mix – prices, products, places and promotions. And there are contextual variables – what’s the weather like? What’s the time? What’s happening at any given moment that will or can change how you need to talk to the customer, and how they’re likely to respond?


And it’s the context that makes things interesting. Customer decisions on where to go, how to engage and what to buy can alter in an instant due to a change in context: an unexpected downpour, getting lost, or discovering it’s 3:00 and you accidentally worked through lunch.

Luckily for marketers, access to contextual information through mobile and connected devices means we can adapt messaging to better reflect the circumstances. Someone caught out in the rain could receive an offer for a hot meal or savings on wet weather gear; someone in an unexpected part of town could be pointed at a recently opened outlet; and a lunch regular who didn’t check in could be invited to enjoy a coffee combo – something that could become a regular afternoon behavior with the right positioning.

So how do you work out the absolute best offer at any given moment, given the individual and their current context?

Short answer: machine learning. Shorter answer: testing.


Testing (the longer answer)

Most of us are familiar with A/B testing. You test one option against another until enough people have responded for you to be confident that either A or B is the better alternative. You then discard the weaker option and use the ‘winning’ option for the duration. This doesn’t consider context though, and it may take a while to capture enough data to be sure which offer is best.
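To make "confident" a little more concrete, here is a minimal sketch of one common way to check an A/B result: a two-proportion z-test. The post doesn't say which statistical test is used in practice, so treat the choice of test, the function name and the example numbers as assumptions made purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the redemption rates of offers A and B.

    conv_a / conv_b: number of customers who took up each offer.
    n_a / n_b: number of customers who were sent each offer.
    Returns (z, two-sided p-value) under a normal approximation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Nobody (or everybody) redeemed: no evidence of a difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Made-up numbers: 1,000 customers per offer, 100 vs 150 redemptions.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap between A and B is unlikely to be chance. Note that nothing in this check knows about context, and every customer in the losing arm keeps receiving the weaker offer until the test ends.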

Multiworld testing and multi-armed bandits

One way to experiment with offers is Multiworld Testing. This is significantly more complex than an A/B test, and to explain it we have to backtrack a bit and talk about probability – specifically bandits.

Imagine you’re in a casino looking to make your fortune. In front of you is a bank of slot machines: one-armed bandits*. Because you’re not yet a millionaire, you want to maximize your gains from these machines by playing the machine(s) that will give you the most money. But because you’re just a rube, you don’t know in advance the odds of each machine paying out. The only way you can learn this is to play them.

So how much time do you spend playing each machine in order to make your millions? You need to collect enough information on each machine’s payout to make that decision, but you don’t want to waste too much time on less profitable machines. You also need to make sure you do find the most rewarding machines – so if you’re looking to optimize your winnings you can’t just settle on a good-enough machine and stop looking (what if the motherlode machine is still out there?!).

In a perfect world you’d find the perfect machine straight away and play it until they boot you out of the casino. In the real, less exciting world you need to find a balance between exploitation (playing the machine with the best payout you’ve seen so far) and exploration (trying other machines to see if a better one’s out there).

[Figure: exploitation vs. exploration]

There are multiple strategies for divvying up your time among machines, but ultimately you want to end up playing the best machine as often as possible.
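One of the simplest of these strategies is usually called epsilon-greedy: most of the time you play the machine with the best average payout so far, and a small fraction of the time (epsilon) you pick a machine at random. The sketch below is an illustration only, not necessarily the strategy behind any particular product. The payout probabilities, the 10% exploration rate and the function name are all assumptions.

```python
import random


def epsilon_greedy(payout_probs, pulls=10_000, epsilon=0.1, seed=0):
    """Play a bank of slot machines with an epsilon-greedy strategy.

    payout_probs: the true win probability of each machine. The player
    never sees these; they are hypothetical values for the simulation.
    Returns (times each machine was played, estimated payout of each).
    """
    rng = random.Random(seed)
    n_machines = len(payout_probs)
    counts = [0] * n_machines
    estimates = [0.0] * n_machines

    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: try a machine at random.
            machine = rng.randrange(n_machines)
        else:
            # Exploit: play the best machine seen so far.
            machine = max(range(n_machines), key=lambda i: estimates[i])

        reward = 1.0 if rng.random() < payout_probs[machine] else 0.0

        # Update the running average payout for the machine just played.
        counts[machine] += 1
        estimates[machine] += (reward - estimates[machine]) / counts[machine]

    return counts, estimates


# Three machines with made-up payout odds.
counts, estimates = epsilon_greedy([0.02, 0.05, 0.10])
print("plays per machine:", counts)
print("estimated payouts:", [round(e, 3) for e in estimates])
```

Raising epsilon means more exploring and less time on the current favorite. Lowering it does the opposite, at the risk of settling on a good-enough machine too early.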

* This is usually referred to as a multi-armed bandit problem. Multi-armed slot machines aren’t an actual thing in actual casinos, but the idea is that instead of deciding which machine to play, you’d decide which of multiple (k or n) arms to pull on one machine. We prefer people to walk around and get a bit of exercise in our thought experiments…

Working with more variables: contextual bandits

But wait, there’s more! Because we’re concerned with context, we need to consider how external factors might influence our foolproof wealth-maximizing slot machine selection strategy.

Now before we decide which machine to play, we’re alerted to some contextual variable. Maybe it’s 2am and not 10pm like we thought. Maybe some machines are being moved to or from another room. Maybe the number of players has doubled or drinks are now half price. Each time we go to play, we get context, we make a decision, we observe the result. With enough information we can connect these contextual variables to the payout of each machine, and in time (and given enough data) we’ll only need to know the variable to know the best machine to play.

All of which leads us very neatly to how we use MWT for retail offers.

  • Instead of choosing which machine to play we’re choosing which offers to send to customers.

  • For payout we’re measuring offer uptake, sales, basket value, or whichever measure fits.

  • Contextual variables include time of day, weather conditions, and location.

So given the context, which offer should we send in order to maximize our return?


Multiworld testing: cheeseburgers or sundaes?

Initially we work with fairly broad information. We know that people with full-time jobs generally prefer sundaes, and students prefer cheeseburgers. So because we want to maximize our return, that’s what we give them.

As we continue to send out these offers we receive information back that will help us refine our algorithms further. So over time we may change what we offer people based on the feedback we receive – whether offers are looked at or redeemed, and what the financial upshot of that is. Handily we don’t have to keep sending people an under-performing offer while we wait for sufficient data to declare the winner of an A/B cheeseburger/sundae test; most of the time we’re sending the best offer we’ve found so far (or moving to the best slot machine) as our algorithm evolves.

Assuming we don’t receive any information that leads us to change our offer, we continue to promote cheeseburgers to students. Because we can (and should!) we also collect contextual information at the same time – so we can relate offers to time of day, location, weather conditions etc.

[Figure: cheeseburger vs. sundae offers]

What we now find is that when the weather is warm, students don’t actually respond that well to cheeseburgers – they’re more likely to take up a sundae offer. In this case the context (weather) is having a marked impact on our success. We’re no longer optimizing our outcome because we’re offering cheeseburgers to people who would rather have sundaes – we’re playing the wrong machine.

Luckily we’re not relying on human intervention to notice this and manually change up our campaigns, so we can adapt our offer in real time in response to the context. Now that we have this information, we’ll offer sundaes when it’s warm and switch back to cheeseburgers when it cools off, adapting our strategy to optimize our outcomes.
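The weather switch can be sketched as a very small contextual bandit: the same explore/exploit idea as the slot machines, but with a separate running average kept for every combination of context and offer. This is a simplified illustration under stated assumptions, not a description of a production system. The class name, the redemption rates and the two-value weather context are all invented for the example.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Tracks the average payout of every (context, offer) pair."""

    def __init__(self, offers, epsilon=0.1, seed=0):
        self.offers = list(offers)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, offer) -> times sent
        self.values = defaultdict(float)  # (context, offer) -> mean payout

    def best_offer(self, context):
        """The offer with the best average payout so far in this context."""
        return max(self.offers, key=lambda o: self.values[(context, o)])

    def choose(self, context):
        """Get context, make a decision."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.offers)  # explore
        return self.best_offer(context)          # exploit

    def update(self, context, offer, reward):
        """Observe the result and fold it into the running average."""
        key = (context, offer)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical student redemption rates, invented for this illustration.
REDEMPTION_RATE = {
    ("warm", "cheeseburger"): 0.05, ("warm", "sundae"): 0.15,
    ("cool", "cheeseburger"): 0.15, ("cool", "sundae"): 0.05,
}


def simulate(rounds=20_000, seed=0):
    """Send one offer per round to a simulated student."""
    world = random.Random(seed + 1)  # randomness of the simulated world
    bandit = ContextualEpsilonGreedy(["cheeseburger", "sundae"], seed=seed)
    for _ in range(rounds):
        weather = world.choice(["warm", "cool"])
        offer = bandit.choose(weather)
        redeemed = world.random() < REDEMPTION_RATE[(weather, offer)]
        bandit.update(weather, offer, 1.0 if redeemed else 0.0)
    return bandit


bandit = simulate()
for weather in ("warm", "cool"):
    print(weather, "->", bandit.best_offer(weather))
```

A lookup table like this only works when there are a handful of distinct contexts. With many contextual variables (time of day, location, weather and so on) you would typically swap the table for a model that generalizes across contexts, which is beyond this sketch.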

And because we’re balancing exploration and exploitation we’ll keep testing to make sure this is the best result we can get. Maybe we should introduce a frozen drink into the mix, or a salad, or new pricing, or a two-for-one deal – or any number of other options that might replace the sundae as the best offer with the best payout for our customer.

Updated August 2019.