
Appetite for Risk — A/B Testing in Fast-Paced Environments


Rigorous statistics are often at odds with the need of modern, product-driven companies to move and ship fast. Statistics is all about probabilities, and the more data you have, the more accurate the predictions. Product-led organizations are all about building the smallest piece of a project that can add value, shipping it as fast as possible, and then iterating quickly, looking for signals of product-market fit. The Venn diagram of these two areas overlaps at A/B testing, and that overlap creates a tension between statistical rigor and the need to move fast.

If we all had enormous amounts of traffic to test against and infinite time to do these tests, we would make almost perfect decisions. Realistically, the pressure to ship fast often leads us to make calls on less-than-perfect data. These pressures can happen when metrics appear to be doing especially well or poorly, or if there are time constraints. You can, of course, stop a test whenever you like, as long as you’re aware of what this does to your statistics.

User behavior data has a lot of random variation, and this creates a noisy signal (it’s also a reason why trend data — data over time — is largely meaningless in A/B testing contexts). If samples are small, you’re more likely to be looking at noise than if samples are large; as the sample size increases, this noise is averaged out. Furthermore, if you’re using a Frequentist approach, your statistics only become actionable once the predetermined sample size is reached — otherwise, you’re falling into the peeking problem, the subject of many articles. If you’re using statistics that are less susceptible to peeking, such as Bayesian or sequential methods, you can peek and make decisions.
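To see why small samples are so noisy, here is a minimal simulation (the 5% conversion rate and sample sizes are illustrative assumptions, not figures from the article): identical traffic with the same true rate produces wildly different observed rates at n = 100, but tightly clustered rates at n = 100,000.

```python
import random

random.seed(42)

TRUE_RATE = 0.05  # assumed true conversion rate (hypothetical)

def observed_rate(n):
    """Simulate n visitors and return the observed conversion rate."""
    conversions = sum(random.random() < TRUE_RATE for _ in range(n))
    return conversions / n

# Small samples: observed rates swing widely around the true 5%.
small = [observed_rate(100) for _ in range(5)]
# Large samples: observed rates cluster tightly around 5%.
large = [observed_rate(100_000) for _ in range(5)]

print("n=100:    ", [f"{r:.3f}" for r in small])
print("n=100000: ", [f"{r:.3f}" for r in large])
```

Every run of the small-sample experiments is consistent with true rates anywhere from roughly 2% to 10% — exactly the kind of noise that makes an early peek misleading.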

In all these contexts, some data is absolutely better than no data, but without finishing a test, you increase the odds of picking the wrong variation, and you lose the resolution on the most probable outcome. In short, you increase the risk of making a decision. The question that every experimentation program should be asking then is, what is your appetite for risk?

Risk

Most A/B testing statistics give you the chance to beat baseline/control — the probability that your variation is at least marginally better than the control. But this measure gives no indication of how much better or worse it will be. If you’re forced to make decisions without perfect data, wouldn’t it be great to have some indication of what risks you’re taking? You would want to know: if you call the test now, and you’re wrong about which variation to implement, what is the likely negative impact? The good news is that Bayesian statistics give us just such a measure.

This risk, also known as potential loss, can be interpreted as “When B is worse than A, if I choose B, how many conversions am I expected to lose?” It can replace the P-value as a decision rule, or stopping rule, for A/B testing — that is, you can call your tests when the risk falls below (or rises above) your risk tolerance threshold, instead of relying on other values. If you want to read more about how risk is calculated, see Itamar Faran’s excellent article: How To Do Bayesian A/B Testing at Scale.
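As a sketch of how this risk could be estimated by Monte Carlo — the visitor and conversion counts are hypothetical, the flat Beta(1, 1) priors are an assumption, and this is not GrowthBook’s actual implementation — we can draw conversion rates from each variation’s posterior and average the loss over the draws where B is actually worse:

```python
import random

random.seed(0)

# Hypothetical experiment counts (not from the article).
visitors_a, conversions_a = 10_000, 500   # control: 5.0% observed
visitors_b, conversions_b = 10_000, 530   # variation: 5.3% observed

def posterior_sample(conversions, visitors):
    """Draw one conversion rate from a Beta(1,1) prior updated with the data."""
    return random.betavariate(1 + conversions, 1 + visitors - conversions)

draws = 100_000
samples = [(posterior_sample(conversions_a, visitors_a),
            posterior_sample(conversions_b, visitors_b))
           for _ in range(draws)]

# Chance to beat control: P(rate_B > rate_A) under the posteriors.
chance_to_beat = sum(b > a for a, b in samples) / draws

# Risk of choosing B: expected loss in conversion rate (absolute),
# counting only the draws where B is actually worse than A.
risk_b = sum(max(a - b, 0) for a, b in samples) / draws

print(f"Chance to beat control: {chance_to_beat:.1%}")
print(f"Risk of choosing B: {risk_b:.4%} (absolute conversion rate)")
```

Note how the two numbers tell different stories: the chance to beat control is well short of 95%, yet the expected loss from shipping B anyway is a tiny fraction of a percentage point — which is precisely the situation the next section’s screenshot illustrates.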

GrowthBook recently implemented this same risk measure (with Itamar’s help) in our open-source A/B testing platform. This measure empowers teams to call tests earlier while staying aware of the risks they are taking in doing so.

GrowthBook Bayesian experiment results showing Chance to Beat Control, Risk, and Percent Change confidence interval for two metrics

In the above example, both metrics are up, but each has only about a 78% chance of beating the control — well short of the typical 95% threshold. However, neither one appears to be very risky: if you stop the test now, choose the variation, and turn out to be wrong, your metrics would be down by less than a percent. Depending on your business, that may be good enough, and you can move on to the next experiment without wasting valuable time.

The combination of Chance to Beat Control, Risk, and a Percent Change confidence interval gives experimenters everything they need to make decisions quickly without sacrificing statistical accuracy.
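A stopping rule combining these measures might look like the following sketch. The function name and the specific thresholds (95% chance to beat control, 0.25% risk tolerance) are illustrative assumptions — your own risk appetite sets the real numbers:

```python
def call_test(chance_to_beat, risk,
              ctb_threshold=0.95, risk_threshold=0.0025):
    """Hypothetical stopping rule: ship when the evidence is strong,
    or when being wrong would cost almost nothing."""
    if chance_to_beat >= ctb_threshold:
        return "ship: variation very likely beats control"
    if risk <= risk_threshold:
        return "ship: downside is within risk tolerance"
    return "keep running: not enough evidence, risk too high"

# The scenario from the screenshot above: ~78% chance to beat
# control, but expected loss well under the risk threshold.
print(call_test(0.78, 0.001))
```

The key design point is that the two thresholds act independently: a clear winner ships on evidence alone, while an inconclusive-but-harmless variation ships on low risk, freeing traffic for the next experiment.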

