The Uplift Blog


Bayesian Model Updates in GrowthBook 3.0

May 20, 2024

In anticipation of the forthcoming GrowthBook 3.0 release, we’re making several changes to how our Bayesian engine works: you can now specify your own priors, variance reduction via CUPED comes to the Bayesian engine, and estimation improves with small sample sizes.

What does this update mean for your organization? In some cases, you may notice a slight shift in the results for existing experiments. However, the magnitude of these shifts is minimal, only applies in certain cases, and serves to enhance the power of our analysis engine.

This post gives a high-level overview of what’s changing, why we changed it, and how it will affect results. A video version of this post is also available.

The new Bayesian model

In Bayesian inference, we leverage a prior distribution containing information about the range of likely effects for an experiment. We combine this prior distribution with the data to produce a posterior that provides our statistics of interest — percent change, chance to win, and the credible interval.

The key difference in our new Bayesian engine is that priors are specified directly on treatment effects (e.g., percent lift) rather than on variation averages that are then combined to compute experiment effects. With this update, you only need to think about how treatment will affect your metrics, instead of providing prior information for both the control and treatment variations. This simplifies our Bayesian engine and the work involved for you.
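To make the prior-plus-data mechanics concrete, here is a minimal sketch of a conjugate normal-normal update on a relative lift estimate. It illustrates the general idea, not GrowthBook's exact implementation:

```python
# Sketch of a conjugate normal-normal Bayesian update on a relative lift.
# The prior and the data estimate are combined, weighted by their precisions.
def update_prior(prior_mean, prior_sd, lift_estimate, lift_se):
    """Combine a normal prior on the lift with a normal likelihood."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / lift_se**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec + lift_estimate * data_prec) / post_prec
    post_sd = post_prec ** -0.5
    return post_mean, post_sd

# With a wide prior and precise data, the posterior barely moves
# the observed 5% lift toward 0:
post_mean, post_sd = update_prior(0.0, 0.3, 0.05, 0.02)
```

Because the posterior is also normal, statistics like chance to win and the credible interval fall directly out of the posterior mean and standard deviation.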

This change is largely conceptual for many customers. If you’re interested in the details, you can read about the new model here and the old model in the now outdated white paper.

3 key benefits of the new model

1) Specify your own priors

The previous model required specifying at least four separate values for every metric in order to set custom priors, which made it harder to come up with reasonable values.

Now, you set a single mean and standard deviation for your prior for the relative effects of experiments on your metric. For example, a prior mean of 0 and a standard deviation of 0.3 (our defaults if you turn on priors) captures the prior knowledge that the average lift is 0% and that ~95% of all lifts are between -60% and 60%, in line with our existing customer experiments.
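To see where the -60% to 60% range comes from, you can compute the prior's central 95% interval directly with Python's standard library:

```python
from statistics import NormalDist

# The default prior on relative lift: mean 0, standard deviation 0.3
prior = NormalDist(mu=0.0, sigma=0.3)

lo = prior.inv_cdf(0.025)  # about -0.588, i.e. roughly -60% lift
hi = prior.inv_cdf(0.975)  # about +0.588, i.e. roughly +60% lift

# Probability mass the prior puts between -60% and +60% lift
mass = prior.cdf(0.6) - prior.cdf(-0.6)  # about 0.95
```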

The default is not to use prior information at all, but you can turn it on and customize it at the organization level, the metric level, and the experiment-metric level.

Organization-level prior default settings showing mean and standard deviation configuration for the Bayesian engine

2) CUPED is now available in the Bayesian engine

By modeling relative lifts directly, the new model unlocks CUPED in the GrowthBook Bayesian engine for all Pro and Enterprise customers. CUPED uses pre-experiment data to reduce variance and speed up experimentation time and can be used with either statistics engine in GrowthBook. You can read a case study about how powerful CUPED is here and you can read our documentation on CUPED here.
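As a rough illustration of why CUPED helps, here is a minimal sketch (not GrowthBook's implementation) of the classic adjustment: estimate how much of the in-experiment metric is predicted by a pre-experiment covariate, then subtract that predictable part:

```python
import random

# CUPED sketch: adjust metric y using pre-experiment covariate x.
# theta is the OLS slope of y on x; subtracting theta * (x - mean_x)
# removes variance explained by x without changing the mean of y.
def cuped_adjust(y, x):
    n = len(y)
    mean_y = sum(y) / n
    mean_x = sum(x) / n
    cov = sum((yi - mean_y) * (xi - mean_x) for yi, xi in zip(y, x)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    theta = cov / var_x
    return [yi - theta * (xi - mean_x) for yi, xi in zip(y, x)]

def var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

random.seed(7)
x = [random.gauss(10, 2) for _ in range(5000)]    # pre-experiment spend
y = [0.8 * xi + random.gauss(0, 1) for xi in x]   # correlated in-experiment spend

y_adj = cuped_adjust(y, x)
# var(y_adj) is much smaller than var(y), while the mean is unchanged
```

The tighter the correlation between pre-experiment and in-experiment behavior, the larger the variance reduction, and therefore the faster the experiment reaches a decision.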

3) Fewer missing results with small sample sizes

Our old model worked only when it was reasonably certain that the average in the control variation was greater than zero. When the control mean was near zero, the log approximation we previously relied on could fail to return a chance to win or credible interval (CI). You might have seen something like the following:

Example of the issues computing chance to win and confidence intervals in the old model

The 50% is a placeholder because we could not compute the inference, and the CI is missing. This is somewhat frustrating, given that there are almost 3,000 users in this experiment! The new model does not have the same constraints and can instead return the following, more reasonable results:

New Bayesian model returning complete chance to win and credible interval results for the same small sample experiment

How does it affect existing estimates?

Any new experiment analyses, whether that be a new experiment or a results refresh for an old experiment, will use the new model. If the last run before refreshing results used the old model, results could shift slightly even if you do not use the new prior settings.

  1. For proportion/binomial/conversion metrics, the % change could shift, along with chance to win and the CI. However, these shifts should be minimal (< 3 percentage points) in most cases, especially if the variations are equal size.
  2. For revenue/duration/count/mean metrics, the % change should not shift at all, but the chance to win and the CI could change slightly (again up to around 3 percentage points except in some edge cases).

For example, here’s a typical proportion metric and what it looks like before and after the change.

Before:

Proportion metric result before the Bayesian engine update showing chance to win and percent change

After:

Same proportion metric after the Bayesian engine update showing minimal shift in results

The changes for revenue metrics (and other count or mean metrics) should be even less pronounced on average.

Were the old results incorrect?

No.

The results from the old Bayesian model were not less accurate or incorrect. In general, Bayesian models can take many forms, each with their own pros and cons. The old model was tuned to specify priors for each variation separately, which made it highly customizable but more tedious to set up. The new model makes setting priors easier and improves our ability to compute key inferential statistics in small sample sizes.

The new model moves the Bayesian machinery to focus on experiment lifts; this allows us to leverage approaches like the Delta method to compute the variance for relative lifts as well as CUPED to make the new model more tractable in certain edge cases and more powerful for most users.
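As an illustration of the kind of delta-method calculation involved, this sketch (with made-up data, not GrowthBook's code) approximates the variance of a relative lift from per-variation summary statistics:

```python
import random

# Delta-method approximation for Var(mean_b / mean_a - 1),
# assuming the two variations are independent samples.
def delta_method_lift_var(mean_a, var_a, n_a, mean_b, var_b, n_b):
    return (var_b / n_b) / mean_a**2 + (mean_b**2 / mean_a**4) * (var_a / n_a)

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

random.seed(1)
n = 2000
a = [random.gauss(10, 2) for _ in range(n)]  # control metric values
b = [random.gauss(11, 2) for _ in range(n)]  # treatment metric values

lift = mean(b) / mean(a) - 1  # roughly 0.10, a 10% relative lift
lift_var = delta_method_lift_var(mean(a), var(a), n, mean(b), var(b), n)
```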

Feel free to read more about the statistics we use in our documentation here or reach out in our community Slack here.


GrowthBook Version 2.9

Apr 3, 2024

We’re proud to announce the release of GrowthBook 2.9, which includes many highly requested features, including feature flag approvals, URL redirect testing, and quantile metrics. Full details are below.

URL Redirect Testing

URL Redirect Testing setup showing experiment design with source and destination URLs for client-side redirects

One of the most common use cases for A/B testing is comparing two versions of a page hosted on different URLs to see which performs better. This was already possible with feature flags, but it required writing a lot of custom code and manually handling tricky edge cases. Now, GrowthBook has built-in support for this.

Simply design a new experiment, add a URL Redirect, and start the test! All you need on your site is the latest version of our JavaScript, React, or new HTML Script Tag SDK (see below). When a user visits the targeted URL and is assigned one of the treatments, they will be redirected to the new URL immediately.

This initial release is geared toward client-side redirects in a browser, but we’re actively working on support for CDNs and Edge Workers, which we’re super excited about! URL Redirect tests require a Pro or Enterprise license. View the docs here.

Feature Flag Approvals

Feature flag approval flow showing draft revision with ready-to-review and approve/request changes options

GrowthBook now supports advanced approval flows for feature flag changes. It behaves similarly to GitHub — make a change in a new draft revision, mark it as “ready to review”, another person on your team reviews it and decides whether to approve, request changes, or just leave a comment. Once approved, you can publish your draft to make it live. There’s also a brand new “Drafts” page where you can see all of the active feature drafts and their statuses.

Drafts page showing all active feature flag drafts and their current approval statuses

In this initial release, we let you configure which environments require approvals and whether approvals should be dismissed when further changes are made. The Drafts tab is available to everyone, but approval flows require an Enterprise license key. Contact sales@growthbook.io or view the docs if you’re interested in learning more. We have a lot planned here in the future, so stay tuned!

Quantile Metrics (Median, P99, and more)

Quantile metric configuration showing P99 latency and median purchase price built on top of Fact Tables

We’re proud to announce that GrowthBook is the first experimentation platform to fully support Quantile metrics. You can now report on things like P99 Latency, Median purchase price, and even use quantiles for decomposition deep dives!

Quantile metrics are built on top of Fact Tables and utilize advanced techniques to keep your SQL fast and efficient while maintaining high accuracy in the statistical results. You can read more about this (including a technical deep dive) in our docs. Quantile metrics are available to all Pro/Enterprise customers and support all data sources except for MySQL and Mixpanel.
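As a quick illustration of the statistics these metrics report, here is how a median and P99 can be computed from raw latency data (simulated values, standard library only; GrowthBook computes these in SQL, not Python):

```python
import random
from statistics import median, quantiles

random.seed(42)
# Simulated request latencies in ms with a long right tail
latencies = [random.expovariate(1 / 120) for _ in range(10_000)]

p50 = median(latencies)                 # typical request
p99 = quantiles(latencies, n=100)[98]   # 99th percentile, tail behavior
# Averages would hide the tail; P99 makes slow requests visible.
```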

Project-scoped Attributes and Environments

Project-scoped attributes and environments showing mobile-specific targeting attributes isolated from unrelated projects

For large complex applications, projects in GrowthBook are crucial for organizing your features and experiments. For example, you might have separate projects for your front end, back end, and mobile app. In previous versions of GrowthBook, targeting attributes and environments were global and shared between all projects. This resulted in some weird situations, like a mobile app having a “browserVersion” attribute or your marketing site having a “staging” environment when that was only relevant to your back end.

Now, you can restrict attributes and environments to a subset of your projects, simplifying the GrowthBook UI and reducing the chance for typos and mistakes. Plus, when combined with project-scoped roles, you now have fine-grained control over exactly who can manage which attributes and environments.

New HTML Script Tag SDK

There’s a brand new GrowthBook SDK available, perfect for all low-code websites (Webflow, Shopify, WordPress, and more). All you have to do is add a single `<script>` tag to your website, and you’ll get support for our Visual Editor, new URL Redirect tests (see above), and even Feature Flags! No configuration required (although there are lots of knobs and switches for those who want them). Check out the docs here!

New and Improved Webhooks with Slack/Discord Support

Revamped event webhooks with tag and environment filtering, plus Slack and Discord formatter support

We’ve revamped our event webhooks with more powerful filtering. Want to trigger a webhook for all feature flags that change in `production` with the tag “important”? You can do that! And to make integration easier, there’s a new Formatter option to automatically render the webhook in a format that Slack or Discord understands.

Now, with only a few clicks, you can enable fine-grained notifications directly in your messaging app. Support for MS Teams, as well as more event types, is coming soon! Read the docs here.

Other Improvements

  • New improved LaunchDarkly importer
  • Improved documentation on holdouts
  • 50+ other bug fixes and improvements

Code References

Feb 27, 2024

As companies grow, they often find themselves increasingly reliant on feature flags. While these are valuable tools, flags sometimes linger in the codebase after they are no longer needed, creating technical debt. Left unaddressed, stale flags can erode an engineering team's efficiency and effectiveness, so regular clean-ups and proactive flag management help keep operations smooth and sustainable.

GrowthBook Code References showing feature flag instances surfaced directly in the GrowthBook UI, with file locations

Code References is a new feature that lets teams quickly see where feature flags are used in their codebase. By scanning your codebase with a CLI tool and sending the results to our application backend, GrowthBook can surface valuable information early and direct developers to the exact lines of code that need addressing.

Let's take a high-level look at how Code References works in GrowthBook and how your company can get started with its account.

Overview

Code References requires implementing a step in your development CI workflow.

Since searching for many feature flag keys across a potentially large codebase can be slow, we've provided a low-level Go utility, designed to run quickly on your CI infrastructure, that produces results the GrowthBook API can process.

This utility is called gb-find-code-refs, and is a fork of an existing open-source tool created by LaunchDarkly called ld-find-code-refs. Our changes have made the tool more general purpose, so you can use it for your own purposes in addition to using it with GrowthBook.

Using gb-find-code-refs, you can create a CI job that will fetch feature flags from GrowthBook, then scan your codebase for those flags using gb-find-code-refs, and finally submit those generated code references back to GrowthBook.

The diagram below illustrates the flow of information from gb-find-code-refs to GrowthBook.

Diagram showing how gb-find-code-refs scans a codebase for feature flag keys and sends results back to GrowthBook
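At its core, the scanning step boils down to searching source files for known flag keys. This hypothetical sketch (not gb-find-code-refs itself, which is written in Go) shows the idea:

```python
import re

# Hypothetical code-reference scanner: given a list of flag keys,
# find every file and line number that mentions one of them.
def find_code_refs(flag_keys, files):
    """files: dict of path -> source text. Returns (key, path, lineno, line) tuples."""
    pattern = re.compile("|".join(re.escape(k) for k in flag_keys))
    refs = []
    for path, source in files.items():
        for lineno, line in enumerate(source.splitlines(), start=1):
            for match in pattern.finditer(line):
                refs.append((match.group(0), path, lineno, line.strip()))
    return refs

# Toy in-memory "codebase" with two flag references
files = {
    "app.js": 'if (growthbook.isOn("new-checkout")) {\n  render();\n}\n',
    "pricing.js": 'const v = gb.getFeatureValue("pricing-page-version", "a");\n',
}
refs = find_code_refs(["new-checkout", "pricing-page-version"], files)
```

The real tool adds things like context lines, ignore rules, and output formats, but the flow of information is the same: flag keys in, file/line references out.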

Getting Started

To support Code References, we provide a streamlined, all-in-one GitHub Action that integrates easily with your existing GitHub workflow. For non-GitHub users, we provide all the tooling you'll need to set it up yourself.

See the Getting Started section in our documentation for more information.

Conclusion

The significance of Code References underscores a broader goal of more sustainable and efficient development practices, focusing not just on introducing new features but also on the long-term health and scalability of the software.

For companies seeking to maintain a competitive edge in software development, adopting tools like Code References is essential. We hope you find Code References in GrowthBook a powerful tool in your toolkit for managing feature flags efficiently and effectively.


GrowthBook Version 2.8

Feb 26, 2024

We’re excited to announce the release of GrowthBook 2.8, with many highly requested features, including Prerequisite Flags, Code References, and a new “No Access” role. Full details are below.

Prerequisite Flags

GrowthBook Prerequisite Flags UI showing a feature flag dependent on a parent release flag

With Prerequisites, you can group together related feature flags and describe complex relationships between them. For example, you can have a bunch of features that all reference a `release-2.8` parent flag as a prerequisite. The child features will only be enabled when the parent flag evaluates to `true`.

Prerequisites can also be defined at the individual rule or experiment level. For example, only include users in your experiment who are getting assigned `b` of a separate `pricing-page-version` feature flag.
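Conceptually, prerequisite evaluation is a recursive check of parent flags before a child flag resolves. This is an illustrative sketch, not the actual SDK logic; the flag shapes and names are made up:

```python
# Toy prerequisite evaluation: a child flag only resolves to its value
# when every parent flag evaluates to the expected value.
def evaluate(flags, key):
    flag = flags[key]
    for prereq in flag.get("prerequisites", []):
        if evaluate(flags, prereq["parent"]) != prereq["expected"]:
            return flag.get("off_value")  # prerequisite failed
    return flag["value"]

flags = {
    "release-2.8": {"value": True},
    "new-sidebar": {
        "prerequisites": [{"parent": "release-2.8", "expected": True}],
        "value": True,
        "off_value": False,
    },
}
enabled = evaluate(flags, "new-sidebar")  # True only while release-2.8 is on
```

Flipping the parent flag off disables every child that names it as a prerequisite, which is what makes grouped releases like `release-2.8` convenient.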

Top-level Prerequisites are available to all Pro and Enterprise customers. Rule-level and Experiment-level Prerequisites are only available to Enterprise customers. Simple cases where the prerequisite is deterministic (e.g., always true or always false) work in all SDKs with no configuration required. Advanced cases (e.g., prerequisite’s value depends on an experiment) are currently only supported in the latest JavaScript and React SDK versions. Read more about this in our docs.

Feature Code References

GrowthBook Feature Code References showing exact file locations and line numbers where a flag is used in the codebase

Back in GrowthBook 2.6, we added stale feature flag detection. Now, with Code Refs, we’re making that more actionable by showing you exactly where a feature is used in your application’s codebase.

Setup is easy — Just add a new job to your application’s CI pipeline that runs whenever code is pushed. This CI job fetches a list of feature flags from the GrowthBook REST API, scans your codebase for references, and sends the line numbers and surrounding code back to GrowthBook so it can be displayed in the UI.

If you use GitHub Actions, we provide a pre-built action you can install. We don’t have official integrations for other CI platforms at this time, but we have published a low-level CLI script and Docker image you can use to integrate manually. Code References are available to all Pro and Enterprise customers.

Official (Version Controlled) Metrics

GrowthBook Official Metrics showing version-controlled metric definitions synced from GitHub with a special badge

It’s not uncommon for large organizations to have hundreds or thousands of experimentation metrics. These are often a mix of “Official” metrics (widely used and vetted by the data team) and Ad-Hoc metrics (one-off, created by a product team, etc.). GrowthBook 2.8 introduces a brand new workflow that can make this distinction clearer and lead to more trustworthy experimentation.

In a nutshell, you can now store your Official Metric definitions as code in a version control system like GitHub, and changes can be automatically synced to your GrowthBook account. These Official Metrics are marked with a special badge and cannot be edited from within the GrowthBook UI. Whenever you see the Official badge, you can be confident that the definitions have gone through your version control and review process.

Check out our tutorial for storing Official Metrics in GitHub.

“No Access” Role

In previous versions, the lowest role you could grant someone in your GrowthBook account was “read-only”. This meant that every user you invited to your organization could, at the very least, see every feature flag, experiment, and other settings in your account.

GrowthBook 2.8 introduces a new role called “No Access”. As the name suggests, this is even lower than “readonly” and essentially grants no permissions whatsoever. When combined with project-scoped roles, this becomes super powerful. For example, grant someone the global “No Access” role, then override it with a more permissive role for specific projects. That user would not even be able to see the projects they don’t explicitly have access to, and all of the associated feature flags, experiments, etc. would be hidden from them.

This new role is available to all Enterprise customers. You can read more about this in our docs.

Webhooks for SDK Connections

Every SDK Connection in GrowthBook gets a dedicated API endpoint that returns a JSON payload of all included feature flags and experiments. You can now attach Webhooks to an SDK Connection to be alerted whenever this payload changes. For example, if you have a caching layer in front of GrowthBook, you can use a Webhook to invalidate your cache, resulting in faster feature releases to your users.

Other Improvements

  • New guide on integrating GrowthBook with WordPress sites — view it here
  • Metric lookback windows
  • Option to ignore zeros in percentile capping
  • JumpCloud SSO support
  • Plus, more than 50 documentation improvements and bug fixes!

Changing Running Experiments Safely and Flexibly in GrowthBook

Jan 19, 2024

Running experiments can be a messy business, and you often want to make changes mid-experiment.

Making changes to running experiments in GrowthBook 2.7 is:

  • Safer than ever. With guided flows that ensure you don’t introduce bias when changing targeting or traffic rules, you are able to pick the least disruptive deployment strategy for your changes.
  • More flexible than ever. Sticky bucketing allows you to preserve user experience when making certain kinds of changes.

This article walks you through three kinds of changes you can make in GrowthBook, how our UI helps you navigate them, and how sticky bucketing can help you make changes safely.

Decreasing traffic to an experiment

In some cases, you may wish to decrease traffic to an experiment, either because you have enough users and want to ramp down new enrollment, or you want to begin restricting your experiment to some subset of users using more restrictive targeting attributes.

Imagine you add a targeting attribute so that only US-based users are eligible for your experiment. If you simply make the change and push it live, your experiment analysis will still include users enrolled before the change who are no longer eligible. Their behavior may still be affecting your experiment results, yet they are no longer receiving the same feature values as before, since the new targeting rules exclude them.

In this case, you have three options:

  • New phase, re-randomize users. This ensures your analysis is accurate, but it throws away your existing data and may change users’ experiences.
  • Same phase, apply changes to everyone. This lets you leverage existing data, with the understanding that some users may be in your experimental data but are no longer receiving the new experiment feature, thereby biasing your results.
  • Same phase, apply changes to new traffic only. In GrowthBook 2.7, we added sticky bucketing, which, if enabled for your organization, provides this third option. It lets you (a) keep all of your data in the analysis, (b) ensure users do not change their originally assigned variation, and (c) avoid bias, since users in each variation continue receiving the same feature values.

Here’s a Loom showing the flow in action

Increasing traffic to an experiment

Increasing traffic to an experiment is less problematic. You can safely make any of the following changes without starting a new phase:

  • Increasing the percent of traffic to an experiment
  • Removing restrictive targeting (i.e. making targeting more permissive)
  • Removing an experiment from a Namespace

GrowthBook UI showing safe options for increasing experiment traffic without starting a new phase

Restarting an experiment, or starting a new phase

Sometimes, we need to start an experiment over and throw away the old data, whether because of a bug in the implementation, a change in experiment design, or a desire to rerun an A/A test to confirm that an imbalance was due to chance rather than an issue with your GrowthBook implementation.

In these cases, starting a new phase of the experiment will require re-randomizing in order to avoid carryover bias.

As with other changes, we provide this information directly in the app to ensure you can make fully informed choices.

GrowthBook UI showing re-randomization options when restarting an experiment or starting a new phase

Feel free to check our docs on sticky bucketing or making changes to running experiments for more detail.


New GrowthBook Version 2.7

Jan 19, 2024

Sticky Bucketing, Reusable Targeting Conditions, Experiment Health Tab, Fact Table Optimizations, and more!

Happy New Year, everyone! We’re back at it with the release of GrowthBook 2.7. This version features sticky bucketing, reusable targeting conditions, an experiment health page, and more. Full details are below.

Sticky Bucketing

When running an experiment, you want to ensure users are not exposed to multiple variants (e.g., a single user seeing both A and B at different times). GrowthBook accomplishes this by using deterministic hashing, which works great most of the time. However, there are certain scenarios where this can break down. For example, if your experiment targets only German visitors, a user could switch from seeing variation B to A if they take a train to France.

Sticky Bucketing allows you to remember the first variation a user sees, so their experience remains consistent, even if something changes that would otherwise reassign them. This behavior is opt-in and is currently supported only in the latest versions of our JavaScript and React SDKs. Read more in our Sticky Bucketing Docs.
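To make the two ideas concrete, here is a toy sketch (not the GrowthBook SDK's actual code) of deterministic hash-based assignment plus a sticky-bucket store that remembers a user's first variation:

```python
import hashlib

# Deterministic assignment: hashing seed + userId always yields the
# same bucket value in [0, 1), so the same user gets the same variation.
def hash_assign(seed, user_id, variations):
    h = hashlib.sha256(f"{seed}:{user_id}".encode()).hexdigest()
    bucket = int(h, 16) % 10000 / 10000
    return variations[int(bucket * len(variations))]

# Sticky bucket store: in a real SDK this is persisted per user
# (cookie, localStorage, database), not an in-memory dict.
sticky = {}

def assign(seed, user_id, variations):
    key = (seed, user_id)
    if key not in sticky:
        sticky[key] = hash_assign(seed, user_id, variations)
    return sticky[key]  # same answer even if targeting rules later change

v1 = assign("exp-1", "user-123", ["A", "B"])
v2 = assign("exp-1", "user-123", ["A", "B"])  # always equals v1
```

The sticky store is what keeps the German visitor on the same variation even after their train crosses into France and the targeting condition no longer matches.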

Reusable Targeting Conditions

GrowthBook Saved Groups showing reusable Condition Groups with complex targeting rules for features and experiments

We’ve expanded Saved Groups to support more advanced use cases. Instead of just a list of IDs, you can now create Condition Groups with arbitrarily complex targeting rules (e.g., UK Chrome users with a Pro subscription). Just like existing Saved Groups, these can be reused across multiple features and experiments, and updating the group will immediately update everywhere it is referenced. You can read more about this in our completely revamped Targeting Docs.

Fact Table Query Optimization

It’s common for an experiment to have multiple metrics coming from the same underlying database table — for example, Revenue per User and Orders per User, both driven by a Purchases table. In this latest release, we’re leveraging this relationship to drastically reduce the number of queries we need to run. For data warehouses with usage-based billing, such as BigQuery or Snowflake, this can lead to significant cost savings.

This optimization is available only to Enterprise customers using the new Fact Tables to define their metrics. Read more about this on our Blog.

Experiment Health Tab

GrowthBook Experiment Health Tab showing Sample Ratio Mismatch checks and dimension-based health monitoring

GrowthBook has always run data quality checks on your experiments to detect issues such as Sample Ratio Mismatch (SRM) and multiple exposures. In this latest release, these checks live under a new dedicated “Health” tab.

In addition, you can now pick a set of dimensions that will be checked automatically for every experiment you run. For example, if you pick a “browser” dimension, you will be able to easily detect SRM errors that only affect Safari. Read more on our Health Tab Docs.

Safely Update Live Experiments

GrowthBook Make Changes button guiding users through safe release strategies for live experiments

Changing a live experiment mid-flight is much more complicated than many people realize. If you aren’t careful, it can lead to carryover bias, SRM errors, or add significant noise to your results.

The safest approach is to basically start over — begin a new experiment phase, throw away the old data, and completely re-randomize all of your users. However, on lower-traffic sites, this can be prohibitively expensive.

Now, there’s a brand new “Make Changes” button at the top of running experiments. It will guide you through the process and recommend a safe release strategy that preserves past data whenever it’s safe to do so. If you choose a different release strategy, we will give you detailed warnings outlining the risks so you can make an informed decision.

This new flow is available to everyone, but Pro and Enterprise users also have access to additional release strategies powered by Sticky Bucketing (see above). Read more about this and see more examples on our Blog.

New Best Practices Guide

GrowthBook documentation Best Practices Guide covering account organization, experiment results, and self-hosting security

There is a new section in our documentation that includes guides on experimentation in general, as well as chapters on how to get the most out of GrowthBook. Among other things, it covers how to organize your account with projects and tags, how to understand and interpret experimental results, and checklists to ensure your self-hosted GrowthBook deployment is secure. You can find the guide here.

Contextual AI Bot

GrowthBook AI bot in the documentation site answering questions about GrowthBook features and systems

We’ve added an AI bot trained on GrowthBook's content and systems that can provide detailed answers to any questions you may have. You can try it out from the documentation site by clicking on the ‘ask AI’ box on the bottom right.

Other Improvements

  • Validate advanced targeting conditions before saving
  • Okta SCIM Improvements
  • More information on the compatibility of SDK versions
  • Display helpful query stats for BigQuery (bytes scanned, execution time, etc.)

Plus many more changes and bug fixes, which you can read about here: https://github.com/growthbook/growthbook/releases


Fact Table Query Optimization

Jan 18, 2024

Back in October 2023, GrowthBook 2.5 added support for Fact Tables. This allowed you to write SQL once and reuse it for many related metrics. For example, a single Orders fact table can power Revenue, Average Order Value, and Purchase Rate metrics.

However, behind the scenes, we were still treating these as independent metrics. We weren't taking advantage of the fact that they shared a common SQL definition.

With the release of GrowthBook 2.7 in January 2024, we added some huge SQL performance optimizations for our Enterprise customers to better take advantage of this shared nature. Read on for a deep dive on how we did this and the resulting gains we achieved.

Understanding Experiment Queries

The SQL that GrowthBook generates to analyze experiment results is complex, often exceeding 10 sub-queries (CTEs) and 200 lines total. Here's a simplified view of some of the steps involved:

  1. Get all users who were exposed to the experiment and which variation they saw
  2. Roughly filter the raw metric table (e.g., by the experiment date range)
  3. Join [1] and [2] to get all valid metric conversions we should include
  4. Aggregate [3] on a per-user level
  5. Aggregate [4] on a per-variation level

An important thing to note is that these queries are metric-specific. If you add 10 metrics to an experiment, we would generate and run 10 unique SQL queries.

Abstracting out the Common Parts

The queries above can be expensive, especially for companies with huge amounts of data. Reducing duplicate work can yield significant savings, especially for usage-based data warehouses like BigQuery or Snowflake.

It's pretty clear that step 1 above will always be identical for every metric in an experiment, whether or not they share the same Fact Table. That's why in GrowthBook 2.5, we released Pipeline Mode to take advantage of this by running that part of the query once and creating an optimized temp table that subsequent metric queries could use.

Now that we have Fact Tables, we can take this optimization even further. If multiple metrics from the same Fact Table are added to an experiment, it's pretty easy to see that step 2 will be the same*. What's not so obvious is that with a little tweaking, the remaining steps (3-5) can also become largely identical, paving the way for significant performance gains. We don't need temp tables; we can just run a single query that operates over multiple metrics at the same time.

Filters

I said above that step 2 (roughly filtering metrics) would be the same for all metrics in a fact table. This isn't 100% true because of Filters. We let you add arbitrary WHERE clauses to a metric to limit the rows that it includes. This lets you, for example, define both a Revenue and a Revenue from orders over $50 metric using the same Fact Table.

Filters are super powerful, but can pose a problem when combining metrics into a single query. If two metrics have different filters, we can't use a WHERE clause to filter because it will apply to both metrics.

The solution turns out to be easy: the all-powerful CASE WHEN statement. It lets each metric have its own mini WHERE clause without interfering with any of the others.

SELECT 
  amount as revenue,
  (CASE WHEN amount > 50 THEN amount ELSE NULL END) as revenue_over_50
FROM orders
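To convince yourself this works, here is the same trick run end-to-end against SQLite with some made-up order amounts, aggregating both "metrics" in a single query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?)", [(30,), (75,), (120,)])

# Each metric gets its own CASE WHEN "mini WHERE clause";
# the > $50 filter only affects the second column.
rows = cur.execute("""
    SELECT
      SUM(amount) AS revenue,
      SUM(CASE WHEN amount > 50 THEN amount ELSE NULL END) AS revenue_over_50
    FROM orders
""").fetchone()
print(rows)  # (225.0, 195.0)
```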


Combining Metrics

This is a simplified view of the data at step 3, when we join the experiment data (variation) with the metric data (timestamp/value) based on userId.

| userId | variation | timestamp           | value   |
|--------|-----------|---------------------|---------|
| 123    | control   | 2024-01-18T00:01:02 | $100.54 |
| 456    | variation | 2024-01-18T00:01:03 | $75.43  |
| 123    | control   | 2024-01-18T11:15:12 | $10.54  |

Notice how each event has its own row (userId 123 purchased twice and has 2 rows). To support multiple metrics, we can't have a single value column anymore. We need each metric to have its own column, prefixed with the metric number: m0_value, m1_value, etc. m0 might represent the revenue, m1 might represent the number of items in the order.

We do the same thing in step 4, aggregating by userId. The structure is identical; there will just be one single row per userId, and we will sum each prefixed value column.

SELECT
  userId,
  variation,
  SUM(m0_value) as m0_value,
  SUM(m1_value) as m1_value
FROM
  step_3
GROUP BY userId, variation


Step 5, aggregating by variation, is similar, but a single number per metric is no longer enough. We need multiple data points to calculate standard deviations and other more advanced stats. So we end up with something like this:

SELECT
  variation,
  COUNT(*) as users,
  -- Multiple prefixed columns for each metric
  SUM(m0_value) as m0_sum,
  SUM(POWER(m0_value, 2)) as m0_sum_squares,
  ...
FROM
  step_4
GROUP BY variation
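The sum and sum of squares are kept because, together with the user count, they are sufficient to recover the mean and sample variance for each metric without ever shipping per-user rows out of the warehouse. A quick sanity check with illustrative values:

```python
import statistics

# Hypothetical per-user values for one metric in one variation
values = [100.0, 20.0, 50.0]

n = len(values)                              # COUNT(*) as users
m0_sum = sum(values)                         # SUM(m0_value)
m0_sum_squares = sum(v * v for v in values)  # SUM(POWER(m0_value, 2))

# The per-variation row alone recovers mean and sample variance
mean = m0_sum / n
variance = (m0_sum_squares - n * mean ** 2) / (n - 1)

assert abs(mean - statistics.mean(values)) < 1e-6
assert abs(variance - statistics.variance(values)) < 1e-6
```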


Pulling Them Apart Again

The results we get back from the data warehouse can be very wide, with potentially hundreds of columns (each metric needs between 3 and 10 columns, and there could be dozens in an experiment).

Our Python stats engine was written to process one metric at a time, so to avoid a massive refactor, we simply split this wide table back into many smaller datasets before processing. For example, to process m1, we would clone the dataset, remove all of the m0_, m2_, etc. columns, and rewrite the m1_ column names to remove the prefix. Now the result looks 100% identical to how it was before this optimization.
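The splitting step is mechanical. Here is a rough sketch of the idea in Python (not GrowthBook's actual implementation, and the column names are illustrative):

```python
import re

METRIC_PREFIX = re.compile(r"^m\d+_")

# One hypothetical wide row per variation, as returned by the warehouse
wide = [
    {"variation": "control", "users": 1000,
     "m0_sum": 5000.0, "m0_sum_squares": 40000.0,
     "m1_sum": 2000.0, "m1_sum_squares": 9000.0},
]

def split_metric(rows, prefix):
    """Keep shared columns, drop other metrics' columns, strip the prefix."""
    narrow = []
    for row in rows:
        kept = {k: v for k, v in row.items() if not METRIC_PREFIX.match(k)}
        kept.update({k[len(prefix):]: v for k, v in row.items()
                     if k.startswith(prefix)})
        narrow.append(kept)
    return narrow

m1 = split_metric(wide, "m1_")
print(m1)
# [{'variation': 'control', 'users': 1000, 'sum': 2000.0, 'sum_squares': 9000.0}]
```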

Ratio Metrics, CUPED, and More

All of the examples above show the simplest case. Combining metrics is even more powerful for advanced use cases.

Ratio metrics let you divide two other metrics, for example, Average Order Value (revenue/orders). Previously, we would have to select 2 metric tables, one for the numerator and one for the denominator, and join them together, which could get really expensive. Now, if both the numerator and denominator are in the same Fact Table, we can avoid this costly extra join entirely, making the query significantly faster and cheaper.
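To illustrate why the single scan helps, both aggregates for a ratio metric can come out of one pass over the Fact Table (hypothetical data, using SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (userId TEXT, amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [("123", 100.0), ("123", 20.0), ("456", 60.0)])

# Numerator (revenue) and denominator (order count) come from the same
# Fact Table, so one scan produces both, with no join needed
revenue, orders = cur.execute(
    "SELECT SUM(amount), COUNT(*) FROM orders").fetchone()
print(revenue / orders)  # average order value: 60.0
```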

CUPED is an advanced variance reduction technique for experimentation. It involves looking at user behavior before they saw your experiment and using it to control for variance during the experiment. This makes metric queries more expensive since they now have to scan a wider date range. Because of this, users had to be really judicious about which metrics they enabled CUPED for. Now, since that expensive part of the query is shared across multiple metrics, it becomes feasible to run CUPED for everything without worrying about performance costs.
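For intuition, the core CUPED adjustment is small: estimate theta = cov(X, Y) / var(X) from pre-experiment values X and in-experiment values Y, then subtract the part of Y that X explains. A minimal sketch with illustrative numbers:

```python
# Minimal CUPED sketch: Y is a user's in-experiment metric value,
# X is the same user's pre-experiment value (illustrative numbers).
X = [10.0, 20.0, 30.0, 40.0]
Y = [12.0, 23.0, 29.0, 44.0]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)

# theta = cov(X, Y) / var(X)
theta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
         / sum((x - x_bar) ** 2 for x in X))

# Subtract the variation that pre-experiment behavior explains
Y_cuped = [y - theta * (x - x_bar) for x, y in zip(X, Y)]

def var(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

# The mean (and hence the effect estimate) is unchanged; variance drops
assert abs(sum(Y_cuped) / len(Y_cuped) - y_bar) < 1e-9
assert var(Y_cuped) < var(Y)
```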

The story is similar for other advanced techniques, such as percentile capping (winsorization), time-series analyses, and dimension drill-downs.

The Gains

So, was all of this work worth it? Absolutely.

BigQuery is the most popular data warehouse among our customers. BigQuery charges based on the amount of data scanned and compute used, so any performance improvement translates directly into cost savings for our users.

For a large company adding 100 metrics to an experiment, it's not uncommon for those to be split between only 5-10 Fact Tables; let's say 10 to be conservative.

Selecting additional columns from the same source table is effectively free from a performance point of view, so going from 100 narrow queries to 10 wide ones is a 90% cost reduction! When you factor in the savings from Ratio Metrics and CUPED, it's not unheard of to see an additional 2X cost decrease!

We're super excited about these cost savings for our users. The biggest determinant of success with experimentation is velocity - the more experiments you run, the more wins you will get. So anything we can do to reduce the cost and barrier of running more tests is well worth the investment.

Stale Feature Flag Detection
Feature Flags
Product Updates
2.6

Nov 29, 2023

Feature flags are a fantastic way to reduce risk during deployments, but they can also be a source of technical debt. It's common for engineers to forget to clean up feature flags in their code once they're no longer being used.

To help solve this problem, GrowthBook now alerts you when we detect a "stale" feature flag. These stale flags are good candidates for removal from your code, reducing your technical debt.

GrowthBook UI showing stale feature flag detection alert on an unused flag

We define a "stale" feature flag as one that has not been updated in the past two weeks and serves the same value to all users - that means there are no active experiments or force rules.
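In simplified Python, the rule boils down to something like this (a sketch of the definition above, not GrowthBook's actual implementation):

```python
from datetime import datetime, timedelta

def is_stale(last_updated, serves_single_value, now):
    """A flag is stale if it was untouched for two weeks AND serves one
    value to everyone (no active experiments or force rules)."""
    return (now - last_updated > timedelta(weeks=2)) and serves_single_value

now = datetime(2023, 11, 29)
assert is_stale(datetime(2023, 10, 1), True, now)       # old, single value
assert not is_stale(datetime(2023, 11, 28), True, now)  # recently updated
assert not is_stale(datetime(2023, 10, 1), False, now)  # active experiment
```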

This detection isn't perfect. There are times when a long-lived feature flag makes sense - for example, a kill switch for when a 3rd party provider has an outage. For these features, you can easily dismiss the stale notification by clicking on the icon.

Dismissing a stale feature flag notification in GrowthBook for a long-lived kill switch

We have a lot more planned in the future to help you stay on top of your technical debt - everything from integrating with GitHub Code References to detecting actual realtime usage from our SDKs. Stay tuned and let us know your thoughts!

Boost Confidence in Experiment Launches with GrowthBook's Pre-Launch Checklists
Experiments
Product Updates
2.6

Nov 28, 2023

At GrowthBook, our mission has always been to empower organizations to optimize their online experiments seamlessly. We are thrilled to announce our latest feature: Customizable Pre-Launch Checklists.

Experimentation is at the heart of progress, but launching an experiment involves meticulous preparation and attention to detail. To streamline this process and ensure a smoother launch experience, we've introduced customizable checklists. Now, organizations leveraging GrowthBook's Enterprise Plan for their online experiments can tailor their pre-launch requirements precisely to their needs.

GrowthBook pre-launch checklist showing pre-defined options like requiring screenshots and experiment hypotheses

Why Customizable Checklists Matter

Launching an experiment isn't just about hitting the "Go" button. It's about confidence, precision, and ensuring that every aspect is in place for a successful rollout. With our new feature, organizations can choose from a range of pre-defined checklist options, such as requiring screenshots for each variation or mandating an experiment hypothesis.

But that's not all. We understand that each organization has its unique set of protocols and requirements. That's why GrowthBook now allows you to define your own checklist items. Need to alert the support team before an experiment goes live? No problem. Want to ensure that specific accessibility standards are met? You got it. The power is in your hands to create a tailored checklist that aligns with your processes.

GrowthBook custom pre-launch checklist builder showing organization-specific items added to the experiment launch flow

Enhanced Confidence in Experiment Launches

Launching an experiment can be nerve-wracking, especially when multiple stakeholders are involved. The customizable pre-launch checklist feature is designed to instill confidence. It acts as a safety net, ensuring that all necessary steps are completed before an experiment sees the light of day.

By providing this level of customization, GrowthBook aims to empower teams, mitigate risks, and ultimately drive more successful experiments. With a checklist that reflects your organization's unique requirements, you can proceed with the certainty that everything is in place for a successful experiment launch.

Getting Started

GrowthBook Experiment Settings showing where to configure and update the pre-launch checklist for all experiments

Customizing your pre-launch checklist in GrowthBook is easy. Simply navigate to your organization's Experiment Settings by selecting Settings > General from the Sidebar and scrolling down to Experiment Settings. Updating the checklist will affect any experiment that isn't already live.

When it comes time to launch an experiment, we'll calculate the completion of any pre-defined checklist option automatically, and give you an opportunity to manually complete any items we can't calculate automatically. All manually checked items are logged and available in your in-app audit log.

Ready to level up your experiment launches?

Experience the difference by trying out the customizable pre-launch checklist today.

Stay tuned for more updates as we continue to evolve and innovate to support your experimentation journey.
