Best open source A/B testing and experimentation tools

Open source A/B testing tools are attractive for a simple reason: experimentation becomes core infrastructure once a product team starts using it seriously.

If experiments decide which onboarding flow ships, which pricing page converts, which recommendation model runs, or which checkout path becomes default, teams need more than a hosted script tag. They need control over assignment, metrics, exposure logging, feature flags, data flow, privacy, and long-term cost.

Open source can help. It lets engineering teams inspect the code, self-host when needed, keep data closer to their own infrastructure, and avoid paying per visitor or per event before the experimentation program has proven its value.

But open source A/B testing is also uneven. Some projects are full experimentation platforms. Some are feature flag systems that can assign variants but expect you to bring your own analytics. Some are statistical libraries. Some are legacy frameworks that are useful to study but risky to adopt as production infrastructure in 2026.

Hacker News discussions about open-source experimentation often call out this reality directly: older A/B testing projects can become abandoned, and the statistical analysis layer is usually the hardest part. Reddit threads asking for free or open-source A/B testing tools tend to mention GrowthBook and PostHog first, then branch into Firebase, feature flags, or DIY systems depending on the use case.

This guide separates production-ready platforms from narrower tools so you can choose based on the job you actually need done.

Quick comparison

| Tool | Best for | What is open source | Main watchout |
|---|---|---|---|
| GrowthBook | Full-featured open-source experimentation and feature flags | Open-core platform, bulk of repo under MIT license | Best when teams want a real experimentation workflow, not only a script |
| PostHog | Product analytics with experiments and flags | Open-source developer platform with self-hosting options | Usage and product breadth can complicate ownership |
| Unleash | Open-source feature flags that can power experiments | Feature management platform | Experiment analysis usually needs another analytics layer |
| Flagsmith | Open-source flags and multivariate rollouts | Feature flag platform | A/B analysis depends on your analytics stack |
| GO Feature Flag | Lightweight OpenFeature-native flagging | Open-source feature flag system | More flag infrastructure than experiment platform |
| Mojito | Source-controlled web split testing | Modular split-testing stack | Narrower, more DIY, and mostly web-focused |
| PlanOut and PlanOut4J | Experimental design and randomization | Open-source framework and ports | Not a modern end-to-end platform |
| UpGrade | Education technology experiments | Open-source edtech experimentation platform | Specialized for learning applications |
| TW Experimentation | Statistical analysis and decision support | Open-source Python library | Analysis library, not assignment or flag infrastructure |
| Wasabi and Sixpack | Legacy experimentation architecture references | Open-source projects | Useful to study, risky to adopt without active ownership |

How to evaluate open source A/B testing tools

Open source should not be treated as a synonym for free. The license may be free, but your team still pays in setup, maintenance, infrastructure, upgrades, metric modeling, debugging, and support.

Start with the workflow, not the license

An experimentation system has several jobs:

  • Assign users or accounts to variants.
  • Keep assignment stable across sessions and devices when required.
  • Expose the right variant through an SDK, feature flag, API, or client library.
  • Log exposures at the right moment.
  • Connect exposures to conversion, revenue, retention, activation, guardrail, or infrastructure metrics.
  • Detect assignment problems such as sample ratio mismatch.
  • Support analysis without encouraging peeking or false-positive inflation.
  • Help teams decide whether to roll out, roll back, iterate, or stop.
  • Help engineers remove stale flags and dead code after the decision.

Some open-source projects cover the full workflow. Many cover one or two pieces. That is not a flaw if you know what you are buying into. A randomization library can be enough for a data science team building an internal platform. It is not enough for a product team that needs product managers, engineers, and analysts to run experiments every week.
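
The first two jobs in the list above, assignment and stability, are often handled with deterministic hashing. The following is a minimal sketch of that idea, not the method of any tool in this guide. The function name `assign_variant` and the experiment and user identifiers are hypothetical.

```python
import hashlib


def assign_variant(experiment_key: str, unit_id: str, variants: list[str]) -> str:
    """Deterministically map a unit (user or account) to one variant.

    Hypothetical helper for illustration: the same inputs always produce
    the same variant, so no assignment table is needed for stability.
    """
    # Including the experiment key in the hash input keeps bucketing
    # independent across experiments, so one user is not always "treatment".
    digest = hashlib.sha256(f"{experiment_key}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16)  # first 32 bits of the hash as an integer
    return variants[bucket % len(variants)]


variants = ["control", "treatment"]
first = assign_variant("new-onboarding", "user-123", variants)
second = assign_variant("new-onboarding", "user-123", variants)
print(first == second)  # True: repeated calls agree
```

Stability across devices still depends on using an identifier that is itself stable, such as an account ID rather than a browser cookie.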

Check whether it includes feature flags

Modern product experiments are usually delivered through feature flags. A flag can target internal users, start a beta, roll out to 5 percent of traffic, assign users to A/B variants, and roll the winning variation forward.

If the tool does not have feature flags, you may need a separate flag system. If the flag system does not have experiment analysis, you may need an analytics layer. The more pieces you assemble, the more important identity, exposure logging, and metric consistency become.
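
To make the flag behaviors described above concrete, here is a small sketch of a flag that targets internal users, limits enrollment to a percentage of traffic, and splits enrolled users across variants. The `Flag` class, the `evaluate` function, and all keys are hypothetical and do not mirror any specific product's SDK.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Flag:
    """Hypothetical flag definition, for illustration only."""
    key: str
    rollout_percent: float  # share of traffic that enters the experiment, 0 to 100
    variants: list[str]
    internal_users: set[str] = field(default_factory=set)


def _bucket(seed: str, unit_id: str) -> float:
    """Return a stable pseudo-random number in [0, 1) for this seed and unit."""
    digest = hashlib.sha256(f"{seed}:{unit_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def evaluate(flag: Flag, user_id: str) -> Optional[str]:
    """Return the variant for a user, or None if the user is not enrolled."""
    if user_id in flag.internal_users:
        return flag.variants[-1]  # assumption: internal users always see the last variant
    # Separate seeds keep the enrollment decision independent of the variant split.
    if _bucket(flag.key + ":rollout", user_id) * 100 >= flag.rollout_percent:
        return None  # outside the rollout: serve the default experience, log no exposure
    index = int(_bucket(flag.key + ":variant", user_id) * len(flag.variants))
    return flag.variants[index]


flag = Flag(
    key="checkout-v2",
    rollout_percent=5,
    variants=["control", "treatment"],
    internal_users={"staff-1"},
)
print(evaluate(flag, "staff-1"))  # treatment
```

Returning `None` for users outside the rollout makes it harder to log an exposure for someone who never entered the experiment.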

Inspect the data path

Open source is often chosen because teams care about data control. That makes the data path critical.

Ask where raw events live, where experiment assignments live, how metrics are defined, and whether analysis uses your warehouse or the tool's own event store. If product, finance, and data teams already trust warehouse tables, a warehouse-native tool can reduce duplication and debate. If your team wants an all-in-one product analytics suite, a tool with its own event store may be easier to start.
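
As a rough illustration of what analysis against warehouse tables can look like, the sketch below joins an exposure table to a metric table with SQL. It uses an in-memory SQLite database as a stand-in for a warehouse. The table names, column names, and data are invented for the example, and it assumes one exposure row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection

# Hypothetical schema and data. User u4 has one order dated before exposure,
# which the query below must not count as a conversion.
conn.executescript("""
CREATE TABLE exposures (user_id TEXT, variant TEXT, exposed_at TEXT);
CREATE TABLE orders (user_id TEXT, ordered_at TEXT, revenue REAL);
INSERT INTO exposures VALUES
  ('u1', 'control',   '2026-01-01'),
  ('u2', 'control',   '2026-01-01'),
  ('u3', 'treatment', '2026-01-01'),
  ('u4', 'treatment', '2026-01-02');
INSERT INTO orders VALUES
  ('u1', '2026-01-03', 20.0),
  ('u3', '2026-01-02', 35.0),
  ('u4', '2026-01-01', 50.0),
  ('u4', '2026-01-05', 15.0);
""")

# Only orders placed on or after the exposure date are attributed to a variant.
QUERY = """
SELECT e.variant,
       COUNT(DISTINCT e.user_id) AS exposed_users,
       COUNT(DISTINCT o.user_id) AS converted_users,
       COALESCE(SUM(o.revenue), 0) AS revenue
FROM exposures AS e
LEFT JOIN orders AS o
  ON o.user_id = e.user_id AND o.ordered_at >= e.exposed_at
GROUP BY e.variant
ORDER BY e.variant
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('control', 2, 1, 20.0)
# ('treatment', 2, 2, 50.0)
```

The join condition is where many disagreements start: whether pre-exposure events count, and which identifier links the two tables, should be decided once and reused across experiments.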

Count operational cost honestly

Self-hosting changes the bill. It does not make experimentation free.

Someone has to deploy the service, monitor it, back it up, upgrade it, secure it, rotate keys, debug SDK issues, handle incidents, and explain what happens when an experiment result looks strange. If the tool sits on the critical path for feature rollout, it also becomes production infrastructure.

That cost is still often worth it. A self-hosted experimentation platform can be far cheaper than a high-traffic SaaS contract, especially when experiment volume grows. It can also be necessary when data residency, privacy, or procurement requirements make third-party event ingestion difficult.

The mistake is pretending infrastructure cost does not exist. Before picking a tool, decide who owns:

  • Deployment and upgrades.
  • SDK key management.
  • Experiment data retention.
  • Incident response.
  • Metric definitions.
  • Statistical method choices.
  • User permissions.
  • Audit logs and access reviews.
  • Flag and experiment cleanup.

If those owners are unclear, a hosted plan or managed support tier may be cheaper than an unsupported self-hosted deployment.

Check security and privacy fit early

Open source gives teams more control over data flow, but it does not automatically solve security or privacy requirements.

For any finalist, review authentication, authorization, SSO support, audit logging, project and environment permissions, network architecture, secret storage, SDK key exposure, and whether client-side flags reveal targeting logic that should stay private. Also check how the tool handles personally identifiable information. Some teams can avoid sending sensitive user attributes by evaluating flags server-side or hashing identifiers. Others need stronger governance.
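
One way to hash identifiers before they leave your systems is a keyed hash, sketched below with Python's standard `hmac` module. The function name `pseudonymize` and the key value are placeholders. This is pseudonymization rather than anonymization, so whether it satisfies a given privacy requirement is a question for your own review.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Return a stable, keyed hash of an identifier (hypothetical helper).

    A keyed hash (HMAC) is used instead of a bare hash so that someone
    without the key cannot recompute the mapping from guessed identifiers.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key for illustration only; a real key belongs in a secret store.
SECRET_KEY = b"example-key-do-not-use"

token = pseudonymize("alice@example.com", SECRET_KEY)
print(token == pseudonymize("alice@example.com", SECRET_KEY))  # True: stable per user
```

Because the output is stable for a given key, it can still serve as the assignment and join key for experiments.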

Data privacy also affects analytics. If experiment analysis requires sending event streams to a third-party system, the open-source flagging layer may not be enough. If analysis runs against your warehouse, confirm that the query layer respects data access controls and that experiment users can only see the metrics they are allowed to see.

Make non-engineering participation explicit

Many open-source tools are developer-oriented. That is often a strength. Developers can inspect the code, use Git workflows, and integrate deeply with the application.

But experimentation is cross-functional. Product managers need to define hypotheses and launch criteria. Data scientists need to review metrics and statistical power. Designers and researchers may need qualitative context. Support and customer success teams may need to know which customers saw which treatment.

If the tool has no usable UI, weak documentation, or no workflow for experiment notes, screenshots, decisions, and status, engineering will become the bottleneck for every test. That may be acceptable for an internal platform team. It is usually painful for a product organization trying to scale experimentation.

When evaluating open source tools, include the people who will request, review, analyze, and decide on experiments. A tool that developers love but product teams cannot use will slow the program down.

Know when open source is the wrong first move

Open source is not automatically the right path for every team.

If your company has no engineering capacity for deployment, no clear owner for experiment data, and no one responsible for statistical quality, a managed commercial platform may produce better decisions sooner. The same is true when the experimentation program is led by marketers who need visual editing, agency workflows, and campaign personalization more than SDK-level control.

Open source is strongest when the team has technical ownership and wants control. It is weaker when the organization is trying to outsource all experimentation process, governance, and analysis to a vendor.

There is also a hybrid path. Teams can start with a hosted open-source product, then self-host later if data, compliance, or cost requirements change. GrowthBook and PostHog are good examples of this pattern because both offer hosted products while keeping open-source roots. That path can reduce early operational work without locking the team into a closed experimentation system from day one.

Separate maintained platforms from useful old projects

Open-source A/B testing has a long history. PlanOut, Wasabi, Sixpack, Proctor, AlephBet, and many older libraries influenced how teams think about experimentation. Some are still useful for learning. Some may still work if a team owns them deeply.

For most teams, though, maintenance matters. Look for current documentation, recent releases or commits, active issue handling, security guidance, supported SDKs, deployment docs, and a clear license. A dormant project can be a fine reference. It should not silently become your production experimentation platform.

The practical test is simple: would you be comfortable letting this tool decide a revenue-impacting rollout next quarter? If the answer is no, either narrow its role to a library or reference architecture, or choose a maintained platform that covers the operational pieces your team does not want to build.

1. GrowthBook

GrowthBook is the best open source A/B testing and experimentation tool for technical product teams that want feature flags, experiment analysis, product analytics, and warehouse-native metrics in one platform.

Best for

GrowthBook fits SaaS teams, data-mature product teams, and engineering-led organizations that want an experimentation platform close to their own data and infrastructure.

The GrowthBook GitHub repository describes GrowthBook as open-source feature flags, experimentation, and product analytics. The repository notes that GrowthBook is open core, with the bulk of the code under the permissive MIT license and some enterprise directories under a separate commercial license. That distinction matters: buyers should understand the license boundary, but the open-source core is much more substantial than a toy SDK.

Key strengths

GrowthBook covers the full experimentation loop. It supports feature flags with advanced targeting, gradual rollouts, and experiments. According to the repository, it also includes SDKs across common web, server, mobile, and edge environments, plus a stats engine with methods such as CUPED, sequential testing, Bayesian analysis, post-stratification, bandits, and sample-ratio checks.
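
For readers unfamiliar with CUPED, the sketch below shows the textbook form of the adjustment: each user's metric is corrected using a pre-experiment covariate, which reduces variance without changing the mean. This is a generic illustration with invented data, not GrowthBook's implementation, and the function name `cuped_adjust` is hypothetical.

```python
from statistics import mean, variance


def cuped_adjust(metric: list[float], covariate: list[float]) -> list[float]:
    """Apply the textbook CUPED adjustment: y - theta * (x - mean(x)).

    `covariate` is a pre-experiment measurement for the same users,
    for example the same metric measured before the experiment started.
    """
    x_bar, y_bar = mean(covariate), mean(metric)
    # Sample covariance between covariate and metric.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, metric)) / (len(metric) - 1)
    theta = cov_xy / variance(covariate)
    return [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]


# Invented data: spend before and during the experiment for six users.
pre_period = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
in_experiment = [11.0, 13.0, 9.0, 17.0, 12.0, 15.0]

adjusted = cuped_adjust(in_experiment, pre_period)
print(variance(adjusted) < variance(in_experiment))  # True: variance is reduced
```

In a real analysis, theta is usually estimated once on data pooled across all variants and then applied to each arm, so the adjustment cannot favor one variant.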

The feature flag docs explain how flags can target users, gradually roll out changes, and run A/B tests on client or server. The experiment docs show that GrowthBook treats flags and experiments as connected workflows, not separate products.

The warehouse-native model is the strategic differentiator. Instead of forcing teams to copy all experiment data into a vendor-controlled analytics store, GrowthBook can query the data sources teams already trust. That is valuable when activation, retention, revenue, expansion, or cost metrics already live in the warehouse.

Watchouts

GrowthBook is most valuable when teams want a real experimentation program. If your need is a tiny JavaScript library for one landing-page test, GrowthBook may feel like more platform than you need.

Teams should also check plan boundaries for advanced governance, security, SSO, permissioning, and enterprise support. Open-source control reduces vendor dependency, but production ownership still requires engineering time.

Pricing and implementation notes

Current GrowthBook pricing lists a free Cloud Starter plan, a per-seat Pro plan, enterprise options, and a free self-hosted open-source option with unlimited feature flags, experiments, and traffic.

For a proof of concept, connect GrowthBook to a real warehouse metric, implement one flag-based experiment, inspect the assignment and exposure data, and walk through the result with product, engineering, and data stakeholders. If those groups can agree on what happened without reconciling three systems, GrowthBook is doing the hard work.

2. PostHog

PostHog is a strong open-source option when A/B testing should live inside a broader product analytics suite.

Best for

PostHog fits startups and product teams that want analytics, feature flags, experiments, session replay, surveys, and debugging tools together. It is often considered when teams want an open-source alternative to stitching analytics, flags, and experimentation across several SaaS products.

The PostHog GitHub repository describes an all-in-one developer platform with feature flags, experiments, analytics, session replay, surveys, and more. The feature flags docs describe flags as the foundation for safe rollouts, A/B testing, and remote configuration.

Key strengths

PostHog's advantage is breadth. An experiment can connect to events, funnels, cohorts, recordings, and product analytics. That is useful for teams that want to investigate behavior around an experiment rather than only read a statistical result.

The experiment creation docs show a guided experiment workflow with feature flag keys, variants, rollout and release conditions, inclusion criteria, and metrics. This is much more useful for product teams than a library that only randomizes users.

PostHog is also easy to pilot because the hosted product has free allowances and the docs are developer-friendly.

Watchouts

PostHog's breadth can create ownership and cost questions. If feature flags, analytics, replays, surveys, and experiments all grow together, teams should understand which usage meters apply and who owns the data model.

PostHog is also not warehouse-native in the same way GrowthBook is. If your trusted metrics already live in Snowflake, BigQuery, Redshift, Databricks, or another warehouse, decide whether PostHog should become another analytics source or whether experiment analysis should query your existing metrics.

Pricing and implementation notes

Current PostHog pricing lists free monthly allowances and usage-based pricing across several products. Run the proof of concept with one event-driven experiment, one feature flag, one funnel, and one replay review. Then model cost under production event and flag-request volume.

3. Unleash

Unleash is a mature open-source feature management platform that can be used to run A/B tests through feature flag variants.

Best for

Unleash fits teams that want open-source feature flags, self-hosting, activation strategies, variants, lifecycle management, and enterprise feature governance.

The Unleash feature flag docs describe variants as a way to determine which version of a feature a user sees, including for A/B testing. The A/B testing guide walks through defining variants, targeting users, managing cross-session visibility, connecting impression data to conversion outcomes, and rolling out a winning variant.

Key strengths

Unleash is strong at the flagging layer. It gives teams a real feature-management control plane with strategies, variants, SDKs, environments, lifecycle concepts, and self-hosting.

That makes it useful when the primary need is release control and variant assignment. If your data team already has an analysis pipeline, Unleash can be the assignment and rollout layer that feeds it.

Watchouts

Unleash is not primarily an A/B testing analytics platform. Its A/B testing workflow relies on connecting feature flag impression data to conversion outcomes. That can work well, but your team must design the measurement layer.
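
For teams that bring their own measurement layer, the simplest readout on joined impression and conversion counts is a two-proportion z-test. The sketch below is a generic, fixed-horizon version using only the standard library. It is not part of Unleash, and the function name and counts are invented for the example.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Inputs are converted users and exposed users for variants A and B.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Invented counts: 100 of 1000 control users converted, 150 of 1000 treatment users.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z={z:.2f}, p={p_value:.4f}")
```

This test assumes the sample size was fixed in advance and the result is read once. Checking it repeatedly while the experiment runs inflates false positives, which is the peeking problem that sequential methods are designed to address.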

If you want built-in experiment analysis, warehouse-native metrics, and product analytics in the same system, GrowthBook will usually be a better fit.

Pricing and implementation notes

Current Unleash pricing includes self-hosted and cloud options, with paid plans for enterprise use. For open-source evaluation, test not only flag variants but also impression events, metric join keys, assignment stability, and stale-flag cleanup.

4. Flagsmith

Flagsmith is an open-source feature flagging platform with multivariate flags that can support A/B/n testing.

Best for

Flagsmith fits teams that want open-source feature flags, remote config, segments, identities, deployment flexibility, and a hosted or self-hosted path.

The Flagsmith open-source page explains open-source feature flags as publicly available, inspectable code that teams can self-host. The feature flag docs describe boolean and multivariate flags.

Key strengths

Flagsmith is practical flag infrastructure. Teams can create flags, use segments, target identities, manage environments, and use multivariate flags for A/B/n-style assignment. The core flag management docs describe multivariate flags with percentage weightings and per-identity bucketing.

It is a good fit when you want open-source feature control but plan to use an existing analytics stack for analysis. Flagsmith also has a clear lifecycle philosophy: create the flag, add it to code, control behavior, remove the code reference, deploy, and remove the flag from Flagsmith.

Watchouts

Flagsmith can assign variants, but it is not a full experiment-analysis platform in the way GrowthBook is. Teams need to connect flag data to analytics events and decide how statistical analysis will be done.

The free hosted tier can be useful for evaluation, but teams should check the paid limits carefully before relying on it for collaboration or production scale.

Pricing and implementation notes

Current Flagsmith pricing includes a free hosted plan and paid tiers based on requests and team needs. For evaluation, run a multivariate flag, export or integrate assignment data, and confirm your analytics stack can produce the experiment readout you need.

5. GO Feature Flag

GO Feature Flag is an open-source, OpenFeature-native feature flag system that can support experimentation rollouts.

Best for

GO Feature Flag fits engineering teams that want lightweight, infrastructure-friendly feature flags without a database-heavy control plane.

The GO Feature Flag website describes the project as an open-source, OpenFeature-native feature flag management system that runs on infrastructure you already have. The experimentation rollout docs describe testing different versions of a feature for a limited time before deciding whether to roll out broadly.

Key strengths

GO Feature Flag is appealing because it is small and standards-oriented. Teams that care about OpenFeature compatibility, Git or file-based configuration, and low operational overhead may prefer it over a full platform.

It can be a useful assignment layer for teams building their own experimentation system or connecting to a separate analytics product.

Watchouts

GO Feature Flag is feature flag infrastructure, not a full A/B testing platform. It can help expose variants, but your team still needs metric definitions, exposure logging discipline, analysis, reporting, governance, and cleanup workflow.

That makes it a strong technical component for platform teams, but a weaker standalone choice for product teams that need a turnkey experimentation workflow.

Pricing and implementation notes

Use GO Feature Flag when your team wants open-source flagging with OpenFeature alignment and is comfortable owning the analysis layer. In evaluation, test config storage, SDK compatibility, assignment stability, event export, and how non-engineers will participate in experiment decisions.

6. Mojito

Mojito is a source-controlled split-testing framework for teams that want web experiments managed through Git and CI.

Best for

Mojito fits technically comfortable web teams that want experiments defined in source control rather than a proprietary visual editor.

The Mojito GitHub repository describes it as a modular, source-controlled split-testing framework for building, launching, and analyzing experiments via Git and CI. Its documentation says Mojito is composed of JS delivery, Snowplow storage, and R analytics modules.

Key strengths

Mojito is opinionated in a useful way. It treats experimentation as code. The framework overview explains the modular approach: a front-end library for running experiments, data models and events for tracking, and analytics templates for reporting.

That can be attractive for teams that dislike opaque visual-editor changes and want code review, version control, and CI/CD around experiment changes.

Watchouts

Mojito is narrower than GrowthBook or PostHog. It is more of a modular split-testing stack than a full cross-platform experimentation system with feature flags, warehouse-native metrics, product analytics, permissions, and broad SDK coverage.

It is also a better fit for web experimentation than complex server-side, mobile, or multi-service feature experiments.

Pricing and implementation notes

Use Mojito when the team wants source-controlled web experiments and has engineering capacity to own the stack. In evaluation, test experiment definition, deployment through CI, tracking events, analytics templates, and whether product managers can participate without depending on one developer.

7. PlanOut and PlanOut4J

PlanOut is an influential open-source framework for experimental design and assignment, while PlanOut4J is a Java implementation inspired by it.

Best for

PlanOut and PlanOut4J fit teams that are studying experiment assignment systems or building internal experimentation infrastructure, not teams looking for a complete modern platform.

InfoQ's coverage of Facebook open-sourcing PlanOut describes it as a language for online field experiments supporting A/B tests, factorial designs, and more complex experiments. The PlanOut4J repository describes a Java implementation designed to conduct experiments on the web at scale.

Key strengths

PlanOut's contribution is conceptual clarity. It separates experimental design from application code and gives teams a way to express randomization, parameters, and assignment logic explicitly.

For data scientists and platform engineers, this is valuable. If you are designing an in-house experimentation platform, reading PlanOut-style systems helps clarify the assignment layer.

Watchouts

PlanOut is not a maintained end-to-end product for most SaaS teams. It does not give you modern dashboards, feature flag lifecycle management, warehouse-native analysis, guardrail metrics, product analytics, permissions, SDK breadth, or experiment governance.

Use it as a reference or building block, not as the whole program.

Pricing and implementation notes

PlanOut-style systems are free in the license sense, but expensive in engineering ownership. If you adopt one, make sure your team owns assignment, exposure logging, data quality checks, and statistical analysis intentionally.

8. UpGrade

UpGrade is an open-source experimentation platform designed for education technology applications.

Best for

UpGrade fits researchers, product teams, and engineering teams working in edtech environments where learning applications need controlled experiments.

The UpGrade GitHub repository describes it as an open-source platform for large-scale A/B testing in edtech web applications. That specialization is important. It is not trying to be a generic conversion optimization tool.

Key strengths

UpGrade is valuable because education experiments often have different needs than standard SaaS experiments. A learning platform may need experiments by classroom, curriculum, course, assignment, student cohort, or instructional condition. Research and ethics workflows may also matter more than in a typical product-growth test.

For teams in this domain, a specialized open-source platform can be more useful than adapting a generic landing-page testing tool.

Watchouts

UpGrade is not the default choice for general SaaS, ecommerce, or developer-tool experimentation. Teams outside edtech should treat it as a specialized platform and evaluate whether its assumptions match their product model.

Pricing and implementation notes

Use UpGrade if your experimentation problem is education-specific. In evaluation, test assignment units, consent or research requirements, teacher and student context, metric export, and integration with your application architecture.

9. TW Experimentation

TW Experimentation is an open-source library for experiment design, data checks, statistical tests, and decision support.

Best for

TW Experimentation fits data teams that want an open-source analysis library for A/B testing and causal inference workflows.

The TW Experimentation repository describes it as a library to design experiments, check data, run statistical tests, and make decisions. That is a different role from GrowthBook, PostHog, Unleash, or Flagsmith.

Key strengths

The strength is analysis. Data scientists can use libraries like this to standardize notebooks, calculations, data quality checks, and decision frameworks around experiments.

This can be useful if your organization already has its own assignment system and event pipeline, but wants reusable statistical tooling.

Watchouts

TW Experimentation does not replace an experimentation platform. It does not provide feature flags, SDKs, targeting, exposure logging, product-facing dashboards, permissions, lifecycle management, or rollout controls.

For most product teams, it is a complement to a platform, not the platform itself.

Pricing and implementation notes

Use TW Experimentation when the analysis layer is the gap. Pair it with a clear assignment and exposure system, and make sure experiment results can be communicated outside notebooks.

Older open-source projects worth studying

Several older A/B testing projects still come up in searches and community threads. They are useful references, but most teams should be cautious about adopting them as production systems without dedicated maintenance ownership.

Wasabi

Intuit's Wasabi repository describes an API-driven A/B testing service for web, mobile, and desktop, but it also states that the project is no longer under active development or support. That makes it a useful architecture reference, not a default 2026 choice.

Sixpack

Sixpack is a language-agnostic A/B testing framework that exposes a simple API for client libraries. It is historically interesting because it focuses on cross-language assignment through a central service. Teams should inspect maintenance activity before using it in production.

AlephBet

AlephBet is a pure-JavaScript A/B and multivariate testing framework. It is useful for understanding lightweight browser-side experimentation, but modern teams should be cautious about relying on localStorage-oriented client-side assignment for important product decisions.

Proctor and other internal-platform frameworks

GitHub's A/B testing topic lists projects such as Indeed's Proctor and other organization-specific frameworks. These can be educational if you are building in-house experimentation infrastructure, but they usually require a platform team to turn them into a complete workflow.

Which open source approach should you choose?

The right choice depends on what "open source A/B testing" means inside your company.

If you want a full experimentation platform, start with GrowthBook. It covers flags, experiments, metrics, analysis, product analytics, cloud hosting, and self-hosting. It is the clearest production-ready choice for teams that want open-source control without building the whole stack.

If you want analytics plus experiments in one developer suite, test PostHog. It is especially useful when funnels, events, recordings, and feature flags should live together.

If you want open-source feature flag infrastructure and will bring your own analysis, test Unleash, Flagsmith, or GO Feature Flag. These are good choices when release control is the core problem and experimentation analysis is owned elsewhere.

If you want web split testing as code, test Mojito. If the gap is statistical analysis, pair TW Experimentation with an existing assignment and exposure system. If you are building or studying experimentation infrastructure, read PlanOut, PlanOut4J, Wasabi, Sixpack, AlephBet, and Proctor. If you need edtech experimentation, evaluate UpGrade.

Proof-of-concept checklist

Run the same proof of concept for every serious finalist:

  • Confirm the license and open-core boundaries.
  • Deploy the tool in the same environment model you would use in production.
  • Create one feature flag or experiment assignment.
  • Assign users or accounts using the same identity key your product uses.
  • Confirm assignment stability across sessions.
  • Log exposure only when the user actually sees the changed experience.
  • Connect the exposure to a primary metric and one guardrail metric.
  • Run a sample-ratio or assignment sanity check.
  • Compare the result to your trusted analytics or warehouse reporting.
  • Roll out, roll back, or stop the experiment.
  • Add an owner and cleanup date.
  • Remove the test code after the decision.
  • Estimate infrastructure and maintenance cost, not only license cost.

Open source gives you control. The proof of concept should prove that your team can use that control responsibly.
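
The sample-ratio check in the list above can be sketched for a two-variant experiment as a chi-square goodness-of-fit test with one degree of freedom. The function name `srm_p_value` and the counts are invented for illustration, and the threshold mentioned in the comment is a common convention rather than a rule from any tool in this guide.

```python
import math


def srm_p_value(control_count: int, treatment_count: int,
                expected_control_share: float = 0.5) -> float:
    """Return the p-value of a chi-square test for sample ratio mismatch.

    Valid for two variants only: with one degree of freedom the chi-square
    tail probability reduces to erfc(sqrt(chi2 / 2)).
    """
    total = control_count + treatment_count
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_count - expected_control) ** 2 / expected_control
            + (treatment_count - expected_treatment) ** 2 / expected_treatment)
    return math.erfc(math.sqrt(chi2 / 2))


# Invented counts for an intended 50/50 split.
balanced = srm_p_value(10000, 10100)
skewed = srm_p_value(5000, 5300)

# Assumption: many teams investigate when the p-value falls below a strict
# threshold such as 0.001, because this check runs on every experiment.
print(f"balanced p={balanced:.3f}, skewed p={skewed:.4f}")
```

A very low p-value does not say which variant is wrong. It says the assignment or exposure logging should be investigated before anyone reads the metric results.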

The practical recommendation

GrowthBook is the best open source A/B testing and experimentation tool for most technical product teams.

PostHog is a strong choice when experimentation belongs inside a product analytics suite. Unleash, Flagsmith, and GO Feature Flag are strong when open-source feature management is the main need and analysis can live elsewhere. Mojito is useful for source-controlled web split testing. PlanOut-style frameworks, TW Experimentation, UpGrade, and older systems are valuable in specific contexts.

The reason GrowthBook is the default recommendation is simple: it solves more of the real experimentation workflow while preserving the reasons teams choose open source in the first place. You get feature flags, A/B testing, product analytics, warehouse-native metrics, self-hosting, and a cloud path without pretending experimentation is only a randomization function.
