How Fanatics made experimentation a strategic growth driver

Ten years ago, experimentation at Fanatics was a boutique CRO team running about 10 tests a month, mostly conversion nudges. Today, they run close to 100 experiments a month across a multi-billion-dollar e-commerce business with 60 million monthly visits and roughly 900 sites spanning every major sports league.
The impact is staggering: experimentation consistently delivers about 8% of total annual growth at Fanatics. For a $3B business, that means hundreds of millions of dollars in growth, year after year.
Medha Umarji, VP of growth and experimentation at Fanatics, joined me on The Experimentation Edge to share how they built this engine. The story is not about a single tool or methodology. It is about culture, infrastructure, and a relentless commitment to learning from every test they run.
It starts at the top: a CEO who lives the data culture
The most common barrier to scaling experimentation is not technical. It is cultural. Teams struggle to get leadership buy-in, fight to justify test cycles, and spend more time selling the idea of testing than actually testing.
Fanatics does not have that problem. Their CEO, CPO, and the entire C-suite are actively engaged in experiment outcomes. But what makes Fanatics unusual is not just executive support. It is executive humility.
"Our CEO is so data-driven," Medha told me. "He literally consumes Excel spreadsheets. He does not shy away from data on the slides. He wants to see everything. And he will question everything." More importantly, he is openly willing to have his mind changed when the data contradicts his intuition.
That humility at the top cascades through the entire organization. When the CEO demonstrates that what matters is the data, not his intuition, he leads by example and sets the tone for everyone else. As Medha put it, "that humility is very much a part of the culture here."
The result is a fundamental shift in how teams approach new initiatives. The conversation at Fanatics moved from "why should we test this?" to "how do we test this?" When people default to asking how to measure something rather than whether they should measure it at all, you know experimentation has become strategic.
Building the learning infrastructure: the experimentation wiki
Culture gets you buy-in. Infrastructure turns that buy-in into compounding returns.
Early on, Fanatics had the same problem most experimentation teams face: test results lived in scattered PowerPoints and Word documents buried in shared drives. Finding what you had already learned was almost as hard as learning it in the first place.
So they built a wiki. Not a passive knowledge base, but a structured system where every test result lives alongside causal interpretations, screenshots, video recordings of the experience, and a detailed next-steps section.
The next-steps section is where the magic happens. Those recommendations auto-feed a JIRA backlog, creating a flywheel that keeps the experimentation engine running without waiting on new engineering work. The features are already built from previous tests. The team fills roadmap bandwidth gaps with iterations sourced directly from the wiki.
"It has sort of become our growth engine," Medha said. And she means that literally. The wiki does not just store results. It generates the next wave of experiments.
Meta-analysis: turning individual tests into institutional knowledge
A single test tells you what happened. A pattern across tests tells you why.
Medha pushes her team to build a meta-analysis table after running three or more tests on the same feature. Three messaging variants, three price points, three layout treatments, all summarized in one consumable view. Stakeholders can skip individual test briefs and see the pattern at a glance.
This practice converts isolated experiments into institutional knowledge. Instead of each test existing as a standalone result, the meta-analysis reveals which variables actually matter and which ones are noise. It is the difference between having a pile of test results and having a learning program.
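As a rough sketch of the mechanics, here is one way to roll individual test briefs into that single view, assuming each test is logged with the feature it touched, its measured lift, and whether it reached significance. The column names and numbers are made up.

```python
import pandas as pd

# Hypothetical export of individual test briefs from the wiki
tests = pd.DataFrame([
    {"feature": "shipping_banner", "variant": "free over $49", "lift_pct": 1.2,  "significant": True},
    {"feature": "shipping_banner", "variant": "free over $75", "lift_pct": 0.1,  "significant": False},
    {"feature": "shipping_banner", "variant": "flat $4.99",    "lift_pct": -0.8, "significant": True},
    {"feature": "size_guide",      "variant": "inline modal",  "lift_pct": 0.4,  "significant": False},
])

# A "win" is a significant positive lift
tests["win"] = tests["significant"] & (tests["lift_pct"] > 0)

# Only summarize features with 3+ tests, per the team's rule of thumb
counts = tests.groupby("feature")["variant"].transform("count")
meta = (
    tests[counts >= 3]
    .groupby("feature")
    .agg(tests_run=("variant", "count"),
         avg_lift_pct=("lift_pct", "mean"),
         wins=("win", "sum"))
)
print(meta)
```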
The wiki is now also feeding AI tools like Glean and Claude, enabling self-serve analysis across the entire test history. Teams can ask questions about what Fanatics has already learned about a feature without reading every individual brief. The institutional knowledge is becoming searchable and composable.
Going beyond surface metrics: the ads experiment
Rigor at Fanatics goes deeper than statistical significance. Medha shared a story that illustrates exactly how.
The team tested removing ads from their product grid pages. Everyone wanted the ads gone. They cluttered the experience and made the site look less polished. The first test run returned a positive result at 95% statistical significance. Revenue was up. The team celebrated.
But at Fanatics, a top-line win is not the end of the analysis. It is the beginning. The team traces every positive result through a chain of micro-metrics: Did users scroll more? Were more products viewed? Did grid-to-cart conversion increase? If the top-line metric moved, something in the user behavior should have moved too.
In this case, the causal chain did not hold up. The micro-metrics were not moving in ways that explained the revenue lift. So they did something most teams would not do: they turned the test off and reran it.
The second run was flat. A textbook false positive, the 1-in-20 that a 95% confidence level tells you to expect. They had tested this same change six to eight times over the years, and the result was always the same: flat.
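The arithmetic behind that expectation is worth spelling out: at a 95% confidence level, a truly flat change still has a 5% chance of reading as a winner on any single run, and those odds compound across reruns.

```python
# Probability that a truly flat change reads as a "significant" win
# at least once across repeated runs at alpha = 0.05
alpha = 0.05
for runs in (1, 4, 7):
    print(f"{runs} run(s): {1 - (1 - alpha) ** runs:.0%}")
# 1 run(s): 5%   4 run(s): 19%   7 run(s): 30%
```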
The customer insight was counterintuitive. Even though the website looks cleaner without ads, customers simply gloss over them. Banner blindness is real. The ads were not degrading the measured experience.
The lesson is clear: a single metric is not enough to declare a win. If you cannot trace the causal chain from the user behavior change to the top-line result, you do not have a real finding. You have a number. Replication and guardrail metrics are how you tell the difference.
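One lightweight way to operationalize that discipline is to require the supporting micro-metrics to move before a top-line result counts as a win. A minimal sketch, with hypothetical metric names and thresholds rather than anything Fanatics has published:

```python
# Hypothetical per-variant results; deltas are relative lifts vs. control
result = {
    "revenue_per_visitor": 0.021,   # top-line: +2.1%, statistically significant
    "scroll_depth":        0.001,   # micro-metrics that should explain the lift
    "products_viewed":     -0.002,
    "grid_to_cart_rate":   0.000,
}

def causal_chain_holds(result: dict, micro_metrics: list[str],
                       min_lift: float = 0.005) -> bool:
    """A top-line win only counts if the behavior that should drive it
    actually moved (here: any micro-metric up by at least 0.5%)."""
    return any(result[m] >= min_lift for m in micro_metrics)

micro = ["scroll_depth", "products_viewed", "grid_to_cart_rate"]
if result["revenue_per_visitor"] > 0 and not causal_chain_holds(result, micro):
    print("Top line moved but behavior didn't: suspect a false positive; rerun.")
```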
Risk avoidance: the underrated half of experimentation
Most teams measure experimentation success by the wins they ship. Medha argues the bigger value comes from the losses they prevent.
"Your odds of winning at roulette or poker are probably higher than your odds at winning at experimentation," she told me. And the data backs her up. Most experiments are inconclusive or negative. Only about 10-20% produce a measurable win.
But think about what that means in practice. Suppose you run 10 experiments: two are winners, a few are flat, and one or two are losers whose combined damage would cancel half of what the winners gained. Simply catching those losers before they ship can double your net growth impact. The winners add value; avoiding the losers preserves it.
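A back-of-the-envelope version of that portfolio math, with illustrative lift numbers, makes the point:

```python
# Illustrative portfolio of 10 experiments: per-test revenue impact if shipped
winners = [+1.0, +1.0]   # two wins, +1% each
flat    = [0.0] * 6      # six inconclusive
losers  = [-0.5, -0.5]   # two losses, -0.5% each

ship_everything = sum(winners + flat + losers)   # +1.0% net
ship_only_wins  = sum(winners)                   # +2.0% net
print(ship_everything, ship_only_wins)  # catching the losers doubles net impact
```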
This is where Fanatics built their "do no harm" framework. For changes that are hard to measure with traditional A/B testing metrics, like branding plays, customer sentiment improvements, or post-purchase experience changes, they use non-inferiority guardrails. As long as primary KPIs are not hurt, teams can ship changes supported by user research rather than conversion lifts.
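A non-inferiority check flips the usual question: instead of asking whether the variant won, you ask whether you can rule out that it hurt by more than an acceptable margin. Here is a minimal sketch using a normal approximation on conversion rates; the margin, traffic, and numbers are made up, and this is a generic formulation rather than Fanatics' exact method.

```python
from math import sqrt
from scipy.stats import norm

def non_inferior(conv_c, n_c, conv_t, n_t, margin=0.005, alpha=0.05):
    """Ship if we can rule out the treatment hurting conversion
    by more than `margin` (absolute), at one-sided 1 - alpha."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    lower = (p_t - p_c) - norm.ppf(1 - alpha) * se  # one-sided lower bound
    return lower > -margin

# Post-purchase redesign: conversion looks flat, guardrail says safe to ship
print(non_inferior(conv_c=5200, n_c=100_000, conv_t=5180, n_t=100_000))  # True
```

If the one-sided lower bound on the difference clears the margin, the change ships on the strength of the user research, with the guardrail confirming it did no measurable harm.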
The danger is assuming a change will "do no harm" without testing it. Humans are terrible at guessing what will have an impact and what will not. A change that seems obviously harmless can quietly erode conversion, and you will never know because overall business growth masks the damage. These are the silent killers: bad changes hidden by a growing top line. You never know how much higher your growth could be if you were catching every loser.
Medha says quantifying avoided risk is "the next frontier" for her team. It is an under-quantified, under-appreciated dimension of experimentation value, and one that more teams should be tracking.
What other teams can learn from Fanatics
Fanatics did not build a world-class experimentation program overnight. It took a decade. But the principles behind their success are applicable at any scale.
Start with leadership buy-in rooted in humility, not just support. Build infrastructure that turns test results into a self-sustaining backlog. Invest in meta-analysis so individual tests compound into institutional knowledge. Go beyond surface metrics and trace the causal chain before declaring wins. And measure the value of risk avoidance, not just the value of wins shipped.
Medha's advice for teams just getting started: find the teams already inclined toward data, build wins with them, and keep the barrier low. Do not gate-keep rigor so tightly that nobody adopts. You can always raise the bar once the culture is in place.
Listen to my conversation with Medha on The Experimentation Edge podcast. And if you are building your own experimentation program, GrowthBook gives you the platform to run, analyze, and learn from every experiment at scale.