Understanding false causality and examples

Your dashboard shows a metric moving in the right direction right after a feature launch.

The story writes itself — and that's exactly the problem. The causal narrative feels earned because the data is real, but data showing that two things happened together says nothing about whether one caused the other. Teams that skip that distinction don't just get wrong answers — they ship the wrong features, fund the wrong campaigns, and optimize for metrics that have no real connection to the outcomes they care about.

This article is for engineers, product managers, and data analysts who work with experiment results and analytics data and want to reason about it more carefully. Whether you're new to statistical thinking or just looking to sharpen how you evaluate causal claims, here's what you'll learn:

  • What false causality is and why human cognition makes it the default reasoning error, not the exception
  • The specific subtypes — post hoc, cum hoc, the Texas Sharpshooter fallacy, regression to the mean, and more — and how each one produces a different flavor of wrong conclusion
  • Real examples, from ice cream and drowning rates to product analytics and the UC Berkeley admissions case, that show how the same error scales from obvious to invisible
  • How confounding variables create spurious correlations and why Simpson's Paradox can make aggregate data point in the exact wrong direction
  • Practical safeguards — A/B testing, pre-registration, multiple testing corrections, and anti-peeking discipline — that help teams build systems that catch these errors before they become decisions

The article moves from concept to taxonomy to examples to methodology. By the end, you'll have a working vocabulary for spotting false causal reasoning in the wild and a concrete set of practices for avoiding it in your own work.

False causality is an informal fallacy with expensive consequences

Every data team has been there: a metric moves, someone finds a correlated variable, and within minutes a causal story has taken shape. The feature launch caused retention to spike. The email campaign caused the revenue jump. The new onboarding flow caused the drop in churn.

These conclusions feel earned — they're backed by data, after all. But the data shows association, not causation, and conflating the two is one of the most consequential reasoning errors in analytical work. That error has a name: false causality.

This article is designed to be genuinely useful to practitioners who work with data every day. Here's what we'll cover:

  • What false causality is and why it's formally classified as a logical fallacy
  • The main subtypes: post hoc, cum hoc, Texas Sharpshooter, Simpson's Paradox, and more
  • Real-world examples from business and product analytics
  • How confounding variables produce spurious correlations
  • Why controlled experiments work — and how they can still fail
  • The organizational habits that make causal discipline stick

Non causa pro causa: the formal definition

False causality — also called the false cause fallacy — is formally classified in logic as non causa pro causa, Latin for "not the cause for the cause." It belongs to the category of informal fallacies, specifically fallacies of presumption: arguments that presume something to be true that has not actually been established.

The logical pattern is straightforward: A is regularly associated with B; therefore, A causes B. The inference fails because correlation establishes association — it tells you that two variables move together — but it says nothing about mechanism, direction, or exclusivity.

Something else entirely might be driving both A and B. A might actually be caused by B, not the other way around. Or the relationship might be coincidental, a statistical artifact with no meaningful structure at all.

False causality is also an umbrella term. Post hoc ergo propter hoc, cum hoc ergo propter hoc, the third-cause fallacy, and several other named variants are all subtypes of the same underlying error. What they share is the unwarranted leap from "these things are associated" to "one of them explains the other."

Why the human brain defaults to this error

The reason false causality is so pervasive isn't carelessness or low analytical skill. It's cognition working exactly as designed.

Humans are pattern-seeking by nature. When we observe two events occurring together — especially in sequence — we instinctively reach for a causal explanation. As one framing puts it: "We naturally look for explanations when we notice patterns. This tendency can lead us straight into faulty causal reasoning."

The same cognitive shortcut that helped early humans survive (if predators appeared near the river, avoid the river) becomes a systematic liability when applied to dashboards and experiment readouts.

This matters because it means false causality isn't an outlier mistake made by unsophisticated analysts. It is the default output of human cognition applied to correlated data without rigorous controls. Recognizing the error requires deliberate effort precisely because the incorrect inference feels natural and even obvious.

Why it costs teams real money and strategic direction

In product, marketing, and data work, false causality doesn't just produce wrong answers in a vacuum — it produces wrong decisions that compound over time. Teams ship features that didn't actually drive the outcomes attributed to them. Marketing budgets get reallocated toward campaigns that happened to coincide with seasonal trends. Optimization efforts get directed at proxy metrics that have no genuine causal link to the goals they're supposed to represent.

That last failure mode is closely related to Goodhart's Law. When a proxy metric is not strongly causally linked to the target metric, pushing hard on the proxy may have no effect on the actual goal, or may actively break the correlation that made the proxy seem useful in the first place.

The causal assumption embedded in metric selection turns out to be false, and the entire optimization strategy built on top of it collapses.

False causality also surfaces in subtler ways. Simpson's Paradox — where an apparent trend in aggregate data reverses entirely once a confounding variable is accounted for — is a documented, real-world example of how false causal conclusions can emerge from legitimate data analyzed without sufficient rigor.

The UC Berkeley 1973 admissions case is the canonical illustration, and it's worth noting that the data wasn't fabricated or cherry-picked; the false causal conclusion arose from a failure to account for a hidden variable.

Understanding false causality starts here, with the definition: an informal logical fallacy in which association is mistaken for causation. Everything else — the subtypes, the examples, the experimental safeguards — is built on this foundation.

The main types of false causality fallacies: post hoc, cum hoc, and more

False causality isn't a single mistake — it's a family of related errors, each with its own mechanism for producing a faulty causal conclusion. What they share is "the illogical assumption that a specific factor caused a specific effect."

But the way each subtype arrives at that assumption differs, and recognizing those differences is what allows you to catch the error in a dashboard, a business review, or a team meeting. Here's a working taxonomy.

Post hoc ergo propter hoc: mistaking sequence for cause

The Latin phrase translates roughly to "after this, therefore because of this." The mechanism is simple: A happens before B, so A must have caused B. The error is treating temporal sequence as causal evidence.

The classic illustration is superstitious reasoning: "Every time I wear this jersey, my team loses — it must be unlucky." The jersey preceded the loss, so it gets blamed. In business contexts, this looks like: "We launched a new homepage last quarter, and revenue went up — the redesign drove growth." Maybe it did. But the fact that one thing preceded another tells you nothing about whether a causal relationship exists.

Cum hoc ergo propter hoc: simultaneous correlation

This variant drops the temporal element entirely. Two things move together — they rise and fall in tandem — so one must be causing the other. The canonical example is ice cream sales and drowning incidents. Both spike in summer and drop in winter.

The actual driver is warm weather, which independently increases both swimming activity and ice cream consumption. Neither causes the other.

The distinction from post hoc matters: in cum hoc reasoning, there's no claim that one event preceded the other. The error is purely about co-occurrence being mistaken for causation.

The third-cause fallacy: the hidden variable

Closely related to cum hoc, the third-cause fallacy occurs when two correlated variables are treated as causally linked while an unmeasured third variable — the actual driver — is ignored. Warm weather is the third cause in the ice cream example. This error becomes particularly dangerous at scale, because the correlation can be statistically robust and the hidden variable genuinely difficult to identify without deliberate investigation.
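The hidden-variable mechanism can be reproduced in a few lines. The sketch below uses made-up numbers (the temperature range, slopes, and noise levels are all assumptions chosen for illustration) in which temperature drives both series and neither series affects the other. It compares the raw correlation with the correlation that remains after the part explained by temperature is removed from each series.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)
# Hypothetical daily data: temperature drives both series.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [20 + 3.0 * t + rng.gauss(0, 15) for t in temperature]
drownings = [1 + 0.2 * t + rng.gauss(0, 2) for t in temperature]

raw_corr = pearson(ice_cream, drownings)
# Strip out what temperature explains, then correlate what is left.
partial_corr = pearson(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))

print(f"raw correlation:             {raw_corr:.2f}")
print(f"controlling for temperature: {partial_corr:.2f}")
```

In this toy setup the raw correlation is substantial while the adjusted one sits near zero. The adjustment only works here because the third variable was measured, which is rarely guaranteed in practice.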

The Texas Sharpshooter fallacy: retrofitting patterns to data

The name comes from an image of a Texan firing at a barn wall and then painting a target around the bullet holes. In data analysis, the equivalent is examining results without a pre-set hypothesis, finding a cluster that looks meaningful, and treating it as a discovery. GrowthBook's experimentation documentation defines it precisely as "cherry-picking data clusters to suit a particular argument, hypothesis, or bias."

This fallacy is directly tied to the multiple testing problem. If you test 20 metrics at a 5% significance threshold, you have roughly a 64% probability of finding at least one false positive by chance alone — even if nothing real is happening.

The Texas Sharpshooter fallacy is what happens when analysts don't account for that and report the significant result as if it were a genuine finding. GrowthBook's docs flag this as a specific risk when teams "analyze the data in multiple ways or look at various subgroups without adjusting for multiple comparisons."
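The 64% figure follows from a one-line calculation: if each test has a 5% chance of a false positive and the tests are independent, the chance that none of 20 fires is 0.95 raised to the 20th power, and the chance that at least one fires is one minus that. A minimal sketch, with the function name chosen here for illustration:

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when no real effect exists in any of them."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 5, 20, 50):
    rate = familywise_error_rate(num_tests)
    print(f"{num_tests:>2} tests: {rate:.1%}")
```

The independence assumption matters: correlated metrics change the exact number, though the risk still grows with every additional metric examined.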

Wrong direction: reversing cause and effect

Sometimes the causal relationship is real, but the direction is inverted. A study might find that hospitals are associated with higher mortality rates and conclude that hospitals cause death — when in reality, people go to hospitals because they are already sick or injured. The association is genuine; the interpretation is backwards.

In product analytics, this shows up when teams observe that heavy users engage more with a new feature and conclude the feature is driving engagement, when the actual pattern is that highly engaged users are simply more likely to try new features.

The regression fallacy: mistaking natural variation for intervention

Extreme values tend to move back toward average over time — this is regression to the mean, and it happens regardless of any intervention. The regression fallacy occurs when that natural movement is credited to something that happened in between.

A sales team has its worst month on record, leadership introduces a new process, and the next month performance rebounds. The process gets the credit. But some portion of that rebound would have happened anyway, simply because extreme low performance rarely persists. Without a control group, there's no way to separate the intervention's effect from the natural correction.
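A small simulation makes the rebound visible without any intervention at all. The setup is hypothetical: 1,000 teams whose monthly result is a stable skill level plus random luck, with the specific means and spreads chosen only for illustration. Nothing changes between the two months, yet the teams selected for having the worst first month improve on average.

```python
import random

rng = random.Random(42)

def monthly_sales(skill):
    """One month's result: stable underlying skill plus random luck."""
    return skill + rng.gauss(0, 10)

# Hypothetical setup: 1,000 teams, same noise level every month.
skills = [rng.gauss(100, 5) for _ in range(1000)]
month_1 = [monthly_sales(s) for s in skills]
month_2 = [monthly_sales(s) for s in skills]  # no intervention of any kind

# Select the 50 teams with the worst first month, as leadership might.
worst = sorted(range(len(skills)), key=lambda i: month_1[i])[:50]
avg_before = sum(month_1[i] for i in worst) / len(worst)
avg_after = sum(month_2[i] for i in worst) / len(worst)

print(f"worst 50 teams, month 1: {avg_before:.1f}")
print(f"same teams, month 2:     {avg_after:.1f}")
```

Any process introduced between the two months would appear to have worked, which is why a control group of equally bad performers that did not get the new process is the only fair comparison.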

Each of these subtypes produces the same surface-level error — a causal claim that isn't warranted — but through meaningfully different paths. Knowing which one you're looking at shapes how you'd go about disproving it.

False causality examples: from ice cream and drowning rates to flawed business decisions

The easiest way to understand false causality is to start somewhere almost embarrassingly obvious — and then notice how the same mistake, dressed in more sophisticated clothing, shows up in your team's quarterly review.

The classic examples that make the pattern click

Ice cream sales and drowning incidents both spike in summer. If you plotted them on a chart, you'd see a near-perfect correlation. A naive reading of that chart might suggest that ice cream consumption somehow increases drowning risk — or, absurdly, that drowning incidents drive ice cream sales.

Neither is true. Both are driven by a third variable: hot weather draws people to pools and beaches while simultaneously driving ice cream consumption. The correlation is real; the causal relationship is entirely fabricated.

The same structure appears in simpler superstitions. "Every time I wear my lucky socks, we win the game" is a post hoc ergo propter hoc error — the socks came before the win, so the socks must have caused it. Or consider the political version: "After the new mayor took office, crime went up."

The temporal sequence feels like an explanation, but it isn't one. Crime trends are shaped by economic conditions, policing policy, demographic shifts, and dozens of other variables that have nothing to do with who won the last election.

These examples feel obvious in isolation. The problem is that the underlying reasoning pattern — "these two things happened together, so one must have caused the other" — is exactly the same pattern your team uses when it looks at a dashboard.

Where false causality causes real business harm

The business version of the ice cream problem is subtler but structurally identical. Imagine a marketing team that launches a campaign in early Q4, and sales spike two weeks later. The campaign gets the credit.

But Q4 also brings seasonal buying patterns, a competitor's product recall, and a PR moment from an unrelated news story. Without isolating those variables, attributing the spike to the campaign is the same logical error as blaming the mayor for the crime rate.

A more insidious version involves proxy metrics. Product teams routinely use metrics like "items added to cart" as a stand-in for purchases, assuming a causal link between the two. But as GrowthBook's documentation on experimentation problems notes, "if the proxy metric is not strongly causally linked to the target metric, pressing hard on the proxy may have no effect on the goal metric, or might actually cause the correlation to break."

Optimizing aggressively for a proxy that isn't causally connected to the outcome you care about is false causality operationalized into your product roadmap.

The cost of acting on these false causal assumptions is real. Merritt Aho, Digital Analytics Lead at Breeze Airways, put it plainly: "People only see the wins, but there's actually greater value in avoiding losses. We've stopped changes that could have cost millions." That's the business case for taking false causality seriously — not as an academic concern, but as a source of expensive, avoidable decisions.

When the numbers lie in product analytics

The most dangerous false causality in analytics contexts is the kind that hides inside aggregate data. The UC Berkeley 1973 admissions case is the canonical example. Looking at overall admission rates, men appeared to be admitted at a significantly higher rate (44%) than women (35%) — a pattern that seemed to implicate gender bias.

But when researchers broke the data down by department, women actually had higher admission rates than men in many departments. In the widely reproduced six-department breakdown, where departments are labeled A through F rather than named, women applying to Department A were admitted at an 82% rate compared to 62% for men.

The confounding variable was department choice. Women disproportionately applied to more competitive departments with lower overall acceptance rates. Once that variable was accounted for, the apparent pattern of discrimination reversed entirely. The aggregate number wasn't lying exactly — it was just answering a different question than the one people thought they were asking.
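The arithmetic behind the reversal is easier to see with small numbers. The counts below are invented purely to show the structure and are not the Berkeley figures: women have the higher admission rate in each department, but because most women apply to the more competitive one, their aggregate rate ends up lower.

```python
# Hypothetical counts, chosen only to show the structure of Simpson's Paradox.
# They are NOT the Berkeley figures.
applications = {
    # department: {group: (applicants, admitted)}
    "less competitive": {"men": (800, 480), "women": (100, 70)},
    "more competitive": {"men": (200, 40), "women": (900, 225)},
}

def rate(applicants, admitted):
    """Admission rate as a fraction."""
    return admitted / applicants

# Per-department admission rates.
for dept, groups in applications.items():
    for group, (applicants, admitted) in groups.items():
        print(f"{dept:>16} {group:>5}: {rate(applicants, admitted):.1%}")

# Aggregate rates, ignoring department.
aggregate = {}
for group in ("men", "women"):
    total_applicants = sum(g[group][0] for g in applications.values())
    total_admitted = sum(g[group][1] for g in applications.values())
    aggregate[group] = rate(total_applicants, total_admitted)
    print(f"{'all departments':>16} {group:>5}: {aggregate[group]:.1%}")
```

The same check applies to product metrics: compute the rate per segment and in aggregate, and investigate whenever the two disagree about direction.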

Product analytics teams run into this constantly. An aggregate metric improves, but when you segment by acquisition channel or device type, the improvement disappears — or exists only in one cohort that happened to grow. Twyman's Law offers a useful heuristic here: "Any data or figure that looks interesting or different is usually wrong."

When a result looks surprisingly clean, the more likely explanation is a data or implementation problem, not a genuine causal effect.

Industry-wide A/B test success rates are commonly cited at around a third, meaning teams' intuitions about what will move a metric are wrong roughly two-thirds of the time. That's not a reason for paralysis; it's a reason to be skeptical of causal stories that haven't been tested.

Correlation vs. causation: how confounding variables drive false causality

Most false causality errors don't happen because analysts are careless. They happen because a hidden third variable is quietly driving both sides of an observed relationship, making two unrelated things look like cause and effect. Understanding this mechanism — the confounding variable — is essential for anyone who makes decisions based on data.

What is a confounding variable?

A confounding variable is a third factor that independently influences both the apparent cause and the apparent effect, producing a spurious association between them. In the purest case, the two observed variables aren't causally linked at all; they're both downstream of something else.

A clean example from product analytics: users who have activated more in-app notifications tend to spend more time in the app. The tempting interpretation is that notifications drive engagement. But the actual driver may be that power users — people who already love the product — are both more likely to turn on notifications and more likely to spend hours in the app.

User engagement level is the confounder. If it explains both behaviors on its own, the correlation between notifications and time-in-app is spurious.

What makes confounders particularly dangerous is that they're often unmeasured or unrecognized. If you never think to look for the third variable, the spurious correlation looks like solid evidence.

Why correlation is an unreliable proxy for causation

Correlation measures the degree to which two variables move together. It says nothing about whether one causes the other, or whether both are being driven by something else entirely. This is the formal basis of the cum hoc ergo propter hoc fallacy — mistaking simultaneous correlation for causation.

In the presence of a confounder, two completely unrelated variables can show strong, consistent correlation. The pattern looks compelling. It replicates across time periods. It shows up in your dashboards. And it's still entirely misleading.

For product and marketing teams, the practical cost is real: optimizing based on correlated metrics without testing for causation means investing in features, campaigns, or interventions that have no actual effect on outcomes. The metric moves, but not because of anything you did.

Simpson's Paradox — when the aggregate pattern lies

The most dramatic illustration of confounding-driven false causality is Simpson's Paradox: a statistical phenomenon where a trend appears in aggregate data but disappears or reverses entirely when the data is broken down by subgroup.

The UC Berkeley 1973 admissions case — covered in detail in the previous section — is the canonical illustration: the aggregate data pointed in the exact wrong direction once department choice was accounted for as a confounding variable.

This is confounding at its most extreme: the observed pattern didn't just understate the true relationship, it pointed in the wrong direction entirely. Any decision made on the basis of the aggregate data would have been not just imprecise but actively wrong.

Controlled experiments as the methodological response

Controlled experiments, particularly randomized A/B tests, are the gold standard for establishing causation because they neutralize confounders. Randomly assigning subjects to treatment and control conditions distributes confounding variables roughly equally across groups.

Whatever third factors exist, they're present in both groups, so they can't explain away a difference in outcomes.

Observational analysis can partially address confounding through stratification and statistical controls, but these approaches require you to identify and measure the confounders in advance — which is exactly what you often can't do. Randomization handles confounders you haven't thought of yet.
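The difference between observing and randomizing can be sketched with the notifications example from earlier. Everything in the simulation is an assumption made for illustration: a hidden engagement score drives time in the app, notifications have no effect whatsoever, and the only thing that varies is how users end up with notifications turned on.

```python
import random

rng = random.Random(7)

def simulate_user(randomized):
    """Return (has_notifications, minutes_in_app) for one hypothetical user."""
    engagement = rng.random()  # hidden confounder in [0, 1)
    if randomized:
        # Coin flip that ignores engagement entirely.
        has_notifications = rng.random() < 0.5
    else:
        # Self-selection: power users opt in more often.
        has_notifications = rng.random() < engagement
    # By construction, notifications have NO effect on time in app.
    minutes = 10 + 60 * engagement + rng.gauss(0, 5)
    return has_notifications, minutes

def difference_in_means(randomized, n=20000):
    """Average minutes with notifications minus average minutes without."""
    on, off = [], []
    for _ in range(n):
        has_notifications, minutes = simulate_user(randomized)
        (on if has_notifications else off).append(minutes)
    return sum(on) / len(on) - sum(off) / len(off)

observational_gap = difference_in_means(randomized=False)
randomized_gap = difference_in_means(randomized=True)
print(f"self-selected users: {observational_gap:+.1f} minutes")
print(f"random assignment:   {randomized_gap:+.1f} minutes")
```

With self-selection the gap is large even though the true effect is zero; with random assignment it shrinks to noise, and the engagement score never had to be measured.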

For teams running experiments, this means paying close attention to whether experimental groups are actually comparable in demographics and behavior before drawing conclusions. Platforms built for rigorous experimentation, GrowthBook among them, incorporate Sample Ratio Mismatch detection to flag group sizes that drift from the intended split, and variance reduction techniques like CUPED to cut noise from pre-existing differences between users. Both target the kinds of imbalance that can reintroduce confounding even within a structured experiment. The mechanics of randomization are necessary but not always sufficient; the analysis has to hold up too.
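A Sample Ratio Mismatch check is, at its core, a chi-square goodness-of-fit test on the group sizes. The sketch below is a generic standard-library version for a two-arm test, not a description of how any particular platform implements it; the function name and the example counts are hypothetical.

```python
import math

def srm_p_value(control_count, treatment_count, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-arm split."""
    total = control_count + treatment_count
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi_square = (
        (control_count - expected_control) ** 2 / expected_control
        + (treatment_count - expected_treatment) ** 2 / expected_treatment
    )
    # With 1 degree of freedom, the chi-square tail equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))

# Hypothetical counts for an experiment configured as a 50/50 split.
p = srm_p_value(control_count=50_500, treatment_count=49_500)
print(f"SRM p-value: {p:.4f}")
```

A very small p-value means a split this lopsided would be unlikely under correct assignment, which usually points to a bug in bucketing, tracking, or filtering rather than to anything about the treatment. The cutoff for "very small" is a team decision; many teams pick a strict one because the check runs on every experiment.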

Controlled experiments neutralize confounders — but only when the analysis holds up

Understanding false causality is one thing. Building systems that reliably avoid it is another. The honest answer to "how do I stop drawing false causal conclusions?" is: run controlled experiments. But that answer is incomplete without a serious accounting of the ways experiments themselves can go wrong.

A/B testing as the gold standard — and why it works

The reason controlled experiments are so effective at establishing causation comes down to a single mechanism: random assignment. When you randomly split users into a control group and a treatment group, you distribute confounding variables roughly equally across both groups.

The hidden factors that corrupt observational data — seasonality, user demographics, concurrent product changes — don't disappear, but they stop being a problem because they affect both groups equally. What's left is the isolated effect of the variable you're actually testing.

This is precisely what observational analysis cannot do. You can control for confounders you know about, but you can't control for the ones you haven't thought of. Random assignment handles both categories at once.

That said, even well-designed experiments fail when they're poorly planned. GrowthBook's pre-experiment guide frames the problem directly: poorly planned experiments waste time and lead to bad decisions. Defining your hypothesis, primary metric, and success criteria before you start isn't bureaucratic overhead — it's the thing that makes your results interpretable.

Statistical pitfalls inside experiments

Here's the uncomfortable truth: you can run a properly randomized A/B test and still draw a false causal conclusion. The mechanism is usually one of three things.

P-hacking happens when analysts, often unconsciously, explore different metrics, time periods, or user segments until they find a statistically significant result. The problem isn't malice; it's that a significance threshold of p < 0.05 means that, even when there is no real effect, a test will come up significant about 5% of the time by chance alone. If you're testing enough slices of your data, false positives become nearly inevitable.

The multiple testing problem makes this concrete. If you test the same hypothesis across 20 different metrics at a 5% significance level, the probability of finding at least one statistically significant result purely by chance is around 64% — assuming those metrics are independent, which they often aren't in digital products.

GrowthBook's documentation on experimentation problems names the standard remedies: the Bonferroni correction and False Discovery Rate corrections such as the Benjamini-Hochberg procedure, each offering a different trade-off between sensitivity and specificity. These corrections work by raising the bar for what counts as statistically significant when you're testing many things at once: the more tests you run, the stricter the threshold needs to be. The practical takeaway is simpler: if you're tracking a large number of metrics, treat any single significant result as a signal to run a follow-up experiment, not as a conclusion.
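Both corrections fit in a few lines, which helps show how differently they behave. The sketch below is a plain-Python illustration with hypothetical p-values for ten metrics; in real analyses a maintained statistics library is the safer choice.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected

# Hypothetical p-values for ten metrics from one experiment.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print("uncorrected:", sum(p < 0.05 for p in p_values))
print("Bonferroni: ", sum(bonferroni(p_values)))
print("BH:         ", sum(benjamini_hochberg(p_values)))
```

Bonferroni is the stricter of the two because it guards against even one false positive across the whole set, while Benjamini-Hochberg tolerates a controlled share of false discoveries in exchange for more power.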

Peeking — stopping an experiment early because the results look good — is a subtler failure mode. Every time you look at interim results and consider stopping, you inflate your false positive rate. The fix is straightforward in principle: set a predetermined sample size or duration before the experiment starts and commit to it. For teams that need more flexibility, sequential testing methods are specifically designed to allow early stopping without inflating error rates.
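The inflation from peeking can be measured directly by simulating A/A tests, where both arms are identical and every "significant" result is a false positive by definition. The sketch below makes simplifying assumptions that are not from the article: outcomes are normally distributed with known variance, each experiment is checked at ten evenly spaced points, and the analyst would stop at the first look that crosses the usual 5% threshold.

```python
import math
import random

rng = random.Random(123)
Z_CRITICAL = 1.96  # two-sided 5% threshold for a z-test

def run_aa_test(num_peeks=10, users_per_peek=100):
    """Simulate one A/A test; return (significant at any peek, significant at end)."""
    cumulative_diff = 0.0  # running sum of (treatment - control) differences
    users_so_far = 0       # users per arm observed so far
    significant_at_any_peek = False
    z = 0.0
    for _ in range(num_peeks):
        # Each user's outcome is N(0, 1) in both arms, so the sum of n paired
        # differences is N(0, 2n); draw that batch sum directly.
        cumulative_diff += rng.gauss(0, math.sqrt(2 * users_per_peek))
        users_so_far += users_per_peek
        z = cumulative_diff / math.sqrt(2 * users_so_far)
        if abs(z) > Z_CRITICAL:
            significant_at_any_peek = True
    return significant_at_any_peek, abs(z) > Z_CRITICAL

num_simulations = 20000
results = [run_aa_test() for _ in range(num_simulations)]
peeking_fpr = sum(any_peek for any_peek, _ in results) / num_simulations
fixed_fpr = sum(at_end for _, at_end in results) / num_simulations

print(f"false positive rate, stop at first significant peek: {peeking_fpr:.3f}")
print(f"false positive rate, single look at the end:         {fixed_fpr:.3f}")
```

The single look at the end stays close to the nominal 5%, while stopping at the first significant peek produces a noticeably higher rate. Sequential testing methods restore the nominal rate by using stricter thresholds at each look; they are not implemented in this sketch.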

Pre-registration and the Texas Sharpshooter problem

The Texas Sharpshooter fallacy — shooting at a barn, then painting a target around the bullet holes — has a direct experimental equivalent: analyzing your data first, then constructing a hypothesis to fit what you found. It's easy to do accidentally. You run an experiment, dig into the results looking for something interesting, find an unexpected segment that shows a strong effect, and report that as your finding.

But that's not a hypothesis you tested; it's a pattern you noticed after the fact.

The practical defense is pre-registration: write down your hypothesis, your primary metric, and your definition of success before you look at any results. This isn't just a statistical formality; it's what separates a finding from a story you told yourself about your data. GrowthBook's documentation notes that unusually large or surprising results should trigger skepticism rather than celebration until you've ruled out implementation errors. If a result looks too good to be true, it probably is.

Building habits that outlast any single experiment

Avoiding false causality at scale isn't purely a statistical problem; it's an organizational one. Goodhart's Law describes the risk: when a measure becomes a target, it ceases to be a good measure. Teams that optimize for proxy metrics (items added to cart, say, rather than completed purchases) can produce results that look causal but aren't, because the proxy may not be strongly linked to the outcome that actually matters.

The teams that consistently avoid false causality share a few habits: they define hypotheses before running experiments, they apply correction methods when testing multiple metrics, they treat surprising results as a reason to investigate rather than celebrate, and they run follow-up experiments to validate findings before acting on them. Methodology gets you most of the way there. Discipline in execution gets you the rest.

The causal story feels true because pattern recognition is what brains do

The through-line of this article is simple: the causal story your brain constructs from correlated data feels true because pattern recognition is what brains do. That's not a flaw to fix — it's a feature to compensate for. The compensation is methodology: controlled experiments, pre-registered hypotheses, multiple testing corrections, and the discipline to treat surprising results as a reason to investigate rather than ship.

A quick-reference summary: the most common false causality patterns to watch for

Post hoc errors show up whenever a metric moves after a launch and the launch gets the credit. Cum hoc reasoning shows up in dashboards where two lines move together and no one asks what's driving both. Any analysis that started with the data rather than a hypothesis is a candidate for the Texas Sharpshooter fallacy.

And Simpson's Paradox is waiting in every aggregate metric that hasn't been segmented. The UC Berkeley case is a reminder that the aggregate number can point in the exact wrong direction while most subgroups tell the opposite story.

Practical questions to ask before drawing any causal conclusion

Before you attribute a metric movement to a cause, ask three things: Was there a pre-specified hypothesis before the data was collected? Is there a plausible third variable that could explain both sides of the correlation? And does the pattern hold when you break it down by meaningful subgroups? These aren't bureaucratic checkboxes — they're the questions that separate a finding from a story you told yourself about your data.

What to do next

Look at the last causal claim your team acted on — a feature that "drove" a metric, a campaign that "caused" a lift. Ask whether it was tested with a control group, whether the hypothesis was written before the data was collected, and whether the result held across subgroups. If the answer to any of those is no, you've found your starting point. Run the follow-up experiment before the next decision gets made on the same assumption.

The teams that get this right are more disciplined organizationally than statistically

The teams that get this right aren't necessarily more sophisticated statistically — they're more disciplined organizationally. They write down what they expect to see before they run an experiment. They apply corrections when they're tracking multiple metrics. They treat Twyman's Law as a real heuristic: if a result looks surprisingly clean, the first assumption is that something is wrong, not that something is working.

Statistical guardrails built into experimentation platforms — sample ratio mismatch detection and variance reduction techniques — catch the kinds of group imbalances and noise that quietly corrupt even well-randomized experiments. But the tooling only helps if the habits are already there.

This article was written to be genuinely useful to practitioners who are tired of making expensive decisions on the basis of correlations that felt like causes. If it gave you a sharper vocabulary for one meeting or one experiment review, it did its job.
