
Experimental probability examples: explained simply


A basketball player's free throw percentage, a weather forecast's 70% chance of rain, and an A/B test's conversion rate are all calculated the same way: count what happened, divide by how many chances there were.

That single operation — experimental probability — is one of the most useful tools in any data-driven field, and it's simpler than most people expect.

This article is for students, engineers, and product practitioners who want a clear, working understanding of experimental probability — not just the formula, but how to apply it and why it behaves the way it does. Here's what you'll learn:

  • How experimental probability works and how to calculate it using a straightforward formula
  • Step-by-step examples across four different scenarios, from coin flips to basketball free throws
  • How experimental probability differs from theoretical probability, and why the two values don't always match
  • Why more trials produce more reliable results — and what that means for real-world experimentation like A/B testing
  • Where experimental probability shows up in sports, manufacturing, weather forecasting, and product development

The article moves in that order: concept first, worked examples second, key comparisons third, and real-world applications last. Each section builds on the one before it, so by the end you'll have a reusable framework for calculating and interpreting experimental probability in any situation where you can count outcomes and total trials.

Experimental probability starts with observation, not assumption

Experimental probability is exactly what the name suggests: a probability value derived from running an actual experiment and recording what happens, rather than reasoning mathematically about what should happen. If you want to know how likely a coin is to land heads, you could think through the physics and conclude it's 50/50 — or you could flip the coin a hundred times and see what the data says.

The second approach is experimental probability. It is also called empirical probability, and the distinction matters: the value comes from observation, not assumption.

The definition

At its core, experimental probability is determined by conducting a random experiment repeatedly and tracking the results. Each repetition of the experiment is called a trial. The experiment is run many times precisely because a single trial tells you almost nothing reliable — it's the accumulation of trials that produces a meaningful estimate.

Whether you're flipping a coin, rolling a die, or measuring how often a manufacturing process produces a defective part, the logic is the same: repeat the experiment, record the outcomes, and calculate from what you actually observed.

The formula

The formula for experimental probability is straightforward:

P(E) = Number of times the event occurs ÷ Total number of trials

If you flip a coin 30 times and record 13 heads, the experimental probability of getting heads is 13 ÷ 30, or approximately 0.43. That's it. No assumptions about the coin being fair, no theoretical reasoning — just the observed data divided by the number of opportunities for that data to occur.
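Here is a minimal Python sketch of the formula, applied to the 13-heads-in-30-flips example. The function name `experimental_probability` is our own illustration, not part of any library. It also rejects impossible inputs, such as recording more occurrences than there were trials.

```python
def experimental_probability(occurrences: int, trials: int) -> float:
    """Return P(E) = occurrences / trials from observed counts."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    if not 0 <= occurrences <= trials:
        # An event cannot occur more often than there were chances for it to occur.
        raise ValueError("occurrences must be between 0 and trials")
    return occurrences / trials


# 13 heads recorded in 30 coin flips
p_heads = experimental_probability(13, 30)
print(f"P(heads) = {p_heads:.2f}")  # P(heads) = 0.43
```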

The formula uses P(E) where E represents the specific event you're tracking. Swap in any event — rolling a six, drawing a red card, a website visitor clicking a button — and the same division applies.

Key vocabulary: trial, outcome, event, and sample space

Four terms appear throughout any discussion of experimental probability, and it's worth defining them precisely before moving into examples.

A trial is a single run of the experiment — one coin flip, one die roll, one observation. An outcome is the result of that single trial: heads, a four, a click. The event is the specific result you're interested in tracking across all trials — not just any outcome, but the particular one you're measuring.

The sample space, by contrast, is the complete set of all possible outcomes the experiment could produce. For a standard die, the sample space is {1, 2, 3, 4, 5, 6}. For a coin flip, it's {heads, tails}.

Keeping these terms distinct prevents confusion when the numbers start flowing. You're always dividing the count of your specific event by the total number of trials — not by the size of the sample space.
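To make the distinction concrete, the sketch below uses a hypothetical record of eight die rolls (the data is invented for illustration). The event is "rolling a 4". The sample space only tells us which outcomes were possible; the event count is divided by the number of recorded trials.

```python
SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # every outcome a standard die can produce

# Hypothetical record of eight trials, one outcome per roll
rolls = [4, 2, 6, 4, 1, 3, 4, 5]
assert all(r in SAMPLE_SPACE for r in rolls)  # every recorded outcome must be possible

event_count = sum(1 for r in rolls if r == 4)  # event: "rolling a 4"
p_four = event_count / len(rolls)              # correct: divide by trials (8)
wrong = event_count / len(SAMPLE_SPACE)        # mistake: divide by sample-space size (6)

print(p_four)  # 0.375
print(wrong)   # 0.5, arithmetically clean but conceptually wrong
```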

Why probability always falls between 0 and 1

The 0-to-1 range isn't an arbitrary convention — it's a logical constraint built into the formula itself. You cannot record more occurrences of an event than you have trials, so the numerator can never exceed the denominator. That means the maximum possible value is 1, which represents an event that occurred in every single trial — a certainty.

On the other end, if an event never occurs across all trials, the numerator is 0 and the probability is 0 — an impossibility, at least within the scope of your experiment.

Every experimental probability you calculate will land somewhere in that range, with values closer to 1 indicating events that happen frequently and values closer to 0 indicating events that rarely or never occur.

One important caveat: where exactly in that range your result lands depends heavily on how many trials you ran. A small number of trials can produce a result that looks definitive but is actually quite unstable. Flip a coin five times and get four heads, and P(heads) = 0.8, a figure that badly overstates how often a fair coin lands heads. More trials produce estimates that stabilize toward a reliable value. That relationship between trial count and accuracy is worth understanding carefully, and it's covered in depth later in this article.

Experimental probability examples: step-by-step walkthroughs

Reading a formula is one thing. Watching it work on a real problem is another. The examples below cover four distinct scenarios — two classic classroom setups and two sports-based situations — and every one of them follows the same four-step method: identify the event, count how many times it occurred, divide by the total number of trials, and interpret the result in plain language. Work through each example in sequence and the pattern will become second nature.

The four-step method works the same regardless of context

Before jumping into specific scenarios, it helps to internalize the structure you'll use every time. Step one is identifying the event — the specific outcome you're tracking. Step two is counting how many times that event actually occurred across your trials.

Step three is dividing that count by the total number of trials, which gives you the experimental probability using the formula P(E) = number of times the event occurs / total number of trials. Step four is stating what that number means in plain language, because a decimal sitting on its own doesn't communicate much.

This method works regardless of whether you're analyzing a coin, a die, or an athlete's performance record. The arithmetic changes; the structure doesn't.
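The four steps can be sketched as one small Python function. The name `four_step`, the predicate-based way of defining the event, and the particular order of heads and tails below are illustrative assumptions; only the 6-heads-in-10-flips total comes from Example 1.

```python
from fractions import Fraction
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def four_step(outcomes: Iterable[T], is_event: Callable[[T], bool], label: str) -> Fraction:
    """Apply the four-step method to a list of recorded outcomes."""
    outcomes = list(outcomes)
    # Step 1: the event is defined by the `is_event` predicate.
    # Step 2: count how many trials produced the event.
    count = sum(1 for o in outcomes if is_event(o))
    # Step 3: divide by the total number of trials.
    p = Fraction(count, len(outcomes))
    # Step 4: state the result in plain language.
    print(f"Based on {len(outcomes)} trials, {label} occurred {count} times: "
          f"P = {float(p):.3f} (about {float(p):.0%}).")
    return p


# Example 1: ten coin flips with six heads (the order shown is hypothetical)
flips = ["H", "T", "H", "H", "T", "H", "T", "H", "T", "H"]
p = four_step(flips, lambda f: f == "H", "heads")
```

Returning a `Fraction` keeps the exact ratio (here 3/5) available, while the printed sentence covers the plain-language step.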

Example 1: coin flip experiment

Suppose you flip a coin 10 times and record the result of each flip. Heads comes up 6 times. The event is "flipping heads." It occurred 6 times. The total number of trials is 10. Dividing 6 by 10 gives you 0.6, or 60%.

Based on these observed trials, the experimental probability of flipping heads is 0.6. That's higher than the theoretical probability of 0.5, which is worth noting — the two values don't have to match, especially when the trial count is small.

Example 2: dice roll experiment

You roll a standard six-sided die 20 times and track how often you roll a 5. It comes up 3 times. The event is "rolling a 5." It occurred 3 times across 20 total trials. P(rolling a 5) = 3/20 = 0.15, or 15%.

In plain language: a 5 came up on 15% of the rolls in this experiment, so the experimental estimate of the chance of rolling a 5 is 15%. For comparison, the theoretical probability is 1/6, which is approximately 0.167. The experimental result is close but not identical, a normal outcome for a modest number of trials.

Example 3: figure skater landing a jump

Now consider a more applied scenario. A figure skater attempts a specific jump 12 times during practice and lands it successfully 7 times. The event is "landing the jump." It occurred 7 times. Total trials: 12. P(landing jump) = 7/12 ≈ 0.583, or about 58%.

Interpreted plainly: based on observed practice attempts, the skater successfully completes this jump roughly 58% of the time. A coach tracking this number over multiple sessions would have a data-grounded basis for assessing consistency and deciding how much to rely on that jump in competition.

Example 4: basketball free throws

A basketball player takes 10 free throw attempts and converts 9 of them. The event is "making the free throw." It occurred 9 times out of 10 total trials. P(making free throw) = 9/10 = 0.9, or 90%.

That's a strong result — but notice it isn't 1.0. The player missed once, which means certainty isn't established, only a high experimental probability based on the observed sample. If the same player attempted 100 free throws and made 90, the 90% figure would carry considerably more weight, because more trials produce more stable estimates.
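One way to put a number on "considerably more weight" is a confidence interval around each estimate. The sketch below uses the Wilson score interval, a standard binomial technique that this article does not itself use; it assumes every attempt is independent and the player's true success rate stays constant.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a success proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half_width, center + half_width


for made, attempts in [(9, 10), (90, 100)]:
    low, high = wilson_interval(made, attempts)
    print(f"{made}/{attempts}: estimate {made / attempts:.2f}, "
          f"plausible range {low:.2f} to {high:.2f}")
```

Both records give the same 0.90 estimate, but the plausible range for 9 of 10 spans roughly 0.60 to 0.98, while for 90 of 100 it narrows to roughly 0.83 to 0.94.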

Each of these experimental probability examples resolves to the same formula and the same interpretive step. The context varies; the method doesn't. Once you've worked through all four, you have a reusable template for calculating experimental probability in any situation where you can count outcomes and total trials.

Experimental probability vs. theoretical probability: key differences

The coin and dice examples above both returned results that differed from what a theoretical model would predict: 0.6 versus 0.5 for heads, and 0.15 versus about 0.167 for rolling a 5. (The skating and free-throw examples have no clean theoretical value to compare against.) If you've ever flipped a coin ten times and gotten seven heads, you've already encountered that gap between experimental and theoretical probability.

The two concepts are related but distinct, and understanding exactly how they differ — and why they sometimes disagree — is essential for interpreting any data-driven result, whether it comes from a classroom exercise or a product experiment.

Theoretical probability assumes clean conditions that rarely exist

Theoretical probability is calculated through reasoning alone, without running a single trial. It assumes that all outcomes are equally likely and applies the formula: P(event) = favorable outcomes / total possible outcomes. For a fair coin, the theoretical probability of landing heads is 1/2 = 0.5, because there are exactly two possible outcomes and one of them is heads. No flipping required — the answer comes entirely from the structure of the problem.
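A short sketch of the same reasoning in Python: enumerate the sample space, count the favorable outcomes, and divide, without running a single trial. The helper `theoretical_probability` is a hypothetical name, and it is only valid under the equally-likely assumption described above. The two-dice case is an extra illustration of why enumeration helps once the sample space grows.

```python
from fractions import Fraction
from itertools import product


def theoretical_probability(sample_space, is_favorable) -> Fraction:
    """Favorable outcomes / total possible outcomes, assuming all outcomes are equally likely."""
    outcomes = list(sample_space)
    favorable = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(favorable, len(outcomes))


coin = ["heads", "tails"]
die = range(1, 7)
two_dice = list(product(die, repeat=2))  # 36 equally likely ordered pairs

print(theoretical_probability(coin, lambda o: o == "heads"))     # 1/2
print(theoretical_probability(die, lambda o: o == 5))            # 1/6
print(theoretical_probability(two_dice, lambda o: sum(o) == 7))  # 1/6
```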

Theoretical probability works well when the underlying conditions are clean and known: fair dice, standard decks of cards, idealized scenarios where every outcome has an equal chance. It breaks down the moment real-world complexity enters the picture — when a coin might be slightly weighted, when a manufacturing process has variable defect rates, or when user behavior doesn't follow any clean mathematical model.

Experimental probability works where theory breaks down

Experimental probability is derived from observation. You run trials, count outcomes, and divide: P(event) = number of times the event occurs / total number of trials. If you flip a coin 10 times and get 7 heads, your experimental probability of heads is 7/10 = 0.70 — not 0.50. That's not a mistake. It's what actually happened.

This is the version of probability that applies when you can't derive an answer from first principles. Real systems — user behavior, manufacturing tolerances, athletic performance — are too complex for theoretical models to capture precisely. Experimental probability lets you work with observed reality instead of idealized assumptions.

Why the two values diverge — and when they converge

The gap between experimental and theoretical probability is not an error. It is expected, and it is predictable. With a small number of trials, random variation has an outsized influence on the result. Seven heads in ten flips feels surprising, but it's well within the normal range of outcomes for a fair coin.

The experimental probability of 0.70 diverges from the theoretical 0.50 simply because ten trials aren't enough to smooth out the noise.
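How unusual is seven heads in ten flips? Assuming a fair coin and independent flips, the binomial distribution gives the exact answer. The helper name `prob_at_least` is our own.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Seven or more heads in ten flips of a fair coin
print(f"{prob_at_least(7, 10):.3f}")  # 0.172

# Seven or more of either face (7+ heads or 7+ tails), using symmetry at p = 0.5
print(f"{2 * prob_at_least(7, 10):.3f}")  # 0.344
```

A fair coin produces seven or more heads in about one ten-flip run out of six, and a 7-3 split in one direction or the other about a third of the time.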

This is the core insight behind the Law of Large Numbers — a relationship explored in depth in the next section — and it explains why the gap between experimental and theoretical probability narrows as your dataset grows.

The real-world implications are concrete. The industry-wide success rate for A/B tests is often cited at roughly 33%, meaning about one in three experiments actually improves the target metric. That figure isn't derived from theory; it's an experimental probability calculated from thousands of real tests. A single experiment's result tells you very little about how often experiments succeed. The aggregate of thousands of experiments produces a stable, reliable estimate.

The practical takeaway: when experimental and theoretical probability disagree, the first question to ask is how many trials were run. A small sample doesn't mean the theory is wrong or the experiment was flawed — it means the estimate hasn't had enough trials to stabilize. More data closes the gap. That's true whether you're flipping coins in a classroom or analyzing conversion rates in a product dashboard.

Why more trials lead to more reliable experimental probability results

Experimental probability is only as trustworthy as the data behind it. Run too few trials and your estimate can land almost anywhere — not because the math is wrong, but because chance hasn't had enough room to average itself out. This is the single most important practical lesson that connects a student's coin-flip homework to a data team's A/B test: sample size isn't a technicality, it's the foundation of any reliable probability estimate.

The problem with small samples

Imagine flipping a coin five times and getting four heads. Your experimental probability for heads would be 4/5, or 0.80 — a full 30 percentage points above the true theoretical probability of 0.5. Does that mean the coin is rigged? Almost certainly not. It means five flips isn't enough data to distinguish a fair coin from a lucky streak.

Small samples are volatile by nature. With only a handful of trials, a single unusual outcome can swing your estimate dramatically in either direction. This isn't a flaw in your experiment — it's a mathematical reality. The estimate isn't wrong given the data you have; the data just isn't sufficient to produce a stable estimate.

This same problem appears in professional experimentation. Practitioners who stop an A/B test the moment results look promising can end up with a variant that appears to perform, for example, 18% better than the control, not because it actually does, but because an early streak of positive results in one branch inflated the numbers. The underlying principle is identical to the five-coin-flip problem, just at higher stakes. Running a test to completion based on a pre-calculated sample size, rather than stopping when results look good, is the standard safeguard against this failure mode.
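A common way to see this failure mode is to simulate A/A tests, where both branches have the same true conversion rate, so every declared winner is a false positive. The sketch compares a "peeking" policy (check at ten interim points, stop at the first significant result) with a single analysis at the planned sample size. The pooled two-proportion z-test, the 10% conversion rate, 1,000 users per branch, ten looks, and the random seed are all assumptions chosen for illustration, not a description of any platform's statistics engine.

```python
import math
import random


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions) so far: no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(runs=300, users_per_arm=1000, looks=10, rate=0.10, alpha=0.05, seed=7):
    """Simulate A/A tests and return the false-winner rate for each stopping policy."""
    rng = random.Random(seed)
    checkpoints = {users_per_arm * (i + 1) // looks for i in range(looks)}
    peeking_winners = fixed_winners = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        stopped_early = False
        for n in range(1, users_per_arm + 1):
            # Both branches share the same true conversion rate.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n in checkpoints and not stopped_early:
                if two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
                    stopped_early = True  # a "peeker" would stop and ship here
        peeking_winners += stopped_early
        # The fixed-horizon policy looks once, at the planned sample size.
        fixed_winners += two_proportion_p_value(conv_a, users_per_arm, conv_b, users_per_arm) < alpha
    return peeking_winners / runs, fixed_winners / runs


peek_rate, fixed_rate = simulate_aa_tests()
print(f"false winners when stopping at the first significant look: {peek_rate:.1%}")
print(f"false winners with one analysis at the planned sample size: {fixed_rate:.1%}")
```

Because the final look is also one of the peeker's checkpoints, the peeking policy can never flag fewer false winners than the fixed-horizon policy; with settings like these it typically flags several times as many. Exact percentages depend on the seed.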

More trials narrow the range of plausible outcomes

As the number of trials grows, experimental probability converges toward the true underlying probability. Flip that same coin 500 times and your result will land between 0.47 and 0.53 more than 80% of the time, and between 0.45 and 0.55 more than 95% of the time. No five-flip experiment can get that close: the nearest possible five-flip results are 0.4 and 0.6.
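The sketch below computes, exactly, how often a fair coin's estimate lands within a given distance of 0.5, by summing binomial probabilities. It assumes independent flips; the function name `prob_estimate_within` is our own.

```python
from math import comb


def prob_estimate_within(n: int, margin: float, p: float = 0.5) -> float:
    """Exact P(|heads/n - p| <= margin) for n independent flips with P(heads) = p."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) <= margin + 1e-12:  # small tolerance guards float rounding at the edges
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


for n, margin in [(5, 0.03), (5, 0.10), (500, 0.03), (500, 0.05)]:
    print(f"{n} flips, within ±{margin}: {prob_estimate_within(n, margin):.3f}")
```

For five flips, the ±0.03 probability is exactly 0 and the ±0.10 probability is 0.625; for 500 flips, the two margins come out around 0.83 and 0.98.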

The estimates don't just improve; they stabilize. The range of plausible outcomes narrows as more data accumulates.

This convergence behavior is what the Law of Large Numbers describes: over a sufficiently large number of independent trials, the experimental probability of an event will approach its true probability, which for an idealized fair coin or die is the theoretical probability. You don't need to memorize the formal theorem to use the principle; you just need to internalize that more trials mean less noise.
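A quick way to watch the Law of Large Numbers at work is to flip a simulated fair coin and record the running estimate at a few checkpoints. The pseudo-random generator stands in for a real coin, and the seed and checkpoints are arbitrary assumptions; the gap from 0.5 tends to shrink as flips accumulate, though not necessarily at every checkpoint.

```python
import random


def running_estimates(flips: int, checkpoints: list[int], seed: int = 42) -> dict[int, float]:
    """Flip a simulated fair coin and record the running P(heads) at each checkpoint."""
    rng = random.Random(seed)
    wanted = set(checkpoints)
    heads = 0
    snapshots = {}
    for i in range(1, flips + 1):
        heads += rng.random() < 0.5
        if i in wanted:
            snapshots[i] = heads / i
    return snapshots


checkpoints = [5, 50, 500, 5000, 50000]
estimates = running_estimates(max(checkpoints), checkpoints)
for n, p in estimates.items():
    print(f"after {n:>6} flips: P(heads) = {p:.3f}  (gap from 0.5: {abs(p - 0.5):.3f})")
```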

Experimentation platforms that use Bayesian methods make this stabilization visible. As more data comes in, the probability distribution around an estimate tightens and less probability remains in its tails, reflecting increasing certainty. What starts as a wide, uncertain distribution gradually narrows into something you can act on. That visual narrowing is the Law of Large Numbers made concrete.

Sample size in real-world experimentation

The same logic that governs coin flips governs A/B tests, and the question is identical in both cases: how many trials do you need before your estimate is trustworthy? GrowthBook's documentation frames statistical power using exactly this analogy — "How many times do I need to toss a coin to conclude it is rigged by a certain amount?" — which makes explicit that power analysis is just experimental probability reasoning applied to product decisions.

In practice, this means calculating your required sample size before you start, based on the minimum effect size you care about detecting, the significance level you'll test at, and the statistical power you want (the chance of detecting a real effect of that size). If the real difference between your control and variant is smaller than your experiment is designed to detect, the test is likely to come back inconclusive, even when something genuine is happening. This is called a Type II error, or a false negative: the experiment missed a real effect because it didn't run long enough or didn't have enough users.
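The sketch below answers both versions of the sample-size question with standard normal-approximation formulas: how many tosses are needed to detect a coin whose heads rate is 0.6 instead of 0.5, and how many users per variant are needed to detect a lift from a 10% to a 12% conversion rate. These are textbook formulas, not GrowthBook's own power calculator, and the rates, the 5% two-sided significance level, and 80% power are hypothetical choices.

```python
import math
from statistics import NormalDist


def _z(q: float) -> float:
    """Standard normal quantile."""
    return NormalDist().inv_cdf(q)


def coin_tosses_needed(p_fair=0.5, p_rigged=0.6, alpha=0.05, power=0.8) -> int:
    """Tosses needed to tell a coin with P(heads) = p_rigged from a fair one (two-sided test)."""
    numerator = (_z(1 - alpha / 2) * math.sqrt(p_fair * (1 - p_fair))
                 + _z(power) * math.sqrt(p_rigged * (1 - p_rigged)))
    return math.ceil((numerator / (p_rigged - p_fair)) ** 2)


def users_per_variant(baseline=0.10, expected=0.12, alpha=0.05, power=0.8) -> int:
    """Per-variant sample size for comparing two conversion rates (two-sided test)."""
    z_total = _z(1 - alpha / 2) + _z(power)
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    return math.ceil(z_total**2 * variance / (expected - baseline) ** 2)


print(coin_tosses_needed())  # 194
print(users_per_variant())   # 3839
```

Halving the minimum effect you want to detect roughly quadruples the required sample, which is why the minimum effect size is the first number to settle.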

A related failure mode is a Sample Ratio Mismatch — when the actual split of users between variants doesn't match the intended split, often because of a tracking or assignment bug. Both problems corrupt your results in ways that aren't obvious from the numbers alone, which is why running experiments to their pre-planned completion matters.
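A Sample Ratio Mismatch check is commonly done with a chi-square goodness-of-fit test comparing observed user counts to the intended split. The sketch below assumes a 50/50 design and two variants (one degree of freedom). The counts are hypothetical, and the 0.001 cutoff is an assumed, deliberately strict threshold; platforms choose their own.

```python
import math


def srm_p_value(observed_a: int, observed_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-variant split."""
    total = observed_a + observed_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


for a, b in [(5000, 5020), (5000, 5400)]:
    p = srm_p_value(a, b)
    flag = "possible SRM, investigate" if p < 0.001 else "no evidence of SRM"
    print(f"{a} vs {b}: p = {p:.5f} -> {flag}")
```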

The student who got 80% heads from five flips and the PM who stopped an A/B test after two days are making the same mistake. The fix in both cases is the same: run more trials, and decide how many before you start.

The same formula runs underneath sports stats, manufacturing, and A/B tests

That same convergence principle — more trials, more reliable estimates — isn't confined to classrooms or controlled experiments. The formula divides observed occurrences by total trials, and that same operation, unchanged, runs underneath some of the most consequential decisions made in professional sports, manufacturing, meteorology, and product development. Understanding where experimental probability appears — and why it matters — transforms it from a homework concept into a foundational tool for interpreting data in any field.

Sports analytics: every stat is an experimental probability

A basketball player's free throw percentage is not a theoretical prediction. It is an experimental probability calculated from every attempt that player has ever taken: shots made divided by shots attempted. The same logic applies to a baseball player's batting average, a soccer goalkeeper's save rate, or a figure skater's landing consistency in competition. Every new game adds new trials, and the estimate updates accordingly.

Professional sports analysts and coaching staffs rely on these figures precisely because they are grounded in observed outcomes rather than assumptions. When a team decides whether to foul a player in the final seconds of a game, they are acting on that player's experimental probability of converting free throws — a number built from hundreds of real trials, not a theoretical model.
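Here is a small sketch of how a free throw percentage updates as new games add trials. The season log is invented for illustration. The percentage is recomputed from all attempts so far, which is not the same as averaging each game's percentage.

```python
# Hypothetical season log: (free throws made, free throws attempted) per game
games = [(4, 5), (7, 8), (2, 4), (6, 6), (5, 7)]

made_total = attempted_total = 0
for game_number, (made, attempted) in enumerate(games, start=1):
    made_total += made
    attempted_total += attempted
    # Pool every trial so far: total makes / total attempts.
    print(f"after game {game_number}: {made_total}/{attempted_total} "
          f"= {made_total / attempted_total:.3f}")

# Averaging per-game percentages gives every game equal weight, regardless of attempts.
naive = sum(m / a for m, a in games) / len(games)
print(f"average of per-game percentages (not the same thing): {naive:.3f}")
```

The pooled figure after five games is 24/30 = 0.800, while the average of the per-game percentages is about 0.778, because a 2-of-4 night counts as much as a 7-of-8 night in the naive version.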

Quality control in manufacturing

Manufacturers cannot inspect every unit that comes off a production line, so they sample. A quality control team pulls a batch of units, counts how many are defective, and divides by the total inspected. The result — defective units divided by total units sampled — is an experimental probability of a defect occurring in that production run.

This figure drives real decisions: whether to halt a line, adjust a process, or release a batch to distribution. The same formula a student uses to calculate the probability of rolling a six applies directly to determining whether a production process is operating within acceptable tolerances.
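A minimal sketch of the decision step, with hypothetical batch counts and a hypothetical 2% tolerance. It compares the point estimate directly against the tolerance; formal acceptance-sampling plans used in industry are more involved, so treat this only as an illustration of where the formula enters.

```python
def inspect_batch(defective: int, inspected: int, max_defect_rate: float) -> str:
    """Compare a sampled defect rate against a tolerance (all numbers hypothetical)."""
    rate = defective / inspected  # experimental probability of a defect
    decision = "release" if rate <= max_defect_rate else "hold and investigate"
    return f"defect rate {rate:.1%} vs tolerance {max_defect_rate:.1%}: {decision}"


print(inspect_batch(defective=3, inspected=200, max_defect_rate=0.02))
print(inspect_batch(defective=7, inspected=200, max_defect_rate=0.02))
```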

Weather forecasting: probability of precipitation

When a forecast says there is a 70% chance of rain, that figure is grounded in observed data rather than a purely theoretical derivation. Forecasters calibrate their predictions against historical records of days with atmospheric conditions similar to today's, checking how often precipitation actually occurred. If it rained on 70 out of 100 comparable days in the past, a 70% forecast reflects that observed frequency.

This is experimental probability drawn from a large dataset of historical trials. The accuracy of the forecast improves with the size and quality of the historical record — which is exactly the same relationship between sample size and reliability that applies to any experimental probability estimate.
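The same formula also runs in the other direction as a calibration check: group past forecasts by the probability they stated and compute how often it actually rained. The tiny history below is invented; a real check would use years of records.

```python
from collections import defaultdict

# Hypothetical history: (forecast chance of rain, did it actually rain?)
history = [(0.7, True), (0.7, True), (0.7, False), (0.7, True), (0.7, True),
           (0.7, False), (0.7, True), (0.7, True), (0.7, False), (0.7, True),
           (0.3, False), (0.3, True), (0.3, False), (0.3, False), (0.3, False)]

by_forecast = defaultdict(lambda: [0, 0])  # forecast -> [rainy days, total days]
for forecast, rained in history:
    by_forecast[forecast][0] += rained
    by_forecast[forecast][1] += 1

for forecast, (rainy, total) in sorted(by_forecast.items()):
    print(f"forecast {forecast:.0%}: rained on {rainy}/{total} days = {rainy / total:.0%}")
```

In this made-up sample, the 70% forecasts are well calibrated (7 of 10), while the 30% forecasts saw rain on only 1 of 5 days, too few days to judge either way.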

A/B testing and product experimentation

A/B testing is experimental probability applied at enterprise scale. When a product team exposes users to a new variant — a different checkout flow, a revised onboarding screen, a changed pricing page — each user interaction is a trial. The observed conversion rate, click-through rate, or engagement metric is calculated by dividing the number of times the target outcome occurred by the total number of exposures. That is the experimental probability formula, running in production.
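The sketch below computes a conversion rate per variant from a hypothetical exposure log, where each exposure is one trial. Eight users is far too few to conclude anything; the point is only that the calculation is the same division used throughout this article.

```python
from collections import Counter

# Hypothetical exposure log: (variant shown, did the user convert?)
exposures = [("control", False), ("control", True), ("treatment", True),
             ("control", False), ("treatment", False), ("treatment", True),
             ("control", False), ("treatment", True)]

trials = Counter(variant for variant, _ in exposures)
conversions = Counter(variant for variant, converted in exposures if converted)

for variant in sorted(trials):
    rate = conversions[variant] / trials[variant]  # conversions / exposures
    print(f"{variant}: {conversions[variant]}/{trials[variant]} = {rate:.1%}")
```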

What makes this application particularly instructive is that the same logic compounds across an organization's entire experiment portfolio. Platforms like GrowthBook track win rates across all experiments a team has run, meaning the fraction of tests that produced a statistically significant improvement. That win rate is itself an experimental probability: wins observed divided by total experiments conducted. Experimentation platforms with portfolio-level insights are specifically designed to surface these aggregate figures, helping teams understand whether they are running the right mix of high-risk and incremental experiments over time.

The cumulative view matters for the same reason that more coin flips produce a more reliable estimate. Individual experiments can be noisy — a single test might win or lose for reasons unrelated to the change being tested. But across dozens or hundreds of experiments, the observed win rate stabilizes into a meaningful signal about how an organization's experimentation program is actually performing.
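A sketch of a cumulative win rate, using an invented list of experiment outcomes in the order they finished; "won" stands for a statistically significant improvement. Twelve experiments is still a small sample, so the running figure swings early before settling.

```python
# Hypothetical outcomes of a team's completed experiments, in the order they finished
results = ["lost", "won", "inconclusive", "inconclusive", "won",
           "lost", "inconclusive", "won", "inconclusive", "lost",
           "inconclusive", "won"]

wins = 0
for count, outcome in enumerate(results, start=1):
    wins += outcome == "won"
    print(f"after {count:>2} experiments: win rate {wins}/{count} = {wins / count:.2f}")
```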

The formula never changes — only the stakes do

The through-line of this article is a single operation: count what happened, divide by how many chances there were. What changes across every context — coin flips, free throws, quality control batches, A/B tests — is not the math but the stakes attached to the result. Understanding that the formula is the same whether you're in a classroom or a product dashboard is what makes experimental probability genuinely transferable.

Quick reference: the experimental probability formula and when to use it

Use P(E) = occurrences ÷ total trials any time you have observed data and want to estimate how likely an event is to happen again. Reach for theoretical probability when the system is clean and idealized — fair dice, standard cards — and switch to experimental probability the moment real-world complexity enters: user behavior, manufacturing variance, athletic performance, weather patterns. If you can count it, you can calculate it.

Common mistakes to avoid when calculating experimental probability

The two most common errors are stopping too early and misidentifying the denominator. Stopping early — whether after five coin flips or two days of an A/B test — produces an estimate that looks precise but is actually just noise wearing a number. The denominator mistake is subtler: you're always dividing by total trials, not by the size of the sample space. Confusing those two will give you a result that's arithmetically clean and conceptually wrong.

Next steps: practice problems and tools to build your skills

The fastest way to internalize experimental probability is to run your own small experiments — flip a coin 50 times, roll a die 30 times, track a repeatable outcome in your own work — and watch how the estimate shifts as you add trials. If you're applying this in a product context, an experimentation platform handles the trial-counting and stabilization mechanics automatically, including flagging low-traffic experiments and checking for Sample Ratio Mismatch before you act on a result that may not be trustworthy.

The honest goal of this article was to make experimental probability feel less like a formula to memorize and more like a lens you already know how to use. If you've ever looked at a batting average and thought "that player hits well," you were already reasoning with experimental probability. Now you have the vocabulary and the structure to do it deliberately.

The tension worth holding onto: more trials make your estimate more reliable, but you rarely have unlimited time or data. The practical skill isn't running infinite experiments — it's knowing how many trials you actually need before your estimate is stable enough to act on, and committing to that number before you start.

What to do next: Pick one number you already track — a conversion rate, a completion percentage, a defect rate — and ask two questions: how many trials is that estimate based on, and is that enough to be stable? If the answer to the second question is uncertain, that's your starting point. Calculate the sample size you'd need to trust the result, and use that as your baseline going forward. That single habit is where rigorous experimentation begins.
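For a single rate like a defect rate or completion percentage, one common way to answer "is that enough?" is the normal-approximation margin-of-error formula, n = z² · p(1 − p) / margin². The sketch assumes independent trials and uses p = 0.5 as a conservative guess when you don't yet know the rate; the helper name `trials_for_margin` is our own.

```python
import math
from statistics import NormalDist


def trials_for_margin(margin: float, confidence: float = 0.95, p_guess: float = 0.5) -> int:
    """Trials needed so an estimated proportion is within ±margin at the given confidence.

    p_guess = 0.5 is the most conservative choice because p(1 - p) peaks there.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


for margin in (0.10, 0.05, 0.02):
    print(f"±{margin:.0%}: {trials_for_margin(margin)} trials")
```

Tightening the margin from ±5 points (385 trials) to ±2 points (2,401 trials) takes more than six times the data, which is the practical cost of a more stable estimate.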
