How DoorDash runs 12,000 experiments per year across a 3-sided marketplace
Most experimentation platforms optimize for one user. DoorDash optimizes for 3.
Ilya Izrailevsky, senior engineering manager leading DoorDash's experimentation platform, joined me on The Experimentation Edge to share how he runs a testing program that processed 12,000 experiments last year across 42 million monthly active users. At peak, the platform handles 300 million feature flag evaluations per second. But scale isn't the hard part.
The hard part is that every experiment has to satisfy 3 competing interests:
- Consumers who want fast, cheap delivery
- Dashers who want high earnings and low idle time
- Merchants who want order volume without inventory depletion
"We don't have one magic number or metric that we're going after," Ilya told me. "We need to balance between consumers, dashers, and our merchants."
Ilya brings experimentation experience from Amazon, Robinhood, Uber, Intuit, and PayPal. He's lived on both sides of the fence — building experimentation platforms and using them to optimize search and recommendation systems. At Amazon, he led machine learning automation for e-commerce search, running tests that narrowed billions of products down to the 10–20 results customers actually see.
Now at DoorDash, he's scaling experimentation in 4 dimensions at once: democratization (enabling non-technical users to run tests), global expansion (40+ countries), new verticals (grocery, electronics, retail), and AI-powered growth.
The CEO reads every experiment result
DoorDash is an operations-driven, data-driven company. That's not marketing speak. After every experiment — win or loss — the team sends results company-wide. CEO Tony Xu reads them. He replies. He congratulates teams. He suggests alternative approaches.
"This really builds a culture that experimentation is encouraged and everything we do should be through experiment," Ilya said.
It's a pattern I keep seeing: the strongest experimentation programs have CEO-level engagement with test results. When leadership reads the details, the rest of the company follows.
One size does not fit all (even within your own product)
DoorDash expanded beyond restaurants into grocery, electronics, and retail. The default delivery radius was 11 miles. The team hypothesized that expanding the radius would increase selection and drive more orders.
They were half right.
The expanded radius worked for grocery stores. For retail and clothing, it killed order volume. Customers didn't want to pay higher delivery fees for a $20 T-shirt from 15 miles away, and results from distant stores added noise that degraded the experience.
"One size really does not fit all," Ilya said. "You have to really look at different types of verticals and understand the customer behavior."
The team killed the experiment. But the learning stuck: they now segment by vertical and test category-specific behaviors. Your intuition doesn't matter. Test it.
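Segmenting a readout by vertical is simple to do, and it shows why a pooled number can mislead. A minimal sketch, using invented counts rather than DoorDash data and a hypothetical `lift_by_segment` helper:

```python
from collections import defaultdict

# Hypothetical readout rows: (vertical, arm, sessions, orders). Not DoorDash data.
rows = [
    ("grocery", "control", 10_000, 1_000),
    ("grocery", "treatment", 10_000, 1_150),
    ("retail", "control", 10_000, 800),
    ("retail", "treatment", 10_000, 680),
]


def lift_by_segment(rows):
    """Relative lift in order rate (treatment vs. control) for each segment."""
    # segment -> arm -> [sessions, orders]
    totals = defaultdict(lambda: {"control": [0, 0], "treatment": [0, 0]})
    for segment, arm, sessions, orders in rows:
        totals[segment][arm][0] += sessions
        totals[segment][arm][1] += orders
    lifts = {}
    for segment, arms in totals.items():
        control_rate = arms["control"][1] / arms["control"][0]
        treatment_rate = arms["treatment"][1] / arms["treatment"][0]
        lifts[segment] = treatment_rate / control_rate - 1
    return lifts


per_vertical = lift_by_segment(rows)
# Collapse every row into a single "all" segment to get the pooled result.
pooled = lift_by_segment([("all", arm, s, o) for _, arm, s, o in rows])
print(per_vertical, pooled)
```

With these made-up numbers, grocery improves by about 15% and retail drops by about 15%, while the pooled lift is only about +1.7%, which on its own could pass for a small win.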
A "failed" experiment that saved thousands of subscriptions per week
DoorDash offers DashPass, a $10/month subscription that waives delivery fees and unlocks perks. It's similar to Amazon Prime. At one point, churn spiked. Subscribers were canceling.
The team ran an experiment: at the point of cancellation, show users the value they'd received. If you order more than 3 times per month, you've already recouped the fee. Plus streaming perks. Plus faster shipping.
The intervention saved thousands of DashPass subscriptions per week.
But the real learning was bigger. Customers didn't realize the value they were getting. So DoorDash created an entire product area around proactive DashPass messaging — checkout flows, confirmation screens, and monthly recaps. The experiment didn't just save subscriptions. It created a roadmap.
"There's no such thing as a failed experiment," Ilya said. "Every experiment is a learning opportunity."
Protecting the 3-sided marketplace
Because DoorDash is a marketplace, every experiment has to protect 3 constituencies. The team uses separate success metrics for each:
- Consumers: Order quality, reliability, satisfaction, retention
- Dashers: Earnings, utilization (minimizing time spent idle), fairness (how evenly orders are distributed)
- Merchants: Order mix (avoiding inventory depletion on one SKU), unit economics, long-term business growth
Guardrail metrics protect the rest of the ecosystem. A test that boosts consumer conversion but tanks dasher earnings gets killed. A test that increases merchant orders but hurts order mix gets reworked.
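One way to encode those two examples as a decision rule is sketched here. Everything in it is hypothetical: the `MetricResult` fields, the 1% tolerance, and the rule that a breach on another constituency's metric kills a test while a breach on the same side sends it back for rework. It illustrates the idea of guardrails; it is not DoorDash's actual policy.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical readout for one metric in an experiment."""
    name: str
    side: str          # "consumer", "dasher", or "merchant"
    lift: float        # relative change vs. control, e.g. -0.03 means -3%
    significant: bool  # True if the confidence interval excludes zero


def decide(primary: MetricResult, guardrails: list[MetricResult],
           tolerance: float = 0.01) -> str:
    """Return "ship", "rework", or "kill" (hypothetical decision rule)."""
    # No significant win on the success metric: nothing to ship.
    if not (primary.significant and primary.lift > 0):
        return "kill"
    # A guardrail is breached if it drops significantly beyond the tolerance.
    breached = [g for g in guardrails if g.significant and g.lift < -tolerance]
    if not breached:
        return "ship"
    # Hurting another side of the marketplace blocks the launch outright;
    # hurting only the same side sends the idea back for rework.
    if any(g.side != primary.side for g in breached):
        return "kill"
    return "rework"


conversion = MetricResult("conversion", "consumer", 0.02, True)
earnings = MetricResult("dasher_earnings", "dasher", -0.04, True)
print(decide(conversion, [earnings]))  # kill

orders = MetricResult("merchant_orders", "merchant", 0.03, True)
mix = MetricResult("order_mix", "merchant", -0.02, True)
print(decide(orders, [mix]))  # rework
```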
"It's in our interest for them to be successful," Ilya said, referring to merchants. "We want to have that mixture of different types of orders that people try out."
This is the classic explore-exploit trade-off applied to a marketplace. If you only show what customers have already ordered, they get bored. If you only show new things, they don't convert. Balance matters.
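Epsilon-greedy is a textbook way to illustrate this trade-off: most of the time, show the option with the best observed conversion rate, and with a small probability, show something at random. The sketch below is that generic bandit technique, not DoorDash's recommendation system; the store names, conversion rates, and epsilon value are made up.

```python
import random


class EpsilonGreedy:
    """Textbook epsilon-greedy bandit over a fixed set of options."""

    def __init__(self, options: list[str], epsilon: float, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {o: 0 for o in options}
        self.orders = {o: 0 for o in options}

    def estimate(self, option: str) -> float:
        # Observed conversion rate; options never shown start at 0.
        n = self.shows[option]
        return self.orders[option] / n if n else 0.0

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))  # explore: try anything
        return max(self.shows, key=self.estimate)     # exploit: best so far

    def update(self, option: str, converted: bool) -> None:
        self.shows[option] += 1
        self.orders[option] += int(converted)


# Hypothetical true conversion rates, unknown to the policy.
true_rates = {"usual_pizza": 0.10, "new_thai": 0.15, "new_grocery": 0.05}
policy = EpsilonGreedy(list(true_rates), epsilon=0.1, seed=42)
for _ in range(10_000):
    store = policy.choose()
    policy.update(store, policy.rng.random() < true_rates[store])
print({s: round(policy.estimate(s), 3) for s in true_rates})
```

Pure exploitation (epsilon of 0) can lock onto whatever looked good first; a little exploration keeps testing new options so a better one can surface.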
Democratization: From PhDs to product managers
Right now, engineers instrument experiments in mobile, web, and backend code. Data scientists and analysts interpret results. It works, but it doesn't scale.
Ilya's team is building what he calls "opinionated experiment templates" — pre-configured tests with embedded success and guardrail metrics. The goal: let product managers, designers, advertisers, marketers, and business operations teams run their own experiments without needing a PhD in statistics.
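A minimal sketch of what an opinionated template could look like, assuming a simple Python data model. The `ExperimentTemplate` type, the `new_experiment` helper, the metric names, and the 14-day minimum runtime are all hypothetical. The point is that success and guardrail metrics come pre-set, and a user can add metrics but cannot drop a guardrail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentTemplate:
    """Hypothetical 'opinionated' template: metrics are baked in up front."""
    name: str
    success_metrics: tuple[str, ...]
    guardrail_metrics: tuple[str, ...]
    min_runtime_days: int = 14


def new_experiment(template: ExperimentTemplate, hypothesis: str,
                   extra_metrics: tuple[str, ...] = ()) -> dict:
    """Build an experiment config; users may add metrics but not drop guardrails."""
    if not hypothesis.strip():
        raise ValueError("a hypothesis is required")
    # De-duplicate (keeping order) and skip metrics the template already has.
    secondary = tuple(
        m for m in dict.fromkeys(extra_metrics)
        if m not in template.success_metrics and m not in template.guardrail_metrics
    )
    return {
        "template": template.name,
        "hypothesis": hypothesis,
        "success_metrics": template.success_metrics,
        "guardrail_metrics": template.guardrail_metrics,  # always enforced
        "secondary_metrics": secondary,
        "min_runtime_days": template.min_runtime_days,
    }


PROMO_TEMPLATE = ExperimentTemplate(
    name="merchant_promotion",
    success_metrics=("orders_per_store",),
    guardrail_metrics=("order_mix", "dasher_earnings", "delivery_time"),
)

config = new_experiment(PROMO_TEMPLATE, "A 10% off banner increases weekly orders",
                        extra_metrics=("basket_size", "order_mix"))
print(config["secondary_metrics"])  # ('basket_size',)
```

Freezing the template keeps the embedded metrics from being edited per experiment, which is what makes it "opinionated" in this sketch.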
But the real moonshot? Enabling merchants to run their own tests.
"We'd like restaurant or store owners to be able to run their own experiments — on promotions, prices, menu items — so they can attract more customers to their local stores," Ilya said.
If it works, it's a win across the board. Customers get more selection. Dashers get more orders. Merchants grow their businesses. DoorDash processes more volume.
AI throughout the experimentation lifecycle
DoorDash is using AI in 3 ways:
- Institutional knowledge mining: Past experiments inform new hypotheses. AI surfaces what worked and what didn't.
- Agentic setup and debugging: AI helps non-technical users configure tests, detect imbalance issues, and fix sample ratio mismatch (SRM) violations.
- Automated readouts: AI generates experiment summaries for the company-wide emails that the CEO reads.
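Sample ratio mismatch is one of the checks such tooling can automate: if a 50/50 test ends up with noticeably unequal arm sizes, assignment or logging is probably broken and the results can't be trusted. Below is a minimal sketch of the standard chi-square goodness-of-fit test for two arms, using only the Python standard library. The `srm_check` name and the 0.001 threshold are assumptions; a strict threshold like this is a common choice for SRM alerts.

```python
import math


def srm_check(control_users: int, treatment_users: int,
              expected_treatment_share: float = 0.5,
              alpha: float = 0.001) -> tuple[float, bool]:
    """Chi-square goodness-of-fit test for sample ratio mismatch (two arms).

    Returns (p_value, is_mismatch).
    """
    if not 0 < expected_treatment_share < 1:
        raise ValueError("expected_treatment_share must be between 0 and 1")
    total = control_users + treatment_users
    if total == 0:
        raise ValueError("no users assigned")
    expected = (total * (1 - expected_treatment_share),
                total * expected_treatment_share)
    observed = (control_users, treatment_users)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < alpha


print(srm_check(50_000, 50_000))  # (1.0, False)
print(srm_check(50_000, 52_000)[1])  # True: a 2,000-user gap is very unlikely by chance
```

For tests with more than two arms, the same statistic applies with k - 1 degrees of freedom, but the p-value then needs a general chi-square distribution (for example, from a statistics library).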
"AI can help you do a lot of research and give you insights, but at the end of the day, humans need to make calls," Ilya said. "Humans are always at the helm."
The barrier to shipping features has dropped. AI makes it easier to build. That means more features. Which means more tests. Which means experimentation platforms have to scale with AI-powered growth, not just headcount.
What's next
Ilya sees 4 scaling dimensions ahead:
- Democratization: Self-serve experimentation for non-technical users
- Global scale: 40+ countries, each with different customer behaviors
- Vertical expansion: Grocery, electronics, retail — each with different unit economics
- AI-powered growth: More features, more tests, more automation
The platform is already handling 300 million feature flag evaluations per second. The next challenge isn't technical capacity. It's making sure every experiment — across every vertical, in every country, for every user type — delivers real customer value.
Because at DoorDash, experimentation isn't the goal. Customer impact is. Experimentation is just the vehicle.
Want to scale your experimentation program like DoorDash?
Listen to my conversation with Ilya on The Experimentation Edge podcast. And if you are building your own experimentation program, GrowthBook gives you the platform to run, analyze, and learn from every experiment at scale.