UPS delivered $500M in incremental revenue by testing e-commerce patterns on its shipping flow

TL;DR: When Dave Massey joined UPS in 2016, experimentation was new. The company had a testing tool but no real program. In 2019, senior leaders gave him a pilot: prove that UX improvements could move revenue. His first pilot test, removing navigation from the checkout flow, drove $35 million. Today, his team has delivered over $500 million in incremental revenue. Here's how they did it.
Most people don't think of UPS as an e-commerce company. But if you ship a package on UPS.com, that's e-commerce in every sense: a funnel, a checkout, a transaction. When Dave Massey, head of user research, personalization, and experimentation at UPS, walked into the company in 2016, nobody saw it that way. The shipping tool was just a tool. Not anymore.
Dave brings a background in digital advertising and conversion rate optimization. On his first day, he walked into a meeting where leaders were choosing an A/B testing platform. He raised his hand to clarify some misconceptions about what the tools could and couldn't do. "You know how this works? Okay, so you get to play with it now and also own it," they told him.
He ran UPS's first test, a tiny button change, and then the program went quiet. Other business priorities took over. But in 2019, senior leaders came back. They'd heard this whole "UX thing" was worth considering. But like any big company, UPS needed proof. "We need to make sure that the juice is worth the squeeze," they said.
They gave Dave a revenue target, one other person, and vendor support. That was the pilot.
First test: remove distractions, add $35 million
Dave's first move was simple. He applied a basic e-commerce principle to the shipping flow: once a customer enters checkout, remove distractions.
"We basically removed navigation tools and stuff like that once somebody entered the shipping tool to get them to continue through the flow and not get distracted and go somewhere else," Dave said.
Before he launched, colleagues told him he was crazy. It wouldn't help. It would make customers angry.
He ran it anyway. "There's no such thing as a bad test. We learn what to do and what not to do."
The result? A conversion rate increase worth around $35 million over the course of a year.
But showing a number like that right out of the gate triggered skepticism. "Everybody's like, there's no way, we don't believe that," Dave said. His data team had to defend the results upside down and sideways. Holes were poked. Every assumption was challenged. When the dust settled, leadership agreed: "Yeah, this is legit."
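What does "defending the results" look like in practice? One standard first check a data team might run on a conversion-rate test is a two-proportion z-test: is the difference between control and variant larger than chance would explain? The sketch below is a generic illustration, not UPS's or GrowthBook's actual method; the function name and the visitor counts in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare variant B against control A.

    Returns (absolute lift in conversion rate, z-score, two-sided p-value).
    Hypothetical helper for illustration only.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical counts: 5.0% vs. 5.3% conversion on 100,000 visitors per arm
lift, z, p = two_proportion_z_test(5000, 100_000, 5300, 100_000)
print(f"lift={lift:.4f} z={z:.2f} p={p:.4f}")
```

A significance test is only the starting point. Poking holes in a result like the one above also means checking sample-ratio balance, novelty effects, and whether the revenue-per-conversion assumption behind a dollar figure holds up.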
The foundation: data rigor and UX research
Dave credits two things for UPS's experimentation success: a world-class data team and tightly integrated UX research.
"You gotta have the best data team you can afford," he said. UPS is an engineering company at its core. If you can't measure it, it doesn't matter. But when Dave joined, the analytics tooling was painfully slow. "You want to run a report, and you come back after lunch and hope it's done."
The company was modernizing its analytics stack around the same time the experimentation pilot launched. That timing was critical. Suddenly, the team had granular data and could connect dots fast.
But data alone isn't enough. Dave's team pairs behavioral metrics with voice-of-customer insights because his UX research team sits under the same umbrella as experimentation.
"Having the voice of the customer along with those behavioral metrics gives you that real 360 view," he said. "We can anticipate more before we even get to the A/B testing side because we've heard customers say, 'Hey, this is a problem, this is not a problem.'"
When a test fails, the team doesn't just look at the data. They go back to customers and ask: Why do you think this didn't work?
"I don't know, I've talked to a few other leaders of their experimentation programs at other companies. And when I tell them that we have our UX team connected to the hip of our experimentation team, it kind of blows their mind. But to me, it seems like that's 101. You would have to have that, right?"
The test that ran for 24 hours
Not every test is a winner. Dave's gut is "wrong more than it's right," he said. But some tests fail so hard they become teaching moments.
Senior leadership once pushed hard to make the recipient's email a required field in the shipping flow. The thinking: capture customer data for retargeting. The product team worried. Dave's team ran the test.
It lasted about 24 hours.
"We saw such a decline in conversion on shipping on UPS.com that we pulled the plug on it," Dave said. "We went back and told the business, yeah, you can't. This is not something that is a good experience, and therefore it will cost us business."
But a couple of years later, the international shipping team wanted to do the same thing, for a completely different reason: customs paperwork. When a package gets held at customs, the recipient has to deal with it, not the shipper. The team wanted the recipient's email to streamline that process.
This time, the test worked. "We put it in there. This is to help it get through customs. We saw no issue. Nobody had a problem with it because we gave them the reason why. It's not just like, hey, we want this because we want this."
The lesson: context matters. Friction isn't inherently bad if customers understand the value.
Building a culture of testing
Today, Dave's team, the Journey Experience and Design Innovation (JEDI) team, runs experimentation for a $12.6 billion business. The team is about 80 people, including designers, data scientists, developers, and vendor partners. They support nearly 80 different customer-facing applications across UPS digital properties.
They can't test everything. But that's a good problem.
"The fact that we can't test everything that we want to test is great in my eyes because that means people understand it and understand the value of it," Dave said.
Business units across UPS now come asking to test ideas. The team has earned a reputation for rigor and honesty. "The business now knows that they ignore JEDI at their own peril," Dave said.
When senior leaders propose an idea, the team doesn't just say no. They test it. If it doesn't work, they come back with data and alternatives. "We tested your idea, and it did not work well, but we learned these three other things that we can do to accomplish the same goal. That changes that conversation."
That approach has made the team the center of excellence for proving what to do: first and foremost for customers, but also for the bottom line.
What's next: personalization and decentralization
Dave sees two big opportunities ahead. First, personalization. UPS serves a wide range of users: someone who ships once a year and someone who's in the tool all day as a shipping manager. Those users need different experiences.
"Maybe one day they're showing up to ship something to grandma for a birthday. The next day, they're at their day job, and they're the shipping manager," Dave said. "It's being able to understand that. And that's where experimentation and audience targeting and all those sorts of things can come together."
Second, decentralization. Right now, the team runs everything centrally. Dave would love to expand the capability so other teams can run their own tests, but with the same rigor and standards his team uses.
"We need technology that is simple enough for other business units to be able to do this sort of thing," he said.
AI will play a role. UPS has been experimenting with AI since before it became a buzzword. Dave's team treats it as a tool to improve efficiency. They can get through results faster and generate hypotheses more quickly, but always with a human in the loop. "There's nothing that AI generates that just does not pass go. It has to go through its checks, just like something that a human on our team would have to go through."
The half-billion-dollar value of experimentation
Since launching the pilot in 2019, Dave's team has delivered over half a billion dollars in incremental revenue. That doesn't even account for the savings they've generated.
The formula isn't complicated. Treat your transactional tools like e-commerce funnels. Pair behavioral data with customer research. Test ideas instead of debating them. Defend your results with rigor. Push back on leadership with evidence, not opinions. And build a team that knows the difference between a failed test and a learning opportunity.
"My team is known for the rigor that we go through in terms of setting up a test and making sure we're not blowing anything up, breaking anything with IT," Dave said. "But also looking at the results and being as unbiased as possible. We have no problem saying, hey, this isn't the right thing to do, sorry. Or it is the right thing to do."
That's how you deliver half a billion dollars.
Listen to my full conversation with Dave on The Experimentation Edge podcast. And if you are building your own experimentation program, GrowthBook gives you the platform to run, analyze, and learn from every experiment at scale.
