Sixteen weeks. Checkout completion plus twenty-four percent.
+24% checkout completion vs 16-week pre-program baseline
12 tests: 4 winners · 6 null · 2 losers
From 4 shipped winners
Team runs it without me now
The problem.
The marketing team was running "tests" that weren't experiments. No sample sizing. No pre-registered hypothesis. Whichever variant looked best after a week became the new default. Revenue would slide three weeks later and nobody could explain why. The agency was a partner, not a system. The CFO had stopped believing the lift claims by the time I came in.
The approach.
First two weeks: an honest audit. Of the previous 18 "wins," nine were noise, three were seasonality, and two were a pricing change the team hadn't controlled for. Six were real. Then we built a real program: pre-registered hypotheses; sample sizes computed at 95% confidence and 80% power; a minimum detectable effect (MDE) each test could actually detect with the traffic it had; three concurrent test slots; and a weekly review where every test had to declare its primary metric and stop date before launching. By week ten the marketing manager was chairing the meeting.
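For readers who want to see what "sample-sized at 95% confidence and 80% power" means in practice, here is a minimal sketch using the standard two-proportion z-test approximation. It is not the program's actual tooling. The function names, the example baseline rate, and the weekly traffic figure are all hypothetical, and the MDE is assumed to be a relative lift on a conversion rate.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided two-proportion test.

    baseline_rate: current conversion rate, e.g. 0.50 (hypothetical input)
    relative_mde:  smallest relative lift worth detecting, e.g. 0.10 for +10%
    alpha:         1 - confidence level (0.05 corresponds to 95% confidence)
    power:         probability of detecting a true effect of size relative_mde
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    if not 0 < p2 < 1 or p2 == p1:
        raise ValueError("relative_mde gives an invalid or identical variant rate")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def weeks_to_run(n_per_arm, weekly_visitors, arms=2):
    """Whole weeks needed to fill every arm, assuming an even traffic split."""
    return ceil(n_per_arm * arms / weekly_visitors)


if __name__ == "__main__":
    # Hypothetical inputs: 50% baseline completion, +10% relative MDE,
    # 4,000 checkout visitors per week.
    n = sample_size_per_arm(0.50, 0.10)
    print(n, "visitors per arm;", weeks_to_run(n, 4000), "week(s) at 4,000/week")
```

Halving the MDE roughly quadruples the required sample, which is why an MDE has to be chosen against real traffic: a test that cannot reach its sample size before the declared stop date should not launch.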
The outcome.
Twelve tests in sixteen weeks. Four winners shipped: a cart-page redesign (+11% completion), a guest-checkout default (+8%), a delivery-promise relocation (+6%), and a payment-icon order test (+3%). Together, they delivered the +24% headline, measured against the pre-program baseline. The other eight tests were null or losers, which the team now treats as a result, not a failure. The CFO trusts the lift numbers because they line up with the bank statement. The marketing team runs the program. I haven't touched a test in three months.
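As a quick arithmetic check on how per-test lifts relate to a headline number, the sketch below combines the four shipped lifts two naive ways. It assumes each figure is a relative lift on the same completion metric and that the effects are independent. Neither assumption is stated in the case study, and the function name is hypothetical.

```python
from math import prod


def combine_lifts(lifts):
    """Return (additive, compounded) totals for a list of relative lifts.

    additive:   simple sum of the lifts
    compounded: product of (1 + lift) minus 1, i.e. independent effects
                applied one after another
    """
    additive = sum(lifts)
    compounded = prod(1 + lift for lift in lifts) - 1
    return additive, compounded


if __name__ == "__main__":
    # Per-test lifts as reported for the four shipped winners.
    shipped = {
        "cart-page redesign": 0.11,
        "guest-checkout default": 0.08,
        "delivery-promise relocation": 0.06,
        "payment-icon order": 0.03,
    }
    additive, compounded = combine_lifts(list(shipped.values()))
    print(f"additive: {additive:.1%}, compounded: {compounded:.1%}")
```

Under these assumptions both naive totals come out above +24%. The headline is the measured change against the 16-week pre-program baseline, so it is the figure to trust. Individual test lifts rarely combine cleanly once changes ship together.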
If your team is running A/B tests without sample sizing and your wins don't compound, we should talk. One slot open for Q3 2026.