
Cohort Analysis for E-Commerce: How to Read Retention Like an Operator

Blended retention lies. Cohort analysis shows you when and where customers leak — how to build the table, read the curve, and turn it into a decision.


Your blended retention rate is lying to you. Not on purpose — it’s just an average, and averages hide exactly the thing you need to see: whether the customers you’re acquiring today behave better or worse than the ones you acquired six months ago. That’s what cohort analysis reveals, and it’s the difference between scaling a healthy brand and pouring money into a leaky bucket.

This is a deep dive within the e-commerce analytics guide. If you can already calculate a retention rate, this is how you make it actually useful.

What a cohort is

A cohort is a group of customers who share a starting point — almost always the month they made their first purchase. The “January cohort” is everyone whose first order landed in January. You then track each cohort forward in time and watch what percentage comes back in month 1, month 2, month 3, and so on.

The power is in the comparison. Because every cohort starts at its own month-zero, you can line them up and ask: is the March cohort retaining better than the January cohort at the same age? If newer cohorts retain worse, something you changed — a discount-driven promo, a worse traffic source, a product slip — is quietly degrading the business, and the blended number won’t show it until it’s a crisis.

Why blended retention hides the problem

Imagine retention is collapsing in your newest cohorts because you started buying cheap, low-intent traffic. Meanwhile your older cohorts — loyal, high-intent customers — keep repurchasing and prop up the average. Your blended retention rate barely moves. Everything looks fine. Then, two quarters later, the older cohorts age out, the bad cohorts dominate, and revenue falls off a cliff “for no reason.”

The reason was visible in the cohort table the whole time. Aggregates tell you what is; cohorts tell you what’s coming.
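To see how little the average moves, here is a small arithmetic sketch. Every number in it is hypothetical and chosen only to illustrate the weighting effect: a large base of older customers repurchasing at 40%, and a smaller newest cohort whose repurchase rate halves from 40% to 20%.

```python
# Hypothetical numbers, chosen only to illustrate the averaging effect.
# Each tuple: (label, customers in the group, share who repurchased in the period)
before = [("older cohorts", 9000, 0.40), ("newest cohort", 1000, 0.40)]
after = [("older cohorts", 9000, 0.40), ("newest cohort", 1000, 0.20)]


def blended_rate(cohorts):
    """Customer-weighted average repurchase rate across all groups."""
    total_customers = sum(size for _, size, _ in cohorts)
    total_repeaters = sum(size * rate for _, size, rate in cohorts)
    return total_repeaters / total_customers


print(f"blended before: {blended_rate(before):.1%}")  # 40.0%
print(f"blended after:  {blended_rate(after):.1%}")   # 38.0%
```

In this made-up case the newest cohort's rate is cut in half, yet the blended figure slips by only two points, which is easy to dismiss as noise.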

How to build a cohort table

You need two columns of data, a customer ID and an order date, plus order value if you also want revenue cohorts.

  1. Assign each customer to an acquisition cohort — the month of their first order. This is fixed forever; a customer acquired in January is always in the January cohort.
  2. Bucket every subsequent order by cohort age — months since that customer’s first order (month 0, 1, 2…).
  3. Build the grid. Rows are acquisition cohorts (Jan, Feb, Mar…). Columns are cohort age (month 0, 1, 2…). Each cell is the share of that cohort that purchased at that age: distinct customers who ordered in that month, divided by the cohort's size at month 0.

The result is a triangular table: older cohorts have more columns filled in because they’ve had more time. Read across a row to see one cohort’s decay curve; read down a column to compare cohorts at the same age — that’s where trends hide.

If you live in Shopify and Python, the same groupby/nunique mechanics from the customer retention rate (CRR) calculation extend naturally to cohorts: you’re just grouping by acquisition month and age instead of a single year.
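The three steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not a definitive implementation: the column names `customer_id` and `order_date` are assumptions about your export, `order_date` is assumed to already be a datetime column, and cohort age is counted in calendar months rather than rolling 30-day windows.

```python
import pandas as pd


def build_cohort_table(orders: pd.DataFrame) -> pd.DataFrame:
    """Retention grid: rows = acquisition month, columns = cohort age in months."""
    df = orders.copy()
    df["order_month"] = df["order_date"].dt.to_period("M")

    # Step 1: acquisition cohort = month of each customer's first order.
    first_order = df.groupby("customer_id")["order_date"].transform("min")
    df["cohort"] = first_order.dt.to_period("M")

    # Step 2: cohort age = calendar months between order month and cohort month.
    df["age"] = (
        (df["order_month"].dt.year - df["cohort"].dt.year) * 12
        + (df["order_month"].dt.month - df["cohort"].dt.month)
    )

    # Step 3: distinct customers active per (cohort, age), divided by cohort size.
    active = df.groupby(["cohort", "age"])["customer_id"].nunique().unstack("age")
    cohort_size = active[0]  # everyone in a cohort ordered at age 0 by definition
    return active.div(cohort_size, axis=0)


# Tiny made-up order log, only to show the shape of the output.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "C", "C", "D"],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-01",
        "2024-01-20",
        "2024-02-03", "2024-03-15",
        "2024-02-10",
    ]),
})
print(build_cohort_table(orders))
```

Cells a cohort has not reached yet come out as NaN, which gives the triangular shape described above. A cell that is NaN for an older cohort means nobody from that cohort ordered at that age, so you may want to fill those with 0 before charting.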

How to read the curve

Three things to look for:

  • Does it flatten? Healthy retention curves drop fast after month 0, then flatten into a stable plateau — that plateau is your loyal repeat base, the engine of LTV. A curve that decays all the way to zero means you have a leaky bucket and no real retention.
  • Is the plateau rising or falling across cohorts? If newer cohorts plateau higher, your product and acquisition quality are improving. If lower, you’re buying worse customers — fix that before you scale spend.
  • Where’s the cliff? Many brands have a specific month where customers churn (e.g. right after a consumable would run out). That’s a precise trigger for a win-back flow or a subscription nudge.
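Those three checks can be roughed out in a few lines of plain Python. This is an illustrative sketch: the function names, the three-month tail, and the 0.02 tolerance are all assumptions to tune against your own data, and the sample curve is made up.

```python
from statistics import mean


def plateau_level(curve, tail=3):
    """Average retention over the last `tail` observed months."""
    return mean(curve[-tail:])


def has_flattened(curve, tail=3, tolerance=0.02):
    """True if the last `tail` points differ by no more than `tolerance`."""
    last = curve[-tail:]
    return max(last) - min(last) <= tolerance


def cliff_month(curve):
    """Age with the steepest drop, ignoring the expected month 0 -> 1 drop.

    Needs at least three points (ages 0, 1 and 2).
    """
    drops = {age: curve[age - 1] - curve[age] for age in range(2, len(curve))}
    return max(drops, key=drops.get)


# Hypothetical cohort: share of customers purchasing at age 0, 1, 2, ...
jan = [1.00, 0.30, 0.28, 0.18, 0.17, 0.17, 0.16]

print(has_flattened(jan))            # True
print(round(plateau_level(jan), 3))  # 0.167
print(cliff_month(jan))              # 3
```

To compare plateaus across cohorts, run the same check on each row but only over ages that every cohort has reached; otherwise a young cohort looks better simply because it has had less time to decay.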

Retention cohorts vs. revenue cohorts

Two flavors, both worth running:

  • Retention cohorts track the percentage of customers who come back. Good for measuring loyalty and product-market fit.
  • Revenue (or LTV) cohorts track cumulative contribution margin per customer over time. Good for answering “how long until this cohort pays back its CAC?” — which feeds straight into your LTV:CAC ratio and your acquisition ceiling.

Revenue cohorts are the ones that tell you whether you can afford to spend more, because they show payback timing, not just loyalty.
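As a rough sketch of the payback question, the snippet below accumulates contribution margin per acquired customer by cohort age and reports the first age at which it covers CAC. The function name `payback_month` and all figures are hypothetical. It assumes each month's margin has already been divided by the original cohort size, not by the number of customers still active.

```python
from itertools import accumulate


def payback_month(margin_per_customer_by_age, cac):
    """First cohort age where cumulative margin per customer covers CAC, else None."""
    cumulative_margin = accumulate(margin_per_customer_by_age)
    for age, cumulative in enumerate(cumulative_margin):
        if cumulative >= cac:
            return age
    return None  # the cohort has not paid back within the observed window


# Hypothetical cohort: contribution margin per acquired customer at age 0, 1, 2, ...
margins = [18.0, 6.0, 5.0, 4.0, 4.0, 3.0]
cac = 30.0

print(payback_month(margins, cac))  # 3
```

Here the cumulative margin runs 18, 24, 29, 33, so the cohort crosses a CAC of 30 at month 3. A `None` result is as informative as a number: within the window you have observed, that cohort has not yet earned back what it cost.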

Turn it into a decision

Cohort analysis isn’t a chart you admire — it’s a prompt:

  • Newer cohorts retaining worse? Audit your recent traffic sources and promos before scaling. Benchmark the gap with the Repeat Purchase Scorer.
  • Sharp drop at a specific month? Build a lifecycle flow timed just before it.
  • Plateau improving? You’ve earned the right to spend more on acquisition — raise the CAC ceiling.

The bottom line

Blended retention tells you the business was fine. Cohorts tell you whether it will be. Build the table once, watch the column trends, and you’ll see problems well before they hit the P&L.

When your data won’t cooperate — orders in Shopify, customers fragmented across tools, no clean way to build the cohort — that stitching is the e-commerce analytics work I do. Book a call and we’ll get you a cohort table you can trust.
