There are a lot of customer satisfaction metrics in circulation. NPS, CSAT, CES, the star average that sits on your Google profile. Some companies run several of them at the same time without anyone being able to explain what the difference is supposed to be. Others pick one, paste the number on a slide, and call that the customer voice.
This guide is for the people in between, who suspect their metric isn't quite telling them what they think it is, or who are about to set one up and want to know what they're getting into. It walks through how each of the main customer satisfaction metrics is defined, what each one captures, and where each one falls down. Then it gets to the part most metric explainers skip, which is whether picking the right metric is even the question worth asking. If you came here looking for "NPS vs CSAT", the short answer is that they measure different things and neither is a substitute for the other. The longer answer is below.
On this page
- Why metrics get measured but rarely understood
- NPS: what it measures and what it doesn't
- CSAT: what it measures and what it doesn't
- CES: the underrated middle child
- Star ratings: the metric everyone uses but few read
- Comparison table
- When each metric is genuinely useful
- Why "which metric?" is usually the wrong question
- What to do alongside (or instead of) a single number
- Frequently asked questions
Why metrics get measured but rarely understood
Customer satisfaction metrics have a strange life. They get adopted because someone wanted a number on a slide, and then they're tracked for years without anyone going back to check whether the number still means what they thought it meant. The score becomes the thing in itself, and the actual question it was meant to answer fades into the background.
Part of that is structural. Boards prefer comparable numbers across quarters and marketing prefers something tidy for case studies. Both audiences would rather have a single number than a paragraph of nuance, so the metric persists, and the work of understanding what it actually captures gets pushed to whoever is unlucky enough to be reading the open-text comments.
The other part is that each of these metrics was originally designed to answer a specific question, and most companies end up using them outside of that context. NPS was a loyalty proxy, CSAT a transactional check, CES an effort-of-resolution measure, star ratings a public-reputation signal. Treating any one of them as your single customer satisfaction number is a bit like picking a thermometer to measure rainfall. It will give you a reading, but the reading may not mean what you think it does.
The three survey metrics at a glance:
- NPS: likelihood to recommend, on a 0 to 10 scale.
- CSAT: satisfaction with a specific touchpoint, typically 1 to 5.
- CES: how easy it was to handle an issue, typically 1 to 7.
NPS: what it measures and what it doesn't
Definition and formula
Net Promoter Score was introduced in a Harvard Business Review article in 2003 by Fred Reichheld at Bain & Company. The premise is one question:
How likely are you to recommend us to a friend or colleague?
Customers answer on a 0 to 10 scale, then get bucketed:
- Promoters: 9 or 10
- Passives: 7 or 8
- Detractors: 0 to 6
The score is calculated as:
NPS = % Promoters − % Detractors
The result is a number between −100 and +100. Passives don't appear in the formula at all. They're treated as neutral, which means a customer who gave you an 8 (a fairly enthusiastic answer in most contexts) doesn't count for or against you.
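If you want the mechanics in code, here's a minimal Python sketch. Nothing about it is official; the function name and sample scores are invented for illustration.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score from raw 0-10 responses."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)   # 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # 0 to 6
    # Passives (7-8) only appear in the denominator.
    return 100 * (promoters - detractors) / len(scores)

# 4 promoters, 3 passives, 3 detractors out of 10 responses
print(nps([10, 9, 9, 9, 8, 8, 7, 6, 4, 2]))  # 10.0
```

Notice how the two 8s and the 7 vanish into the denominator: turn a single 8 into a 9 and this score doubles from 10 to 20.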
What it measures and what it misses
NPS is genuinely useful as a directional signal over time, in the same population of customers, asked the same way. If it's climbing slowly over six quarters, something is probably going right; if it's falling, something is probably going wrong. "Would you recommend this" is also a slightly higher bar than "are you satisfied", which is part of why the metric caught on.
The 0 to 10 scale, though, is interpreted very differently across cultures and sectors. A 7 from a North American customer often means "fine, no complaints", whereas a 7 from a German or Dutch customer can mean "actually quite good". The promoter/passive/detractor cutoffs are also arbitrary; there's no statistical reason a 6 is a detractor and a 7 is a passive, but two customers a single point apart on the scale get classified into completely different categories with opposite signs in the formula.
The question itself is about a hypothetical social act rather than the actual experience. Whether a customer would recommend you depends on who they know and whether their friends even need what you sell. Someone can love your product and never recommend it because their circle has no use for it. Someone else can be lukewarm and still recommend it because you asked on a day a friend mentioned needing something similar. The longer post on the specific ways NPS gets misread has the extended version of that argument.
CSAT: what it measures and what it doesn't
Definition and formula
Customer Satisfaction Score (CSAT) is the older and more transactional cousin of NPS. It asks a direct satisfaction question, usually right after a specific interaction:
How satisfied were you with [the product / your support call / your stay]?
The standard scale is 1 to 5: Very dissatisfied, Dissatisfied, Neutral, Satisfied, Very satisfied. The score is the percentage of respondents who picked 4 or 5:
CSAT = (count of 4 and 5 ratings ÷ total responses) × 100
So a CSAT of 82% means 82 out of 100 customers rated their experience a 4 or 5. Some implementations use 1 to 7 or 1 to 10 scales, but 1 to 5 is by far the most common, and the "top two boxes" calculation is the convention.
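The top-two-box convention is a one-line filter in Python. Again a hedged sketch, with invented ratings:

```python
def csat(ratings: list[int]) -> float:
    """Top-two-box CSAT: percentage of 4s and 5s on a 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one response")
    return 100 * sum(1 for r in ratings if r >= 4) / len(ratings)

# A 3 and a 1 are indistinguishable to the headline number.
print(round(csat([5, 5, 4, 4, 3, 1]), 1))  # 66.7
```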
What it measures and what it misses
CSAT is at its best when asked about something concrete and recent, like a support ticket that just got resolved or a hotel stay a guest just checked out of. The respondent is rating the thing they just did, while the experience is fresh, and the answer reflects that specific event rather than a vague impression of the brand overall. "How satisfied were you?" is also more intuitive to respondents than NPS, and gets answered without much thought.
The "top two boxes" calculation throws away information, though. A customer who picked 3 (neutral) is treated the same as a customer who picked 1 (very dissatisfied) as far as the headline number goes. Both fail to count. A drift from a population of mostly 5s to a population of mostly 4s shows up as no change in CSAT, even though something has clearly shifted.
CSAT also has a positivity bias. People are generally more willing to say they were satisfied than to say they were dissatisfied, especially when they don't feel strongly either way. Customers who genuinely had a bad time often just don't respond, which means CSAT reflects the satisfied middle plus the small number of unhappy people who took the time to complain. The silent dissatisfied are invisible.
Like NPS, CSAT tells you a number without telling you why. A CSAT of 78% is not actionable on its own; the actionable part is in the open-text question you put next to it, which most companies treat as optional. CSAT is strongest right after a specific, bounded interaction, and weakest as an overall ongoing satisfaction proxy detached from any specific event.
CES: the underrated middle child
Definition and formula
Customer Effort Score (CES) was introduced in 2010 by researchers at CEB (now part of Gartner). The argument was that loyalty is driven less by delight and more by reduced friction, so the metric measures how much effort the customer had to put in to get their problem solved.
The classic CES prompt is a statement rather than a question:
The company made it easy for me to handle my issue.
Respondents agree or disagree on a 1 to 7 scale (1 = strongly disagree, 7 = strongly agree). The score is usually calculated as the average of all responses, sometimes reported as the percentage who picked 5, 6, or 7 (the agreement end). Some implementations use a 1 to 5 scale or flip the question to "how much effort did it require?", but the constant is that you're measuring perceived effort to achieve a specific outcome.
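Both conventions fit in a short sketch. These function names are illustrative, not a standard API; use whichever convention your reporting already follows:

```python
def ces_mean(responses: list[int]) -> float:
    """Average CES on the 1-7 agreement scale (higher = easier)."""
    return sum(responses) / len(responses)

def ces_agree_pct(responses: list[int]) -> float:
    """Alternative convention: percentage who picked 5, 6, or 7."""
    return 100 * sum(1 for r in responses if r >= 5) / len(responses)

responses = [7, 6, 6, 5, 3, 2]
print(round(ces_mean(responses), 2))       # 4.83
print(round(ces_agree_pct(responses), 1))  # 66.7
```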
What it measures and what it misses
CES is good at detecting friction. If your CES on the returns flow is bad, the returns flow is probably annoying; if your CES on a support interaction shows low effort, the interaction probably did its job. It's also more action-oriented than NPS or CSAT, because "make this easier" tends to be a clearer instruction than "make customers more loyal".
The catch is that CES only makes sense for resolution-oriented experiences. Asking for it after a meal at a restaurant is a category error: the customer wasn't trying to handle an issue, they were eating dinner. CES belongs in support, returns, signup, account management, anywhere the customer was attempting a discrete task. The 1 to 7 scale is also less intuitive than the others; respondents have to parse a Likert-style agreement statement, and response rates tend to be lower as a result. SaaS companies in particular find it useful for evaluating onboarding, where the cost of a high-effort experience shows up directly in churn.
Star ratings: the metric everyone uses but few read
Definition and formula
Star ratings need almost no introduction: a 1 to 5 scale, often with half-stars, displayed as an aggregate average. Public-facing star ratings are usually the rolling average of all reviews ever submitted, with no time decay and no segmentation.
Star average = sum of all star ratings ÷ count of ratings
Some platforms weight by recency (Yelp does some of this) or by reviewer quality, but the headline number is almost always a simple mean.
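The formula and its blind spot fit in a few lines. These two review histories are invented, and deliberately land on the same average; the distribution problem they illustrate comes up in the next section:

```python
from statistics import mean

# Two invented review histories with identical headline averages
steady = [5] * 45 + [1] * 5   # mostly delighted, a few disasters
mixed = [5] * 40 + [3] * 10   # solid, with a lukewarm tail

print(mean(steady))  # 4.6
print(mean(mixed))   # 4.6 -- same display, very different stories
```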
What it measures and what it misses
Star ratings carry public-facing reputation, in the sense of what someone searching for this business sees at a glance. A 4.8 average looks better than a 4.2 to a prospective customer who hasn't read a single review. The aggregate has commercial weight whether or not it accurately summarises the experience, and stars are a low-effort entry point for customers who would never fill in a form.
What stars miss is almost everything else. A 4.6 average tells you nothing about what's good or bad. A 4.6 made of forty 5s and ten 3s tells a completely different story than a 4.6 made of forty-five 5s and five 1s, but both display identically. The post on what a 4.2 doesn't tell you goes into this in detail.
Public stars also follow a specific psychology. Customers tend to leave reviews when they feel a duty to recommend or to warn, so the middle gets clipped. You see lots of 5s, some 1s, and a thin layer in between, which isn't how the actual customer experience is distributed. There's a longer piece on the four-point-eight star average phenomenon for the extended version. The aggregate is also decoupled from time: a business that had a tough patch two years ago and has been excellent since carries the old reviews in the average forever. As an internal management metric, the average on its own is close to useless. The work is reading the actual review text.
Comparison table
| Metric | Question format | Scale | Best for | Worst at |
|---|---|---|---|---|
| NPS | "How likely are you to recommend us to a friend?" | 0 to 10 (Promoter / Passive / Detractor) | Tracking long-run loyalty trends in large populations | Telling you why anything changed; small sample sizes |
| CSAT | "How satisfied were you with [event]?" | 1 to 5 (top-two-box reported as %) | Transactional satisfaction immediately after a specific event | Aggregate ongoing satisfaction; ignoring middle responses |
| CES | "The company made it easy to handle my issue." | 1 to 7 (agreement) | Detecting friction in support, onboarding, returns, cancellation | Anything that isn't a discrete resolution-oriented task |
| Star ratings | "Rate your experience" | 1 to 5 stars | Public reputation, prospective-customer signalling | Internal diagnostics; spotting what's actually wrong |
When each metric is genuinely useful
Where each one earns its keep, in short:
- NPS works best when you have hundreds or thousands of responses per quarter and treat the trend as the signal rather than the absolute number. For small businesses with a few dozen responses a month, the noise swamps anything you're trying to read.
- CSAT is at its best right after a specific, bounded interaction, paired with one open-text question. The score is a trigger for which responses to read first; the text is where the substance lives.
- CES belongs in resolution flows: support, returns, cancellation, signup, account management. Don't use it for a general experience survey, because the framing only makes sense when there was an effort involved.
- Star ratings are the public-facing summary they were always meant to be. Track the aggregate to know how prospective customers see you, but don't treat the average as an internal management metric.
The common mistake is picking one of these as a single primary number and using it for purposes it wasn't designed for. The metric will give you a reading. The reading often won't match what you wanted to know.
Why "which metric?" is usually the wrong question
Here's the awkward part of writing a guide like this. The premise of "which customer satisfaction metric should I use?" carries an assumption that's usually wrong: that one number, well-chosen, will tell you what's going on with your customers. It almost never does.
The numbers are summaries. They compress a lot of information into a single value, and the compression destroys most of what's useful. A CSAT of 82% is the same number whether your customers are praising your speed and complaining about your prices, or whether they're praising your prices and complaining about your speed. The score doesn't carry the substance.
What changes how you run a business is the substance: the specific complaint about wait times on Saturdays, the cluster of customers in one week who all mentioned the same broken thing. Those are the things that lead to actual decisions. The metric at best is a flashing light pointing you towards where to look, but it isn't the looking itself.
Companies that get real value from customer satisfaction metrics treat them as triggers rather than verdicts. When the score moves, you go and read what's underneath. When it stays stable, you still go and read what's underneath, because a stable score often means changes are cancelling each other out at the aggregate level and you probably want to know which ones.
There's a related point in the post on why satisfied customers still leave: a score saying customers are satisfied isn't a guarantee they'll come back. Satisfaction and loyalty are different things, and a metric that captures satisfaction at one moment isn't necessarily predictive of behaviour at the next.
Picking the right metric matters less than building a habit of reading the qualitative feedback it's supposed to be summarising. If you only have time for one of the two, read the comments and skip the score.
What to do alongside (or instead of) a single number
Here's a practical version of what to do with all this.
Pair every metric with an open-text question. Always. The "why" question is doing the work; the score is just the cover sheet. "How satisfied were you with your stay?" followed by "What made it that way?" is worth many times more than the satisfaction question on its own.
Read the qualitative feedback. Not skim. Read. For a small business with manageable volume, that means actually going through the comments. For higher-volume operations, it means using a layer of analysis (a person whose job is reading, or a tool that summarises the themes) so the substance doesn't get lost in the score.
Treat trends as the signal, not absolute values. A CSAT of 80% means almost nothing in isolation. A CSAT that's been sitting at 85% for six months and just dropped to 78% means quite a lot. When a number moves, the first move is to read what's underneath it.
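If you want this habit as code, here's a minimal sketch. The monthly numbers and the three-point threshold are invented; the right threshold depends on your response volume:

```python
# Hypothetical monthly top-two-box CSAT percentages, oldest first
monthly_csat = [85.1, 84.7, 85.3, 84.9, 85.2, 78.0]

baseline = sum(monthly_csat[:-1]) / len(monthly_csat[:-1])
latest = monthly_csat[-1]

# Flag a meaningful drop against the recent baseline, then go read
# the open-text responses behind the flagged month.
if baseline - latest > 3:
    print(f"CSAT down {baseline - latest:.1f} pts -- read the comments")
```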
Don't average across contexts, and don't optimise for the metric. A CSAT that mashes together your support team and your onboarding flow is essentially meaningless, because each is a different experience with different dynamics. And once a team starts closing tickets faster just to lift the score, you've gained a better number at the cost of actual quality of resolution. Once a measurement becomes the target, it stops measuring.
This is where AI-assisted analysis earns its place. The bottleneck for most small businesses is reading hundreds of responses a month and turning that pile into a useful summary. The numerical metrics don't address that bottleneck at all; they give you a number that says "something is happening", and you're back to reading. A tool that can read the comments for you and group them into themes, in plain language, is closer to the actual job than any single satisfaction score. Qria is built around that workflow: structured forms feeding into AI summaries and conversational analysis, so the metric becomes one input among several rather than the entire output.
The metric on its own won't run the business. The substance behind it, read regularly, might.
Frequently asked questions
What's the difference between NPS and CSAT?
NPS asks about a hypothetical social act ("would you recommend us") on a 0 to 10 scale, used as a long-term loyalty proxy. CSAT asks about transactional satisfaction ("how satisfied were you") on a 1 to 5 scale, immediately after a specific interaction. They measure different things at different timescales and aren't substitutes for each other.
Is NPS still relevant in 2026?
It's still widely used and still has a niche where it works: high-volume, consistent measurement treated as a long-run trend signal. For most small businesses, NPS ends up being asked to do work it can't really do, and a CSAT-plus-open-text approach gets you more useful information per response.
What's a good CSAT score?
The benchmark people quote is around 75 to 85% for service businesses, but the number varies significantly by industry and by how you phrase the question. Any absolute CSAT number on its own tells you very little. The trend within your own business and the open-text comments behind the score are far more informative than any industry benchmark.
When should I use CES instead of CSAT?
When the customer was trying to accomplish a specific task and you want to know whether your process helped or got in the way: support resolutions, returns, cancellations, signup, account management. CSAT is more appropriate for evaluating whether the experience as a whole was satisfying. CES is more appropriate for evaluating whether a particular flow was easy.
Can I use NPS, CSAT, and CES at the same time?
Technically yes, and large companies often do. The risk is that you end up with multiple numbers, none of which gets the time to actually be understood, while respondents start ignoring surveys that keep asking similar questions in slightly different forms. If you use more than one, run them at different moments (CES on support resolution, CSAT on transactions, NPS on a longer cadence) so each is measuring something distinct.
What's better than NPS, CSAT, or CES?
There isn't a single better metric. What actually moves the needle is reading the qualitative feedback alongside whichever score you're tracking, and using the score as a trigger to look at the substance underneath. Whichever metric you pick, the half of the form pulling its weight is the open-text follow-up.