How to read a clinical result

A science headline usually hands you one number and a mood. “AI cuts errors by 23%.” “Vaccine fails trial.” The number feels like a verdict, so you either believe it or you don’t.

But a real clinical result is never one number. It is a small bundle of them, and each one answers a different question. Learn what each is asking and the bundle stops being a wall of jargon — it becomes a short, honest sentence you can read for yourself.

We will use one real example, taken from our own coverage of a trial of an AI tool in Kenyan clinics:

the adjusted odds ratio was 0.77 (95% confidence interval 0.55 to 1.08, P = 0.13)

Five things are hiding in that line. We will take them out one at a time — and then, just as importantly, put them back together. One promise before we start: no single number here is a verdict. The meaning is in how they fit.

Endpoint

Before any number means anything, ask: a number about what?

The thing a study measures and counts is its endpoint. Everything else — the ratio, the range, the p-value — is about that one chosen outcome. Change the endpoint and every number changes with it.

In the AI trial, the main endpoint was treatment failure within 14 days — decided before the trial began, and judged by a panel of clinicians who did not know which patients had used the AI. That last detail matters: an outcome fixed in advance and scored blind is far harder to fool yourself with than one picked after the results are in.

Watch what happens when a study has more than one endpoint. In a norovirus vaccine trial we covered, the vaccine was tested two ways at once: did it stop actual illness (gastroenteritis), and did it stop infection detectable by a lab test? It missed the first and hit the second. Both results are real; they simply answer different questions. The trial’s pre-chosen main endpoint was the illness one — so, reported honestly, the trial missed its primary endpoint even though it clearly did something.

What it means: the endpoint is the scoreboard. Read it first.
What it does not mean: a good result on a secondary or after-the-fact measure is not the same as hitting the main, pre-registered one.
The trap: a headline can quote whichever endpoint sounds best. Always ask what was actually measured, and whether it was the outcome the researchers committed to in advance.

Odds ratio

Odds ratio 0.77. An odds ratio is a single number that compares two groups. The rule of thumb:

1.0 means no difference between the groups.
below 1.0 means the event was less common in the treated group.
above 1.0 means it was more common.

So 0.77 says the AI group had roughly three-quarters the odds of a bad outcome compared with the control group — if the number is real. (Hold that “if”; the next pieces are how we check it.)

The word “adjusted” (the “a” in the abbreviation aOR) means the researchers used statistics to account for other differences between the groups, here the fact that some clinics started out unlike others. You will meet close relatives of the odds ratio: the risk ratio (RR) and the hazard ratio (HR). They are computed differently, but you read them the same way: 1.0 is the “no difference” line.
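To make “odds” concrete: if the risk is the share of people who have the event, the odds are that share divided by the share who do not (a 20% risk is odds of 0.2/0.8 = 0.25). The Python sketch below computes an unadjusted odds ratio and risk ratio from two pairs of raw percentages quoted elsewhere in this guide: the AI trial’s 2.0% and 2.2%, and the norovirus trial’s 56.9% and 44.7%. These are our own back-of-envelope figures, not estimates either trial reported, and they skip the adjustment that produced the AI trial’s 0.77. The function names are illustrative.

```python
def odds(risk: float) -> float:
    """Convert a risk (a proportion between 0 and 1) into odds."""
    return risk / (1 - risk)


def odds_ratio(risk_treated: float, risk_control: float) -> float:
    """Unadjusted odds ratio: odds in the treated group over odds in the control group."""
    return odds(risk_treated) / odds(risk_control)


def risk_ratio(risk_treated: float, risk_control: float) -> float:
    """Unadjusted risk ratio: risk in the treated group over risk in the control group."""
    return risk_treated / risk_control


# Raw percentages quoted in this guide, written as proportions.
examples = {
    "AI trial (rare outcome)": (0.022, 0.020),     # AI group, control group
    "Norovirus illness (common)": (0.447, 0.569),  # vaccinated, placebo
}

for label, (treated, control) in examples.items():
    print(f"{label}: odds ratio {odds_ratio(treated, control):.2f}, "
          f"risk ratio {risk_ratio(treated, control):.2f}")
```

For the rare outcome the two ratios nearly coincide (both about 1.10); for the common one they drift apart (about 0.61 against 0.79), so it is worth checking which ratio a paper reports. Note too that the raw AI rates give an unadjusted odds ratio just above 1.0, on the other side of the line from the adjusted 0.77.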

What it means: a compact “how much more, or less” between two groups.
What it does not mean: it does not tell you how common the event was, or how many real people are affected. A ratio hides its baseline.
The trap: “23% lower odds” sounds dramatic. Whether it matters at all depends on how common the outcome was to begin with, which is the very next number.

Absolute vs relative

This is the one that fools almost everyone, so it is worth slowing down.

Take the norovirus numbers. Illness happened in 56.9% of the placebo group and 44.7% of the vaccinated group. You can describe that same gap two honest ways:

Absolute: 12.2 percentage points lower (56.9 minus 44.7).
Relative: about 21% lower (12.2 is roughly a fifth of 56.9).

Both are true. They describe the identical result. And they feel completely different — which is exactly why the bigger-sounding one, the relative number, is the darling of press releases.

“Percentage points” and “percent” are not interchangeable. Going from a 5% interest rate to 4% is a drop of one percentage point, but a 20% cut in the interest you pay. Mixing them up is how a tiny change gets sold as a huge one.
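Both descriptions come from the same two numbers. A minimal Python sketch, using the norovirus illness rates and the interest-rate example above (the function name is ours, purely for illustration):

```python
def absolute_and_relative(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent).

    Positive values mean the rate went down from before_pct to after_pct.
    """
    absolute_points = before_pct - after_pct
    relative_pct = absolute_points / before_pct * 100  # divide by the starting rate
    return absolute_points, relative_pct


# Norovirus illness: 56.9% on placebo vs 44.7% vaccinated.
points, percent = absolute_and_relative(56.9, 44.7)
print(f"Norovirus: {points:.1f} percentage points lower, about {percent:.0f}% lower")

# Interest rate falling from 5% to 4%.
points, percent = absolute_and_relative(5.0, 4.0)
print(f"Interest: {points:.1f} percentage point lower, a {percent:.0f}% cut")
```

The relative figure divides by the starting rate, so the same absolute gap looks larger whenever the starting rate is small.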

The clearest warning comes from rare events. A “50% reduction” sounds enormous. But if the event happened to 2 people in 1,000 and now happens to 1 in 1,000, that 50% is one person per thousand. Back in the AI trial, the odds ratio (0.77) sounded like a 23% improvement, yet in absolute terms the bad outcome happened to 2.0% of the control group and 2.2% of the AI group: a gap of just 0.2 percentage points. (Raw percentages and adjusted estimates can point in different directions when the groups differ, which is why both must be read carefully. Here the adjusted 0.77 leans one way and the raw rates the other.)
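Translating rates into people per 1,000 makes the baseline hard to ignore. The sketch below applies the same 50% relative reduction to the rare event above (2 in 1,000) and to a hypothetical common event (40%, a number invented only for contrast), then restates the AI trial’s raw rates the same way. Function names are illustrative.

```python
def per_thousand(rate_pct: float) -> float:
    """Convert a rate in percent into events per 1,000 people."""
    return rate_pct * 10


def avoided_per_thousand(baseline_pct: float, relative_cut_pct: float) -> float:
    """Events avoided per 1,000 people when a baseline rate is cut by a relative amount."""
    return per_thousand(baseline_pct) * relative_cut_pct / 100


# The same "50% reduction" at two very different baselines.
print(avoided_per_thousand(0.2, 50))   # -> 1.0 (2 in 1,000 becomes 1 in 1,000)
print(avoided_per_thousand(40.0, 50))  # -> 200.0 (hypothetical 40% baseline)

# The AI trial's raw rates, restated per 1,000 patients.
control, ai = per_thousand(2.0), per_thousand(2.2)
print(f"control {control:.0f}, AI {ai:.0f} per 1,000 (raw difference {ai - control:.0f})")
```

Same relative headline, but one person per thousand in the first case and two hundred in the second.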

What it means: always find the absolute numbers — the actual rates in each group.
What it does not mean: a big relative number does not promise a big real-world change.
The trap: relative figures with no baseline. If someone gives you only a percent reduction, ask “out of how many, and how common was it already?”

Confidence interval

95% CI 0.55 to 1.08. The point estimate (0.77) is the study’s best single guess. The confidence interval is the range of values that are also reasonably compatible with the data. A narrow interval means the study pinned the answer down; a wide one means “we are honestly not sure.”

One question does most of the work here: does the interval include “no effect”? For a ratio, “no effect” is 1.0. Our interval runs from 0.55 to 1.08 — it straddles 1.0. So the data are compatible with a real benefit (0.55), with nothing at all (1.0), and even with a slight harm (1.08). When the interval includes no effect, you cannot claim an effect — full stop. The study simply has not pinned the result down.

Compare the two norovirus endpoints, from the same trial:

Illness: difference 12.2 percentage points, 95% CI −4.24 to 28.61. That range crosses zero (no difference), so the result is uncertain.
Infection: difference 23.6 percentage points, 95% CI 7.4 to 38.0. That range is entirely above zero, so this one is a real signal.

Same study, same vaccine, two intervals, two different verdicts. The interval is where the honesty lives.
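The one question above can be written as a single check. The only thing that changes between cases is the “no effect” value: 1.0 for a ratio, 0 for a difference. A sketch in Python, using the three intervals quoted in this section (the function name is illustrative):

```python
def includes_no_effect(low: float, high: float, no_effect: float) -> bool:
    """True if the confidence interval [low, high] contains the 'no effect' value."""
    return low <= no_effect <= high


intervals = [
    # (label, lower bound, upper bound, value meaning "no effect")
    ("AI trial, adjusted odds ratio", 0.55, 1.08, 1.0),
    ("Norovirus illness, difference in points", -4.24, 28.61, 0.0),
    ("Norovirus infection, difference in points", 7.4, 38.0, 0.0),
]

for label, low, high, no_effect in intervals:
    inside = includes_no_effect(low, high, no_effect)
    verdict = "includes no effect" if inside else "excludes no effect"
    print(f"{label}: {low} to {high}, {verdict}")
```

The check only says whether the data can rule out “no effect”; the width of the interval still tells you how precisely the size of the effect is known.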

What it means: how sure we are, expressed as a range.
What it does not mean: the point estimate is not “the answer,” and the values inside the range are not all equally plausible; those near the middle are more plausible than those near the ends.
The trap: reading the single number and ignoring the range. The range is the point.

P-value

P = 0.13. The p-value answers a narrow, slippery question: if there were truly no effect, how often would chance alone produce a gap at least this big? P = 0.13 means about 13% of the time, common enough that we cannot rule out a fluke.
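The definition can be acted out directly: simulate many studies in which there truly is no effect, and count how often chance alone produces a gap at least as big as the one observed. The numbers below (100 people per group, a shared 25% event rate, an observed gap of 10 people) are hypothetical, chosen only so the sketch runs quickly. It simulates the idea; it is not how any particular trial computed its p-value.

```python
import random


def simulated_p_value(true_rate: float, n_per_group: int, observed_gap: int,
                      studies: int = 5_000, seed: int = 1) -> float:
    """Share of simulated no-effect studies whose gap is at least observed_gap people."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    as_extreme = 0
    for _ in range(studies):
        # Both groups share the same true rate, so any gap is pure chance.
        group_a = sum(rng.random() < true_rate for _ in range(n_per_group))
        group_b = sum(rng.random() < true_rate for _ in range(n_per_group))
        if abs(group_a - group_b) >= observed_gap:  # two-sided: a gap in either direction
            as_extreme += 1
    return as_extreme / studies


p = simulated_p_value(true_rate=0.25, n_per_group=100, observed_gap=10)
print(f"Simulated two-sided p-value: {p:.2f}")
```

With these made-up numbers the answer comes out a little above 0.1: a gap that size turns up fairly often even when nothing is going on.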

By long-standing custom, researchers often call a result “statistically significant” when P is below 0.05. It helps to know that 0.05 is a convention — a line drawn by habit, not a law of nature. Our P = 0.13 sits above it, so the AI result is “not significant.”

Two traps live here, and both are big:

“Significant” does not mean “large” or “important.” With a big enough study, a difference too small to matter can still clear the bar.
“Not significant” does not mean “proven to be zero.” Very often it means “this study could not tell.” Absence of evidence is not evidence of absence.
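Both traps show up with one textbook test, the normal-approximation test for two proportions (our choice for illustration; it is not the adjusted analysis the AI trial used). The sketch holds the gap fixed at the AI trial’s raw 2.2% against 2.0% and changes only the number of people per group; both group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(rate_a: float, rate_b: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference between two proportions (equal group sizes)."""
    pooled = (rate_a + rate_b) / 2  # with equal groups, the pooled rate is the average
    standard_error = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (rate_a - rate_b) / standard_error
    return 2 * NormalDist().cdf(-abs(z))


for n in (5_000, 1_000_000):
    p = two_proportion_p_value(0.022, 0.020, n)
    label = "significant" if p < 0.05 else "not significant"
    print(f"{n:,} per group: p = {p:.2f} ({label})")
```

The 0.2-point gap is identical in both runs. With a million people per group it clears the 0.05 bar easily without becoming any more important; with 5,000 per group it does not, which says the study could not tell, not that the gap is zero.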

This is why, when both are reported, the confidence interval tells you more than the p-value: the interval shows the whole range of what is still on the table, instead of collapsing it into a single pass/fail stamp.
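The interval and the p-value are usually two views of the same calculation. Intervals for ratios are commonly built symmetrically on the log scale with a normal approximation. Assuming the AI trial did the same (this guide does not say), the sketch below recovers the standard error from the interval and then the p-value. It is a rough consistency check, not the authors’ method, and the function name is hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def p_from_ratio_ci(estimate: float, low: float, high: float, level: float = 0.95) -> float:
    """Approximate two-sided p-value for a ratio, assuming a log-scale normal (Wald) interval."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    standard_error = (log(high) - log(low)) / (2 * z_crit)
    z = log(estimate) / standard_error  # distance from log(1.0) = 0, the "no effect" line
    return 2 * NormalDist().cdf(-abs(z))


# On the log scale the interval is centred on the point estimate:
print(f"centre of interval: {exp((log(0.55) + log(1.08)) / 2):.2f}")  # 0.77
print(f"recovered p-value: {p_from_ratio_ci(0.77, 0.55, 1.08):.2f}")   # 0.13
```

Both numbers come back matching the published ones. From the interval (plus the point estimate) you can roughly rebuild the p-value, while a bare “P < 0.05” tells you nothing about the range.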

Sample size

How many people were in the study, and was that enough to see the effect they were looking for? That capacity — a study’s ability to detect a real effect when one exists — is called its power.

The AI trial enrolled about 9,700 patients, which sounds like plenty. But the bad outcome was rare — around 2% — and rare outcomes need enormous numbers to compare reliably. The authors are refreshingly blunt about it: to confirm an effect of the size they saw, you would need something like 100,000 patients. So “not significant” here mostly means “this trial was too small to tell,” not “there is definitely nothing there.”
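To see why rarity drives the numbers up, here is the standard textbook formula for the people needed per group to compare two proportions (two-sided 5% significance, 80% power). The inputs are assumptions for illustration: we treat 0.77 as a simple risk ratio and ignore clustering by clinic and statistical adjustment, so this does not reproduce the authors’ figure of about 100,000. It only compares a 2% baseline with a hypothetical 20% one.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(risk_control: float, risk_ratio: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """People needed per group to detect risk_control -> risk_control * risk_ratio.

    Normal-approximation formula for two independent proportions, individually
    randomized, equal group sizes. Ignores clustering and covariate adjustment.
    """
    risk_treated = risk_control * risk_ratio
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # chance of catching a real effect
    variance = risk_control * (1 - risk_control) + risk_treated * (1 - risk_treated)
    return ceil((z_alpha + z_power) ** 2 * variance / (risk_control - risk_treated) ** 2)


rare = n_per_group(0.02, 0.77)    # outcome in about 2% of controls, as in the AI trial
common = n_per_group(0.20, 0.77)  # hypothetical outcome in 20% of controls
print(rare, common, round(rare / common, 1))
```

Under these assumptions the same relative effect needs roughly twelve times as many people when the outcome sits at 2% rather than 20% (about 12,900 versus about 1,100 per group).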

Think of listening for a whisper in a noisy room. One quick listen tells you little; you may need many careful repeats before you can honestly say whether the whisper is real. An underpowered study is a single quick listen.

What it means: bigger studies can see smaller effects; rare outcomes demand big studies.
What it does not mean: a null result from a small study is not proof that nothing happened.
The trap: treating “we could not detect it” as “it is not there.”

Reading them together

Now read the whole line again, slowly:

the adjusted odds ratio was 0.77 (95% confidence interval 0.55 to 1.08, P = 0.13)

The endpoint tells you what was measured (serious treatment failure, judged blind, within 14 days). The odds ratio and the absolute rates tell you how big the effect looks — and the absolute rates keep the ratio honest (2.0% vs 2.2% is tiny). The confidence interval tells you how sure we are (not very — it includes “no effect”). The p-value warns you not to bet against chance (0.13 is easy for luck to produce). And the sample size tells you which kind of “no” this is (too small to tell, not proven nothing).

Put together, that intimidating line says something quite precise and quite modest: in this one trial, the tool might help a little, might do nothing, and we cannot yet tell which — and to tell, you would need a far larger study.

That is not a failure. It is an honest result, reported honestly. No single number in the bundle could have told you that. You needed all of them, read together — which is the whole point, and the reason we never print the number without the plain-language reading beside it.

Six questions, not a formula

There is no scoring recipe that turns a result into a verdict — anyone who offers you one is selling something. What you can carry with you is a short list of questions. They do not produce an answer; they keep you honest.

What was actually measured, and was it decided in advance? (endpoint)
How big is the effect — and what are the real numbers in each group? (odds ratio; absolute vs relative)
Does the confidence interval include “no effect”?
What is the p-value really saying — and is “significant” being confused with “important”?
Was the study big enough to see the effect it was hunting for? (power)
And the question behind all the others: what does this study not show?

Ask those, in that spirit, and you no longer need a headline to tell you what a study means. You can read it yourself.

About this guide

This is an evergreen explainer, not coverage of a single paper. It is prepared with AI assistance and human editorial review and revised over time; the date above is when it was last checked. It teaches how to read the numbers — it is not medical or statistical advice.