NPC Archive Item: Inadequate sample size calculation and reporting in clinical trials: what does it mean for me?

NOTE – This is an archive post from the NPC and has not been updated since first publication. Therefore, some hyperlinks may no longer be working.

12th August 2009

A review of publications of randomised controlled trials found that only a third adequately described the sample size calculations. An adequate sample size calculation underpins the statistical validity of any claim a study makes about differences between treatments.

Level of evidence
Level 3 (other evidence) according to the SORT criteria.

Action
Healthcare professionals should be alert to claims made in publications of randomised controlled trials (RCTs) that may not be based on sound statistical principles. Assessing validity is difficult unless you practise it regularly. Busy clinicians should rely on independent, evidence-based critical appraisals of studies from public sector funded, trusted sources of information (e.g. NICE, Cochrane, CKS, NPC).

What is the background to this?
The sample size for a clinical study comparing treatments needs to be large enough to keep the risk of both false positive and false negative results acceptably low. When a statistically significant difference between treatments is shown, we need to have confidence that there really is a difference, and that the result was not just obtained by chance. Similarly, when no significant difference is shown between treatments, we wish to have confidence that this reflects the absence of a clinically meaningful difference, and not simply that too few patients were studied.

The appropriate sample size is determined conventionally for RCTs from four parameters: the acceptable level of a type I error (i.e. showing a difference when one doesn’t really exist), the acceptable level of type II error (showing no difference when one really exists), the level and variability in response of the control group, and the difference in response that would be considered a clinically meaningful difference.
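As a rough illustration of how these four parameters combine, the sketch below uses the standard normal-approximation formula for comparing two means in a two arm trial with equal group sizes. The function name and the example values (a standard deviation of 10 and a clinically meaningful difference of 5) are hypothetical and are not taken from the review; trials with other outcome types or designs need different formulas.

```python
import math
from statistics import NormalDist


def sample_size_per_group(alpha, power, sd, delta):
    """Approximate patients needed per group to compare two means.

    alpha: acceptable type I error rate (two-sided), e.g. 0.05
    power: 1 minus the acceptable type II error rate, e.g. 0.80
    sd:    assumed standard deviation of the response (control group)
    delta: smallest difference considered clinically meaningful
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_beta = std_normal.inv_cdf(power)           # critical value for type II error
    n = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole patients


# Hypothetical inputs: 5% type I error, 80% power, SD 10, difference 5
print(sample_size_per_group(0.05, 0.80, sd=10, delta=5))    # 63
# Halving the target difference roughly quadruples the requirement
print(sample_size_per_group(0.05, 0.80, sd=10, delta=2.5))  # 252
```

With these illustrative inputs the formula gives 63 patients per group, rising to 252 per group if the target difference is halved. This sensitivity is one reason the assumptions behind a calculation need to be stated and justified.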

In order that the claims made in publications of clinical studies for differences between treatments can be critically evaluated, it is important that the levels of type I and type II errors are set appropriately and the assumptions used in calculating the sample size are fully described and justified in the methods section of the study.

A review of 215 reports of clinical superiority RCTs published in 2005 and 2006 in six general medical journals with high impact factors was carried out to assess the adequacy of reporting and accuracy of sample size calculations.

What did the study find?
The review found that only 73 (34%) publications adequately described sample size calculations, i.e. they provided enough data with which to recalculate sample size, the sample size calculation was accurate, and the assumptions in the control group differed by less than 30% from observed data.

The authors concluded that sample size calculation is inadequately reported, often erroneous, and based on assumptions that are frequently inaccurate. Such a situation raises questions about how sample size is calculated in RCTs.

How does this relate to other studies?
According to the CONSORT statement, sample size calculations must be reported and justified in published clinical trials. The CONSORT statement states the following in explaining the reasons for this: “Reports of studies with small samples frequently include the erroneous conclusion that the intervention groups do not differ, when too few patients were studied to make such a claim. Reviews of published trials have consistently found that a high proportion of trials have very low power to detect clinically meaningful treatment effects. In reality, small but still clinically valuable true differences are likely, which require large trials to detect… Many reviews have found that few authors report how they determined the sample size.”
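The point about low power can be made concrete with a small sketch. It assumes the same simplified setting as a two arm comparison of means with equal group sizes and a two-sided test, and it uses a normal approximation that ignores the negligible chance of a significant result in the wrong direction. The function name and all input values are hypothetical and do not come from CONSORT or from the review.

```python
import math
from statistics import NormalDist


def approximate_power(n_per_group, sd, delta, alpha=0.05):
    """Approximate power to detect a true difference `delta` between two means."""
    std_normal = NormalDist()
    # Standard error of the difference between two group means
    se = sd * math.sqrt(2 / n_per_group)
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic exceeds the critical value
    return std_normal.cdf(delta / se - z_alpha)


# Hypothetical trial: SD 10, true difference 5, only 20 patients per group
small = approximate_power(n_per_group=20, sd=10, delta=5)
# Same assumptions with 63 patients per group
larger = approximate_power(n_per_group=63, sd=10, delta=5)
print(f"20 per group: {small:.0%}, 63 per group: {larger:.0%}")
```

Under these assumptions the trial with 20 patients per group has only about a 35% chance of detecting a difference that really exists, so a "no significant difference" result from it says little about whether the treatments differ.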

Although the authors of the present review note that reporting of sample size has increased greatly over recent decades, their study suggests that there is still considerable room for improvement, both in how authors report sample size calculations and in how these are scrutinised during peer review.

More information on evaluating the evidence from clinical trials can be found in the Evidence-informed decision-making section of the NPC website.

So what?
This article raises concerns about how sample sizes are determined in RCTs and about the accuracy of the calculations. It also raises concerns over the suitability of the peer review process prior to publication of these trials in medical journals.

It is unreasonable to expect busy healthcare professionals to be able to carry out detailed critical appraisal of clinical trials, let alone get involved in evaluating the validity of sample size estimations. This study emphasises the need for healthcare professionals to base their prescribing decisions on evidence-based reviews of studies from public sector funded, trusted sources of information (such as those provided by NICE, Cochrane, CKS and NPC), and not merely rely on the claims made in publications of clinical trials.

Study details
Charles P, et al. Reporting of sample size calculation in randomised controlled trials: review. BMJ 2009;338:b1732. Published online 12th May 2009.

Design
MEDLINE was searched for all primary reports of two arm parallel group randomised controlled trials of superiority with a single primary outcome published in six high impact factor general medical journals between 1 January 2005 and 31 December 2006. The authors checked completeness of reporting of the sample size calculation, systematically replicated the sample size calculation to assess its accuracy, then quantified discrepancies between a priori hypothesised parameters necessary for calculation and a posteriori estimates.

Results
Of the 215 selected articles, 10 (5%) did not report any sample size calculation and 92 (43%) did not report all the required parameters. The difference between the sample size reported in the article and the replicated sample size calculation was greater than 10% in 47 (30%) of the 157 reports that gave enough data to recalculate the sample size. The difference between the assumptions for the control group and the observed data was greater than 30% in 45 (31%) articles and greater than 50% in 24 (17%). Only 73 trials (34%) reported all data required to calculate the sample size, had an accurate calculation, and used accurate assumptions for the control group.
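The two thresholds used above (a difference of more than 10% between reported and replicated sample sizes, and of more than 30% between assumed and observed control group values) can be sketched as simple relative-difference checks. This is only an illustration: the function names are hypothetical, the example figures are invented, and the choice of denominator (the replicated or observed value) is an assumption, since this summary does not state exactly how the review authors computed the differences.

```python
SAMPLE_SIZE_TOLERANCE = 0.10   # more than 10% counts as a discrepancy
ASSUMPTION_TOLERANCE = 0.30    # more than 30% counts as a discrepancy


def relative_difference(stated, reference):
    """Absolute difference as a fraction of the reference value (assumed definition)."""
    return abs(stated - reference) / reference


def sample_size_discrepant(reported_n, replicated_n):
    """True if the reported sample size differs from the recalculated one by more than 10%."""
    return relative_difference(reported_n, replicated_n) > SAMPLE_SIZE_TOLERANCE


def control_assumption_discrepant(assumed, observed):
    """True if the assumed control group value differs from the observed one by more than 30%."""
    return relative_difference(assumed, observed) > ASSUMPTION_TOLERANCE


# Invented example: article reports 200 patients, recalculation gives 240
print(sample_size_discrepant(200, 240))           # True
# Invented example: control event rate assumed 20%, observed 22%
print(control_assumption_discrepant(0.20, 0.22))  # False
```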

Sponsorship
Non-commercially funded
