Owen Yang

For those who need to calculate a sample size right now: just go to sampsize at once, and you probably do not need to read what I write below.

Non-inferiority studies are more common

I would not say it is novel, but non-inferiority studies have become more common. I can immediately imagine two reasons why one would need to conduct a non-inferiority test.

In developing a new treatment where an old treatment is already effective, it would sometimes be unethical to compare the new treatment with placebo. When it is not feasible to prove the new treatment is better than the old one, a non-inferiority study lets you ‘legally’ prove that the new treatment is not inferior to the old treatment.

In developing a device or a diagnostic test, sometimes you just want to upgrade the test, or prove that the test can be slightly modified without affecting its performance. You would like to demonstrate that despite some minor changes, it is still essentially the same test.

Sample size calculation consideration of a non-inferiority study

The key difference in sample size calculation between a non-inferiority study and a normal study (in this context commonly referred to as a ‘superiority study’) is deciding a meaningful non-inferiority margin. The non-inferiority margin is basically the maximal difference between the two groups that can be ignored, so any difference within this margin would be considered trivial. Clearly this is a clinical judgement. However, if you think about it, it is not so different from a normal study, where you would still decide a margin beyond which the difference is meaningful (i.e. the effect size).

As you can imagine without actually calculating anything, the smaller the non-inferiority margin, the smaller the difference you are trying to detect, and so the larger the sample size you will need. One difference is that although you want the statistical power to detect a difference of that size, what you hope for is to not detect it, because that would mean there is no meaningful difference. Another is that in a non-inferiority trial you do not care whether the new treatment is superior, so you only test one side of the difference. But do not worry too much about this, because most software has already taken care of it.
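To make the margin–sample-size relationship concrete, here is a minimal sketch for a yes/no outcome, using the standard normal-approximation formula and assuming both arms share the same true proportion p. The function name and the specific inputs are my own illustration, not what sampsize or any particular package uses; a one-sided α of 0.025 is taken as the convention for non-inferiority.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p, margin, alpha=0.025, power=0.90):
    """Per-group sample size for a non-inferiority test of two
    proportions, assuming no true difference between the arms.
    alpha is one-sided, as is conventional for non-inferiority."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha)   # quantile for the one-sided alpha
    z_b = z.inv_cdf(power)       # quantile for power = 1 - beta
    # normal approximation: n = (z_a + z_b)^2 * 2p(1-p) / margin^2
    return ceil((z_a + z_b) ** 2 * 2 * p * (1 - p) / margin ** 2)

# Halving the margin roughly quadruples the required sample size:
print(n_per_group(p=0.80, margin=0.10))  # 337 per group
print(n_per_group(p=0.80, margin=0.05))  # 1345 per group
```

The margin appears squared in the denominator, which is why shrinking it is so expensive in sample size.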

Other inputs

In some situations you will need to provide the distribution of your outcomes, such as the expected proportions (for a yes/no outcome) or the expected standard deviations (for a numerical outcome). If you are conducting a head-to-head trial, the expected effect size should be 0 (or a relative risk of 1). Some software packages require this information in addition to your non-inferiority margin.
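For a numerical outcome, the expected standard deviation plays the role that the expected proportions play for a yes/no outcome. A minimal sketch of the usual formula, again assuming a true between-group difference of 0 and equal standard deviations in both arms (the function name and inputs are illustrative, not from any specific package):

```python
from math import ceil
from statistics import NormalDist

def n_per_group_continuous(sd, margin, alpha=0.025, power=0.90):
    """Per-group sample size for a non-inferiority test of two means,
    assuming equal SDs and a true between-group difference of 0."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha)   # one-sided alpha
    z_b = z.inv_cdf(power)       # power = 1 - beta
    # n = 2 * sd^2 * (z_a + z_b)^2 / margin^2
    return ceil(2 * sd ** 2 * (z_a + z_b) ** 2 / margin ** 2)

# e.g. SD of 10 units, willing to ignore differences smaller than 5 units:
print(n_per_group_continuous(sd=10, margin=5))  # 85 per group
```

Note that only the ratio of the margin to the standard deviation matters, which is why software asks for both.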

If it is a diagnostic test rather than a treatment trial, it will usually not be a head-to-head comparison, and it becomes a more complex matter. I will need to think about how to make it simple, so it is not covered here.

Other settings: to re-cap

You can tweak your ‘default’ settings in some situations, for example the false positive rate and the false negative rate of the test. The use of these terms is not entirely orthodox, but I find it the least jargon-heavy. I usually say that α=0.05 is your false positive rate, because it is the rate at which you find a positive result when there is no real difference, and so p=0.02 loosely means the chance of a false positive is very small. Your β=0.1 means a statistical power of (1-β)=0.9. If you are okay with my terminology, it means that when your test is negative, the false negative rate is 0.1. Therefore, when you find no difference (i.e. non-inferiority), the false negative rate is 0.1.