Owen Yang

It is a little annoying that when we apply for research funding or ethics approval, we are not only asked to do a sample size calculation, but increasingly also asked to have the calculation signed off by a statistician. To be fair, once we have some basic understanding, we do not really need to ask a statistician about the sample size calculation in most of our studies. That is certainly true for a good old cohort study.

Power calculation 101

The number of people needed depends on two main factors and a few technical factors. For a binary outcome (a yes-or-no outcome), for example death or development of cancer, the two main factors are the effect size (how large the effect or the association is) and the effective event number.

The event number in this case would be the number of people who eventually develop the outcome (death or cancer). I say 'effective' event number because what tends to matter is the number of events that occurs in your smallest exposure group. If you are comparing liver cancer risk between 100 patients who received a blood transfusion and 10000 patients who did not, it would be the cancer count among your 100 transfused patients. That group is the weakest link, and so it is the number that determines your total cohort size.

Therefore, it is the effect size and the event number that are the major determinants of the cohort size needed. The larger your effect size, the smaller the event number needed. You do not need a large cohort to be confident that the difference between a 10% risk in group A and a 1% risk in group B (a relative risk of 10) is not due to randomness, but you do need a large cohort to be confident that the difference between a 2% risk in group A and a 1% risk in group B (a relative risk of 2) is not purely due to randomness.
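To see how sharply the effect size drives the numbers, here is a sketch using the standard two-proportion sample size formula (normal approximation). The risks mirror the examples above; the 0.05 and 0.80 defaults are the usual conventions, and only the Python standard library is used.

```python
# Sketch of the standard two-proportion sample size formula (normal
# approximation). Risks, alpha and power are illustrative choices.
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate participants needed per (equal-sized) group to
    tell event risks p1 and p2 apart."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided type 1 error
    z_beta = z.inv_cdf(power)            # 1 - type 2 error
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(n_per_group(0.10, 0.01))  # relative risk 10: 97 per group
print(n_per_group(0.02, 0.01))  # relative risk 2: 2316 per group
```

A relative risk of 10 needs fewer than a hundred people per group, while a relative risk of 2 needs over two thousand, even though group B's risk is 1% in both cases.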

For a prospective study, such as a cohort study or a clinical trial, the event number will be a product of the baseline group size and the follow-up time. For example, 100 people followed up for 10 years (100*10) could have a similar event number to 200 people followed up for 5 years (200*5), because both add up to 1000 person-years. Nevertheless, we all know that 10 years and 5 years may have different clinical meanings, and therefore we need to use our judgement to decide the best way to achieve the expected event number.
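The person-years arithmetic is simple enough to write down; the 1% per person-year event rate below is a made-up figure for illustration.

```python
# Back-of-the-envelope expected event count from person-years.
# The rate of 0.01 events per person-year is an assumed figure.
def expected_events(n_people, years, rate_per_person_year):
    return n_people * years * rate_per_person_year

print(expected_events(100, 10, 0.01))  # 1000 person-years
print(expected_events(200, 5, 0.01))   # also 1000 person-years
```

Both designs deliver the same expected event number, which is why the choice between them is a clinical judgement rather than a statistical one.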

The technical settings

The technical settings that affect sample size are the p-value cut-off and the false negative rate we are comfortable with. These are sometimes called type 1 and type 2 errors. Usually they are set at 0.05 (the p-value cut-off) and 0.2 (the false negative rate, i.e. 80% power; anywhere from 0.1 to 0.3 is common). Once we are more confident with these numbers, we can be the judge of them ourselves instead of using the defaults.
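These two settings enter the calculation through normal quantiles: the required sample size scales with (z_alpha + z_beta) squared. A small sketch shows what moving from the 0.2 default to a stricter 0.1 false negative rate costs:

```python
# How the type 1 / type 2 error settings scale the sample size.
# Sample size is proportional to (z_alpha + z_beta)^2, so this ratio
# tells us how much a stricter setting inflates the cohort needed.
from statistics import NormalDist

def z_multiplier(alpha, beta):
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) ** 2

default = z_multiplier(0.05, 0.20)   # the usual 0.05 / 80% power
stricter = z_multiplier(0.05, 0.10)  # 90% power instead of 80%
print(round(stricter / default, 2))  # ~1.34
```

Asking for 90% power instead of 80% inflates the required cohort by about a third, which is why the 0.2 default persists.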

There are plenty of easy-to-use programmes or online resources for power calculation, such as this one. They tend to make things look complicated and ask you a bunch of questions, but keep your mind clear: what they are trying to get at is your effective event number. For example, they might ask you the ratio between the exposed and the unexposed, and then what you expect the event rate to be in each group. With this information the calculator simply works out the weakest link, i.e. the effective event number.
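This is also why piling on more unexposed people helps less and less. A hedged sketch of the classic unequal-allocation version of the two-proportion formula (with k unexposed per exposed person; the 5% and 1% risks are illustrative assumptions) makes the point:

```python
# Two-proportion sample size with unequal allocation: k unexposed
# per exposed person. The exposed group is the 'weakest link', so
# extra unexposed people give diminishing returns. Risks assumed.
from math import ceil
from statistics import NormalDist

def n_exposed(p_exposed, p_unexposed, k=1, alpha=0.05, power=0.80):
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance = p_exposed * (1 - p_exposed) + p_unexposed * (1 - p_unexposed) / k
    return ceil(z_sum ** 2 * variance / (p_exposed - p_unexposed) ** 2)

print(n_exposed(0.05, 0.01, k=1))    # 282 exposed with equal groups
print(n_exposed(0.05, 0.01, k=10))   # 238 with ten unexposed each
print(n_exposed(0.05, 0.01, k=100))  # 234: almost no further gain
```

Going from a 1:1 to a 1:10 ratio saves some exposed participants, but a 1:100 ratio buys almost nothing more: the events among the exposed remain the binding constraint.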

If the exposure or the outcome is continuous (i.e. not binary or categorical)

For a medical study I generally try to avoid having an outcome that is a continuous variable, but there are scenarios where this is legitimate, such as blood pressure. Anyway, the standard deviation (SD) of the continuous variable is basically a surrogate for the event number (or vice versa). When the standard deviation is small, it is much easier to tell the difference between the two groups. With a standard deviation of 0.1 in each group, it would be relatively easy to tell that an observed difference of 1 between group A and group B is not due to random error, but with a standard deviation of 1 in each group it would be much more difficult to conclude that an observed difference of 1 is not due to random error.
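The usual two-sample means formula makes this concrete: the per-group size grows with the square of SD divided by the difference. The SDs of 0.1 and 1 and the difference of 1 mirror the example above; alpha and power are the usual defaults.

```python
# Sketch of the standard two-sample means formula: per-group size
# scales with (SD / difference)^2. SDs and difference are the
# illustrative values from the text.
from math import ceil
from statistics import NormalDist

def n_per_group_means(diff, sd, alpha=0.05, power=0.80):
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_sum * sd / diff) ** 2)

print(n_per_group_means(diff=1, sd=0.1))  # a handful of people suffice
print(n_per_group_means(diff=1, sd=1))    # 16 per group
```

A tenfold larger SD means a hundredfold larger cohort for the same detectable difference, which is exactly the sense in which the SD plays the role of the event number.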

I find that most online calculators do not cater for study designs in which either the exposure or the outcome is continuous. I use the STATA command power, which seems to cover most scenarios.

Adjustment almost does not matter

What if we are looking for the effect after adjustment for a bunch of things? For example, is the needed cohort size different if we want to investigate the association between blood transfusion and liver cancer adjusted for factors such as age, sex, and alcohol consumption?

The way to calculate the sample size needed is not really affected, because those factors have almost been 'taken care of'. If our effect size (relative risk) has changed after adjustment, the sample size depends on the effect size we are interested in: if it is the adjusted effect size that matters more to our research, then that is the one the calculation of an adequate cohort size should be based on.

This may not cover all our imagined designs, but I believe it is a good start for most of our questions.