Owen Yang

If you start a cohort, you need to unleash the power of repeat observations.

Basic level: repeat observations for outcomes

Most cohort studies, medical or otherwise, set out to investigate predictors of a certain outcome of interest. For example, a cohort of patients with rheumatoid arthritis may be designed to investigate which immune profile is associated with a higher rate of treatment success. Treatment success or failure in this case would be the outcome of interest. We may have multiple outcomes of interest, such as side effects, pain, quality of life, or death.

At the time of recruitment of the cohort (i.e. at baseline), we generally collect information on the predictor, or on whichever factors we think might be relevant. After recruitment comes the follow-up game, during which we gather information on whether and when these outcomes happen. The total length and the frequency of follow-up are largely decided by how long the outcomes take to happen and how often they change. If the outcome changes frequently, such as the number of inflamed joints, then a sensible frequency of follow-up (such as every 3 months) is needed to capture these changes.

It would be smart to obtain sufficient consent at the beginning of the study as to how often, and in which way, the participants are likely to be followed up. Sometimes participants can agree at the very beginning to share their medical records over the next few years, if the outcome is something that can be captured there.

Repeat observations for exposures

Compared with follow-up for outcomes, follow-up for exposures (or predictors) is a much more interesting business, but it is not generally appreciated when we apply for funding. It is interesting because it leads to more creative questions, and not everyone has the imagination to ask these questions or the mindset to comprehend why they can be important.

Without a repeat observation for exposures, what we are left with is a snapshot of exposure at baseline. If a high expression of interferon-gamma receptors at baseline is associated with a poor outcome of rheumatoid arthritis, what we know is that high expression of interferon-gamma receptors at baseline is a predictor of poor outcome over the follow-up period. It can be interpreted pretty much care-free if you are a statistician. Nothing more. Nothing less.

With repeat observations, we are asking questions about details that can reveal more of the truth behind the biology. We would like to know how stable the expression of interferon-gamma receptors is over time, because if it is fairly volatile, a one-off measure of the expression at baseline is not representative of the level over the whole period. We would also want to know whether those whose expression remains high over time tend to have an even poorer outcome. We would probably also like to know whether changes in expression reflect treatment over time, and whether the risk of poor outcome can be reversed when the expression is lowered.

Short-term or long-term questions on repeat observations for exposure

If this all sounds very complex but you do wish to explore the power of repeat observations, let me oversimplify by emphasising that there are two types of questions here, and one should not confuse them.

The first question is about a short-term association, in which we ask whether a recent exposure or a recent characteristic (such as obesity) can be a predictor of the outcome. In this case, the repeat observations are useful because they give us the most up-to-date information, which is what we need to investigate a short-term association at all.
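In practice, using the most up-to-date information often means looking up the latest measurement taken at or before the time of interest. The sketch below shows that lookup for one hypothetical participant; the function name, the visit times and the BMI values are assumptions made for illustration, not part of any particular study.

```python
from bisect import bisect_right

def latest_exposure(visit_times, values, t):
    """Return the most recent value measured at or before time t.

    visit_times must be sorted in ascending order.
    Returns None if there is no measurement at or before t.
    """
    idx = bisect_right(visit_times, t)
    if idx == 0:
        return None
    return values[idx - 1]

# Hypothetical BMI measurements for one participant (years since baseline).
visit_years = [0.0, 2.0, 5.0]
bmi = [31.0, 29.5, 27.0]

event_year = 6.5  # hypothetical time of the outcome

# Short-term question: use the most recent measurement before the event.
recent_bmi = latest_exposure(visit_years, bmi, event_year)

# Long-term question: use the baseline measurement instead.
baseline_bmi = bmi[0]

print("most recent BMI before event:", recent_bmi)
print("baseline BMI:", baseline_bmi)
```

The two values can differ a lot for the same person, which is exactly why the short-term and long-term questions should be kept apart.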

The second question can also benefit from updating exposure information from time to time, but here we ask about a long-term association. We ask whether an exposure or a characteristic can predict the outcome over a longer term, for example whether obesity can predict heart attack within 10 years. In this case, the repeat observations are useful because we can explore whether obesity is a stable characteristic over time in this population, but we do not use the most recent information to do the prediction. We can use the baseline information, or sometimes there are statistical methods that can be used to investigate the long-term association with a ‘usual’ body size.
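The text above does not name a specific method. One commonly described approach is regression dilution correction, and the sketch below assumes that is the kind of method meant. It regresses a repeat measure on the baseline measure to get a regression dilution ratio, then divides the log hazard ratio by that ratio. All numbers, including the hazard ratio of 1.20, are hypothetical.

```python
from math import exp, log

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical BMI at baseline and at a resurvey some years later.
baseline_bmi = [22.0, 24.0, 26.0, 28.0, 30.0]
resurvey_bmi = [23.0, 24.0, 26.0, 27.0, 29.0]

# Regression dilution ratio: how much of a baseline difference
# persists, on average, at the resurvey.
rdr = slope(baseline_bmi, resurvey_bmi)

# Hypothetical hazard ratio per unit of baseline BMI.
observed_hr = 1.20

# Approximate hazard ratio per unit of 'usual' BMI.
corrected_hr = exp(log(observed_hr) / rdr)

print(f"regression dilution ratio: {rdr:.2f}")
print(f"hazard ratio per unit of usual BMI: {corrected_hr:.2f}")
```

When the ratio is below 1, the association with the usual level is stronger than the one seen with a single baseline measure. Note that the baseline value, not the most recent one, is still what enters the prediction.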

It is quite common for a reviewer of your research article to throw a question at you, seemingly at random, asking why you did or did not use updated information in your analysis. Remember which of the two questions you are asking, stand your ground, and do not be easily swayed.