There are a few ways to categorise a clinical prediction model. To keep it simple, here we mainly care about the combinations of predictors in a statistical sense. 

Think

What should medical researchers do if they find a risk factor for diabetes that no one has reported before? What should they do if they find a previously unreported interaction between two factors that may jointly affect the risk of diabetes?

Any new finding can be due to chance, to confounding, or to imperfect sampling that prevents the finding from generalising to the intended population. Findings can also be due to genuine mistakes. Even findings from clinical trials may not be reproducible. This is the bread and butter of medical science.

Is there any reason that we should not assume this when developing a clinical prediction model?

Type 1: known major drivers of risk, plus factors that are not known drivers but are added to ‘improve model fit’

If there are known major drivers of risk in the intended population, for example age, sex, and BMI, which are known to predict diabetes in the general population, I really feel there is little justification for advanced model exploitation approaches among these factors at the time of developing a clinical model. By this I mean using automatic algorithms to search for hidden non-linearity or hidden interactions (e.g. a special combination of factors that results in high risk) when one tries to develop a clinical prediction model.

There is also little justification for including a new factor that is not previously known to be a predictor simply to improve model fit. The model developer is unlikely to be an expert in the new factor. For example, if someone finds that the blood level of triglycerides in their dataset is a predictor of diabetes, it is unlikely that the developer understands the variation of triglyceride measurement, or the natural variation of triglycerides within and between individuals.

Blood pressure is another common and interesting example. It is very common to find that a randomly measured blood pressure in a dataset can explain additional cardiovascular or non-cardiovascular risk. However, the implication of using a random blood pressure measurement is really unclear. If you actually measure your own blood pressure 10 times a day for 14 days, you will start to appreciate the difficulty of incorporating it into a clinical model. Of course, if you are just a model developer (more of a model fitter) and blood pressure is just one of your 100s of parameters, you would naturally feel it is not your job to understand everything.
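The point about within-person variation can be made concrete with a small simulation. The numbers below (a long-term average of 128 mmHg and a within-person standard deviation of 10 mmHg) are illustrative assumptions, not measured values; the sketch only shows how far a single recorded reading can sit from the person's true average.

```python
# Illustrative sketch (assumed numbers): within-person blood pressure
# variability versus the single "random" reading a dataset typically stores.
import numpy as np

rng = np.random.default_rng(0)

true_mean_sbp = 128.0    # hypothetical long-term systolic average (mmHg)
within_person_sd = 10.0  # assumed within-person SD across days and times

# 10 readings a day for 14 days, as suggested in the text
readings = rng.normal(true_mean_sbp, within_person_sd, size=(14, 10))

single_reading = readings[0, 0]  # what a dataset usually records
overall_mean = readings.mean()   # what actually characterises the person

print(f"single reading: {single_reading:.1f} mmHg")
print(f"14-day average: {overall_mean:.1f} mmHg")
print(f"daily means range: {readings.mean(axis=1).min():.1f}"
      f" to {readings.mean(axis=1).max():.1f} mmHg")
```

With 140 readings the average is pinned down tightly, but any one reading can easily be 10–20 mmHg away from it, which is the measurement a typical dataset feeds into a model.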

This is why a new finding should be treated seriously, tested, and reproduced. A new finding should not just be a random byproduct of the development process of a clinical model. One cannot suddenly announce that blood pressure is a risk factor for diabetes without some background experience of the ‘subject matter.’ If the new finding is just a marker of something unknown, it takes broad experience and knowledge to assess whether it is appropriate to assume that it will be reproducible.

There are also ethical considerations: such a finding may be confounded by social inequality, although that is beyond the scope of this discussion.

Type 2: -omics with and without major drivers

Another common type of clinical model starts with -omics. This can be genomics, but it can also be any analysis with a large predictor-to-case ratio, for example 100s or 1000s of items of clinical information. Sometimes this is a pure -omics-only model, but most of the time the ambition is a prediction model taking everything into account.

Here we mean screening 10s, 100s, or 1000s of factors of different characteristics to optimise the model. We do not mean using the few robust predictors that came from previous -omics models, because that would only mean a few (potential) predictors. For example, if we screened 1000s of factors in previous projects and found that the apoE allele genotype is a robust predictor of dementia in a few populations, and then develop a clinical model based on apoE and other factors known to be predictors (such as age, sex, and smoking), then this is no longer the -omics approach we discuss here.

So here we mean that some people are ambitious enough to develop a model straight from many -omics and non-omics predictors, to the extent that the number of candidate predictors (and the various ways to combine them) is much larger than the number of cases (or events, or outcomes).

I have to say this is really challenging.

It may not be very challenging to find some ‘positive signals’ and get excited, but reproducing an exciting finding is normally very challenging.
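How easy the ‘positive signals’ are to find can be shown with pure noise. The sketch below (with assumed, arbitrary sizes: 500 people, 1000 predictors) screens predictors that carry no information at all against a random outcome; about 5% of them still come out ‘significant’ at the nominal p < 0.05 level.

```python
# A minimal sketch of why 'positive signals' are easy to find when the
# predictor-to-case ratio is large: screen 1000 pure-noise predictors
# against a random binary outcome and count nominal 'hits'.
import math
import numpy as np

rng = np.random.default_rng(1)
n_people, n_predictors = 500, 1000          # assumed, arbitrary sizes

X = rng.normal(size=(n_people, n_predictors))  # noise "omics" predictors
y = rng.integers(0, 2, size=n_people)          # random outcome, no signal

# Pearson correlation of each predictor with the outcome
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
r = Xc.T @ yc / n_people

# two-sided p values via the normal approximation (fine for n = 500)
z = r * math.sqrt(n_people - 2) / np.sqrt(1 - r**2)
p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

n_hits = int((p < 0.05).sum())
print(f"nominal 'significant' predictors at p < 0.05: "
      f"{n_hits} of {n_predictors}")
# With everything being noise, roughly 5% (about 50) are spurious hits.
```

None of these ‘hits’ will reproduce in a second dataset, because there was nothing to reproduce; that asymmetry between finding and reproducing is the point.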

Being responsible for using any finding from this would also be challenging. Unfortunately, most data specialists who have the strongest voice in this field do not need to take responsibility for individual misclassifications. I would advise these specialists to reflect on how they would react to under-diagnosis and over-diagnosis if they were the patients. When we apply our results to real lives, we need to think in absolute terms, not in false discovery rates.

The current practice in the -omics approach is to rely on theoretical reproducibility, such as the p value or the false discovery rate, or their variants, which ignores the fact that the main sources of error are not random.
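For readers unfamiliar with the mechanics, the standard false discovery rate calculation is the Benjamini–Hochberg step-up procedure, sketched below on simulated p values (the 10 ‘real’ signals and 990 nulls are assumptions for illustration). Note what it does and does not do: it bounds the expected share of false discoveries under random error, and says nothing about confounding, measurement drift, or sampling bias.

```python
# Sketch of the Benjamini-Hochberg step-up procedure, the usual machinery
# behind a 'false discovery rate' claim. It addresses random error only.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of discoveries at FDR level alpha (BH step-up)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # find the largest rank k with p_(k) <= (k / m) * alpha
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    discoveries = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        discoveries[order[: k + 1]] = True
    return discoveries

# illustrative mix: 10 tiny 'real' p values among 990 uniform nulls
rng = np.random.default_rng(2)
pvals = np.concatenate([rng.uniform(0, 1e-4, 10), rng.uniform(0, 1, 990)])
mask = benjamini_hochberg(pvals)
print(f"BH discoveries at FDR 5%: {int(mask.sum())} of {pvals.size}")
```

Everything in this calculation assumes the p values are honest under the null, which is exactly what non-random error breaks.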

Trying to assume the reproducibility of -omics on top of the reproducibility of a clinical prediction model, both without real evidence of reproducibility, becomes quite opportunistic.

This is the best data we have got (and why AI/ML has little to do with it)

It would be surprising not to mention AI (artificial intelligence) at all. AI specialists may laugh at this, because it is probably more of a machine learning (ML) problem. ML in medicine has so far only been used to fit the model, yet our main trouble here is that model fit is not a good reflection of reproducibility.

In fairness, ML can be used to optimise reproducibility, but no one bothers to collect the data for ML to optimise against. If you think about it, data collection is hard work, and ‘data scientists’ do not tend to collect data. So far, the main reason we build the whole AI/ML or prediction model is that we do not want to collect the relevant data. So when someone says AI/ML has been used successfully in other fields, this may not translate, because the approaches may not be the same. The successful cases, such as social media, tend to collect ongoing data and keep trying different interventions, to prove and to improve the prediction.

Reproducibility is more likely to be achieved when the predictors are known risk factors, because this is how science usually works.