When a clinician submits a research paper for peer review, the word ‘misclassification’ sometimes comes up as a query about the validity of the research. Sometimes it is a clinician reviewer who is very worried about misclassification in a study.

Misclassification literally

When we use misclassification as jargon in medical research, it literally means that some people are classified incorrectly. For example, when we are interested in the risk of stroke among individuals who have diabetes, we need to classify individuals as either having or not having diabetes. We also need to classify individuals into those who had a stroke and those who did not. For all sorts of reasons, misclassification errors are inevitable.

Occasionally what is being classified is not individuals but something else, such as time, group, or diagnosis. The concept is similar.

Clinical guidelines are not always the best guides for classification

A common blind spot among clinicians is that we mix up guidelines with the truth. When we work for a giant institution such as the NHS, we have to follow guidelines. The diagnosis of diabetes could be based on a combination of HbA1c over a period of time, age range, and additional features suggesting that the patient may or may not have diabetes. This is based partly on science, but also on cost-effectiveness or political will.

It is not uncommon for a large study to use self-reported diabetes to classify patients, and this alone can be enough for a clinician to say the study is rubbish. Certainly, it is good practice to assess, formally or informally, how ‘accurate’ the self-reported diabetes is, and how it may or may not affect the study. But in many cases (and in many countries) a self-reported diagnosis would be sufficient for a topic like this. And to be fair, most clinicians do not really know how accurate a clinical diagnosis of diabetes is either. We just have to be open-minded once in a while.

Clinicians’ strength

A clinician can really shine when they see beyond the clinical definition of diabetes. For someone without a biology background, diabetes is a word that carries importance in itself, and when they assess misclassification, they tend to use the clinical diagnosis as the gold standard and compare against it. By contrast, a biologist (or a good clinician) may not find ‘diabetes causes stroke’ a meaningful statement in itself. Diabetes is a complex process, and what could be more important is which process in diabetes actually causes stroke (if any does). Is it the genetic determinants of diabetes, the treatment of diabetes, or the consequences of diabetes that cause stroke?

This is slightly off-track, but I have had many conversations with mathematicians and statisticians who try to use advanced methods to see whether body mass index causes something, be it cancer, heart attack, and so on. It is really difficult for me to comprehend how body mass index, an indicator, can cause anything. However, I do understand that sometimes we need to pretend, or to put some issues aside, in order to move things forward.

Classic concerns regarding misclassification

Dilution is the first layer of concern with a soft definition. If only 50% of individuals who self-report diabetes actually have it, then the stroke risk in that group will be diluted by the other 50%, who carry the risk of the general population.
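As a rough illustration of this dilution, the sketch below treats the observed risk as a weighted average of two risks. The 50% figure is the one used above; the stroke risks (10% for people who truly have diabetes, 5% for the general population) are made-up numbers for illustration only, and the function name is hypothetical. It also assumes the comparison group carries the general-population risk.

```python
def observed_risk(fraction_true, risk_true, risk_other):
    """Risk seen in a group labelled 'diabetes' when only `fraction_true`
    of its members truly have diabetes (a simple weighted average)."""
    return fraction_true * risk_true + (1 - fraction_true) * risk_other


# Hypothetical stroke risks, for illustration only.
risk_diabetes = 0.10  # assumed risk in people who truly have diabetes
risk_general = 0.05   # assumed risk in the general population

# Only 50% of the self-reported group truly has diabetes.
diluted = observed_risk(0.5, risk_diabetes, risk_general)

true_rr = risk_diabetes / risk_general  # risk ratio with perfect classification
observed_rr = diluted / risk_general    # risk ratio after dilution

print(f"observed risk in self-reported group: {diluted:.3f}")
print(f"true risk ratio: {true_rr:.2f}, observed risk ratio: {observed_rr:.2f}")
```

With these assumed numbers the risk ratio shrinks from 2.0 to 1.5: the association is still there, but weaker than it really is.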

Selection bias is the next layer of concern, and it comes in different forms. In the dilution example above, the 50% who do not have diabetes may not come from the ‘general’ general population; they may be a selected type of individual who for some reason believe they have diabetes. Depending on their own risk of stroke, this selected group would ‘contaminate’ the observed risk in the diabetes group.
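To see why the make-up of the misclassified half matters, here is a minimal sketch that holds the 50% figure fixed and varies only the stroke risk among those who wrongly believe they have diabetes. All risks are invented for illustration, and the function and scenario names are hypothetical.

```python
def observed_risk(fraction_true, risk_true, risk_misclassified):
    """Weighted average of the risk in true cases and in misclassified individuals."""
    return fraction_true * risk_true + (1 - fraction_true) * risk_misclassified


risk_diabetes = 0.10  # assumed stroke risk in people who truly have diabetes
fraction_true = 0.5   # half of the self-reported group truly has diabetes

# Hypothetical stroke risks among the misclassified half.
scenarios = {
    "lower risk than the general population": 0.03,
    "same risk as the general population": 0.05,
    "higher risk than true diabetes": 0.12,
}

results = {
    label: observed_risk(fraction_true, risk_diabetes, risk)
    for label, risk in scenarios.items()
}

for label, risk in results.items():
    print(f"{label}: observed risk {risk:.3f}")
```

Under these assumptions the observed risk ranges from 0.065 to 0.110, so the contamination can push the estimate below or even above the true risk of 0.10, depending on who the misclassified individuals are.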

In another form of selection bias, if the definition of diabetes is so strict that it misses many individuals who have a diagnosis, the patients who do meet it may be a selected few whose risk of stroke differs for reasons unrelated to diabetes.

Under-diagnosis and social inequality

There are similar considerations regarding under-diagnosis. Even if you stick to the best clinical criteria for diabetes (if there is such a thing), a person has to be able to access diagnostic resources in order to get a diagnosis. This can therefore introduce bias related to social inequality. Unfortunately, most clinicians I know would be very comfortable ignoring this bias and thinking ‘this is not our job and this is not science.’

It is probably acceptable for a specialist doctor to think that their job is to treat specialist patients, and that until a patient becomes a specialist patient, attending to them is a waste of their expertise. However, it is important not to generalise this feeling when you lead a group of young doctors, because you are passing this value on to them. This is not good for our healthcare.

In all, I am not too worried about those who actually think about the pros and cons of misclassification and make an informed decision about it. I am only worried about those who are so institutionalised that there is no room for doing things any other way.