Owen Yang

The term ‘propensity score’ should have had a broader meaning, but because it is so heavily used in one particular context, let us just focus on that context.

A propensity score approach tries to tackle a limitation of observational studies that aim to prove an exposure causes an outcome, for example that statin (exposure) prevents liver cancer (outcome). The best approach, if it is affordable and feasible, would be to conduct a randomised controlled trial to prove this. This is not always possible. Even when it is possible, some weaker evidence still needs to be generated to justify such a trial, so an observational study is sometimes still needed to support this justification.

Propensity score matching is trying to pretend to be a clinical trial

Let us imagine that statin users are found to have a 20% reduced risk of liver cancer in an observational study. As is inherent in any observational study, an apparent association between statin and liver cancer risk could be confounded, and a major source of confounding is why statin was prescribed in the first place. In a randomised clinical trial there is no such reason, because participants are randomly assigned, but in an observational study statin is prescribed for a reason. It could well be these reasons, rather than statin itself, that are the root cause of this apparent association.

A propensity score is essentially the probability (from 0 to 1), predicted from a panel of factors, that an individual is exposed. Researchers like to use the word ‘assigned’, for example the propensity that an individual is ‘assigned’ to the exposed group. This is because ‘assigned’ makes sense to those who understand how a clinical trial works, but obviously it can be confusing to others because no active assigning is going on.
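To make this concrete, here is a minimal sketch in Python (using NumPy) of one common way to estimate a propensity score: a logistic regression of exposure on the chosen factors. The cohort, the covariates (age, hypertension, diabetes, rheumatoid arthritis) and every coefficient are made up for illustration. The function names `simulate_cohort` and `fit_propensity` are hypothetical. In real work one would normally use an established statistics package rather than a hand-written fit.

```python
import numpy as np

def simulate_cohort(n=5000, seed=0):
    """Hypothetical cohort in which statin prescribing depends on a few covariates."""
    rng = np.random.default_rng(seed)
    age_std = rng.normal(0.0, 1.0, n)  # standardised age
    hypertension = rng.binomial(1, 0.30, n)
    diabetes = rng.binomial(1, 0.10, n)
    rheumatoid = rng.binomial(1, 0.01, n)
    X = np.column_stack([age_std, hypertension, diabetes, rheumatoid])
    # Made-up prescribing model, used only to generate the toy data
    true_logit = -1.0 + 0.8 * age_std + 1.0 * hypertension + 1.2 * diabetes + 0.5 * rheumatoid
    statin = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))
    return X, statin

def fit_propensity(X, exposed, n_iter=25):
    """Logistic regression of exposure on covariates, fitted by Newton-Raphson.
    Returns each person's estimated P(exposed | covariates) and the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (exposed - p)                    # score vector
        hess = A.T @ (A * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    ps = 1.0 / (1.0 + np.exp(-A @ beta))
    return ps, beta

X, statin = simulate_cohort()
ps, beta = fit_propensity(X, statin)
print(ps.min(), ps.max(), beta.round(2))
```

Every individual ends up with a single number between 0 and 1, whatever the number of factors that went into the model.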

The most straightforward way to use this score is to stratify by it when analysing the association between the exposure and the outcome. Again, people like to use the word ‘matching’, as in propensity score matching, because it feels like we are conducting a matched clinical trial.
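Below is a minimal sketch of stratifying by the propensity score. It assumes quintiles of the score as strata and pools them with a Mantel-Haenszel risk ratio, which is one common choice among several. The toy data are simulated so that statin has no effect at all, while a single confounder drives both prescribing and liver cancer. The true propensity score is used directly because it is known by construction; in practice it would be estimated. The function names are hypothetical.

```python
import numpy as np

def crude_risk_ratio(exposed, outcome):
    """Unadjusted risk in the exposed divided by risk in the unexposed."""
    return outcome[exposed == 1].mean() / outcome[exposed == 0].mean()

def stratified_risk_ratio(ps, exposed, outcome, n_strata=5):
    """Mantel-Haenszel risk ratio pooled across propensity-score strata (quintiles by default)."""
    edges = np.quantile(ps, np.linspace(0, 1, n_strata + 1))
    stratum = np.digitize(ps, edges[1:-1])  # stratum index 0 .. n_strata-1
    num = den = 0.0
    for s in range(n_strata):
        in_s = stratum == s
        n_total = in_s.sum()
        if n_total == 0:
            continue
        e, u = in_s & (exposed == 1), in_s & (exposed == 0)
        num += outcome[e].sum() * u.sum() / n_total  # exposed cases x unexposed count
        den += outcome[u].sum() * e.sum() / n_total  # unexposed cases x exposed count
    return num / den

# Toy data (all numbers made up): one confounder drives both statin use and
# liver cancer risk, while statin itself has no effect on the outcome.
rng = np.random.default_rng(1)
n = 200_000
confounder = rng.normal(size=n)
ps = 1 / (1 + np.exp(-(-1.0 + 1.5 * confounder)))  # true propensity, known by construction
statin = rng.binomial(1, ps)
cancer = rng.binomial(1, 0.01 * np.exp(0.5 * confounder))
print(crude_risk_ratio(statin, cancer), stratified_risk_ratio(ps, statin, cancer))
```

On data like these, confounding pushes the crude risk ratio away from 1. The stratified estimate should sit much closer to 1, although some residual confounding usually remains within each stratum.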

Misconceptions about the benefit of propensity score matching

Sometimes one can be misled into thinking that the main benefit of propensity score matching is to address confounding. This is certainly not incorrect, but I feel a better understanding is that the propensity score is a compromise. If we would like to match on every factor individually, we will soon realise that matching on rarer factors (say rheumatoid arthritis, around 1%) is difficult. For common factors (such as hypertension or being overweight), we will also find it a huge headache to match on several factors simultaneously (in 2 by 2 combinations, for example). It is certainly much easier to match on the propensity score, which is a single probability between 0 and 1.
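A small simulation shows why matching factor by factor becomes such a headache. The sketch below uses made-up prevalences and a hypothetical helper `exact_match_coverage`. It counts how many exposed people still have an unexposed person with exactly the same combination of factors as more binary factors are added. Exposure is generated at random here purely to isolate the sparsity problem.

```python
import numpy as np

def exact_match_coverage(X, exposed):
    """Share of exposed people for whom at least one unexposed person has exactly
    the same combination of factors (i.e. an exact match is available)."""
    unexposed_patterns = {tuple(row) for row in X[exposed == 0]}
    exposed_rows = X[exposed == 1]
    hits = sum(tuple(row) in unexposed_patterns for row in exposed_rows)
    return hits / len(exposed_rows)

# Hypothetical prevalences: common factors (e.g. hypertension ~30%, overweight ~40%)
# mixed with rare ones (e.g. rheumatoid arthritis ~1%)
prevalence = [0.30, 0.40, 0.10, 0.01, 0.20, 0.15, 0.05, 0.25,
              0.30, 0.10, 0.02, 0.35, 0.20, 0.10, 0.05]
rng = np.random.default_rng(2)
n = 2000
X = rng.binomial(1, prevalence, size=(n, len(prevalence)))
exposed = rng.binomial(1, 0.3, n)  # random exposure, to isolate the sparsity problem

coverage = {}
for k in (2, 5, 10, 15):
    coverage[k] = exact_match_coverage(X[:, :k], exposed)
    print(f"{k:2d} factors, {2**k:6d} possible combinations: "
          f"{coverage[k]:.0%} of exposed have an exact match")
```

Each extra factor can only split existing combinations further, so the share of exposed people with an exact match can never go up as factors are added. Matching on a propensity score, by contrast, only ever needs a close value of one number.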

However, once the procedure becomes technical, we tend to forget what the purpose is. Would you match a patient with diabetes but no rheumatoid arthritis to a patient with rheumatoid arthritis but no diabetes, just because they have the same probability of being prescribed statin? Some career statisticians will tell you that by including as many factors as possible, we spread the risk of this type of individual mismatch. But if there are factors that we are really worried about, we could single them out as separate matching factors on top of the propensity score matching.
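One way to combine the two ideas is to require an exact match on the factor we are worried about, and then match on the propensity score within it. The sketch below uses greedy 1:1 nearest-neighbour matching without replacement and a caliper of 0.05 on the probability scale. The algorithm, the caliper, the toy prescribing model and the helper name `match_on_ps` are all illustrative assumptions, not a recommended protocol.

```python
import numpy as np

def match_on_ps(ps, exposed, exact=None, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without replacement.
    If `exact` is given, a pair must also share the same value of that factor."""
    if exact is None:
        exact = np.zeros(len(ps), dtype=int)
    available = set(np.flatnonzero(exposed == 0).tolist())
    pairs = []
    for i in np.flatnonzero(exposed == 1).tolist():
        candidates = [j for j in available if exact[j] == exact[i]]
        if not candidates:
            continue
        j = min(candidates, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:  # discard matches that are too far apart
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Toy data (made-up numbers): diabetes and rheumatoid arthritis (RA) raise the odds
# of statin prescribing by the same amount, so "diabetes only" and "RA only"
# patients end up with similar propensity scores.
rng = np.random.default_rng(3)
n = 1000
age_std = rng.normal(size=n)
diabetes = rng.binomial(1, 0.10, n)
ra = rng.binomial(1, 0.05, n)
ps = 1 / (1 + np.exp(-(-1.0 + 0.5 * age_std + 1.0 * diabetes + 1.0 * ra)))
statin = rng.binomial(1, ps)

for label, key in (("PS only", None), ("PS + exact on RA", ra)):
    pairs = match_on_ps(ps, statin, exact=key)
    discordant = sum(ra[i] != ra[j] for i, j in pairs)
    print(f"{label}: {len(pairs)} pairs, {discordant} discordant on RA")
```

With `exact=ra`, no pair can mix a patient with rheumatoid arthritis and one without it, at the possible cost of fewer matched pairs. Greedy matching also depends on the order in which exposed people are processed.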

Two interesting propensity score scenarios about statin

I use statin as an example here because the scenarios are really interesting. I do not have good answers to them, but for me they are certainly thought-provoking.

As far as I know, there are two common ways to prescribe statin around the world. The first is to prescribe statin when someone has a high cardiovascular risk predicted by cholesterol, for example because of high total cholesterol, high LDL, or a low HDL/LDL ratio. In this case, cholesterol is the only indication for statin prescription. How, then, do you match patients appropriately to conduct an observational study? Are you worried that any association is due not to statin, but to anything related to cholesterol?

The second way to prescribe statin is even more intriguing, in my opinion. In many countries a 10-year cardiovascular risk is calculated, and patients are prescribed statin when the risk is higher than a threshold, say 10%, regardless of cholesterol level. In this case, how would you match to address confounding if there is an association between statin prescribing and liver cancer? How would the propensity score differ from the original 10-year cardiovascular risk score?
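A toy simulation can at least show what happens in the most extreme version of the second scenario. It assumes, purely for illustration, that the 10% rule is followed strictly and that the risk score follows a made-up Beta distribution. Under those assumptions, the empirical propensity to be prescribed statin within bands of the risk score is either 0 or 1. The helper `binned_propensity` and the band edges are hypothetical.

```python
import numpy as np

def binned_propensity(risk, treated, edges):
    """Empirical P(statin) within each band of the risk score (NaN for empty bands)."""
    band = np.digitize(risk, edges[1:-1])  # band index 0 .. len(edges)-2
    return np.array([treated[band == b].mean() if np.any(band == b) else np.nan
                     for b in range(len(edges) - 1)])

rng = np.random.default_rng(4)
risk = rng.beta(2, 18, 10_000)       # hypothetical 10-year CV risk, mean about 10%
statin = (risk > 0.10).astype(int)   # strict rule: statin if and only if risk > 10%
edges = np.array([0.0, 0.025, 0.05, 0.075, 0.10, 0.125, 0.15, 1.0])
print(binned_propensity(risk, statin, edges))
```

Under this assumption, the propensity score is simply a step function of the 10-year risk score. Outside a narrow band around 10%, there are no comparable untreated people to match with. In practice, prescribing is unlikely to follow the rule perfectly, so a real score would presumably be less extreme.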