Owen Yang
I went to medical school in the 20th century, and I sort of assumed that medical statistics education has improved since then.
But somehow I might be wrong?
I cannot remember the last paper in clinical medicine that explicitly quoted a t-test. But from time to time I still get queries from young doctors asking whether they should do a t-test or ANOVA, or one of the named tests (McNemar?).
In reality, I think the scientific community has moved forward.
The main advance is that we have moved beyond the ‘p value’ to effect sizes. In the 20th century, a lot of emphasis was placed on whether there was a ‘nominal’ difference between two groups. Does this medication reduce blood pressure, or reduce the risk of heart attack?
Effect sizes
Effect size is a quantified statement about ‘how much.’ How much does amlodipine reduce blood pressure on average? The ‘how much’ question is much better than the ‘does it’ question for many reasons. As a simple example, if the answer to the ‘does it’ question is yes (for example, a p value of 0.02) but the answer to the ‘how much’ question is negligible (say a 0.5 mmHg difference, which is next to nothing), then the p value in this case is really just superficial.
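To make the amlodipine example concrete, here is a minimal Python sketch. The numbers are hypothetical: a 0.5 mmHg mean difference, an SD of 15 mmHg, and 10,000 patients per group. It also uses a simple large-sample normal approximation rather than whatever a real trial analysis would use. The sketch reports the ‘how much’ (the difference and its 95% CI) alongside the ‘does it’ (the p value). The function names are made up for illustration.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic z."""
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

def mean_difference_summary(diff: float, sd: float, n_per_group: int):
    """Mean difference with 95% CI and p value for two equal-sized groups.

    Assumes the same SD in both groups and a large-sample normal approximation.
    """
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return diff, ci, two_sided_p(diff / se)

# Hypothetical example: 0.5 mmHg average reduction, SD 15 mmHg, 10,000 per group
effect, ci, p = mean_difference_summary(0.5, 15.0, 10_000)
print(f"difference = {effect} mmHg, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, p = {p:.3f}")
```

With these assumed numbers the p value comes out at roughly 0.02, while the whole 95% CI sits below 1 mmHg. The result is ‘significant’ but clinically next to nothing.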
The common forms of an effect size should really be in the unit of the measured outcome (such as mmHg for blood pressure) or in some form of relative risk (such as a 1.2-fold increase in heart attack risk in one group compared with the reference group, or ‘control group’).
There is usually a confidence interval for each estimated effect size. If we have a good and healthy 21st-century quantitative mind, we should see beyond the textbook idea of how important ‘0’ (for a difference) or ‘1’ (for a ratio) is. The college exam will ask you to emphasise that if the confidence interval for a relative risk covers 1 (for example, a confidence interval between 0.9 and 1.1), then it is not statistically significant. Although this is true, I think it comes from a twisted (or outdated) philosophy that what we should care about is whether a result is statistically significant.
If the confidence interval is between 0.2 and 5.0, it is also not statistically significant. However, there is a non-trivial chance that the real relative risk is as low as 0.2 or as high as 5.0. It is fair to say that, in general, the science has not moved forward much from this study.
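A small Python sketch can show how two ‘non-significant’ results differ. It computes a relative risk and its 95% CI on the log scale, using the usual large-sample standard error of log(RR). The event counts are hypothetical: a small study with 3/30 events in each group, and a large study with the same risks but 300/3000 events. The function name is made up for illustration.

```python
import math

def relative_risk_ci(events_a, n_a, events_ref, n_ref, z=1.96):
    """Relative risk of group A vs the reference group, with a log-scale 95% CI.

    Uses the usual large-sample standard error of log(RR); assumes no zero event counts.
    """
    rr = (events_a / n_a) / (events_ref / n_ref)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical small study: 3/30 events in each group
small = relative_risk_ci(3, 30, 3, 30)
# Hypothetical large study with the same risks: 300/3000 events in each group
large = relative_risk_ci(300, 3000, 300, 3000)
print("small study: RR %.2f (%.2f to %.2f)" % small)
print("large study: RR %.2f (%.2f to %.2f)" % large)
```

Both intervals cover 1, so neither study is ‘statistically significant’. Even so, the small study leaves the relative risk anywhere from about 0.2 to about 4.6, while the large study pins it close to 1.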
Meanwhile, a relative risk of 1.10 and a relative risk of 10.0 mean very different things for the priority of this study, or this medicine. This can be the difference between a number needed to treat (NNT) of 10 and one of 1000. Simply saying the p value is significant at the 0.05 level, without looking at the actual size of the risk, shows a huge ignorance.
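Here is a rough way to see how a relative risk turns into an NNT. Given an assumed baseline risk in the reference group, the absolute risk difference is baseline risk × (RR − 1). The number needed to treat (or, for a harmful exposure, to harm) is the reciprocal of that difference. The 1% baseline risk below is chosen purely for illustration, and the function name is hypothetical.

```python
import math

def number_needed(baseline_risk: float, relative_risk: float) -> float:
    """Number needed to treat (or harm) = 1 / |absolute risk difference|.

    The absolute risk difference is baseline_risk * (relative_risk - 1).
    """
    risk_difference = baseline_risk * (relative_risk - 1.0)
    if risk_difference == 0:
        return math.inf  # no difference, so no finite number needed
    return 1.0 / abs(risk_difference)

# Hypothetical baseline risk of 1% in the reference group
for rr in (1.10, 10.0):
    print(f"RR {rr}: about {number_needed(0.01, rr):.0f} patients per extra event")
```

Under this assumed baseline risk, RR 1.10 corresponds to about 1000 patients per extra event and RR 10.0 to about 11. A different baseline risk would shift both numbers.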
Do not let statisticians lead everything
In my humble and limited experience, there is a subgroup of statisticians who do not think it is their job to understand the medical context of research. Do not let statisticians lead everything. They seem confident, but this is because they know they are not responsible for the medical implications. In my opinion, a good statistician should also be a good qualitative researcher, interested in the context of their data. Sadly this is not always true, just as a clinical researcher can feel good about themselves without knowing basic statistics. We all have our limits.
Three models for 99% of your questions
Therefore, most of our clinical research questions will likely be answered by linear regression, logistic regression, or Cox regression. A few really top doctors have asked me what happens if the outcome is not normally distributed. I would say: just convert your outcome to a binary outcome if the distribution is really worrying.
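Here is a rough illustration using only the Python standard library and made-up data. With a single binary treatment variable, the linear regression coefficient is simply the difference in group means. A logistic regression of a dichotomised outcome gives exp(coefficient) equal to the 2×2 odds ratio. Both are therefore directly readable as effect sizes. The data, the threshold of 5.5 and the function names are all hypothetical. For real analyses with covariates you would fit the models in a regression package, for example statsmodels in Python, or `lm`, `glm` and `survival::coxph` in R.

```python
from statistics import mean

def linear_coefficient(outcome, treated):
    """Slope of a linear regression of outcome on a 0/1 treatment indicator.

    With one binary predictor this equals the difference in group means.
    """
    treated_y = [y for y, t in zip(outcome, treated) if t == 1]
    control_y = [y for y, t in zip(outcome, treated) if t == 0]
    return mean(treated_y) - mean(control_y)

def dichotomised_odds_ratio(outcome, treated, threshold):
    """Odds ratio for outcome > threshold, treated vs control (assumes no empty cells).

    With one binary predictor, this equals exp(coefficient) from a logistic regression.
    """
    a = sum(1 for y, t in zip(outcome, treated) if t == 1 and y > threshold)
    b = sum(1 for y, t in zip(outcome, treated) if t == 1 and y <= threshold)
    c = sum(1 for y, t in zip(outcome, treated) if t == 0 and y > threshold)
    d = sum(1 for y, t in zip(outcome, treated) if t == 0 and y <= threshold)
    return (a * d) / (b * c)

# Made-up, right-skewed outcome (one large value per group)
treated = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [2.0, 3.0, 4.0, 20.0, 5.0, 6.0, 7.0, 30.0]
print("mean difference:", linear_coefficient(outcome, treated))
print("odds ratio (outcome > 5.5):", dichotomised_odds_ratio(outcome, treated, 5.5))
```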
For those mathematicians shouting in the background: I would rather not worry about whether there is a t-test, an ANOVA, or a Wald test behind these regressions.
The one clinically intuitive question that baffles all statisticians
Nevertheless, there is one type of question that we do not have a straightforward method to deal with: ordered categorical (ordinal) outcomes. One example would be cancer stages, or asthma stages. The stages are neither continuous variables nor simple unordered categories, and therefore there is no simple sentence that can give you sufficient quantitative detail on whether giving a medication ‘improves the stage.’
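One option to bring to that discussion follows the same spirit as converting to a binary outcome. Dichotomise the stage at every cut-point and report an odds ratio for each one. If those odds ratios are roughly similar, a proportional odds (ordinal logistic) model can summarise them as a single number. If they differ, that is informative in itself. This is a sketch with hypothetical stage data (stages 1 to 4) and a made-up function name, not a settled answer to the question.

```python
def cumulative_odds_ratios(stages_treated, stages_control, n_stages):
    """Odds ratio of being above stage k, treated vs control, for each cut-point k.

    Assumes stages are coded 1..n_stages and that no 2x2 cell is empty.
    """
    ratios = {}
    for k in range(1, n_stages):
        a = sum(1 for s in stages_treated if s > k)   # treated, above stage k
        b = len(stages_treated) - a                   # treated, at or below stage k
        c = sum(1 for s in stages_control if s > k)   # control, above stage k
        d = len(stages_control) - c                   # control, at or below stage k
        ratios[k] = (a * d) / (b * c)
    return ratios

# Hypothetical stages (1 = mildest, 4 = most severe)
treated = [1, 1, 1, 2, 2, 3, 3, 4]
control = [1, 2, 2, 3, 3, 3, 4, 4]
for k, odds_ratio in cumulative_odds_ratios(treated, control, 4).items():
    print(f"stage > {k}: OR {odds_ratio:.2f}")
```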
Let me know how you would discuss this type of question with statisticians. Unfortunately, it is quite common.