This is going to be another article that annoys a large group of professionals. But hear me out. If you are a clinician then my level of understanding may be the level that you need.

As I have probably mentioned a few times privately and publicly, the genetics world has developed a parallel universe of jargon and statistical methods. So bear with me.

A genetic locus which may affect a trait by changing its sequence (A/T/C/G)

QTL stands for quantitative trait locus (so QTLs for loci?). It is basically a place (normally a base pair, or a genetic digit) that is thought to affect a trait. So eQTL is expression QTL, meaning the trait is the expression of some gene. Someone with some classic genetic knowledge would perhaps imagine something upstream of a gene, or even in a gene, that either is the promoter or affects the promoter in some way, or an enhancer, etc. To be honest I am not the right person to explain what promoters or enhancers are. In a nutshell, it is a locus in a genetic sequence where the sequence at that locus may affect the expression of a gene.

Similarly, if you see pQTL, that is probably referring to a locus that may affect the level (or concentration) of a protein. So normally eQTL means mRNA expression, and perhaps pQTL means protein expression, but in fact I don’t think the trait actually needs to be the expression of a gene. A trait can be anything. The level of a protein itself is a trait, and it does not really matter whether this is determined by one or more genes. Some genetic statisticians would probably like to imply this is somehow a genetic process.

From what I can gather, there are QTLs that definitely affect the trait, but more of them are just very weak statistical signals that may or may not collectively have a reproducible effect on the trait. Both are fine, but we need to be careful when someone shows us the results. Do not assume every QTL has a well-evidenced biological or genetic mechanism that affects the trait, or the expression.

cis- and trans- QTLs

The wording of cis- and trans- is confusing to me because it does not really refer to the side of the gene, but to whether the locus is next to the gene (cis) or far from the gene (trans). From my understanding there is no clear definition of cis or trans; it is more of a concept, open to some variation in definition depending on the context.
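To make the "next to" versus "far from" idea concrete, here is a minimal sketch of how a distance-based rule could look. The 1 Mb window, the `Locus` and `Gene` classes, and the `classify_qtl` function are all my own assumptions for illustration; as said above, the cutoff is not fixed and changes with context.

```python
from dataclasses import dataclass

# Assumption: a 1 Mb window around the gene counts as "next to".
# This is a hypothetical default, not an agreed definition.
CIS_WINDOW_BP = 1_000_000


@dataclass
class Locus:
    chromosome: str
    position: int  # base-pair coordinate of the variant


@dataclass
class Gene:
    chromosome: str
    start: int  # base-pair coordinate where the gene starts
    end: int    # base-pair coordinate where the gene ends


def classify_qtl(locus, gene, window=CIS_WINDOW_BP):
    """Label a locus as 'cis' or 'trans' relative to one gene."""
    # A locus on a different chromosome is always far from the gene.
    if locus.chromosome != gene.chromosome:
        return "trans"
    # Same chromosome: cis only if within `window` base pairs of the gene.
    if gene.start - window <= locus.position <= gene.end + window:
        return "cis"
    return "trans"
```

Changing `window` changes the answer for the same locus, which is exactly why two papers can label the same signal differently.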

How QTLs are defined or discovered

Intuitively to me, the entire QTL franchise is based on two fancy statistical boxes. The first fancy box is Mendelian randomisation, which assumes that in a well-mixed population genetic variants are randomly assigned, and therefore an observed genetic effect is likely to be unbiased. In short, when you find an association between a genetic marker and a trait in a well-mixed population, such as the white people in the UK Biobank, it can more or less be assumed causal, because in that population the genetic marker is randomly assigned. I just realised I have to explain this complicated concept in one paragraph, so please look it up: Mendelian randomisation. There are caveats, but this is not the time to discuss them, and the word ‘causal’ here means something very different from our intuition.

The second fancy box is multiple comparisons. In genetics it is commonly taken for granted that the false discovery rate or p-value can be used to filter signals, and so QTLs are mainly discovered by checking the associations between hundreds of thousands of genetic markers and thousands to tens of thousands of traits. Better-resourced research will try to validate the findings in a separate population to see whether they can be replicated.
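To get a feel for the scale of the multiple comparisons problem, here is a small sketch using only simulated numbers. It assumes every test is a true null, so the p-values are just uniform random numbers, and it compares a naive p < 0.05 filter with a Bonferroni cutoff and the Benjamini-Hochberg false discovery rate procedure. The function names and the default of 100,000 tests are my own choices for illustration, not taken from any particular study.

```python
import random


def expected_false_positives(n_markers, n_traits, alpha=0.05):
    """Expected number of p < alpha hits if no marker affects any trait."""
    return n_markers * n_traits * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Number of discoveries kept by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    n_discoveries = 0
    # Keep the largest rank k whose sorted p-value is <= fdr * k / m.
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= fdr * rank / m:
            n_discoveries = rank
    return n_discoveries


def simulate_null_scan(n_tests=100_000, alpha=0.05, seed=0):
    """Simulate a scan where nothing is real and count the 'signals'."""
    rng = random.Random(seed)
    # Under the null hypothesis a p-value is uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_tests)]
    return {
        "naive": sum(p < alpha for p in p_values),
        "bonferroni": sum(p < alpha / n_tests for p in p_values),
        "benjamini_hochberg": benjamini_hochberg(p_values, fdr=alpha),
    }


if __name__ == "__main__":
    print(expected_false_positives(100_000, 10_000))
    print(simulate_null_scan())
```

With an unadjusted 0.05 cutoff, roughly one in twenty pure-noise tests passes, which is why the corrected thresholds are used as filters at all. Neither correction says anything about whether a surviving signal has a biological mechanism behind it.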

How QTLs may be used

There are two common uses of these QTLs. The first is that someone simply publishes their finding of a random signal and writes an extended discussion of how plausible it is, hoping someone else will do the hard work to prove they were correct in the first place. The hard work will be a lot of experiments, conditional knock-outs, or clinical trials. I find these are generally good findings that make a scientific contribution, but what I do not like is the fact that this attracts more funding than the people who actually do the experiments. When you do 100K * 10K associations you are bound to find something, so success is more or less guaranteed. When you do the hard work, you are nowadays under pressure to prove your hypothesis right, and when you are right you are not the first one to find it, because everything has been reported previously in the 100K * 10K association study. There is no glamour, only blood and sweat. I feel for them.

The second is to use them to model the trait and do a further association study. So in a study full of genetic data but with no real measurements of the topic of interest, you can do some sort of proof-of-principle analysis. If you are interested in the association between vitamin D and the risk of cancer, you can use the QTLs for vitamin D to do an analysis without actually having the vitamin D level in the study. The example I give is exactly what a Mendelian randomisation is, but there are examples where more complex designs can be used to test some hypotheses. Again, this approach is fine in normal statistics because we can generally get a feel for its strengths and limitations. What I do not like is that when this is done in the name of genetics, it suddenly becomes more credible and fancy, and therefore more palatable to journals and funders.
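For readers who prefer to see the idea run, here is a toy simulation of this kind of analysis. Everything in it is made up: the allele frequency, the effect sizes, and a continuous "outcome" standing in for cancer risk to keep the arithmetic simple. It uses the simplest estimator I know of for this design, the ratio of the marker-outcome slope to the marker-exposure slope (often called the Wald ratio), and assumes the marker affects the outcome only through the exposure.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_mr(n=50_000, true_effect=-0.2, seed=1):
    """Return (naive estimate, Wald ratio estimate) on simulated data."""
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Allele count 0/1/2 with a hypothetical allele frequency of 0.3.
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        # Unmeasured confounder that raises both exposure and outcome.
        u = rng.gauss(0, 1)
        x = 0.5 * g + u + rng.gauss(0, 1)            # e.g. vitamin D level
        y = true_effect * x + u + rng.gauss(0, 1)    # stand-in for the outcome
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    # Ordinary regression of outcome on exposure is biased by the confounder.
    naive = slope(exposure, outcome)
    # Wald ratio: marker-outcome slope divided by marker-exposure slope.
    wald = slope(genotype, outcome) / slope(genotype, exposure)
    return naive, wald


if __name__ == "__main__":
    naive, wald = simulate_mr()
    print("naive:", naive, "wald ratio:", wald)
```

In this toy setup the naive regression gets the sign of the effect wrong because of the confounder, while the ratio lands near the simulated effect. Note that the second half of the ratio only needs the marker and the outcome, which is why the exposure does not have to be measured in the same study. It works here because the simulation satisfies the assumptions by construction; real data do not come with that guarantee.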