Owen Yang

Sample size justification has become a checkbox item in research grant applications. Both large and small sample sizes attract criticism. In a small study, the possibility of a chance finding is high, and it is difficult to draw meaningful conclusions. When the sample size is large, the study becomes expensive, and grant funders expect the applicant to show that they can feasibly conduct such a large study. ‘Value for money’ is also a checkbox required by research funders. One should ensure the sample size is not so large that money is wasted for little marginal benefit.

Not surprisingly, research capitalism [1] operates similarly to other forms of capitalism. Those who propose a research project with the most tangible promises are the most likely to receive research grants. These promises are usually based on previous achievements, which were themselves fueled by previous capital. Those who hold research capital have the power not only to fuel their own future research, but also to choose whom they want to invest in. They may, at least by taking up the finite capital available, block the progress of their competitors.

This system has scientific and economic merits, but at times I wonder whether it could inadvertently exacerbate health inequity.

UK Biobank, or the concept of it, is probably one of the greatest scientific achievements of our time. In the UK Biobank, 500,000 participants contributed their personal and biological information on an almost completely altruistic basis. Information saves lives. By generating knowledge at this industrial scale, this project has accelerated scientific progress that may benefit the health of future generations.

Another achievement of the UK Biobank and other big data projects is that a new generation of data-intensive research protocols has been developed. A large cohort of scientists now follows these protocols, producing new findings at lower cost that may benefit human health.

But who has benefited? An overwhelming 94.6% of UK Biobank participants are white. One may argue this reflects the UK population of the older generation. [2]

The real benefit of a large sample size is multiplicative. Where the sample size is large, one can harness the data to investigate the most refined determinants of health. Carbohydrates, proteins, and fats are commonly consumed together, but with a large sample size one can look at the details, such as whether higher protein and lower carbohydrate intake may contribute to health benefits. [3] There are even genetic methods developed to predict one’s tendency (or preference) to consume a certain type of diet. [4] This knowledge could potentially unlock the biological secrets of life-long health.

The relatively small number of non-white participants in the UK Biobank could ‘theoretically’ still be useful. Because of the large total size, even a small proportion of ethnic minorities could be the world’s most valuable source of information.

Research using this theoretically valuable source, however, is far less exciting than one would expect.

Minority research does not attract the most generous funders or the most ambitious researchers. Throughout this big data revolution, research journals and funders have been awed by fascinating findings generated by data-intensive protocols. The amount of information from minorities, despite being relatively large within ethnic-specific research, will never catch up and be sufficient for up-to-date data-intensive protocols. Findings from ‘conventional methods’ are no longer palatable, and may only appear convincing when they concur with what was known in the majority population. Since publishing scientific articles and attracting research funding are the main achievements academic researchers demonstrate to progress in their careers, the brain drain away from these ethnic minority studies is evident.

One may dream of developing new methods that suit diversity [5]. However, many research protocols are so established that they create a monopoly of free, ready-made computer programs and homogeneous protocol users. This creates an entry barrier to any new developer who wishes to shift a research paradigm.

It is common practice for genetic researchers to drop the data of all non-white participants in UK Biobank research. Many rationalize this as the only way to retain the ‘population structure’ necessary to validate the established research protocols. Most researchers I know have not given a second thought to the implications for social inequity. ‘It is only scientific,’ they would probably have said. I have never read one of these scientific articles in which this issue is proactively addressed.

Ethnicity, diversity, and inclusion will be re-invented as a checkbox in every grant application form, but the barrier to using data for social minority groups will continue to increase, and the value for money will continue to decrease. Change is difficult in the current research funding and career progression structure, which celebrates research capitalism so dearly.

REFERENCES

  1. Münch, R. Academic Capitalism (Oxford Research Encyclopedias) https://doi.org/10.1093/acrefore/9780190228637.013.15. Accessed on 1 May 2022.
  2. Fry A, Littlejohns TJ, Sudlow C et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol. 2017;186(9):1026-1034. doi: 10.1093/aje/kwx246.
  3. Bradbury KE, Yong TYN, and Key TJ. Dietary Intake of High-Protein Foods and Other Major Foods in Meat-Eaters, Poultry-Eaters, Fish-Eaters, Vegetarians, and Vegans in UK Biobank. Nutrients. 2017;9(12):1317. doi: 10.3390/nu9121317.
  4. Cole JB, Florez JC, and Hirschhorn JN. Comprehensive genomic analysis of dietary habits in UK Biobank identifies hundreds of genetic associations. Nat Commun. 2020;11(1):1467. doi: 10.1038/s41467-020-15193-0.
  5. Constantinescu AW, Mitchell RE, Zheng J et al. A Framework for Research into Continental Ancestry groups of the UK Biobank. Hum Genomics. 2022;16(1):3. doi: 10.1186/s40246-022-00380-5.