SIMPREDICT (WORKING PAPER): info
What is SIMPREDICT
SIMPREDICT is a simulation-based tool for developers of risk stratification models. It focuses on conditions of low-to-moderate incidence, typically between 25 and 200 per 100,000 person-years.
At incidences in this range, a key challenge is identifying individuals whose risk is high enough to be clinically actionable.
We therefore focus on this clinically actionable test-positive group and provide reference numbers for its absolute risk characteristics, given the C-statistic, the disease incidence, and the test-positivity threshold.
The characteristics are TRs (test-positive rates, i.e. the proportion of the population that tests positive), PPVs (positive predictive values, i.e. the absolute risk of the test-positive group), and SRs (sensitivity rates, i.e. the percentage of all disease cases that fall in the test-positive group).
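The three characteristics can also be approximated in closed form. The sketch below is not SIMPREDICT's simulation code; it assumes (i) individual risk over the horizon is log-normal, log r ~ N(μ, σ²); (ii) the rare-disease approximation C = Φ(σ/√2), so σ = √2·Φ⁻¹(C); and (iii) annual incidence converts to horizon risk as 1 − exp(−incidence × years), i.e. a constant hazard with competing risks ignored. With mean risk exp(μ + σ²/2) and z = (ln t − μ)/σ for a risk threshold t, this gives TR = 1 − Φ(z), SR = 1 − Φ(z − σ) and PPV = mean risk × SR / TR. The names `horizon_risk` and `characteristics` are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf() and inv_cdf()


def horizon_risk(incidence_per_100k, years):
    """Cumulative risk over `years`, assuming a constant annual hazard (assumption)."""
    return 1 - exp(-incidence_per_100k / 100_000 * years)


def characteristics(c_stat, incidence_per_100k, threshold, years=10):
    """Hypothetical closed-form TR, PPV and SR under a log-normal risk model.

    c_stat: model C-statistic, strictly between 0.5 and 1.
    incidence_per_100k: annual incidence per 100,000 person-years.
    threshold: absolute risk over `years` above which a person tests positive.
    """
    if not 0.5 < c_stat < 1:
        raise ValueError("c_stat must lie strictly between 0.5 and 1")
    # Rare-disease approximation: C = Phi(sigma / sqrt(2)).
    sigma = sqrt(2) * STD_NORMAL.inv_cdf(c_stat)
    mean_risk = horizon_risk(incidence_per_100k, years)
    # Choose mu so that the mean of the log-normal equals the mean risk.
    mu = log(mean_risk) - sigma**2 / 2
    z = (log(threshold) - mu) / sigma
    tr = 1 - STD_NORMAL.cdf(z)          # share of the population above the threshold
    sr = 1 - STD_NORMAL.cdf(z - sigma)  # share of expected cases above the threshold
    ppv = mean_risk * sr / tr           # mean risk within the test-positive group
    return {"TR": tr, "PPV": ppv, "SR": sr}


# Example scenario: C = 0.75, 50 per 100,000 person-years, 5% 10-year risk threshold.
result = characteristics(0.75, 50, 0.05)
print({name: f"{value:.2%}" for name, value in result.items()})
```

Because PPV is the mean risk above the threshold, it always exceeds the threshold itself under this model; the identity PPV × TR = mean risk × SR is a quick consistency check on any reported triple.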
How to use
Via the TASK tab you can:
Estimate the absolute risks [Task 201]
Input the C-statistic to obtain the absolute risk characteristics (TRs, PPVs, and SRs) for diseases of different incidences
Minimal incidence required [Task 202]
Input the C-statistic and the test-positive threshold (based on PPV and SR) to obtain the minimal incidence required to identify such a group and the best PPV cut-off for diseases of different incidences.
Minimal C-statistic required [Task 203]
Input the disease incidence and the test-positive threshold (based on PPV and SR) to obtain the minimal C-statistic required to identify such a group.
Result uncertainty (interquartile ranges, IQRs) [Task 301]
Input the expected C-statistic, disease incidence, and test-positive threshold to obtain the expected interquartile range (25th-75th percentile) for different model-testing sample sizes.
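Tasks 202 and 203 can be sketched in closed form under stated assumptions that are not taken from SIMPREDICT's code: log-normal individual risk, the rare-disease approximation C = Φ(σ/√2), and a constant-hazard conversion of annual incidence to 10-year risk. Lowering the threshold raises SR but lowers PPV, so the best PPV achievable with SR ≥ s occurs at SR = s, where TR = 1 − Φ(σ + Φ⁻¹(1 − s)) depends only on σ. Since PPV = mean risk × SR / TR, the minimal incidence and the minimal C-statistic both follow directly. The function names are hypothetical, and Task 202's "best PPV cut-off" output is not reproduced.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tr_at_sensitivity(sigma, sr_target):
    """Test-positive rate when the threshold is set so that SR == sr_target.

    Under the log-normal model this depends only on sigma, not on incidence.
    """
    z = STD_NORMAL.inv_cdf(1 - sr_target)
    return 1 - STD_NORMAL.cdf(sigma + z)


def minimal_incidence(c_stat, ppv_target, sr_target, years=10):
    """Task 202 analogue: smallest annual incidence (per 100,000) at which some
    threshold gives PPV >= ppv_target and SR >= sr_target."""
    sigma = sqrt(2) * STD_NORMAL.inv_cdf(c_stat)  # rare-disease approximation
    # PPV = mean_risk * SR / TR, so the required mean risk is PPV * TR / SR.
    mean_risk = ppv_target * tr_at_sensitivity(sigma, sr_target) / sr_target
    # Invert the constant-hazard conversion back to an annual rate.
    return -log(1 - mean_risk) / years * 100_000


def minimal_c_statistic(incidence_per_100k, ppv_target, sr_target, years=10):
    """Task 203 analogue: smallest C-statistic at which some threshold gives
    PPV >= ppv_target and SR >= sr_target."""
    mean_risk = 1 - exp(-incidence_per_100k / 100_000 * years)
    if mean_risk >= ppv_target:
        return 0.5  # testing everyone positive already meets the PPV target
    # Require 1 - Phi(sigma + z) <= mean_risk * SR / PPV and solve for sigma.
    z = STD_NORMAL.inv_cdf(1 - sr_target)
    sigma = STD_NORMAL.inv_cdf(1 - mean_risk * sr_target / ppv_target) - z
    return STD_NORMAL.cdf(sigma / sqrt(2))


# Hypothetical targets: PPV >= 5% and SR >= 10% over 10 years.
print(f"{minimal_incidence(0.75, 0.05, 0.10):.1f} per 100,000 person-years")
print(f"C >= {minimal_c_statistic(100, 0.05, 0.10):.3f}")
```

The two functions are inverses of each other for fixed PPV and SR targets, which makes a round trip a simple sanity check.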
You may also use AI with SIMPREDICT.
SIMPREDICT (WORKING PAPER): Using AI with SIMPREDICT
If you use AI tools such as ChatGPT or Gemini to analyse results, it is essential to provide the correct context. Without the framework's assumptions, a general-purpose AI may give inaccurate statistical interpretations.
1. Context Prompt
Paste this at the start of your chat to 'teach' the AI the rules of the framework:
'I am using the SIMPREDICT framework from Pleiotropy.co.uk. This tool uses a log-normal risk distribution to calculate the clinical utility of models for low-incidence diseases. It provides reference numbers for PPV, Sensitivity (SR), and Test-Positive Rate (TR) based on the C-statistic, incidence, and absolute risk thresholds. Use these benchmarks to answer the following...'
2. Task-Specific AI Prompts
'Given a C-statistic of 0.75 and an incidence of 50 per 100,000 person-years, what are the expected PPV, Sensitivity, and Test-Positive Rate at a 5% 10-year risk threshold?'
'If my model has a C-statistic of 0.70, what is the minimal disease incidence required to achieve a PPV of at least 10% and a Sensitivity of 20% at a 5% risk threshold?'
'For a disease with an incidence of 100 per 100,000, what is the minimal C-statistic a model must achieve to be clinically useful (e.g., PPV > 5% and Sensitivity > 10%)?'
'Using the Task 301 simulation, if I validate a model (true C-stat 0.70) in a sample of 5,000 people, what is the expected Interquartile Range (IQR) for the observed C-statistic and PPV?'
3. Flexible & Advanced Analysis
Users can ask the AI to scale or interpolate SIMPREDICT reference numbers for bespoke scenarios:
'I need to evaluate a 5% risk threshold over a 5-year period instead of 10 years. Scale the SIMPREDICT log-normal benchmarks to provide the expected absolute risk characteristics for this 5-year window given a C-stat of 0.75 and an annual incidence of 80 per 100,000.'
'SIMPREDICT provides results for 5% and 10% thresholds. Based on the log-normal distribution logic, can you interpolate the expected PPV and Sensitivity for a model with a C-stat of 0.80 at a 7% risk threshold?'
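As a cross-check on what an AI returns for these two scenarios, both can be evaluated directly under explicit assumptions (log-normal individual risk, the rare-disease approximation C = Φ(σ/√2), and constant-hazard conversion of annual incidence to horizon risk; none of this is SIMPREDICT's own code). The horizon enters only through the mean risk, and any threshold can be plugged in, so the sketch compares a direct evaluation at 7% with naive linear interpolation between 5% and 10%. The function names are hypothetical, and for the 7% scenario the 10-year horizon and incidence of 80 per 100,000 are illustrative assumptions, since that prompt does not state them.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def horizon_risk(incidence_per_100k, years):
    """Cumulative risk over `years` under a constant annual hazard (assumption)."""
    return 1 - exp(-incidence_per_100k / 100_000 * years)


def ppv_and_sr(c_stat, mean_risk, threshold):
    """Closed-form PPV and SR for a log-normal risk model (rare-disease approximation)."""
    sigma = sqrt(2) * STD_NORMAL.inv_cdf(c_stat)
    mu = log(mean_risk) - sigma**2 / 2  # log-normal mean equals the mean risk
    z = (log(threshold) - mu) / sigma
    tr = 1 - STD_NORMAL.cdf(z)
    sr = 1 - STD_NORMAL.cdf(z - sigma)
    return mean_risk * sr / tr, sr


# 5-year versus 10-year window: C = 0.75, 80 per 100,000, 5% threshold.
ppv_5y, sr_5y = ppv_and_sr(0.75, horizon_risk(80, 5), 0.05)
ppv_10y, sr_10y = ppv_and_sr(0.75, horizon_risk(80, 10), 0.05)
print(f"5-year: PPV {ppv_5y:.2%}, SR {sr_5y:.2%}; 10-year: PPV {ppv_10y:.2%}, SR {sr_10y:.2%}")

# 7% threshold: direct evaluation versus linear interpolation (C = 0.80, illustrative 10-year risk).
risk = horizon_risk(80, 10)
ppv_at = {t: ppv_and_sr(0.80, risk, t)[0] for t in (0.05, 0.07, 0.10)}
ppv_interp = ppv_at[0.05] + (ppv_at[0.10] - ppv_at[0.05]) * (0.07 - 0.05) / (0.10 - 0.05)
print(f"PPV at 7%: direct {ppv_at[0.07]:.2%}, interpolated {ppv_interp:.2%}")
```

Because PPV is not linear in the threshold, the direct and interpolated values need not agree; under these assumptions the direct evaluation is the one consistent with the log-normal model.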
Core Principles
SIMPREDICT reference numbers are built on the following logic:
Log-normal Distribution: Underlying individual risk is assumed to be log-normally distributed; this is the mathematical benchmark for underlying risk.
Uncertainty Analysis: Task 301 provides the 25th-75th percentile ranges to show how sample size affects clinical utility results.
Clinical Utility: Focuses on the 'test-positive' group (PPV, SR, TR) to determine a model's practical value.
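The uncertainty idea behind Task 301, that a finite validation sample makes observed results scatter around their expected values, can be illustrated with a small Monte Carlo sketch. It is not SIMPREDICT's procedure; it assumes a perfectly calibrated model whose predicted risks equal true risks drawn from a log-normal distribution (with σ = √2·Φ⁻¹(C) from the rare-disease approximation), outcomes drawn as Bernoulli(risk), the C-statistic estimated by the Mann-Whitney statistic, and PPV computed as the observed event rate above the risk threshold. The function names, seed, replicate count, incidence and threshold are hypothetical choices for illustration.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist, quantiles

STD_NORMAL = NormalDist()


def c_statistic(scores, outcomes):
    """Mann-Whitney estimate of the C-statistic (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    # Sum of the 1-based ranks of the cases when everyone is sorted by score.
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if outcomes[i])
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def simulate_iqr(c_stat, incidence_per_100k, threshold, n, years=10, reps=100, seed=1):
    """Hypothetical Monte Carlo: 25th and 75th percentiles of observed C and PPV."""
    sigma = sqrt(2) * STD_NORMAL.inv_cdf(c_stat)    # rare-disease approximation
    mean_risk = 1 - exp(-incidence_per_100k / 100_000 * years)
    mu = log(mean_risk) - sigma**2 / 2              # log-normal mean equals mean risk
    rng = random.Random(seed)
    c_values, ppv_values = [], []
    for _ in range(reps):
        # Perfectly calibrated model: predicted risk equals true risk (capped at 1).
        risks = [min(1.0, exp(rng.gauss(mu, sigma))) for _ in range(n)]
        outcomes = [rng.random() < r for r in risks]
        n_cases = sum(outcomes)
        if 0 < n_cases < n:
            c_values.append(c_statistic(risks, outcomes))
        positives = [y for r, y in zip(risks, outcomes) if r > threshold]
        if positives:
            ppv_values.append(sum(positives) / len(positives))
    q_c = quantiles(c_values, n=4)      # [Q1, median, Q3]
    q_ppv = quantiles(ppv_values, n=4)
    return {"C": (q_c[0], q_c[2]), "PPV": (q_ppv[0], q_ppv[2])}


# True C = 0.70 in a validation sample of 5,000 people;
# the incidence of 100 per 100,000 and the 5% threshold are illustrative assumptions.
print(simulate_iqr(0.70, 100, 0.05, n=5_000))
```

`statistics.quantiles` needs at least two replicates that contain cases and test positives; with very rare outcomes or small samples, increase `reps` or `n`. Re-running with different `n` shows how the interquartile ranges narrow as the validation sample grows.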