SIMPREDICT: info
SIMPREDICT (WORKING PAPER): Using AI with SIMPREDICT
If you use AI tools such as ChatGPT or Gemini to analyse SIMPREDICT results, give them the framework's assumptions first. Without that context, a general-purpose AI may interpret the statistics incorrectly.
1. Copy the Context Header
Paste this at the start of your chat to 'teach' the AI the rules of the framework:
'I am using the SIMPREDICT framework from Pleiotropy.co.uk. This tool uses a log-normal risk distribution to calculate the clinical utility of models for low-incidence diseases. It provides reference numbers for PPV, Sensitivity (SR), and Test Positive Rate (TP) based on the C-statistic, incidence, and absolute risk thresholds. Use these benchmarks to answer the following...'
2. Task-Specific AI Prompts
Task 201: Estimating Risk Indicators
'Given a C-statistic of 0.75 and an incidence of 50 per 100,000, what are the expected PPV, Sensitivity, and Test Positive Rate at a 5% 10-year risk threshold?'
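As a rough cross-check on an AI's answer to this prompt, the three indicators can be sketched in closed form. The Python below is a minimal sketch, not SIMPREDICT's own code. It assumes that risk is log-normally distributed, that the C-statistic maps to the log-scale standard deviation through C = Phi(sigma / sqrt(2)) (a common low-incidence approximation), and that the quoted incidence is annual, so the 10-year mean risk is roughly ten times the annual rate. The function name lognormal_indicators is hypothetical.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal_indicators(c_stat, mean_risk, threshold):
    """Return (ppv, sensitivity, test_positive_rate) under an assumed log-normal risk model.

    c_stat must be above 0.5; mean_risk and threshold refer to the same time horizon.
    """
    sigma = 2 ** 0.5 * STD_NORMAL.inv_cdf(c_stat)  # assumption: C = Phi(sigma / sqrt(2))
    mu = log(mean_risk) - sigma ** 2 / 2           # chosen so that the mean risk equals mean_risk
    z = (log(threshold) - mu) / sigma
    test_positive = 1 - STD_NORMAL.cdf(z)          # share of the population above the threshold
    sensitivity = 1 - STD_NORMAL.cdf(z - sigma)    # cases have log-risk shifted up by sigma^2
    ppv = sensitivity * mean_risk / test_positive  # Bayes' rule
    return ppv, sensitivity, test_positive

# Assumed reading of the prompt: 50 per 100,000 per year, accumulated over 10 years.
mean_10y_risk = 10 * 50 / 100_000
ppv, sensitivity, test_positive = lognormal_indicators(0.75, mean_10y_risk, 0.05)
print(f"PPV={ppv:.3f}  sensitivity={sensitivity:.3f}  test-positive rate={test_positive:.4f}")
```

In this sketch the PPV is the mean risk among people above the cut-off, so it always exceeds the threshold itself.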
Task 202: Finding Minimal Incidence
'If my model has a C-statistic of 0.70, what is the minimal disease incidence required to achieve a PPV of at least 10% and a Sensitivity of 20% at a 5% risk threshold?'
Task 203: Finding Minimal C-Statistic
'For a disease with an incidence of 100 per 100,000, what is the minimal C-statistic a model must achieve to be clinically useful (e.g., PPV > 5% and Sensitivity > 10%)?'
Task 301: Modeling Uncertainty (IQR)
'Using the Task 301 simulation, if I validate a model (true C-stat 0.70) in a sample of 5,000 people, what is the expected Interquartile Range (IQR) for the observed C-statistic and PPV?'
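The kind of sampling experiment this prompt refers to can be sketched with a small Monte Carlo simulation. The code below is an illustration, not the Task 301 simulation itself. It assumes a perfectly calibrated model whose score is the true log-normal risk, a mean 10-year risk of 1% and a 5% threshold (neither is given in the prompt), and 200 replicates. The function name simulate_iqr is hypothetical.

```python
import random
from math import exp, log
from statistics import NormalDist, quantiles

def simulate_iqr(true_c, mean_risk, threshold, n, n_reps=200, seed=0):
    """Return ((c_q1, c_q3), (ppv_q1, ppv_q3)) across simulated validation samples."""
    rng = random.Random(seed)
    sigma = 2 ** 0.5 * NormalDist().inv_cdf(true_c)  # assumption: C = Phi(sigma / sqrt(2))
    mu = log(mean_risk) - sigma ** 2 / 2
    c_values, ppv_values = [], []
    for _ in range(n_reps):
        risks = [min(1.0, exp(rng.gauss(mu, sigma))) for _ in range(n)]
        events = [rng.random() < r for r in risks]   # one Bernoulli outcome per person
        n_cases = sum(events)
        n_controls = n - n_cases
        if n_cases and n_controls:
            # Observed C-statistic from the rank sum of cases (Mann-Whitney U).
            order = sorted(range(n), key=risks.__getitem__)
            rank_sum = sum(rank for rank, i in enumerate(order, start=1) if events[i])
            u = rank_sum - n_cases * (n_cases + 1) / 2
            c_values.append(u / (n_cases * n_controls))
        flagged = [e for r, e in zip(risks, events) if r > threshold]
        if flagged:  # PPV is undefined when nobody exceeds the threshold
            ppv_values.append(sum(flagged) / len(flagged))
    c_q1, _, c_q3 = quantiles(c_values, n=4)
    p_q1, _, p_q3 = quantiles(ppv_values, n=4)
    return (c_q1, c_q3), (p_q1, p_q3)

c_iqr, ppv_iqr = simulate_iqr(true_c=0.70, mean_risk=0.01, threshold=0.05, n=5000)
print("observed C-statistic IQR:", c_iqr)
print("observed PPV IQR:", ppv_iqr)
```

Rerunning with a larger n should narrow both ranges, which is the sample-size effect the prompt asks about.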
3. Flexible & Advanced Analysis
Users can ask the AI to scale or interpolate SIMPREDICT reference numbers for bespoke scenarios:
Custom Time Horizons (e.g., 5-year risk)
'I need to evaluate a 5% risk threshold over a 5-year period instead of 10 years. Scale the SIMPREDICT log-normal benchmarks to provide the expected absolute risk characteristics for this 5-year window given a C-stat of 0.75 and an annual incidence of 80 per 100,000.'
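One way to picture the scaling requested here is to convert the annual incidence into a cumulative risk for each horizon and re-evaluate the indicators. The sketch below assumes a constant annual hazard with no competing risks, the same C-statistic at both horizons, and the log-normal relation C = Phi(sigma / sqrt(2)). None of these is confirmed as SIMPREDICT's method, and the function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cumulative_risk(annual_rate, years):
    """Cumulative risk over `years`, assuming a constant annual hazard."""
    return 1 - exp(-annual_rate * years)

def lognormal_indicators(c_stat, mean_risk, threshold):
    """(ppv, sensitivity, test_positive_rate) under an assumed log-normal risk model."""
    sigma = 2 ** 0.5 * STD_NORMAL.inv_cdf(c_stat)  # assumption: C = Phi(sigma / sqrt(2))
    mu = log(mean_risk) - sigma ** 2 / 2
    z = (log(threshold) - mu) / sigma
    test_positive = 1 - STD_NORMAL.cdf(z)
    sensitivity = 1 - STD_NORMAL.cdf(z - sigma)
    return sensitivity * mean_risk / test_positive, sensitivity, test_positive

annual_rate = 80 / 100_000
results = {}
for years in (5, 10):
    mean_risk = cumulative_risk(annual_rate, years)
    results[years] = lognormal_indicators(0.75, mean_risk, 0.05)
    ppv, sens, tp = results[years]
    print(f"{years}-year: mean risk={mean_risk:.5f}  PPV={ppv:.3f}  "
          f"sensitivity={sens:.4f}  test-positive rate={tp:.5f}")
```

Because the mean risk is lower over the shorter window, fewer people cross the same 5% threshold, so sensitivity and the test-positive rate both fall.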
Interpolating Thresholds (e.g., 7% risk)
'SIMPREDICT provides results for 5% and 10% thresholds. Based on the log-normal distribution logic, can you interpolate the expected PPV and Sensitivity for a model with a C-stat of 0.80 at a 7% risk threshold?'
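When checking an interpolated answer, it can help to compare it with a direct evaluation at the intermediate threshold. The sketch below does both under an assumed log-normal model with C = Phi(sigma / sqrt(2)). The mean 10-year risk of 1% is a hypothetical value chosen for illustration, since the prompt does not give an incidence, and the function names are likewise hypothetical.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal_indicators(c_stat, mean_risk, threshold):
    """(ppv, sensitivity, test_positive_rate) under an assumed log-normal risk model."""
    sigma = 2 ** 0.5 * STD_NORMAL.inv_cdf(c_stat)  # assumption: C = Phi(sigma / sqrt(2))
    mu = log(mean_risk) - sigma ** 2 / 2
    z = (log(threshold) - mu) / sigma
    test_positive = 1 - STD_NORMAL.cdf(z)
    sensitivity = 1 - STD_NORMAL.cdf(z - sigma)
    return sensitivity * mean_risk / test_positive, sensitivity, test_positive

def linear_interpolate(x, x0, y0, x1, y1):
    """Straight-line interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

C_STAT, MEAN_RISK = 0.80, 0.01  # MEAN_RISK is a hypothetical illustration value
ppv_05, sens_05, _ = lognormal_indicators(C_STAT, MEAN_RISK, 0.05)
ppv_10, sens_10, _ = lognormal_indicators(C_STAT, MEAN_RISK, 0.10)
ppv_07, sens_07, _ = lognormal_indicators(C_STAT, MEAN_RISK, 0.07)

sens_linear = linear_interpolate(0.07, 0.05, sens_05, 0.10, sens_10)
ppv_linear = linear_interpolate(0.07, 0.05, ppv_05, 0.10, ppv_10)
print(f"sensitivity at 7%: direct={sens_07:.4f}  linear interpolation={sens_linear:.4f}")
print(f"PPV at 7%:         direct={ppv_07:.4f}  linear interpolation={ppv_linear:.4f}")
```

The two estimates need not agree, because the upper tail of a log-normal distribution is curved rather than straight between the 5% and 10% cut-offs.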
Core Principles
SIMPREDICT reference numbers are built on the following logic:
Log-normal Distribution: Underlying risk in the population is assumed to follow a log-normal distribution, which serves as the mathematical benchmark.
Uncertainty Analysis: Task 301 provides the 25th-75th percentile ranges to show how sample size affects clinical utility results.
Clinical Utility: Focuses on the 'test-positive' group (PPV, SR, TP) to determine practical model value.