Uses and pitfalls with AI for decision support - harmful self-fulfilling prophecies

WEON masterclass 2024 - AI-based prediction models in healthcare: from development to implementation

Wouter van Amsterdam, MD, PhD

Department of Data Science Methods, Julius Center, University Medical Center Utrecht

2024-05-30

Uses of AI in health care

AI may have many uses in health care

Use AI to make health care

more efficient or easier

administration / documentation
translation

better: change decisions

diagnosis (e.g. skin cancer from imaging)
prognosis (e.g. survival given medical image)
treatment effect (e.g. genetic biomarker)

Tip

Whereas treatment effect estimation is typically thought of as a causal task requiring causal approaches (e.g. randomized controlled trials), prognosis models are often advertised for making treatment decisions.

The in-between: using prediction models for (medical) decision making

Using prediction models for decision making is often thought of as a good idea

For example:

give chemotherapy to cancer patients with high predicted risk of recurrence
give statins to patients with a high risk of a heart attack

TRIPOD+AI on prediction models (Collins et al. 2024)

“Their primary use is to support clinical decision making, such as … initiate treatment or lifestyle changes.”

This can lead to harm when model developers:

ignore the treatments patients received in the data used for training / validation of the (AI) prediction model
treat measures of predictive accuracy as sufficient evidence for safe deployment

When accurate prediction models yield harmful self-fulfilling prophecies

Building models for decision support without regard for the historic treatment policy is a bad idea

The question is not “is my model accurate before / after deployment”,

but “did deploying the model improve patient outcomes?”

Treatment-naive prediction models

\[\begin{align} E[Y|X] = E_{t \sim \pi_0(X)}\big[E[Y|X,t]\big] \end{align}\]
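To make the averaging over the historic treatment policy concrete, here is a minimal sketch with two patient groups. All numbers and names (`mean_outcome`, `pi_0`, `treatment_naive_prediction`) are hypothetical and chosen only for illustration; they are not taken from the cited work.

```python
# Hypothetical example: Y is survival (1 = alive), T is treatment (1 = treated).
# mean_outcome[x][t] stands for E[Y | X=x, T=t].
mean_outcome = {
    "low_risk": {0: 0.90, 1: 0.92},
    "high_risk": {0: 0.40, 1: 0.70},
}

# Historic treatment policy pi_0: probability of being treated given X.
pi_0 = {"low_risk": 0.10, "high_risk": 0.80}


def treatment_naive_prediction(x):
    """E[Y | X=x]: the outcome averaged over the historic treatment policy."""
    p_treated = pi_0[x]
    return (1 - p_treated) * mean_outcome[x][0] + p_treated * mean_outcome[x][1]


for group in mean_outcome:
    print(group, round(treatment_naive_prediction(group), 3))
```

In this sketch the prediction for the high-risk group is only valid as long as about 80% of these patients keep being treated; it says nothing about what happens if the policy changes.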

Results from (van Amsterdam et al. 2024)

good or bad discrimination post deployment may be a sign of a harmful or a beneficial policy change
models that are perfectly calibrated both before and after deployment are not useful for decision making, because deploying them did not change the outcome distribution
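One way a harmful policy change can coincide with better discrimination is sketched below. This is an illustrative calculation under assumed numbers, not a reproduction of the cited results: the two-group setup, the survival probabilities, and the helper names `auc_two_groups` and `mean_survival` are all hypothetical.

```python
def auc_two_groups(frac_high, surv_low, surv_high):
    """Concordance (AUC) for predicting death with a two-valued risk score,
    where the high-risk group is scored above the low-risk group."""
    deaths_high = frac_high * (1 - surv_high)
    deaths_low = (1 - frac_high) * (1 - surv_low)
    survivors_high = frac_high * surv_high
    survivors_low = (1 - frac_high) * surv_low
    # Chance that a randomly drawn patient who died was in the high-risk group.
    p_death_high = deaths_high / (deaths_high + deaths_low)
    # Chance that a randomly drawn survivor was in the low-risk group.
    p_survivor_low = survivors_low / (survivors_low + survivors_high)
    concordant = p_death_high * p_survivor_low
    tied = p_death_high * (1 - p_survivor_low) + (1 - p_death_high) * p_survivor_low
    return concordant + 0.5 * tied


def mean_survival(frac_high, surv_low, surv_high):
    """Overall survival probability across both groups."""
    return (1 - frac_high) * surv_low + frac_high * surv_high


FRAC_HIGH = 0.5  # assumed share of high-risk patients

# Before deployment: survival under the historic policy,
# in which most high-risk patients are treated.
before = {"surv_low": 0.902, "surv_high": 0.64}
# After deployment: a hypothetical new policy withholds treatment from the
# high-risk group because of its poor predicted survival.
after = {"surv_low": 0.902, "surv_high": 0.40}

auc_before = auc_two_groups(FRAC_HIGH, **before)
auc_after = auc_two_groups(FRAC_HIGH, **after)
survival_before = mean_survival(FRAC_HIGH, **before)
survival_after = mean_survival(FRAC_HIGH, **after)

print("AUC before / after:", round(auc_before, 3), round(auc_after, 3))
print("mean survival before / after:", round(survival_before, 3), round(survival_after, 3))
```

Under these assumptions discrimination goes up after deployment while mean survival goes down: the model looks better by a standard accuracy measure precisely because the new policy made the predicted poor outcome come true.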

Is this obvious?

Prediction modeling is very popular in medical research

Recommended validation practices and reporting guidelines do not protect against harm

because they do not evaluate the policy change

Bigger data does not protect against harmful prediction models

More flexible models do not protect against harmful prediction models

Gap between prediction accuracy and value for decision making

What to do?

Evaluate policy change (cluster randomized controlled trial)
Build models that are likely to have value for decision making

Building and validating models for decision support

Deploying a model is an intervention that changes the way treatment decisions are made

How do we learn about the effect of an intervention?

With a randomized experiment

for using a decision support model, the unit of intervention is usually the doctor
randomly assign doctors to have access to the model or not
measure differences in treatment decisions and patient outcomes
this is called a cluster RCT
if using the model improves outcomes, use it
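The steps above can be sketched in a few lines. This is a simplified illustration, assuming a binary patient outcome and a plain cluster-level comparison of means; the doctor identifiers, outcome data, and function names are hypothetical, and a real trial analysis would also need uncertainty estimates that account for the clustering.

```python
import random
from statistics import mean


def randomize_doctors(doctor_ids, seed=0):
    """Randomly assign half of the doctors to have access to the model."""
    rng = random.Random(seed)
    shuffled = list(doctor_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        doctor: ("model" if position < half else "usual_care")
        for position, doctor in enumerate(shuffled)
    }


def cluster_level_difference(outcomes_by_doctor, arm_by_doctor):
    """Difference in mean outcome between arms, using one mean per doctor."""
    cluster_means = {"model": [], "usual_care": []}
    for doctor, outcomes in outcomes_by_doctor.items():
        cluster_means[arm_by_doctor[doctor]].append(mean(outcomes))
    return mean(cluster_means["model"]) - mean(cluster_means["usual_care"])


# Hypothetical patient outcomes per doctor (1 = good outcome, 0 = bad outcome).
outcomes_by_doctor = {
    "dr_a": [1, 1, 0, 1],
    "dr_b": [1, 1],
    "dr_c": [1, 0],
    "dr_d": [0, 1, 1, 1],
}

arms = randomize_doctors(outcomes_by_doctor.keys(), seed=0)
estimated_effect = cluster_level_difference(outcomes_by_doctor, arms)
print(arms)
print("estimated effect of model access:", estimated_effect)
```

Averaging within each doctor first keeps the doctor, not the patient, as the unit of analysis, which matches the unit of randomization.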

Using cluster RCTs to evaluate models for decision making is not a new idea (Cooper et al. 1997)

“As one possibility, suppose that a trial is performed in which clinicians are randomized either to have or not to have access to such a decision aid in making decisions about where to treat patients who present with pneumonia.”

What we don’t learn

was the model predicting anything sensible?

So build treatment-naive prediction models and trial them for decision support?

Not a good idea

baking a cake without a recipe
hoping it turns into something nice
not pleasant for the people who have to taste the result of the experiment
- (i.e. patients may suffer side effects or die)

We should build models that are likely to be valuable for decision making

Build models that predict expected outcomes under hypothetical interventions (prediction-under-intervention models)
doctor / patient can pick the treatment with best expected outcomes, depending on patient’s values
whereas treatment-naive prediction models average over the historic treatment policy, prediction-under-intervention models allow the user to select a treatment option
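As a minimal sketch of how such a model could be used at the point of care, the example below picks a treatment from predicted outcomes under each option. The survival numbers, the treatment labels, and the `min_benefit` parameter (a stand-in for how much gain a patient requires before accepting the burden of treatment) are hypothetical assumptions for illustration only.

```python
# Hypothetical predictions under intervention: E[Y | do(T=t), X=x],
# with Y = survival and two treatment options.
expected_survival = {
    "high_risk": {"no_chemo": 0.40, "chemo": 0.70},
    "low_risk": {"no_chemo": 0.90, "chemo": 0.92},
}


def best_treatment(patient_group, min_benefit=0.0):
    """Pick chemotherapy only if its predicted gain exceeds what the patient requires."""
    predictions = expected_survival[patient_group]
    benefit = predictions["chemo"] - predictions["no_chemo"]
    return "chemo" if benefit > min_benefit else "no_chemo"


# A patient who wants at least 5 percentage points of gain before accepting chemo.
print(best_treatment("high_risk", min_benefit=0.05))
print(best_treatment("low_risk", min_benefit=0.05))
```

The decision rule uses the contrast between treatment options, which a treatment-naive prediction cannot provide because it returns a single number per patient.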

Hilden and Habbema on prognosis (Hilden and Habbema 1987)

“Prognosis cannot be divorced from contemplated medical action, nor from action to be taken by the patient in response to prognostication.”

prediction-under-intervention is not a new idea, but language and methods on causality have come a long way since (Hilden and Habbema 1987).

Estimand for prediction-under-intervention models

What is the estimand?

prediction: \(E[Y|X]\)
treatment effect: \(E[Y|\text{do}(T=1)] - E[Y|\text{do}(T=0)]\)
prediction-under-intervention: \(E[Y|\text{do}(T=t),X]\)
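The three estimands are linked. As a sketch, assuming \(X\) is measured before treatment (so that intervening on \(T\) does not change the distribution of \(X\)), the contrast of two prediction-under-intervention estimands gives a covariate-specific treatment effect, and averaging it over \(X\) recovers the overall treatment effect. The label \(\text{CATE}\) is introduced here for convenience and does not appear in the original slides.

\[\begin{align}
\text{CATE}(x) &= E[Y|\text{do}(T=1),X=x] - E[Y|\text{do}(T=0),X=x] \\
E[Y|\text{do}(T=1)] - E[Y|\text{do}(T=0)] &= E_X\big[\text{CATE}(X)\big]
\end{align}\]

The plain prediction estimand \(E[Y|X]\) has no \(\text{do}\) operator, so it cannot be turned into such a contrast without further causal assumptions.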

using treatment-naive prediction models for decision support

prediction-under-intervention

Take-aways

when developing or evaluating (AI) prediction models for medical decisions, think about
- what is the effect of using this model on medical decisions?
- what is the effect of this policy change on patient outcomes?
deploying models for decision support is an intervention and should be evaluated as such
prediction-under-intervention models have a foreseeable effect on patient outcomes when used for decision making

From algorithms to action: improving patient care requires causality (van Amsterdam et al. 2024)

When accurate prediction models yield harmful self-fulfilling prophecies (van Amsterdam et al. 2024)

New summer school: Introduction to Causal Inference and Causal Data Science

Learn more about causal data science

Dates: 5 Aug. - 9 Aug. 2024
Location: Utrecht
Instructors:
- Oisin Ryan
- Bas Penning-de Vries
- Wouter van Amsterdam
Sign up still possible

Course website

References

Cooper, Gregory F., Constantin F. Aliferis, Richard Ambrosino, et al. 1997. “An Evaluation of Machine-Learning Methods for Predicting Pneumonia Mortality.” Artificial Intelligence in Medicine 9 (2): 107–38. https://doi.org/10.1016/S0933-3657(96)00367-3.

Hilden, Jørgen, and J. Dik F. Habbema. 1987. “Prognosis in Medicine: An Analysis of Its Meaning and Rôles.” Theoretical Medicine 8 (3): 349–65. https://doi.org/10.1007/BF00489469.

Karmali, Kunal N., Donald M. Lloyd-Jones, Joep van der Leeuw, et al. 2018. “Blood Pressure-Lowering Treatment Strategies Based on Cardiovascular Risk Versus Blood Pressure: A Meta-Analysis of Individual Participant Data.” PLOS Medicine 15 (3): e1002538. https://doi.org/10.1371/journal.pmed.1002538.

Keogh, Ruth H., and Nan Van Geloven. 2024. “Prediction Under Interventions: Evaluation of Counterfactual Performance Using Longitudinal Observational Data.” Epidemiology (Cambridge, Mass.) 35 (3): 329–39. https://doi.org/10.1097/EDE.0000000000001713.
