1 Correct interpretation of prediction models
Researchers built a prediction model \(f\) that aims to predict the risk of a heart attack (\(=Y\)) conditional on features \(X=\{\)age, bmi\(\}\) when intervening on treatment \(T=\) statin (assumed to be a binary variable). Assume that the model was fit on a sufficiently large training set and without parametric-form bias, so that estimation error can be ignored. In addition, assume this DAG:
These numbers are produced by the model \(f\):
statin | age | bmi | \(f\) (predicted risk) |
---|---|---|---|
1 | 50 | 20 | 10% |
0 | 50 | 20 | 15% |
1 | 50 | 25 | 20% |
0 | 50 | 25 | 18% |
1 | 55 | 25 | 23% |
0 | 55 | 25 | 21% |
Read the following statements:
- for a patient of age=50 and bmi=25 who is not using a statin, the causal effect of reducing bmi by 5 points is a risk reduction of 18% - 15% = 3%
- for a patient of age=50 and bmi=20 who is not using a statin, the causal effect of starting a statin is a risk reduction of 15% - 10% = 5%
- for a patient of age=50, reducing bmi from 25 to 20 causes the effect of statins to become smaller on an absolute risk scale
- for a patient with a bmi of 25 who is taking a statin, the causal effect of aging by 5 years is an increase in risk of 3%
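The arithmetic in these statements can be read off the table directly; whether each difference also has a causal interpretation is the actual question, and depends on the assumed DAG. A minimal sketch (Python, with the table values hard-coded from above):

```python
# Model outputs f(statin, age, bmi), hard-coded from the table above.
f = {
    (1, 50, 20): 0.10,
    (0, 50, 20): 0.15,
    (1, 50, 25): 0.20,
    (0, 50, 25): 0.18,
    (1, 55, 25): 0.23,
    (0, 55, 25): 0.21,
}

# Statement 1: no statin, age 50, reduce bmi from 25 to 20.
bmi_reduction = f[(0, 50, 25)] - f[(0, 50, 20)]      # 3 percentage points

# Statement 2: age 50, bmi 20, start a statin.
statin_start = f[(0, 50, 20)] - f[(1, 50, 20)]       # 5 percentage points

# Statement 3: statin risk difference at bmi 25 vs bmi 20 (age 50).
statin_diff_bmi25 = f[(0, 50, 25)] - f[(1, 50, 25)]  # -2 percentage points
statin_diff_bmi20 = f[(0, 50, 20)] - f[(1, 50, 20)]  # +5 percentage points

# Statement 4: statin user, bmi 25, age 50 -> 55.
aging = f[(1, 55, 25)] - f[(1, 50, 25)]              # +3 percentage points
```

Note that the code only subtracts model outputs; equating such a contrast with a causal effect requires that the corresponding intervention is identified under the DAG.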
2 Validation of prediction models
2.1 Pre and post-deployment validation of prediction models
Researchers built a prediction model to identify patients in the hospital with a high risk of developing sepsis, a life-threatening disease. The prediction model uses the patient's age, temperature and blood pressure, and had good discriminative performance in the training data. The model is deployed, doctors are alerted to high-risk patients and are able to prevent 90% of sepsis cases in this high-risk group compared to before deployment of the model, so it is deemed a resounding success. Post-deployment, a follow-up study is conducted to test whether the model is still predicting accurately.
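One mechanism that can make such a post-deployment validation misleading: if doctors act on the alerts and prevent events among flagged patients, the observed outcomes no longer match the model's (previously well-calibrated) predictions, so measured discrimination drops even though the model itself is unchanged. A toy simulation (all numbers hypothetical, not taken from the scenario):

```python
import random

random.seed(42)

def auc(scores_and_outcomes):
    """Concordance statistic: P(score of a case > score of a control)."""
    cases = [s for s, y in scores_and_outcomes if y == 1]
    controls = [s for s, y in scores_and_outcomes if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical cohort: predicted risks uniform on [0, 0.5]; the model is
# perfectly calibrated before deployment.
risks = [random.random() * 0.5 for _ in range(2_000)]

# Pre-deployment: events occur according to the predicted risk.
pre = [(p, int(random.random() < p)) for p in risks]

# Post-deployment: patients with risk > 0.35 are flagged, and doctors
# prevent 90% of their would-be sepsis cases.
post = [(p, int(random.random() < (0.1 * p if p > 0.35 else p))) for p in risks]

auc_pre, auc_post = auc(pre), auc(post)
```

Under these assumptions the pre-deployment AUC lands around 0.7, while the post-deployment AUC falls sharply, because the highest-risk patients now rarely develop sepsis. The model did not get worse; the deployment changed the outcome distribution (sometimes called the treatment paradox).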
2.2 Selecting models for decision support
Researchers from the Netherlands developed two models for 10-year cardiovascular disease risk: Zrisk and Brisk. The intended use of the models is to better target prescription of an expensive cholesterol-lowering drug: go-lesterol. Both models use go-lesterol as an input variable, but each has a different set of other covariates. Both models were trained on the same large observational dataset, and were tested in two external studies:
- a nationwide registry in Sweden covering the entire population (10 million people, 100,000 cases of heart attack during 5-year follow-up)
- an RCT with 2,000 participants in which go-lesterol was assigned randomly, with 10-year follow-up and 50 heart attack cases
The results on AUC (a measure of discrimination between 0.5 and 1, where higher means better) are a bit puzzling:
study | Zrisk | Brisk | p-value |
---|---|---|---|
Sweden | 0.7 | 0.85 | <0.00001 |
RCT | 0.72 | 0.65 | 0.032 |
Assume that Sweden and the Netherlands are comparable in terms of health care and heart attack rates.
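Before interpreting the discordance, note how different the precision of the two estimates is. A rough sketch using the Hanley-McNeil (1982) approximation for the standard error of an AUC, taking the control counts as the remainder of each cohort:

```python
import math

def auc_se(auc, n_cases, n_controls):
    """Hanley-McNeil approximate standard error of an AUC estimate."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_cases - 1) * (q1 - auc ** 2)
           + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)
    return math.sqrt(var)

# Sweden: 100,000 cases out of 10 million; RCT: 50 cases out of 2,000.
se_sweden = auc_se(0.85, 100_000, 9_900_000)  # well below 0.001
se_rct = auc_se(0.72, 50, 1_950)              # roughly 0.04
```

With only 50 cases, the RCT's AUC estimates carry confidence intervals several percentage points wide, whereas the registry estimates are essentially exact. The puzzle is therefore not a statistical fluke on the registry side, but the RCT comparison is much noisier.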
3 Other uses of causality in prediction modeling
Researchers developed a deep learning model that detects heart attacks on ECGs. They used ECGs from both the emergency room and the out-patient clinic. The emergency room has a different type of ECG machine than the out-patient clinic, and heart attacks are much more frequent in the emergency room than in the out-patient clinic. Through model-interpretability studies, the researchers found that the deep learning model can recognize whether an ECG comes from the out-patient clinic or the emergency room, and in fact uses this information in its predictions. This finding is explained to the cardiologist, who says that the model is not 'robust' and that this must be fixed or the model cannot be used.
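To see why the site is such an attractive feature for the model, consider the base rates alone. With hypothetical prevalences (assumed for illustration; the scenario only says heart attacks are "much more frequent" in the emergency room), recognizing the ECG machine already shifts the prior odds considerably:

```python
# Hypothetical prevalences, assumed for illustration only.
p_er = 0.30      # heart-attack prevalence among emergency-room ECGs
p_clinic = 0.01  # prevalence among out-patient clinic ECGs

def odds(p):
    return p / (1 - p)

# Before looking at any cardiac feature, recognizing the ER machine
# multiplies the prior odds of a heart attack by this factor:
prior_odds_ratio = odds(p_er) / odds(p_clinic)  # about 42
```

This shortcut is genuinely predictive in the training distribution, but it breaks as soon as the model is used in a setting where machine type and prevalence are decoupled, which is exactly the cardiologist's robustness concern.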