In this part, we are going to study what happens when we fail to adhere to certain trial principles in the conduct of an observational study. We will concentrate on the effect of opting for surgery or not at a clear time zero, a time at which the decision is to be made for an individual from the target population. The outcome of interest is 6-month all-cause mortality.
Start by simulating data using the code below. Make sure to check out the annotations too to get a sense of what the different variables represent.
Why simulation?
Simulation means we’re in control. It gives us access to variables that are otherwise never observed directly — the counterfactuals. We can use them to illustrate whether assumptions are met and to directly approximate the value of the causal estimand.
Code
# Specify sample size and generate baseline characteristics
n <- 1e6
X <- rbinom(n, 1, .3)

# What would happen, in terms of survival, if surgery were allocated?
# First, simulate how long patients are alive in the wait time of 3 months
surv_3m <- pmin(rexp(n, exp(-3+2*X)), 3)
# After the wait time, the time alive may be affected by surgery
surv1 <- surv_3m + ifelse(surv_3m<3, 0, rexp(n, exp(-2+1.5*X)))

# What would happen, in terms of survival, if surgery weren't allocated?
surv0 <- surv_3m + ifelse(surv_3m<3, 0, rexp(n, exp(-2+1.5*X)))

# Allocate surgery by way of randomisation
alloc <- rbinom(n, 1, .5) # alloc=1 <=> surgery allocated

# Did the patient actually undergo surgery?
received <- surv_3m==3 & alloc # received=TRUE <=> surgery received

# Derive (f)actual time alive (under causal consistency!)
surv <- ifelse(alloc, surv1, surv0)

# Introduce administrative right censoring at 6 months
status <- ifelse(surv<=6, 1L, 0L)
time <- pmin(surv, 6)

# Store everything in a data frame
data <- data.frame(id=seq_len(n), X, alloc, received, time, status, surv1, surv0)
The first 6 rows of the data frame data will look something like this:
head(data)
Question A1. Using the data on the counterfactuals, estimate the probability of surviving beyond 6 months if allocated to surgery.
Answer
with(data, mean(surv1>6))
[1] 0.418095
Question A2. Using the data on the counterfactuals, estimate the probability of surviving beyond 6 months if not allocated to surgery.
Answer
with(data, mean(surv0>6))
[1] 0.418138
Question A3. Estimate the causal effect of surgery on 6-month survival as quantified by the marginal causal difference in probabilities. Express the result in words — what does it mean?
Answer
with(data, mean(surv1>6)-mean(surv0>6))
[1] -4.3e-05
Surgery changes one’s 6-month survival probability by approximately 0 percentage points: there is essentially no marginal causal effect of surgery allocation on 6-month survival.
Question A4. As already mentioned, we typically don’t have direct access to the counterfactuals, so we must instead learn about the causal effect from indirect evidence. Compare the observed survival risk among those who actually received surgery to the risk among those who didn’t. Contrast your answer with that of question A3. Why is it similar or different?
Answer
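One way to make the comparison (a minimal sketch; surviving beyond 6 months corresponds to status==0 in the simulated data):

# Crude comparison of 6-month survival by received surgery
with(data, mean(status[received]==0) - mean(status[!received]==0))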
The effect of surgery appears much larger than the estimate found previously. While results can differ slightly because of random variability, much of the difference is attributable to immortal time bias due to misclassification. Indeed, all those who received surgery were “immortal” during the 3-month wait time — otherwise, they wouldn’t have received surgery. Those who didn’t receive surgery include those who weren’t allocated to it in the first place as well as those who were allocated but didn’t survive long enough to receive it. These are the ingredients for an unfair comparison.
Question A5. Next, compare (in terms of 6-month survival probability) the allocated versus the not-allocated, among a selection of individuals. Select those individuals who were either not allocated to surgery or were both allocated and survived long enough (3 months) to receive surgery. Contrast your answer with that of question A3. Why is it similar or different?
Answer
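A minimal sketch of this selected comparison:

# Select: not allocated, or allocated and alive long enough to receive surgery
sel <- with(data, alloc==0 | received)
with(data[sel,], mean(status[alloc==1]==0) - mean(status[alloc==0]==0))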
Again, the effect of surgery appears much larger than the estimate found previously. Much of the difference is attributable to immortal time bias due to selection. Again, all those selected individuals who received surgery were “immortal” during the 3-month wait time — otherwise, they wouldn’t have received surgery. Those who were selected but didn’t receive surgery do not have this (unfair) “advantage”.
Question A6. Next, compare the observed survival risk among those allocated to surgery to that of those not allocated to surgery. Contrast your answer with that of question A3. Why is it similar or different?
Answer
The study represents a randomised controlled trial. The baseline variable alloc is therefore (marginally) independent of the counterfactuals surv0 and surv1. You may like to check this by running the following lines of code:
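# One possible check (a sketch): counterfactual survival by allocation arm
with(data, tapply(surv1>6, alloc, mean)) # approximately equal across arms
with(data, tapply(surv0>6, alloc, mean)) # approximately equal across arms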
Additionally relying on positivity and consistency, this means that the counterfactual outcome probabilities are identified by the arm-specific factual outcome probabilities.
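In code, the arm-specific contrast might look like this (a sketch):

with(data, mean(status[alloc==1]==0) - mean(status[alloc==0]==0))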
Question A7. Even though this study represents a trial, the methods that we use need not align with trial principles. Which of the approaches used in the previous questions aligns with trial principles?
Answer
The approaches of questions A4 and A5 do not, because a contrast is made between groups defined by post-randomisation information. Question A6 does align with trial principles.
Question A8 (bonus). What if we didn’t have information on alloc? Knowing that individuals are treated the same in the first three months, regardless of their allocation status or any other characteristics, we can still identify the effect of allocation on 6-month survival. We might just randomly allocate those individuals whose actual allocation status is unknown. Do this and evaluate the causal difference in means. Is your estimate close to the true allocation-survival effect?
Answer
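A minimal sketch under the stated assumptions: allocation can be deduced for everyone who survived the 3-month wait (for them, having received surgery is equivalent to having been allocated), while those who died during the wait are simply re-randomised.

known <- data$time >= 3 # survived the wait, so allocation is deducible
alloc_star <- ifelse(known, data$received*1, rbinom(nrow(data), 1, .5))
with(data, mean(status[alloc_star==1]==0) - mean(status[alloc_star==0]==0))

The estimate should be close to the (approximately null) effect found in question A3.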
For your information: a statistically more efficient analogue of this approach would be not to randomly allocate individuals whose allocation status is unknown, but to use them in both arms with a suitable weight.
In this part of the practical, we consider an observational study on the effect of the use of some drug on overall survival since diagnosis of some disease. For simplicity, there will be no drop-out or censoring other than administrative censoring at 10 months after diagnosis.
Start again by simulating data using the provided code. This time, don’t worry about the actual code. All you need to know about it is that by running it, you get in your global environment two functions, one called drawSample and one called wide2long, which reshapes the output of drawSample in a convenient way.
Code
exposure_model <- ~ Alag + L + Llag
outcome_model <- ~ A + Alag + Alag2 + L + Llag

drawSample <- function(n=1e6, a=NA, randomiseAt=NA){
  Llag <- rep(0.5, n)   # initialise
  Alag2 <- rep(0.5, n)  # initialise
  Alag <- rep(0.5, n)   # initialise
  alive <- rep(TRUE, n) # initialise
  K <- 10
  if(length(a)!=K) a <- c(a, rep(NA, length=K-length(a)))
  A <- L <- times <- matrix(nrow=n, ncol=K)
  for(k in 1:K){
    Lk <- suppressWarnings(ifelse(alive, rbinom(n, 1, plogis(-1+Alag+Llag)), NA))
    mm <- model.matrix.lm(exposure_model, data.frame(Alag, L=Lk, Llag), na.action=na.pass)
    if(k %in% (randomiseAt+1L)){
      Ak <- ifelse(alive, rbinom(n, 1, .5), NA)
    } else{
      if(!is.na(a[k])){
        Ak <- ifelse(alive, rep(a[k], n), NA)
      } else{
        Ak <- suppressWarnings(
          ifelse(alive, rbinom(n, 1, plogis(as.matrix(mm)%*%c(-2,1,2,1))), NA)
        )
      }
    }
    mm <- model.matrix.lm(outcome_model, data.frame(A=Ak, Alag, Alag2, L=Lk, Llag), na.action=na.pass)
    Tk <- suppressWarnings(
      ifelse(alive, rexp(n, rate=1/100*exp(as.matrix(mm)%*%c(-2,1,1,1,1,1))), 0)
    )
    alive <- Tk>1
    Tk <- pmin(Tk, 1)
    L[,k] <- Lk
    A[,k] <- Ak
    times[,k] <- Tk
    Llag <- Lk
    Alag2 <- Alag
    Alag <- Ak
  }
  Time <- rowSums(times)
  Status <- alive*1
  out <- data.frame(Time, Status, L=L, A=A)
  return(out)
}

wide2long <- function(data){
  colL <- grep("^L", colnames(data)) # columns corresponding to the Lk's
  colA <- grep("^A", colnames(data)) # columns corresponding to the Ak's
  priorA <- cbind(0, data[,colA[-length(colA)], drop=F])
  for(i in seq_len(ncol(priorA)-1L)) priorA[,i+1L] <- priorA[,i]+priorA[,i+1L]
  priorA <- (priorA>0)*1
  nrows <- rowSums(!is.na(data[,colL, drop=F])) # number of rows per subject
  n <- nrow(data)
  id <- seq_len(n)
  out <- data.frame(id=rep(id, nrows)) # long format data frame with id variable
  w <- cumsum(nrows) # index for individuals' last rows
  row <- seq_along(out$id)
  m <- match(out$id, id)
  row <- out$row <- row-c(0, row[w[-n]])[m] # row numbers
  out$nrows <- rep(nrows, nrows)
  out$start <- row-1 # start time of period described by row
  out$stop <- out$start+1
  out$stop[w] <- data$Time # end time of period described by row
  out$L <- data[cbind(out$id, colL[row])]
  out$A <- data[cbind(out$id, colA[row])]
  out$priorA <- priorA[cbind(out$id, row)]
  out$event <- 0
  out$event[w] <- 1-data$Status
  return(out)
}

wide <- drawSample()
long <- wide2long(wide)
Print the first 20 rows of the data frame long for a first impression.
head(long,20)
The data frame has 9 columns and many rows. Looking at the id and row columns, you’ll find that each participant is represented by potentially more than one row. The total number of rows per participant is given in the column nrows. The follow-up time since diagnosis, measured in months, of each participant is broken up into consecutive intervals of length at most 1. The start of the interval is given in column start and the end in stop. At the end of each interval, we ask whether the event of interest has yet occurred. If the answer is “yes”, a 1 will appear in the column event; otherwise a 0. Columns A and L represent the received treatment level (1: drug treatment; 0: no drug treatment) at the start of the given interval and the value of a binary time-varying characteristic just before the start of the interval, respectively. Finally, the column priorA indicates whether the patient had already received treatment before the start of the given interval.
Question B1. Suppose that we’d like to learn from these data the effect of starting treatment versus not starting at 3 months after diagnosis. What criteria would someone need to meet at the least (and when) to be considered eligible for randomisation in your target trial?
Answer
For the treatment to be a first treatment at 3 months, the participant shouldn’t have had treatment before. Bear in mind that at the moment of randomisation, we cannot do anything anymore about the past of the patients; e.g., we cannot ‘undo’ or ‘assign’ previous treatment. Also, participants can only be eligible for randomisation at start==3.
Run the lines of code below to add an eligibility indicator to the data frame.
Code
long$eligible <- with(long, start==3 & !priorA)
Question B2. How many participants are eligible for randomisation at 3 months of follow-up?
Answer
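One way to count them (a sketch; eligibility can hold on at most one row per participant, the row with start==3):

sum(long$eligible)

Given the arm breakdown shown below, this should come to 182702 + 68922 = 251624 participants.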
Let’s compare the eligible initiators with the eligible non-initiators. First, take the subset of individuals who are eligible at some time during follow-up and then add an indicator for whether the individual is an initiator or not, as sketched below.
Code
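# A sketch, mirroring the trial preparation later in this part
# Restrict to individuals who are eligible at some point during follow-up
id_eligible <- unique(long[long$eligible, "id"])
long <- long[long$id %in% id_eligible, , drop=F]
# Add the initiator indicator (arm=1: initiator; arm=0: non-initiator)
id_init <- unique(long[with(long, eligible & A==1), "id"])
long$arm <- ifelse(is.na(match(long$id, id_init)), 0, 1)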
From a strict target trial emulation perspective, including prevalent users (rather than incident users, i.e., the initiators) doesn’t align with the design of the randomised trial we want to emulate. Prevalent users are not eligible for inclusion in our target trial, and so we exclude them from our emulation study too!
Question B3. How many individuals are in each of the arms?
Answer
with(long[long$row==1,], table(arm))
arm
0 1
182702 68922
Next, compute the time elapsed since the target moment of randomisation (3 months) until the end of follow-up.
Code
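# A sketch, mirroring the trial preparation below: time from the target
# moment of randomisation (3 months) until the end of follow-up
is_last_row <- with(long, row==nrows)
time <- long$stop[is_last_row]
m <- with(long, match(id, id[is_last_row]))
long$time <- time[m] - 3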
Question B4. What is the crude difference in 5-month event risk between the initiators and non-initiators?
Answer
fit <- lm(I(time<=5) ~ arm, data=subset(long, start==3))
est_crude <- unname(fit$coef["arm"])
cat("Crude risk difference: ", round(est_crude*100, 1), "%", sep="")
Crude risk difference: 13.5%
Since this is an observational study, it is quite possible that the initiators and non-initiators lack exchangeability. We’ll address this in the next part. In the remainder of this part, we’ll repeat the study analysed above, except that we’ll now actually randomise at 3 months — i.e., we’ll conduct the target trial itself, albeit with a finite number of participants.
First, generate the data by running the following code:
Code
trial <- wide2long(drawSample(randomiseAt=3))
Next, prepare the data (trial) for the same analysis as before.
Code
# Assessing eligibility
trial$eligible <- with(trial, start==3 & !priorA)
id_eligible <- unique(trial[trial$eligible, "id"])
trial <- trial[trial$id %in% id_eligible, , drop=F]

# Distinguish between trial arms
id_init <- unique(trial[with(trial, eligible & A==1), "id"])
trial$arm <- ifelse(is.na(match(trial$id, id_init)), 0, 1)

# Compute time to event of interest
is_last_row <- with(trial, row==nrows)
time <- trial$stop[is_last_row]
status <- trial$event[is_last_row]
m <- with(trial, match(id, id[is_last_row]))
trial$time <- time[m] - 3
Question B5. What is the difference in 5-month event risk between the initiators and non-initiators?
Answer
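Repeating the analysis of question B4 on the trial data might look like this (a sketch; the object name est_trial is ours):

fit <- lm(I(time<=5) ~ arm, data=subset(trial, start==3))
est_trial <- unname(fit$coef["arm"])
cat("Trial risk difference: ", round(est_trial*100, 1), "%", sep="")

Since treatment is actually randomised at 3 months here, this difference estimates the ITT effect without confounding; contrast it with the crude estimate from question B4.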
In this part, you’ll implement the basics of inverse probability weighting (IPW) for estimating intention-to-treat (ITT) effects.
IPW for the ITT effect of initiating drug treatment at 3 months rests on the assumption that the arms are exchangeable conditional on a set of patient characteristics. Here, we’ll assume exchangeability conditional on the current and previous values of the time-varying variable represented in the column L. For a given interval, the current value of L is given on the same row of the data frame. Run the following lines to add the previous value Llag.
Code
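# A sketch, assuming rows are sorted by id and, within id, by row;
# L was initialised at 0.5 in drawSample, so the first interval's lag is 0.5
long$Llag <- c(0.5, long$L[-nrow(long)])
long$Llag[long$row==1] <- 0.5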
With the data prepared, fit a logistic regression model with dependent variable A and independent variables L and Llag.
Code
fit <- glm(A ~ L + Llag, data=subset(long, start==3), family=binomial)
Question C1. Given the model fit, what is the estimate of the propensity score for an untreated individual with L=0 and Llag=1? And what should the individual’s inverse probability weight be?
Answer
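A minimal sketch using predict():

ps <- predict(fit, newdata=data.frame(L=0, Llag=1), type="response")
ps       # estimated propensity score P(A=1 | L=0, Llag=1)
1/(1-ps) # inverse probability weight for an untreated individual

An untreated individual is weighted by the inverse of the probability of the treatment level actually received, here 1-ps.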
Question C2. Next, estimate the effect of initiating drug use at 3 months from the weighted data. Hint: repeat the analysis of question B4, except this time, add a weights argument. Compare your answer with that of question B5.
Answer
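The weights first need to be attached to the data; a sketch (the column name weight is assumed by the code below):

sub <- subset(long, start==3)
ps <- predict(fit, type="response") # fitted propensity scores for these rows
long$weight <- NA
long$weight[long$start==3] <- ifelse(sub$arm==1, 1/ps, 1/(1-ps))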
fit <- lm(I(time<=5) ~ arm, weights=weight, data=subset(long, start==3))
est_IPW <- unname(fit$coef["arm"])
cat("IPW-adjusted risk difference: ", round(est_IPW*100, 1), "%", sep="")
IPW-adjusted risk difference: 9.6%
With IPW, we can actually check whether the weighting balances L and Llag across the study arms.
Code
# Before weighting
with(subset(long, start==3),
     c(initiators=mean(L[arm==1]), noninitiators=mean(L[arm==0])))
# After weighting
with(subset(long, start==3),
     c(initiators=weighted.mean(L[arm==1], weight[arm==1]),
       noninitiators=weighted.mean(L[arm==0], weight[arm==0])))
Question C3. Has the balance improved with respect to these characteristics?
Part D: g-computation
In this part, we’ll again revisit the observational study of part B. Instead of IPW to adjust for a lack of exchangeability, you’ll be using g-computation.
As we’ve established, g-computation for inference about marginal counterfactual risks is really just standardisation of conditional (counterfactual) risks. The conditional risks that we’re after now are (1) the risk of sustaining the event of interest within 5 months from the target intervention (randomisation) time (3 months) if—possibly contrary to fact—patients were to initiate drug treatment, and (2) the risk if patients were not to initiate, both conditional on a set of variables that explain any lack of exchangeability. In the previous part, we assumed that the variables L and Llag are together sufficient to control for confounding, so we’ll do the same here.
Let’s therefore regress time<=5 on arm, L and Llag:
Code
fit <- with(subset(long, start==3), lm(I(time<=5) ~ arm + L + Llag))
Question D1. Looking at the previous line of code, what assumption(s) do you think is (are) made in addition to consistency, conditional exchangeability and positivity? Does the validity of our IPW implementation also rest on this (these) assumption(s)?
Answer
The model that is fit with the previous line of code is non-saturated, which means that it imposes parametric constraints on the data distribution. In other words, we’ve been making modelling assumptions with respect to the conditional outcome distribution here. IPW instead involves modelling the conditional treatment (or exposure) distribution. (We’re also assuming that data from different individuals are independent and identically distributed — a fairly standard assumption in statistics.)
The next step is standardisation, which can be implemented as follows:
Code
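# A sketch: average the fitted conditional risks over the empirical
# distribution of (L, Llag), setting arm to 1 and 0 in turn
sub <- subset(long, start==3)
risk1 <- mean(predict(fit, newdata=transform(sub, arm=1)))
risk0 <- mean(predict(fit, newdata=transform(sub, arm=0)))
cat("G-computation risk difference: ", round((risk1-risk0)*100, 1), "%", sep="")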
Question D2. Compare the result with your answer to question C2. Can you change the conditional outcome model (make it less restrictive) to bring the estimates closer together?
Answer
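A sketch using a saturated model (all interactions between arm, L and Llag):

fit <- with(subset(long, start==3), lm(I(time<=5) ~ arm*L*Llag))
sub <- subset(long, start==3)
mean(predict(fit, newdata=transform(sub, arm=1))) -
  mean(predict(fit, newdata=transform(sub, arm=0)))

With a saturated outcome model, g-computation no longer imposes parametric constraints, so the estimate should move closer to the IPW estimate of question C2.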
Note that we are able to fit a fully saturated model here because we’re only adjusting for a few binary variables. Two treatment levels and two binary variables together form at most \(2^3=8\) strata. With many more categorical variables or with continuous variables, it will be necessary to impose parametric constraints. You may, however, consider flexible modelling strategies, such as regression on cubic spline transformations of continuous variables or flexible machine learning techniques.
Part E: g-estimation (OPTIONAL)
Feel free to skip this part if you believe it to be too technical for you at present.
In this final part of the practical, we’ll revisit the observational study of part B once more. This time, we’ll use g-estimation.
G-estimation starts by postulating a model for a contrast between conditional counterfactual risks, here comparing initiating drug treatment at 3 months versus not initiating. For brevity of notation, let \(Y^a\) be the binary indicator for the occurrence of the event of interest within the first 5 months after the target randomisation time, if—possibly contrary to fact—treatment \(A\) were equal to \(a\) (\(a=0,1\)). Let us assume that \[\mathbb{E}[Y^a\mathrel{|}A=a,L,L_{\text{lag}}]-\mathbb{E}[Y^0\mathrel{|}A=a,L,L_{\text{lag}}]=\beta a\] for some constant \(\beta\). This assumption implies that the treatment effect on the outcome (expressed on an additive scale) does not vary across levels of \(L\) and \(L_{\text{lag}}\).
Question E1. Does this model imply that \(Y^a\) is independent of \(L\)?
Answer
No. It could for example be that \(\mathbb{E}[Y^a\mathrel{|}A,L,L_{\text{lag}}]=\alpha+L+\beta a\) for all \(a\), in which case \(Y^a\) depends on \(L\) and yet \(\mathbb{E}[Y^a\mathrel{|}A=a,L,L_{\text{lag}}]-\mathbb{E}[Y^0\mathrel{|}A=a,L,L_{\text{lag}}]=\beta a\), as per the model above.
Next, we must relate the risk \(\mathbb{E}[Y^0\mathrel{|}A,L,L_{\text{lag}}]\) to the distribution of observed variables. To this end, we employ the consistency condition (\(Y=Y^1A+Y^0(1-A)\)), which implies \(Y^0=Y-(Y^1-Y^0)A\) and, so, \(\mathbb{E}[Y^0\mathrel{|}A,L,L_{\text{lag}}]=\mathbb{E}[Y - \beta A \mathrel{|} A,L,L_{\text{lag}}]\). If we knew \(\beta\), we could thus estimate \(\mathbb{E}[Y^0\mathrel{|}A,L,L_{\text{lag}}]\) for all individuals by \(\widehat {Y^0}(\beta):=\widehat{\mathbb{E}}[Y - \beta A\mathrel{|}A,L,L_{\text{lag}}]\). This forms the basis for the next step.
Code
data <- subset(long, start==3)
data$Y <- (data$time<=5)*1
data$A <- data$arm
hatY0 <- function(beta, data) with(data, Y - beta*A)
Now comes the exciting bit: we know that \(\widehat {Y^0}(\hat\beta)\) is independent of \(A\) given \(L\) and \(L_{\text{lag}}\) if \(\hat\beta=\beta\), or at least that’s what we’re assuming implicitly (as a consequence of the conditional exchangeability assumption). So, we look for values of \(\beta\) that are compatible with this assumption:
Code
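# A sketch (one of several possible implementations): grid search for values
# of beta at which hatY0(beta) is unassociated with A given L and Llag, as
# judged by the coefficient of hatY0(beta) in a logistic model for A
score <- function(b){
  fit <- glm(A ~ hatY0 + L + Llag, family=binomial,
             data=transform(data, hatY0=hatY0(b, data)))
  coef(fit)["hatY0"]
}
betas <- seq(-0.1, 0.3, by=0.005)
plot(betas, sapply(betas, score), type="l",
     xlab=expression(beta), ylab="coefficient of hatY0(beta)")
abline(h=0, lty=2) # compatible values: where the curve crosses zero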
Question E2. Looking at the figure, what values of \(\hat\beta\) are compatible with our conditional exchangeability assumption?
Answer
Values of \(\hat\beta\) somewhere around 0.10 (a risk difference of about 10 percentage points). Note that the validity of this approach depends on the extent to which the modelling assumptions hold.