In this part, we are going to study what happens when we fail to adhere to certain trial principles in the conduct of an observational study. We will concentrate on the effect of opting for surgery or not at a clear time zero, a time at which the decision is to be made for an individual from the target population. The outcome of interest is 6-month all-cause mortality.
Start by simulating data using the code below. Make sure to check out the annotations too to get a sense of what the different variables represent.
Why simulation?
Simulation means we’re in control. It gives us access to variables that are otherwise never observed directly — the counterfactuals. We can use them to illustrate whether assumptions are met and to directly approximate the value of the causal estimand.
Code
# Specify sample size and generate baseline characteristics
n <- 1e6
X <- rbinom(n, 1, .3)

# What would happen, in terms of survival, if surgery were allocated?
# First, simulate how long patients are alive in the wait time of 3 months
surv_3m <- pmin(rexp(n, exp(-3+2*X)), 3)
# After the wait time, the time alive may be affected by surgery
surv1 <- surv_3m + ifelse(surv_3m<3, 0, rexp(n, exp(-2+1.5*X)))

# What would happen, in terms of survival, if surgery weren't allocated?
surv0 <- surv_3m + ifelse(surv_3m<3, 0, rexp(n, exp(-2+1.5*X)))

# Allocate surgery by way of randomisation
alloc <- rbinom(n, 1, .5) # alloc=1 <=> surgery allocated

# Did the patient actually undergo surgery?
received <- surv_3m==3 & alloc # received=TRUE <=> surgery received

# Derive (f)actual time alive (under causal consistency!)
surv <- ifelse(alloc, surv1, surv0)

# Introduce administrative right censoring at 6 months
status <- ifelse(surv<=6, 1L, 0L)
time <- pmin(surv, 6)

# Store everything in a data frame
data <- data.frame(id=seq_len(n), X, alloc, received, time, status, surv1, surv0)
The first 6 rows of the data frame data will look something like this:
head(data)
Question A1. Using the data on the counterfactuals, estimate the probability of surviving beyond 6 months if allocated to surgery.
Answer
with(data, mean(surv1>6))
[1] 0.418095
Question A2. Using the data on the counterfactuals, estimate the probability of surviving beyond 6 months if not allocated to surgery.
Answer
with(data, mean(surv0>6))
[1] 0.418138
Question A3. Estimate the causal effect of surgery on 6-month survival as quantified by the marginal causal difference in probabilities. Express the result in words — what does it mean?
Answer
with(data, mean(surv1>6)-mean(surv0>6))
[1] -4.3e-05
Surgery changes one’s 6-month survival probability by approximately 0 percentage points: there is essentially no marginal causal effect of surgery allocation on 6-month survival.
Question A4. As already mentioned, we typically don’t have direct access to the counterfactuals, so we must instead learn about the causal effect from indirect evidence. Compare the observed survival risk among those who actually received surgery to the risk among those who didn’t. Contrast your answer with that of question A3. Why is it similar or different?
Answer
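One way to make the comparison (a minimal sketch; surviving beyond 6 months corresponds to status==0 in the simulated data):

# Crude comparison of 6-month survival by received surgery
with(data, mean(status[received]==0) - mean(status[!received]==0))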
The effect of surgery appears much larger than the estimate found previously. While results can differ slightly because of random variability, much of the difference is attributable to immortal time bias due to misclassification. Indeed, all those who received surgery were “immortal” during the 3-month wait time — otherwise, they wouldn’t have received surgery. Those who didn’t receive surgery include those who weren’t allocated to it in the first place as well as those who were allocated but didn’t survive long enough to receive it. These are the ingredients for an unfair comparison.
Question A5. Next, compare (in terms of 6-month survival probability) the allocated versus the not-allocated, among a selection of individuals. Select those individuals who were either not allocated to surgery or were both allocated and survived long enough (3 months) to receive surgery. Contrast your answer with that of question A3. Why is it similar or different?
Answer
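A minimal sketch of this selected comparison:

# Select: not allocated, or allocated and alive long enough to receive surgery
sel <- with(data, alloc==0 | received)
with(data[sel,], mean(status[alloc==1]==0) - mean(status[alloc==0]==0))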
Again, the effect of surgery appears much larger than the estimate found previously. Much of the difference is attributable to immortal time bias due to selection. Again, all those selected individuals who received surgery were “immortal” during the 3-month wait time — otherwise, they wouldn’t have received surgery. Those who were selected but didn’t receive surgery do not have this (unfair) “advantage”.
Question A6. Next, compare the observed survival risk among those allocated to surgery to that of those not allocated to surgery. Contrast your answer with that of question A3. Why is it similar or different?
Answer
The study represents a randomised controlled trial. The baseline variable alloc is therefore (marginally) independent of the counterfactuals surv0 and surv1. You may like to check this by running the following lines of code:
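# One possible check (a sketch): counterfactual survival by allocation arm
with(data, tapply(surv1>6, alloc, mean)) # approximately equal across arms
with(data, tapply(surv0>6, alloc, mean)) # approximately equal across arms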
Additionally relying on positivity and consistency, this means that the counterfactual outcome probabilities are identified by the arm-specific factual outcome probabilities.
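In code, the arm-specific contrast might look like this (a sketch):

with(data, mean(status[alloc==1]==0) - mean(status[alloc==0]==0))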
Question A7. Even though this study represents a trial, the methods that we use need not align with trial principles. Which of the approaches used in the previous questions aligns with trial principles?
Answer
The approaches of questions A4 and A5 do not, because a contrast is made between groups defined by post-randomisation information. Question A6 does align with trial principles.
Question A8 (bonus). What if we didn’t have information on alloc? Knowing that individuals are treated the same in the first three months, regardless of their allocation status or any other characteristics, we can still identify the effect of allocation on 6-month survival. We might just randomly allocate those individuals whose actual allocation status is unknown. Do this and evaluate the causal difference in means. Is your estimate close to the true allocation-survival effect?
Answer
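A minimal sketch under the stated assumptions: allocation can be deduced for everyone who survived the 3-month wait (for them, having received surgery is equivalent to having been allocated), while those who died during the wait are simply re-randomised.

known <- data$time >= 3 # survived the wait, so allocation is deducible
alloc_star <- ifelse(known, data$received*1, rbinom(nrow(data), 1, .5))
with(data, mean(status[alloc_star==1]==0) - mean(status[alloc_star==0]==0))

The estimate should be close to the (approximately null) effect found in question A3.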
For your information: a statistically more efficient analogue of this approach would be not to randomly allocate individuals whose allocation status is unknown, but to use them in both arms with a suitable weight.
In this part of the practical, we consider an observational study on the effect of the use of some drug on overall survival since diagnosis of some disease. For simplicity, there will be no drop-out or censoring other than administrative censoring at 10 months after diagnosis.
Start again by simulating data using the provided code. This time, don’t worry about the actual code. All you need to know about it is that by running it, you get in your global environment two functions, one called drawSample and one called wide2long, which reshapes the output of drawSample in a convenient way.
Code
exposure_model <- ~ Alag + L + Llag
outcome_model <- ~ A + Alag + Alag2 + L + Llag

drawSample <- function(n=1e6, a=NA, randomiseAt=NA){
  Llag <- rep(0.5, n)   # initialise
  Alag2 <- rep(0.5, n)  # initialise
  Alag <- rep(0.5, n)   # initialise
  alive <- rep(TRUE, n) # initialise
  K <- 10
  if(length(a)!=K) a <- c(a, rep(NA, length=K-length(a)))
  A <- L <- times <- matrix(nrow=n, ncol=K)
  for(k in 1:K){
    Lk <- suppressWarnings(ifelse(alive, rbinom(n, 1, plogis(-1+Alag+Llag)), NA))
    mm <- model.matrix.lm(exposure_model, data.frame(Alag, L=Lk, Llag), na.action=na.pass)
    if(k %in% (randomiseAt+1L)){
      Ak <- ifelse(alive, rbinom(n, 1, .5), NA)
    } else{
      if(!is.na(a[k])){
        Ak <- ifelse(alive, rep(a[k], n), NA)
      } else{
        Ak <- suppressWarnings(
          ifelse(alive, rbinom(n, 1, plogis(as.matrix(mm)%*%c(-2,1,2,1))), NA)
        )
      }
    }
    mm <- model.matrix.lm(outcome_model, data.frame(A=Ak, Alag, Alag2, L=Lk, Llag), na.action=na.pass)
    Tk <- suppressWarnings(
      ifelse(alive, rexp(n, rate=1/100*exp(as.matrix(mm)%*%c(-2,1,1,1,1,1))), 0)
    )
    alive <- Tk>1
    Tk <- pmin(Tk, 1)
    L[,k] <- Lk
    A[,k] <- Ak
    times[,k] <- Tk
    Llag <- Lk
    Alag2 <- Alag
    Alag <- Ak
  }
  Time <- rowSums(times)
  Status <- alive*1
  out <- data.frame(Time, Status, L=L, A=A)
  return(out)
}

wide2long <- function(data){
  colL <- grep("^L", colnames(data)) # columns corresponding to the Lk's
  colA <- grep("^A", colnames(data)) # columns corresponding to the Ak's
  priorA <- cbind(0, data[,colA[-length(colA)], drop=F])
  for(i in seq_len(ncol(priorA)-1L)) priorA[,i+1L] <- priorA[,i]+priorA[,i+1L]
  priorA <- (priorA>0)*1
  nrows <- rowSums(!is.na(data[,colL, drop=F])) # number of rows per subject
  n <- nrow(data)
  id <- seq_len(n)
  out <- data.frame(id=rep(id, nrows)) # long format data frame with id variable
  w <- cumsum(nrows) # index for individuals' last rows
  row <- seq_along(out$id)
  m <- match(out$id, id)
  row <- out$row <- row-c(0, row[w[-n]])[m] # row numbers
  out$nrows <- rep(nrows, nrows)
  out$start <- row-1 # start time of period described by row
  out$stop <- out$start+1
  out$stop[w] <- data$Time # end time of period described by row
  out$L <- data[cbind(out$id, colL[row])]
  out$A <- data[cbind(out$id, colA[row])]
  out$priorA <- priorA[cbind(out$id, row)]
  out$event <- 0
  out$event[w] <- 1-data$Status
  return(out)
}

wide <- drawSample()
long <- wide2long(wide)
Print the first 20 rows of the data frame long for a first impression.
head(long,20)
The data frame has 9 columns and many rows. Looking at the id and row columns, you’ll find that each participant is represented by potentially more than one row. The total number of rows per participant is given in the column nrows. The follow-up time since diagnosis, measured in months, of each participant is broken up into consecutive intervals of length at most 1. The start of the interval is given in column start and the end in stop. At the end of each interval, we ask whether the event of interest has yet occurred. If the answer is “yes”, a 1 will appear in the column event; otherwise a 0. Columns A and L represent the received treatment level (1: drug treatment; 0: no drug treatment) at the start of the given interval and the value of a binary time-varying characteristic just before the start of the interval, respectively. Finally, the column priorA indicates whether the patient had already received treatment before the start of the given interval.
Question B1. Suppose that we’d like to learn from these data the effect of starting treatment versus not starting at 3 months after diagnosis. What criteria would someone need to meet at the least (and when) to be considered eligible for randomisation in your target trial?
Answer
For the treatment to be a first treatment at 3 months, the participant shouldn’t have had treatment before. Bear in mind that at the moment of randomisation, we cannot do anything anymore about the past of the patients; e.g., we cannot ‘undo’ or ‘assign’ previous treatment. Also, participants can only be eligible for randomisation at start==3.
Run the lines of code below to add an eligibility indicator to the data frame.
Code
long$eligible <- with(long, start==3 & !priorA)
Question B2. How many participants are eligible for randomisation at 3 months of follow-up?
Answer
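One way to count them (a sketch; eligibility can hold on at most one row per participant, the row with start==3):

sum(long$eligible)

Given the arm breakdown shown below, this should come to 182702 + 68922 = 251624 participants.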
Let’s compare the eligible initiators with the eligible non-initiators. First, take the subset of individuals who are eligible at some time during follow-up and then add an indicator for whether the individual is an initiator or not, as sketched below.
Code
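# A sketch, mirroring the trial preparation later in this part
# Restrict to individuals who are eligible at some point during follow-up
id_eligible <- unique(long[long$eligible, "id"])
long <- long[long$id %in% id_eligible, , drop=F]
# Add the initiator indicator (arm=1: initiator; arm=0: non-initiator)
id_init <- unique(long[with(long, eligible & A==1), "id"])
long$arm <- ifelse(is.na(match(long$id, id_init)), 0, 1)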
From a strict target trial emulation perspective, including prevalent users (rather than incident users, i.e., the initiators) doesn’t align with the design of the randomised trial we want to emulate. Prevalent users are not eligible for inclusion in our target trial, and so we exclude them from our emulation study too!
Question B3. How many individuals are in each of the arms?
Answer
with(long[long$row==1,], table(arm))
arm
0 1
182702 68922
Next, compute the time elapsed since the target moment of randomisation (3 months) until the end of follow-up.
Code
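# A sketch, mirroring the trial preparation below: time from the target
# moment of randomisation (3 months) until the end of follow-up
is_last_row <- with(long, row==nrows)
time <- long$stop[is_last_row]
m <- with(long, match(id, id[is_last_row]))
long$time <- time[m] - 3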
Question B4. What is the crude difference in 5-month event risk between the initiators and non-initiators?
Answer
fit <- lm(I(time<=5) ~ arm, data=subset(long, start==3))
est_crude <- unname(fit$coef["arm"])
cat("Crude risk difference: ", round(est_crude*100, 1), "%", sep="")
Crude risk difference: 13.5%
Since this is an observational study, it is quite possible that the initiators and non-initiators lack exchangeability. We’ll address this in the next part. In the remainder of this part, we’ll repeat the study analysed above, except that we’ll now actually randomise at 3 months — i.e., we’ll conduct the target trial itself, albeit with a finite number of participants.
First, generate the data by running the following code:
Code
trial <- wide2long(drawSample(randomiseAt=3))
Next, prepare the data (trial) for the same analysis as before.
Code
# Assessing eligibility
trial$eligible <- with(trial, start==3 & !priorA)
id_eligible <- unique(trial[trial$eligible, "id"])
trial <- trial[trial$id %in% id_eligible, , drop=F]

# Distinguish between trial arms
id_init <- unique(trial[with(trial, eligible & A==1), "id"])
trial$arm <- ifelse(is.na(match(trial$id, id_init)), 0, 1)

# Compute time to event of interest
is_last_row <- with(trial, row==nrows)
time <- trial$stop[is_last_row]
status <- trial$event[is_last_row]
m <- with(trial, match(id, id[is_last_row]))
trial$time <- time[m] - 3
Question B5. What is the difference in 5-month event risk between the initiators and non-initiators?
Answer
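Repeating the analysis of question B4 on the trial data might look like this (a sketch; the object name est_trial is ours):

fit <- lm(I(time<=5) ~ arm, data=subset(trial, start==3))
est_trial <- unname(fit$coef["arm"])
cat("Trial risk difference: ", round(est_trial*100, 1), "%", sep="")

Since treatment is actually randomised at 3 months here, this difference estimates the ITT effect without confounding; contrast it with the crude estimate from question B4.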
In this part, you’ll implement the basics of inverse probability weighting (IPW) for estimating intention-to-treat (ITT) effects.
IPW for the ITT effect of initiating drug treatment at 3 months rests on the assumption that the arms are exchangeable conditional on a set of patient characteristics. Here, we’ll assume exchangeability conditional on the current and previous values of the time-varying variable represented in the column L. For a given interval, the current value of L is given on the same row of the data frame. Run the following lines to add the previous value Llag.
Code
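# A sketch, assuming rows are sorted by id and, within id, by row;
# L was initialised at 0.5 in drawSample, so the first interval's lag is 0.5
long$Llag <- c(0.5, long$L[-nrow(long)])
long$Llag[long$row==1] <- 0.5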
With the data prepared, fit a logistic regression model with dependent variable A and independent variables L and Llag.
Code
fit <- glm(A ~ L + Llag, data=subset(long, start==3), family=binomial)
Question C1. Given the model fit, what is the estimate of the propensity score for an untreated individual with L=0 and Llag=1? And what should the individual’s inverse probability weight be?
Answer
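A minimal sketch using predict():

ps <- predict(fit, newdata=data.frame(L=0, Llag=1), type="response")
ps       # estimated propensity score P(A=1 | L=0, Llag=1)
1/(1-ps) # inverse probability weight for an untreated individual

An untreated individual is weighted by the inverse of the probability of the treatment level actually received, here 1-ps.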
Question C2. Next, estimate the effect of initiating drug use at 3 months from the weighted data. Hint: repeat the analysis of question B4, except this time, add a weights argument. Compare your answer with that of question B5.
Answer
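The weights first need to be attached to the data; a sketch (the column name weight is assumed by the code below):

sub <- subset(long, start==3)
ps <- predict(fit, type="response") # fitted propensity scores for these rows
long$weight <- NA
long$weight[long$start==3] <- ifelse(sub$arm==1, 1/ps, 1/(1-ps))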
fit <- lm(I(time<=5) ~ arm, weights=weight, data=subset(long, start==3))
est_IPW <- unname(fit$coef["arm"])
cat("IPW-adjusted risk difference: ", round(est_IPW*100, 1), "%", sep="")
IPW-adjusted risk difference: 9.6%
With IPW, we can actually check whether the weighting balances L and Llag across the study arms.
Code
# Before weighting
with(subset(long, start==3),
     c(initiators=mean(L[arm==1]), noninitiators=mean(L[arm==0])))
# After weighting
with(subset(long, start==3),
     c(initiators=weighted.mean(L[arm==1], weight[arm==1]),
       noninitiators=weighted.mean(L[arm==0], weight[arm==0])))
Question C3. Has the balance improved with respect to these characteristics?
Part D: g-computation
In this part, we’ll again revisit the observational study of part B. Instead of IPW to adjust for a lack of exchangeability, you’ll be using g-computation.
As we’ve established, g-computation for inference about marginal counterfactual risks is really just standardisation of conditional (counterfactual) risks. The conditional risks that we’re after now are (1) the risk of sustaining the event of interest within 5 months from the target intervention (randomisation) time (3 months) if—possibly contrary to fact—patients were to initiate drug treatment, and (2) the risk if patients were not to initiate, both conditional on a set of variables that explain any lack of exchangeability. In the previous part, we assumed that the variables L and Llag are together sufficient to control for confounding, so we’ll do the same here.
Let’s therefore regress time<=5 on arm, L and Llag:
Code
fit <- with(subset(long, start==3), lm(I(time<=5) ~ arm + L + Llag))
Question D1. Looking at the previous line of code, what assumption(s) do you think is (are) made in addition to consistency, conditional exchangeability and positivity? Does the validity of our IPW implementation also rest on this (these) assumption(s)?
Answer
The model that is fit with the previous line of code is non-saturated, which means that it imposes parametric constraints on the data distribution. In other words, we’ve been making modelling assumptions with respect to the conditional outcome distribution here. IPW instead involves modelling the conditional treatment (or exposure) distribution. (We’re also assuming that data from different individuals are independent and identically distributed — a fairly standard assumption in statistics.)
The next step is standardisation, which can be implemented as follows:
Code
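# A sketch: average the fitted conditional risks over the empirical
# distribution of (L, Llag), setting arm to 1 and 0 in turn
sub <- subset(long, start==3)
risk1 <- mean(predict(fit, newdata=transform(sub, arm=1)))
risk0 <- mean(predict(fit, newdata=transform(sub, arm=0)))
cat("G-computation risk difference: ", round((risk1-risk0)*100, 1), "%", sep="")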
Question D2. Compare the result with your answer to question C2. Can you change the conditional outcome model (make it less restrictive) to bring the estimates closer together?
Answer
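A sketch using a saturated model (all interactions between arm, L and Llag):

fit <- with(subset(long, start==3), lm(I(time<=5) ~ arm*L*Llag))
sub <- subset(long, start==3)
mean(predict(fit, newdata=transform(sub, arm=1))) -
  mean(predict(fit, newdata=transform(sub, arm=0)))

With a saturated outcome model, g-computation no longer imposes parametric constraints, so the estimate should move closer to the IPW estimate of question C2.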
Note that we are able to fit a fully saturated model here because we’re only adjusting for a few binary variables. Two treatment levels and two binary variables together form at most \(2^3=8\) strata. With many more categorical variables or with continuous variables, it will be necessary to impose parametric constraints. You may, however, consider flexible modelling strategies, such as regression on cubic spline transformations of continuous variables or flexible machine learning techniques.
Part E: g-estimation (OPTIONAL)
Feel free to skip this part if you believe it to be too technical for you at present.
In this final part of the practical, we’ll revisit the observational study of part B once more. This time, we’ll use g-estimation.
G-estimation starts by postulating a model for a contrast between conditional counterfactual risks, here comparing initiating drug treatment at 3 months versus not initiating. For brevity of notation, let \(Y^a\) be the binary indicator for the occurrence of the event of interest within the first 5 months after the target randomisation time, if—possibly contrary to fact—treatment \(A\) were equal to \(a\) (\(a=0,1\)). Let us assume that \[\mathbb{E}[Y^a\mathrel{|}A=a,L,L_{\text{lag}}]-\mathbb{E}[Y^0\mathrel{|}A=a,L,L_{\text{lag}}]=\beta a\] for some constant \(\beta\). This assumption implies that the treatment effect on the outcome (expressed on an additive scale) does not vary across levels of \(L\) and \(L_{\text{lag}}\).
Question E1. Does this model imply that \(Y^a\) is independent of \(L\)?
Answer
No. It could for example be that \(\mathbb{E}[Y^a\mathrel{|}A,L,L_{\text{lag}}]=\alpha+L+\beta a\) for all \(a\), in which case \(Y^a\) depends on \(L\) and yet \(\mathbb{E}[Y^a\mathrel{|}A=a,L,L_{\text{lag}}]-\mathbb{E}[Y^0\mathrel{|}A=a,L,L_{\text{lag}}]=\beta a\), as per the model above.
Next, we must relate the risk \(\mathbb{E}[Y^0\mathrel{|}A,L,L_{\text{lag}}]\) to the distribution of observed variables. To this end, we employ the consistency condition (\(Y=Y^1A+Y^0(1-A)\)), which implies \(Y^0=Y-(Y^1-Y^0)A\) and, so, \(\mathbb{E}[Y^0\mathrel{|}A,L,L_{\text{lag}}]=\mathbb{E}[Y - \beta A \mathrel{|} A,L,L_{\text{lag}}]\). If we knew \(\beta\), we could thus estimate \(\mathbb{E}[Y^0\mathrel{|}A,L,L_{\text{lag}}]\) for all individuals by \(\widehat {Y^0}(\beta):=\widehat{\mathbb{E}}[Y - \beta A\mathrel{|}A,L,L_{\text{lag}}]\). This forms the basis for the next step.
Code
data <- subset(long, start==3)
data$Y <- (data$time<=5)*1
data$A <- data$arm
hatY0 <- function(beta, data) with(data, Y - beta*A)
Now comes the exciting bit: we know that \(\widehat {Y^0}(\hat\beta)\) is independent of \(A\) given \(L\) and \(L_{\text{lag}}\) if \(\hat\beta=\beta\), or at least that’s what we’re assuming implicitly (as a consequence of the conditional exchangeability assumption). So, we look for values of \(\beta\) that are compatible with this assumption:
Code
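# A sketch (one of several possible implementations): grid search for values
# of beta at which hatY0(beta) is unassociated with A given L and Llag, as
# judged by the coefficient of hatY0(beta) in a logistic model for A
score <- function(b){
  fit <- glm(A ~ hatY0 + L + Llag, family=binomial,
             data=transform(data, hatY0=hatY0(b, data)))
  coef(fit)["hatY0"]
}
betas <- seq(-0.1, 0.3, by=0.005)
plot(betas, sapply(betas, score), type="l",
     xlab=expression(beta), ylab="coefficient of hatY0(beta)")
abline(h=0, lty=2) # compatible values: where the curve crosses zero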
Question E2. Looking at the figure, what values of \(\hat\beta\) are compatible with our conditional exchangeability assumption?
Answer
Values of \(\hat\beta\) somewhere around 0.10 (a risk difference of about 10 percentage points). Note that the validity of this approach depends on the extent to which the modelling assumptions hold.