Day 1 Practical: Potential Outcomes

Part 2 Bonus Exercises: Estimating Causal Effects

Author

Oisin Ryan

In this practical, we’ll get to grips with the basics of estimating causal effects using stratification, matching, and propensity scores in a multivariate setting. We assume you have completed, or are familiar with the content of, the univariate adjustment practical from today.

We will use a number of R packages in this practical, which you can install as follows:

Code
 install.packages("tableone")
 install.packages("MatchIt")
 install.packages("survey")

In this practical we will get some practice using tools for causal effect estimation inspired by the potential outcomes framework. To do this we will use data from Schafer and Kang (2008). We will reproduce only a small part of their analyses, but you can read about many more ways to estimate causal effects from nonrandomized studies in their paper.

Schafer, J. L., & Kang, J. (2008). Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods, 13, 279-313. DOI: 10.1037/a0014268

Data and Research Question

The data are found in the datafile called SchaferKangData.dat which you can download from here. The dataset represents a (simulated) two-wave observational study on emotional distress and dieting behaviour in adolescent girls. Our research question is: What is the effect of dieting on emotional distress? We will apply several different methods we learned about in the lecture to answer this question. For null hypothesis tests, you can use a significance level of 0.05 throughout.

Exercise 1

Load and inspect the data. The outcome variable is DISTR.2, a continuous variable measuring emotional distress at the second measurement wave. The treatment variable is DIET, a binary variable indicating the presence or absence of a specified dieting behaviour at wave 1. All other variables should be considered measurements of covariates at wave 1.

Code
df <- read.table("SchaferKangData.dat", header = TRUE)
head(df)

Checking Covariate Balance

The “naive” difference between those who did and did not diet would be a valid estimate of the causal effect only under the assumption of exchangeability (i.e., no confounding). We know already that this is not a reasonable assumption, given the design of the study from which these data originate.

If treatment assignment had been randomized, we would not expect to see systematic differences between the control and treated groups on our observed covariates. Observing such differences (for covariates that we believe to be potential confounders or causes/effect modifiers of the outcome) is therefore a cause for concern.

Exercise 3

Investigate whether the diet (DIET == 1) and control (DIET == 0) groups differ from each other on certain covariates. You can use tableone::CreateTableOne() as a quick summary method.

We can check this in a number of ways. For instance, we can plot histograms of the observed variables across dieting groups:

Code
p1 <- hist(df[, "SLFWGHT"][df$DIET == 0], plot = FALSE)
p2 <- hist(df[, "SLFWGHT"][df$DIET == 1], plot = FALSE)
plot(p1, col = rgb(0, 0, 1, 1/4), xlim = c(1, 5),
     main = "Covariate distributions across groups",
     xlab = "Self-rating of Weight")  # first histogram
plot(p2, col = rgb(1, 0, 0, 1/4), xlim = c(1, 5), add = TRUE)  # overlay second histogram
abline(v = mean(df[, "SLFWGHT"][df$DIET == 0]), col = "blue")
abline(v = mean(df[, "SLFWGHT"][df$DIET == 1]), col = "red")

Alternatively, you can compute standardized mean differences (SMDs) between the two groups on any covariate(s) of interest. In epidemiology and the health sciences, researchers often present a table of such SMD values, along with the means in each group of interest, as the first table of the results section, which is why this is sometimes referred to as “Table 1”. Let’s use the function CreateTableOne() from the package tableone to check this. Researchers often use a rule of thumb that considers an SMD larger than 0.1 to represent a worrying degree of imbalance between the covariate distributions. While we do not endorse these kinds of arbitrary cut-off values in general, it is common in the literature to see this or other values used as a reference.

Code
table1 <- tableone::CreateTableOne(vars=c("DISTR.1","BLACK", "NBHISP", "GRADE",
                      "SLFHLTH", "SLFWGHT", "WORKHARD", "GOODQUAL", 
                          "PHYSFIT", "PROUD", "LIKESLF", "ACCEPTED", 
                      "FEELLOVD"), strata="DIET", data=df, 
                        test=FALSE)
print(table1, smd=TRUE)
                      Stratified by DIET
                       0           1           SMD   
  n                    4780        1220              
  DISTR.1 (mean (SD))  0.62 (0.42) 0.71 (0.45)  0.214
  BLACK (mean (SD))    0.26 (0.44) 0.17 (0.38)  0.197
  NBHISP (mean (SD))   0.15 (0.35) 0.15 (0.36)  0.021
  GRADE (mean (SD))    9.16 (1.39) 9.37 (1.34)  0.152
  SLFHLTH (mean (SD))  2.20 (0.93) 2.35 (0.91)  0.171
  SLFWGHT (mean (SD))  3.19 (0.76) 3.84 (0.70)  0.895
  WORKHARD (mean (SD)) 2.14 (0.91) 2.05 (0.85)  0.093
  GOODQUAL (mean (SD)) 1.80 (0.67) 1.84 (0.71)  0.055
  PHYSFIT (mean (SD))  2.24 (0.93) 2.53 (0.93)  0.320
  PROUD (mean (SD))    1.76 (0.77) 1.86 (0.79)  0.126
  LIKESLF (mean (SD))  2.09 (0.99) 2.52 (1.06)  0.420
  ACCEPTED (mean (SD)) 2.14 (1.00) 2.35 (1.06)  0.207
  FEELLOVD (mean (SD)) 1.78 (0.83) 1.93 (0.90)  0.172
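For reference, the SMD that tableone reports for a continuous covariate can be computed by hand: the difference in group means divided by a pooled standard deviation (here taken as the root mean of the two group variances). Below we sketch this using the rounded DISTR.1 summaries from the table above; the small discrepancy with the reported 0.214 is due to rounding.

```r
# Standardized mean difference: difference in means divided by the
# pooled SD (root mean of the two group variances)
smd <- function(m1, m0, s1, s0) {
  (m1 - m0) / sqrt((s1^2 + s0^2) / 2)
}

# Using the rounded DISTR.1 summaries from the table above
smd(0.71, 0.62, 0.45, 0.42)  # approximately 0.21
```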

Let’s take this result as an indication that the covariate distributions may be unbalanced across treatment groups.

Conditional Exchangeability, Covariate Adjustment, and Propensity Scores

As we have seen, in observational settings the assumption of exchangeability is unlikely to hold. However, if we have gathered sufficient information on potential confounders, then we may be able to achieve conditional exchangeability (exchangeability within levels of the covariates). As such, to obtain an unbiased estimate of the causal effect, we must appropriately adjust for group differences in the covariates of interest.

Throughout the first three days of the course, we will be looking at a number of different methods for covariate adjustment. In the second practical you have in fact already practiced one type of covariate adjustment, known as stratification; in that context, we had a single covariate we wished to adjust for, and we did so by comparing treated and untreated individuals in each stratum (i.e., level) of the covariate. Stratification, as a general strategy, becomes more difficult when we have many covariates, when those covariates have many levels, and when we have a relatively small sample size.

In this part of the practical we will introduce you to four concepts which are relevant for covariate adjustment:

  • Matching, in which we try to create pairs of a treated and an untreated individual that have the same covariate values
  • Propensity scores, a tool for summarizing information about multivariate confounding
  • (Inverse Probability) Weighting, in which we use propensity scores to create a pseudo-population that is balanced on the covariates

Matching

Matching as a method of covariate adjustment is conceptually similar to the idea of stratification. The basic approach of exact matching is to search for people in both treatment groups who have the same score on all of the covariates you wish to adjust for (i.e., who have the same profile of covariate values). This can be thought of as an extreme version of stratification: in principle, every possible combination of covariate values is its own stratum. After matching, the matched treated and control groups are guaranteed to be balanced on their covariate values. A key downside of this adjustment approach is that we only include strata in our analysis for which there is at least one person in both groups with that combination of values. All other strata (i.e., all individuals who do not have an “exact” match in the other group) are dropped from the analysis.

To perform matching, we must first pay close attention to what kind of causal effect we wish to estimate. When estimating the ATE, we look for an eligible match for every individual in the study.

Exercise 4

In this exercise, we will match every member of the treated group (DIET = 1) to an untreated individual (DIET = 0) and vice versa. We can do this using the function matchit() from the package MatchIt in R. You can do this by specifying a regression equation DIET ~ [matching-variables], with options method = "exact" and estimand = "ATE". Here we try to recreate the matching procedure we used in the main exercises: exact matching with replacement.

Code
library(MatchIt, quietly = T)
match1 <- matchit(DIET ~ DISTR.1 + as.factor(BLACK) + as.factor(NBHISP) 
                    + GRADE + SLFHLTH + SLFWGHT + WORKHARD + GOODQUAL 
                    + PHYSFIT + PROUD + LIKESLF + ACCEPTED + FEELLOVD, 
                    method = "exact",  data = df, estimand = "ATE")
Exercise 5

Inspect the resulting matched dataset using summary(). Are the covariates now balanced? How many individuals are in the matched dataset, and how much information was lost?

Code
# look for objects sum.matched and nn
summ1 <- summary(match1)

The first thing we should notice is that using exact matching we lose a lot of information; with many covariates, several of which have 5 categories, it is difficult to find an exact match for every treated individual. We end up matching only around 10 treated and 11 untreated individuals. Clearly, exact matching is not a feasible strategy for this dataset, although it does achieve covariate balance.

Code
summ1$nn
                 Control Treated
All (ESS)     4780.00000  1220.0
All           4780.00000  1220.0
Matched (ESS)   10.88889     9.8
Matched         11.00000    10.0
Unmatched     4769.00000  1210.0
Discarded        0.00000     0.0

Since exact matching is not a feasible adjustment strategy in this situation, we can change our matching strategy. Rather than searching for exact matches, we could instead search for near matches; this strategy may be less successful at balancing the covariates, but will result in less data loss.

Exercise 6

Perform matching with method = "nearest". In order to do this with matchit() we must switch our estimand of interest from the ATE to the ATT. This means we will take every treated unit and find the best match available for them in the control units. Inspect how many matches you obtain and the covariate balance as you did above.

Code
match_nearest <- matchit(DIET ~ DISTR.1 + as.factor(BLACK) + as.factor(NBHISP) 
                    + GRADE + SLFHLTH + SLFWGHT + WORKHARD + GOODQUAL 
                    + PHYSFIT + PROUD + LIKESLF + ACCEPTED + FEELLOVD, 
                    method = "nearest",  data = df, estimand = "ATT")
summ_nearest <- summary(match_nearest)
Exercise 7

Use the function match.data() to save the matched dataset to a data.frame. Now, estimate the ATT by comparing the matched treated and control groups. Compare this to the estimate you would obtain using a naive method: comparing the groups in the original dataset, without any adjustment.

Code
df.match <- match.data(match_nearest)
t.test2 <- t.test(DISTR.2 ~ DIET, df.match)
m1 <- t.test2$estimate[2]
m0 <- t.test2$estimate[1]

# unadjusted mean difference
t.test1 <- t.test(DISTR.2 ~ DIET, df)
# Difference in means between the groups:
m1_naive <- t.test1$estimate[2]
m0_naive <- t.test1$estimate[1]

The original sample consisted of 6000 individuals, and the matched sample consists of 2440 individuals. This is because the number of treated individuals is 1220, and each of these was matched to a non-treated person. The covariate balance is no longer perfect, because we did not perform exact matching, but we can see that most covariates are now very close in mean across the matched groups.

First let’s look at the naive estimate:

Code
data.frame("mean difference" =m1_naive - m0_naive, "p value from t-test"= t.test1$p.value,
           row.names = FALSE)

The naive estimate implies that those who diet experience more distress than those who do not diet.

Now let’s look at our adjusted estimate of the ATT:

Code
data.frame("mean difference" =m1 - m0, "p value from t-test"= t.test2$p.value,
           row.names = FALSE)

Here the difference between the matched cases implies that, among those girls who are likely to diet, dieting (the treatment condition: DIET = 1) seems to result in less distress than not dieting (the control condition: DIET = 0). Note the “girls who are likely to diet” phrasing: this refers to the fact that we have taken the sample that did diet, and then found matches for these participants among those who did not diet. However, given the relatively large standard error (reflected here in the p-value), this is not a significant difference; our best guess is a negative effect, but we have too much uncertainty around this estimate to be confident.

Propensity Scores

In the lecture we introduced the concept of a propensity score, the probability of receiving treatment given the value of a set of covariates. Propensity scores are convenient tools to summarize information on a range of covariates, and they can be used to adjust for confounding in many different ways. Estimating a propensity score is relatively straightforward. In these exercises you will see how this can be done, and what important considerations are for a propensity score model.

Exercise 9

To compute a propensity score, run a logistic regression model in which the binary treatment variable X (values 0 and 1) is the outcome variable, and the covariates are the predictors. Make sure to save the probability for each person of scoring 1 on X. You can use the glm function from the stats package for this.

Code
# Run the logistic regression analysis
logreg <- glm(DIET ~ DISTR.1 + as.factor(BLACK) + as.factor(NBHISP) 
    + GRADE + SLFHLTH + SLFWGHT + WORKHARD + GOODQUAL 
        + PHYSFIT + PROUD + LIKESLF + ACCEPTED + FEELLOVD, 
            family = binomial(), data = df)
# Obtain a prediction of the probability of treatment (i.e., DIET=1) 
ps <- predict(logreg, type = "response")
# Add this predicted probability to the datafile
df$ps <- ps
Exercise 10

Make a histogram of the propensity score distribution separately among those who diet and those who do not. Discuss what you see, and why this is important. Which causal identifiability assumption is relevant here?

Code
df1 <- df[ which(df$DIET == 1), ]
df0 <- df[ which(df$DIET == 0), ]
hist0 <- hist(df0$ps, breaks=30, plot=FALSE)
hist1 <- hist(df1$ps, breaks=30, plot=FALSE)
# see answers for how to plot these histograms!

Plotting the histograms allows us to assess the assumption of positivity. What we see is that the two groups overlap well in their propensity scores, even in the tails. If this were not the case, it would be an indication that the assumption of positivity is violated. That is, at each possible combination of the covariates, which is now summarized by the propensity score, there should be both treated and non-treated individuals in our sample.

Code
plot( hist0, col=rgb(0,0,1,1/4), xlim=c(0,1), 
    xlab="Propensity score", main="Histogram of propensity scores")  
plot( hist1, col=rgb(1,0,0,1/4), xlim=c(0,1), add=T) 

You may notice that the syntax we used to fit the propensity score model is very similar to what we used with matchit(). In fact, behind the scenes, for certain specifications, matchit() uses propensity scores for matching: first, it fits a propensity score model using the matching covariates specified, then it matches on the propensity score rather than on the covariates themselves. So, you already have some experience with using propensity scores for covariate adjustment through matching! We can even plot the distribution of propensity scores amongst matched and unmatched individuals:

Code
plot(match_nearest, type = "hist")

Recall that the PS is a sort of summary of multivariate information. From these histograms we can see that, although the distributions of covariates are different in the observed groups, after matching, the covariate distribution amongst the controls has been shifted onto the distribution of covariates amongst the treated.

It is important to note that we must choose the PS model we wish to fit. In this situation we made a simple choice: modeling linear effects of all covariates. We may also choose to include, for instance, quadratic terms, interaction terms, or other non-linear effects of covariates. This is a disadvantage of PS methods: we must correctly specify the treatment assignment model. The more data we have, the more flexibly we can model the PS!
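As a sketch of what a more flexible specification could look like, the model below adds a quadratic term and an interaction to a logistic regression. The toy data and the particular terms are purely illustrative (which terms to include should be guided by substantive knowledge); with the practical's data you would extend the formula passed to glm() or matchit() in the same way.

```r
set.seed(123)
# Toy stand-in data (illustrative only; not the practical's dataset)
toy <- data.frame(
  SLFWGHT = sample(1:5, 300, replace = TRUE),
  SLFHLTH = sample(1:4, 300, replace = TRUE),
  PHYSFIT = sample(1:4, 300, replace = TRUE)
)
toy$DIET <- rbinom(300, 1, plogis(-2 + 0.5 * toy$SLFWGHT))

# A more flexible PS model: quadratic effect of SLFWGHT plus an
# interaction between SLFHLTH and PHYSFIT
logreg_flex <- glm(DIET ~ SLFWGHT + I(SLFWGHT^2) + SLFHLTH * PHYSFIT,
                   family = binomial(), data = toy)
ps_flex <- predict(logreg_flex, type = "response")
```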

For now, let’s see how else we can use propensity score models to estimate causal effects.

Inverse Probability Weighting

Inverse Propensity Weighting (IPW) or Inverse Probability of Treatment Weighting (IPTW) is a popular and flexible method of using propensity scores to account for unbalanced covariates. As with matching, we can use different variations of IPTW to estimate different causal effects (i.e., the ATE, ATT or ATC).

To estimate the ATE, the estimated propensity scores \(\hat{\pi}_i\) are used to determine the probability that an individual would have received the treatment that they in fact received:

  • for the treated (\(X_i=1\)) this is simply \(\hat{\pi}_i\)

  • for the non-treated (\(X_i=0\)) this is \(1-\hat{\pi}_i\).

Next, we can use these probabilities as weights by taking their inverse: that way, a case that received a treatment she was very likely to receive gets a small weight, while a case that received a treatment she was very unlikely to receive gets a large weight. The inverse probability weight thus indicates the number of persons from the population that this person represents. For the treated, this weight is \(1/\hat{\pi}_i\); for the untreated, it is \(1/(1-\hat{\pi}_i)\). For this reason we say that IPW creates a “pseudo-population”.
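A small worked example (with made-up propensity scores) may help: a treated case with \(\hat{\pi}_i = 0.8\) was likely to receive the treatment she in fact received and so gets a small weight, while an untreated case with the same \(\hat{\pi}_i = 0.8\) was unlikely to remain untreated and so gets a large weight.

```r
ps_toy   <- c(0.8, 0.2, 0.8, 0.2)  # made-up propensity scores
diet_toy <- c(1,   1,   0,   0)    # observed treatment
# Inverse probability weights: 1/ps for the treated, 1/(1 - ps) for the untreated
w_toy <- ifelse(diet_toy == 1, 1 / ps_toy, 1 / (1 - ps_toy))
w_toy  # 1.25 5.00 5.00 1.25
```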

We can compute the IPTW-weighted estimate of the average causal effect by using techniques which allow us to include weights directly. Most glm()-type functions in R have a weights option; for this practical, we are going to use tools from the survey package.

Exercise 11

First, compute the weights using the formula above, and add the weights as a new column to your dataset.

Code
library(survey, warn.conflicts = F,quietly = T)
library(tableone, quietly = T)
weight <- ifelse(df$DIET==1,1/(df$ps),1/(1-df$ps))
df$weight <- weight
Exercise 12

Second, use the svydesign and svyCreateTableOne functions to create a weighted dataset, and to check the covariate balance (SMDs) of the weighted dataset.

Code
weighteddata <- svydesign(ids = ~ 1,
                          data = df,
                          weights = ~ weight)
weightedtable <- svyCreateTableOne(
                        vars=c("DISTR.1","BLACK", "NBHISP", "GRADE",
                               "SLFHLTH", "SLFWGHT", "WORKHARD", "GOODQUAL", 
                                   "PHYSFIT", "PROUD", "LIKESLF", "ACCEPTED", 
                               "FEELLOVD"), 
                        strata = "DIET", data = weighteddata, test = FALSE)

This shows that in the pseudo-population created using IPW, the two groups no longer differ much on most of the covariates. There is one covariate, SLFWGHT, whose standardized mean difference is still somewhat large, implying that IPW is not entirely successful here in mimicking random assignment for that covariate. In practice, we would want to investigate this a little further. Typically, when using IPW we also want to explore the distribution of the weights, in particular to check for very large weights: weights are often truncated at their 99th percentile value to avoid the situation where one or two cases have too large an influence on the results. For the sake of this practical, we will skip this step and proceed.

Code
print(weightedtable)
                      Stratified by DIET
                       0              1             
  n                    6014.23        6368.45       
  DISTR.1 (mean (SD))     0.64 (0.43)    0.63 (0.43)
  BLACK (mean (SD))       0.24 (0.43)    0.25 (0.44)
  NBHISP (mean (SD))      0.15 (0.35)    0.13 (0.33)
  GRADE (mean (SD))       9.20 (1.38)    9.26 (1.41)
  SLFHLTH (mean (SD))     2.23 (0.94)    2.25 (0.91)
  SLFWGHT (mean (SD))     3.33 (0.80)    3.15 (1.00)
  WORKHARD (mean (SD))    2.12 (0.90)    2.14 (0.86)
  GOODQUAL (mean (SD))    1.81 (0.67)    1.79 (0.68)
  PHYSFIT (mean (SD))     2.30 (0.95)    2.31 (0.90)
  PROUD (mean (SD))       1.78 (0.77)    1.77 (0.76)
  LIKESLF (mean (SD))     2.18 (1.02)    2.14 (1.00)
  ACCEPTED (mean (SD))    2.18 (1.01)    2.16 (1.02)
  FEELLOVD (mean (SD))    1.81 (0.84)    1.79 (0.83)
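The weight truncation mentioned above can be sketched in two lines of base R. The toy weights below are illustrative; with the practical's data you would apply the same quantile()/pmin() steps to df$weight.

```r
set.seed(42)
w <- 1 / runif(1000, min = 0.02, max = 0.98)  # toy weights, a few extreme values
cap <- quantile(w, 0.99)   # 99th percentile of the weights
w_trunc <- pmin(w, cap)    # truncate: no weight may exceed the cap
```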
Exercise 13

Estimate the ATE using the IP-weighted group difference. You can do this using, for instance, svyglm() with the weighted data, or by using the original data object, together with a weights column, with most regular glm() functions. What is the causal effect estimate?

Code
msm <- svyglm(DISTR.2 ~ DIET, design = weighteddata)
Code
summary(msm)

Call:
svyglm(formula = DISTR.2 ~ DIET, design = weighteddata)

Survey design:
svydesign(ids = ~1, data = df, weights = ~weight)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.660949   0.007029  94.032   <2e-16 ***
DIET        -0.005064   0.030394  -0.167    0.868    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.2397251)

Number of Fisher Scoring iterations: 2
Code
confint(msm)
                  2.5 %     97.5 %
(Intercept)  0.64716947 0.67472808
DIET        -0.06464777 0.05451944

This shows that the effect of dieting is now estimated to be -0.005 (SE=0.030), which is not significantly different from zero. Recall that we are here estimating the ATE.

Bonus Exercise

IPTW can also be used to estimate the ATT. Recall that for the ATT we want to make the control group resemble the treated group with respect to their covariates. For ATT estimation, we give all treated units a weight of 1. We compute the weights for the control group as: \(\hat{\pi}_i/ (1 - \hat{\pi}_i)\). Calculate these new weights, and repeat the procedure above to get a second estimate of the ATT. Compare to the estimate you obtained from matching earlier.

Code
weight_att <- ifelse(df$DIET==1,
                     1, # treated group weight
                     df$ps/(1-df$ps) # untreated group weights
                     )
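The remaining steps mirror the ATE procedure above. Here is a self-contained sketch on simulated toy data in which the outcome has no true treatment effect; the svydesign()/svyglm() calls are the same ones you would apply to df after adding weight_att as a column.

```r
library(survey, warn.conflicts = FALSE, quietly = TRUE)
set.seed(1)
n <- 1000
# Toy data (illustrative): a confounder x affects both treatment and outcome
x     <- rnorm(n)
diet  <- rbinom(n, 1, plogis(x))
distr <- 0.5 + 0.3 * x + rnorm(n)   # no true effect of diet

# Propensity scores and ATT weights: 1 for treated, ps/(1 - ps) for untreated
ps    <- fitted(glm(diet ~ x, family = binomial()))
w_att <- ifelse(diet == 1, 1, ps / (1 - ps))

d   <- data.frame(distr, diet, w_att)
des <- svydesign(ids = ~ 1, data = d, weights = ~ w_att)
msm_att <- svyglm(distr ~ diet, design = des)
coef(msm_att)["diet"]  # with no true effect, this should be near zero
```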

Conclusion

In this practical we have examined two popular methods for covariate adjustment: Matching, and Inverse Probability of Treatment Weights. We considered matching directly on covariates, and matching using propensity scores. The goal of these techniques is to create two groups which are balanced on the values of relevant confounders.

In matching, this is done by creating pairs of a treated and an untreated person who have (almost) identical covariate or propensity scores, resulting in a smaller but balanced dataset. Matching is a form of conditioning on confounding variables. In inverse probability weighting, this is instead done by weighting each person’s observation by the inverse probability of the treatment they received, thereby creating a balanced pseudo-population. Notably, while this technique also adjusts for confounders, it does not condition on them. This distinction becomes important in more complex scenarios which we will see later in the week. Whichever approach we use, we should check whether it balances the covariates, for instance by considering the standardized mean differences (for other options, see for instance: Austin, P. C. (2009). Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine, 28, 3093-3107. https://doi.org/10.1002/sim.3697).

Many other options exist for using propensity scores to mimic RCTs; many of these are surveyed and compared in Schafer & Kang (2008). We will continue to explore different methods of adjustment throughout the rest of the week. Each adjustment method has its own advantages and disadvantages, and relies on its own assumptions.

It is important to always keep in mind that all the techniques discussed here were developed under the assumption of no unobserved confounding. This means that all relevant confounders should be included as covariates. Sensitivity analysis is a valuable approach for investigating how sensitive results are to potential unobserved confounding (not covered in this course; for more information see for instance: Blackwell, M. (2013). A selection bias approach to sensitivity analysis for causal effects. Political Analysis, 22, 169-182. https://doi.org/10.1093/pan/mpt006).