Introduction

Michael Taylor

2018/05/30

knitr::opts_chunk$set(cache = TRUE)

Learning Objectives

After this lesson you should be able to:

  1. Identify the features of a causal DAG

  2. Understand the rules of d-separation

  3. Construct a causal DAG that reflects assumptions of how treatments, outcomes, and other factors relate to one another

  4. Distinguish between different structural sources of bias

Estrogens and Uterine Cancer: The problem

library(DiagrammeR)

What is a DAG?

This is our first causal diagram: a graph with three variables, or nodes, L, A, and Y. These variables are connected by arrows, also known as directed edges.

This diagram is drawn when we know that L has an effect on A, A has an effect on Y, and L has an effect on Y that is not mediated through A. The arrows indicate the direction of causality, and for this reason the graph is referred to as a directed graph.
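The rendered figure is not reproduced here; a minimal sketch of the same three-node diagram using DiagrammeR (loaded above), with node names as in the text, could look like this:

```r
# The three-node causal diagram: L -> A, A -> Y, and L -> Y
grViz("
digraph first_dag {
  rankdir = LR            // time flows from left to right
  node [shape = plaintext, fontsize = 24]
  L -> A
  A -> Y
  L -> Y
}")
```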

Another important property of this graph is that it is acyclic, meaning there are no cycles: no matter which variable you start from, you can never get back to the same variable. L causes A and A causes Y, but Y doesn’t cause L. If you think of time as going from left to right on the graph, then being acyclic just means that the past affects the future, but the future doesn’t affect the past. And because these graphs are directed and acyclic, we refer to them as DAGs: directed acyclic graphs.
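Acyclicity is easy to check mechanically. Here is a sketch using the igraph package (already a dependency in this session; see sessionInfo below), assuming the edges described above:

```r
library(igraph)

# Directed edges of the diagram: L -> A, A -> Y, L -> Y
g <- graph_from_literal(L -+ A, A -+ Y, L -+ Y)
is_dag(g)                          # TRUE: no directed cycles

# Adding an arrow from Y back to L would create a cycle
is_dag(add_edges(g, c("Y", "L")))  # FALSE
```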

DAGs are drawn from expert knowledge about the problem under study. If our expert knowledge is insufficient to exclude any possible effect, we draw all the arrows and say the DAG is complete. On the other hand, if our expert knowledge allows us to exclude some causal effects, then we omit some of the arrows in the DAG. Knowledge is represented by missing arrows.

DAGs are used for many things, not only for causal inference. We are going to be dealing with causal DAGs. A DAG is a causal DAG if, whenever two variables on the graph share a cause, that cause is also represented on the graph. This is known as the Causal Markov Condition.

Cause and Effect

The distinction between causation and association is crucial in research and in fact, causal diagrams are just a tool to navigate between association and causation. Suppose you have a large population of individuals and that they are all given a treatment. We can then wait and see how many of them die. Suppose 20% of them die. Now suppose we have a time machine that can take us back in time. Now we don’t give treatment to anyone in the population. We wait and see that 50% of them die. If we could do this, we would have proven that the treatment has, on average, a causal effect on death in this population. It prevents death.

Of course, in practice we cannot take people back in time. What we can do is to compare two groups of people that are similar, essentially identical, with respect to their risk of death. Quantifying causal effects requires the contrast of the same, or very similar, populations under different levels of treatment. Formally, we say that causal effects are defined by counterfactual contrasts. In this course, we will not explicitly use counterfactual theory, but counterfactual theory underlies everything we do.

Suppose we have a population of individuals. Some of them receive treatment, and some of them do not. Let’s say that 30% of the treated die, and only 10% of the untreated die. That doesn’t mean that the treatment has a causal effect on death. Perhaps treatment has no effect, but it is given to people who are at a higher risk of death anyway. Yet, we say that treatment and death are associated, because the risk of death is different in the treated and the untreated. Quantifying associations simply requires the contrast of two groups of individuals under different levels of treatment.
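As a toy numerical check (the counts below are hypothetical, chosen only to match the 30% and 10% figures), the associational contrasts can be computed directly:

```r
# Hypothetical cohort: 1000 treated and 1000 untreated individuals
deaths_treated   <- 300   # 30% of the treated die
deaths_untreated <- 100   # 10% of the untreated die

risk_treated   <- deaths_treated / 1000
risk_untreated <- deaths_untreated / 1000

risk_treated / risk_untreated  # associational risk ratio: 3
risk_treated - risk_untreated  # associational risk difference: 0.2
```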

The association may be present, because treatment has a causal effect or because the groups of individuals are different. So there may be association without causation. We can think about association in another equivalent way. We say that treatment and death are associated when having information about treatment status allows us to predict death better on average, which is precisely what happens in our example. If we learn that someone is treated, then we will predict she has a greater risk of death, even if treatment does not cause death. Causal diagrams are very helpful, because they represent both association and causation simultaneously.

MIGUEL HERNAN: Why do we like causal graphs? After all, we have seen that causal graphs are simple pictures that even a five-year-old can understand. How can they be so helpful to people conducting research? Well, causal DAGs are helpful because they are two things at the same time. On the one hand, they’re causal models. They are qualitative causal models, but causal models. On the other hand, they are statistical models. That is, they’re models that represent associations and independencies between variables. That means that we can draw a causal graph using our expert knowledge, our causal knowledge, and at the same time, we are building a statistical model without knowing it. And this dual nature is based on the fact that causal effects imply associations, and lack of causal effects implies independencies. And this is very important, because when we are conducting research we find biases. And these biases are associations. Therefore, we can use causal graphs to conceptualize those biases and to identify them in our research.

OK, this may have sounded pretty abstract. There is a mathematical theory underlying causal graphs, but we don’t need to master that theory in order to use causal graphs. It’s kind of like how you don’t need to know how a car works in order to drive it. Later in this lesson we’ll talk more about theory, but for now we are going to see how causal graphs work with informal examples.

Let us consider the simplest possible causal DAG with two variables, A arrow Y. A is a variable that represents cigarette smoking and can take two values: one if the person is a smoker, zero if not. Y is a variable that represents lung cancer and can also take two values: one if the person develops lung cancer, zero if not. So we say that A and Y are binary, or dichotomous, variables. We draw this causal graph because we believe that there is a causal effect of smoking, A, on cancer, Y. And to do that, we use our expert knowledge. We didn’t use any data. But what if we had data? Well, if we had data, then we could compute the association between A and Y. So if A has a causal effect on Y, as represented by our graph, A arrow Y, do we expect to find an association between A and Y in our data?
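Although the post draws the later DAGs with ggdag, a matching sketch of this two-node graph (assuming the same labeling conventions used below) could be:

```r
library(ggdag)  # also loads ggplot2

# Simplest possible causal DAG: A (smoking) -> Y (lung cancer)
dagify(Y ~ A,
       exposure = 'A',
       outcome = 'Y',
       labels = c('A' = 'Cigarette smoking',
                  'Y' = 'Lung cancer')) %>% 
  ggdag(use_labels = 'label')
```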

To answer this question, suppose that we have a database with millions of people. And for each person we know whether they were cigarette smokers, A, and whether they developed lung cancer, Y. What does an association between smoking, A, and cancer, Y, mean? If we go back to our definition of association, we said that smoking and lung cancer are associated if the proportion of individuals with cancer is different among smokers and nonsmokers. But this is precisely what we expect to happen if smoking causes cancer. We expect A and Y to be associated when A has a causal effect on Y, and we expect A and Y to be independent when A does not have an effect. We can think about association in an equivalent way. We say that A and Y are associated when having information about A allows us to predict Y better on average. And that is precisely what happens in our example. If we learn that someone is a cigarette smoker, then we will predict she has a risk of lung cancer that is greater than the average risk of cancer in the population. And that’s what I meant when I said that DAGs are both causal and statistical models. Because if we had used our expert knowledge to draw a causal graph with no arrow from A to Y, then we would also have been drawing a statistical model that says that A and Y are independent, that they are not associated.

And more generally, graph theory gives us a rule: we can only exclude an association between A and Y if there is no arrow from A to Y. Informally, we can see the arrow between A and Y as a pipe, a pipe that carries association like water. If the arrow is there, then a flow of association between A and Y is expected. Let’s now consider another question. When we drew our graph, A arrow Y, we didn’t include any variables between A and Y. But the effect of smoking A on lung cancer Y is obviously mediated by some variables. For example, by the damage that smoking causes to the DNA of the cells of the lung. So we could then have drawn a causal graph, A arrow B arrow Y, where B is cell damage. And B, we say, is a mediator of the effect of A on Y.

library(ggdag)
## Loading required package: ggplot2
## 
## Attaching package: 'ggdag'
## The following object is masked from 'package:stats':
## 
##     filter
# DAG for the mediated pathway: A (smoking) -> B (cell damage) -> Y (cancer)
dagify(Y ~ B,
       B ~ A,
       exposure = 'A',
       outcome = 'Y',
       labels = c('A' = 'Cigarette smoking',
                  'B' = 'Cell damage',
                  'Y' = 'Cancer')
       ) %>% 
  ggdag(use_labels = 'label')

But we didn’t do it. We didn’t include B in our graph. And that’s because causal graphs do not need to include mediators when the goal is to estimate the total effect of A on Y. If we needed information on mediators to estimate causal effects, then it would be impossible to estimate most causal effects, even in randomized experiments, because we typically don’t have any information on mediators there. OK, but let’s say that we decide to include the mediator B, cell damage, in our graph. This is the graph that we would draw if we believed that there is an effect of A on B, that there is an effect of B on Y, and that there is no direct effect of A on Y through pathways other than the A-B-Y pathway. Again, this level of detail in the specification of the graph is unnecessary when we are interested in the total effect of A on Y.

But let’s say that we have a graph with B. In this case, we can ask a new type of question, a question about conditional independence. We can ask: are A and Y associated conditional on B, or within levels of B? Is there an association between A and Y among individuals with a particular value of B? To answer this question, we will need data on A, Y and, of course, B. So suppose again that we have a database with millions of people, and for each person we know whether they were cigarette smokers, whether they had cell damage, and whether they developed lung cancer.

And let me make a clarification here. The arrows of causal graphs are not meant to be deterministic. When we have an arrow from cigarette smoking, A, to cell damage, B, that doesn’t mean that for every single smoker we’re going to see cell damage. Some smokers may never develop cell damage; some nonsmokers may develop cell damage for other reasons.

OK, so with these data on A, Y and B, we can answer the question of whether A and Y are associated conditional on B. For example, we can restrict our analysis to the subset of individuals with cell damage, B equals 1. To represent graphically that we are conditioning on a particular value of B, we put a square box around B on the graph. And now we can check, in the subset of the population with cell damage (B equals 1), whether there is an association between A and Y.

We just check whether the proportion of individuals with lung cancer is different among smokers and nonsmokers. If the proportions are different, we will say that there is an association between A and Y conditional on B equals 1. Another way to say this: we will check whether A contains information not already included in B that allows us to predict Y better. If the correct DAG is really A arrow B arrow Y, do we expect to find an association between A, cigarette smoking, and Y, lung cancer, among people with B equals 1, with cell damage? Well, according to this graph, the effect of smoking is entirely mediated through cell damage. Therefore, if someone has cell damage, then learning that she’s a smoker does not provide any additional information with respect to the risk of Y. You can think of it in this way. If we know that someone with cell damage has a 10% chance of developing cancer, and then we learn that she is a cigarette smoker, that doesn’t change the number. She still has a 10% chance of developing cancer. Because under our graph, smoking can only affect cancer risk through cell damage. And similarly, if we know that someone without cell damage has a 1% chance of developing cancer, then learning that she is a cigarette smoker does not change that number. She still has a 1% chance of cancer. We say that there’s no conditional association between A and Y within levels of B. And that’s true for all levels of B, whether we are conditioning on B equals 1, cell damage, or B equals 0, no cell damage.
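Mirroring the ggdag_dseparated call used in the confounding section below, here is a sketch that asks ggdag to check this d-separation on the mediator graph after conditioning on B (same labels as above):

```r
# Conditioning on the mediator B blocks the A -> B -> Y flow of association
dagify(Y ~ B,
       B ~ A,
       exposure = 'A',
       outcome = 'Y',
       labels = c('A' = 'Cigarette smoking',
                  'B' = 'Cell damage',
                  'Y' = 'Cancer')) %>% 
  ggdag_dseparated(controlling_for = 'B',
                   use_labels = 'label')
```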

And this is another example of why DAGs are both causal and statistical models. We use our expert knowledge to draw a causal graph with no direct arrow from A to Y. And that implies a statistical model that says A and Y are independent conditional on B, that they are not associated within levels of B. More generally, there’s a rule in graph theory: the flow of association between A and Y is interrupted when we condition on the mediator, B. The box around B blocks the association between A and Y. So if there is no direct arrow from A to Y, we say that there is no association between A and Y conditional on B, even though A has a causal effect on Y. OK, it’s time to discuss another causal structure.

Confounding

# DAG with a common cause: L (smoking) -> A (yellow fingers), L -> Y (cancer)
dagify(Y ~ L,
       A ~ L,
       labels = c("A" = "Yellow fingers",
                  "Y" = "Lung cancer",
                  "L" = "Cigarette smoking"),
       exposure = 'A',
       outcome = 'Y') %>% 
  tidy_dagitty(layout = "tree") %>% 
  ggdag_dseparated(controlling_for = 'L',
                   use_labels = "label")

In this segment, we are going to continue to explore the relation between causal structures and association using graphs. Now we’re going to consider a graph in which the variables A and Y share a cause, L.

For example, A can be having yellow fingers: yes or no. Y can be lung cancer: yes or no. And L, cigarette smoking: yes or no.

There is an arrow from L to Y because smoking has a causal effect on cancer. There is an arrow from L to A because cigarette smoking has a causal effect on yellow fingers. People who are heavy smokers for many years tend to develop yellow fingers. But there is no arrow from A to Y because having yellow fingers doesn’t have an effect on cancer. So we drew this causal graph using our expert knowledge. We didn’t use any data. But what if we had data?

Well, if we had data, then we could compute the association between A and Y. So if A doesn’t have a causal effect on Y, as represented by this graph with no arrow from A to Y, do we expect to find an association between A and Y in our data? To answer this question, again suppose that we have a database with millions of people. And for each person, we know whether they had yellow fingers (A) and whether they developed lung cancer (Y). What does an association between yellow fingers and cancer mean? Remember our definition of association: yellow fingers and lung cancer are associated if the proportion of individuals with lung cancer is different among those with and without yellow fingers.

But that is precisely what we expect to happen here. People with yellow fingers are more likely to have lung cancer than people without yellow fingers. And that’s not because yellow fingers cause lung cancer; it is because having yellow fingers is a marker of smoking, which causes lung cancer. So we do expect A and Y to be associated even though A has no causal effect on Y. Another way of saying this is that there is an association between A and Y because having information about A allows us to predict Y better on average. If you learn that someone has yellow fingers, it is likely that person has an above-average risk of lung cancer. And this association is a bias. When we’re using data to estimate the causal effect of A on Y, any association between A and Y that is not due to the effect of A on Y is considered a systematic bias. In particular, when there is a component of the association between A and Y that is due to a common cause of A and Y, like L in our causal graph, we say that there is confounding. In our example, a naive investigator might conclude that yellow fingers cause cancer because yellow fingers and lung cancer are associated. And that would be an example of a biased effect estimate. One of the most important goals of causal inference is to eliminate bias due to confounding. We’ll have a full lesson to talk about this.

What we have seen here is another example of how DAGs are both causal and statistical models. When we used our expert knowledge to draw a causal graph with no arrow from A to Y, because we know that yellow fingers don’t cause cancer, and with a common cause of A and Y, because we know that cigarette smoking causes both yellow fingers and lung cancer, we were also drawing a statistical model that says that A and Y are not expected to be independent, that they are expected to be associated. More generally, graph theory gives us a rule: we cannot exclude an association between A and Y when A and Y have a common cause, L, even if there is no arrow from A to Y. Informally, a flow of association between A and Y is expected through L. Let’s say this again, because this simple graphical rule is related to confounding, and therefore very important for causal inference: the presence of a common cause of A and Y makes us expect an association between A and Y, even if A doesn’t cause Y. Let’s now move to questions about conditional independence.
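These flow rules can also be queried programmatically. Here is a sketch using the dagitty package (which ggdag builds on), encoding the yellow-fingers graph; it also previews the conditional question we turn to next:

```r
library(dagitty)

# L (smoking) is a common cause of A (yellow fingers) and Y (lung cancer)
g <- dagitty("dag { L -> A ; L -> Y }")

dseparated(g, "A", "Y", list())  # FALSE: association flows through L
dseparated(g, "A", "Y", "L")     # TRUE: conditioning on L blocks the flow
```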

So far, we have considered the association between A and Y without conditioning on a third variable. That is, we have considered the unconditional, or marginal, association between A and Y. We will now consider the conditional association between A and Y within levels of L. For example, is there an association between yellow fingers and lung cancer among never smokers? To answer these questions, we need data on A, Y, and L. Suppose again we have a database with millions of people, and for each person we know whether they were cigarette smokers, whether they had yellow fingers, and whether they developed lung cancer. With these data, we can answer the question of whether A and Y are associated conditional on L. For example, we can restrict the analysis to the subset of individuals who are never smokers. Remember, we use a square box around a variable to indicate that we’re conditioning on it. So now we can check, in the subset of the population who are never smokers, whether there is an association between A and Y. We just check whether the proportion of individuals with lung cancer is different among those with and without yellow fingers. If the proportions are different, we will say that there is an association between A and Y conditional on L, where L is equal to never smoking. Another way to say this is that we will check whether A contains information not already included in L that allows us to predict Y better.

So if the correct DAG is one with arrows from L to A and L to Y, but no arrow from A to Y, do we expect to find an association between A, yellow fingers, and Y, lung cancer, in one particular level of L, never smokers? Well, according to this graph, the association between yellow fingers and cancer was a result of yellow fingers being a marker of smoking. Therefore, if someone is a never smoker, learning that she has yellow fingers does not provide any additional information regarding the risk of Y. Think of it in this way. If we know that a never smoker has a 1% chance of developing cancer, then learning that she has yellow fingers does not change that number. She still has a 1% chance of developing cancer. OK, one more time. Learning that someone has yellow fingers when we already know she’s not a smoker does not provide any additional information regarding her risk of lung cancer. It just makes us wonder why she has yellow fingers. She may be a painter or something. But that is not associated with the risk of lung cancer.

Now, there’s nothing special about the subset of never smokers. The same rationale applies to other subsets of the population defined by different levels of smoking. We said that L was a binary variable that can take values 1 or 0, but in practice we’ll have variables that can take many different values. What happens if we condition on each of those values? Will we still be able to say that there is no association between A and Y? Let’s say that L is cigarette smoking, but can take values from, say, 0, no smoking, to 40, being a smoker of 40 cigarettes per day. What if we condition on the value L equals 40? Will we say that there is no association between A and Y, between yellow fingers and lung cancer, among people who smoke 40 cigarettes per day? Let’s think about this. If we know that a smoker of 40 cigarettes per day has a 10% chance of developing cancer, then learning that she has yellow fingers does not change that number. She still has a 10% chance of cancer.
Therefore, there is no conditional association between A and Y within levels of L, regardless of which level of L we condition on. We can condition on L equal to never smoking or L equal to 40 cigarettes per day, and there is no association in either case. This is another example of why DAGs are both causal and statistical models. We used our expert knowledge to draw a causal graph with arrows from L to A and from L to Y, and with no arrow from A to Y. And that implied a statistical model that says that A and Y are independent conditional on L, that they are not associated within levels of L. More generally, graph theory gives us a rule that says that the flow of association between A and Y is interrupted when we condition on their common cause, L. The box around L blocks the association between A and Y. So if there is no arrow from A to Y, we say that there is no association between A and Y conditional on L. And this simple graphical rule is very important for causal inference, because if conditioning on L blocks the flow of association between A and Y, then conditioning on L is a way to fight confounding. We will have a full lesson on confounding, but first we need to learn a few more things about DAGs; the simulated sketch below makes the blocking rule concrete. Then we will be ready to discuss another causal structure.
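To see these conditional checks with numbers, here is a minimal simulation sketch. The data are entirely made up (the probabilities 0.3, 0.8, 0.1, 0.10, and 0.01 are illustrative assumptions, not estimates): L causes both A and Y, and A has no effect on Y.

```r
set.seed(42)
n <- 1e6

# L: cigarette smoking (common cause of A and Y)
L <- rbinom(n, 1, 0.3)
# A: yellow fingers, caused by L only
A <- rbinom(n, 1, ifelse(L == 1, 0.8, 0.1))
# Y: lung cancer, caused by L only (no arrow from A to Y)
Y <- rbinom(n, 1, ifelse(L == 1, 0.10, 0.01))

# Marginal association: the risk of Y differs between A = 1 and A = 0
tapply(Y, A, mean)

# Conditional on L, the risks are (up to sampling noise) equal
tapply(Y[L == 0], A[L == 0], mean)  # never smokers
tapply(Y[L == 1], A[L == 1], mean)  # smokers
```

The marginal risks differ because A is a marker of L, but within levels of L the difference disappears, exactly as the graph predicts.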

sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252   
## [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.1252    
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
## [1] bindrcpp_0.2.2   ggdag_0.1.0      ggplot2_2.2.1    DiagrammeR_1.0.0
## 
## loaded via a namespace (and not attached):
##  [1] ggrepel_0.8.0      Rcpp_0.12.17       tidyr_0.8.1       
##  [4] visNetwork_2.0.3   assertthat_0.2.0   rprojroot_1.3-2   
##  [7] digest_0.6.15      V8_1.5             ggforce_0.1.2     
## [10] R6_2.2.2           plyr_1.8.4         backports_1.1.2   
## [13] evaluate_0.10.1    blogdown_0.6       pillar_1.2.3      
## [16] rlang_0.2.1        lazyeval_0.2.1     curl_3.2          
## [19] rstudioapi_0.7     rmarkdown_1.9      labeling_0.3      
## [22] downloader_0.4     readr_1.1.1        udunits2_0.13     
## [25] stringr_1.3.1      htmlwidgets_1.2    igraph_1.2.1      
## [28] munsell_0.4.3      compiler_3.4.4     influenceR_0.1.0  
## [31] rgexf_0.15.3       xfun_0.1           pkgconfig_2.0.1   
## [34] htmltools_0.3.6    tidyselect_0.2.4   tibble_1.4.2      
## [37] gridExtra_2.3      bookdown_0.7       codetools_0.2-15  
## [40] XML_3.98-1.11      viridisLite_0.3.0  dplyr_0.7.5       
## [43] MASS_7.3-50        grid_3.4.4         jsonlite_1.5      
## [46] dagitty_0.2-2      gtable_0.2.0       magrittr_1.5      
## [49] units_0.5-1        scales_0.5.0       stringi_1.1.7     
## [52] viridis_0.5.1      brew_1.0-6         boot_1.3-20       
## [55] RColorBrewer_1.1-2 tools_3.4.4        glue_1.2.0        
## [58] tweenr_0.1.5       purrr_0.2.5        hms_0.4.2         
## [61] Rook_1.1-1         ggraph_1.0.1       yaml_2.1.19       
## [64] colorspace_1.3-2   tidygraph_1.1.0    knitr_1.20        
## [67] bindr_0.1.1
## Adding cites for R packages using knitr
knitr::write_bib(.packages(), "packages.bib")
## Warning in citation(pkg, auto = if (pkg == "base") NULL else TRUE): no date
## field in DESCRIPTION file of package 'DiagrammeR'
## Warning in citation(pkg, auto = if (pkg == "base") NULL else TRUE): could
## not determine year for 'DiagrammeR' from package DESCRIPTION file

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.