[MUSIC]. In other modules we’ve talked about

things like regression analysis and experiments to learn about causality. Why don’t we just

stop there, why isn’t that all we need to know? Well, experiments require treatments

to be randomly assigned to units, and that’s often not the case. Regression analysis requires

us to measure every possible confounder, and that’s often impossible. So what do we do if we can’t do an experiment

and can’t use regression analysis? Next, we’re going to talk about something called instrumental

variables analysis. It’s one of the oldest and most important ways for learning about

causality using observational data. Now it’s pretty complicated so pay really close attention. There are six steps involved in doing instrumental

variables analysis. Step one, we observe a variable, called the instrument, that is correlated

with the outcome variable. So remember we’ve got our outcome variable: the variable we

are trying to affect. We’ve got our treatment variable: the variable whose effect

on the outcome we are interested in learning. Now we have a third variable called the instrument,

and when we look at our data it seems to be that the instrument is correlated with the

outcome, so units that have higher levels of the outcome variable tend to have

higher levels of the instrument, for example. Or maybe it’s negatively correlated: units

with higher values of the outcome variable tend to have lower values of the instrument. Step two, we assume that the instrument does

not have a causal effect on the outcome variable, so the correlation that we see between the

instrument and the outcome is not because the instrument has a causal effect on the

outcome variable. Instead, that correlation is picking up the effect of some confounding

variable. Step three, we assume that the instrument

does have a causal effect on the treatment variable. So in step two, we assume the instrument

does not have a causal effect on the outcome but it does have a causal effect on the treatment. Step four, we assume that the instrument is

randomly assigned to units or is as-if randomly assigned. Step five, because of step four, the causal

effect of the instrument on the treatment variable is their correlation in the data.

So, here we’re thinking of a new randomized experiment where we randomly assign the instrument

to people and I want to know what’s the causal effect of the instrument on the treatment

variable. So because I randomly assigned it, I know that whatever correlation I see between

the instrument and the treatment in the data is the causal effect of the instrument on

the treatment. Step six, since the instrument is randomly

assigned by step four, it is not correlated with any other possible confounder except

for the treatment. So where does that leave us? We’ve got this

variable called the instrument that’s correlated with the outcome. The instrument doesn’t have

a causal effect on the outcome, so this correlation is not picking up a direct causal effect

of the instrument. It’s got to be picking up the causal effect of a confounder. The

instrument does have a causal effect on the treatment, so we might be picking up the causal

effect of the treatment on the outcome in this correlation here, but it could be something

else. But now the instrument’s randomly assigned and because of that it can’t be correlated

with any other confounders except the treatment. So we’ve ruled out all possible explanations

for the correlation between the instrument and the outcome except one: that there is

a causal effect of the treatment on the outcome and that’s what we are trying to get at, that

is the essence of instrumental variables analysis. [MUSIC].
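The six steps above can be tried out on simulated data. The sketch below is not from the lecture: the variable names, the effect sizes, and the binary instrument are all assumptions chosen for illustration. It builds data where an unobserved confounder `u` drives both treatment and outcome, the instrument `z` is randomly assigned (step four), `z` affects the treatment `x` (step three), and `z` has no direct effect on the outcome `y` (step two). It then compares a naive regression of outcome on treatment with the standard ratio estimator, which divides the instrument-outcome slope by the instrument-treatment slope.

```python
import numpy as np


def simulate(n=200_000, true_effect=2.0, seed=0):
    """Generate illustrative data; every coefficient here is an assumption."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                  # unobserved confounder
    z = rng.integers(0, 2, size=n)          # instrument: randomly assigned (step four)
    # Treatment depends on the instrument (step three) and on the confounder.
    x = 0.8 * z + u + rng.normal(size=n)
    # Outcome depends on treatment and confounder, with no direct z term (step two).
    y = true_effect * x + 3.0 * u + rng.normal(size=n)
    return z, x, y


def slope(a, b):
    """Slope from a simple linear regression of b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


z, x, y = simulate()

naive = slope(x, y)          # biased: picks up the confounder u
first_stage = slope(z, x)    # effect of instrument on treatment (step five)
reduced_form = slope(z, y)   # instrument-outcome association (step one)
iv_estimate = reduced_form / first_stage

print(f"naive regression slope: {naive:.3f}")
print(f"first stage (z -> x):   {first_stage:.3f}")
print(f"reduced form (z -> y):  {reduced_form:.3f}")
print(f"IV estimate of effect:  {iv_estimate:.3f}")
```

In this setup the naive slope should land well above the assumed true effect of 2.0, because it mixes in the confounder, while the ratio should land close to 2.0. That only holds because the simulation satisfies the assumptions in steps two through four by construction; with real data those assumptions have to be argued for, not checked by code.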

great simple explanation!

Thank you so much – Great explanation!

Would you please elaborate the term "confounder", you are frequently using? The material is very helpful.

Thanks so much for making this material and making it public, it has been extremely helpful for my exam review!

I don't understand why you said that the treatment doesn't really have an effect on the outcome. Is it because it needs an IV to promote an effect?

I've read everywhere else that the IV must be correlated with the independent i.e. treatment variable, yet here you say it should be correlated with the dependent variable? So confused.

This confused me even more. need an example or better explanation.

In the case when IV is positively correlated to Treatment, and Treatment is negatively correlated to Outcome variable, would we still observe correlation between IV and outcome variable?

Is the treatment variable assumed to be dichotomous?

This seems to depend upon randomization to evenly distribute other confounders, which isn't always the case. Randomization works on average, but it can fail in any given instance.

Wow! This is GREAT!

I have, at least, two questions.

1. In bullets 2, 3, and 4, you use the word "assume". Does this mean we only need to logically explain this assumption?

2. How is this different from mediation analysis of between?

I am hoping for your favorable response. Thanks.