# Key ideas, terms & concepts in Structural Equation Modeling; Patrick Sturgis (part 2 of 6)

In the first video about structural equation models I gave some background to what structural equation modeling is, the historical path that led to its development, some of the key ideas, and the ways that it can be applied in social science settings.

In this video I'm going to talk about some of the key ideas, terms and concepts in structural equation modeling. This is important because SEM is rather different to other areas of statistics: some of the ideas that are important in understanding and applying SEM are quite unfamiliar, so it's important to have a grounding and familiarity with these ideas before we move on to other applications.

So in this video I'll be talking about path diagrams, the way that we represent equations and theories in the form of diagrams in SEM. I'll talk about the difference between exogenous and endogenous variables. I'll talk about the way that structural equation modeling analyzes not the raw data but the variance-covariance matrix of the variables that we're interested in.
I'll talk a little bit about how parameters are estimated using maximum likelihood in structural equation modeling, and I'll also go over how we apply what are called parameter constraints: how we don't always estimate every parameter in the model, because some of the parameters are fixed to values before we start fitting the model. I'll also talk about how we assess the overall fit of a model in structural equation models and the importance of the idea of what are called nested models for assessing model fit. And I'll also talk a bit about identification of structural equation models; again, that's something which is linked to model fit, and something that we don't encounter so much in the regression context that many people are familiar with.

So the first thing I'm going to talk about is path diagrams,
and path diagrams are one of the reasons why structural equation modeling is so appealing to many social scientists in particular. This is because social scientists don't always have such a strong grounding in mathematics and are less comfortable with reading complex equations, and path diagrams are another way of presenting the same information as we can get in an equation. They do this visually, and that's often a clearer way of seeing what is being presented than Greek letters and symbols.

If we write our path diagrams correctly, then we can read directly between an equation and a path diagram: they tell us exactly the same things. So in this example we could write a bivariate regression equation in the usual way, where our dependent variable Y is a function of our independent variable X, and we are going to solve this equation using data, solving for the unknown parameter beta: what is the relationship between X and Y? Now we can also write that same information down in the form of a path diagram, a simple one in this case. Here Y is represented as a rectangle, X is also a rectangle, we have an arrow running from X to Y in a single direction, and we have a small circle pointing into Y which represents the error term in the equation. You can see that there's a b above the line to indicate that the parameter represented by the straight single-headed arrow is a regression coefficient. And this is quite clear visually, I think, in the sense that what the model is implying, at least, is that X causes Y, that there is some coefficient beta which summarizes what that causal effect is, and that there is an error in that equation.

There are conventions for path-diagrammatic
notation, so that we use it consistently. There are some variations in how different conventions are applied, but this is the general form: a latent variable is represented as an ellipse; an observed variable, some variable that we've actually measured directly in our dataset, is represented, as in the last slide, using a rectangle. Error variances are small circles; this is similar to how a latent variable is drawn, a circular form, but now a small circle. This indicates that the error terms are also latent variables, but we don't label them; they are a kind of unknown or residual latent variable. We also indicate the relationships between variables using lines with arrows. A curved line with an arrowhead at each end indicates a covariance between two variables; we sometimes call this a non-directional path or unanalysed relationship, because it is used to show that two variables are related to one another while our model does not specify anything about the direction of that relationship. It may be that the direction is not an important part of the theory, but we know that the two variables are associated.

Lastly, a straight line with a single arrowhead at one end indicates a
directional path, a regression coefficient. So if we use a single-headed arrow, then we are indicating the direction of the relationship between two variables in our model. And we can put these basic symbols together to form more complex models, ones which have a clear meaning and which can indeed be translated back into standard equation notation.

Here are some examples of quite simple path diagrams. These are measurement models, confirmatory factor models. We have here eta1, a latent variable shown as an ellipse, and eta1 is shown to cause three observed variables, X1 to X3. We can also think of that as eta1, the latent variable, being measured by the observed variables X1 to X3. At the top of the diagram we have three error variances, e1 to e3; those are the errors for each of those equations, so eta1 is predicting X1 with some error, predicting X2 with error, and so on.

So that's a simple path diagram for a factor model, and it could be written as an equation, but we are in this instance using a path diagram. We can extend this to make a slightly more complicated path diagram: now we have two latent variables, eta1 and eta2. These are essentially the same diagrams as we saw in the previous slide, but now we have two latent variables and six observed variables in rectangles, each of which has an error term. We've also added in a curved line with an arrowhead at each end; this shows that in our model the two latent variables are correlated with one another. We're not saying anything about the direction of the relationship between eta1 and eta2; we're just saying that we think there is some kind of relationship between them.

In this path diagram we've now introduced a theoretical statement about the direction of the relationship between eta1 and eta2. We no longer have the curved arrow but a straight line with an arrowhead at one end, so what we're saying here is that eta2 is a cause of eta1. This again would be similar to the first diagram that we saw, a bivariate regression, with eta1 regressed on eta2, and we would then have to solve for the unknown beta coefficient above the straight line with the arrowhead at the end; but this is now a bivariate regression of one latent variable onto another latent variable.

When we are building path diagrams and systems of equations in structural
equation modeling, we need to distinguish between two important kinds of variables: exogenous variables and endogenous variables. An endogenous variable, as the name suggests, is something which is caused within the system: it's a variable that has, if you like, an arrow pointing into it; it is a dependent variable in one or more equations. An exogenous variable, on the other hand, is akin to an independent variable in that terminology: it's a variable that is not caused by anything within the system of equations that we are presenting as our SEM. That doesn't mean to say that we believe exogenous variables are in some sense not caused by any other variables; it's simply that, within our own model, none of the variables in the model has a direct causal path into them. Now, an important part of SEM is that variables can be both exogenous and endogenous: we can have an arrow pointing into a variable, making it endogenous, and that variable itself can have an arrow pointing at another variable, making it exogenous in that limited sense, although it's now a different kind of variable because it has an arrow pointing into it and an arrow coming out of it. That's important because that kind of variable is a mediating variable: a variable through which another variable has an effect on a third variable.

In this path diagram, which we've already seen, we can now distinguish what kinds of variables these are. We've got two exogenous latent variables here. They're exogenous because there is no directional path pointing into either of them; neither of them therefore has an error term, and it is just a correlation that we're seeing, so these are both exogenous in the model. Again, we've seen this next path diagram before, but now we have a new distinction that we can apply to it: eta1 is endogenous and eta2 is exogenous. Eta2 doesn't have any directional path going into it and it doesn't have an error term, whereas eta1 has an error term pointing into it because it's got a directional path running from eta2.
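The translation between a simple path diagram and its regression equation can also be sketched numerically. This is a minimal illustration, not part of the lecture: it uses simulated data with hypothetical parameter values, and recovers the directional-path coefficient beta for an observed exogenous variable X and an endogenous variable Y.

```python
import numpy as np

# Illustrative sketch (hypothetical values): the path diagram X -> Y with an
# error term corresponds to the regression Y = a + b*X + e.
rng = np.random.default_rng(0)

n = 10_000
a_true, b_true = 1.0, 0.5            # hypothetical population parameters

X = rng.normal(0.0, 1.0, n)          # exogenous: no arrow points into it
e = rng.normal(0.0, 1.0, n)          # error term: the small circle pointing into Y
Y = a_true + b_true * X + e          # endogenous: X and e both point into it

# The directional-path coefficient is cov(X, Y) / var(X)
b_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
a_hat = Y.mean() - b_hat * X.mean()

print(round(b_hat, 2), round(a_hat, 2))  # estimates close to 0.5 and 1.0
```

Note that the estimate uses only a variance and a covariance of the observed variables, which anticipates the next point: the covariance matrix carries all the information the model needs.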
So a fundamental advantage of using structural equation models is this ability to represent our theories as diagrams rather than using notation which many social scientists are less comfortable with. Another, if you like, unusual feature of SEM is that, in conventional practice anyway, we don't analyze the raw data of the observed variables; we analyze the variance-covariance matrix, which we'll denote S, of those observed variables. This is kind of unusual and somewhat surprising, I think, to people when they first come across it: all the data that we need is just the set of covariances and variances of the observed variables. As we shall see in later videos, some structural equation models also use the means of the observed variables in addition to their variance-covariance matrix.

So what are we doing with this variance-covariance matrix? Well, in broad terms we are trying to summarize S, the variance-covariance matrix of the observed variables, by specifying a simpler underlying structure. We're going to specify a model which is in some ways simpler than simply reproducing S, and our SEM, the simpler underlying structure, will yield an implied variance-covariance matrix. What I mean there is that if our model is true, then the variance-covariance matrix that we observe should look like this: it should have these numbers in each of the cells. And, as we'll see later, this implied matrix can be compared to the one that we've actually observed, and that comparison, if done properly, can tell us something useful about how well our model is accounting for the data. To the extent that the implied and the observed matrices differ, our theory, i.e. our structural equation model, is not doing a very good job of telling us how these data were generated.

So, a variance-covariance matrix: probably most people will be more familiar with a correlation matrix, but here we're dealing with unstandardized variables. This matrix shows six observed variables, X1 to X6, in both the columns and the rows of the table. The diagonal, shown in bold, gives the variances: the covariance of a variable with itself, say X1 with X1, is the variance of that variable, and those are shown in bold on the main diagonal. In the other cells we see the covariances, which can be negative or positive, and you'll observe that the top part of the matrix is redundant with the bottom part, so we actually only need the lower part of this matrix.

Now an important aspect of any model
fitting, and structural equation modeling is no different, is the need to estimate the unknown parameters in our model, i.e. the betas: what is the relationship between eta1 and eta2 in the population? There are different ways of estimating these parameters; in standard regression modeling we would use ordinary least squares. In structural equation modeling, practice mainly revolves around a technique called maximum likelihood. Maximum likelihood estimates the unknown model parameters by maximizing the likelihood, which we can denote L, of a particular sample of data. The likelihood L is a mathematical function based on the joint probability of the sample observations. So in essence, maximum likelihood finds the maximum value of L for a particular sample of data, and it does that by iterating through different values for the unknown parameters until it finds the maximum of the likelihood. Once that maximum has been found, we have produced the maximum likelihood estimates of the unknown parameters.

Maximum likelihood is appealing because it is unbiased and efficient. What those terms mean is that, if we have a large sample, our estimates of the unknown parameters will be correct: they will converge upon the true values in the population. They are efficient in the sense that no other way of doing this will give us more precise estimates of those parameters. Now, those two properties, unbiasedness and efficiency, do themselves hinge on some other assumptions. One important one is that the data come from a multivariate normal distribution; essentially that requires us to be using continuous variables, so maximum likelihood is less good when the variables in our datasets that have arrows pointing into them are not continuous. In those situations we need to use different estimators, but for now I'll be focusing on the simpler case of multivariate normal data and maximum likelihood.

Another way in which maximum likelihood is used in SEM is not just in the estimation of the unknown parameters but in the use of the likelihood itself. If we take the log of the likelihood for the model, then we can use it to test how well our model fits compared to some more or less restrictive alternative. So maximum likelihood is used in two ways in SEM: one is for estimation of the unknown parameters, and, linked to that, is the use of the log likelihood to assess how well the model fits the observed data.

In most
areas of statistics that social scientists are familiar with, the focus is very much on estimating the unknown parameters in the model: we want to know what the relationship between X and Y is in the population, or possibly what the conditional association between two variables is, and so we focus on estimating those unknown parameters. That's also true of course in SEM, but in SEM we have an additional focus, which is on fixing or constraining parameters to particular values before we estimate our model, and that's a bit unusual for many people. We can fix model parameters to any values, but it tends to be the case that we fix parameters to the value 0 or the value 1; those are the most common parameter constraints that we make in SEM, and I'll come back to why we do that later. In addition to fixing parameters to these values, we can also constrain model parameters to be equal to other model parameters. We still estimate those equality-constrained parameters, but they have to be estimated so that they are the same; the model applies that constraint on the parameters that are estimated. So again, that's something which is quite unusual, and we don't really see it in many other statistical techniques that we might use in social science. The main thing that we are using these parameter constraints for is model identification, and I'll be saying some more about that soon.

Now, I said that we can use the likelihood of our model to test how well it fits the data by comparing L for one model with L for another model. When we do this, the two models that we compare have to be what is called nested, one within the other. So what do we mean by nested? Well, it is precisely that one model is a subset of the other, or the parameters in one model are a subset of the parameters in the other model. Another way of saying this is that if we have two models, A and B, then model A is the same as model B but with some additional parameter restrictions: A is B plus parameter restrictions.

To take an example here,
then, if model B has the form Y = a + b1X1 + b2X2 + e, model A will be nested within it if it has that same structure but applies a parameter constraint that the two beta coefficients are equal. We now have the property that model A is the same as model B with an added parameter constraint; it is therefore nested within model B. If we consider a third model, model C, though, where we remove X2 from the model and add Z2 instead, then model C is not nested within model B, because it isn't just model B plus parameter restrictions; it has a new variable, Z2, which is not in model B. These are, if you like, apples-and-pears models: we can't really, in any sensible way, compare the fit of model B and of model C, because they include different variables.

So I've said something about model fit already: the fact that it is based on the log of the likelihood of the model that we've estimated, and that we can compare model fit when the two models are nested. This is because if we take minus twice the difference between the log-likelihoods of model A and model B, that number is itself distributed as chi-square, and the chi-square distribution has degrees of freedom equal to the difference in the degrees of freedom for model A and model B. We can therefore use this chi-square distribution to test the fit of the first model against the second. If our value of chi-square has a p-value greater than 0.05, then we will prefer the more parsimonious model, model A, because what we're saying in this situation is that the models are not different with regard to one another's likelihood values: the likelihoods are essentially the same, and that means we prefer the model that is simpler and estimates fewer parameters. In the case where model B represents our observed data, the variance-covariance matrix, we are saying that there is no difference between the observed and the implied matrices, and our model therefore fits the data well. So that's the essence of the assessment of model fit using chi-square in structural equation models: we can look at the likelihood for one model, compare it to the likelihood for a nested model, and make a statistical test of whether one fits the data better than the other.

So the last thing I'm going to talk
about in this video is model identification. This is all linked with the things I've already discussed: fixing parameters to particular values, assessing model fit, and so on. So what is model identification? Well, in conceptual terms, we need to have enough known pieces of information in an equation to produce unique estimates of the unknown parameters, and we need unique estimates because otherwise we don't know which ones to prefer. To give an example of what we mean by the balance between known and unknown pieces of information, look at these two equations. The first of them is unidentified: we have X + 2Y = 7, and what we would want to do is find the unique value that satisfies that equation for Y. Because of the balance of knowns and unknowns, X and Y could take on many, many different values, and all of them would be true, if you like, in the sense of the equation holding; that equation is unidentified because it doesn't enable us to produce unique estimates. If we change the equation slightly, so that X is no longer an unknown and we make it 3, then there is only one value for Y, namely 2, that will satisfy the equation, so that equation is identified. That is the essence of what we need to understand about identification: it's to do with the balance between the number of known and unknown pieces of information in an equation.

There is something else to know about identification, which is that it is a theoretical property of the model; it's not really linked to the data as such. We can figure out the identification status of a particular model without having any data or estimating any parameters. But it's also true to say that a model can be theoretically identified yet empirically unidentified given a particular set of data. So we are looking at the balance between the known and the unknown pieces of information in our equations, and in SEM the known pieces of information are the variances and covariances (and the means, if we're using means in our model) of the observed variables. The unknown pieces of information are the parameters that we want to estimate in the model.

Now models can have different identification statuses. A
model can have, as we saw a moment ago, more unknowns than knowns; that means it is unidentified, and we can't produce unique values for the unknown parameters. Other models can be just-identified, where the number of knowns is equal to the number of unknowns; we don't have any of what we call over-identifying restrictions on the model, and therefore for just-identified models we don't have any likelihood-based test of the model's fit. Most of the models that people are familiar with using, again ordinary least squares regression and those kinds of models, are just-identified.

The third level of identification status is over-identified models, and that's usually what we are trying to get to in SEM: the number of knowns is greater than the number of unknown parameters in the model, which means that we can assess the fit of the model as well as estimating the unknown parameters.

There are different ways we can assess the identification status of a model. A very simple one these days, with modern computers, is simply to run our model: most software will tell us what the identification status of the model is, even before we fit it to the data. So it's quite easy compared to how things were done in the past. Nonetheless, it is still useful to give some consideration to the identification status of a model, as it helps us to understand where things might be going wrong: if we have a problem and our model is unidentified, working through it in this way can help us to see why. So here is the counting rule that can be used. Let s be the number of observed variables in the model; then the number of non-redundant elements in the variance-covariance matrix is given by the equation s(s + 1)/2. Let t be the number of parameters that we are going to estimate in the model, the number of unknown parameters. If t is greater than the answer to this equation, then our model is unidentified: we have more unknown parameters than we have non-redundant pieces of information. If t equals it, the model is just-identified, and if t is less, we have an over-identified model.
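The counting rule just described can be sketched as a small helper function. This is a minimal illustration of the rule as stated; the two cases checked at the end (three indicators with six free parameters, four indicators with eight) are the worked examples used in this lecture.

```python
def identification_status(s: int, t: int) -> str:
    """Counting rule: with s observed variables, the data supply
    s*(s+1)/2 non-redundant variances and covariances (the knowns);
    t is the number of free parameters to estimate (the unknowns)."""
    known = s * (s + 1) // 2
    df = known - t                    # degrees of freedom of the model
    if df < 0:
        return f"unidentified (df = {df})"
    if df == 0:
        return "just-identified (df = 0)"
    return f"over-identified (df = {df})"

# Three indicators, six free parameters:
print(identification_status(3, 6))   # just-identified (df = 0)
# Four indicators, eight free parameters:
print(identification_status(4, 8))   # over-identified (df = 2)
```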
So to give an example of that, here is the path diagram that we saw earlier, where we have eta1, a latent variable, which is measured by, or causing, three observed variables, and each of those observed variables has an error variance.
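This model's implied variance-covariance matrix can be written down directly. Here is a minimal sketch with purely illustrative parameter values (the loadings, factor variance and error variances are hypothetical, not from the lecture) for a one-factor model with three indicators, where the first loading is fixed to 1.

```python
import numpy as np

# Hypothetical parameter values for the one-factor, three-indicator model.
lam = np.array([1.0, 0.8, 0.6])      # factor loadings; the first is fixed to 1
phi = 2.0                            # variance of the latent variable eta1
theta = np.array([0.5, 0.4, 0.3])    # error variances e1..e3

# Model-implied variance-covariance matrix:
#   Sigma = phi * lam lam' + diag(theta)
Sigma = phi * np.outer(lam, lam) + np.diag(theta)
print(Sigma)

# Sigma is symmetric, so it has s(s+1)/2 = 6 non-redundant elements:
# exactly as many knowns as this model has free parameters.
```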
So if we want to find the number of non-redundant elements, we can use our s(s + 1)/2 equation. Here s equals 3, so s(s + 1) is 3 × 4 = 12; half of that gives us 6 non-redundant elements. Now, how many parameters are we trying to estimate with this model? Well, 3 variances, one for each error term; then 2 factor loadings, because one of the three loadings, you'll see, is constrained to 1. We're fixing that loading for identification of the model, so we're not estimating it, but we are estimating the other two; and lastly we have the variance of the latent variable. So 3 + 2 + 1 is 6 parameters to be estimated, which is the same as the number of non-redundant elements. This model therefore has zero degrees of freedom: it is just-identified. We can estimate the unknown parameters, but we do not have any way of assessing the fit of this model, because it's just-identified, with no degrees of freedom.

Now something else that's important to understand about
identification is that we, as the analyst, can control to some degree the identification status of our model. For a model like the one we just saw that's just-identified, or for a model that is under-identified, we can do this by adding more known pieces of information to the equation, or by removing some unknown pieces of information: removing parameters that are to be estimated by adding constraints. So if we were to constrain two of the parameters in the model to be equal to one another, say two of the factor loadings, we would be estimating one parameter where previously we were estimating two; we've removed one unknown and gained one degree of freedom. We can see the first approach in this model here, where we have added an additional observed variable to the previous path diagram. The model is essentially the same, but we've got a fourth observed variable, X4, so we are now estimating an additional factor loading and an additional error variance, but we also have more known pieces of information. If we use our s(s + 1)/2 equation, s is now 4, so s(s + 1) becomes 4 × 5 = 20; we take half of that, and now we have 10 non-redundant elements for this model. We have 4 + 3 + 1 = 8 parameters to be estimated, so 10 − 8 gives us 2 degrees of freedom. By adding that fourth observed variable, our model is now over-identified, and we can say something about the fit of the model to the variance-covariance matrix that we observed.

So in that example we changed the identification status of our
model by adding another known piece of information, another observed variable. Another way of changing identification status is to remove unknown parameters. In this example we are now not estimating the two factor loadings that we were in the first example: you can see there's a number 1 next to each of the arrows for the factor loadings, so rather than estimating those, we're fixing them all to 1. This may not be a very theoretically meaningful thing to do, but that isn't the point at this particular juncture; what we're showing here is that you can change the identification status of the model by removing unknowns. We still have six non-redundant elements, but we are now only estimating 4 unknown parameters, because we are not estimating any of the factor loadings, so now this model is over-identified.

So in this video I have covered some of the important ideas and concepts that learners will need to take with them into later videos and applications. These are focused around the use of path diagrams for representing our theories and their equations; the fact that we analyze the variance-covariance matrix of the observed variables rather than the raw data; and our use, for the most part, of maximum likelihood estimation, which has quite restrictive assumptions about multivariate normality but is nonetheless a very useful estimator: it gives us consistent, unbiased and efficient estimates of the unknown model parameters and allows us to do global tests of the fit of the model to our data. Those kinds of fit tests are mainly applicable in the context where models are nested, where we can say that one model is a subset of a second model: that it is the same as the second model with some additional parameter restrictions. And I talked about identification of models: models can be under-identified, just-identified or over-identified, and we as analysts can exert some control over the identification status of our model by removing unknown parameters or adding in more known pieces of information.