7 - Linear mixed effects models#
Dealing with repeated measurements#
This week we expand our modelling repertoire to deal with an important issue in psychology and behavioural science - what happens when you have repeated measurements in your data?
You might remember the assumption of independent errors. This subtle assumption means, simply, that a standard GLM expects each row of the data to be unrelated to the others. If this is the case, then each residual is also independent.
But very often in psychological datasets we have repeated measurements, where participants are measured multiple times, or give us many responses on different variables. If we ignore this, our coefficients and (especially) our standard errors will be off, and by extension our predictions and how sure we are about them.
Linear mixed effects models let us deal with these kinds of data, and let us build complex models that investigate individual differences clearly when participants give us a lot of repeated data. How they do this can be confusing, but we can work through code-based examples to see how.
We need to import all our usual packages to investigate these models:
# Import what we need
import pandas as pd # dataframes
import seaborn as sns # plots
import statsmodels.formula.api as smf # Models
import marginaleffects as me # marginal effects
import numpy as np # numpy for some functions
import statsmodels.tools.eval_measures as measures
# Set the style for plots
sns.set_style('whitegrid')
sns.set_context('talk')
Professors and repeated measurements#
In an earlier example we investigated the effect of instructor attractiveness on teaching evaluations. At the time, we noted that this dataset actually violated the assumption of independent errors, but carried on in the interests of learning!
Now we will use an appropriate model to examine this relationship in more detail, see what happens when we actually meet the assumption, and show how to use these models in practice. First, let's read in the data and fit a standard GLM, which we know is incorrect:
# Read in data
profs = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/AER/TeachingRatings.csv')
# Display data
display(profs.head(), profs.tail())
# Fit a GLM predicting evaluation from beauty
glm = smf.ols('eval ~ scale(beauty)', data=profs).fit()
glm.summary(slim=True)
| | rownames | minority | age | gender | credits | beauty | eval | division | native | tenure | students | allstudents | prof |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | yes | 36 | female | more | 0.289916 | 4.3 | upper | yes | yes | 24 | 43 | 1 |
1 | 2 | no | 59 | male | more | -0.737732 | 4.5 | upper | yes | yes | 17 | 20 | 2 |
2 | 3 | no | 51 | male | more | -0.571984 | 3.7 | upper | yes | yes | 55 | 55 | 3 |
3 | 4 | no | 40 | female | more | -0.677963 | 4.3 | upper | yes | yes | 40 | 46 | 4 |
4 | 5 | no | 31 | female | more | 1.509794 | 4.4 | upper | yes | yes | 42 | 48 | 5 |
| | rownames | minority | age | gender | credits | beauty | eval | division | native | tenure | students | allstudents | prof |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
458 | 459 | no | 32 | male | more | 1.231394 | 3.2 | lower | yes | yes | 9 | 21 | 93 |
459 | 460 | no | 32 | male | more | 1.231394 | 4.3 | upper | yes | yes | 52 | 86 | 93 |
460 | 461 | yes | 42 | female | more | 0.420400 | 3.3 | upper | no | yes | 52 | 67 | 94 |
461 | 462 | yes | 42 | female | more | 0.420400 | 3.2 | upper | no | yes | 54 | 66 | 94 |
462 | 463 | yes | 42 | female | single | 0.420400 | 4.1 | lower | no | yes | 28 | 35 | 94 |
Dep. Variable: | eval | R-squared: | 0.036 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.034 |
No. Observations: | 463 | F-statistic: | 17.08 |
Covariance Type: | nonrobust | Prob (F-statistic): | 4.25e-05 |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.9983 | 0.025 | 157.727 | 0.000 | 3.948 | 4.048 |
scale(beauty) | 0.1048 | 0.025 | 4.133 | 0.000 | 0.055 | 0.155 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
As before, we see a significant effect of beauty on evaluations. But if we look at the data display we can see the `prof` variable, which identifies the professor being evaluated, has multiple rows per professor - so we know our GLM is in trouble!
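A quick check makes this repeated structure obvious - a small sketch counting how many rows (evaluated courses) each professor contributes, using the `profs` dataframe we just loaded:

```python
# Count how many evaluations each professor contributes
profs['prof'].value_counts().head()
```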
We can correct this by using the `mixedlm` command. We can do an awful lot with this command, fitting complex models to repeated measures data, but first let's see how to make a model aware that there are repeated measures:
# Using a mixed model
mix = smf.mixedlm('eval ~ scale(beauty)', # formula is the same
groups='prof', # groups is the variable that 'keeps track' of which row belongs to which professor - or whatever group!
data=profs).fit()
mix.summary()
Model: | MixedLM | Dependent Variable: | eval |
---|---|---|---|
No. Observations: | 463 | Method: | REML |
No. Groups: | 94 | Scale: | 0.1705 |
Min. group size: | 1 | Log-Likelihood: | -321.9997 |
Max. group size: | 13 | Converged: | Yes |
Mean group size: | 4.9 |
| | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.929 | 0.045 | 88.145 | 0.000 | 3.841 | 4.016 |
scale(beauty) | 0.091 | 0.042 | 2.147 | 0.032 | 0.008 | 0.174 |
prof Var | 0.139 | 0.075 |
We can note several things from this output:

- `No. Groups` - the model has identified there are 94 unique professors in the data.
- `Min. group size` - this means there is at least one professor who had just one rating/course that was evaluated.
- `Max. group size` - this means there is at least one professor who had 13 ratings/courses that were evaluated.
- `Mean group size` - on average, each professor was rated 4.9 times.
- `Scale` - this is the variance of the residuals.
- `prof Var` - there is now a new output! This interesting line tells us how much variation in our evaluation ratings is due to differences amongst professors - neat!
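To put a number on that last point, we can compute the share of the total variance attributable to professors (often called the intraclass correlation). A minimal sketch, assuming the `mix` model fitted above:

```python
# Share of variance in evaluations due to professors:
# random-intercept variance / (random-intercept variance + residual variance)
prof_var = mix.cov_re.iloc[0, 0]   # the 'prof Var' estimate
resid_var = mix.scale              # the 'Scale' (residual) estimate
icc = prof_var / (prof_var + resid_var)
print(f'Share of variance due to professors: {icc:.2f}')
```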
Notice however that the results differ from the GLM - the mixed effects model has slightly different coefficients, wider intervals, and different p-values. This is only a simple example, but the differences can be much larger in practice!
Is it worth all this extra stress though? If we examine the RMSE of both models we’ll see why it is…
# RMSE for both models
glm_rmse = measures.rmse(glm.model.endog, glm.fittedvalues)
mix_rmse = measures.rmse(mix.model.endog, mix.fittedvalues)
display(f'GLM RMSE: {glm_rmse:.2f}', f'Mixed Model RMSE: {mix_rmse:.2f}')
'GLM RMSE: 0.54'
'Mixed Model RMSE: 0.38'
Random effects - how mixed models work#
How do these models work? What do they do that allows us to incorporate the repeated measures?
They incorporate a clever statistical trick known as ‘random effects’, and these come in two types - a ‘random intercept’, and a ‘random slope’.
Random intercepts#
What is a random intercept? Quite simply, it is an intercept estimated for each specific group in the data. In our case, each professor gets their own intercept. While there is indeed a global intercept, each professor has their own, and these represent - compared to the global intercept - whether that professor sits above or below on their teaching evaluations. Let's take a look at them:
# Retrieve the intercepts - they are stored as a dictionary in the `.random_effects` attribute of the model. We can convert them into a DataFrame like so
professor_effects = pd.DataFrame.from_dict(mix.random_effects, orient='index')
display(professor_effects.head())
# histogram of random effects
sns.histplot(professor_effects)
| | prof |
---|---|
1 | 0.028885 |
2 | -0.219941 |
3 | -0.255526 |
4 | 0.140606 |
5 | 0.204728 |
<Axes: ylabel='Count'>

You can see they are centred around zero, because they are 'offset' from the global intercept. If we wanted to, we could add the global intercept to these and obtain each professor's actual intercept - same information, just represented differently.
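A minimal sketch of recovering each professor's own intercept, using the `mix` model and the `professor_effects` dataframe from above:

```python
# Each professor's own intercept = global intercept + their offset
actual_intercepts = mix.params['Intercept'] + professor_effects['prof']
actual_intercepts.head()
```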
The key thing to retain here is that the model is ‘figuring out’ multiple regression lines - in fact, one regression line per professor. They all have the same slope, but they all start from a different intercept point. Here’s a plot to visualise exactly what’s going on:
# Plot all random effects and data
import matplotlib.pyplot as plt
with sns.plotting_context('poster'):

    # Canvas
    fig, ax = plt.subplots(1, 1, figsize=(15, 10))

    # Data scatter
    sns.scatterplot(data=profs, x='beauty', y='eval', color='black', alpha=.1, ax=ax)

    # Add the global line of best fit, which we obtain from predictions
    preds = me.predictions(mix)
    sns.lineplot(data=preds, x='beauty', y='estimate', color='black', linewidth=10, label='Global Regression Line')

    # A bit of trickery to get each professor's line!
    for intercept in professor_effects.itertuples():
        ax.plot(
            (x := np.linspace(profs['beauty'].min(), profs['beauty'].max(), 100)),
            (mix.params['Intercept'] + intercept.prof) + mix.params['scale(beauty)'] * x,
            linewidth=.5, alpha=.3, color='blue', label='An individual professor'
        )

This is how the model deals with the assumption. It fits separate regression lines for each professor, who all have the same slope for beauty but have different intercepts.
The mathematics behind these models is very elegant, but we need not concern ourselves with it here. We can think of what the model does in two interchangeable ways:

- The model figures out a global intercept and then tries its best to provide a best-fitting offset for each professor, or
- The model gives every professor their own intercept and then averages them together as best it can to get the global intercept.
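For the curious, a standard way of writing the random-intercept model we just fitted (purely for orientation - we never need this to run the code) is:

$$
\text{eval}_{ij} = (\beta_0 + u_j) + \beta_1\,\text{beauty}_{ij} + \varepsilon_{ij}, \qquad u_j \sim \mathcal{N}(0, \sigma^2_{\text{prof}}), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

where $j$ indexes professors and $i$ their ratings: $u_j$ is professor $j$'s offset from the global intercept $\beta_0$, $\sigma^2_{\text{prof}}$ is the 'prof Var' from the summary, and $\sigma^2$ is the 'Scale'.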
Random intercepts AND random slopes#
While a random intercept is sufficient for us to fix the assumption of independent errors, there is far more to linear mixed models than just this. In the following example we’ll see how we can use their flexibility to map individual differences with intercepts and slopes. This is where things can get tricky, so we will take it slow.
We are going to examine data from a sleep-deprivation study that illustrates how sleep restriction (~3 hours a night) affects reaction times on a task. Each participant's reaction time was measured at baseline (day zero) before they started 9 days of sleep restriction. First, let us read in this data and take a look at the structure:
# Read in sleep data and look at structure
sleep = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/lme4/sleepstudy.csv')
# Show top 15
sleep.head(15)
| | rownames | Reaction | Days | Subject |
---|---|---|---|---|
0 | 1 | 249.5600 | 0 | 308 |
1 | 2 | 258.7047 | 1 | 308 |
2 | 3 | 250.8006 | 2 | 308 |
3 | 4 | 321.4398 | 3 | 308 |
4 | 5 | 356.8519 | 4 | 308 |
5 | 6 | 414.6901 | 5 | 308 |
6 | 7 | 382.2038 | 6 | 308 |
7 | 8 | 290.1486 | 7 | 308 |
8 | 9 | 430.5853 | 8 | 308 |
9 | 10 | 466.3535 | 9 | 308 |
10 | 11 | 222.7339 | 0 | 309 |
11 | 12 | 205.2658 | 1 | 309 |
12 | 13 | 202.9778 | 2 | 309 |
13 | 14 | 204.7070 | 3 | 309 |
14 | 15 | 207.7161 | 4 | 309 |
Each participant thus provides 10 days' worth of reaction time data (days zero to nine). If we were to analyse this with a GLM, our inferences and predictions would be incorrect, but a linear mixed model will be of great use. We can in fact plot the variation in reaction time, across days, for each participant, to give us a sense of what we're trying to do:
# Plot all data
fig = sns.relplot(data=sleep,
x='Days', y='Reaction',
col_wrap=6,
s=250, color='black',
alpha=.5,
col='Subject')

As can be seen, there's a lot of variability here. Let's first use the (incorrect) GLM to predict how reaction time changes with increasing time spent with poor sleep, and see how it looks:
# An incorrect GLM
glm = smf.ols('Reaction ~ Days', data=sleep).fit()
# Get predictions, adding them to the dataset
sleep['glm_predictions'] = glm.fittedvalues  # an easy, quick way of obtaining just the predictions, no extra bits
# Add the predictions to the plot
fig2 = sns.relplot(data=sleep,
x='Days', y='Reaction',
col_wrap=6,
s=250, color='black',
alpha=.5,
col='Subject').map(sns.lineplot, 'Days', 'glm_predictions', color='black').add_legend()

# And take a look at the GLM output
glm.summary(slim=True)
Dep. Variable: | Reaction | R-squared: | 0.286 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.282 |
No. Observations: | 180 | F-statistic: | 71.46 |
Covariance Type: | nonrobust | Prob (F-statistic): | 9.89e-15 |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 251.4051 | 6.610 | 38.033 | 0.000 | 238.361 | 264.449 |
Days | 10.4673 | 1.238 | 8.454 | 0.000 | 8.024 | 12.911 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
As can be seen above, the line is exactly the same for each person. The GLM is blind to the fact each person has different data, and so we should not trust the results.
We can add a random intercept to this data as we did for the professor dataset, by telling the `mixedlm` command the name of the "group" variable - that is, the `Subject` column. How does that look?
# Fit a mixed model with a random intercept
intercept = smf.mixedlm('Reaction ~ Days',
groups='Subject',
data=sleep).fit()
# Get the predictions
sleep['intercept_predictions'] = intercept.fittedvalues
# Add the predictions to the plot
fig2 = (sns.relplot(data=sleep,
x='Days', y='Reaction',
col_wrap=6,
s=250, color='black',
alpha=.5,
col='Subject')
.map(sns.lineplot, 'Days', 'glm_predictions', color='black', alpha=.1, label='GLM')
.map(sns.lineplot, 'Days', 'intercept_predictions', color='blue', label='Mixed Model - Intercepts')
.add_legend()
)

# And take a look at the model summary
intercept.summary()
Model: | MixedLM | Dependent Variable: | Reaction |
---|---|---|---|
No. Observations: | 180 | Method: | REML |
No. Groups: | 18 | Scale: | 960.4568 |
Min. group size: | 10 | Log-Likelihood: | -893.2325 |
Max. group size: | 10 | Converged: | Yes |
Mean group size: | 10.0 |
| | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 251.405 | 9.747 | 25.794 | 0.000 | 232.302 | 270.508 |
Days | 10.467 | 0.804 | 13.015 | 0.000 | 8.891 | 12.044 |
Subject Var | 1378.176 | 17.156 |
Notice how the intercepts in the graph alter, but not the slopes. Participant 369 shows little difference, for example - but the varying intercepts allow better predictions for Participants 337 and 352. The model is now 'aware' of the differences and has incorporated them. Also notice that the results here are relatively unchanged! The coefficients are broadly similar and still significant - but the mixed model's predictions are clearly more accurate!
The random slope#
Now for the crowning glory - while we can include random intercepts for all our participants, we can also include a random slope. This means we can let the slope of `Days` vary for each person (or indeed, the slope of any other predictor). This requires us to modify our `mixedlm` command a little. We now include the `re_formula` keyword, and pass it the name of the column we'd like to have slopes for, as shown below. Note that this also includes the random intercepts - only in very rare circumstances would you exclude intercepts and fit only random slopes.
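In the same notation as the professor example, the model we are about to fit can be written as:

$$
\text{Reaction}_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{Days}_{ij} + \varepsilon_{ij}
$$

where $u_{0j}$ and $u_{1j}$ are participant $j$'s intercept and slope offsets, each with its own variance (and, as we will see in the output, a covariance between them).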
# Our final model
slopes = smf.mixedlm('Reaction ~ Days',
groups='Subject',
re_formula='Days', # Where the magic happens
data=sleep).fit()
# Extract predictions
sleep['slope_predictions'] = slopes.fittedvalues
# And show
fig3 = (sns.relplot(data=sleep,
x='Days', y='Reaction',
col_wrap=6,
s=250, color='black',
alpha=.5,
col='Subject')
.map(sns.lineplot, 'Days', 'glm_predictions', color='black', alpha=.1, label='GLM')
.map(sns.lineplot, 'Days', 'intercept_predictions', color='blue', alpha=.1, label='Mixed Model - Intercepts')
.map(sns.lineplot, 'Days', 'slope_predictions', color='gold', label='Mixed Model - Intercepts + Slopes')
.add_legend()
)

# Check the output
slopes.summary()
Model: | MixedLM | Dependent Variable: | Reaction |
---|---|---|---|
No. Observations: | 180 | Method: | REML |
No. Groups: | 18 | Scale: | 654.9405 |
Min. group size: | 10 | Log-Likelihood: | -871.8141 |
Max. group size: | 10 | Converged: | Yes |
Mean group size: | 10.0 |
| | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 251.405 | 6.825 | 36.838 | 0.000 | 238.029 | 264.781 |
Days | 10.467 | 1.546 | 6.771 | 0.000 | 7.438 | 13.497 |
Subject Var | 612.096 | 11.881 | ||||
Subject x Days Cov | 9.605 | 1.821 | ||||
Days Var | 35.072 | 0.610 |
The random slopes and intercepts together improve the predictions overall, correcting many of the errors that the random intercepts miss. Consider participants 308 and 309 - while the blue line does its best to fit their data, only by altering the slope can we properly account for their reaction time increases. Notice again that the output of this model is, for all intents and purposes, the same as the GLM! The coefficients are still significant and similar in size. Does this mean the simpler models are just as good? No! Model structure is far more important than p-values. The RMSE of the models is revealing:
# RMSE
display(
'GLM RMSE: ', measures.rmse(glm.model.endog, glm.fittedvalues),
'Mixed Model Intercepts Only RMSE: ', measures.rmse(intercept.model.endog, intercept.fittedvalues),
'Mixed Model Intercepts + Slopes RMSE: ', measures.rmse(slopes.model.endog, slopes.fittedvalues)
)
'GLM RMSE: '
47.448897509757494
'Mixed Model Intercepts Only RMSE: '
29.41062417621592
'Mixed Model Intercepts + Slopes RMSE: '
23.43805168478719
Note also the extra information present in the output of the model above. The `Subject Var` and `Days Var` rows tell us, respectively, how much variance in reaction time comes from differences between people, and how much people vary in how their reaction times change from day to day. One is far higher than the other - this suggests that people are more variable in their baseline reaction time than in how poor sleep changes it. It's also possible to do very advanced analyses such as correlating the random effects - that is, we can obtain each participant's random intercept and slope, and correlate them, like so:
# Obtain random effects in a dataframe
ranefs = pd.DataFrame.from_dict(slopes.random_effects, orient='index')
display(ranefs.head(7),
ranefs.corr())
| | Subject | Days |
---|---|---|
308 | 2.258584 | 9.198969 |
309 | -40.398695 | -8.619686 |
310 | -38.960356 | -5.448864 |
330 | 23.690554 | -4.814339 |
331 | 22.260257 | -3.069902 |
332 | 9.039549 | -0.272174 |
333 | 16.840475 | -0.223631 |
| | Subject | Days |
---|---|---|
Subject | 1.000000 | 0.263547 |
Days | 0.263547 | 1.000000 |
sns.scatterplot(x='Subject', y='Days', data=ranefs)
<Axes: xlabel='Subject', ylabel='Days'>

This suggests that those with higher starting baselines (i.e., those who are slower to begin with) show bigger increases in their reaction times with more time spent sleep deprived.
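The model also estimates this relationship directly - the `Subject x Days Cov` row in the summary is the covariance between the random intercepts and slopes. A small sketch of turning that into a correlation, using the fitted `slopes` model (note this model-based estimate need not exactly match the correlation of the extracted effects above):

```python
# Model-implied correlation between random intercepts and slopes,
# computed from the estimated random-effects covariance matrix
cov_re = slopes.cov_re  # 2 x 2: intercept and Days variances, plus their covariance
corr_re = cov_re.iloc[0, 1] / np.sqrt(cov_re.iloc[0, 0] * cov_re.iloc[1, 1])
print(f'Model-implied intercept-slope correlation: {corr_re:.2f}')
```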
Building complexity with several random intercepts and random slopes#
Using linear mixed models to identify individual differences#
We turn now to a more complex example and push the capabilities of `statsmodels`. About ten years ago I was researching the effects of cosmetics on attractiveness as part of a collaboration with industry. One of the things our partners were interested in was how cosmetics impacted models differently - do some people get a bigger benefit from makeup than others, and how does that depend on their initial level of attractiveness? For example, one might imagine more attractive models gain 'less' from cosmetics than less attractive models do. We didn't realise it at the time, but a linear mixed model would allow us to answer this question easily (in fact, it was not so easy to fit these models ten years ago!). The original paper is here.
To test this, we had a set of participants rate around 45 models for attractiveness, photographed with and without cosmetics. Our model is thus very simple, actually much like a t-test - `ratings ~ cosmetics` - how much does attractiveness change with makeup?
However, there are many sources of variability we didn’t properly control for in this data.
The first is the variability from participants, who will vary in how much they like the models. In fact we simply removed this variability by averaging - a terrible decision in hindsight. We can incorporate this as a random intercept for participants.
The second is the baseline level of attractiveness of each model. That is, how much more or less attractive than the average is each model? This can be incorporated as a random intercept for models - notice now we have two random intercepts!
The third is the effect of cosmetics on each model. That is, how much does the attractiveness of each model change with an application of cosmetics? In our initial analysis we addressed this in a very roundabout, indirect way. But with mixed models, this is simply incorporated as a random slope for cosmetics for each model.
Let's see if our conclusions hold when we use a proper model. Doing this will involve some extra steps to coerce the model to fit. First, let's read in the data and take a look at it:
# Read in data
cosmetics = pd.read_csv('https://raw.githubusercontent.com/alexjonesphd/py4psy2024/refs/heads/main/jones_kramer_2015.csv')
# Show top 5 rows
display(cosmetics.head())
| | pid | model_id | cosmetics | rating | cosmetics_code |
---|---|---|---|---|---|
0 | 44 | 1 | without | 2 | 0 |
1 | 44 | 2 | with | 5 | 1 |
2 | 44 | 3 | without | 5 | 0 |
3 | 44 | 4 | without | 3 | 0 |
4 | 44 | 5 | without | 3 | 0 |
The `pid` column represents the participant, `model_id` is the model identifier, the `cosmetics` and `cosmetics_code` columns show whether the instance was presented with/without makeup, and `rating` is the attractiveness rating assigned by that participant, to that model, in that makeup condition. We can visualise the overall pattern:
# Show overall
sns.pointplot(data=cosmetics, x='cosmetics', y='rating')
<Axes: xlabel='cosmetics', ylabel='rating'>

But this ignores the variability in the dataset from models and participants. Let's see how we can fit the properly structured model.

To do this we must use an approach called variance components, which allows us to directly tell the model which ratings belong to which source (e.g., model, participant). The first step is to create a new variable that takes the same value for every row - that is, a grouping variable that puts all the data in one big group, which we can then partition up. This is easy:
# Create a variable with a single number for everyone
cosmetics['grouping'] = 1
# Show
display(cosmetics.head(), cosmetics.tail())
| | pid | model_id | cosmetics | rating | cosmetics_code | grouping |
---|---|---|---|---|---|---|
0 | 44 | 1 | without | 2 | 0 | 1 |
1 | 44 | 2 | with | 5 | 1 | 1 |
2 | 44 | 3 | without | 5 | 0 | 1 |
3 | 44 | 4 | without | 3 | 0 | 1 |
4 | 44 | 5 | without | 3 | 0 | 1 |
| | pid | model_id | cosmetics | rating | cosmetics_code | grouping |
---|---|---|---|---|---|---|
2998 | 30 | 29 | without | 3 | 0 | 1 |
2999 | 30 | 30 | with | 4 | 1 | 1 |
3000 | 30 | 31 | with | 5 | 1 | 1 |
3001 | 30 | 32 | without | 4 | 0 | 1 |
3002 | 30 | 33 | with | 5 | 1 | 1 |
Literally every row now has a '1' in it. That's our grouping variable. Next, we have to use some curious syntax to specify our random effects.

We do this via a dictionary, and we give each source of variability a name. Then we use different expressions to indicate a random intercept or slope:

- A random intercept would look like this: `0 + C(variable_name)` - we add a `C()` around the name to force Python to recognise it as a categorical variable.
- A random slope would look like this: `0 + C(variable_name):predictor_name`. This is like saying 'there needs to be a random slope of `predictor_name` for each individual in the `variable_name` group'.
We want to have a random intercept for participants and models, and a random slope of cosmetics for each model, so:
# random effects
re = {'pid': '0 + C(pid)',
'model_id': '0 + C(model_id)',
'cosmetics_model': '0 + C(model_id):cosmetics_code'}
Then we specify them in our model (as well as our fixed grouping factor) like so:
# Fit our fancy model
cosmetics_model = smf.mixedlm('rating ~ cosmetics_code',
groups='grouping', # the constant grouping number
vc_formula=re, # the dictionary of vc
data=cosmetics).fit()
# Show the results
cosmetics_model.summary()
Model: | MixedLM | Dependent Variable: | rating |
---|---|---|---|
No. Observations: | 3003 | Method: | REML |
No. Groups: | 1 | Scale: | 1.5169 |
Min. group size: | 3003 | Log-Likelihood: | -5083.8173 |
Max. group size: | 3003 | Converged: | Yes |
Mean group size: | 3003.0 |
| | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.240 | 0.149 | 21.809 | 0.000 | 2.949 | 3.531 |
cosmetics_code | 1.162 | 0.132 | 8.778 | 0.000 | 0.902 | 1.421 |
cosmetics_model Var | 0.509 | 0.118 | ||||
model_id Var | 0.549 | 0.119 | ||||
pid Var | 0.401 | 0.055 |
Our model shows the mean difference between cosmetics conditions is significant. The `with` condition, on average, is rated about 1.162 units higher than the `without` condition.
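As a quick sanity check, we can compare this against the raw condition means, which ignore the repeated-measures structure entirely (so they won't match the model estimate exactly):

```python
# Raw mean rating in each cosmetics condition, ignoring participants and models
cosmetics.groupby('cosmetics')['rating'].mean()
```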
It’s now possible to check the variance the model calculates to answer part of our initial question. We can add up all the sources of variability in the model:
- Cosmetics (due to the random slope)
- Model (variation due to baseline attractiveness of models)
- PID (variation due to participants having different tastes)
- Scale (unexplained variation/error)

If we add those up and then divide a single component by the total, we'll see how much that source contributes:
# Get total variance by adding the vcomp + scale together
total_variance = cosmetics_model.vcomp.sum() + cosmetics_model.scale
# How much is due to makeup altering faces differently?
makeup = 0.509 / total_variance
# How much is due to models varying in attractiveness?
models = 0.549 / total_variance
print(f'Due to makeup: {makeup:.0%}, due to models: {models:.0%}')
Due to makeup: 17%, due to models: 18%
Oh dear. The last line of our abstract was ‘…However, the effect of individual differences in facial appearance is ultimately more important in perceptions of attractiveness’. That's flat wrong. It appears both of these sources contribute similarly to the variation in attractiveness ratings!
One additional step is to correlate the intercepts and slopes for each model. Getting these is a bit tricky, but can be done like so:
# Get all random effects
ranefs = pd.DataFrame.from_dict(cosmetics_model.random_effects, orient='columns')
# Show
ranefs
| | 1 |
---|---|
cosmetics_model[C(model_id)[1]:cosmetics_code] | 0.361547 |
cosmetics_model[C(model_id)[2]:cosmetics_code] | 0.602184 |
cosmetics_model[C(model_id)[3]:cosmetics_code] | -0.586605 |
cosmetics_model[C(model_id)[4]:cosmetics_code] | 0.849587 |
cosmetics_model[C(model_id)[5]:cosmetics_code] | -0.150157 |
... | ... |
pid[C(pid)[100]] | -1.020414 |
pid[C(pid)[103]] | -0.726942 |
pid[C(pid)[106]] | -0.296678 |
pid[C(pid)[109]] | 0.925090 |
pid[C(pid)[112]] | -0.103292 |
157 rows × 1 columns
We’ll have to filter this:
# Get slopes
slopes = ranefs[ranefs.index.str.contains('cosmetics_code')]
# Intercepts
intercepts = ranefs[~ranefs.index.str.contains('pid|cosmetics')]
# Display
display(slopes.head(), intercepts.head())
| | 1 |
---|---|
cosmetics_model[C(model_id)[1]:cosmetics_code] | 0.361547 |
cosmetics_model[C(model_id)[2]:cosmetics_code] | 0.602184 |
cosmetics_model[C(model_id)[3]:cosmetics_code] | -0.586605 |
cosmetics_model[C(model_id)[4]:cosmetics_code] | 0.849587 |
cosmetics_model[C(model_id)[5]:cosmetics_code] | -0.150157 |
| | 1 |
---|---|
model_id[C(model_id)[1]] | -0.901041 |
model_id[C(model_id)[2]] | 0.491351 |
model_id[C(model_id)[3]] | 1.245916 |
model_id[C(model_id)[4]] | 0.007135 |
model_id[C(model_id)[5]] | 0.219817 |
We can correlate these two, but we have to do some extra steps to access the data - this is an annoying part of life!
# Correlate
import pingouin as pg
display(pg.corr(slopes.values[:, 0], intercepts.values[:, 0]))
ax = sns.scatterplot(x=intercepts.values[:, 0], y=slopes.values[:, 0])
ax.set(xlabel='Random intercept\nbaseline level of attractiveness',
ylabel='Random slope\nchange in attractiveness with cosmetics')
| | n | r | CI95% | p-val | BF10 | power |
---|---|---|---|---|---|---|
pearson | 33 | -0.369867 | [-0.63, -0.03] | 0.034126 | 1.861 | 0.577048 |
[Text(0.5, 0, 'Random intercept\nbaseline level of attractiveness'),
Text(0, 0.5, 'Random slope\nchange in attractiveness with cosmetics')]

This does answer the question our industry partners had - it appears those with greater attractiveness baselines benefit less from cosmetics.
Variance components - is beauty in the eye of the beholder, or a property of faces?#
One clever property of linear mixed models is their ability to figure out how much variance in data comes from various sources. In the previous example we saw that the variation in the ratings of attractiveness with and without cosmetics could be allocated across participants, models, and changes in attractiveness with cosmetics for each model.
As a general rule, if a dataset has repeated measures within a group - whether it's participants giving multiple ratings, or faces receiving multiple ratings - then we can use a linear mixed model to tell us how much variability in our dependent variable comes from those sources. These 'intercept only' models are very useful, allowing us greater insight into what's going on in our datasets.

One question psychologists have been interested in for a long time is whether facial attractiveness is in the eye of the beholder, or whether it is a stable feature of faces. Traditional statistical approaches don't really work to answer this question, but a mixed model allows us to figure out the variation we can attribute to people and to faces. Let's see how we do this.

We'll borrow a dataset called the London Face Database, an open dataset that contains attractiveness ratings of 102 faces made by 2,513 participants. This should allow us to answer our question. Let's get the data - unfortunately things are a bit slow on such a massive dataset!
# Read in data from link
faces = pd.read_csv('https://figshare.com/ndownloader/files/8542045')
faces.head()
| | rater_sex | rater_sexpref | rater_age | X001 | X002 | X003 | X004 | X005 | X006 | X007 | ... | X137 | X138 | X139 | X140 | X141 | X142 | X143 | X144 | X172 | X173 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | female | either | 17.0 | 3 | 3 | 3 | 3 | 2 | 3 | 5 | ... | 2 | 3 | 2 | 2 | 2 | 2 | 3 | 5 | 2 | 3 |
1 | female | either | 17.0 | 5 | 2 | 3 | 2 | 1 | 5 | 6 | ... | 3 | 2 | 3 | 1 | 2 | 1 | 1 | 2 | 1 | 1 |
2 | female | either | 17.1 | 5 | 3 | 4 | 3 | 3 | 4 | 4 | ... | 5 | 3 | 6 | 1 | 3 | 6 | 3 | 6 | 2 | 6 |
3 | female | either | 17.1 | 4 | 6 | 5 | 5 | 3 | 4 | 5 | ... | 5 | 4 | 4 | 3 | 5 | 2 | 2 | 4 | 2 | 3 |
4 | female | either | 17.2 | 3 | 4 | 3 | 1 | 1 | 1 | 3 | ... | 1 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 1 | 2 |
5 rows × 105 columns
This will need reshaping to the ‘long’ format. Each row is a rater, and the columns starting with ‘X’ are the faces. We’ll need to:
- Insert a column tracking the rows, which will be our participant ID
- Melt the data into a long format
This is done like so:
# Insert a column tracking rows
faces['pid'] = range(len(faces))
# Melt the data, keeping pid, rater_sex, rater_sexpref, and rater_age as identifying variables
faces_long = faces.melt(id_vars=['pid', 'rater_sex', 'rater_sexpref', 'rater_age'],
var_name='face', value_name='rating')
# Display
display(faces_long)
| | pid | rater_sex | rater_sexpref | rater_age | face | rating |
---|---|---|---|---|---|---|
0 | 0 | female | either | 17.0 | X001 | 3 |
1 | 1 | female | either | 17.0 | X001 | 5 |
2 | 2 | female | either | 17.1 | X001 | 5 |
3 | 3 | female | either | 17.1 | X001 | 4 |
4 | 4 | female | either | 17.2 | X001 | 3 |
... | ... | ... | ... | ... | ... | ... |
256321 | 2508 | male | NaN | 61.8 | X173 | 3 |
256322 | 2509 | male | NaN | 88.1 | X173 | 3 |
256323 | 2510 | NaN | men | 17.2 | X173 | 1 |
256324 | 2511 | NaN | women | 19.5 | X173 | 3 |
256325 | 2512 | NaN | NaN | 33.4 | X173 | 3 |
256326 rows × 6 columns
This dataset can now be subjected to a variance partitioning approach. First we add a grouping variable which is the same for everyone, then create our variance components, one for pid and one for face.
# Add grouping
faces_long['grouping'] = 1
# Variance components for faces and pid
vc = {'faces': '0 + C(face)',
'pids': '0 + C(pid)'}
# Fit a model with just an intercept, note this is done by ~ 1
intercept_only_model = smf.mixedlm('rating ~ 1',
groups='grouping',
vc_formula=vc,
use_sparse=True,
data=faces_long).fit()
# Summary
intercept_only_model.summary()
Model: | MixedLM | Dependent Variable: | rating |
---|---|---|---|
No. Observations: | 256326 | Method: | REML |
No. Groups: | 1 | Scale: | 1.1834 |
Min. group size: | 256326 | Log-Likelihood: | -390834.4643 |
Max. group size: | 256326 | Converged: | Yes |
Mean group size: | 256326.0 |
| | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.019 | 0.078 | 38.737 | 0.000 | 2.866 | 3.172 |
faces Var | 0.590 | 0.076 | ||||
pids Var | 0.706 | 0.019 |
From this, we can compute the proportion each source contributes to the total. First, add up the random effect variances and the error variability (the ‘scale’).
# Total variance
total = intercept_only_model.vcomp.sum() + intercept_only_model.scale
# Then divide the vcomps by the total
vcomps = intercept_only_model.vcomp / total
vcomps
array([0.23803889, 0.28481472])
This suggests faces contribute about 24% of the variance in ratings, and participants - the eye of the beholder - contribute about 28%! (The array lists the variance components in the order faces, then pids.)
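The remaining share is residual variance - disagreement between a particular rater and a particular face that neither source explains. A quick check using the quantities we already computed:

```python
# Proportion of rating variance left unexplained by faces or participants
residual_share = intercept_only_model.scale / total
print(f'Unexplained (residual) share: {residual_share:.2f}')
```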
Predictions with mixed models#
We can rely on `marginaleffects` to make predictions from models in the usual way. There is however a philosophical wrinkle when you make predictions with mixed models. Namely, what kind of prediction do you want?

A mixed model can make predictions using only the global coefficient estimates (those in the summary tables). These are usually the things we want - given we know X, what is Y on average? But these ignore the group-level sources of variability in the data. Alternatively, a mixed model can make specific predictions about an individual participant better than a GLM can, because it knows that participant's random effect.

Most of the time, predictions focus only on the former, global effects. This is what `marginaleffects` will use when making predictions, ignoring the random effects:
# Compute the difference in makeup
me.predictions(cosmetics_model, newdata=me.datagrid(cosmetics_code=[1, 0]), hypothesis='b1=b2')
term | estimate | std_error | statistic | p_value | s_value | conf_low | conf_high |
---|---|---|---|---|---|---|---|
str | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
"b1=b2" | 1.161886 | 0.132362 | 8.778118 | 0.0 | inf | 0.902462 | 1.42131 |