Linear mixed models for psychological data

or: not letting the statistical tail wag the theoretical dog

Alex Jones & Jeremy Tree

The data

  • These data are inspired by papers of mine (Mileva et al., 2016; Childs & Jones, 2022)
  • Broad questions:
    • How do intrasexual competitiveness and cosmetics affect beauty perception?
  • 45 participants, 70 faces, without and with makeup
  • But participants view only a subset of faces
  • Outcome: attractiveness rating (1–7)
  • Participants have a score on ICS, a questionnaire measuring competitiveness

Hypotheses

We have five hypotheses to test in this study.

  • H1. Do cosmetics increase attractiveness?
  • H2. Is the effect of makeup on attractiveness the same for all observers?
  • H3. Is the effect of makeup on attractiveness the same for all faces?
  • H4. For observers, does the ‘makeup effect’ depend on intrasexual competitiveness?
  • H5. Do faces differ in how much they are impacted by intrasexual competitiveness of observers?

We will attempt to test these using traditional statistics, and then again with an LMM.

H1. Do cosmetics increase attractiveness?

Analyst degrees of freedom — how do we test this?

Traditionally, we have to average across one level — face or rater — to even start.

We go from this…

pid faceid ICS attr cosmetics_f
ppt_1 face_4 0.979 3 Cosmetics
ppt_1 face_66 0.979 4 No cosmetics
ppt_1 face_32 0.979 1 Cosmetics
ppt_1 face_21 0.979 3 Cosmetics
ppt_1 face_67 0.979 5 No cosmetics
ppt_1 face_28 0.979 4 No cosmetics
ppt_1 face_33 0.979 3 Cosmetics
ppt_1 face_52 0.979 4 Cosmetics

The full dataset, 26,364 rows, to…

H1. Do cosmetics increase attractiveness?

pid ics No cosmetics Cosmetics
ppt_1 0.979 4.00 3.50
ppt_10 1.396 2.94 3.41
ppt_11 -1.055 3.56 4.87
ppt_12 0.150 3.34 1.94
ppt_13 -0.488 3.59 2.33
ppt_14 -0.257 3.89 6.06
ppt_15 0.744 3.77 4.30
ppt_16 -0.211 3.46 4.85

By-participant averages, N = 45

faceid No cosmetics Cosmetics
face_1 3.32 3.12
face_10 3.09 2.48
face_11 4.57 5.65
face_12 3.96 5.50
face_13 3.59 5.12
face_14 4.67 5.32
face_15 3.33 3.17
face_16 3.37 4.37

By-face averages, N = 70
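
The aggregation step can be sketched in a few lines of Python. This is a minimal illustration using only the eight ppt_1 rows shown earlier, so the cell means will not match the ppt_1 row of the by-participant table, which is based on all of that participant's ratings. The helper name `condition_means` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The eight long-format rows shown for ppt_1: (pid, faceid, attr, condition)
rows = [
    ("ppt_1", "face_4", 3, "Cosmetics"),
    ("ppt_1", "face_66", 4, "No cosmetics"),
    ("ppt_1", "face_32", 1, "Cosmetics"),
    ("ppt_1", "face_21", 3, "Cosmetics"),
    ("ppt_1", "face_67", 5, "No cosmetics"),
    ("ppt_1", "face_28", 4, "No cosmetics"),
    ("ppt_1", "face_33", 3, "Cosmetics"),
    ("ppt_1", "face_52", 4, "Cosmetics"),
]


def condition_means(rows, unit_index):
    """Average attr within each (unit, condition) cell.

    unit_index=0 aggregates by participant, unit_index=1 aggregates by face.
    """
    cells = defaultdict(list)
    for row in rows:
        cells[(row[unit_index], row[3])].append(row[2])
    return {cell: mean(values) for cell, values in cells.items()}


by_participant = condition_means(rows, unit_index=0)  # faces are averaged away
by_face = condition_means(rows, unit_index=1)         # participants are averaged away

for cell, value in by_participant.items():
    print(cell, round(value, 2))
```

Whichever unit is kept, the other is collapsed into a single mean per cell, which is exactly the information a mixed model retains.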

H1. Do cosmetics increase attractiveness?

Which to choose? By face has a larger N, so more power? By participant also allows us to test ICS, which we lose with faces! Either way, a paired-samples t-test should help!

By participant (N = 45)

estimate statistic p.value parameter conf.low conf.high method alternative
-0.431 -2.71 0.01 44 -0.752 -0.11 Paired t-test two.sided

By face (N = 70)

estimate statistic p.value parameter conf.low conf.high method alternative
-0.386 -4.08 <0.001 69 -0.575 -0.197 Paired t-test two.sided
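
For reference, the paired t-test is just the mean of the within-unit differences divided by its standard error. The sketch below works through that arithmetic on the first eight rows of the by-participant table only, so it will not reproduce the N = 45 result. The function name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# First eight rows of the by-participant table: (No cosmetics, Cosmetics)
pairs = [
    (4.00, 3.50), (2.94, 3.41), (3.56, 4.87), (3.34, 1.94),
    (3.59, 2.33), (3.89, 6.06), (3.77, 4.30), (3.46, 4.85),
]


def paired_t(pairs):
    """Return (mean difference, t statistic, degrees of freedom)."""
    diffs = [no_cosmetics - cosmetics for no_cosmetics, cosmetics in pairs]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean_diff, mean_diff / standard_error, n - 1


mean_diff, t_stat, df = paired_t(pairs)
print(f"mean difference = {mean_diff:.3f}, t({df}) = {t_stat:.2f}")
```

The difference is taken as No cosmetics minus Cosmetics, so a negative value means higher ratings with cosmetics, the same sign convention as the negative estimates in the tables.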

H1. Do cosmetics increase attractiveness?

A mixed model can do either or both of these tests in a single fit.

  • Recall a regression with a binary predictor is akin to a t-test!
  • A mixed model uses the full data, no aggregation

  • “By face” = attr ~ cosmetics + (1|faceid)
  • “By participant” = attr ~ cosmetics + (1|pid)
  • All at once = attr ~ cosmetics + (1|pid) + (1|faceid)
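
Written out, the "all at once" formula corresponds to the model below. This is a sketch in standard mixed-model notation; the symbols (i for participant, j for face, u and w for the two sets of random intercepts) are my labelling and do not come from the slides.

```latex
\begin{aligned}
\text{attr}_{ij} &= \beta_0 + \beta_1\,\text{cosmetics}_{ij} + u_{0i} + w_{0j} + \varepsilon_{ij} \\
u_{0i} &\sim \mathcal{N}(0, \sigma^2_{\text{pid}}), \qquad
w_{0j} \sim \mathcal{N}(0, \sigma^2_{\text{faceid}}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Dropping the w term gives the "by participant" model, and dropping the u term gives the "by face" model.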

H1. Do cosmetics increase attractiveness?

While the mean difference is broadly similar, the uncertainty around it is very different. A mixed model loses no information to aggregation, so it is more certain about the effect.

In return you get individual-difference measures of each face and participant in the No Cosmetics condition (the intercept!):

H1. Do cosmetics increase attractiveness?

The model reports the variances of those distributions, so we can see how much of the total variance is attributable to each “source”.

Group Variance SD Proportion
faceid 0.551 0.742 0.217
pid 0.371 0.609 0.146
Residual 1.616 1.271 0.637
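
The Proportion column is each variance divided by the total variance. A quick check in Python, using only the values in the table:

```python
# Variance components as reported in the table above
variances = {"faceid": 0.551, "pid": 0.371, "Residual": 1.616}

total = sum(variances.values())
proportions = {group: value / total for group, value in variances.items()}

for group, proportion in proportions.items():
    print(f"{group}: {proportion:.3f}")
```

The faceid and pid proportions can be read as intraclass correlations: the share of rating variance due to stable differences between faces and between observers.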

H2. Does the ‘makeup effect’ on attractiveness vary across observers?

Are all observers' ratings equally affected by cosmetics?

Traditionally, this is messy and indirect. Carry out a separate regression for each person (attr ~ cosmetics), collect the slopes, then test them against zero. This loses all face information and bakes in measurement error: we know that not all participants are equal!
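
A minimal sketch of the first stage of that traditional approach, using only the eight ppt_1 rows shown earlier (cosmetics coded 0 = without, 1 = with, which is an assumed coding). With a binary predictor, the per-person OLS slope is simply that person's difference in condition means. The function name `ols_slope` is hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# ppt_1's eight displayed ratings, in the order shown in the long-format table
cosmetics = [1, 0, 1, 1, 0, 0, 1, 1]
attr = [3, 4, 1, 3, 5, 4, 3, 4]

slope = ols_slope(cosmetics, attr)

# The same number, obtained as a difference in condition means
with_makeup = mean(a for a, c in zip(attr, cosmetics) if c == 1)
without_makeup = mean(a for a, c in zip(attr, cosmetics) if c == 0)
mean_difference = with_makeup - without_makeup

print(round(slope, 3), round(mean_difference, 3))
```

Each participant's slope is estimated from their own rows alone, so a slope built on a handful of ratings counts exactly as much in the second-stage test as one built on many.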

attr ~ cosmetics + (1+cosmetics|pid) + (1|faceid) allows the effect of cosmetics to differ for each participant, directly answering the question.

The LMM estimates are more conservative (pulled toward the average effect) for participants with fewer ratings.
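
The intuition behind this shrinkage can be sketched with the textbook partial-pooling weight for a one-way random-intercept model. This is a deliberate simplification: in the crossed model with random slopes the weights are more involved. The variances come from the variance table for H1 and the grand mean from the intercept reported later for H5. The participant mean of 5.0 and the rating counts of 5 and 50 are hypothetical, as are the function names.

```python
def pooling_weight(n_ratings, between_var=0.371, residual_var=1.616):
    """Weight given to a participant's own mean (0 = ignore it, 1 = trust it fully)."""
    return between_var / (between_var + residual_var / n_ratings)


def shrunken_estimate(own_mean, grand_mean, n_ratings):
    """Weighted compromise between a participant's own mean and the grand mean."""
    weight = pooling_weight(n_ratings)
    return weight * own_mean + (1 - weight) * grand_mean


grand_mean = 3.556  # intercept reported for the H5 models
own_mean = 5.0      # hypothetical raw mean for one participant

for n_ratings in (5, 50):
    estimate = shrunken_estimate(own_mean, grand_mean, n_ratings)
    print(n_ratings, round(pooling_weight(n_ratings), 3), round(estimate, 3))
```

The fewer ratings a participant contributes, the smaller the weight on their own data and the closer their estimate sits to the group average.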

H3. Does the ‘makeup effect’ on attractiveness vary across faces?

More simply, is the effect of cosmetics the same for all faces, or does it vary?

This is a harder problem than for observers! A separate regression is not possible because each face has two ‘versions’. The best we could manage is a correlation between the average scores for each face, which removes individual variation. There is no easy way to test this traditionally, but it is solved simply by mixed models.

attr ~ cosmetics + (1+cosmetics|pid) + (1+cosmetics|faceid)

allows the effect of cosmetics to differ for each face (and observers too).
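
In equation form, this formula gives every participant and every face their own intercept and their own cosmetics slope. This is a sketch in standard notation; the symbols (i for participant, j for face, u and w for the participant and face deviations) are my labelling and do not come from the slides.

```latex
\begin{aligned}
\text{attr}_{ij} &= (\beta_0 + u_{0i} + w_{0j}) + (\beta_1 + u_{1i} + w_{1j})\,\text{cosmetics}_{ij} + \varepsilon_{ij} \\
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} &\sim \mathcal{N}\!\left(\mathbf{0},
\begin{pmatrix} \sigma^2_{u0} & \rho_u \sigma_{u0}\sigma_{u1} \\ \rho_u \sigma_{u0}\sigma_{u1} & \sigma^2_{u1} \end{pmatrix}\right), \qquad
\begin{pmatrix} w_{0j} \\ w_{1j} \end{pmatrix} \sim \mathcal{N}\!\left(\mathbf{0},
\begin{pmatrix} \sigma^2_{w0} & \rho_w \sigma_{w0}\sigma_{w1} \\ \rho_w \sigma_{w0}\sigma_{w1} & \sigma^2_{w1} \end{pmatrix}\right)
\end{aligned}
```

The slope variances answer H2 (observers) and H3 (faces), and the two correlation parameters describe how intercepts and slopes are related within observers and within faces.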

A free gift

Our model now includes intercepts and slopes for both observers and faces:

  • Intercept — higher baseline rating
  • Slope — change in ratings with cosmetics

The model also estimates the correlation between these effects:

  • Harsher raters are unaffected by cosmetics
  • More attractive faces get more attractive with cosmetics

H4. Does an observer ‘makeup effect’ depend on intrasexual competitiveness?

An interaction. How to test that traditionally?

  • ANCOVA only controls for ICS, it doesn’t interact with it
  • Solution — chop ICS into subgroups, force into ANOVA?
  • Requires averaging over faces again + an arbitrary decision on how to chop

Median split (2 groups)

Effect DFn DFd F p
ics_group 1 43 2.71 0.107
cosmetics 1 43 8.04 0.007
ics_group:cosmetics 1 43 7.47 0.009

Tertile split (3 groups)

Effect DFn DFd F p
ics_group 2 42 1.45 0.245
cosmetics 1 42 8.11 0.007
ics_group:cosmetics 2 42 3.40 0.043

The interaction is clearly significant under the median split (p = 0.009) but only borderline under the tertile split (p = 0.043)

The answer depends on how you chop

Forcing a continuous moderator into categories is not good!
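
To see how arbitrary the chop is, the sketch below bins the eight ICS scores shown in the by-participant table into equal-count groups by rank. This is one simple way to split and not necessarily the procedure used for the ANOVAs above. The function name `equal_count_groups` is hypothetical.

```python
def equal_count_groups(scores, n_groups):
    """Assign each unit to one of n_groups rank-based bins of (near) equal size."""
    ranked = sorted(scores, key=scores.get)
    return {unit: rank * n_groups // len(ranked) for rank, unit in enumerate(ranked)}


# ICS scores for the eight participants shown in the by-participant table
ics = {
    "ppt_1": 0.979, "ppt_10": 1.396, "ppt_11": -1.055, "ppt_12": 0.150,
    "ppt_13": -0.488, "ppt_14": -0.257, "ppt_15": 0.744, "ppt_16": -0.211,
}

median_split = equal_count_groups(ics, 2)   # 0 = low, 1 = high
tertile_split = equal_count_groups(ics, 3)  # 0 = low, 1 = middle, 2 = high

for pid in sorted(ics, key=ics.get):
    print(pid, ics[pid], median_split[pid], tertile_split[pid])
```

In this subset, ppt_16 (-0.211) and ppt_12 (0.150) land on opposite sides of the median split but in the same tertile, so who counts as "low" or "high" depends entirely on the number of groups chosen.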

H4. Does an observer ‘makeup effect’ depend on intrasexual competitiveness?

In a mixed model it is trivial to let a continuous, observer-level variable interact with a face-level variable, with no chopping required: the variables stay in their ‘natural’ form.

attr ~ cosmetics + ICS + cosmetics:ICS + (1+cosmetics|pid) + (1+cosmetics|faceid)

The p-value for the interaction falls between those of the two chopped approaches!

Unpacking with EMM

While we have clear evidence of the interaction, we can use our mixed model to probe it further through estimated marginal means (EMM), or ‘simple slopes’. A typical approach is to ‘pick a point’ on one variable and take the difference between the predictions from the points of the other variable.

Concretely, we could pin ICS at low, medium, high (-1, 0, 1 Z-score) and take the difference between the without and with cosmetics conditions to see where the difference holds:

ICS term contrast estimate std.error statistic p.value
-1 cosmetics mean(1) - mean(0) 1.159 0.256 4.52 0.000
0 cosmetics mean(1) - mean(0) 0.767 0.174 4.40 0.000
1 cosmetics mean(1) - mean(0) 0.376 0.232 1.62 0.106

Alternatively, we can compute a simple slope, which is the slope of the relationship between our DV and a predictor when we fix another predictor to a specific level.

Here we can estimate the slopes of ICS with attractiveness when we pin cosmetics to without (0) and with (1):

term cosmetics estimate std.error statistic p.value
ICS 0 0.112 0.056 1.98 0.048
ICS 1 -0.280 0.186 -1.50 0.132

Notice that only one slope is even borderline significant, and the other is not significant at all. But their difference is!

term estimate std.error statistic p.value
b2=b1 -0.392 0.171 -2.28 0.022
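
The two simple slopes and their difference follow directly from the fixed effects. The sketch below uses the ICS and cosmetics:ICS estimates reported for the model without the face-level ICS slope (0.112 and -0.392). Only the point estimates are reproduced here, because the standard errors also depend on the covariance between coefficients. The function name `ics_slope` is hypothetical.

```python
# Fixed-effect estimates from the model without the face-level ICS slope
coef = {"ICS": 0.112, "cosmetics:ICS": -0.392}


def ics_slope(cosmetics):
    """Slope of ICS on attractiveness with cosmetics pinned at 0 or 1."""
    return coef["ICS"] + coef["cosmetics:ICS"] * cosmetics


slope_without = ics_slope(0)
slope_with = ics_slope(1)
difference = slope_with - slope_without  # equals the interaction coefficient

print(round(slope_without, 3), round(slope_with, 3), round(difference, 3))
```

The difference between the two simple slopes is the interaction coefficient itself, which is why the b2=b1 test matches the cosmetics:ICS term.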

H5. Do faces differ in how much they are impacted by intrasexual competitiveness of observers?

  • Take a moment to consider the ‘fixed effects’:
    • attr ~ cosmetics + ICS + cosmetics:ICS
  • and the random effects:
    • (1 + cosmetics|faceid) + (1 + cosmetics|pid)

  • The simple rule of thumb is that a fixed effect without a matching random slope is assumed to be the same across units (here, ICS).

  • Relaxing that assumption allows substantial flexibility in analysis.

  • ICS is an observer-level score, so each face could be affected differently by ICS, rather than the effect being constant.

  • We can let the effect of ICS vary for faces - some faces are more impacted by observer ICS than others.
  • attr ~ cosmetics + ICS + cosmetics:ICS + (1+ICS+cosmetics|faceid) + (1+cosmetics|pid)
  • Genuinely no clear way to test this outside of LMMs

SD of face-level ICS slope: 0.507

Correlation Estimate
Baseline ↔︎ Cosmetics effect 0.652
Baseline ↔︎ ICS effect -0.220
Cosmetics effect ↔︎ ICS effect -0.482

  • Faces with a higher baseline attractiveness tend to have lower ICS slopes (r = -0.220): their ratings go down more (or up less) as observer ICS goes up
  • Faces with a bigger cosmetics slope (boost from makeup) also tend to have lower ICS slopes (r = -0.482): their ratings go down more (or up less) as observer ICS goes up
  • LMMs allow us to test complex questions and allow more realistic flexibility

  • Random effects change conclusions: compare the fixed effects without and with the face-level ICS slope
  • Without face-level ICS slope
Term Estimate SE p
(Intercept) 3.556 0.074 0.000
cosmetics 0.447 0.174 0.013
ICS 0.112 0.056 0.055
cosmetics:ICS -0.392 0.171 0.027
  • With face-level ICS slope
Term Estimate SE p
(Intercept) 3.555 0.075 0.000
cosmetics 0.459 0.176 0.011
ICS 0.111 0.082 0.179
cosmetics:ICS -0.416 0.172 0.020

Ignoring that the ICS effect varies across faces makes its uncertainty look smaller than it is.
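
The size of that change can be read straight off the two tables. The sketch below compares the standard errors and the estimate-to-SE ratios for the ICS term. It uses the rounded table values, so the ratios are approximate, and it leaves p-values aside because those depend on the reference distribution the software uses.

```python
# ICS fixed effect in the two models: (estimate, standard error)
without_face_slope = (0.112, 0.056)
with_face_slope = (0.111, 0.082)


def ratio(estimate, se):
    """Estimate divided by its standard error (a Wald-type statistic)."""
    return estimate / se


se_inflation = with_face_slope[1] / without_face_slope[1]
stat_without = ratio(*without_face_slope)
stat_with = ratio(*with_face_slope)

print(f"SE inflation: {se_inflation:.2f}x")
print(f"estimate / SE: {stat_without:.2f} -> {stat_with:.2f}")
```

The point estimate barely moves, but the standard error grows by almost half, which is what takes the ICS effect from borderline to clearly non-significant.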

Enough talk - let's try another dataset