Linear Mixed Models — Morning Worksheet

Stop 1 · It's just a linear model (pairs with slides 11–16)

The tests you already know, as one model

Four quick problems. Each takes a test you've run many times and shows it's a regression in disguise.

Q1 — t-test. You collect attractiveness ratings for two faces. You code Face A as 0 and Face B as 1, then fit a regression: rating ~ 1 + face. The group means come out as:

Face A (coded 0): mean = 3.40
Face B (coded 1): mean = 4.10

Without running anything — just from the means — fill in the two regression coefficients:

Intercept (β₀) Slope (β₁)

Now the conceptual point. In this model, what is the slope?

Correct
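The intercept is the mean of group 0 (3.40), and the slope is the move from group 0 to group 1 (4.10 − 3.40 = 0.70). That mean difference is identical to what an equal-variance (Student's) two-sample t-test gives you: same estimate, same standard error, same p-value. (Welch's unequal-variance version gives the same estimate but a slightly different standard error and degrees of freedom.) The t-test was a regression all along.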

Not quite
The average of the means would be the grand mean — that's not what either coefficient gives you. The intercept is the mean of group 0 (3.40); the slope is the difference between groups (0.70). Try again.

Not quite
A correlation would be the slope only if both variables were z-scored (that's IJALM #1). Here the predictor is a raw 0/1 group code, so the slope is in the original rating units — it's the mean difference. Try again.
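The Q1 result can be checked numerically. The sketch below is a minimal illustration, not part of the slides: the six ratings are hypothetical, chosen only so that the group means match the 3.40 and 4.10 above, and the model rating ~ 1 + face is fitted by ordinary least squares.

```python
import numpy as np

# Hypothetical ratings, chosen so the group means match the worksheet.
face_a = np.array([3.0, 3.4, 3.8])   # coded 0, mean 3.40
face_b = np.array([3.7, 4.1, 4.5])   # coded 1, mean 4.10

rating = np.concatenate([face_a, face_b])
face = np.concatenate([np.zeros(len(face_a)), np.ones(len(face_b))])

# Design matrix for rating ~ 1 + face: a column of ones plus the 0/1 code.
X = np.column_stack([np.ones_like(face), face])
(intercept, slope), *_ = np.linalg.lstsq(X, rating, rcond=None)

print(round(intercept, 2))  # the mean of Face A
print(round(slope, 2))      # Face B mean minus Face A mean
```

Changing the individual ratings while keeping the two group means fixed leaves both coefficients unchanged, which is the point of the question.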

Q2 — correlation. You z-score flipper length and z-score body mass, then regress one on the other: massZ ~ 1 + flipperZ. The slope comes out as 0.92. What is the intercept, and what does the slope equal?

Correct
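When both variables are z-scored, the intercept is 0: it is the predicted Y when X is at its mean, and both means are now 0. The standardised slope is Pearson's r. A correlation is just a regression in standardised units, with the same number, the same t statistic and the same p-value (IJALM #1). The one caveat is the confidence interval: the usual interval for r is built on Fisher's z transform, so it need not match the slope's interval exactly.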

Not quite
That would be true in raw units. But both variables here are z-scored, so the units are SDs, not grams or mm. Z-scoring forces the intercept to 0 and turns the slope into r.

Not quite
The slope is 0.92 and that is the correlation — but the intercept isn't 0.92. With both variables z-scored the intercept is 0 by definition. Try again.
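A quick way to convince yourself of the Q2 result is to z-score two variables and compare the fitted slope with Pearson's r. This is a sketch on simulated numbers, not the real penguin data; the means, spreads and the helper `zscore` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins for flipper length (mm) and body mass (g).
flipper = rng.normal(200, 14, size=300)
mass = 50 * flipper - 5800 + rng.normal(0, 400, size=300)

def zscore(v):
    """Centre on the mean and scale by the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

flipper_z, mass_z = zscore(flipper), zscore(mass)

# Fit massZ ~ 1 + flipperZ by ordinary least squares.
X = np.column_stack([np.ones_like(flipper_z), flipper_z])
(intercept, slope), *_ = np.linalg.lstsq(X, mass_z, rcond=None)

# Pearson's r computed on the raw (un-standardised) variables.
r = np.corrcoef(flipper, mass)[0, 1]

print(intercept, slope, r)  # intercept is zero up to rounding; slope equals r
```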

Q3 — one-way ANOVA. Three penguin species (Adelie, Chinstrap, Gentoo). You dummy-code with Adelie as the reference and regress body mass on the two dummies. The intercept is 3700 and the isGentoo coefficient is 1300. What is the predicted mean mass of a Gentoo?

Gentoo mean (g)
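With reference-level dummy coding, the predicted mean for any group is the intercept plus that group's coefficient. The sketch below applies this to the two numbers given in Q3. The isChinstrap coefficient is not given in the question and is not needed, because that dummy is 0 for both Adelie and Gentoo; the function name `predicted_mean` is hypothetical.

```python
# Coefficients stated in the question (grams).
coef = {"intercept": 3700.0, "isGentoo": 1300.0}

def predicted_mean(is_gentoo):
    """Predicted mass for a penguin whose isChinstrap dummy is 0."""
    return coef["intercept"] + coef["isGentoo"] * is_gentoo

print(predicted_mean(0))  # Adelie, the reference level: the intercept alone
print(predicted_mean(1))  # Gentoo: intercept plus the isGentoo coefficient
```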

Q4 — factorial ANOVA. You fit mass ~ species * sex. The isGentoo × isMale interaction coefficient is large and positive. In plain words, what does that single coefficient tell you?

Not quite
That's the main effect of species (the isGentoo coefficient), not the interaction. The interaction is specifically about whether the sex effect changes across species.

Correct
An interaction coefficient is exactly that: how much the effect of one predictor (sex) differs across levels of another (species). A positive isGentoo × isMale means the male advantage is bigger in Gentoos than in the reference species. It's just one more column in the design matrix — the factorial ANOVA's interaction term, nothing more exotic.

Not quite
That's the main effect of sex (the isMale coefficient). The interaction asks something different: does the size of the sex effect depend on species?
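The interaction coefficient can be read as a difference of differences. The sketch below uses hypothetical cell means (not real penguin data) and drops Chinstrap to keep the design matrix small. It solves for the four coefficients of mass ~ species * sex and compares the interaction with the male advantage in each species.

```python
import numpy as np

# Hypothetical cell means in grams, for illustration only.
cells = [
    # (isGentoo, isMale, mean mass)
    (0, 0, 3400.0),
    (0, 1, 4000.0),
    (1, 0, 4700.0),
    (1, 1, 5500.0),
]
is_gentoo = np.array([c[0] for c in cells], dtype=float)
is_male = np.array([c[1] for c in cells], dtype=float)
mass = np.array([c[2] for c in cells])

# Columns: intercept, isGentoo, isMale, isGentoo x isMale.
X = np.column_stack([np.ones(4), is_gentoo, is_male, is_gentoo * is_male])
b0, b_gentoo, b_male, b_interaction = np.linalg.solve(X, mass)

male_advantage_adelie = 4000.0 - 3400.0
male_advantage_gentoo = 5500.0 - 4700.0

# The interaction is how much bigger the male advantage is in Gentoos.
print(b_interaction, male_advantage_gentoo - male_advantage_adelie)
```

Note that b_male here is the male advantage in the reference species only, which is why the interaction is needed to recover the Gentoo value.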

The point: You already know how to run correlation, t-tests, ANOVA and factorial ANOVA. Each is the same machinery (an intercept plus slopes) wearing a different name. Once you see that, you stop picking tests and start building models.

Stop 2 · Spot the dependence (pairs with slide 20 · use over the coffee break)

Where do the errors stop being independent?

The GLM assumes residuals are independent. For each study below, decide whether that assumption is safe or broken — and if broken, what's doing the clustering.

Study A: You measure reaction time. Each of 30 participants completes 200 trials. You analyse all 6,000 trials in one regression.

Look again
200 trials from the same person are not 200 independent observations — a fast person is fast on all of them. The trials are clustered within participant, so the errors are correlated. Standard errors will be far too small.

Yes
Trials nest within participants. Treating 6,000 trials as independent badly understates uncertainty. This is the textbook case for a random intercept per participant: RT ~ 1 + ... + (1|participant).
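A small simulation shows how much uncertainty is hidden when clustered trials are treated as independent. This is a sketch under assumed values: the baseline mean of 500 ms, the between-person SD of 50 ms and the trial-level SD of 100 ms are invented for illustration. It compares a naive standard error of the mean RT with one based on the 30 participant means.

```python
import numpy as np

rng = np.random.default_rng(1)
n_participants, n_trials = 30, 200

# Assumed values: people differ in baseline (SD 50 ms) and
# trials scatter around each person's own baseline (SD 100 ms).
baseline = rng.normal(500, 50, size=n_participants)
rt = baseline[:, None] + rng.normal(0, 100, size=(n_participants, n_trials))

# Naive SE of the mean RT: pretends all 6,000 trials are independent.
naive_se = rt.std(ddof=1) / np.sqrt(rt.size)

# Cluster-aware SE: one mean per participant, so the effective n is 30.
person_means = rt.mean(axis=1)
cluster_se = person_means.std(ddof=1) / np.sqrt(n_participants)

print(naive_se, cluster_se)  # the naive SE is several times too small
```

Averaging per participant is used here only because it makes the comparison easy to see; a random intercept per participant handles the same dependence without discarding the trial-level data.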

Study B: A survey of 500 people, each sampled at random from the national population, each answering once. One row per person.

Yes
One independent draw per person, no repeated measures, no nesting — this is the clean case the ordinary GLM was built for. Not every dataset needs a mixed model; this one doesn't.

Look again
Each person contributes exactly one independent observation and they were sampled independently. There's no repeated measurement and no grouping structure to induce correlation. This one is genuinely fine.

Study C: 40 participants each rate the same 60 faces. You have 2,400 ratings.

Look again
Two ratings of the same face are correlated (some faces are just more attractive), and two ratings by the same person are correlated (some raters are generous). Both sources are live at once.

Exactly
Ratings cluster by rater and by face simultaneously — and neither nests inside the other. These are crossed random effects: (1|rater) + (1|face). We come back to this on slide 47. Averaging the faces away (the old habit) throws out the face variance and inflates false positives.

Study D: You test a reading intervention on 600 pupils. The pupils sit in 30 classrooms, and those classrooms sit in 8 schools.

Look again
Pupils in the same classroom share a teacher and environment; classrooms in the same school share a catchment and ethos. Two pupils from one class are more alike than two pupils from different schools — the residuals move together at two levels.

Yes
This is a nested hierarchy: pupils within classrooms within schools. Each lower unit sits inside exactly one higher unit, which is what distinguishes it from the crossed case in Study C. The structure is (1|school/classroom) — a random intercept for schools and for classrooms-within-schools.
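The difference between nested and crossed grouping factors can be checked mechanically: a factor is nested in another if each of its levels appears under exactly one level of the other. The helper `is_nested` and the toy labels below are hypothetical, and the check assumes classroom labels are unique across schools.

```python
from collections import defaultdict

def is_nested(lower, upper):
    """True if every lower-level unit sits under exactly one upper-level unit."""
    parents = defaultdict(set)
    for lo, up in zip(lower, upper):
        parents[lo].add(up)
    return all(len(p) == 1 for p in parents.values())

# Study C style: every rater rates every face (3 raters x 4 faces).
raters = [r for r in range(3) for _ in range(4)]
faces = [f for _ in range(3) for f in range(4)]
print(is_nested(faces, raters))        # False: each face appears under every rater

# Study D style: each classroom belongs to a single school.
classrooms = ["c1", "c1", "c2", "c3", "c3"]
schools = ["s1", "s1", "s1", "s2", "s2"]
print(is_nested(classrooms, schools))  # True: classrooms sit inside schools
```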

The point: The question is always the same: "could I shuffle the residuals freely, or do some of them move together?" Whenever the same person, item or school contributes more than one observation, those residuals move together.

Stop 3 · What's allowed to vary? (pairs with slides 27–32)

Match the description to the model

Back to the sleep study: reaction time over 10 days of sleep restriction, 18 participants. Read each plain-English description and click the model that matches it. (The notation is the answer, not the question — don't worry if it's still unfamiliar.)

1. Everyone shares a single baseline RT and a single effect of sleep loss. One line for the whole sample. We ignore that people differ.

2. People are allowed to start at different baseline RTs, but sleep loss is assumed to hit everyone at the same rate. Parallel lines, different heights.

3. People differ in their baseline RT and in how badly sleep loss affects them: some are robust, some collapse. Different heights and different gradients.

The thing in the brackets is what's allowed to vary, and after the | is what it varies by.
(1|Subject) → the 1 is the intercept, so each subject gets their own baseline. Slope still shared.
(Days|Subject) → now Days (the slope) is in there too, so each subject gets their own gradient as well. The intercept is implicit: this is shorthand for (1 + Days|Subject).
Nothing in brackets → ordinary GLM, one line for everyone. That's complete pooling, and it gives you those tight-but-wrong confidence intervals from slide 27.
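The decoding rule above is mechanical enough to write down as a toy function. The sketch below is a hypothetical helper, not part of any modelling library: it handles only simple terms of the form (a + b | group) and uses RT as an assumed name for the outcome.

```python
import re

def decode_random_terms(formula):
    """Describe each (terms | group) part of a Wilkinson-style formula."""
    out = []
    for inside, group in re.findall(r"\(([^|()]+)\|([^()]+)\)", formula):
        terms = [t.strip() for t in inside.split("+")]
        # A slope listed without an explicit 1 or 0 still carries an implicit intercept.
        if "1" not in terms and "0" not in terms:
            terms.insert(0, "1")
        names = ["intercept" if t == "1" else f"slope of {t}"
                 for t in terms if t != "0"]
        out.append(f"{' and '.join(names)} allowed to vary by {group.strip()}")
    return out or ["nothing varies: one line for everyone"]

for f in ["RT ~ 1 + Days",
          "RT ~ 1 + Days + (1|Subject)",
          "RT ~ 1 + Days + (Days|Subject)"]:
    print(f, "->", decode_random_terms(f))
```

The three formulas in the loop correspond, in order, to descriptions 1, 2 and 3.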

The point: You don't read the notation, you decode it: "what's allowed to vary, and across what?" Everything after that is just adding ingredients.

Stop 4 · Fixed or random? (pairs with slide 38)

Which effect is which?

For each grouping variable, decide whether you'd treat it as a fixed or random effect. The test: are the levels a few specific things you care about individually (fixed), or an exchangeable sample from some larger population whose variance interests you (random)?

Variable 1: Drug dose, 4 specific doses you chose: 0, 10, 20, 40 mg.

Fixed
You chose these exact doses and you want a coefficient for each. They aren't a random sample of "all possible doses" — 20 mg means something specific. Few levels, not exchangeable → fixed.

Reconsider
Doses are the slide-38 worked example of why "few, non-exchangeable, you-care-about-each" points to fixed. You'd never want "the variance across doses" — you want the effect of this dose vs that one.

Variable 2: Participants, the 100 people who happened to take part.

Reconsider
You don't care about participant #47 specifically — they're a stand-in for the population. Many levels, exchangeable, you want the spread across people → random.

Random
Many exchangeable levels, nothing special about any one person beyond their name, and you care about between-person variance. The defining case for a random effect.

Variable 3: Stimulus items, 50 faces drawn from a large face database.

Reconsider
Treating your 50 faces as fixed assumes they are the population — the slide-47 trap. If you'd want the finding to hold for other faces, they're a sample → random.

Random
The rule of thumb from slide 47: "if you'd want it to hold for other faces, faces get a random effect." Ignoring item variance is exactly what inflates false positives (Judd, Westfall & Kenny).

Variable 4: Condition, A vs B, the two experimental conditions you designed.

Fixed
Two levels, both of direct interest, not a sample of anything — you want the A-vs-B coefficient. With only two levels you couldn't estimate a variance anyway. Fixed.

Reconsider
A and B are your whole universe of conditions and you care about the difference between them specifically. Two non-exchangeable levels → fixed. (You also can't estimate a variance from two levels.)

The honest caveat: These are rules of thumb, not law, as slide 38 says. The borderline cases (a handful of schools, say) are genuine judgement calls, and that's the discussion worth having.

Stop 5 · Write the formula (pairs with slides 8, 38–39, 47)

From a design to Wilkinson notation

Now put it together. For each design, write down the Wilkinson formula yourself — on paper or in your head — then check the pieces and reveal the worked answer. Reminder of the grammar: outcome ~ fixed effects + (random part | grouping). The part before the | is what varies; the part after is what it varies across.

Design 1: A researcher measures reaction time. Each of 40 participants completes many trials under two conditions, congruent vs incongruent. Every participant sees both conditions. The researcher wants the overall effect of condition, and is happy to allow participants to differ in baseline RT.

First decide the pieces:

Is condition a fixed or random effect?

Which grouping factor needs a random intercept?

RT ~ 1 + condition + (1 | participant)

condition is fixed — two specific levels you care about (this is the effect you're testing).
participant gets a random intercept — many exchangeable people, and you've said you'll let baselines differ: (1 | participant).
The brief only asked to vary baseline, so no random slope yet. If you wanted the condition effect to vary by person too, you'd write (1 + condition | participant) — that's Design 2.

Design 2: Same study, but now the researcher suspects the congruency effect itself is stronger in some people than others and wants to capture that. (Condition varies within each participant.)

Compared with Design 1, what changes?

RT ~ 1 + condition + (1 + condition | participant)

condition stays fixed — you still want its population effect.
Now condition also goes inside the brackets: (1 + condition | participant) lets both the baseline and the condition effect vary across people.
This is only legitimate because condition varies within each participant (everyone sees both). A predictor that's constant within a group can't have a random slope over that group — slide 39's "random slopes for predictors that vary within those groups."
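The "varies within each group" condition is easy to check on a data set before writing a random slope. The sketch below is a minimal illustration with a hypothetical helper `varies_within`; the `age_group` column is an invented between-participant predictor used only for contrast.

```python
from collections import defaultdict

def varies_within(predictor, group):
    """True if the predictor takes more than one value inside every group."""
    seen = defaultdict(set)
    for x, g in zip(predictor, group):
        seen[g].add(x)
    return all(len(values) > 1 for values in seen.values())

participant = ["p1", "p1", "p2", "p2", "p3", "p3"]
condition = ["congruent", "incongruent"] * 3      # everyone sees both
age_group = ["young", "young", "old", "old", "young", "young"]  # constant per person

print(varies_within(condition, participant))  # True: a random slope is possible
print(varies_within(age_group, participant))  # False: no within-person variation
```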

Design 3: A rating study: every participant rates every face for trustworthiness. There are 50 participants and 80 faces. The researcher wants the finding to generalise both to other people and to other faces, and is modelling only baseline differences (intercepts), not slopes.

How are participants and faces related here?

rating ~ 1 + (1 | participant) + (1 | face)

Because every participant sees every face, the two grouping factors are crossed — you add a separate random intercept for each: (1 | participant) + (1 | face).
Both are random: many exchangeable levels, and you explicitly want to generalise beyond these people and these faces (slide 47's rule of thumb).
There's no fixed predictor in the brief, so the fixed part is just the intercept (1) — you're estimating the grand mean plus two sources of variance. Treating the faces as fixed, or averaging over them, is exactly the false-positive trap from slide 47.
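One way to read the Design 3 formula is as a recipe for generating the data: a grand mean, plus an offset for each participant, plus an offset for each face, plus noise. The sketch below simulates that recipe. The grand mean and the three standard deviations are assumed values for illustration; only the 50 × 80 layout comes from the design.

```python
import numpy as np

rng = np.random.default_rng(2)
n_participants, n_faces = 50, 80

# Assumed values, for illustration only.
grand_mean = 4.0
u_participant = rng.normal(0, 0.8, size=n_participants)  # (1 | participant)
u_face = rng.normal(0, 0.6, size=n_faces)                 # (1 | face)
noise = rng.normal(0, 1.0, size=(n_participants, n_faces))

# Crossed design: every participant's offset meets every face's offset.
rating = grand_mean + u_participant[:, None] + u_face[None, :] + noise

print(rating.shape)              # one row per participant, one column per face
print(rating.mean(axis=0).std(ddof=1))  # spread of face means: mostly face variance
print(rating.mean(axis=1).std(ddof=1))  # spread of participant means
```

Averaging over faces before analysis would keep only the row means and discard the column-wise spread, which is the variance the (1 | face) term is there to capture.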

The point: Writing the formula is just three questions in order: what's the outcome, which predictors are fixed, and which grouping factors need a random part (and does anything vary within them). Get those three and the notation writes itself.