Short problems to work through during the breaks. Each uses only ideas from slides you've already seen. Click an answer to check it; the feedback explains why it's right or wrong.
These are talking points, not a test. There are no marks. Have a go, argue with your neighbour, and we'll discuss as a group before moving on.
pairs with slides 11–16
Stop 1 · It's just a linear model
The tests you already know, as one model
Four quick problems. Each takes a test you've run many times and shows it's a regression in disguise.
Q1 — t-test. You collect attractiveness ratings for two faces. You code Face A as 0 and Face B as 1, then fit a regression: rating ~ 1 + face. The group means come out as:
Face A (coded 0): mean = 3.40
Face B (coded 1): mean = 4.10
Without running anything — just from the means — fill in the two regression coefficients:
Now the conceptual point. In this model, what is the slope?
Correct: The intercept is the mean of group 0 (3.40), and the slope is the move from group 0 to group 1 (4.10 − 3.40 = 0.70). That mean difference is exactly what the classic two-sample t-test (Student's, assuming equal variances) gives you: same estimate, same standard error, same p-value. The t-test was a regression all along.
Not quite: The average of the means would be the grand mean; that's not what either coefficient gives you. The intercept is the mean of group 0 (3.40); the slope is the difference between groups (0.70). Try again.
Not quite: A correlation would be the slope only if both variables were z-scored (that's IJALM #1). Here the predictor is a raw 0/1 group code, so the slope is in the original rating units: it's the mean difference. Try again.
Q2 — correlation. You z-score flipper length and z-score body mass, then regress one on the other: massZ ~ 1 + flipperZ. The slope comes out as 0.92. What is the intercept, and what does the slope equal?
Correct: When both variables are z-scored, the intercept is 0 by definition (the fitted line always passes through the two means, and both means are 0), and the standardised slope is Pearson's r. A correlation is just a regression in standardised units: same estimate, same p-value (IJALM #1).
Not quite: That would be true in raw units. But both variables here are z-scored, so the units are SDs, not grams or mm. Z-scoring forces the intercept to 0 and turns the slope into r.
Not quite: The slope is 0.92 and that is the correlation, but the intercept isn't 0.92. With both variables z-scored the intercept is 0 by definition. Try again.
Q3 — one-way ANOVA. Three penguin species (Adelie, Chinstrap, Gentoo). You dummy-code with Adelie as the reference and regress body mass on the two dummies. The intercept is 3700 and the isGentoo coefficient is 1300. What is the predicted mean mass of a Gentoo?
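The dummy-coding logic behind Q3 can be sketched in plain Python. The masses are hypothetical toy values (deliberately not the Q3 numbers), and the small Gaussian-elimination solver is hand-written only so the example runs without third-party packages; in real work a statistics package does this for you. The sketch builds the columns 1, isChinstrap and isGentoo with Adelie as the reference, then solves the least-squares normal equations.

```python
# Hypothetical body masses (g); toy values, not the real penguin data.
masses = {
    "Adelie": [3600, 3700, 3800],
    "Chinstrap": [3650, 3750, 3850],
    "Gentoo": [4900, 5050, 5200],
}

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Design matrix columns: intercept, isChinstrap, isGentoo (Adelie is the reference).
X, y = [], []
for species, values in masses.items():
    for value in values:
        X.append([1, int(species == "Chinstrap"), int(species == "Gentoo")])
        y.append(value)

b0, b_chinstrap, b_gentoo = ols(X, y)
print(f"intercept (Adelie mean)          = {b0:.1f}")
print(f"isChinstrap (Chinstrap - Adelie) = {b_chinstrap:.1f}")
print(f"isGentoo (Gentoo - Adelie)       = {b_gentoo:.1f}")
print(f"predicted Gentoo mean            = {b0 + b_gentoo:.1f}")
```

The intercept recovers the reference (Adelie) mean and each dummy coefficient is that species' difference from the reference, so a species' predicted mean is the intercept plus its own dummy coefficient.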
Q4 — factorial ANOVA. You fit mass ~ species * sex. The isGentoo × isMale interaction coefficient is large and positive. In plain words, what does that single coefficient tell you?
Not quite: That describes the species effect, which is carried by the isGentoo coefficient (with dummy coding and an interaction in the model, that's the Gentoo-vs-Adelie difference for the reference sex). It isn't the interaction. The interaction is specifically about whether the sex effect changes across species.
Correct: An interaction coefficient measures how much the effect of one predictor (sex) differs across levels of another (species). A positive isGentoo × isMale means the male advantage is bigger in Gentoos than in the reference species. It's just one more column in the design matrix, the factorial ANOVA's interaction term, and nothing more exotic.
Not quite: That describes the sex effect, carried by the isMale coefficient (with dummy coding, the male-vs-female difference in the reference species). The interaction asks something different: does the size of the sex effect depend on species?
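For Q4, here is a sketch with hypothetical masses, restricted to two species (Adelie and Gentoo) so there is a single interaction coefficient. In a saturated 2 × 2 model with dummy coding, the least-squares fitted values are exactly the four cell means, so each coefficient can be read straight off them, and the interaction is a difference of differences.

```python
from statistics import mean

# Hypothetical body masses (g) for a 2 x 2 slice of the design
# (Adelie and Gentoo only, so there is one interaction coefficient).
cells = {
    ("Adelie", "female"): [3300, 3400, 3350],
    ("Adelie", "male"): [4000, 4050, 4100],
    ("Gentoo", "female"): [4650, 4700, 4750],
    ("Gentoo", "male"): [5450, 5500, 5550],
}
m = {cell: mean(values) for cell, values in cells.items()}

# Dummy coding with Adelie and female as the reference levels.
b0 = m["Adelie", "female"]
b_gentoo = m["Gentoo", "female"] - m["Adelie", "female"]
b_male = m["Adelie", "male"] - m["Adelie", "female"]
# Interaction: how much bigger the male advantage is in Gentoos than in Adelies.
male_adv_gentoo = m["Gentoo", "male"] - m["Gentoo", "female"]
male_adv_adelie = m["Adelie", "male"] - m["Adelie", "female"]
b_gentoo_male = male_adv_gentoo - male_adv_adelie

def predict(is_gentoo, is_male):
    """Model prediction: intercept + isGentoo + isMale + isGentoo x isMale."""
    return b0 + b_gentoo * is_gentoo + b_male * is_male + b_gentoo_male * is_gentoo * is_male

for (species, sex), observed in m.items():
    fitted = predict(species == "Gentoo", sex == "male")
    print(f"{species:7} {sex:6} observed {observed}  fitted {fitted}")
print(f"isGentoo x isMale = {b_gentoo_male}")
```

Here the male advantage is 700 g in Adelies and 800 g in Gentoos, and the interaction coefficient is the gap between those two numbers.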
The point
Correlation, t-test, ANOVA, factorial ANOVA — you already know how to run all of these. Each is the same machinery (an intercept plus slopes) wearing a different name. Once you see that, you stop picking tests and start building models.
pairs with slide 20 · use over the coffee break
Stop 2 · Spot the dependence
Where do the errors stop being independent?
The GLM assumes residuals are independent. For each study below, decide whether that assumption is safe or broken — and if broken, what's doing the clustering.
Study A
You measure reaction time. Each of 30 participants completes 200 trials. You analyse all 6,000 trials in one regression.
Look again: 200 trials from the same person are not 200 independent observations; a fast person is fast on all of them. The trials are clustered within participant, so the errors are correlated. Standard errors will be far too small.
Yes: Trials nest within participants. Treating 6,000 trials as independent badly understates uncertainty. This is the textbook case for a random intercept per participant: RT ~ 1 + ... + (1|participant).
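A small simulation makes Study A's problem concrete. It's a sketch under assumed values (30 people, 200 trials each, a hypothetical between-person SD of 50 ms and trial-level SD of 100 ms), and rather than fitting a mixed model it compares two standard errors for the overall mean RT: one that treats all 6,000 trials as independent, and one built from the 30 participant means.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2024)           # fixed seed so the sketch is reproducible
N_PEOPLE, N_TRIALS = 30, 200
PERSON_SD, TRIAL_SD = 50.0, 100.0   # hypothetical between- and within-person SDs (ms)

rts, person_means = [], []
for _ in range(N_PEOPLE):
    baseline = 500.0 + rng.gauss(0.0, PERSON_SD)  # this person's own typical RT
    trials = [baseline + rng.gauss(0.0, TRIAL_SD) for _ in range(N_TRIALS)]
    rts.extend(trials)
    person_means.append(mean(trials))

# Naive SE of the grand mean: pretends all 6,000 trials are independent.
naive_se = stdev(rts) / sqrt(len(rts))
# SE that respects the clustering: 30 independent people, one mean each.
cluster_se = stdev(person_means) / sqrt(N_PEOPLE)

print(f"naive SE:   {naive_se:.2f} ms")
print(f"cluster SE: {cluster_se:.2f} ms")
print(f"ratio:      {cluster_se / naive_se:.1f}x")
```

With these settings the trial-level SE comes out several times too small. Averaging per participant is only a rough stand-in that works here because the design is balanced; the random-intercept model handles the general case.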
Study B
A survey of 500 people, each sampled at random from the national population, each answering once. One row per person.
Yes: With one independent draw per person, no repeated measures and no nesting, this is the clean case the ordinary GLM was built for. Not every dataset needs a mixed model; this one doesn't.
Look again: Each person contributes exactly one independent observation and they were sampled independently. There's no repeated measurement and no grouping structure to induce correlation. This one is genuinely fine.
Study C
40 participants each rate the same 60 faces. You have 2,400 ratings.
Look again: Two ratings of the same face are correlated (some faces are just more attractive), and two ratings by the same person are correlated (some raters are generous). Both sources are live at once.
Exactly: Ratings cluster by rater and by face simultaneously, and neither nests inside the other. These are crossed random effects: (1|rater) + (1|face). We come back to this on slide 47. Averaging the faces away (the old habit) throws out the face variance and inflates false positives.
Study D
You test a reading intervention on 600 pupils. The pupils sit in 30 classrooms, and those classrooms sit in 8 schools.
Look again: Pupils in the same classroom share a teacher and environment; classrooms in the same school share a catchment and ethos. Two pupils from one class are more alike than two pupils from different schools, so the residuals move together at two levels.
Yes: This is a nested hierarchy (pupils within classrooms within schools). Each lower unit sits inside exactly one higher unit, which is what distinguishes it from the crossed case in Study C. The structure is (1|school/classroom), a random intercept for schools plus one for classrooms-within-schools.
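The crossed-versus-nested distinction between Studies C and D can be checked mechanically: a factor is nested in another if each of its levels appears with only one level of the other. The helper `is_nested` and the ID layouts below are hypothetical, built to mirror the two designs.

```python
from itertools import product

def is_nested(rows, inner, outer):
    """True if every level of `inner` appears with exactly one level of `outer`."""
    seen = {}
    for row in rows:
        seen.setdefault(row[inner], set()).add(row[outer])
    return all(len(outers) == 1 for outers in seen.values())

# Study C (hypothetical IDs): every one of 40 raters rates every one of 60 faces.
ratings = [{"rater": r, "face": f} for r, f in product(range(40), range(60))]

# Study D (hypothetical IDs): 30 classrooms spread over 8 schools, 20 pupils per class.
pupils = [
    {"pupil": c * 20 + p, "classroom": c, "school": c % 8}
    for c in range(30)
    for p in range(20)
]

# Neither direction is nested, so rater and face are crossed.
print(is_nested(ratings, "face", "rater"), is_nested(ratings, "rater", "face"))
# Each pupil sits in one classroom and each classroom in one school: nested.
print(is_nested(pupils, "pupil", "classroom"), is_nested(pupils, "classroom", "school"))
```

In lme4, (1|school/classroom) is shorthand for (1|school) + (1|school:classroom), so classroom labels that repeat across schools (say, "Class 1" in every school) are still treated as different classrooms.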
The point
The question is always the same: "could I shuffle the residuals freely, or do some of them move together?" Whenever observations share a source (the same person measured repeatedly, the same item rated many times, pupils from the same school), their residuals move together.
pairs with slides 27–32
Stop 3 · What's allowed to vary?
Match the description to the model
Back to the sleep study: reaction time over 10 days of sleep restriction, 18 participants. Read each plain-English description and click the model that matches it. (The notation is the answer, not the question — don't worry if it's still unfamiliar.)
1. Everyone shares a single baseline RT and a single effect of sleep loss. One line for the whole sample. We ignore that people differ.
2. People are allowed to start at different baseline RTs, but sleep loss is assumed to hit everyone at the same rate. Parallel lines, different heights.
3. People differ in their baseline RT and in how badly sleep loss affects them: some are robust, some collapse. Different heights and different gradients.
Inside the brackets, the part before the | is what's allowed to vary, and the part after it is what it varies by.
(1|Subject) → the 1 is the intercept, so each subject gets their own baseline. Slope still shared.
(Days|Subject) → now Days (the slope) is in there too, so each subject gets their own gradient as well.
Nothing in brackets → ordinary GLM, one line for everyone. That's complete pooling, and it gives you those tight-but-wrong confidence intervals from slide 27.
The point
You don't read the notation, you decode it: "what's allowed to vary, and across what?" Everything after that is just adding ingredients.
pairs with slide 38
Stop 4 · Fixed or random?
Which effect is which?
For each grouping variable, decide whether you'd treat it as a fixed or random effect. The test: are the levels a few specific things you care about individually (fixed), or an exchangeable sample from some larger population whose variance interests you (random)?
Variable 1: Drug dose, the 4 specific doses you chose (0, 10, 20, 40 mg).
Fixed: You chose these exact doses and you want a coefficient for each. They aren't a random sample of "all possible doses"; 20 mg means something specific. Few levels, not exchangeable → fixed.
Reconsider: Doses are the slide-38 worked example of why "few, non-exchangeable, you-care-about-each" points to fixed. You'd never want "the variance across doses"; you want the effect of this dose vs that one.
Variable 2: Participants, the 100 people who happened to take part.
Reconsider: You don't care about participant #47 specifically; they're a stand-in for the population. Many levels, exchangeable, you want the spread across people → random.
Random: Many exchangeable levels, nothing special about any one person beyond their name, and you care about between-person variance. The defining case for a random effect.
Variable 3: Stimulus items, 50 faces drawn from a large face database.
Reconsider: Treating your 50 faces as fixed assumes they are the population (the slide-47 trap). If you'd want the finding to hold for other faces, they're a sample → random.
Random: As slide 47's rule of thumb puts it, "if you'd want it to hold for other faces, faces get a random effect." Ignoring item variance is exactly what inflates false positives (Judd, Westfall & Kenny).
Variable 4: Condition, A vs B, the two experimental conditions you designed.
Fixed: Two levels, both of direct interest, not a sample of anything; you want the A-vs-B coefficient. With only two levels you couldn't estimate a variance with any useful precision anyway. Fixed.
Reconsider: A and B are your whole universe of conditions and you care about the difference between them specifically. Two non-exchangeable levels → fixed. (Two levels are also far too few to estimate a variance usefully.)
The honest caveat
These are rules of thumb, not law — slide 38 says so. The borderline cases (a handful of schools, say) are genuine judgement calls, and that's the discussion worth having.
pairs with slides 8, 38–39, 47
Stop 5 · Write the formula
From a design to Wilkinson notation
Now put it together. For each design, write down the Wilkinson formula yourself — on paper or in your head — then check the pieces and reveal the worked answer. Reminder of the grammar: outcome ~ fixed effects + (random part | grouping). The part before the | is what varies; the part after is what it varies across.
Design 1
A researcher measures reaction time. Each of 40 participants completes many trials under two conditions, congruent vs incongruent. Every participant sees both conditions. The researcher wants the overall effect of condition, and is happy to allow participants to differ in baseline RT.
First decide the pieces:
Is condition a fixed or random effect?
Which grouping factor needs a random intercept?
RT ~ 1 + condition + (1 | participant)
condition is fixed — two specific levels you care about (this is the effect you're testing).
participant gets a random intercept — many exchangeable people, and you've said you'll let baselines differ: (1 | participant).
The brief only lets baselines vary, so there's no random slope yet. If you wanted the condition effect to vary by person too, you'd write (1 + condition | participant); that's Design 2.
Design 2
Same study, but now the researcher suspects the congruency effect itself is stronger in some people than others and wants to capture that. (Condition varies within each participant.)
condition stays fixed — you still want its population effect.
Now condition also goes inside the brackets: (1 + condition | participant) lets both the baseline and the condition effect vary across people.
This is only legitimate because condition varies within each participant (everyone sees both). A predictor that's constant within a group can't have a random slope over that group — slide 39's "random slopes for predictors that vary within those groups."
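The within-group condition from slide 39 is easy to check on your own data. Here is a conservative sketch (the helper `varies_within` and the between-person `age_group` column are hypothetical): it asks whether a predictor takes more than one value inside every level of the grouping factor.

```python
def varies_within(rows, predictor, group):
    """True if `predictor` takes more than one value inside every level of `group`."""
    values = {}
    for row in rows:
        values.setdefault(row[group], set()).add(row[predictor])
    return all(len(v) > 1 for v in values.values())

# Hypothetical rows: each of 40 participants sees both conditions,
# while age_group is fixed for each person (a between-person predictor).
rows = [
    {"participant": p, "condition": c, "age_group": "older" if p % 2 else "younger"}
    for p in range(40)
    for c in ("congruent", "incongruent")
]

# Condition varies within participants, so (1 + condition | participant) is possible.
print(varies_within(rows, "condition", "participant"))
# age_group never varies within a participant, so it can't get a random slope there.
print(varies_within(rows, "age_group", "participant"))
```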
Design 3
A rating study: every participant rates every face for trustworthiness. There are 50 participants and 80 faces. The researcher wants the finding to generalise both to other people and to other faces, and is modelling only baseline differences (intercepts), not slopes.
How are participants and faces related here?
rating ~ 1 + (1 | participant) + (1 | face)
Because every participant sees every face, the two grouping factors are crossed — you add a separate random intercept for each: (1 | participant) + (1 | face).
Both are random: many exchangeable levels, and you explicitly want to generalise beyond these people and these faces (slide 47's rule of thumb).
There's no fixed predictor in the brief, so the fixed part is just the intercept (1) — you're estimating the grand mean plus two sources of variance. Treating the faces as fixed, or averaging over them, is exactly the false-positive trap from slide 47.
The point
Writing the formula is just three questions in order: what's the outcome, which predictors are fixed, and which grouping factors need a random part (and does anything vary within them). Get those three and the notation writes itself.
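The three questions map directly onto the pieces of the formula. As an illustration only, here is a hypothetical helper that assembles an lme4-style formula string from those answers and reproduces the three worked answers above. It does no fitting and doesn't check whether the random parts are legitimate.

```python
def wilkinson(outcome, fixed, random_parts):
    """Assemble an lme4-style formula string (illustrative helper, not a library API).

    outcome      -- name of the response variable
    fixed        -- list of fixed-effect predictors (the intercept 1 is always included)
    random_parts -- dict mapping each grouping factor to the terms that vary across it
    """
    fixed_part = " + ".join(["1"] + list(fixed))
    random_terms = [
        f"({' + '.join(['1'] + list(terms))} | {group})"
        for group, terms in random_parts.items()
    ]
    return f"{outcome} ~ " + " + ".join([fixed_part] + random_terms)

# Design 1: condition fixed, random intercept for participant.
print(wilkinson("RT", ["condition"], {"participant": []}))
# Design 2: the condition effect also varies across participants.
print(wilkinson("RT", ["condition"], {"participant": ["condition"]}))
# Design 3: intercept only, crossed random intercepts for participant and face.
print(wilkinson("rating", [], {"participant": [], "face": []}))
```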