Checking assumptions and including interactions - Exercises & Answers#
1. Fitting some models and checking our assumptions#
Before we begin, let's import most of the things that we'll need.
a. Imports#
Import pandas, pingouin, statsmodels.formula.api, and seaborn, and also get the eval_measures module from statsmodels so we can access the RMSE function.
Show code cell source
# Your answer here
import pandas as pd
import pingouin as pg
import statsmodels.formula.api as smf
import seaborn as sns
import statsmodels.tools.eval_measures as measures
sns.set_style('whitegrid')
b. Loading up data#
This week we test our skills on a new dataset, called the 'Teaching Ratings' dataset. It contains 463 teaching evaluations of a set of professors at the University of Texas at Austin, along with the characteristics of their courses - and it also includes how attractive each professor is perceived to be!
You can find more discussion of the dataset here, but for our purposes we will consider the following variables, described briefly below:
- age, the professor's age.
- minority, whether the professor belongs to a non-Caucasian minority.
- gender, indicating the professor's gender.
- beauty, a standardised measure of the attractiveness of the professor.
- eval, the course evaluation score, which ranges from 1 to 5.
- native, whether the instructor is a native English speaker or not.
- students, the number of students who provided evaluations of this professor.
- prof, a simple number indicating the professor.
You can find the dataset at this link: https://vincentarelbundock.github.io/Rdatasets/csv/AER/TeachingRatings.csv
Read it into a dataframe called profs
and show the top 5 rows.
Show code cell source
# Your answer here
# Read in the dataset
profs = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/AER/TeachingRatings.csv')
profs.head()
| | rownames | minority | age | gender | credits | beauty | eval | division | native | tenure | students | allstudents | prof |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | yes | 36 | female | more | 0.289916 | 4.3 | upper | yes | yes | 24 | 43 | 1 |
| 1 | 2 | no | 59 | male | more | -0.737732 | 4.5 | upper | yes | yes | 17 | 20 | 2 |
| 2 | 3 | no | 51 | male | more | -0.571984 | 3.7 | upper | yes | yes | 55 | 55 | 3 |
| 3 | 4 | no | 40 | female | more | -0.677963 | 4.3 | upper | yes | yes | 40 | 46 | 4 |
| 4 | 5 | no | 31 | female | more | 1.509794 | 4.4 | upper | yes | yes | 42 | 48 | 5 |
c. Some graphic exploratory associations#
Teaching evaluations are studied in surprising depth, particularly in fields such as economics, marketing, and some parts of psychology. Two consistent findings are that a) female professors receive lower ratings, on average, and b) more physically attractive individuals receive higher ratings - an example of the 'halo effect'.
We will examine these questions with a linear model that we will refine. We will aim to predict course evaluations from gender and beauty.
First, let us check the assumption of linearity and additivity. Using seaborn, make a scatterplot that shows the association between course evaluations and beauty, first colouring each point by whether the professor is male or female, and then making a separate plot for each gender - this might require a quick look at last week's notes, but the repetition will solidify these concepts and allow you to check your first assumptions.
Show code cell source
# Scatterplot - your answer here
sns.scatterplot(data=profs, x='beauty', y='eval', hue='gender')
<Axes: xlabel='beauty', ylabel='eval'>

Show code cell source
# Separate plots - your answer here
sns.relplot(data=profs, col='gender', x='beauty', y='eval')
<seaborn.axisgrid.FacetGrid at 0x1527cde10>

Is the assumption of additivity/linearity OK here? Do you observe relatively linear associations between these variables?
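If you want to make the linearity check easier on the eye, one optional extra (a minimal sketch, assuming profs is loaded as above) is to let seaborn overlay a fitted regression line in each panel:

# Optional extra check: lmplot adds a linear fit per gender, assuming 'profs' is loaded
sns.lmplot(data=profs, x='beauty', y='eval', col='gender')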
d. Fitting a model and interpreting outputs#
With that assumption in the bag, fit a model that predicts evaluations from beauty and gender. Make sure you scale beauty, but not gender, to make the interpretation a bit simpler. Store the fitted model in a variable called modelA and inspect the summary.
Show code cell source
# Your answer here
# Fit model, show summary
modelA = smf.ols('eval ~ gender + scale(beauty)', data=profs).fit()
modelA.summary(slim=True)
| Dep. Variable: | eval | R-squared: | 0.066 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.062 |
| No. Observations: | 463 | F-statistic: | 16.33 |
| Covariance Type: | nonrobust | Prob (F-statistic): | 1.41e-07 |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 3.8838 | 0.039 | 100.468 | 0.000 | 3.808 | 3.960 |
| gender[T.male] | 0.1978 | 0.051 | 3.880 | 0.000 | 0.098 | 0.298 |
| scale(beauty) | 0.1171 | 0.025 | 4.650 | 0.000 | 0.068 | 0.167 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
From this, what is the mean difference in evaluations between female and male professors when scaled beauty is zero (i.e., at average beauty)? The answer is in the coefficients.
Show code cell source
# Your answer here
# Females = intercept, males = intercept + coefficient for gender
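# One way to make this concrete (a small sketch, assuming modelA is fitted as above):
# pull the relevant coefficients from the fitted model by name
female_mean = modelA.params['Intercept']
male_mean = modelA.params['Intercept'] + modelA.params['gender[T.male]']
print(f'Female mean: {female_mean:.3f}, male mean: {male_mean:.3f}, difference: {male_mean - female_mean:.3f}')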
From a glance at the coefficients, how much of an increase in teaching ratings would we expect for a professor whose beauty is two SDs above the mean?
Show code cell source
# Your answer here
# 2 * the coefficient of scaled beauty.
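# As a quick check (again assuming modelA from above), compute this directly:
print(2 * modelA.params['scale(beauty)'])  # roughly 0.23 of a point on the 1-5 scale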
e. Evaluating the model and checking#
We've got some basic answers from our model about the influence of beauty and gender on teaching ratings. We also saw these relationships were relatively linear - so now let's see whether our model is any good.
Produce a histogram and KDE plot of residuals, checking the normality of the errors. Can you also conduct a Shapiro-Wilk test on them to see if they come from a normal distribution?
Show code cell source
# Your answer here
# Histogram and KDE of the residuals
sns.histplot(modelA.resid, kde=True)

# Shapiro-Wilk test of normality on the residuals
from scipy.stats import shapiro
shapiro(modelA.resid)
ShapiroResult(statistic=0.9804073996638387, pvalue=6.764511194804067e-06)

How do these look? Is the test significant?
f. Checking for homoscedasticity#
The residuals look mostly normal, though the test is once again suggesting a problem! Move on now to checking the assumption of homoscedasticity - is the variance of the predictions consistent? You will need to make a scatterplot of the predictions against the errors here. Can you also add a horizontal line at zero to make any pattern clearer?
Show code cell source
axis = sns.scatterplot(x=modelA.fittedvalues,
y=modelA.resid, alpha=.5) # scatter is easy
axis.set(xlabel='Predicted Values', ylabel='Residual') # Renaming
axis.axhline(0) # Adding a horizontal line at zero
<matplotlib.lines.Line2D at 0x15295b0d0>

How does this look? Any issues here?
g. Further assumptions and conceptual issues#
Have a look at the prof column of your profs dataset. This column tells you which rows of the dataset (courses, ratings, etc.) belong to a single professor. For example, professor 34 has 13 rows associated with them, as they were evaluated 13 times across different courses. You can count how many times each professor appears in this column by using value_counts(), like so:
profs['prof'].value_counts()
Run this line of code. What does this tell you about the assumption of independence of errors?
Show code cell source
# Your code here
# How many rows belong to individual professors
profs['prof'].value_counts()
prof
34 13
50 13
82 11
71 10
58 10
..
22 1
47 1
69 1
30 1
62 1
Name: count, Length: 94, dtype: int64
While we've done well so far, we've absolutely failed this assumption! The rows of the dataset are not independent, since some of them concern the same professor. As such, we should not trust these results too much - it's OK for learning for now, but in real research this would be a bigger issue - and one we will solve later in the course.
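If you want to quantify the problem (a quick sketch, assuming profs is loaded as above), you can count how many professors contribute more than one row:

# Number of professors who appear more than once in the dataset
(profs['prof'].value_counts() > 1).sum()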
It looks as though beauty has an association with course evaluations. If you check the link here for a description of the dataset, how much do you trust the beauty variable - what's wrong with it? How does this affect our validity and representativeness assumptions?
Show code cell source
# Your answer here
# It's based on ratings from just six people! Is that good?
h. Checking prediction errors#
Finally for the assumptions of this model, can you check the prediction error using RMSE? How ‘off’ on average is this model in terms of course evaluations? Is this good?
Show code cell source
# Your answer here
measures.rmse(profs['eval'], modelA.fittedvalues)
0.5355783806218819
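To judge whether an average error of about 0.54 is acceptable, one option (a small sketch, assuming profs is loaded as above) is to compare the RMSE with the spread of the outcome itself - with an \(R^2\) of only about 0.07, the RMSE will be only slightly smaller than the standard deviation of eval:

# Compare the model's RMSE with the raw standard deviation of the evaluations
profs['eval'].std()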
2. Expanding the model to improve inferences#
We can now expand the model to answer a more complex, but more realistic, question. The initial model examined whether males and females differ in their evaluations, controlling for attractiveness - remember, the coefficient of gender tests the mean difference when the other predictors are zero. We also saw that, overall, attractiveness is associated with higher course evaluations.
a. An interaction#
Perhaps we should consider whether the effect of attractiveness on evaluations is different for men and women. To do this, we need an interaction. Fit a model with an interaction between gender and (scaled) beauty, and call it modelB.
Show code cell source
# Your answer here
# Interaction model
modelB = smf.ols('eval ~ scale(beauty) + gender + scale(beauty):gender', data=profs).fit()
modelB.summary(slim=True)
| Dep. Variable: | eval | R-squared: | 0.073 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.066 |
| No. Observations: | 463 | F-statistic: | 11.97 |
| Covariance Type: | nonrobust | Prob (F-statistic): | 1.47e-07 |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 3.8909 | 0.039 | 100.337 | 0.000 | 3.815 | 3.967 |
| gender[T.male] | 0.1951 | 0.051 | 3.834 | 0.000 | 0.095 | 0.295 |
| scale(beauty) | 0.0690 | 0.037 | 1.862 | 0.063 | -0.004 | 0.142 |
| scale(beauty):gender[T.male] | 0.0888 | 0.050 | 1.761 | 0.079 | -0.010 | 0.188 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
First, can you conduct an F-test to see whether the interaction significantly raises the \(R^2\) compared to modelA?
Show code cell source
# Your answer here
modelB.compare_f_test(modelA)
(3.1010028248372294, 0.07890991920658393, 1.0)
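The tuple returned by compare_f_test is (F statistic, p-value, difference in degrees of freedom). Unpacking it makes the result easier to read (a small sketch using the models fitted above):

# Label the elements of the model comparison for readability
f_stat, p_value, df_diff = modelB.compare_f_test(modelA)
print(f'F({df_diff:.0f}) = {f_stat:.2f}, p = {p_value:.3f}')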
Can you calculate the RMSE and see whether it has improved as a consequence of including the interaction?
Show code cell source
# Your answer here
measures.rmse(profs['eval'], modelB.fittedvalues)
0.5337783133573137
Does this model suggest evidence that beauty influences teaching evaluations? Note the differences between the conclusions of modelA and modelB, and consider which model is more representative of the real process of teaching evaluations.
Show code cell source
# Your answer here
# No - no clear evidence of a beauty effect in modelB, but some in modelA; modelA, however, is less realistic.
b. Additional controls#
When building models, it is necessary to think carefully about the process you are studying, and remember the mantra that ‘all models are wrong, but some are useful’.
Suppose our initial question is to see how beauty affects teaching evaluations separately for men and women. The interaction certainly allows us to answer that, but the dataset contains many other variables that likely impact teaching evaluations. For example:
- minority codes whether the professor belongs to an ethnic minority. There is a long history suggesting that things like accents affect teaching evaluations (sadly).
- age may affect beauty, since younger people are, on average, rated as more attractive.
- tenure codes whether a professor has achieved tenure, generally granted after around 7 years of teaching and research experience. In principle, these individuals should be better teachers.
Let us now expand our model to include these predictors. We will scale age, since it is a continuous variable, but we will include minority and tenure as they are. Including these will allow us to see whether our hypothesised interaction between gender and beauty holds once we account for (i.e., set to zero) these other important predictors of teaching ability. Call this modelC.
Show code cell source
# Your answer here
modelC = smf.ols('eval ~ minority + tenure + scale(age) + gender + scale(beauty) + scale(beauty):gender',
data=profs).fit()
modelC.summary(slim=True)
| Dep. Variable: | eval | R-squared: | 0.094 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.082 |
| No. Observations: | 463 | F-statistic: | 7.892 |
| Covariance Type: | nonrobust | Prob (F-statistic): | 4.18e-08 |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 4.0276 | 0.060 | 67.262 | 0.000 | 3.910 | 4.145 |
| minority[T.yes] | -0.0837 | 0.074 | -1.138 | 0.256 | -0.228 | 0.061 |
| tenure[T.yes] | -0.1825 | 0.062 | -2.958 | 0.003 | -0.304 | -0.061 |
| gender[T.male] | 0.2197 | 0.053 | 4.154 | 0.000 | 0.116 | 0.324 |
| scale(age) | -0.0364 | 0.027 | -1.331 | 0.184 | -0.090 | 0.017 |
| scale(beauty) | 0.0842 | 0.038 | 2.238 | 0.026 | 0.010 | 0.158 |
| scale(beauty):gender[T.male] | 0.0424 | 0.052 | 0.813 | 0.416 | -0.060 | 0.145 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Now check whether this model is ‘significantly’ better than modelB, and then check the RMSE. What does the statistical significance of the test tell you in comparison to the RMSE?
Show code cell source
# Your answer here
print(modelC.compare_f_test(modelB))
print(measures.rmse(profs['eval'], modelC.fittedvalues))
(3.6090171263130095, 0.013388316523354751, 3.0)
0.5275520794476387
What does the model indicate about the effects of beauty now, and the difference between males and females? What about the effects of having tenure and being in a minority?
Show code cell source
# Your answer here
# No effect of minority, being tenured reduces evaluations,
# males are still rated higher than females,
# beauty weakly increases evaluations, and that effect does not differ between genders
c. A final challenge in interactions#
The interaction we have examined so far (with additional controls) is an interaction between two variables, one categorical and one continuous. This is sometimes known as a 'two-way' interaction. We can expand this further into an interaction between three variables, just to show you how flexible the GLM is.
Perhaps we think that rather than just controlling for how old a professor is, we should let age interact with gender and beauty. That is, we hypothesise that the effect of beauty on evaluations depends on both your gender, and your age. In order to know how your beauty affects your evaluations, we need to know your age and gender.
Fitting an interaction with three variables is slightly more complex, as you must also include what are called the 'lower-level' interactions, of which there are three:
- beauty:gender (as we've targeted)
- beauty:age
- gender:age
and then you can include the interaction between all three - e.g., beauty:age:gender.
Writing all of that out would be a lot of work, but statsmodels simplifies things a bit for us here. So far we've specified interactions with ':', but if we want to include all the lower-level interactions in a single go, we can just use '*'. So, to be clear, these two things are identical:
eval ~ gender + beauty + gender:beauty
and
eval ~ gender*beauty
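As a quick sanity check (a small sketch, using the profs data loaded earlier), you can fit both versions and confirm the coefficients are identical:

# ':' spells out the interaction explicitly; '*' expands to the same model
m_colon = smf.ols('eval ~ gender + beauty + gender:beauty', data=profs).fit()
m_star = smf.ols('eval ~ gender * beauty', data=profs).fit()
print(m_colon.params.sort_index())
print(m_star.params.sort_index())  # same estimates, shorter formula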
Being able to write out the full formula is a bonus, but it can slow you down, and the '*' version just makes things easier. With that in mind, fit a model below that includes minority and tenure as controls, but lets age, gender, and beauty all interact together. Call this modelD, and remember to scale age and beauty.
Show code cell source
# Your answer here
modelD = smf.ols('eval ~ tenure + minority + scale(age)*scale(beauty)*gender', data=profs).fit()
modelD.summary(slim=True)
| Dep. Variable: | eval | R-squared: | 0.158 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.141 |
| No. Observations: | 463 | F-statistic: | 9.427 |
| Covariance Type: | nonrobust | Prob (F-statistic): | 3.42e-13 |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 3.9615 | 0.060 | 66.202 | 0.000 | 3.844 | 4.079 |
| tenure[T.yes] | -0.1355 | 0.062 | -2.170 | 0.031 | -0.258 | -0.013 |
| minority[T.yes] | -0.1045 | 0.072 | -1.442 | 0.150 | -0.247 | 0.038 |
| gender[T.male] | 0.2893 | 0.053 | 5.499 | 0.000 | 0.186 | 0.393 |
| scale(age) | -0.1339 | 0.045 | -3.003 | 0.003 | -0.222 | -0.046 |
| scale(age):gender[T.male] | 0.1601 | 0.055 | 2.906 | 0.004 | 0.052 | 0.268 |
| scale(beauty) | 0.0358 | 0.049 | 0.731 | 0.465 | -0.060 | 0.132 |
| scale(beauty):gender[T.male] | 0.0690 | 0.062 | 1.108 | 0.268 | -0.053 | 0.191 |
| scale(age):scale(beauty) | -0.0414 | 0.044 | -0.934 | 0.351 | -0.129 | 0.046 |
| scale(age):scale(beauty):gender[T.male] | 0.2052 | 0.056 | 3.655 | 0.000 | 0.095 | 0.316 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
What does this model say about the effect of beauty on evaluations, considering age and gender? Can you compare this model to modelC, and also assess its RMSE?
Show code cell source
# Your answer here
print(modelD.compare_f_test(modelC))
print(measures.rmse(profs['eval'], modelD.fittedvalues))
(11.416009424414138, 3.1413335349546134e-07, 3.0)
0.5086738302618814
Notice how the F-test suggests a very significant increase in variance explained, while RMSE decreases only a little.
d. Interpreting this interaction#
You can see that the interaction between all three variables (age, gender, beauty) is statistically significant. But what does it mean? You absolutely cannot tell just by looking at the coefficients, and this is why, in general, relying on the coefficients alone is folly. Run the code below to see a plot, and see if graphically interpreting the predictions helps.
Show code cell source
# Try this code for your modelD
import marginaleffects as me
me.plot_predictions(model=modelD, condition=['beauty', 'age', 'gender'])
What does the interaction show you?