Using and understanding probit models - Exercises & Answers#

The exercises for this week will ease you into the slightly odd world of probit models. In many ways the logic is the same - we fit the models and interpret the predictions - but since the predictions are probabilities, and a threshold has to be imposed to actually recover a binary outcome, things can be a little trickier. On the whole, these models are a little trickier to understand than a standard GLM, so take care.

1. Iceberg, right ahead!#

Our probit regression practice begins by examining the titanic dataset. This is a genuine dataset that records the full passenger log of the Titanic, with many interesting variables, including whether a passenger survived the disaster or not.

a. Imports#

To get started, import the following packages with their usual aliases: pandas, pingouin, statsmodels.formula.api, seaborn, marginaleffects, and numpy.

Hide code cell source
# Your answer here
import pandas as pd
import pingouin as pg
import statsmodels.formula.api as smf
import seaborn as sns
import marginaleffects as me
import numpy as np

sns.set_style('whitegrid')

b. Loading up data#

Load the Titanic dataset from the following link using pandas into a variable called titanic, then show the top 10 rows.

Data: https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv

The dataset is fairly self explanatory, but you can find a more detailed description here: https://vincentarelbundock.github.io/Rdatasets/doc/Stat2Data/Titanic.html

Hide code cell source
# Your answer here
titanic = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv')
titanic.head(10)
rownames Name PClass Age Sex Survived SexCode
0 1 Allen, Miss Elisabeth Walton 1st 29.00 female 1 1
1 2 Allison, Miss Helen Loraine 1st 2.00 female 0 1
2 3 Allison, Mr Hudson Joshua Creighton 1st 30.00 male 0 0
3 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels) 1st 25.00 female 0 1
4 5 Allison, Master Hudson Trevor 1st 0.92 male 1 0
5 6 Anderson, Mr Harry 1st 47.00 male 1 0
6 7 Andrews, Miss Kornelia Theodosia 1st 63.00 female 1 1
7 8 Andrews, Mr Thomas, jr 1st 39.00 male 0 0
8 9 Appleton, Mrs Edward Dale (Charlotte Lamson) 1st 58.00 female 1 1
9 10 Artagaveytia, Mr Ramon 1st 71.00 male 0 0

c. A tiny bit of cleaning#

The dataset has a missing data value for one of the PClass entries, where it is recorded as a * symbol.

Remove this by querying the data so that PClass is not equal to ‘*’, and reassign this back to the variable titanic.

Then, drop the missing values in the Age column, and reassign this back to the variable titanic.

Hide code cell source
# Your answer here
titanic = titanic.query('PClass != "*"').dropna(how='any', subset='Age')

d. Estimating survival probabilities#

The sinking of the Titanic is one of the worst maritime disasters in recorded history, with over 1500 deaths. The dataset contains only the passenger manifesto (i.e., crew are not included). Surviving the disaster was not equally likely for all passengers, however - the phrase ‘women and children first’ may come to mind.

Probit regression can help us estimate survival probabilities for the different passenger classes, ages, and sex.

Fit a probit model predicted whether a passenger survived from their sex, age, and class, and print the summary:

Hide code cell source
# Your answer here
titanic_model = smf.probit('Survived ~ PClass + Age + Sex', data=titanic).fit()
titanic_model.summary()
Optimization terminated successfully.
         Current function value: 0.461503
         Iterations 6
Probit Regression Results
Dep. Variable: Survived No. Observations: 756
Model: Probit Df Residuals: 751
Method: MLE Df Model: 4
Date: Fri, 01 Nov 2024 Pseudo R-squ.: 0.3196
Time: 00:54:53 Log-Likelihood: -348.90
converged: True LL-Null: -512.79
Covariance Type: nonrobust LLR p-value: 1.098e-69
coef std err z P>|z| [0.025 0.975]
Intercept 2.1643 0.217 9.967 0.000 1.739 2.590
PClass[T.2nd] -0.7540 0.152 -4.966 0.000 -1.052 -0.456
PClass[T.3rd] -1.4076 0.150 -9.356 0.000 -1.702 -1.113
Sex[T.male] -1.5460 0.112 -13.821 0.000 -1.765 -1.327
Age -0.0221 0.004 -5.130 0.000 -0.031 -0.014

All of the coefficients are significant, but again, we do not want to dwell on interpreting these. Instead, we should focus on the predictions. Generate a set of predictions for each predictor, to see how survival probability is affected. Bonus points if you can plot them as well.

Hide code cell source
# Your answer here
dg = me.datagrid(titanic_model,
                 PClass=['1st', '2nd', '3rd'],
                 Sex=['male', 'female'],
                 Age=np.arange(0, 90))

# Predict
preds = me.predictions(titanic_model, newdata=dg)

# Plot
sns.relplot(data=preds,
            x='Age', y='estimate',
            hue='PClass', col='Sex',
            kind='line')
<seaborn.axisgrid.FacetGrid at 0x148b1c650>
../_images/bad468a0f10199df36a014726d3100d32376e28632f9a888951594e688de4921.png

This story is not apparent at all from the coefficients. Younger individuals had a higher chance of survival, as did women, which is particularly pronounced for higher-class passengers.

Let’s check the importance of these differences.

Use marginaleffects to contrast the differences in survival probability for females and males, regardless of class and age (i.e., the marginal effect of sex).

Hide code cell source
# Your code here
# marginal effects here
display(
    me.predictions(titanic_model,
               newdata=dg,
               by=['Sex'])
)

# Contrast here
me.predictions(titanic_model,
               newdata=dg,
               by=['Sex'],
               hypothesis='pairwise')
shape: (2, 8)
Sexestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"female"0.6372970.03453718.4523950.0inf0.5696050.704989
"male"0.2024820.01593612.7061240.0inf0.1712480.233715
shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 1 - Row 2"0.4348150.03703611.7403540.0inf0.3622260.507404

What about the marginal effects of passenger class? What are the probabilities of survival for those in each of the three classes, and do they differ significantly from one another?

Hide code cell source
# Your code here
display(
    me.predictions(titanic_model,
                   newdata=dg,
                   by=['PClass'])
)

# Contrast here
me.predictions(titanic_model,
               newdata=dg,
               by=['PClass'],
               hypothesis='pairwise')
shape: (3, 8)
PClassestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"1st"0.611460.0273422.3647250.0inf0.5578740.665046
"2nd"0.4054210.03244312.4962930.0inf0.3418330.469009
"3rd"0.2427870.02400910.112150.0inf0.1957290.289845
shape: (3, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 1 - Row 2"0.2060390.03895.2966361.1796e-723.0152520.1297970.282282
"Row 1 - Row 3"0.3686730.03298711.1764560.0inf0.3040210.433326
"Row 2 - Row 3"0.1626340.0332264.8947879.8412e-719.9546610.0975120.227756

What about the marginal slope of age? How did probability of survival change with a 1-year increase in age, regardless of passenger sex and class? Hint - since you have just one variable of interest here, you should use avg_slopes rather than slopes, which will helpfully average across the results for you.

Hide code cell source
# Your answer here
me.avg_slopes(titanic_model, newdata=dg, variables='Age')
shape: (1, 9)
termcontrastestimatestd_errorstatisticp_values_valueconf_lowconf_high
strstrf64f64f64f64f64f64f64
"Age""mean(dY/dX)"-0.0055770.00087-6.407391.4803e-1032.653374-0.007283-0.003871

e. Individualised survival estimates#

Based on your model, who was more likely to survive the Titanic disaster out of the following three people:

  1. A 10 year old female from the second class

  2. A 30 year old male from first class

  3. A 60 year old female from first class

  4. A 1 year old male from third class

Hint - you may find using datagrid gives you more than what you need here, so you could specify your newdata through a simple DataFrame. Try it and see. Once you obtain the four predictions, you can use a different argument to hypothesis - try using ‘reference’ to see if you can compare all predictions to that of a 10 year old second class female.

Hide code cell source
# Your answer here
# Option 1 - create a dataframe
dg = pd.DataFrame({'Age': [10, 30, 60, 1],
                   'Sex': ['female', 'male', 'female', 'male'],
                   'PClass': ['2nd', '1st', '1st', '3rd']})

# Predict
opt1 = me.predictions(titanic_model, newdata=dg)
display(opt1)

# Opt1 contrast
display(me.predictions(titanic_model, newdata=dg, hypothesis='reference'))
shape: (4, 11)
rowidestimatestd_errorstatisticp_values_valueconf_lowconf_highAgeSexPClass
i32f64f64f64f64f64f64f64i64strstr
00.8828650.02827631.2226810.0inf0.8274440.93828610"female""2nd"
10.4823420.04489310.744140.0inf0.3943520.57033130"male""1st"
20.7993320.04196619.0469150.0inf0.7170790.88158560"female""1st"
30.2085690.0394945.2809841.2849e-722.8918180.1311610.2859761"male""3rd"
shape: (3, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 2 - Row 1"-0.4005240.051269-7.8122495.5511e-1547.356144-0.501008-0.300039
"Row 3 - Row 1"-0.0835330.051703-1.6156230.1061763.235471-0.184870.017804
"Row 4 - Row 1"-0.6742960.043832-15.3837470.0inf-0.760205-0.588388

f. Evaluating the model#

Compute the phi coefficient for the model examine its magnitude. How good is this (higher is better)?

Hide code cell source
# Your answer here
# Get predictions for all datapoints
preds = me.predictions(titanic_model)

# Binarise
binary = np.digitize(preds['estimate'], [.5])

# Correlate
pg.corr(binary, titanic['Survived'])
n r CI95% p-val BF10 power
pearson 756 0.557665 [0.51, 0.6] 5.325949e-63 3.067e+59 1.0

2. The Challenger Disaster#

Probit regression is really morbid, isn’t it? Lets continue with another example.

The Challenger was a space shuttle that exploded shortly after launch in 1986, killing all seven astronauts on board. The cause of the explosion was traced back to a simple mechanical fault in the form of a rubber O-ring in the fuselage. In normal temperatures, these O-rings kept extremely powerful and flammable rocket fuel from the booster, allowing the fuel to enter when needed. In low temperatures, they became brittle, allowing fuel to leak into the booster in greater quantities.

What is very tragic about this example is that the data was there to suggest it all along - engineers had stated that the O-rings failed in low temperatures, but those in charge of the operation pressed forward with the launch date.

The recorded engineer data can be downloaded here: https://vincentarelbundock.github.io/Rdatasets/csv/openintro/orings.csv

And a short description is here: https://vincentarelbundock.github.io/Rdatasets/doc/openintro/orings.html

Read in the dataset into a dataframe called oring and display it.

Hide code cell source
# Your answer here
oring = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/openintro/orings.csv')
oring
rownames mission temperature damaged undamaged
0 1 1 53 5 1
1 2 2 57 1 5
2 3 3 58 1 5
3 4 4 63 1 5
4 5 5 66 0 6
5 6 6 67 0 6
6 7 7 67 0 6
7 8 8 67 0 6
8 9 9 68 0 6
9 10 10 69 0 6
10 11 11 70 1 5
11 12 12 70 0 6
12 13 13 70 1 5
13 14 14 70 0 6
14 15 15 72 0 6
15 16 16 73 0 6
16 17 17 75 0 6
17 18 18 75 1 5
18 19 19 76 0 6
19 20 20 76 0 6
20 21 21 78 0 6
21 22 22 79 0 6
22 23 23 81 0 6

There are two columns of interest here. The first is temperature, displaying the temperature on the day of the test flight, and damaged, which records the number (out of six) of the damaged O-rings. Before we fit a model, create a new column in the dataframe called any_damage which should have a 1 where there is 1 or more in the damaged column, and a zero elsewhere. You already know the function that will do this.

Hide code cell source
# Your answer here
oring['any_damage'] = np.digitize(oring['damaged'], [1])

Once you have done that, fit a simple probit model that predicts any_damage from temperature. Show the summary.

Hide code cell source
# Your answer here
oring_model = smf.probit('any_damage ~ temperature', data=oring).fit()
oring_model.summary()
Optimization terminated successfully.
         Current function value: 0.442994
         Iterations 6
Probit Regression Results
Dep. Variable: any_damage No. Observations: 23
Model: Probit Df Residuals: 21
Method: MLE Df Model: 1
Date: Fri, 01 Nov 2024 Pseudo R-squ.: 0.2791
Time: 00:54:54 Log-Likelihood: -10.189
converged: True LL-Null: -14.134
Covariance Type: nonrobust LLR p-value: 0.004972
coef std err z P>|z| [0.025 0.975]
Intercept 8.7750 4.029 2.178 0.029 0.879 16.671
temperature -0.1351 0.058 -2.314 0.021 -0.250 -0.021

Notice that this is a very small dataset, and the coefficient for temperature is not particularly fairly weak. Compute the phi coefficient for this model.

Hide code cell source
# Your answer here
# Get predictions for all datapoints
preds = me.predictions(oring_model)

# Binarise
binary = np.digitize(preds['estimate'], [.5])

# Correlate
pg.corr(binary, oring['any_damage'])
n r CI95% p-val BF10 power
pearson 23 0.693688 [0.39, 0.86] 0.000242 144.722 0.973117

Plot the predictions of the model, showing how probability of damage changes with temperature.

Hide code cell source
# Your answer here
# Get predictions
preds = me.predictions(oring_model)

sns.lineplot(data=preds,
             x='temperature', y='estimate')
<Axes: xlabel='temperature', ylabel='estimate'>
../_images/812a117d7ff9cc9f5089a89f38bab8602ffe0be4f9171a57ee3879a02c55c158.png

On the morning of the disaster, when the shuttle launched, the temperature was 36 degrees Fahrenheit.

What probability does the model give for any kind of O-ring damage for this event?

Hide code cell source
# Your answer here
me.predictions(oring_model, newdata=me.datagrid(oring_model, temperature=36))
shape: (1, 14)
temperaturerowidestimatestd_errorstatisticp_values_valueconf_lowconf_highrownamesmissiondamagedundamagedany_damage
i64i32f64f64f64f64f64f64f64i64i64i64i64i64
3600.9999540.0003682713.6166810.0inf0.9992321.00067647060

3. Marginally significant meta-scientific statistics#

Effects are either statistically significant, or they are not. Historically, researchers in psychology would refer to hypothesis tests that are just above the .05 threshold as ‘marginally significant’, a practice that is heavily frowned upon these days in light of the replication crisis.

In 2016, Pritschet, Powell, & Horne, 2016 carried out an analysis of just over 1,500 articles in three different areas of psychology (social, cognitive, and developmental), over the past four decades, and examined whether articles used the term ‘marginally significant’. This interesting dataset allows us to essentially examine a cultural trend in psychological science in terms of the language used.

For this exercise, we will fit a probit regression to this data and examine the probability of this phrase appearing over time and in different fields. First, you will need to do some data detective work - follow the link above to the paper, and then head to the OSF page linked in the paper. There, you’ll find an Excel file titled marginals psych science revision_corrections.xlsx. Download this and then place it in the same folder as this notebook. Read it in using pandas, which has a .read_excel function, and show the top 10 rows.

Hide code cell source
# Your answer here
marginals = pd.read_excel('marginals psych science revision_corrections.xlsx')

The first thing we need to do is rename some columns, as we can’t work easily with these column names.

Create a dictionary that has the keys ‘Number of Experiments’ and ‘Marginals Yes/No’ (the existing column names) and the values ‘n_exps’ and ‘marginal_yes’ (the new names). Call it newcols, then use the .rename method on your dataframe to alter the column names. Make sure to overwrite the dataframe so the names actually stick! You may need to refer to Week 1’s content to figure this out.

Hide code cell source
# Your answer here
newcols = {'Number of Experiments': 'n_exps', 'Marginals Yes/No': 'marginal_yes'}
marginals = marginals.rename(columns=newcols)

Next, we need to replace the values in the Field column so they are no longer numbers but represent the actual fields (this makes it easier for us to interpret the outcome, but it is not fully required). You can again make a dictionary that has the numbers 1, 2, and 3 to the values ‘Cognitive’, ‘Developmental’, and ‘Social’. Then you can use the .replace method to recode the values in the Field column. Again, reassign the dataframe so the changes stick!

Hide code cell source
# Your answer here
newfields = {1: 'Cognitive', 2: 'Developmental', 3: 'Social'}
marginals = marginals.replace({'Field': newfields})

Now filter the data so that the number of experiments is 1 or more (there are some papers that don’t have any statistics, e.g. commentaries). Again, reassign the dataframe. As a final step, just take the columns marginal_yes, Year, Field, and n_exps out to use.

Hide code cell source
# Your answer here
marginals = marginals.query('n_exps > 0')
marginals = marginals[['marginal_yes', 'Year', 'Field', 'n_exps']]

Now fit a probit regression to this data that predicts whether a paper contains the phrase ‘marginally significant’ from Year, Field, and number of experiments. This examines whether different fields use the phrase at different rates, whether it has changed over time, and whether papers with more studies are more likely to use it. Call it marginal_mod, and print the summary.

Hide code cell source
# Your answer here
marginal_mod = smf.probit('marginal_yes ~ Year + Field + n_exps', data=marginals).fit()
marginal_mod.summary()
Optimization terminated successfully.
         Current function value: 0.597290
         Iterations 5
Probit Regression Results
Dep. Variable: marginal_yes No. Observations: 1469
Model: Probit Df Residuals: 1464
Method: MLE Df Model: 4
Date: Fri, 01 Nov 2024 Pseudo R-squ.: 0.07556
Time: 00:54:54 Log-Likelihood: -877.42
converged: True LL-Null: -949.13
Covariance Type: nonrobust LLR p-value: 5.202e-30
coef std err z P>|z| [0.025 0.975]
Intercept -40.1238 5.205 -7.709 0.000 -50.325 -29.923
Field[T.Developmental] -0.0067 0.162 -0.041 0.967 -0.323 0.310
Field[T.Social] 0.3672 0.151 2.426 0.015 0.071 0.664
Year 0.0197 0.003 7.525 0.000 0.015 0.025
n_exps 0.1164 0.027 4.303 0.000 0.063 0.169

Generate a set of predictions for this data. This is a little tricky but numpy can help. You will need:

  • Each of the fields

  • Years from 1970 to 2010 (DO NOT TYPE EACH NUMBER OUT INDIVIDUALLY!)

  • The number of experiments, from 1 to 10 (these are what are observed in the data).

Create a datagrid and generate these predictions.

Hide code cell source
# Your answer here
dg = me.datagrid(marginal_mod,
                 Year=np.arange(1970, 2011),
                 n_exps=np.arange(1, 11),
                 Field=['Cognitive', 'Developmental', 'Social'])

Plot these predictions. It should have year on the X axis, the probability of the phrase on the Y, separate lines for the number of experiments, and separate plots for each field (use relplot). Can you add a reference line to the plot at .5?

Hide code cell source
# Your answer here
preds = me.predictions(marginal_mod, newdata=dg)

# Plot
sns.relplot(data=preds,
            x='Year', y='estimate',
            hue='n_exps', col='Field',
            kind='line').refline(y=.5)
<seaborn.axisgrid.FacetGrid at 0x151d4ef10>
../_images/3912501f8e3c73c68d810678cb43be9d2d2d0c670a26d44e057cf3f20444a065.png

Using your datagrid, or by developing one you might need, test the following hypotheses:

  1. Regardless of time or number of experiments, which field is most guilty of using the phrase?

Hide code cell source
# Your answer here
me.predictions(marginal_mod, 
               newdata=dg,
               by=['Field'])
shape: (3, 8)
Fieldestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Cognitive"0.428010.057577.4345881.0481e-1343.1173570.3151740.540845
"Developmental"0.4255790.0478178.9001350.0inf0.3318590.519299
"Social"0.5627630.03521515.9808380.0inf0.4937430.631782
  1. In 2010, what number of experiments in social psychology led to the highest probability of using the phrase? At which point was the highest value statistically significant from others (you can use the ‘sequential’ argument to hypothesis for this!)

Hide code cell source
# Your answer here
me.predictions(marginal_mod,
               newdata=me.datagrid(marginal_mod, 
                                   Year=2010, 
                                   Field='Social', 
                                   n_exps=np.arange(1, 11)),
               hypothesis='sequential')
shape: (9, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"Row 2 - Row 1"0.0461790.0108864.2422340.00002215.4635990.0248440.067515
"Row 3 - Row 2"0.0453370.0106244.2675110.0000215.6265750.0245150.066159
"Row 4 - Row 3"0.0439110.0099664.4061690.00001116.5363030.0243790.063444
"Row 5 - Row 4"0.0419590.0089664.6796070.00000318.408380.0243850.059533
"Row 6 - Row 5"0.0395540.0077025.1353532.8161e-721.7597730.0244580.054651
"Row 7 - Row 6"0.0367860.006275.8672744.4302e-927.7499910.0244980.049075
"Row 8 - Row 7"0.0337520.0047817.0592611.6740e-1239.1198430.0243810.043123
"Row 9 - Row 8"0.0305520.0033789.0444190.0inf0.0239310.037172
"Row 10 - Row 9"0.0272830.00228411.9443060.0inf0.0228060.03176
  1. For each field, what is the difference between the probability of use in 2010 and 2000, averaging over the number of experiments? You may need to run it a few times to compare each field.

Hide code cell source
# Your answer here
me.predictions(marginal_mod,
               newdata=me.datagrid(marginal_mod,
                                   Year=[2000, 2010],
                                   Field=['Cognitive', 'Developmental', 'Social'],
                                   n_exps=np.arange(1, 11)),
               by=['Year', 'Field'], # averages over experiments
               hypothesis='b1=b4' # e.g., for Cognitive, this is the comparison
              )
shape: (1, 8)
termestimatestd_errorstatisticp_values_valueconf_lowconf_high
strf64f64f64f64f64f64f64
"b1=b4"-0.0741310.01057-7.0131932.3295e-1238.643135-0.094849-0.053414
  1. In developmental psychology, what combination of year and number of experiments has a probability significantly grater than .6 of using the phrase?

This is a very tricky one. You will need to specify the right data grid, and rather than specifying a string to hypothesis, you can pass a float value. Then you will need to convert your output to pandas and filter it appropriately - this is difficult, but this is an extremely specific hypothesis.

Nevertheless, this is the kind of science you can do by leveraging linear models.

Hide code cell source
# Your answer here
# Get predictions
ps = me.predictions(marginal_mod,
                    newdata=me.datagrid(marginal_mod,
                                        Field='Developmental',
                                        Year=np.arange(1970, 2011),
                                        n_exps=np.arange(1, 11)),
                    hypothesis=.6)

# Now filter
ps.to_pandas().query('statistic > 0 and p_value < .05')
Field Year n_exps rowid estimate std_error statistic p_value s_value conf_low conf_high marginal_yes
389 Developmental 2008 10 389 0.750526 0.073478 2.048579 0.040503 4.625816 0.606511 0.894540 0
399 Developmental 2009 10 399 0.756750 0.072383 2.165564 0.030345 5.042420 0.614882 0.898619 0
409 Developmental 2010 10 409 0.762890 0.071286 2.285035 0.022311 5.486114 0.623173 0.902607 0
  1. Averaging across all years, for which combination of number of experiments and fields show a significantly-greater-than-.50 chance of using the phrase?

Hide code cell source
# Your answer here
# Get predictions
ps = me.predictions(marginal_mod,
                    newdata=me.datagrid(marginal_mod,
                                        Field='Social',
                                        Year=np.arange(1970, 2011),
                                        n_exps=np.arange(1, 11)),
                    by=['Field', 'n_exps'],
                    hypothesis=.5)

# Now filter
ps.to_pandas().query('statistic > 0 and p_value < .05')
Field n_exps estimate std_error statistic p_value s_value conf_low conf_high
5 Social 6 0.588244 0.042696 2.066821 0.038751 4.689621 0.504562 0.671927
6 Social 7 0.631692 0.050721 2.596420 0.009420 6.730044 0.532281 0.731102
7 Social 8 0.673518 0.057668 3.008925 0.002622 8.575258 0.560491 0.786545
8 Social 9 0.713270 0.063214 3.373783 0.000741 10.397408 0.589373 0.837167
9 Social 10 0.750570 0.067164 3.730709 0.000191 12.354578 0.618931 0.882210

Social psychology papers with 6 or more studies are, across time, most likely to use the phrase!