Contrast coding using linear regression with hypr

Introduction
Then why use linear regression instead of ANOVA?
Linear models
Linear regression
Linear models: Simple regression example
Exercise 1
Linear models: Residuals
Linear models: Categorical differences
Linear models: Multiple regression example
Contrasts: Estimating categorical differences
Exercise 2
Assigning contrasts to factors
Contrasts and hypotheses
Contrasts and hypotheses: Water tank example
Matrix inverse
Hypothesis matrix
The statistical interpretation
Using hypr to verify the hypotheses
Using hypr to generate the contrast matrix
Common contrast coding schemes
Treatment contrasts
Sliding/successive difference contrasts
Helmert contrasts
Sum contrasts / Effects coding
Some rules…
Exercise 3: Using hypr
Exercise 4: Contrasts in action
Exercise 5: Thinking about your own hypotheses
Day 1 recap
What to expect today…
“Good contrasts”
- Centered contrasts
- Orthogonal contrasts
Interactions
Factor interactions
Example 6: Factor interactions
Covariates in hypr
Interactions with covariates
Example 7: Interactions with covariates
Thanks!

Introduction

The most common tools of experimental data analysis in the behavioral sciences are Analysis of variance (ANOVA), linear regression, or variants thereof.

Both are used to model the effects of a set of independent variables (or predictors) on a continuous dependent variable (such as a score, heart rate, income, response times, ratings, …). Basically, you want to explain and/or predict the DV as a function of the IV(s).

Let us consider an example dataset:

The results from an ANOVA may look like this:

summary(aov(Y~A*B, df))

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  15.84   15.84  13.179 0.000873 ***
## B            1  41.53   41.53  34.555 1.01e-06 ***
## A:B          1   0.08    0.08   0.063 0.803633    
## Residuals   36  43.27    1.20                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results can be reproduced exactly in a linear regression:

summary(lm(Y~A*B, df))

## 
## Call:
## lm(formula = Y ~ A * B, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4411 -0.6553 -0.2642  0.8219  2.3071 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.51170    0.17335  89.483  < 2e-16 ***
## A1          -0.62931    0.17335  -3.630 0.000873 ***
## B1          -1.01900    0.17335  -5.878 1.01e-06 ***
## A1:B1       -0.04342    0.17335  -0.250 0.803633    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.096 on 36 degrees of freedom
## Multiple R-squared:  0.5704, Adjusted R-squared:  0.5346 
## F-statistic: 15.93 on 3 and 36 DF,  p-value: 9.349e-07

This is because the $F(n,1)$-distribution is equivalent to the $t^2_{(n)}$ distribution. It is only that straightforward for 2-level IVs/factors as there is no direct equivalence for $F(n,m\neq1)$ in terms of a $t$-distribution.

Then why use linear regression instead of ANOVA?

In an ANOVA, for each IV you generally get an answer to the question whether any of the levels of that IV vary from the grand mean across all of the levels. You do not know how much or which levels exactly differ (with more than 2 levels). And you usually cannot include continuous IVs (covariates) into the model.

To do these things, you would usually have to conduct post-hoc tests after the ANOVA. Without proper correction, this inflates your statistical error and might appear as “questionable research practices” (QRPs) to some reviewers/peers.

In contrast, linear regression naturally …

allows covariates (continuous IVs)
returns effect sizes and p-values in a single model
tests specific a priori hypotheses
tells you where the differences are by “deconstructing” the omnibus F-test
fits a model that may be used to predict outcomes as a function of the IV(s)

Linear models

Linear models explain the relationship between the DV $Y$ and multiple IVs $X_1,X_2,\ldots,X_n$, such as \[Y=\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots+\beta_n X_n+\epsilon\;,\] where $\beta_0$ is the intercept and $\beta_1,\beta_2,\ldots,\beta_n$ are the slopes (of $X_1,X_2,\ldots,X_n$ on $Y$).

The intercept $\beta_0$ is equal to all values of $Y$ for which $x_1=x_2=\ldots=x_n=0$. The residual error is assumed to be normally distributed and zero-centered, $\epsilon\sim\mathcal{N}(0,\sigma^2)$.

There is no conceptual difference between so-called simple and multiple linear regressions, they just differ with respect to number of IVs (one vs. more). There are extensions of this framework to account for dependent and/or hierarchical data, such as linear mixed effects. Those are not within the scope of this workshop but the tools discussed here generally apply there as well.

Linear regression

Linear regression fits the regression weights or coefficients (intercept $\beta_0$ and slopes $\beta_1,\ldots,\beta_n$) of the linear model to the observation ($x_1,\ldots,x_n,y$). It can be used to infer point estimates or confidence intervals for the coefficients as well as p-values for assessing statistical significance.

Model assumptions:

Linearity: The relation between $X$ and $Y$ is linear
Independence: Observations (incl. $Y$ and $X$) are independent, i.e. not sequential, not from the same subject, etc. There are variants such as mixed-effects modeling to handle dependent/hierarchical data
Normality: The residuals ($Y-\hat Y$, not $Y$) should be normally distributed ($Y-\hat Y \sim \mathcal{N}(0,\sigma)$)
Homoscedasticity: The residual variance is equally distributed across the levels/values of $X$

Linear models: Simple regression example

To water my interior plants, I need approx. 1.5L of water every day from my rainwater tank. I know that during one hour of rain, it rises by 2.4L per hour. Therefore, if we want to predict the daily change of the rainwater tank level $Y$ as a function of hours of heavy rain on that day ($X_1$), we could expect the following linear relationship:

\[Y = -1.5 +2.4 X_1 + \epsilon\]

So on a day without any rain ($x_1=0$), we expect the change to be -1.5L, as it only decreases by the daily water use ($\beta_0$). With each hour of rain, however the water level rises by $\beta_1$, so the change on that day may even be positive if more water is deposited than withdrawn.

If we were to record the change in water level $Y$ after 0.0, 1.0, 0.5, 3.0, 1.3, 0.2, 4.2 hours of rain ($X_1$) on 7 different days, we would expect the following values for $Y$:

Day	$x_{1}$	$eta_0$	$eta_{1}x_{1}$	${Y}$
1	0.0	-1.5	0.00	-1.50
2	1.0	-1.5	2.40	0.90
3	0.5	-1.5	1.20	-0.30
4	3.0	-1.5	7.20	5.70
5	1.3	-1.5	3.12	1.62
6	0.2	-1.5	0.48	-1.02
7	4.2	-1.5	10.08	8.58

Knowing the true model, a linear regression should be able to recover the coefficients of the model by fitting it to the simulated observations ($X_1,Y$ from table above).

m_e1 <- lm(Y~1+X1, data = example1)
m_e1

## 
## Call:
## lm(formula = Y ~ 1 + X1, data = example1)
## 
## Coefficients:
## (Intercept)           X1  
##        -1.5          2.4

Indeed, the model returns the intercept $\hat\beta_0=-1.50$, which is the daily water use and the change of water level on days without rain ($X_1=0$), and slope $\hat\beta_1=2.40$, which is the increment of water level with each hour of rain on a given day.

Exercise 1

Head over to maxrabe.com/courses/emsscs. If you have not already done so, follow the instructions to set up an RStudio project for this workshop and install the required packages.

After setting up the RStudio project, download the exercises for day 1 from that same page.

In your R project, open exercise1.r by clicking on the file in the “File” panel on the right (or wherever you moved it). You will find all necessary commands to create the plot and fit the regression to the data.

Fit the model by executing all commands in the script. You should recover the “true” values from the beginning of the file.
Change the “true values” in line 15 (Y = ...). Does your regression recover those as well?

Linear models: Residuals

A linear model is rarely a perfect fit to real data. Our simulated observations from the last slide are an extreme exception: We know the “true” composition of $Y$ and did not simulate the residual error term $\epsilon$. If we do add a normally distributed error on top of the simulations, the observed $Y$ will slightly deviate from the predicted $\hat{Y}$:

Day	$x_{1}$	$eta_0$	$eta_{1}x_{1}$	$\epsilon$	${Y}$	$\hat{y}$
1	0.0	-1.5	0.00	-0.2920852	-1.7920852	-1.9449830
2	1.0	-1.5	2.40	-0.7347356	0.1652644	0.6947592
3	0.5	-1.5	1.20	0.3180176	0.0180176	-0.6251119
4	3.0	-1.5	7.20	0.8146147	6.5146147	5.9742437
5	1.3	-1.5	3.12	-0.0837434	1.5362566	1.4866819
6	0.2	-1.5	0.48	-0.9269293	-1.9469293	-1.4170346
7	4.2	-1.5	10.08	0.2353507	8.8153507	9.1419343

In the plot above, we see the observations as black dots and the true model (without error) as the dashed black line. The fitted model is blue. Not all dots are close to the fitted line, due to the residual variance. The difference between the fitted and observed values are visualized as red lines. Due to that residual variance, there is a variety of regression lines within 95% confidence that would be close enough to the observations. This region is visualized as the gray ribbon around the regression line.

## 
## Call:
## lm(formula = Y ~ 1 + X1, data = example1a)
## 
## Residuals:
##        1        2        3        4        5        6        7 
##  0.15290 -0.52949  0.64313  0.54037  0.04957 -0.52989 -0.32658 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.9450     0.2833  -6.866    0.001 ** 
## X1            2.6397     0.1377  19.169 7.12e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.529 on 5 degrees of freedom
## Multiple R-squared:  0.9866, Adjusted R-squared:  0.9839 
## F-statistic: 367.4 on 1 and 5 DF,  p-value: 7.124e-06

As we can see, the residual variance has a large impact on the statistical significance of the coefficients. A coefficient is significant if its confidence interval does not contain zero.

We can interpret these results so that both the baseline daily water decrease and the increase by hour of rain are significantly different from zero. If in reality, they were actually zero, an observation this or more extreme simply due to sampling error would be highly unlikely (see p-values).

Linear models: Categorical differences

The real world is not perfect and we do not know the truth. In reality,

we have random residuals ($\epsilon$) due to measurement error etc.
we may not really know the baseline daily water use ($\beta_0$) or water level increment by hour of rain ($\beta_1$), or any true parameter values
some plants do not need water every day but only on Sundays or so…

We can account for categorical differences in a very similar way! For most days of the week, nothing really changes and the regression model is

\[Y = \beta_0 + \beta_1 X_1 + \epsilon\;.\] However, the additional demand on Sundays requires an additional term $\beta_2$,

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 + \epsilon\;.\] Both regression models can be combined to a single model where $X_2=1$ for Sundays and $X_2=0$ for Mondays to Saturdays,

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\;.\]

Linear models: Multiple regression example

Over 21 days, we observed the water level change $Y$, hours of rain on that day ($X_1$), and whether that day was a Sunday ($X_2=1$) or not ($X_2=0$):

Day	$x_{1}$	$x_{2}$	${Y}$
1	0.23	0	-1.02
2	0.85	0	0.49
3	0.21	0	-0.16
4	0.14	0	-1.09
5	0.68	0	-0.80
6	1.03	0	0.91
7	1.16	1	-0.89
8	0.95	0	0.72
9	0.05	0	-0.98
10	0.37	0	-0.92
11	0.20	0	-0.50
12	0.47	0	-0.20
13	3.26	0	6.12
14	0.18	1	-2.80
15	0.25	0	-1.15
16	1.06	0	0.55
17	0.17	0	-1.69
18	0.16	0	-0.32
19	0.20	0	-1.23
20	0.53	0	0.38
21	0.24	1	-2.85

Plotting the data, we see that the Sunday observations deviate substantially from the regression line:

So we may want to fit another linear model to these observations, \[Y = eta_0 + eta_1 x_1 + eta_2 x_2 + \epsilon\] where we can infer $\hat\beta_0$ as the baseline water demand and the water level change on a day with no rain ($x_1=0$) that is not a Sunday ($x_2=0$). $\hat\beta_1$ is the increase by hour of rain, and $\hat\beta_2$ is the additional baseline water demand on Sundays.

Fitting the model from above to the observed data, we get

m_e2 <- lm(Y~1+X1+X2, data = example2)
summary(m_e2)

## 
## Call:
## lm(formula = Y ~ 1 + X1 + X2, data = example2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92682 -0.29182 -0.01401  0.26876  0.75600 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.3819     0.1367 -10.110 7.54e-09 ***
## X1            2.2188     0.1429  15.522 7.26e-12 ***
## X2           -1.9666     0.2816  -6.984 1.60e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4513 on 18 degrees of freedom
## Multiple R-squared:  0.9431, Adjusted R-squared:  0.9368 
## F-statistic: 149.1 on 2 and 18 DF,  p-value: 6.269e-12

According to the model, we have a daily baseline water use of −1.38 (95% CI: [−1.67, −1.09]). With each hour of rain, we gain 2.22 (95% CI: [1.92, 2.52]) but we need an additional −1.97 (95% CI: [−2.56, −1.37]) on Sundays.

Note that our observation is noisy. The residual has a standard deviation of ≈0.45, which can mean a lot of things, including unaccounted systematic variance or “true” unsystematic error.

Contrasts: Estimating categorical differences

The scheme used to map categorical values (Sunday vs. other day) to numerical values is called a contrast matrix. The factor Day may have two different values: Sunday or not Sunday.

In R, all “factors” (categorical variables) can be assigned a contrast matrix in case that factor is entered as an IV in a regression model. The factor IsSunday of the example2 dataset has the following contrast matrix:

contrasts(example2$IsSunday)

##     Yes
## No    0
## Yes   1

In our particular case, the contrast matrix has exactly two rows, one for each factor level (possible categorical values), and one column for the contrast:

\[ C_{2}=\left(\begin{array}{c} 0\\ 1 \end{array}\right) \]

Instead of “translating” the day of the week into values of 0’s and 1’s, we can use the factor example2$IsSunday with its values No vs. Yes. R will do the job of translating the categorical values into contrast codes (numerical values):

m_e2 <- lm(Y ~ 1 + X1 + IsSunday, data = example2)
summary(m_e2)

## 
## Call:
## lm(formula = Y ~ 1 + X1 + IsSunday, data = example2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92682 -0.29182 -0.01401  0.26876  0.75600 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.3819     0.1367 -10.110 7.54e-09 ***
## X1            2.2188     0.1429  15.522 7.26e-12 ***
## IsSundayYes  -1.9666     0.2816  -6.984 1.60e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4513 on 18 degrees of freedom
## Multiple R-squared:  0.9431, Adjusted R-squared:  0.9368 
## F-statistic: 149.1 on 2 and 18 DF,  p-value: 6.269e-12

Exercise 2

In your R project, open exercise2.r by clicking on the file in the “File” panel on the right (or wherever you moved it). You will find the instructions in the script.

Plot the daily water level change as a function of hours of rain and draw a simple regression line through the observations!
Fit that model to the new dataset. The model is the same as in exercise #1 but the variables/columns have more different names now!
In a second plot, add the residuals and color the observations by day of the week (IsSunday column)!
You should see that Sundays tend to be a substantive bit below the model prediction. In a second model, add IsSunday as a second IV!
Which model is a better fit (unsurprisingly)?

Assigning contrasts to factors

We saw that factors can be entered into the regression model and R “translates” the levels to numerical values. The contrasts(...) command may also be used to assign a different contrast matrix to a factor. For example:

contrasts(example2$IsSunday) <- matrix(c(-1, 1), ncol = 1)
contrasts(example2$IsSunday)

##     [,1]
## No    -1
## Yes    1

This leads to different estimates for $\hat\beta_2$ (slope on $X_2$/IsSunday) and for $\hat\beta_0$ (the intercept),

m_e2c <- lm(Change ~ 1 + HrsRain + IsSunday, data = example2)
summary(m_e2c)

## 
## Call:
## lm(formula = Change ~ 1 + HrsRain + IsSunday, data = example2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92682 -0.29182 -0.01401  0.26876  0.75600 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.3652     0.1621 -14.588 2.05e-11 ***
## HrsRain       2.2188     0.1429  15.522 7.26e-12 ***
## IsSunday1    -0.9833     0.1408  -6.984 1.60e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4513 on 18 degrees of freedom
## Multiple R-squared:  0.9431, Adjusted R-squared:  0.9368 
## F-statistic: 149.1 on 2 and 18 DF,  p-value: 6.269e-12

Contrasts and hypotheses

Remember that each IV (including contrasts) is associated with an estimate and a p-value. The smaller the p-value, the less likely it is that an estimate this or more extreme would occur just due to sampling error if the “true” value is zero. Therefore they are interpreted as significance tests against the null hypothesis $H_0$ and for the alternative $H_1$.

We saw that the estimates themselves, including the slopes on the contrasts and the intercept of the model, apparently depend on the selection of a contrast matrix. You have to understand how your contrasts affect your coefficients in order to interpret the estimates and statistical significance and to defend your results! Making no explicit decision is an implicit decision for the default setting, which may not always fit your research question!

The selection of a contrast matrix determines what differences/effects the coefficients represent and what statistical hypotheses they are testing.

Contrasts and hypotheses: Water tank example

Let us consider only the factor IsSunday of the rain water tank dataset. Ignoring any rainfall, we know that

\[ \begin{array}{rrcr} \mu_\textrm{Sunday}=&\beta_0&+&\beta_2 \\ \mu_\textrm{other}=&\beta_0&& \end{array}\;, \]

where $\mu_\textrm{Sunday}$ is the conditional mean of $Y$ on Sundays (i.e., the average fitted/predicted observation in that group) and $\mu_\textrm{other}$ on other days.

Therefore,

\[ \begin{array}{rrcr} \beta_0=&& &\mu_\textrm{other} \\ \beta_2=&\mu_\textrm{Sunday}&-&\mu_\textrm{other} \end{array} \]

The two different representations may also be written in matrix form:

\[ \left(\begin{array}{c} {\mu}_{{\rm Sunday}}\\ {\mu}_{{\rm other}} \end{array}\right)=\left(\begin{array}{rr} 1 \cdot \beta_0 & + 1 \cdot \beta_2\\ 1 \cdot \beta_0 & + 0 \end{array}\right)=\left(\begin{array}{rr} 1 & 1\\ 1 & 0 \end{array}\right)\left(\begin{array}{c} \beta_{0}\\ \beta_{2} \end{array}\right) \\ \left(\begin{array}{c} \beta_{0}\\ \beta_{2} \end{array}\right)=\left(\begin{array}{rr} 0 & +1 \cdot {\mu}_{{\rm other}}\\ 1 \cdot {\mu}_{{\rm Sunday}} & -1 \cdot {\mu}_{{\rm other}} \end{array}\right)=\left(\begin{array}{rr} 0 & 1\\ 1 & -1 \end{array}\right)\left(\begin{array}{c} {\mu}_{{\rm Sunday}}\\ {\mu}_{{\rm other}} \end{array}\right) \]

What may not seem obvious is that the two matrices are actually related and can be transformed back and forth as each other’s inverse matrix.

Matrix inverse

In general, a regression formula can be expressed in matrix notation

\[ Y = X \cdot B\;, \]

where $X$ is the contrast matrix, $B$ is the coefficient vector and $Y$ is the dependent variable. As $\hat Y$ is always represented as a weighted sum of means, $\hat Y$ can be thought of as a vector of conditional means.

Using the inverse of the contrast matrix, $X^{+}$, $B$ may be represented as a weighted sum/difference of conditional means:

\[ B=X^{+} \cdot Y \]

Note that the matrix inverse is reversible. So the inverse of the inverse of a matrix ideally returns that matrix again: $(X^{+})^{+}=X$.

For example, if a factor has $N$ levels, $\hat Y$ is a vector of $N$ elements and $X$ is a matrix of $N$ rows and up to $N$ columns (1 intercept column and $N-1$ contrasts), the matrix representation of the conditional means as a function of the contrasts and coefficients looks like this:

\[ \left(\begin{array}{c} {\mu}_{1}\\ {\mu}_{2}\\ \vdots\\ {\mu}_{N} \end{array}\right)=\left(\begin{array}{ccc} 1 & x_{1,2} & \ldots & x_{1,N}\\ 1 & x_{2,2} & \cdots & x_{2,N}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{N,2} & \ldots & x_{N,N} \end{array}\right)\cdot\left(\begin{array}{c} \beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{N-1} \end{array}\right)=\left(\begin{array}{ccc} \beta_{0} & \beta_{1} x_{1,2} & \ldots & \beta_{N-1} x_{1,N}\\ \beta_{0} & \beta_{1} x_{2,2} & \cdots & \beta_{N-1} x_{2,N}\\ \vdots & \vdots & \ddots & \vdots\\ \beta_{0} & \beta_{1} x_{N,2} & \ldots & \beta_{N-1} x_{N,N} \end{array}\right) \]

And the representation of the coefficients as a function of the conditional means and the inverse contrast matrix looks like this:

\[ \left(\begin{array}{c} \beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{N-1} \end{array}\right)=\left(\begin{array}{cccc} h_{1,1} & h_{1,2} & \cdots & h_{1,N}\\ h_{2,1} & h_{2,2} & \cdots & h_{2,N} \\ \vdots & \vdots & \ddots & \vdots\\ h_{N,1} & h_{N,2} & \cdots & h_{N,N} \end{array}\right)\cdot\left(\begin{array}{c} {\mu}_{1}\\ {\mu}_{2}\\ \vdots\\ {\mu}_{N} \end{array}\right)=\left(\begin{array}{cccc} {\mu}_{1} h_{1,1} & {\mu}_{2} h_{1,2} & \cdots & {\mu}_{N} h_{1,N}\\ {\mu}_{1} h_{2,1} & {\mu}_{2} h_{2,2} & \cdots & {\mu}_{N} h_{2,N} \\ \vdots & \vdots & \ddots & \vdots\\ {\mu}_{1} h_{N,1} & {\mu}_{2} h_{N,2} & \cdots & {\mu}_{N} h_{N,N} \end{array}\right) \]

Note that the matrix inverse is a more complex mathematical operation than the inverse of a single scalar value, even though it does simplify to that if the matrix is a single number.

Hypothesis matrix

Given that the coefficients may be represented like this, we can also call the inverse contrast matrix the hypothesis matrix.

For the rainwater tank example, the matrix representation of $Y$ is as follows:

\[ \left(\begin{array}{c} {\mu}_{{\rm Sunday}}\\ {\mu}_{{\rm other}} \end{array}\right)=\left(\begin{array}{cc} 1 & 1\\ 1 & 0 \end{array}\right)\left(\begin{array}{c} \beta_{0}\\ \beta_{2} \end{array}\right) \]

Changing that formula to a representation of the coefficients, we get:

\[ \left(\begin{array}{c} \beta_{0}\\ \beta_{2} \end{array}\right)=\left(\begin{array}{c} {\mu}_{{\rm Sunday}}\\ {\mu}_{{\rm other}} \end{array}\right)\left(\begin{array}{cc} 1 & 1\\ 1 & 0 \end{array}\right)^{+} \]

We can determine the so-called generalized inverse of $X$, which is the solution for $X^{+}$. If our contrast matrix $X$ is

X <- matrix(c(1, 1, 1, 0), ncol = 2) # values by column
X

##      [,1] [,2]
## [1,]    1    1
## [2,]    1    0

…, then the inverse of that matrix, $X^{+}$, can be found by

library(hypr)
ginv2(X)

##      [,1] [,2]
## [1,]  0    1  
## [2,]  1   -1

So, as the inverse of the given contrast matrix is

\[ \left(\begin{array}{cc} 1 & 1\\ 1 & 0 \end{array}\right)^{+}=\left(\begin{array}{rr} 0 & 1\\ 1 & -1 \end{array}\right) \;, \]

we get

\[ \left(\begin{array}{c} \beta_{0}\\ \beta_{2} \end{array}\right)=\left(\begin{array}{rr} 0 & 1\\ 1 & -1 \end{array}\right)\left(\begin{array}{c} {\mu}_{{\rm Sunday}}\\ {\mu}_{{\rm other}} \end{array}\right)=\left(\begin{array}{c} {\mu}_{{\rm other}}\\ {\mu}_{{\rm Sunday}}-{\mu}_{{\rm other}} \end{array}\right) \;. \]

Therefore, the estimate for the intercept $\beta_0$ is the mean daily water level change for all days other than Sunday. $\beta_2$ is the difference between the mean change on Sundays vs. the mean change on other days.

The statistical interpretation

The p-values for those estimated coefficients tell us whether those values differ significantly from zero. The contrast determines what difference the coefficients represent. So the contrast/hypothesis matrices determine what statistical hypotheses are tested!

For $\hat\beta_0$, the p-value tells us whether $\beta_0\neq0$, i.e. whether ${\mu}_{{\rm other}}\neq0$.

For $\hat\beta_2$, the p-value tells us whether $\beta_2\neq0$, i.e. whether ${\mu}_{{\rm Sunday}}-{\mu}_{{\rm other}}\neq0$, or equivalently whether ${\mu}_{{\rm Sunday}}\neq{\mu}_{{\rm other}}$.

Using hypr to verify the hypotheses

We can use hypr to check the hypotheses for a given contrast matrix:

library(hypr)
dimnames(X) <- list(c("Sundays","other"), c("intercept","slope"))
X

##         intercept slope
## Sundays         1     1
## other           1     0

h <- hypr(X)
h

## hypr object containing 2 null hypotheses:
## H0.intercept: 0 = other            (Intercept)
##     H0.slope: 0 = Sundays - other
## 
## Call:
## hypr(intercept = other ~ 0, slope = Sundays - other ~ 0, levels = c("Sundays", 
## "other"))
## 
## Hypothesis matrix (transposed):
##         intercept slope
## Sundays  0         1   
## other    1        -1   
## 
## Contrast matrix:
##         intercept slope
## Sundays 1         1    
## other   1         0

… or directly from the saved contrast matrix of a factor (adding an intercept column first):

h <- hypr(contrasts(example2$IsSunday), add_intercept = TRUE)
h

## hypr object containing 2 null hypotheses:
## H0.Intercept: 0 = No         (Intercept)
##         H0.2: 0 = -No + Yes
## 
## Call:
## hypr(Intercept = No ~ 0, `2` = -No + Yes ~ 0, levels = c("No", 
## "Yes"))
## 
## Hypothesis matrix (transposed):
##     Intercept 2 
## No   1        -1
## Yes  0         1
## 
## Contrast matrix:
##     Intercept 2
## No  1         0
## Yes 1         1

Using hypr to generate the contrast matrix

Likewise, hypr also accepts hypotheses and uses the matrix inverse to generate the corresponding contrast matrix:

h <- hypr(~ No, ~ Yes - No, levels = levels(example2$IsSunday))
h

## hypr object containing 2 null hypotheses:
## H0.1: 0 = No        (Intercept)
## H0.2: 0 = Yes - No
## 
## Call:
## hypr(No ~ 0, Yes - No ~ 0, levels = c("No", "Yes"))
## 
## Hypothesis matrix (transposed):
##     [,1] [,2]
## No   1   -1  
## Yes  0    1  
## 
## Contrast matrix:
##     [,1] [,2]
## No  1    0   
## Yes 1    1

contrasts(example2$IsSunday) <- contr.hypothesis(h)
contrasts(example2$IsSunday) <- h  # or just like this
contrasts(example2$IsSunday)

##     [,1]
## No     0
## Yes    1

Common contrast coding schemes

As we saw previously, contrasts may generally be assigned to factor like this:

contrasts(df$fac) <- some_matrix

Instead of specifying a contrast matrix by hand, most of the time, people will use built-in functions in R to generate common contrast matrices such as

contrasts(df$fac) <- contr.treatment(4)

The most common ones are:

Treatment contrasts (contr.treatment())
Successive/sliding differences (contr.sdif() from the MASS package)
Helmert contrasts (contr.helmert())
Sum contrasts (contr.sum())
Polynomial contrasts (contr.poly())

Remember that the contrast matrix determines what hypotheses are being tested. So what do they test? We can use hypr to investigate that link.

Treatment contrasts

Treatment contrasts compare the means of the “treatments” against the mean of the “baseline” condition. For a factor with 4 levels where $\mu_1$ is the baseline and $\mu_2, \ldots, \mu_4$ are the “treatments”, the contrast matrix can be generated as follows:

treatment_labels <- c("mu1","mu2","mu3","mu4")
contrast_matrix <- contr.treatment(treatment_labels)
contrast_matrix

##     mu2 mu3 mu4
## mu1   0   0   0
## mu2   1   0   0
## mu3   0   1   0
## mu4   0   0   1

Inspecting that matrix in hypr,

h <- hypr(contrast_matrix, add_intercept = TRUE)
h

## hypr object containing 4 null hypotheses:
## H0.Intercept: 0 = mu1         (Intercept)
##       H0.mu2: 0 = -mu1 + mu2
##       H0.mu3: 0 = -mu1 + mu3
##       H0.mu4: 0 = -mu1 + mu4
## 
## Call:
## hypr(Intercept = mu1 ~ 0, mu2 = -mu1 + mu2 ~ 0, mu3 = -mu1 + 
##     mu3 ~ 0, mu4 = -mu1 + mu4 ~ 0, levels = c("mu1", "mu2", "mu3", 
## "mu4"))
## 
## Hypothesis matrix (transposed):
##     Intercept mu2 mu3 mu4
## mu1  1        -1  -1  -1 
## mu2  0         1   0   0 
## mu3  0         0   1   0 
## mu4  0         0   0   1 
## 
## Contrast matrix:
##     Intercept mu2 mu3 mu4
## mu1 1         0   0   0  
## mu2 1         1   0   0  
## mu3 1         0   1   0  
## mu4 1         0   0   1

we see that the estimated coefficients and tested null hypotheses are:

\[ \begin{array}{rll} H_{0_{1}}: & \beta_{0}=\mu_{1} & =0\\ H_{0_{2}}: & \beta_{1}=\mu_{2}-\mu_{1} & =0\\ H_{0_{3}}: & \beta_{2}=\mu_{3}-\mu_{1} & =0\\ H_{0_{4}}: & \beta_{3}=\mu_{4}-\mu_{1} & =0 \end{array} \]

Sliding/successive difference contrasts

Sliding/successive difference contrasts work for comparing each new level with the previous one. If the levels of the factor are again $\mu_1,\ldots,\mu_4$, the contrast matrix may be generated like this in R:

library(MASS) # contr.sdif() is in the MASS package
step_labels <- c("mu1","mu2","mu3","mu4")
contrast_matrix <- contr.sdif(step_labels)
contrast_matrix

##     mu2-mu1 mu3-mu2 mu4-mu3
## mu1   -0.75    -0.5   -0.25
## mu2    0.25    -0.5   -0.25
## mu3    0.25     0.5   -0.25
## mu4    0.25     0.5    0.75

Inspecting that matrix in hypr,

h <- hypr(contrast_matrix, add_intercept = TRUE)
h

## hypr object containing 4 null hypotheses:
## H0.Intercept: 0 = (mu1 + mu2 + mu3 + mu4)/4  (Intercept)
##   H0.mu2-mu1: 0 = -mu1 + mu2
##   H0.mu3-mu2: 0 = -mu2 + mu3
##   H0.mu4-mu3: 0 = -mu3 + mu4
## 
## Call:
## hypr(Intercept = 1/4 * mu1 + 1/4 * mu2 + 1/4 * mu3 + 1/4 * mu4 ~ 
##     0, `mu2-mu1` = -mu1 + mu2 ~ 0, `mu3-mu2` = -mu2 + mu3 ~ 0, 
##     `mu4-mu3` = -mu3 + mu4 ~ 0, levels = c("mu1", "mu2", "mu3", 
##     "mu4"))
## 
## Hypothesis matrix (transposed):
##     Intercept mu2-mu1 mu3-mu2 mu4-mu3
## mu1 1/4        -1       0       0    
## mu2 1/4         1      -1       0    
## mu3 1/4         0       1      -1    
## mu4 1/4         0       0       1    
## 
## Contrast matrix:
##     Intercept mu2-mu1 mu3-mu2 mu4-mu3
## mu1    1      -3/4    -1/2    -1/4   
## mu2    1       1/4    -1/2    -1/4   
## mu3    1       1/4     1/2    -1/4   
## mu4    1       1/4     1/2     3/4

we see that the estimated coefficients and tested null hypotheses are:

\[ \begin{array}{rll} H_{0_{1}}: & \beta_{0}=\frac{\mu_{1}+\mu_{2}+\mu_{3}+\mu_{4}}{4} & =0\\ H_{0_{2}}: & \beta_{1}=\mu_2-\mu_1 & =0\\ H_{0_{3}}: & \beta_{2}=\mu_3-\mu_2 & =0\\ H_{0_{4}}: & \beta_{3}=\mu_4-\mu_3 & =0 \end{array} \]

Helmert contrasts

Helmert contrasts may be suitable for comparing each new level with the average of the previous ones. This is the contrast of choice for ANOVA-like regressions. If the levels of the factor are again $\mu_1,\ldots,\mu_4$, the contrast matrix may be generated like this in R:

step_labels <- c("mu1","mu2","mu3","mu4")
contrast_matrix <- contr.helmert(step_labels)
contrast_matrix

##     [,1] [,2] [,3]
## mu1   -1   -1   -1
## mu2    1   -1   -1
## mu3    0    2   -1
## mu4    0    0    3

Inspecting that matrix in hypr,

h <- hypr(contrast_matrix, add_intercept = TRUE)
h

## hypr object containing 4 null hypotheses:
## H0.1: 0 = (mu1 + mu2 + mu3 + mu4)/4      (Intercept)
## H0.2: 0 = (-mu1 + mu2)/2
## H0.3: 0 = (-mu1 - mu2 + 2*mu3)/6
## H0.4: 0 = (-mu1 - mu2 - mu3 + 3*mu4)/12
## 
## Call:
## hypr(1/4 * mu1 + 1/4 * mu2 + 1/4 * mu3 + 1/4 * mu4 ~ 0, -1/2 * 
##     mu1 + 1/2 * mu2 ~ 0, -1/6 * mu1 - 1/6 * mu2 + 1/3 * mu3 ~ 
##     0, -1/12 * mu1 - 1/12 * mu2 - 1/12 * mu3 + 1/4 * mu4 ~ 0, 
##     levels = c("mu1", "mu2", "mu3", "mu4"))
## 
## Hypothesis matrix (transposed):
##     [,1]  [,2]  [,3]  [,4] 
## mu1   1/4  -1/2  -1/6 -1/12
## mu2   1/4   1/2  -1/6 -1/12
## mu3   1/4     0   1/3 -1/12
## mu4   1/4     0     0   1/4
## 
## Contrast matrix:
##     [,1] [,2] [,3] [,4]
## mu1  1   -1   -1   -1  
## mu2  1    1   -1   -1  
## mu3  1    0    2   -1  
## mu4  1    0    0    3

we see that the estimated coefficients and tested null hypotheses are (rewritten):

\[ \begin{array}{rll} H_{0_{1}}: & \beta_{0}=\frac{\mu_{1}+\mu_{2}+\mu_{3}+\mu_{4}}{4} & =0\\ H_{0_{2}}: & \beta_{1}=\frac{1}{2}\left(\mu_{2}-\mu_{1}\right) & =0\\ H_{0_{3}}: & \beta_{2}=\frac{1}{3}\left(\mu_{3}-\frac{\mu_{1}+\mu_{2}}{2}\right) & =0\\ H_{0_{4}}: & \beta_{3}=\frac{1}{4}\left(\mu_{4}-\frac{\mu_{1}+\mu_{2}+\mu_{3}}{3}\right) & =0 \end{array} \]

Sum contrasts / Effects coding

Sum contrasts / effects coding are used to compare all but one level against the grand mean. If the levels of the factor are again $\mu_1,\ldots,\mu_4$, the contrast matrix may be generated like this in R:

condition_labels <- c("mu1","mu2","mu3","mu4")
contrast_matrix <- contr.sum(condition_labels)
contrast_matrix

##     [,1] [,2] [,3]
## mu1    1    0    0
## mu2    0    1    0
## mu3    0    0    1
## mu4   -1   -1   -1

Inspecting that matrix in hypr,

h <- hypr(contrast_matrix, add_intercept = TRUE)
h

## hypr object containing 4 null hypotheses:
## H0.1: 0 = (mu1 + mu2 + mu3 + mu4)/4     (Intercept)
## H0.2: 0 = (3*mu1 - mu2 - mu3 - mu4)/4
## H0.3: 0 = (-mu1 + 3*mu2 - mu3 - mu4)/4
## H0.4: 0 = (-mu1 - mu2 + 3*mu3 - mu4)/4
## 
## Call:
## hypr(1/4 * mu1 + 1/4 * mu2 + 1/4 * mu3 + 1/4 * mu4 ~ 0, 3/4 * 
##     mu1 - 1/4 * mu2 - 1/4 * mu3 - 1/4 * mu4 ~ 0, -1/4 * mu1 + 
##     3/4 * mu2 - 1/4 * mu3 - 1/4 * mu4 ~ 0, -1/4 * mu1 - 1/4 * 
##     mu2 + 3/4 * mu3 - 1/4 * mu4 ~ 0, levels = c("mu1", "mu2", 
## "mu3", "mu4"))
## 
## Hypothesis matrix (transposed):
##     [,1] [,2] [,3] [,4]
## mu1  1/4  3/4 -1/4 -1/4
## mu2  1/4 -1/4  3/4 -1/4
## mu3  1/4 -1/4 -1/4  3/4
## mu4  1/4 -1/4 -1/4 -1/4
## 
## Contrast matrix:
##     [,1] [,2] [,3] [,4]
## mu1  1    1    0    0  
## mu2  1    0    1    0  
## mu3  1    0    0    1  
## mu4  1   -1   -1   -1

we see that the estimated coefficients and tested null hypotheses are (rewritten):

\[ \begin{array}{rll} H_{0_{1}}: & \beta_{0}=\frac{\mu_{1}+\mu_{2}+\mu_{3}+\mu_{4}}{4} & =0\\ H_{0_{2}}: & \beta_{1}=\mu_{2}-\frac{\mu_{1}+\mu_{2}+\mu_{3}+\mu_{4}}{4} & =0\\ H_{0_{3}}: & \beta_{2}=\mu_{3}-\frac{\mu_{1}+\mu_{2}+\mu_{3}+\mu_{4}}{4} & =0\\ H_{0_{4}}: & \beta_{3}=\mu_{4}-\frac{\mu_{1}+\mu_{2}+\mu_{3}+\mu_{4}}{4} & =0 \end{array} \]

Some rules…

If a contrast matrix is linearly dependent or collinear, it becomes rank-deficient and cannot be used for linear regression because it would otherwise violate model assumptions.

Collinearity: If any contrasts are collinear, i.e. perfectly correlated, they cannot be independently estimateted. For example,

\[ \begin{pmatrix}\mu_{1}\\ \mu_{2}\\ \mu_{3}\\ \mu_{4} \end{pmatrix}=\begin{pmatrix}\begin{array}{rrrr} 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 1\\ 1 & 1 & 0 & 1\\ 1 & 0 & 1 & 0 \end{array}\end{pmatrix}\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \beta_{2}\\ \beta_{3} \end{pmatrix}=\begin{pmatrix}\begin{array}{rrrr} \beta_{0}\\ \beta_{0} & +\beta_{1} & & +\beta_{3}\\ \beta_{0} & +\beta_{1} & & +\beta_{3}\\ \beta_{0} & & +\beta_{2} \end{array}\end{pmatrix} \]

cannot be solved for $\beta_1$ and $\beta_3$ because any two values that satisfy $\beta_1+\beta_3=\mu_2-\beta_0=\mu_3-\beta_0$ would solve the equation. There is no single solution.

Independence: A contrast matrix for a $N$-level factor ($N$ rows) can only have up to $N-1$ contrasts (columns), excl. intercept. Otherwise, we would attempt to solve an equation with too many unknown variables. For example,

\[ \begin{pmatrix}\mu_{1}\\ \mu_{2}\\ \mu_{3}\\ \mu_{4} \end{pmatrix}=\begin{pmatrix}\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & -1\\ 1 & 0 & 1 & 0 & 1\\ 1 & 0 & 0 & 1 & 0 \end{array}\end{pmatrix}\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \beta_{2}\\ \beta_{3}\\ \beta_{4} \end{pmatrix}=\begin{pmatrix}\begin{array}{rrrrc} \beta_{0}\\ \beta_{0} & +\beta_{1} & & & -\beta_{4}\\ \beta_{0} & & +\beta_{2} & & +\beta_{4}\\ \beta_{0} & & & +\beta_{3} \end{array}\end{pmatrix} \]

has more columns (contrasts + intercept) than rows (factor levels). This is problematic because any three values for $(\beta_1,\beta_2,\beta_4)$ that satisfy $\beta_1-\beta_4=\mu_2-\mu_0$ and $\beta_2+\beta_4=\mu_3-\mu_0$ would solve the equation. There is no single solution.

R will also warn you when you try to enter a rank-deficient contrast matrix, in which case it will drop the linearly dependent columns.

hypr will already tell you if you’re trying to create a hypr object with rank-deficient hypotheses. For example:

hypr(~a, ~b-a, ~a-b)

## Warning in expr2hmat(parsed_hypotheses, levels = levels, order_levels =
## order_levels, : Your hypotheses are not linearly independent. The resulting
## hypothesis matrix was rank-deficient. Dropped hypothesis #3.

## hypr object containing 3 null hypotheses:
##   H0.1: 0 = a      (Intercept)
##   H0.2: 0 = b - a
## ( H0.3: 0 = a - b )
## Note: Hypotheses in parentheses are not linearly independent and thus dropped from the hypothesis and contrast matrices!
## 
## Call:
## hypr(a ~ 0, b - a ~ 0, a - b ~ 0, levels = c("a", "b"))
## 
## Hypothesis matrix (transposed):
##   [,1] [,2]
## a  1   -1  
## b  0    1  
## 
## Contrast matrix:
##   [,1] [,2]
## a 1    0   
## b 1    1

Exercise 3: Using hypr

Open exercise3.r to “play around” with hypr. For the discussed contrast coding schemes, create equivalent pairs of hypr objects from the known contrast matrix and from the theoretical hypotheses.

Exercise 4: Contrasts in action

R comes with a few interesting built-in datasets that are interesting to explore because everyone has them available without having to load them from a file. At first, we are going to look at PlantGrowth. The data are “results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.”

group	M	SD	SE	Q25	Q50	Q75	Min	Max
ctrl	5.032	0.5830914	0.1843897	4.5500	5.155	5.2925	4.17	6.11
trt1	4.661	0.7936757	0.2509823	4.2075	4.550	4.8700	3.59	6.03
trt2	5.526	0.4425733	0.1399540	5.2675	5.435	5.7350	4.92	6.31

Exercise 5: Thinking about your own hypotheses

In exercise 4, you used hypr to check the hypothesis you were testing with the default treatment contrast coding scheme. For this exercise, open exercise5.r and follow the instructions to generate a contrast matrix from your own hypotheses.

Instead of the original hypotheses, think of two alternative ones (e.g. compare one against the other and/or against the grand mean) you could have tested and let the intercept be added automatically.

Let’s talk about the results. Did you encounter any problems? Were the original hypotheses “useful” in the first place?

Day 1 recap

On day 1, we talked about

the concept of contrasts for categorical IVs (predictors) in linear regression
the link between contrast matrices and the according statistical hypotheses
using hypr to inspect contrast matrices, check the underlying hypotheses etc.
using hypr to generate our own contrasts from our a priori hypotheses

What to expect today…

Today, we will talk about

properties of “good contrasts”
inspecting interactions with other factors and covariates
using hypr for your own research
open questions / discussion

“Good contrasts”

We already learned that contrast matrices need to be “full rank”, i.e. they should not be linearly dependent or collinear. However, contrast matrices may also be centered and/or orthogonal. Both are useful properties, especially for multiple linear regressions.

Centered contrasts

Contrasts are centered if their weights sum up to 0 (excl. intercept), \[ \sum_{i,j} c_{i,j} = 0\;. \]

Centered contrasts always have the grand mean as the common intercept, which makes the interpretation of coefficients, especially interactions, more straightforward.

In hypr, you can check for centeredness like this:

h <- hypr(contr.helmert(4), add_intercept = TRUE)
all_centered(h)

## [1] TRUE

Or you can center non-centered contrasts:

h <- hypr(contr.treatment(4), add_intercept = TRUE)
all_centered(h)

## [1] FALSE

centered_contrasts(h)

## hypr object containing 4 null hypotheses:
## H0.Intercept: 0 = (X1 + X2 + X3 + X4)/4  (Intercept)
##         H0.2: 0 = -X1 + X2
##         H0.3: 0 = -X1 + X3
##         H0.4: 0 = -X1 + X4
## 
## Call:
## hypr(Intercept = 1/4 * X1 + 1/4 * X2 + 1/4 * X3 + 1/4 * X4 ~ 
##     0, `2` = -X1 + X2 ~ 0, `3` = -X1 + X3 ~ 0, `4` = -X1 + X4 ~ 
##     0, levels = c("X1", "X2", "X3", "X4"))
## 
## Hypothesis matrix (transposed):
##    Intercept 2   3   4  
## X1 1/4        -1  -1  -1
## X2 1/4         1   0   0
## X3 1/4         0   1   0
## X4 1/4         0   0   1
## 
## Contrast matrix:
##    Intercept 2    3    4   
## X1    1      -1/4 -1/4 -1/4
## X2    1       3/4 -1/4 -1/4
## X3    1      -1/4  3/4 -1/4
## X4    1      -1/4 -1/4  3/4

If your contrasts are not centered, they will have an effect on the intercept of the model and thereby possibly change the estimates and interpretation of all other coefficients of the regression model as well!

Note that the “default” contrast is a non-centered treatment contrast. So every factor added to your regression for which you don’t specify a different, centered contrast, necessarily changes your entire model!

Orthogonal contrasts

Two centered contrasts $a$ and $b$ are orthogonal if their product sum is 0, \[ \sum_i a_{i} b_{i} = 0\;. \]

Orthogonal contrasts are not correlated and measure completely different dimensions of the DV.
If a contrast matrix is centered and orthogonal, its generalized inverse (the hypothesis matrix) is identical to its transpose (rows and columns swapped).
Centeredness is equivalent to “orthogonality to the intercept”. If a contrast is centered dropping the intercept or the contrast does not change estimated coefficients of the other. If a contrast is not centered, the respective other changes!

Interactions

Assume a linear model

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \]

where values of $Y$ change with the values/levels of $X_1$ and $X_2$.

We know that

the intercept $\beta_0$ is the average observation if all IVs are zero (i.e., $x_1=x_2=0$)
the slope $\beta_1$ on $X_1$ is the change in $Y$ by $X_1$
the slope $\beta_2$ on $X_2$ is the change in $Y$ by $X_2$

It is sometimes neglected that

the intercept $\beta_0$ is the average observation if all IVs are zero (i.e., $x_1=x_2=0$)
the slope $\beta_1$ on $X_1$ is the change in $Y$ by $X_1$, when $X_2=0$
the slope $\beta_2$ on $X_2$ is the change in $Y$ by $X_2$, when $X_1=0$

To account for the fact that $\beta_1$ might differ for $x_2\neq0$ or that $\beta_2$ might differ for $x_1\neq0$, we can include interactions:

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon \]

However, as for the other terms, it is important to be aware of the tested hypotheses and the corresponding intercept to interpret the main slopes for covariates and their interactions with factors.

Factor interactions

Remember that categorical predictors may yield more than one contrast. For example, a $3\times2$ design would yield up to three main effects and two interaction terms in a regression. Therefore, for interactions it becomes more and more important to understand the tested hypotheses in order to understand how they will interact in the model.

For example, assume a $2\times2$ factorial design with factors A and B, both of which are assigned the default treatment contrast. It is clear what their regression slopes are testing:

hyp_a <- hypr(~A1, ~A2-A1)
hyp_a

## hypr object containing 2 null hypotheses:
## H0.1: 0 = A1       (Intercept)
## H0.2: 0 = A2 - A1
## 
## Call:
## hypr(A1 ~ 0, A2 - A1 ~ 0, levels = c("A1", "A2"))
## 
## Hypothesis matrix (transposed):
##    [,1] [,2]
## A1  1   -1  
## A2  0    1  
## 
## Contrast matrix:
##    [,1] [,2]
## A1 1    0   
## A2 1    1

hyp_b <- hypr(~B1, ~B2-B1)

But when they are both entered in the model, the estimates and hypotheses slightly change:

hyp_a * hyp_b

## hypr object containing 4 null hypotheses:
## H0.1: 0 = A1:B1                          (Intercept)
## H0.2: 0 = -A1:B1 + A2:B1
## H0.3: 0 = -A1:B1 + A1:B2
## H0.4: 0 = A1:B1 - A1:B2 - A2:B1 + A2:B2
## 
## Call:
## hypr(`A1:B1` ~ 0, -`A1:B1` + `A2:B1` ~ 0, -`A1:B1` + `A1:B2` ~ 
##     0, `A1:B1` - `A1:B2` - `A2:B1` + `A2:B2` ~ 0, levels = c("A1:B1", 
## "A1:B2", "A2:B1", "A2:B2"))
## 
## Hypothesis matrix (transposed):
##       [,1] [,2] [,3] [,4]
## A1:B1  1   -1   -1    1  
## A1:B2  0    0    1   -1  
## A2:B1  0    1    0   -1  
## A2:B2  0    0    0    1  
## 
## Contrast matrix:
##       [,1] [,2] [,3] [,4]
## A1:B1 1    0    0    0   
## A1:B2 1    0    1    0   
## A2:B1 1    1    0    0   
## A2:B2 1    1    1    1

The intercept $\hat\beta_0$ is now the conditional mean of $Y$ where A=A1 and B=B1.
The coefficient $\hat\beta_1$ for A will be the difference between A1 and A2 under the condition that B=B1 (the intercept/baseline for B).
The coefficient $\hat\beta_2$ for B will be the difference between B1 and B2 under the condition that A=A1 (the intercept/baseline for A).
The interaction $\hat\beta_3$ (or $\hat\beta_{1\times2}$) for A:B will be the difference of these differences. For example, it will test whether the mean of A2:B2 ($\mu_\textrm{A2,B2}$) is different from the sum of the mean A1:B1 ($\mu_\textrm{A1,B1}$) and the A effect (i.e., the difference between A1:B1 and A2:B1).

Example 6: Factor interactions

We are going to look at the built-in ToothGrowth dataset. The data are “length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs [after receiving] one of three dose levels of vitamin C by one of two delivery methods [orange juice or ascorbic acid].”

Open exercise6.r and inspect the dataset.
Think of hypotheses for each factor (dose and supp).
Check that they are testing what they are supposed to.
Look at their interaction. What is the full model testing?
Fit the model and interpret the results.

Covariates in hypr

Conceptionally, a covariate can be though of as continuous treatment contrast. But rather than either applying the treatment ($x=1$) or not ($x=0$), they continuously scale the effect.

In hypr, there is currently no way to account for covariates but the package is still useful to understand how contrasts can change the interpretation of covariates and their interactions.

For example, consider the following contrast:

hypr(~control, ~treatment-control)

## hypr object containing 2 null hypotheses:
## H0.1: 0 = control              (Intercept)
## H0.2: 0 = treatment - control
## 
## Call:
## hypr(control ~ 0, treatment - control ~ 0, levels = c("control", 
## "treatment"))
## 
## Hypothesis matrix (transposed):
##           [,1] [,2]
## control    1   -1  
## treatment  0    1  
## 
## Contrast matrix:
##           [,1] [,2]
## control   1    0   
## treatment 1    1

Applying this contrast to a factor in a linear regression such as \[ Y=\beta_0+\beta_1 X_1 +\epsilon \]

will estimate $\hat\beta_0$ as the conditional mean of the control condition (mean $Y$ when $x_1=0$) and $\hat\beta_1$ as the difference between the means of the treatment and control conditions.

Adding a covariate $X_2$ (such as age) to the regression, \[ Y=\beta_0+\beta_1 X_1+\beta_2 X_2 +\epsilon \]

will estimate $\hat\beta_0$ as the conditional mean of the control condition and age 0 (mean $Y$ when $x_1=x_2=0$). But what about $\hat\beta_1$ and $\hat\beta_2$? Slopes are assumed to be linearly independent and all share the same intercept. So they measure their respective effect under the assumption that the other variables are zero.

$\hat\beta_1$ will be the difference of treatment and control for age zero and $\hat\beta_2$ will be the age effect on $Y$ in the control condition (because that is what $x_1=0$ represents).

In this case, determining the interpretation of $x_1=0$ is straightforward but what it means for other contrasts, especially custom ones and interactions, is determined by the intercept term which can be seen in hypr.

Interactions with covariates

When two covariates interact, the interaction simply models their product. This is not always informative. It may be more helpful to know how the covariate changes with the levels of a factor.

Remember that the interpretation of the first-order slope on the covariate is determined by the intercept term of the model. The first-order slope on the covariate refers to the conditional mean modelled in the intercept of the regression! You can find the intercept using hypr by combining all categorical predictors in the same way you would enter them into the regression model.

An interaction of a covariate and factor models the hypothesized effect of the factor on the effect of the covariate on the dependent variable. So with the previous example and adding an interaction between $X_1$ (treatment) and $X_2$ (age), \[ Y=\beta_0+\beta_1 X_1+\beta_2 X_2+\beta_3 X_1 X_2 +\epsilon \]

$\hat\beta_3$ will now be the difference in the effect of age ($\hat\beta_1$) on $Y$ between the treatment and control condition (because that is what the slope $\hat\beta_2$ on contrast $X_2$ represents).

However, $\hat\beta_1$ will still be the difference of treatment and control for age zero and $\hat\beta_2$ will still be the age effect on $Y$ in the control condition (because that is what $x_1=0$ represents).

Example 7: Interactions with covariates

In a psychological experiment, people received one of two treatments or no treatment (control). The outcome variable is the response time to a target stimulus. Submit the data to a linear regression with response time as the DV as well as age and experimental condition as the IVs.

Open exercise7.r and inspect at the dataset.
Think of hypotheses for the factor (condition).
What will the interactions be testing?
Fit the model and interpret the results. Do your expectations match the results?
Remove the factor from the regression. Why do your coefficients change?
Think of a contrast which will not change the coefficients when added/removed.

Thanks!

Any questions or suggestions?

Materials, exercises, and slides will be uploaded to the Google Drive share and on my website (https://maxrabe.com/courses/emsscs).

Contrast coding using linear regression with hypr

Maximilian M. RabeExperimental and Biological PsychologyUniversity of Potsdam, Germany

2021-07-21/22

Introduction

Then why use linear regression instead of ANOVA?

Linear models

Linear regression

Linear models: Simple regression example

Exercise 1

Linear models: Residuals

Linear models: Categorical differences

Linear models: Multiple regression example

Contrasts: Estimating categorical differences

Exercise 2

Assigning contrasts to factors

Contrasts and hypotheses

Contrasts and hypotheses: Water tank example

Matrix inverse

Hypothesis matrix

The statistical interpretation

Using hypr to verify the hypotheses

Using hypr to generate the contrast matrix

Common contrast coding schemes

Treatment contrasts

Sliding/successive difference contrasts

Helmert contrasts

Sum contrasts / Effects coding

Some rules…

Exercise 3: Using hypr

Exercise 4: Contrasts in action

Exercise 5: Thinking about your own hypotheses

Day 1 recap

What to expect today…

“Good contrasts”

Centered contrasts

Orthogonal contrasts

Interactions

Factor interactions

Example 6: Factor interactions

Covariates in hypr

Interactions with covariates

Example 7: Interactions with covariates

Thanks!

Maximilian M. Rabe
Experimental and Biological Psychology
University of Potsdam, Germany