EMS-SCS 2021: Contrast coding workshop

Comparing groups that receive different treatments is at the core of statistical analysis in the social sciences and is often addressed with ANOVA. Although ANOVA is derived from the same statistical framework as linear regression, its hypothesis testing is relatively limited. Linear regression allows more specific hypothesis testing, but its specification and interpretation require some understanding of the concept of contrasts. In this hands-on workshop, we will look more closely at this concept and at common contrast coding schemes. The hypr package in R will aid our exploration of the link between statistical hypotheses and contrast codes. During the course, we will talk about different contrasts, how to validate the statistical hypotheses underlying a given contrast coding scheme, and how to derive contrasts for testing our own hypotheses in a linear regression.

Preparation

The workshop is largely based on the contrast coding tutorial by Daniel Schad et al. (Journal of Memory and Language, 110, 2020, Article no. 104038, https://doi.org/10.1016/j.jml.2019.104038). Skimming the tutorial before the course is recommended but not required.

Disclaimer: To complete the exercises in this workshop, you may have to download software, packages, documents, or example scripts. By opening or downloading any of these on your machine, you are responsible for the use or non-use. I do not accept any liability or responsibility for the safety, contents, utility, or any problems incurred by installing or using the linked software or services, in particular but not limited to those linked on external websites (outside the maxrabe.com domain).

Install R and RStudio: You should be familiar with the R programming language. To check your current version, type version in the R/RStudio console. The latest stable version should be 4.1.0 (Camp Pontanezen), but any version from 3.6.0 onward should work. To update or install R, download the most current version at https://cloud.r-project.org/. In the workshop, I will use RStudio as a graphical user interface (GUI) to R. You can check your installed version by clicking on Help > About RStudio in the menu bar. The most recent version should be 1.4.xx. To update or install the most recent version, go to https://www.rstudio.com/products/rstudio/download/. The exercises can be done without RStudio, but due to the limited time I will not provide support within the course for any interface other than RStudio.
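The R version check described above can also be scripted. The following is a minimal sketch using only base R; the 3.6.0 threshold is the one named above, and the variable names are purely illustrative:

```r
# Print the full version string of the running R installation
print(R.version.string)

# Compare the running version against the minimum named above (3.6.0)
minimum_version <- "3.6.0"
r_ok <- getRversion() >= minimum_version

if (r_ok) {
  message("R is recent enough for the workshop.")
} else {
  message("Please update R from https://cloud.r-project.org/ before the workshop.")
}
```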

Create a new project: Once R and RStudio are installed, please create a new project in RStudio. Please do not use an existing project or the general “Project: (None)” because we do not want to mess with your existing work! Create a project by clicking on File > New Project in the menu bar. In the popup window, click on New Directory > New Project. The name of the project and storage location are up to you but make sure you pick a name that will help you identify this project. If for some reason RStudio crashes/is closed or you need to work on something else in the meantime, you can always switch back to the project in the top right corner of the RStudio window. Before you confirm, you may want to consider enabling Use renv with this project. This will make sure that any package updates or installs during the course will only affect this course project, not your other work. If you enable Open in new session, the course project will not close your current RStudio session, in case you want to leave that open. After entering the project name, selecting the storage location, and considering additional options, click on Create Project.

Packages and materials: Install the latest versions of the packages used in the course and download the course materials. The “quick and dirty” way is to execute the setup script I have prepared for you. Just execute this line of code in the console of the RStudio project you just created:

 
# This will set up your packages and materials
source("https://maxrabe.com/documents/teaching/contrasts/setup.r")

Attention: Sourcing scripts from remote locations is a potential security risk! Alternatively, you can download the script from the URL above, review it, and execute it line by line in the RStudio console panel of the course project.

On some machines, you may see prompts in your console that ask you to choose between source and binary versions or to confirm that you want to install from source. I recommend that you always install from source where possible. If the install fails (e.g. because there is no proper compiler available), try again and install the binary (non-source) version.

The script will also download course materials and unzip them into the current working directory, which should be the project directory we just created. The last line just makes sure that your workspace is cleared before we start off. You can always do that yourself by clicking on the broom icon in the workspace panel.

Note: If you enabled renv during project creation, package updates and installs happen locally and affect only this project. All other projects keep their respective package versions. If you later decide that you want to use these packages in other projects (and have not installed them there before), you will need to install them there again. If you did not enable renv, you can use these packages in other projects right away, but keep in mind that package updates and installs will then also apply to your other projects. Changed your mind about renv? From the menu bar, go to Tools > Project Options > Environments and toggle the respective checkbox, then execute the setup script again (see above) to make sure all packages are installed and current.

Tidyverse: We are going to use tidyverse packages, in particular dplyr and ggplot2. If you have not installed those before the course, the setup script from above should have done that for you. You will also see a lot of “pipe operators” (%>%) throughout the examples. Note that a %>% b(param = TRUE) %>% c() is equivalent to c(b(a, param = TRUE)) but arguably more readable, especially in longer data processing chains.
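To make the pipe equivalence concrete, here is a small sketch. It assumes the magrittr package is installed (it provides %>% and comes along with the tidyverse); the vector x and the result names are made up for illustration:

```r
library(magrittr)  # provides the %>% pipe operator

x <- c(1.26, 2.34, 3.41)

# Piped form: round to one decimal place, then sum
piped <- x %>% round(digits = 1) %>% sum()

# Nested form of the same computation
nested <- sum(round(x, digits = 1))

# Both forms give the same result
print(identical(piped, nested))
```

The piped form reads left to right in the order in which the steps are applied, whereas the nested form has to be read from the inside out.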

Statistical significance: Different disciplines have different standards for determining statistical significance. In this course, we will use frequentist statistical inference and conclude significance (reject the null hypothesis) for p < .05.
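As a rough illustration of this decision rule, the sketch below fits a linear regression and compares each coefficient's p-value against .05. The PlantGrowth dataset (shipped with base R) is only a stand-in chosen for this example and is not necessarily part of the course materials:

```r
# PlantGrowth ships with base R: plant weights for a control and two treatments
fit <- lm(weight ~ group, data = PlantGrowth)

# Extract the p-value column from the coefficient table
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]

# Apply the significance criterion used in this course
alpha <- 0.05
significant <- p_values < alpha

print(round(p_values, 4))
print(significant)
```

Which hypotheses these p-values refer to depends on the contrast coding of the factor, which is the topic of the workshop.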

Slides

The slides for this workshop are continually updated and can be downloaded here.

Day 1: Contrasts in linear regression

We will talk about the concept of contrast coding in linear regression and common contrast schemes available in R. The hypr package in R will aid our exploration of the link between those contrasts and their respectively tested hypotheses.
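As a small preview, the following sketch shows two of the contrast schemes built into R and how they change the meaning of the regression intercept. It uses the warpbreaks dataset that ships with base R purely as an example; the object names are illustrative and the workshop materials may use different data:

```r
# Built-in contrast matrices for a three-level factor
print(contr.treatment(3))  # levels 2 and 3 each compared against level 1
print(contr.sum(3))        # levels 1 and 2 each compared against the grand mean

# warpbreaks ships with base R; tension has levels L, M, H (18 observations each)
dat <- warpbreaks

# Treatment contrasts (R's default for unordered factors)
contrasts(dat$tension) <- contr.treatment(3)
fit_trt <- lm(breaks ~ tension, data = dat)

# Sum contrasts
contrasts(dat$tension) <- contr.sum(3)
fit_sum <- lm(breaks ~ tension, data = dat)

# Compare the intercepts with the observed group means
group_means <- tapply(dat$breaks, dat$tension, mean)
print(group_means)
print(coef(fit_trt)[1])  # matches the mean of the baseline level L
print(coef(fit_sum)[1])  # matches the mean of the three group means
```

The data and the model fit are the same in both cases; only the hypotheses tested by the individual coefficients differ.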

To download the course materials for day 1, execute these lines in the RStudio workshop project (see preparation above) or download and unzip the linked file into the working directory of that project:

download.file("https://maxrabe.com/documents/teaching/contrasts/day1.zip", tmp <- tempfile())
unzip(tmp, overwrite = FALSE)
unlink(tmp)

Day 2: Implementing and verifying hypotheses

We will use hypr to verify the hypotheses tested by specific contrast coding schemes, and to implement contrasts based on our own experimental hypotheses.
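The sketch below hints at what this looks like in practice. It assumes the hypr package is installed and uses hypr(), cmat(), and contr.hypothesis() as I understand them from the package documentation; please check the documentation of your installed version. The level means mu1, mu2, and mu3 are hypothetical names for a three-level factor:

```r
library(hypr)

# Hypotheses for a three-level factor with level means mu1, mu2, mu3:
# the intercept tests mu1 = 0, the other two compare mu2 and mu3 against mu1
h <- hypr(mu1 ~ 0, mu2 ~ mu1, mu3 ~ mu1)
print(h)

# Contrast matrix implied by these hypotheses (including the intercept column)
print(cmat(h))

# Contrast matrix without the intercept column
print(contr.hypothesis(h))

# Base R treatment contrasts for comparison
print(contr.treatment(3))
```

Under these assumptions, the hypotheses above should correspond to treatment contrasts with the first level as the baseline.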

To download the course materials for day 2, execute these lines in the RStudio workshop project (see preparation above) or download and unzip the linked file into the working directory of that project:

download.file("https://maxrabe.com/documents/teaching/contrasts/day2.zip", tmp <- tempfile())
unzip(tmp, overwrite = FALSE)
unlink(tmp)