Rmd

Kontrastkoeffizienten sind Gewichte für Mittelwertunterschiede
Modellparameter * Ausprägung der Dummyvariablen für jeweils eine Beobachtung (ihre Faktorstufe)
mit ihrer Hilfe können wir spezielle Hypothesen für Mittelwertunterschiede prüfen
auch Trends, also Verläufe über die Faktor-Level bzw. Faktorstufen sind möglich
die Parameter werden bei der Modellanpassung an die Daten geschätzt und auf Sig
Ein einzelner Kontrast, also die Koeffizienten in einer Dummy-Variable, ist nicht an sich orthogonal oder nicht. Mehrere Kontraste können orthogonal zueinander sein. Sie sind es, wenn sie bestimmte Bedingungen erfüllen (vgl. Field et al 2012, chapter 10.4 p 414 ff). Orthogonalität betrifft also das Verhältnis von Kontrasten bzw. Dummy-Variablen zueinander.
Eine Kontrastmatrix ist orthogonal, wenn wir die Kontrastkoeffizienten pro Zeile miteinander multiplizieren, die Zeilenergebnisse aufsummieren und das dann 0 ergibt. (Helmert-Kontrast) (das Summenprodukt muss 0 sein)
Eine Kontrast-Koeffizient in einer Dummy-Variable ist verknüpft mit einer bestimmten Subkategorie (Faktor-Level). Eine 0 hier bewirkt, dass der Effekt (coefficient) auf die jeweilige Subkategorie nicht angewandt wird (Multiplikation mit 0 ergibt 0).
Wenn ein Faktor-Level in allen Dummy-Variablen eine 0 hat, geht in die Schätzung nur noch der Intercept ein. Dies ist dann die Referenzgruppe und der Intercept ist der Mittelwert der Reaktionsvariable in dieser Gruppe.
Kontrast-Koeffizienten werden mit den Effekten multipliziert um die Reaktionsvariable für eine spezielle Subgruppe zu berechnen.
Eine Dummy-Variable definiert einen Effekt. -1 und +1 für zwei Subgruppen prüft den Unterschied zwischen genau diesen beiden Gruppen auf Signifikanz. -2 für einen Level und +1 für zwei andere Level prüft den Unterschied zwischen dem Mittelwert einer Subgruppe und dem Mittelwert der Mittelwerte der beiden anderen Subgruppen auf Signifikanz.

Es gibt ein paar Standard-Kontraste, die car-Package definiert sind, darunter auch der oben verwendete car::contr.Sum() und car::contr.Helmert().

Vorteile und Prinzipien

Keine Alpha-Inflation - Gesamt-Risiko wird eingehalten. Keine multiplen Vergleiche (wie bei multiplen T-Tests). Daher auch keine Alpha Inflation, die korrigiert werden muss. Allerdings gibt es hierzu unterschiedliche Meinungen (Vortrag Daniel Lakens, Juni 2017).

Aufteilen der erkärten Varianz. Hypothesen vorher. Kontraste -> dummy-Variablen -> stat. Modell Kontraste drücken Hypothesen in Mittelwerten bzw. Mittelwertsunterschieden aus. Die aufgrund der Kontraste generierten Dummy-Variablen realisieren/sind das Modell.

Standard Kontraste.

Standard-Kontraste existieren sowohl im Base-System als beispielsweise auch im package(car). im package(car) beginnen die Typen (Namen) der Kontraste mit einem Großbuchstaben. contr.sum() vs. contr.Sum()

contr.treatment bzw. car::contr.Treatment() default in R, Referenzkodierung T. steht im Dummy-Kontrast-Namen
contr.SAS() wrapper für contr.treatment mit letztem Faktor-Level als Vergleichsgruppe
contr.helmert() bzw car::contr.Helmert() Außer der letzten wird jede Kategorie (Faktorstufe) mit dem mittleren Effekt der folgenden Faktorstufen verglichen
contr.sum() bzw car::contr.Sum() uses ‘sum to zero contrasts.’ Gives orthogonal contrasts where you compare every level to the overall mean.
contr.poly() gibt es nur im Base-System, nicht in library(car)

Design Matrix: Die Koeffizienten kann man sich für einen bestimmten Faktor ausgeben lassen. Intern wird eine entsprechende Menge an Dummy-Variablen gebildet.

Kontraste können als Eigenschaft(property) an eine Variable vom Typ factor() angehängt werden. Beispiel: im Data-Frame die vierstufige Variable V1 (Typ factor) mit einem referenzkodierten Kontrast versehen, dabei die Stufe 3 als Referenzkategorie wählen. contrasts(dataframe$V1) <- contr.Treatment(4, base=3) Danach aov() aufrufen.

Achtung in obigem Beispiel gibt es Bezüge auf die Reihenfolge der Faktorstufen. Diese Reihenfolge richtet sich nach der alphabetischen Reihenfolge der Faktorstufen-Bezeichnungen (levels). Die Reihenfolge kann man sich über den Befehl levels() ausgeben lassen. Für obigen Faktor V1 würde der Befehl lauten: levels(dataframe$V1). Transparenter wird das Ganze, wenn man beides miteinander verbindet: contrasts(dataframe$V1) <- contr.Treatment(levels(dataframe$V1), base=3). Hier wird der Bezug auf den dritten Level des factor V1 unmittelbar klar.

Kontraste werden in Faktorstufen -1 Dummy-Variablen umgesetzt, die man sich ausgeben lassen kann model.matrix(). Hierüber hat man eine Kontrollmöglichkeit über das Vorgehen der Berechnung (“Welche Faktorstufe ist Referenzgruppe?”).

Eigendefinierte Spezialkontraste können über den Weg der Speicherung als Eigenschaft des Faktors ebenfalls spezifiziert werden. contrasts(dataframe$V1) <- cbind(c(-1,0,-1,+2), c(-1,2,-2,0), c(-1,1,-1,1)). Kontrolliert werden kann der mit einem Faktor verknüpfte Kontrast über die Ausgabe der Variablen bzw. contrasts(FaktorVariablenName)

Je nach Art des Kontrastes müssen die Koeffizienten/Gewichte (coefficients) unterschiedlich interpretiert werden. Die varianzanalytischen aggregierten Tests sind identisch.

Beispiel STAI-State in einer neutralen Situation in verschiedenen Gruppen (Geschlecht, Angstgruppe)

Es werden hier verschiedene Arten von Kontrasten demonstriert, die nicht unbedingt alle inhaltlich Sinn machen.

-Default Einstellung ist contr.Treatment mit der ersten Faktorstufe als Referenzstufe. -Hier ist das die Gruppe mit dem Label ‘av’ (average student), die eher zufällig auch inhaltlich die adäquate Vergleichsgruppe ist.

# get data
# dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]

## Rows: 180 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): gender, group
## dbl (4): subj, neutr, exam1, exam2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dd$gender <- factor(dd$gender)
dd$group <- factor(dd$group)
# although existing in the base system in car package more transparent
require(car)

## Lade nötiges Paket: car
## Lade nötiges Paket: carData

# levels of factor group
levels(dd$group)

## [1] "av" "hi" "lo"

# show design matrices for some standard contrasts

# standard contrasts
contr.Treatment(levels(dd$group))

##    [T.hi] [T.lo]
## av      0      0
## hi      1      0
## lo      0      1

# refernce group is 'lo' group, default is first level (group 1)
contr.Treatment(levels(dd$group), base=3)

##    [T.av] [T.hi]
## av      1      0
## hi      0      1
## lo      0      0

# compare each group with last group, like SAS does/did
contr.SAS(levels(dd$group))

##    av hi
## av  1  0
## hi  0  1
## lo  0  0

# compare each group with mean effect of all subsequent categories, except last group
contr.Helmert(levels(dd$group))

##    [H.1] [H.2]
## av    -1    -1
## hi     1    -1
## lo     0     2

# contr.Sum effect coding
contr.Sum(levels(dd$group))

##    [S.av] [S.hi]
## av      1      0
## hi      0      1
## lo     -1     -1

# polynomial contrasts
contr.poly(levels(dd$group))

##                 .L         .Q
## [1,] -7.071068e-01  0.4082483
## [2,] -7.850462e-17 -0.8164966
## [3,]  7.071068e-01  0.4082483

# do an analysis with a special contrast
unique(model.matrix(aov(neutr ~ group, data=dd, contrasts = list(group = contr.Treatment))))

##     (Intercept) group[T.2] group[T.3]
## 1             1          0          0
## 61            1          0          1
## 121           1          1          0

coefficients(aov(neutr ~ group, data=dd, contrasts = list(group = contr.Treatment)))

## (Intercept)  group[T.2]  group[T.3] 
##    43.48333    11.03333    -9.95000

unique(model.matrix(aov(neutr ~ group, data=dd, contrasts = list(group = contr.Sum))))

##     (Intercept) group[S.1] group[S.2]
## 1             1          1          0
## 61            1         -1         -1
## 121           1          0          1

coefficients(aov(neutr ~ group, data=dd, contrasts = list(group = contr.Sum)))

## (Intercept)  group[S.1]  group[S.2] 
##  43.8444444  -0.3611111  10.6722222

unique(model.matrix(aov(neutr ~ group, data=dd, contrasts = list(group = contr.poly))))

##     (Intercept)       group.L    group.Q
## 1             1 -7.071068e-01  0.4082483
## 61            1  7.071068e-01  0.4082483
## 121           1 -7.850462e-17 -0.8164966

coefficients(aov(neutr ~ group, data=dd, contrasts = list(group = contr.poly)))

## (Intercept)     group.L     group.Q 
##   43.844444   -7.035712  -13.070749

unique(model.matrix(aov(neutr ~ group, data=dd)))

##     (Intercept) grouphi grouplo
## 1             1       0       0
## 61            1       0       1
## 121           1       1       0

coefficients(aov(neutr ~ group, data=dd))

## (Intercept)     grouphi     grouplo 
##    43.48333    11.03333    -9.95000

require(Rmisc)

## Lade nötiges Paket: Rmisc
## Lade nötiges Paket: lattice
## Lade nötiges Paket: plyr

Rmisc::summarySE(data=dd, "neutr")

##    .id   N    neutr       sd      se      ci
## 1 <NA> 180 43.84444 14.96694 1.11557 2.20136

Rmisc::summarySE(data=dd, "neutr", groupvars="group")

##   group  N    neutr       sd       se       ci
## 1    av 60 43.48333 12.34324 1.593506 3.188598
## 2    hi 60 54.51667 13.18897 1.702688 3.407072
## 3    lo 60 33.53333 11.36821 1.467629 2.936720

# you may save a specific contrast for a factor
contrasts(dd$group) <- contr.Treatment(3, base=2)
# and control the current settings
dd$group['contrasts']

## [1] <NA>
## attr(,"contrasts")
##    [T.1] [T.3]
## av     1     0
## hi     0     0
## lo     0     1
## Levels: av hi lo

# you may define your own contrasts also
# compare all groups to the lo goup
levels(dd$group)

## [1] "av" "hi" "lo"

c1 <- c(1, 1, -2)
c2 <- c(-1, 1, 0)
contrasts(dd$group) <- cbind(c1, c2)
# check the design matrix
dd$group

##   [1] av av av av av av av av av av av av av av av av av av av av av av av av av
##  [26] av av av av av av av av av av av av av av av av av av av av av av av av av
##  [51] av av av av av av av av av av lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo
##  [76] lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo
## [101] lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo lo hi hi hi hi hi
## [126] hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi
## [151] hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi hi
## [176] hi hi hi hi hi
## attr(,"contrasts")
##    c1 c2
## av  1 -1
## hi  1  1
## lo -2  0
## Levels: av hi lo

dd$group['contrasts']

## [1] <NA>
## attr(,"contrasts")
##    c1 c2
## av  1 -1
## hi  1  1
## lo -2  0
## Levels: av hi lo

dd$group[2][1]

## [1] av
## attr(,"contrasts")
##    c1 c2
## av  1 -1
## hi  1  1
## lo -2  0
## Levels: av hi lo

model.matrix(aov(neutr ~ group, data=dd))

##     (Intercept) groupc1 groupc2
## 1             1       1      -1
## 2             1       1      -1
## 3             1       1      -1
## 4             1       1      -1
## 5             1       1      -1
## 6             1       1      -1
## 7             1       1      -1
## 8             1       1      -1
## 9             1       1      -1
## 10            1       1      -1
## 11            1       1      -1
## 12            1       1      -1
## 13            1       1      -1
## 14            1       1      -1
## 15            1       1      -1
## 16            1       1      -1
## 17            1       1      -1
## 18            1       1      -1
## 19            1       1      -1
## 20            1       1      -1
## 21            1       1      -1
## 22            1       1      -1
## 23            1       1      -1
## 24            1       1      -1
## 25            1       1      -1
## 26            1       1      -1
## 27            1       1      -1
## 28            1       1      -1
## 29            1       1      -1
## 30            1       1      -1
## 31            1       1      -1
## 32            1       1      -1
## 33            1       1      -1
## 34            1       1      -1
## 35            1       1      -1
## 36            1       1      -1
## 37            1       1      -1
## 38            1       1      -1
## 39            1       1      -1
## 40            1       1      -1
## 41            1       1      -1
## 42            1       1      -1
## 43            1       1      -1
## 44            1       1      -1
## 45            1       1      -1
## 46            1       1      -1
## 47            1       1      -1
## 48            1       1      -1
## 49            1       1      -1
## 50            1       1      -1
## 51            1       1      -1
## 52            1       1      -1
## 53            1       1      -1
## 54            1       1      -1
## 55            1       1      -1
## 56            1       1      -1
## 57            1       1      -1
## 58            1       1      -1
## 59            1       1      -1
## 60            1       1      -1
## 61            1      -2       0
## 62            1      -2       0
## 63            1      -2       0
## 64            1      -2       0
## 65            1      -2       0
## 66            1      -2       0
## 67            1      -2       0
## 68            1      -2       0
## 69            1      -2       0
## 70            1      -2       0
## 71            1      -2       0
## 72            1      -2       0
## 73            1      -2       0
## 74            1      -2       0
## 75            1      -2       0
## 76            1      -2       0
## 77            1      -2       0
## 78            1      -2       0
## 79            1      -2       0
## 80            1      -2       0
## 81            1      -2       0
## 82            1      -2       0
## 83            1      -2       0
## 84            1      -2       0
## 85            1      -2       0
## 86            1      -2       0
## 87            1      -2       0
## 88            1      -2       0
## 89            1      -2       0
## 90            1      -2       0
## 91            1      -2       0
## 92            1      -2       0
## 93            1      -2       0
## 94            1      -2       0
## 95            1      -2       0
## 96            1      -2       0
## 97            1      -2       0
## 98            1      -2       0
## 99            1      -2       0
## 100           1      -2       0
## 101           1      -2       0
## 102           1      -2       0
## 103           1      -2       0
## 104           1      -2       0
## 105           1      -2       0
## 106           1      -2       0
## 107           1      -2       0
## 108           1      -2       0
## 109           1      -2       0
## 110           1      -2       0
## 111           1      -2       0
## 112           1      -2       0
## 113           1      -2       0
## 114           1      -2       0
## 115           1      -2       0
## 116           1      -2       0
## 117           1      -2       0
## 118           1      -2       0
## 119           1      -2       0
## 120           1      -2       0
## 121           1       1       1
## 122           1       1       1
## 123           1       1       1
## 124           1       1       1
## 125           1       1       1
## 126           1       1       1
## 127           1       1       1
## 128           1       1       1
## 129           1       1       1
## 130           1       1       1
## 131           1       1       1
## 132           1       1       1
## 133           1       1       1
## 134           1       1       1
## 135           1       1       1
## 136           1       1       1
## 137           1       1       1
## 138           1       1       1
## 139           1       1       1
## 140           1       1       1
## 141           1       1       1
## 142           1       1       1
## 143           1       1       1
## 144           1       1       1
## 145           1       1       1
## 146           1       1       1
## 147           1       1       1
## 148           1       1       1
## 149           1       1       1
## 150           1       1       1
## 151           1       1       1
## 152           1       1       1
## 153           1       1       1
## 154           1       1       1
## 155           1       1       1
## 156           1       1       1
## 157           1       1       1
## 158           1       1       1
## 159           1       1       1
## 160           1       1       1
## 161           1       1       1
## 162           1       1       1
## 163           1       1       1
## 164           1       1       1
## 165           1       1       1
## 166           1       1       1
## 167           1       1       1
## 168           1       1       1
## 169           1       1       1
## 170           1       1       1
## 171           1       1       1
## 172           1       1       1
## 173           1       1       1
## 174           1       1       1
## 175           1       1       1
## 176           1       1       1
## 177           1       1       1
## 178           1       1       1
## 179           1       1       1
## 180           1       1       1
## attr(,"assign")
## [1] 0 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
##    c1 c2
## av  1 -1
## hi  1  1
## lo -2  0

coefficients(aov(neutr ~ group, data=dd))

## (Intercept)     groupc1     groupc2 
##   43.844444    5.155556    5.516667

summary(aov(neutr ~ group, data=dd))

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## group         2  13221    6610   43.53 4.21e-16 ***
## Residuals   177  26877     152                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Aus Kontrasten resultierende Dummy-Variablen speichern/ausgeben

In R können die für ein Modell erzeugten Kontrastkoeffizienten bzw. Dummy-Variablen mit dem Befehl model.matrix() generiert werden. Die Design-Matrix (contr.()) zeigt das Prinzip hinter der Dummy-Codierung. Dadurch hat man Zugriff hierauf und kann sie beispielsweise zu einem Data-Frame hinzufügen oder speichern. Analyse der Dummy-Matrix bzw. die Kontrolle der contrast-settings eines factors sind eine sichere Möglichkeit, das berechnete Modell nachzuvollziehen.

# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]

m.anx <- aov(neutr ~ group, data=dd)

dd <- cbind(dd, model.matrix(neutr ~ group, data=dd))
head(dd)

##   subj gender group neutr (Intercept) grouphi grouplo
## 1    1      f    av    42           1       0       0
## 2    2      f    av    59           1       0       0
## 3    3      f    av    49           1       0       0
## 4    4      f    av    20           1       0       0
## 5    5      f    av    45           1       0       0
## 6    6      f    av    39           1       0       0

# generate effect coded contrast matrix
mm.e <- model.matrix(aov(neutr ~ group, data=dd, contrasts = list(group = contr.Sum)))

R has a “sub-language” to translate formulas into design matrix, and in the spirit of the language you can take advantage of it.

Voreingestellte Kontraste (default contrasts)

Kontraste können auch als Default gesetzt sein und verändert werden

# get data
#dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]

## Rows: 180 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): gender, group
## dbl (4): subj, neutr, exam1, exam2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dd$gender <- factor(dd$gender)
dd$group <- factor(dd$group)

default.options <- options('contrasts')
default.options

## $contrasts
##         unordered           ordered 
## "contr.treatment"      "contr.poly"

coefficients(aov(neutr ~ group, data=dd))

## (Intercept)     grouphi     grouplo 
##    43.48333    11.03333    -9.95000

# with changed default options the same call to aov() produces different results
options(contrasts=c('contr.sum','contr.poly'))
coefficients(aov(neutr ~ group, data=dd))

## (Intercept)      group1      group2 
##  43.8444444  -0.3611111  10.6722222

# reset to old default
options(contrasts=default.options[['contrasts']])

2x2 oder 1x4 Design mit Kontrasten

Im folgenden Beispiel wird ein eigentlich 2*2 varianzanalytisches Design auch über einen Faktor mit 4 Faktorstufen gerechnet, wobei mit Kontrasten gearbeitet wird. Bei diesem Beispiel werden auch spezifische, nicht orthogonale Kontraste angegeben.

dd <- read.delim(header=F, text="
1   a1  b1  a1b1    10
2   a1  b1  a1b1    11
3   a1  b1  a1b1    9
4   a1  b1  a1b1    12
5   a1  b1  a1b1    10
6   a1  b2  a1b2    15
7   a1  b2  a1b2    17
8   a1  b2  a1b2    13
9   a1  b2  a1b2    14
10  a1  b2  a1b2    16
11  a2  b1  a2b1    9
12  a2  b1  a2b1    8
13  a2  b1  a2b1    10
14  a2  b1  a2b1    11
15  a2  b1  a2b1    12
16  a2  b2  a2b2    5
17  a2  b2  a2b2    7
18  a2  b2  a2b2    4
19  a2  b2  a2b2    6
20  a2  b2  a2b2    4")
colnames(dd) <- c('subj', 'A', 'B', 'AB','y')
head(dd)

##   subj  A  B   AB  y
## 1    1 a1 b1 a1b1 10
## 2    2 a1 b1 a1b1 11
## 3    3 a1 b1 a1b1  9
## 4    4 a1 b1 a1b1 12
## 5    5 a1 b1 a1b1 10
## 6    6 a1 b2 a1b2 15

# we set type factor for A, B, and AB
dd$A  <- factor(dd$A)
dd$B  <- factor(dd$B)
dd$AB <- factor(dd$AB)

# graph it
require(ggplot2)

## Lade nötiges Paket: ggplot2

# create a plot object, define dataframe to use, add gender to differ in colour already in base plot
ggplot(dd, x=B, y=y, aes(B, y, fill = A)) +
  stat_summary(fun=mean, geom="bar", position=position_dodge()) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.9), width=0.2)

# the 'normal' 2*2 way, main effects only
summary(aov(y ~ A + B, data=dd))

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## A            1 130.05  130.05  15.520 0.00106 **
## B            1   0.05    0.05   0.006 0.93933   
## Residuals   17 142.45    8.38                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# main effect B doesn't seem to exist

# the 'normal' way 2*2 Design full model including interaction
summary(aov(y ~ A * B, data=dd))

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1 130.05  130.05  65.025 5.00e-07 ***
## B            1   0.05    0.05   0.025    0.876    
## A:B          1 110.45  110.45  55.225 1.42e-06 ***
## Residuals   16  32.00    2.00                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# we find a significant interaction


# alternatively put as a single factor design
# create a plot object, define dataframe to use, add gender to differ in colour already in base plot
ggplot(dd, x=AB, y=y, aes(AB, y)) +
  stat_summary(fun=mean, geom="bar", position=position_dodge()) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.9), width=0.2)

require(ez)

## Lade nötiges Paket: ez

# descriptives - the ez way
ezStats(data=dd, dv=.(y), wid=(subj), between=.(AB))

## Warning: Converting "subj" to factor for ANOVA.

## Coefficient covariances computed by hccm()

##     AB N Mean       SD     FLSD
## 1 a1b1 5 10.4 1.140175 1.896101
## 2 a1b2 5 15.0 1.581139 1.896101
## 3 a2b1 5 10.0 1.581139 1.896101
## 4 a2b2 5  5.2 1.303840 1.896101

# grand mean and more descriptives of y
require(psych)

## Lade nötiges Paket: psych

## 
## Attache Paket: 'psych'

## Die folgenden Objekte sind maskiert von 'package:ggplot2':
## 
##     %+%, alpha

## Das folgende Objekt ist maskiert 'package:car':
## 
##     logit

psych::describe(dd$y)

##    vars  n  mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 20 10.15 3.79     10   10.12 3.71   4  17    13 0.01       -1 0.85

require(car)
# standard contrasts may be called directly in aov()
# reference coding, reference group is 1 (a1b1), coefficients are differences to reference group
coefficients(aov(y ~ AB, data=dd, contrasts = list(AB = contr.Treatment)))

## (Intercept)     AB[T.2]     AB[T.3]     AB[T.4] 
##        10.4         4.6        -0.4        -5.2

# effect coding, coefficients are differences referred to grand mean
coefficients(aov(y ~ AB, data=dd, contrasts = list(AB = contr.Sum)))

## (Intercept)     AB[S.1]     AB[S.2]     AB[S.3] 
##       10.15        0.25        4.85       -0.15

# polynomial contrasts
coefficients(aov(y ~ AB, data=dd, contrasts = list(AB = contr.poly)))

## (Intercept)        AB.L        AB.Q        AB.C 
##   10.150000   -4.606300   -4.700000    2.191347

# we may look at the contrast coefficients
contr.treatment(levels(dd$AB), base=3)

##      a1b1 a1b2 a2b2
## a1b1    1    0    0
## a1b2    0    1    0
## a2b1    0    0    0
## a2b2    0    0    1

# contrast names in package(car) are a bit more verbose
contr.Treatment(levels(dd$AB), base=3)

##      [T.a1b1] [T.a1b2] [T.a2b2]
## a1b1        1        0        0
## a1b2        0        1        0
## a2b1        0        0        0
## a2b2        0        0        1

# another way to specify contrasts is to make them properties of factor variables
# specify a reference contrast, take group number 3 (a2b1) as reference group, coefficients are differences to reference group
contrasts(dd$AB) <- contr.treatment(4, base=3)
# or more verbose
contrasts(dd$AB) <- contr.Treatment(4, base=3)

coefficients(aov(y ~ AB, data=dd))

## (Intercept)     AB[T.1]     AB[T.2]     AB[T.4] 
##        10.0         0.4         5.0        -4.8

summary.lm(aov(y ~ AB, data=dd))

## 
## Call:
## aov(formula = y ~ AB, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -2.00  -1.05  -0.10   1.00   2.00 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.0000     0.6325  15.811 3.46e-11 ***
## AB[T.1]       0.4000     0.8944   0.447    0.661    
## AB[T.2]       5.0000     0.8944   5.590 4.06e-05 ***
## AB[T.4]      -4.8000     0.8944  -5.367 6.30e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.414 on 16 degrees of freedom
## Multiple R-squared:  0.8826, Adjusted R-squared:  0.8606 
## F-statistic: 40.09 on 3 and 16 DF,  p-value: 1.141e-07

# special contrasts work in a similar way, defining a contrasts property of a factor
# non orthogonal contrast
# both b1 (a1b1, a2b1) groups vs. group 4 (a2b2)
# both b1 (a1b1, a2b1) groups vs. group 3 (a1b2)
# both b1 (a1b1, a2b1) groups vs. both b2 groups (a1b2, a2b2)
contrasts(dd$AB) <- cbind(c(-1,0,-1,+2), c(-1,2,-2,0), c(-1,1,-1,1))
# check the design matrix
contrasts(dd$AB)

##      [,1] [,2] [,3]
## a1b1   -1   -1   -1
## a1b2    0    2    1
## a2b1   -1   -2   -1
## a2b2    2    0    1

# check the dummy variables
cbind(dd, model.matrix(aov(y ~ AB, data=dd)) )

##    subj  A  B   AB  y (Intercept) AB1 AB2 AB3
## 1     1 a1 b1 a1b1 10           1  -1  -1  -1
## 2     2 a1 b1 a1b1 11           1  -1  -1  -1
## 3     3 a1 b1 a1b1  9           1  -1  -1  -1
## 4     4 a1 b1 a1b1 12           1  -1  -1  -1
## 5     5 a1 b1 a1b1 10           1  -1  -1  -1
## 6     6 a1 b2 a1b2 15           1   0   2   1
## 7     7 a1 b2 a1b2 17           1   0   2   1
## 8     8 a1 b2 a1b2 13           1   0   2   1
## 9     9 a1 b2 a1b2 14           1   0   2   1
## 10   10 a1 b2 a1b2 16           1   0   2   1
## 11   11 a2 b1 a2b1  9           1  -1  -2  -1
## 12   12 a2 b1 a2b1  8           1  -1  -2  -1
## 13   13 a2 b1 a2b1 10           1  -1  -2  -1
## 14   14 a2 b1 a2b1 11           1  -1  -2  -1
## 15   15 a2 b1 a2b1 12           1  -1  -2  -1
## 16   16 a2 b2 a2b2  5           1   2   0   1
## 17   17 a2 b2 a2b2  7           1   2   0   1
## 18   18 a2 b2 a2b2  4           1   2   0   1
## 19   19 a2 b2 a2b2  6           1   2   0   1
## 20   20 a2 b2 a2b2  4           1   2   0   1

# contrasts specified express an effect contrast and coefficients express differences from grand mean
coefficients(aov(y ~ AB, data=dd))

## (Intercept)         AB1         AB2         AB3 
##       10.25       -4.50        0.40        3.95

summary.lm(aov(y ~ AB, data=dd))

## 
## Call:
## aov(formula = y ~ AB, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -2.00  -1.05  -0.10   1.00   2.00 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.2500     0.3873  26.465 1.23e-14 ***
## AB1          -4.5000     1.0000  -4.500 0.000364 ***
## AB2           0.4000     0.8944   0.447 0.660716    
## AB3           3.9500     2.0857   1.894 0.076466 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.414 on 16 degrees of freedom
## Multiple R-squared:  0.8826, Adjusted R-squared:  0.8606 
## F-statistic: 40.09 on 3 and 16 DF,  p-value: 1.141e-07

# check reference contrasts and compare first group (a1b1) with the other ones, non orthogonal contrasts
contrasts(dd$AB) <- cbind(c(0,1,0,0), c(0,0,1,0), c(0,0,0,1))
# check the dummy variables
cbind(dd, model.matrix(aov(y ~ AB, data=dd)) )

##    subj  A  B   AB  y (Intercept) AB1 AB2 AB3
## 1     1 a1 b1 a1b1 10           1   0   0   0
## 2     2 a1 b1 a1b1 11           1   0   0   0
## 3     3 a1 b1 a1b1  9           1   0   0   0
## 4     4 a1 b1 a1b1 12           1   0   0   0
## 5     5 a1 b1 a1b1 10           1   0   0   0
## 6     6 a1 b2 a1b2 15           1   1   0   0
## 7     7 a1 b2 a1b2 17           1   1   0   0
## 8     8 a1 b2 a1b2 13           1   1   0   0
## 9     9 a1 b2 a1b2 14           1   1   0   0
## 10   10 a1 b2 a1b2 16           1   1   0   0
## 11   11 a2 b1 a2b1  9           1   0   1   0
## 12   12 a2 b1 a2b1  8           1   0   1   0
## 13   13 a2 b1 a2b1 10           1   0   1   0
## 14   14 a2 b1 a2b1 11           1   0   1   0
## 15   15 a2 b1 a2b1 12           1   0   1   0
## 16   16 a2 b2 a2b2  5           1   0   0   1
## 17   17 a2 b2 a2b2  7           1   0   0   1
## 18   18 a2 b2 a2b2  4           1   0   0   1
## 19   19 a2 b2 a2b2  6           1   0   0   1
## 20   20 a2 b2 a2b2  4           1   0   0   1

# contrasts specified express an effect contrast and coefficients express differences from reference group (here a1b1)
coefficients(aov(y ~ AB, data=dd))

## (Intercept)         AB1         AB2         AB3 
##        10.4         4.6        -0.4        -5.2

summary.lm(aov(y ~ AB, data=dd))

## 
## Call:
## aov(formula = y ~ AB, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -2.00  -1.05  -0.10   1.00   2.00 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.4000     0.6325  16.444 1.91e-11 ***
## AB1           4.6000     0.8944   5.143 9.82e-05 ***
## AB2          -0.4000     0.8944  -0.447    0.661    
## AB3          -5.2000     0.8944  -5.814 2.64e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.414 on 16 degrees of freedom
## Multiple R-squared:  0.8826, Adjusted R-squared:  0.8606 
## F-statistic: 40.09 on 3 and 16 DF,  p-value: 1.141e-07

Planned Contrasts

We extend the sheet_anova a bit to understand.

subgroups defined by a factor variable ar ordered. contrast coefficients apply to the subgroups in this order. one contrast is one dummy variable. the dummy variables can be put into the regression formula to estimate the reaction variable for a specific subgroup (subgroup combination in factorial designs). a contrast vector applied to a factor variable using the command contrasts() is similar to a column of model.matrix() but ordered due to the factor levels.

Here we use a factor with 4 levels. \[\hat{y}_i = b_0 + b_1 \cdot D_1 + b_2 \cdot D_2 + b_3 \cdot D_3 + b_4 \cdot D_4\]

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ tibble  3.1.6     ✔ dplyr   1.0.8
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ✔ purrr   0.3.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ psych::%+%()       masks ggplot2::%+%()
## ✖ psych::alpha()     masks ggplot2::alpha()
## ✖ dplyr::arrange()   masks plyr::arrange()
## ✖ purrr::compact()   masks plyr::compact()
## ✖ dplyr::count()     masks plyr::count()
## ✖ dplyr::failwith()  masks plyr::failwith()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::id()        masks plyr::id()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ dplyr::mutate()    masks plyr::mutate()
## ✖ dplyr::recode()    masks car::recode()
## ✖ dplyr::rename()    masks plyr::rename()
## ✖ purrr::some()      masks car::some()
## ✖ dplyr::summarise() masks plyr::summarise()
## ✖ dplyr::summarize() masks plyr::summarize()

dd <- read_delim("https://pzezula.pages.gwdg.de/data/Superhero.dat", delim = "\t")

## Rows: 30 Columns: 2

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (2): hero, injury
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# we define a new variable hero.f with factorized categorial levels and label them also
dd$hero.f <- factor(dd$hero,
  levels = c(1:4), labels =
    c("Spiderman", "Superman", "Hulk", "Ninja Turtle")
)
# all default
m.0 <- lm(injury ~ hero.f, dd)
summary.lm(m.0)

## 
## Call:
## lm(formula = injury ~ hero.f, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.333  -8.250   1.688   7.562  24.667 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          41.625      4.577   9.095 1.47e-09 ***
## hero.fSuperman       18.708      6.991   2.676   0.0127 *  
## hero.fHulk           -6.250      6.472  -0.966   0.3431    
## hero.fNinja Turtle  -15.375      6.472  -2.376   0.0252 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.94 on 26 degrees of freedom
## Multiple R-squared:  0.4897, Adjusted R-squared:  0.4308 
## F-statistic: 8.317 on 3 and 26 DF,  p-value: 0.0004828

unique(model.matrix(m.0))

##    (Intercept) hero.fSuperman hero.fHulk hero.fNinja Turtle
## 1            1              0          0                  0
## 9            1              1          0                  0
## 15           1              0          1                  0
## 23           1              0          0                  1

mm <- m.0
g.l <- "Superman"
# mm["(Intercept)"] + 


# the contrast of task 9 of anova sheet, only one contrast defined, the other two are set automatically and orthogonal to the first one
contrasts(dd$hero.f) <- c(1, -2, 1, 0)
m.1 <- lm(injury ~ hero.f, dd)
unique(model.matrix(m.1))

##    (Intercept) hero.f1     hero.f2    hero.f3
## 1            1       1 -0.62659863 -0.4367007
## 9            1      -2  0.06329932 -0.2816497
## 15           1       1  0.75319726 -0.1265986
## 23           1       0 -0.18989795  0.8449490

# a contrast
contrasts(dd$hero.f) <- cbind(c(3,-1,-1,-1), c(0, 2,-1,-1), c(0, 0, 1, -1) )
m.2 <- lm(injury ~ hero.f, dd)
unique(model.matrix(m.2))

##    (Intercept) hero.f1 hero.f2 hero.f3
## 1            1       3       0       0
## 9            1      -1       2       0
## 15           1      -1      -1       1
## 23           1      -1      -1      -1

# we double the coefficients
contrasts(dd$hero.f) <- cbind(c(6,-2,-2,-2), c(0, 4,-2,-2), c(0, 0, 2, -2) )
m.3 <- lm(injury ~ hero.f, dd)
unique(model.matrix(m.3))

##    (Intercept) hero.f1 hero.f2 hero.f3
## 1            1       6       0       0
## 9            1      -2       4       0
## 15           1      -2      -2       2
## 23           1      -2      -2      -2

# simply doubling the contrast-coefficients divides the model coefficients by 2
m.2$coefficients

## (Intercept)     hero.f1     hero.f2     hero.f3 
##  40.8958333   0.2430556   9.8402778   4.5625000

m.3$coefficients

## (Intercept)     hero.f1     hero.f2     hero.f3 
##  40.8958333   0.1215278   4.9201389   2.2812500

# the significance tests stay the same
summary(m.2)

## 
## Call:
## lm(formula = injury ~ hero.f, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.333  -8.250   1.688   7.562  24.667 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  40.8958     2.3817  17.171 1.05e-15 ***
## hero.f1       0.2431     1.3394   0.181    0.857    
## hero.f2       9.8403     2.0656   4.764 6.27e-05 ***
## hero.f3       4.5625     3.2361   1.410    0.170    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.94 on 26 degrees of freedom
## Multiple R-squared:  0.4897, Adjusted R-squared:  0.4308 
## F-statistic: 8.317 on 3 and 26 DF,  p-value: 0.0004828

summary(m.3)

## 
## Call:
## lm(formula = injury ~ hero.f, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.333  -8.250   1.688   7.562  24.667 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  40.8958     2.3817  17.171 1.05e-15 ***
## hero.f1       0.1215     0.6697   0.181    0.857    
## hero.f2       4.9201     1.0328   4.764 6.27e-05 ***
## hero.f3       2.2812     1.6181   1.410    0.170    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.94 on 26 degrees of freedom
## Multiple R-squared:  0.4897, Adjusted R-squared:  0.4308 
## F-statistic: 8.317 on 3 and 26 DF,  p-value: 0.0004828

# a standard contrast of package car
contrasts(dd$hero.f) <- car::contr.Sum(levels(dd$hero.f))
m.4 <- lm(injury ~ hero.f, dd)
unique(model.matrix(m.4))

##    (Intercept) hero.f[S.Spiderman] hero.f[S.Superman] hero.f[S.Hulk]
## 1            1                   1                  0              0
## 9            1                   0                  1              0
## 15           1                   0                  0              1
## 23           1                  -1                 -1             -1

# orthogonal contrast Helmert
contrasts(dd$hero.f) <- car::contr.Helmert(levels(dd$hero.f))
m.5 <- lm(injury ~ hero.f, dd)
unique(model.matrix(m.5))

##    (Intercept) hero.f[H.1] hero.f[H.2] hero.f[H.3]
## 1            1          -1          -1          -1
## 9            1           1          -1          -1
## 15           1           0           2          -1
## 23           1           0           0           3

require(Rmisc)
Rmisc::summarySE(data=dd, "injury", groupvars="hero.f")

##         hero.f N   injury        sd       se        ci
## 1    Spiderman 8 41.62500 12.211675 4.317479 10.209216
## 2     Superman 6 60.33333 17.851237 7.287737 18.733724
## 3         Hulk 8 35.37500 13.383759 4.731873 11.189102
## 4 Ninja Turtle 8 26.25000  8.154753 2.883141  6.817544

Beispiele Übungen / Exercises

{}todo

Links und Referenzen

Package library(memisc): Convenience Methods for Setting Contrasts

Erklärung und Verstehbeispiel zu contr.sum, contr.helmert, contr.treatment link

Field (2012) Kapitel comparing means

Sammlung R und Anova, Einstiegspunkte [https://www.r-statistics.com/tag/repeated-measures-anova/]

check

we can apply contrasts to factors
various standard contrasts are predefined
if contrasts are orthogonal, they account for independent parts of variance
we can also define and use our own contrasts

Version: 28 Mai, 2022 16:26

Lineares Modell, kategoriale Prädiktoren, Kontraste (geplante Kontraste)

Rmd