Rmd

Dargestellt werden Varianzanalytische Designs mit Focus auf das Allgemeine Lineare Modell. Es geht um den regressionsanalytischen Ansatz varianzanalytischer Designs.

Hier geht es um mehrfaktorielle varianzanalytische Designs (between subjects models).

Modelle mit Messwiederholung (repeated measures designs, within subjects designs) und gemischte Modelle (mixed models) werden in eigenen Units behandelt.

Beispiel STAI-State Angstgruppe und Geschlecht

Geschlecht und Gruppe: Gibt es Unterschiede hinsichtlich der State-Angst in neutraler Situation (Statistiktutorium)?

Vergleich mit Aufsplittung der Varianzkomponenten

Wir passen ein 2*3-faktorielles Design an. Die Gesamtvarianz des Null-Modells (Residuen) wird durch die Faktoren zunehmend reduziert. Dies sehen wir am besten, wenn wir uns das dahinter liegende Dummy-Modell in seinen Auswirkungen ansehen. Die Quadratsummenanteile der Einzel-Prädiktoren (Dummy-Variablen) addieren sich zu den aggregierten Quadratsummen der Summary-Anova-Tabellen.

# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]

# # descriptive data for current design
# require(psych)
# # global descriptives
# psych::describe(dd)
# # group specific descriptives
# # gender and group
# by(dd$neutr, list(dd$gender           ), psych::describe)
# by(dd$neutr, list(            dd$group), psych::describe)
# # combination of groups
# by(dd$neutr, list(dd$gender, dd$group), psych::describe)
# # or using stat.desc() of library(pastecs)
# require(pastecs)
# by(dd$neutr, list(dd$gender, dd$group), pastecs::stat.desc)

# descriptives using summarySE
require(Rmisc)

## Lade nötiges Paket: Rmisc

## Lade nötiges Paket: lattice

## Lade nötiges Paket: plyr

Rmisc::summarySE(data=dd, "neutr")

##    .id   N    neutr       sd      se      ci
## 1 <NA> 180 43.84444 14.96694 1.11557 2.20136

Rmisc::summarySE(data=dd, "neutr", groupvars="group")

##   group  N    neutr       sd       se       ci
## 1    av 60 43.48333 12.34324 1.593506 3.188598
## 2    hi 60 54.51667 13.18897 1.702688 3.407072
## 3    lo 60 33.53333 11.36821 1.467629 2.936720

Rmisc::summarySE(data=dd, "neutr", groupvars="gender")

##   gender  N    neutr       sd       se       ci
## 1      f 90 47.47778 16.97005 1.788800 3.554308
## 2      m 90 40.21111 11.65563 1.228612 2.441225

Rmisc::summarySE(data=dd, "neutr", groupvars=c("group", "gender"))

##   group gender  N    neutr       sd       se       ci
## 1    av      f 30 48.10000 12.64461 2.308580 4.721576
## 2    av      m 30 38.86667 10.29474 1.879553 3.844118
## 3    hi      f 30 61.16667 12.60565 2.301465 4.707025
## 4    hi      m 30 47.86667 10.17344 1.857407 3.798824
## 5    lo      f 30 33.16667 12.60565 2.301465 4.707025
## 6    lo      m 30 33.90000 10.18569 1.859644 3.803399

# we adapt a linear model
m.anx <- lm(neutr ~ gender + group, data=dd)
m.anx <- lm(neutr ~ gender*group, data=dd)
# model summary gives us model test and parameter tests
summary(m.anx)

## 
## Call:
## lm(formula = neutr ~ gender * group, data = dd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.1667  -6.9000   0.4833   6.8500  27.9000 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       48.100      2.096  22.946  < 2e-16 ***
## genderm           -9.233      2.964  -3.115  0.00215 ** 
## grouphi           13.067      2.964   4.408 1.82e-05 ***
## grouplo          -14.933      2.964  -5.037 1.17e-06 ***
## genderm:grouphi   -4.067      4.192  -0.970  0.33339    
## genderm:grouplo    9.967      4.192   2.377  0.01852 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.48 on 174 degrees of freedom
## Multiple R-squared:  0.428,  Adjusted R-squared:  0.4115 
## F-statistic: 26.04 on 5 and 174 DF,  p-value: < 2.2e-16

# we can get the overall sums of squares SS via anova()
anova(m.anx)

## Analysis of Variance Table
## 
## Response: neutr
##               Df  Sum Sq Mean Sq F value    Pr(>F)    
## gender         1  2376.2  2376.2 18.0261  3.54e-05 ***
## group          2 13220.7  6610.4 50.1470 < 2.2e-16 ***
## gender:group   2  1564.0   782.0  5.9325  0.003218 ** 
## Residuals    174 22936.7   131.8                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ... or via aov
# aov(neutr ~ gender*group, data=dd)
summary(aov(m.anx))

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## gender         1   2376    2376  18.026 3.54e-05 ***
## group          2  13221    6610  50.147  < 2e-16 ***
## gender:group   2   1564     782   5.932  0.00322 ** 
## Residuals    174  22937     132                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

unique(model.matrix(aov(m.anx)))

##     (Intercept) genderm grouphi grouplo genderm:grouphi genderm:grouplo
## 1             1       0       0       0               0               0
## 31            1       1       0       0               0               0
## 61            1       0       0       1               0               0
## 91            1       1       0       1               0               1
## 121           1       0       1       0               0               0
## 151           1       1       1       0               1               0

# if we want the SS of every single dummy variable, we can do an explicit dummy variable model
# so we add dummy variables to the data frame
dd.t <- cbind(dd, model.matrix(m.anx))
# and adapt the above model using the dummy variables
m.anx.d <- lm(neutr ~ genderm + grouphi + grouplo + genderm:grouphi + genderm:grouplo, data=dd.t)
# SS of the above anova table is the sum of dummy SS of the effect in question
anova(m.anx.d)

## Analysis of Variance Table
## 
## Response: neutr
##                  Df  Sum Sq Mean Sq F value    Pr(>F)    
## genderm           1  2376.2  2376.2 18.0261 3.540e-05 ***
## grouphi           1 10250.7 10250.7 77.7627 1.181e-15 ***
## grouplo           1  2970.1  2970.1 22.5313 4.298e-06 ***
## genderm:grouphi   1   819.0   819.0  6.2132   0.01362 *  
## genderm:grouplo   1   745.0   745.0  5.6517   0.01852 *  
## Residuals       174 22936.7   131.8                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# package(car) is required. It has 'Anova()'. The capital 'A' is significant!
require(car)

## Lade nötiges Paket: car

## Lade nötiges Paket: carData

# model test SS type II is default for car::Anova
Anova(m.anx)

## Anova Table (Type II tests)
## 
## Response: neutr
##               Sum Sq  Df F value    Pr(>F)    
## gender        2376.2   1 18.0261  3.54e-05 ***
## group        13220.7   2 50.1470 < 2.2e-16 ***
## gender:group  1564.0   2  5.9325  0.003218 ** 
## Residuals    22936.7 174                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# model test SS type III, only unique contribution of variance is used, Sum Sq of the effects is lowered for all effects but highest interaction
Anova(m.anx, type=3)

## Anova Table (Type III tests)
## 
## Response: neutr
##              Sum Sq  Df  F value    Pr(>F)    
## (Intercept)   69408   1 526.5388 < 2.2e-16 ***
## gender         1279   1   9.7012  0.002154 ** 
## group         11777   2  44.6724 < 2.2e-16 ***
## gender:group   1564   2   5.9325  0.003218 ** 
## Residuals     22937 174                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# parameters in detail
summary(m.anx)

## 
## Call:
## lm(formula = neutr ~ gender * group, data = dd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.1667  -6.9000   0.4833   6.8500  27.9000 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       48.100      2.096  22.946  < 2e-16 ***
## genderm           -9.233      2.964  -3.115  0.00215 ** 
## grouphi           13.067      2.964   4.408 1.82e-05 ***
## grouplo          -14.933      2.964  -5.037 1.17e-06 ***
## genderm:grouphi   -4.067      4.192  -0.970  0.33339    
## genderm:grouplo    9.967      4.192   2.377  0.01852 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.48 on 174 degrees of freedom
## Multiple R-squared:  0.428,  Adjusted R-squared:  0.4115 
## F-statistic: 26.04 on 5 and 174 DF,  p-value: < 2.2e-16

# SS sum up to zero model residuals
m.0 <- lm(neutr ~ 1, data=dd)
# zero model SS
summary(aov(m.0))

##              Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   179  40098     224

m.a <- anova(m.anx.d)
# we can access the SS 
m.a$`Sum Sq`

## [1]  2376.2000 10250.6694  2970.0750   819.0250   745.0083 22936.6667

# and get the sum
sum(m.a$`Sum Sq`)

## [1] 40097.64

# we can use coefficients to get at group means, that are best estimates for response variable.
# women of group 'av'
m.anx$coefficients['(Intercept)']

## (Intercept) 
##        48.1

# men of group 'hi'
m.anx$coefficients['(Intercept)']  + m.anx$coefficients['genderm'] + m.anx$coefficients['grouphi'] + m.anx$coefficients['genderm:grouphi']

## (Intercept) 
##    47.86667

# women of group 'lo'
m.anx$coefficients['(Intercept)']  + m.anx$coefficients['grouplo']

## (Intercept) 
##    33.16667

# men of group 'lo'
m.anx$coefficients['(Intercept)']  + m.anx$coefficients['genderm'] + m.anx$coefficients['grouplo'] + m.anx$coefficients['genderm:grouplo']

## (Intercept) 
##        33.9

# descriptives (as Rcmdr does it)
# means
tapply(dd$neutr, list(gender=dd$gender, group=dd$group), mean, na.rm=TRUE)

##       group
## gender       av       hi       lo
##      f 48.10000 61.16667 33.16667
##      m 38.86667 47.86667 33.90000

# std. deviations
# tapply(dd$neutr, list(gender=dd$gender, group=dd$group), sd, na.rm=TRUE)
# counts
# tapply(dd$neutr, list(gender=dd$gender, group=dd$group), function(x) sum(!is.na(x)))

lm() passt ein ANOVA-Modell an, wenn die Prädiktoren (Vorhersagevariablen) vom Typ factor sind. Die hierfür ev. notwendige dummy-Kodierung wird im Hintergrund ausgeführt. Default ist dabei eine Referenz-Kodierung. Die Referenzgruppe ist dabei die Faktorstufe, deren Name im Alphabet vorne liegt. Beachte: Faktorstufen eines Faktors haben eine Ordnung. Default ist alphabetisch nach dem Faktor-Level. vgl. transformation.html

In unserem Beispiel wird die Ausprägung ‘f’ (‘f’ < ‘m’) des Faktors gender in Kombination mit der Ausprägung ‘av’ (‘av’ < ‘hi’ < ‘lo’) des Faktors group zur Referenzgruppe.

Interaktion und ihre grafische Darstellung

Wir generieren eine graphische Darstellung. Nach einem Plot zur Visualisierung der Effekte ergänzen wir den Originalplot mit einer Visualisierung der Koeffizienten als Unterschiede der Mittelwerte in Abhängigkeit von den varianzanalytischen Effekten.

# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
#psych::describe(dd)
# package(car) is required. It has 'Anova()'. The capital 'A' is significant!
require(car)
require(effects)

## Lade nötiges Paket: effects

## Use the command
##     lattice::trellis.par.set(effectsTheme())
##   to customize lattice options for effects plots.
## See ?effectsTheme for details.

#m.anx <- (lm(neutr ~ gender*group, data=dd))
m.anx <- (lm(neutr ~ gender*group, data=dd))
# dummy variables
unique(model.matrix(m.anx))

##     (Intercept) genderm grouphi grouplo genderm:grouphi genderm:grouplo
## 1             1       0       0       0               0               0
## 31            1       1       0       0               0               0
## 61            1       0       0       1               0               0
## 91            1       1       0       1               0               1
## 121           1       0       1       0               0               0
## 151           1       1       1       0               1               0

# sum of squares only
aov(m.anx)

## Call:
##    aov(formula = m.anx)
## 
## Terms:
##                    gender     group gender:group Residuals
## Sum of Squares   2376.200 13220.744     1564.033 22936.667
## Deg. of Freedom         1         2            2       174
## 
## Residual standard error: 11.48129
## Estimated effects may be unbalanced

# SS and omnibus test of effects with SS type III
car::Anova(m.anx, type=3)

## Anova Table (Type III tests)
## 
## Response: neutr
##              Sum Sq  Df  F value    Pr(>F)    
## (Intercept)   69408   1 526.5388 < 2.2e-16 ***
## gender         1279   1   9.7012  0.002154 ** 
## group         11777   2  44.6724 < 2.2e-16 ***
## gender:group   1564   2   5.9325  0.003218 ** 
## Residuals     22937 174                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# model summary with all dummy effects tested
summary(m.anx)

## 
## Call:
## lm(formula = neutr ~ gender * group, data = dd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.1667  -6.9000   0.4833   6.8500  27.9000 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       48.100      2.096  22.946  < 2e-16 ***
## genderm           -9.233      2.964  -3.115  0.00215 ** 
## grouphi           13.067      2.964   4.408 1.82e-05 ***
## grouplo          -14.933      2.964  -5.037 1.17e-06 ***
## genderm:grouphi   -4.067      4.192  -0.970  0.33339    
## genderm:grouplo    9.967      4.192   2.377  0.01852 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.48 on 174 degrees of freedom
## Multiple R-squared:  0.428,  Adjusted R-squared:  0.4115 
## F-statistic: 26.04 on 5 and 174 DF,  p-value: < 2.2e-16

# descriptives (as Rcmdr does it)
tapply(dd$neutr, list(gender=dd$gender, group=dd$group), mean, na.rm=TRUE) # means

##       group
## gender       av       hi       lo
##      f 48.10000 61.16667 33.16667
##      m 38.86667 47.86667 33.90000

tapply(dd$neutr, list(gender=dd$gender, group=dd$group), sd, na.rm=TRUE)

##       group
## gender       av       hi       lo
##      f 12.64461 12.60565 12.60565
##      m 10.29474 10.17344 10.18569

# std. deviations
tapply(dd$neutr, list(gender=dd$gender, group=dd$group), function(x) sum(!is.na(x))) # counts

##       group
## gender av hi lo
##      f 30 30 30
##      m 30 30 30

# the built in possibility
plot(allEffects(m.anx), ask=FALSE)

# using ggplot and custom colors
require(ggplot2)

## Lade nötiges Paket: ggplot2

# colors by rgb values
c_colors <- c("#336600", "#660033")
c_colors <- c("lavenderblush","lightblue")
pplot <- ggplot(dd, x=group, y=neutr, aes(group, neutr, fill = gender)) +
  # set colors used for bars
  scale_fill_manual(values=c_colors) +
  # partially overlaping bars
  stat_summary(fun.y=mean, geom="bar", position=position_dodge(width=0.7)) +
  # add error information
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) +
  # put points on top of bars
  stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
  # add verbose axis labels
  labs(x = "Anxiety-Group", y = "STAI-Trait")

## Warning: `fun.y` is deprecated. Use `fun` instead.

# show it
pplot

# we add the coefficients meaning
pplot +
  # we add some helper lines
  geom_hline(yintercept = coefficients(m.anx)[1], linetype="dotted") +
  geom_hline(yintercept = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm'], linetype="dotted") +
  # effect genderm 
  geom_segment(aes(x = 1.1, 
    y= m.anx$coefficients['(Intercept)'], 
    xend = 1.1, yend = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm']), 
    arrow = arrow(length = unit(0.2, "cm")), col='red') +
  geom_text(x=1.15, y = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm'] / 2, label="genderm", color="red", hjust=0) +
  # effect grouphi
  geom_segment(aes(x = 1.9, 
    y= m.anx$coefficients['(Intercept)'], 
    xend = 1.9, yend = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['grouphi']), 
    arrow = arrow(length = unit(0.2, "cm")), col='green') +
  geom_segment(aes(x = 2, y= m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm'], 
      xend = 2, 
      yend = m.anx$coefficients['(Intercept)']  + m.anx$coefficients['genderm'] + m.anx$coefficients['grouphi']), 
      arrow = arrow(length = unit(0.2, "cm")), col='green') +
  geom_text(x=2, y = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['grouphi'] / 2, label="grouphi", color="green", hjust=0) +
  # effect grouplo
  geom_segment(aes(x = 2.9, 
    y= m.anx$coefficients['(Intercept)'], 
    xend = 2.9, 
    yend = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['grouplo']), 
    arrow = arrow(length = unit(0.2, "cm")), col='green') +
  geom_segment(aes(x = 3.0, 
    y= m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm'], 
    xend = 3.0, 
    yend = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm'] + m.anx$coefficients['grouplo']), 
    arrow = arrow(length = unit(0.2, "cm")), col='green') +
  geom_text(x=3, y = m.anx$coefficients['(Intercept)'] + m.anx$coefficients['grouplo'] / 2, label="grouplo", color="green", hjust=0) +
  # effect genderm:grouphi
  geom_segment(aes(x = 2.1, y= m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm'] + m.anx$coefficients['grouphi'], 
    xend = 2.1, 
    yend = m.anx$coefficients['(Intercept)']  + m.anx$coefficients['genderm'] + m.anx$coefficients['grouphi'] + m.anx$coefficients['genderm:grouphi']), 
    arrow = arrow(length = unit(0.2, "cm")), col='blue') +
  geom_text(x=2.2, 
    y = m.anx$coefficients['(Intercept)']  + m.anx$coefficients['genderm'] + m.anx$coefficients['grouphi'] + m.anx$coefficients['genderm:grouphi'] / 2, 
    label="genderm:grouplo", color="blue", hjust=0)  +
  # effect genderm:grouplo
  geom_segment(aes(x = 3.1, 
    y= m.anx$coefficients['(Intercept)'] + m.anx$coefficients['genderm'] + m.anx$coefficients['grouplo'], 
    xend = 3.1, 
    yend = m.anx$coefficients['(Intercept)']  + m.anx$coefficients['genderm'] + m.anx$coefficients['grouplo'] + m.anx$coefficients['genderm:grouplo']), 
    arrow = arrow(length = unit(0.2, "cm")), col='blue') +
  geom_text(x=3.1, 
    y =  m.anx$coefficients['(Intercept)']  + m.anx$coefficients['genderm'] + m.anx$coefficients['grouplo'] + m.anx$coefficients['genderm:grouplo'] / 2.5, 
    label="genderm:grouplo", color="blue", hjust=1)

# the model summary gives us the coefficients visualized
summary(m.anx)

## 
## Call:
## lm(formula = neutr ~ gender * group, data = dd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.1667  -6.9000   0.4833   6.8500  27.9000 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       48.100      2.096  22.946  < 2e-16 ***
## genderm           -9.233      2.964  -3.115  0.00215 ** 
## grouphi           13.067      2.964   4.408 1.82e-05 ***
## grouplo          -14.933      2.964  -5.037 1.17e-06 ***
## genderm:grouphi   -4.067      4.192  -0.970  0.33339    
## genderm:grouplo    9.967      4.192   2.377  0.01852 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.48 on 174 degrees of freedom
## Multiple R-squared:  0.428,  Adjusted R-squared:  0.4115 
## F-statistic: 26.04 on 5 and 174 DF,  p-value: < 2.2e-16

# descriptives again
require(Rmisc)
summarySE(data=dd, "neutr", c("group", "gender"))

##   group gender  N    neutr       sd       se       ci
## 1    av      f 30 48.10000 12.64461 2.308580 4.721576
## 2    av      m 30 38.86667 10.29474 1.879553 3.844118
## 3    hi      f 30 61.16667 12.60565 2.301465 4.707025
## 4    hi      m 30 47.86667 10.17344 1.857407 3.798824
## 5    lo      f 30 33.16667 12.60565 2.301465 4.707025
## 6    lo      m 30 33.90000 10.18569 1.859644 3.803399

detach("package:Rmisc", unload=TRUE)

coefficients(m.anx)

##     (Intercept)         genderm         grouphi         grouplo genderm:grouphi 
##       48.100000       -9.233333       13.066667      -14.933333       -4.066667 
## genderm:grouplo 
##        9.966667

Grafik in Schwarz-Weiß

# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
require(ggplot2)
ggplot(dd, aes(x=group, y=neutr, fill=gender)) +
  # define colors
  scale_fill_manual(values=c("#CCCCCC","#FFFFFF")) +
  # create white background etc
  #theme_bw() +
  stat_summary(fun.y=mean, geom="bar", position=position_dodge(width=0.7), colour='black') +   # position=position_dodge(width=0.7) manipulates the overlap of the bars
  # add error information
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) + # position=position_dodge(width=0.7) puts error bars due to the overlap
  # put big points on top of bars
  stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) + # position=position_dodge(width=0.7) puts mean points due to the overlap
  # add verbose axis labels
  labs(x = "Anxiety-Group", y = "STAI-Trait")

## Warning: `fun.y` is deprecated. Use `fun` instead.

Liniengrafik

# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
require(ggplot2)

ggplot(dd, aes(x=group, y=neutr, fill=gender)) +
  # define colors
  scale_fill_manual(values=c("#CCCCCC","#FFFFFF")) +
  # create white background etc
  #theme_bw() +
  stat_summary(fun.y=mean,  aes(x=group, group=gender), geom="line", position=position_dodge(width=0.7)) +
  # add error information
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) +
  # put big points on top of bars
  stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
  # add verbose axis labels
  labs(x = "Anxiety-Group", y = "STAI-Trait")

## Warning: `fun.y` is deprecated. Use `fun` instead.

Ansatz mit ezANOVA

ezANOVA() bietet einen einfachen (eazy) Zugang zu varianzanalytischen Designs in R. Die wichtigsten Voraussetzungsprüfungen sind Default. Komfortable Visualisierungen sind mit ezPlot möglich.

# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
# descriptive data for current design
require(ez)

## Lade nötiges Paket: ez

m.anx <- ezANOVA(dd, dv=neutr,  wid=subj, between = .(gender,  group))

## Warning: Converting "subj" to factor for ANOVA.

## Warning: Converting "gender" to factor for ANOVA.

## Warning: Converting "group" to factor for ANOVA.

## Coefficient covariances computed by hccm()

m.anx

## $ANOVA
##         Effect DFn DFd         F            p p<.05        ges
## 1       gender   1 174 18.026107 3.539854e-05     * 0.09387321
## 2        group   2 174 50.146989 6.352854e-18     * 0.36564411
## 3 gender:group   2 174  5.932462 3.218159e-03     * 0.06383627
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd  SSn    SSd         F        p p<.05
## 1   5 174 77.6 8744.7 0.3088133 0.907175

ezPlot(dd, dv=neutr,  x=group, wid=subj, between = .(gender,  group), split = gender)

## Warning: Converting "subj" to factor for ANOVA.

## Warning: Converting "gender" to factor for ANOVA.

## Warning: Converting "group" to factor for ANOVA.

## Coefficient covariances computed by hccm()

Simple Effects Ansatz

Für jede Stufe eines Faktors werden die Unterschiede auf einem anderen Faktor berechnet und getestet. Alpha Akkumulation ist hier ein Thema.

Ausflug: R-Commander:

Levine Test auf Varianzenhomogenität in den Subgruppen: ‘Statistik | Varianzen | Levene-Test’

Modell: ‘Statistik | Mittelwerte vergleichen | Mehrfaktorielle Varianzanalyse’

Quadratsummentyp festlegen in R-Commander: Modell berechnen und auswählen. Dann: ‘Modelle | Hypothesentests | ANOVA table’

Visualisierung: ‘Modelle | Grafiken | Effect plots’

alternativ über ggplot-Plugin KMggplot2 ‘KMggplot2 | Box plot …’

line chart geht hier nicht oder nur über Umwege, weil stetige x-Variable fehlt

Beispiele, Übungen / Exercises

Beispiel Everitt Mini-Datensatz

Im Beispieldatensatz sind die Faktoren A und B bereits effektkodiert bzw. kontrast-kodiert und können so direkt als Dummy-Variablen benutzt werden, um einen regressionsanalytischen Ansatz zu rechnen.

balanced design

Differenz der Regressions SumSq zwischen 1-Prädiktor-Modell und 2-Prädiktor-Modell ist SumSq für Faktor B in Anova-Modell Differenz der Regressions SumSq zwischen 2-Prädiktor-Modell und 3-Prädiktor-Modell (Interaktion) ist SumSq für Interaktionsterm in Anova-Modell

Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren Faktoren im balancieren Modell sind unabhängig (orthogonal) Aufnahmereihenfolge der Faktoren (incl. WW) spielt keine Rolle

die konstante Gesamt-Quadratsumme wird auf immer mehr Effekte aufgeteilt Residual-Quadratsumme nimmt immer mehr ab.

# read datafiles
va.2x2.bal   <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/be/e-2x2-bal.txt")
attach(va.2x2.bal)

# descriptive data
require(psych)

## Lade nötiges Paket: psych

## 
## Attache Paket: 'psych'

## Die folgenden Objekte sind maskiert von 'package:ggplot2':
## 
##     %+%, alpha

## Das folgende Objekt ist maskiert 'package:car':
## 
##     logit

by(va.2x2.bal$y, list(A, B), describe)

## : -1
## : -1
##    vars n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 4 37.5 2.08   37.5    37.5 2.22  35  40     5    0    -1.96 1.04
## ------------------------------------------------------------ 
## : 1
## : -1
##    vars n  mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 4 29.75 2.63   30.5   29.75 1.48  26  32     6 -0.54    -1.82 1.31
## ------------------------------------------------------------ 
## : -1
## : 1
##    vars n  mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 4 21.75 0.96   21.5   21.75 0.74  21  23     2 0.32    -2.08 0.48
## ------------------------------------------------------------ 
## : 1
## : 1
##    vars n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 4   26 2.58     26      26 2.97  23  29     6    0    -2.08 1.29

# fit regression with explanatory variable A
m.A <- lm(y ~ A)
summary(m.A)

## 
## Call:
## lm(formula = y ~ A)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.625 -5.312  0.125  4.438 10.375 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   28.750      1.610  17.855 4.97e-11 ***
## A             -0.875      1.610  -0.543    0.595    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.441 on 14 degrees of freedom
## Multiple R-squared:  0.02066,    Adjusted R-squared:  -0.0493 
## F-statistic: 0.2953 on 1 and 14 DF,  p-value: 0.5954

anova(m.A)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value Pr(>F)
## A          1  12.25  12.250  0.2953 0.5954
## Residuals 14 580.75  41.482

# add B
m.A.B <- lm(y ~ A + B)
m.B.A <- lm(y ~ B + A)
summary(m.A.B)

## 
## Call:
## lm(formula = y ~ A + B)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.750 -2.750 -0.375  2.750  6.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  28.7500     0.9818  29.283 2.96e-13 ***
## A            -0.8750     0.9818  -0.891 0.389011    
## B            -4.8750     0.9818  -4.965 0.000258 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6099 
## F-statistic: 12.72 on 2 and 13 DF,  p-value: 0.0008687

anova(m.A.B)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## A          1  12.25   12.25  0.7943 0.3890110    
## B          1 380.25  380.25 24.6546 0.0002583 ***
## Residuals 13 200.50   15.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# difference in SumSq between m.A and m.A.B is SumSq for factor B
# add interaction
m.A.B.AB <- lm(y ~ A + B + A:B)
summary(m.A.B.AB)

## 
## Call:
## lm(formula = y ~ A + B + A:B)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7500 -0.8125  0.2500  1.2500  3.0000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  28.7500     0.5425  52.999 1.34e-15 ***
## A            -0.8750     0.5425  -1.613  0.13272    
## B            -4.8750     0.5425  -8.987 1.12e-06 ***
## A:B           3.0000     0.5425   5.530  0.00013 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.17 on 12 degrees of freedom
## Multiple R-squared:  0.9047, Adjusted R-squared:  0.8809 
## F-statistic: 37.98 on 3 and 12 DF,  p-value: 2.102e-06

anova(m.A.B.AB)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## A          1  12.25   12.25  2.6018 0.1327167    
## B          1 380.25  380.25 80.7611 1.121e-06 ***
## A:B        1 144.00  144.00 30.5841 0.0001298 ***
## Residuals 12  56.50    4.71                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# difference SumSq of regression between 2 (A, B) and 3 (A, B, A:B) predictor model is SumSq for interaction
# SumSq der Residuen im 3-Prädiktor-Modell ist SumSq error-Quadratsumme im Anova-Modell
# SS resid in 3 predictor model is SS error in Anova (residuals)
# regression weights (coefficients) don't change when new explanatory variables are entered in model
# Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren
# factors and their effects are orthogonal
# sequence of entered explanatory variables doesn't matter
summary(m.A.B)

## 
## Call:
## lm(formula = y ~ A + B)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.750 -2.750 -0.375  2.750  6.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  28.7500     0.9818  29.283 2.96e-13 ***
## A            -0.8750     0.9818  -0.891 0.389011    
## B            -4.8750     0.9818  -4.965 0.000258 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6099 
## F-statistic: 12.72 on 2 and 13 DF,  p-value: 0.0008687

summary(m.B.A)

## 
## Call:
## lm(formula = y ~ B + A)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.750 -2.750 -0.375  2.750  6.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  28.7500     0.9818  29.283 2.96e-13 ***
## B            -4.8750     0.9818  -4.965 0.000258 ***
## A            -0.8750     0.9818  -0.891 0.389011    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6099 
## F-statistic: 12.72 on 2 and 13 DF,  p-value: 0.0008687

anova(m.A.B)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## A          1  12.25   12.25  0.7943 0.3890110    
## B          1 380.25  380.25 24.6546 0.0002583 ***
## Residuals 13 200.50   15.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.B.A)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## B          1 380.25  380.25 24.6546 0.0002583 ***
## A          1  12.25   12.25  0.7943 0.3890110    
## Residuals 13 200.50   15.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# the constant total SS is divided in more and more effects
# residual SS diminishes with every additional effect entered in the model

detach(va.2x2.bal)

Beispiel Aktivation und Leistung unter Druck

2-faktoriell balanced, 2 x 3 Faktorstufen

Szenario: Aktivation und Leistung unter Druck in bei Sensation-Seekern und Controls.

generierte Daten, 2 x 3 Design (A, B), 30 Beobachtungen pro Gruppe

Faktor A: Druck in 3 Stufen, je höher die Stufe, desto höher der Druck.

Faktor B: zwei Gruppen: Sensation-Seeker (B=1) vs. Kontrollgruppe (B=2)

sseek  pressure dsseek dpressure1 dpressure2 dsseek_pressure1 dsseek_pressure2
y=1    no =0    [ 1]   [ 1]       [ 0]
c=2    med=1    [-1]   [ 0]       [ 1]
       hi= 2           [-1]       [-1]

1      0          1      1          0         1                0     # sseek   no
1      1          1      0          1         0                1     # sseek  med
1      2          1     -1         -1        -1               -1     # sseek   hi
2      0         -1      1          0        -1                0     # control no
2      1         -1      0          1         0               -1     # control med 
2      2         -1     -1         -1         1                1     # control hi

Leistung in einer Prüfung (Werte von 0 - 100, je höher desto höher die Leistung).

Daten (Text, R):

[https://md.psych.bio.uni-goettingen.de/data/virt/v-2x3-bal.txt]

dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v-2x3-bal.txt")

Lösungsansatz: Dummy-Codierung der Gruppenzugehörigkeit. Dichotome Variablen können behandelt werden wie intervallskalierte. Aufsplittung von mehr als zwei Faktorstufen in (Anzahl-der-Faktorstufen - 1) binäre Dummy-Variablen. Hier demonstriert am Beispiel der Effektcodierung (vgl. Dummy-Kodierungs-Seite).

Beim Lösungsansatz Regression mit Dummy-Kodierung können mit den Koeffizienten die Gruppenmittelwerte berechnet werden. Die jeweiligen Gruppenmittelwerte sind die Vorhersage für eine Person mit einer bestimmten Faktorstufenkombination. (Intercept) 57.0918
dsseek -8.7570 Effekt Sensation-Seeker zu sein oder nicht.Sensation-Seeker zu sein vermindert die Leistung um durchschnittlich 8.76 Leistungspunkte dpressure1 -13.3728 Effekt bei mildem Druck dpressure2 10.9439 Effekt bei starkem Druck dsseek_pressure1 3.3425 Interaktion Sensation-Seeker und milder Druck dsseek_pressure2 -11.8567 Interaktion Sensation-Seeker und starker Druck

Interpretation der Koeffizienten: Bei Effektkodierung entspricht der Intercept dem Gesamtmittelwert (grand mean). Die einzelnen (Teil-)Effekte werden auf ihn aufaddiert.

Vorhersage sseek no-pressure

intercept + 1 dsseek + 1 dpressure1 + 0 dpressure2 + 1 dsseek_pressure1 + 0 dsseek_pressure2

57.09 + (-8.75) + (-13.37) + 3.34 = 38.31

Vorhersage control, hoher Druck

intercept + (-1) dsseek + (-1) dpressure1 + (-1) dpressure2 + 1 dsseek_pressure1 + 1 dsseek_pressure2

57.09 + 8.75 + 13.37 - 10.94 + 3.34 - 11.86 = 59.75

Beispiel Conway

Telefonieren und Autofahren

Quelle: [https://www.youtube.com/watch?v=bd2VaB1iWnA]

Faktoren (UV): - Schwierigkeit der Fahrsituation (gering, hoch) der zu fahrende Kurs im Simulator - Schwierigkeit der Konversation (keine Konv., geringe, hohe) keinerlei Telefongespräch geringe Konversationsschwierigkeit: wiederholen, was am Telefon gehört wird hohe Konversationsschwierigkeit: ein Wort ausdenken und nennen, das mit dem letzten Buchstaben des letzten Wortes des am Telefon Gesagten beginnt.

2*3 faktorielles (varianzanalytisches) Design

Ergebnisvariable (AV): - Fehler im Fahrsimulator

Daten: Strayer & Johnson (2001)

Graph: x: Konversationsschwierigkeit, Fahrschwierigkeit als Balkenpaar

Es werden 3 F-Ratios berechnet. Verhältnis systematischer Varianz zu unsystematischer Varianz ()

Haupteffekte: Der Effekt eines Faktors wenn über die Faktorstufen des (der) anderen Faktors (Faktoren) gemittelt wird. Interaktionen: Die Wirkung eines Faktors hängt ab von der Ausprägung eines anderen Faktors. Einfacher Effekt: Die (sig.) Wirkung eines Faktors bei einer bestimmten Ausprägung eines anderen Faktors.

Orthogonalität der Effekte (bei gleicher Zellbesetzung) by design. Quadratsummen (Varianzanteile) in faktoriellen Designs sind additiv. Alle Prädiktoren sind hier orthogonal.

SS_s/AB bedeutet: Vpn (s) sind abhängig von der Zuordnung hinsichtlich zweier Gruppierungsvariablen/Faktoren (A und B)

Error Term ist für alle Vergleiche derselbe. Effektgröße: Eta^2 - komplett: SS_effect/SS_total - partiell: SS_effect/(SS_effect + SS_s/AB) hier werden die anderen systematischen Effekte auspartialisiert

Voraussetzungen: - stetige Antwortvariable (AV) - normalverteilte Antwortvariable - Homogenität der Varianzen (z. B. Levintest)

Wenn man die Effektgrößen pro Faktorstufe Konversationsschwierigkeiten anschaut, erkennt man, dass der Effekt der Fahr-Schwierigkeit stark von der Konversationsschwierigkeit abhängt.

Inhaltlicher Nachtrag: Textnachrichten schreiben ist noch deutlich gefährlicher als telefonieren.

Rechenbeispiel:

Beispiel phonological similarity effect in simple and complex span tasks

https://www.youtube.com/watch?v=T_7XuaFYfLs Daten:

Links und Referenzen

Field (2012, p. 245) Kapitel 7: Regression

check

interaction or moderation multiplication of predictive variables to get an additional predictive variable
effect of group membership in one factor depends on group membership in other factor(s)
visualization of parameters interaction effects included graphical representation

Screencast

Multiple Regression - check StudIP - ownCloud

Version: 02 Juni, 2022 09:40

Lineares Modell, kategoriale Prädiktoren, faktorielle varianzanalytische Modelle

Analysis of Variance als Regressionsmodell

Peter Zezula

Rmd