Rmd

Konzeptionelles

(Aus-)Balancierte varianzanalytische Designs sind Designs mit gleicher Zellbesetzung. Die Effekte sind dann unabhängig voneinander. Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren. Die Aufnahmereihenfolge der Prädiktoren (Faktoren) in das Modell spielt keine Rolle, auch nicht bei Quadratsummenbildung vom Typ I.

Unbalancierte varianzanalytische Designs haben ungleiche Zellbesetzungen. Die Effekte sind abhängig voneinander. Die Aufnahmereihenfolge der Prädiktoren (Faktoren) spielt eine wichtige Rolle. Ein hinzugenommener Prädiktor nimmt die maximal ihm zur Verfügung stehende Varianz, auch die mit anderen potenziellen Prädiktoren gemeinsame. Je nach Vorgehen (Aufnahmereihenfolge, verwendeter Quadratsummentyp) werden unterschiedliche Hypothesen geprüft. SumSq der Residuen im 3-Prädiktor-Modell ist SumSq error-Quadratsumme im Anova-Modell

Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren Faktoren im balancieren Modell sind unabhängig (orthogonal) Aufnahmereihenfolge der Faktoren (incl. WW) spielt keine Rolle

Die relevanten Ausführungen dazu finden sich in Field (2012, p. 474 - 477) in Kapitel 11: Analysis of Covariance, ANCOVA

Etwas über den Hintergrund unter lm_cat_unbal_ss_explained.html

[https://www.uni-kiel.de/psychologie/dwoll/r/ssTypes.php] bzw. als Rmd-File

Konzeptioneller Hintergrund und ein empfehlenswertes Tutorial mit Beispiel von Ista Zahn unter InteractionsAndTypesOfSS.pdf external link lokale Kopie Paper Ista Zahn mit dem zugehörigen Datensatz unter Zahn Data

Versteh-Beispiel Everitt

2x2-Design, je 2 Faktorstufen, Everitt (2010)

Unterschiede zwischen balanced und unbalanced Designs. Im Beispieldatensatz sind die Faktoren A und B bereits effektkodiert bzw. kontrast-kodiert und können so direkt als Dummy-Variablen benutzt werden, um einen regressionsanalytischen Ansatz zu rechnen.

balanced design

Differenz der Regressions SumSq zwischen 1-Prädiktor-Modell und 2-Prädiktor-Modell ist SumSq für Faktor B in Anova-Modell Differenz der Regressions SumSq zwischen 2-Prädiktor-Modell und 3-Prädiktor-Modell (Interaktion) ist SumSq für Interaktionsterm in Anova-Modell

Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren Faktoren im balancieren Modell sind unabhängig (orthogonal) Aufnahmereihenfolge der Faktoren (incl. WW) spielt keine Rolle

die konstante Gesamt-Quadratsumme wird auf immer mehr Effekte aufgeteilt Residual-Quadratsumme nimmt immer mehr ab.

# read datafiles
dd   <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/be/e-2x2-bal.txt")
# generate factors
dd$f.A <- factor(dd$A, labels=c('a1', 'a2'))
dd$f.B <- factor(dd$B, labels=c('b1', 'b2'))
#contrasts(dd$f.A) <- contr.sum(2)
#contrasts(dd$f.B) <- contr.sum(2)
#contrasts(dd$f.A) <- contr.treatment(2)
#contrasts(dd$f.B) <- contr.treatment(2)

# frequencies
table(dd$f.A, dd$f.B)

##     
##      b1 b2
##   a1  4  4
##   a2  4  4

require(Rmisc)

## Lade nötiges Paket: Rmisc

## Lade nötiges Paket: lattice

## Lade nötiges Paket: plyr

Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))

##   f.A f.B N     y        sd        se       ci
## 1  a1  b1 4 37.50 2.0816660 1.0408330 3.312395
## 2  a1  b2 4 21.75 0.9574271 0.4787136 1.523480
## 3  a2  b1 4 29.75 2.6299556 1.3149778 4.184846
## 4  a2  b2 4 26.00 2.5819889 1.2909944 4.108521

# explanatory variables are uncorrelated
cor(dd$A, dd$B)

## [1] 0

# fit regression with explanatory variable A
m.A <- lm(y ~ f.A, data=dd)
cbind(dd, model.matrix(m.A))

##     A  B  y f.A f.B (Intercept) f.Aa2
## 1   1  1 23  a2  b2           1     1
## 2   1  1 25  a2  b2           1     1
## 3   1  1 27  a2  b2           1     1
## 4   1  1 29  a2  b2           1     1
## 5   1 -1 26  a2  b1           1     1
## 6   1 -1 32  a2  b1           1     1
## 7   1 -1 30  a2  b1           1     1
## 8   1 -1 31  a2  b1           1     1
## 9  -1  1 22  a1  b2           1     0
## 10 -1  1 23  a1  b2           1     0
## 11 -1  1 21  a1  b2           1     0
## 12 -1  1 21  a1  b2           1     0
## 13 -1 -1 37  a1  b1           1     0
## 14 -1 -1 38  a1  b1           1     0
## 15 -1 -1 40  a1  b1           1     0
## 16 -1 -1 35  a1  b1           1     0

summary(m.A)

## 
## Call:
## lm(formula = y ~ f.A, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.625 -5.312  0.125  4.438 10.375 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   29.625      2.277  13.010 3.29e-09 ***
## f.Aa2         -1.750      3.220  -0.543    0.595    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.441 on 14 degrees of freedom
## Multiple R-squared:  0.02066,    Adjusted R-squared:  -0.0493 
## F-statistic: 0.2953 on 1 and 14 DF,  p-value: 0.5954

# mean(dd$y[dd$f.A == "a1"])
# Rmisc::summarySE(data = dd, "y", groupvars=c("f.A"))
anova(m.A)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value Pr(>F)
## f.A        1  12.25  12.250  0.2953 0.5954
## Residuals 14 580.75  41.482

anova(m.A)$`Sum Sq`

## [1]  12.25 580.75

sum(anova(m.A)$`Sum Sq`)

## [1] 593

# add B
m.A.B <- lm(y ~ f.A + f.B, data=dd)
cbind(dd, model.matrix(m.A.B))

##     A  B  y f.A f.B (Intercept) f.Aa2 f.Bb2
## 1   1  1 23  a2  b2           1     1     1
## 2   1  1 25  a2  b2           1     1     1
## 3   1  1 27  a2  b2           1     1     1
## 4   1  1 29  a2  b2           1     1     1
## 5   1 -1 26  a2  b1           1     1     0
## 6   1 -1 32  a2  b1           1     1     0
## 7   1 -1 30  a2  b1           1     1     0
## 8   1 -1 31  a2  b1           1     1     0
## 9  -1  1 22  a1  b2           1     0     1
## 10 -1  1 23  a1  b2           1     0     1
## 11 -1  1 21  a1  b2           1     0     1
## 12 -1  1 21  a1  b2           1     0     1
## 13 -1 -1 37  a1  b1           1     0     0
## 14 -1 -1 38  a1  b1           1     0     0
## 15 -1 -1 40  a1  b1           1     0     0
## 16 -1 -1 35  a1  b1           1     0     0

m.B.A <- lm(y ~ f.B + f.A, data=dd)
cbind(dd, model.matrix(m.B.A))

##     A  B  y f.A f.B (Intercept) f.Bb2 f.Aa2
## 1   1  1 23  a2  b2           1     1     1
## 2   1  1 25  a2  b2           1     1     1
## 3   1  1 27  a2  b2           1     1     1
## 4   1  1 29  a2  b2           1     1     1
## 5   1 -1 26  a2  b1           1     0     1
## 6   1 -1 32  a2  b1           1     0     1
## 7   1 -1 30  a2  b1           1     0     1
## 8   1 -1 31  a2  b1           1     0     1
## 9  -1  1 22  a1  b2           1     1     0
## 10 -1  1 23  a1  b2           1     1     0
## 11 -1  1 21  a1  b2           1     1     0
## 12 -1  1 21  a1  b2           1     1     0
## 13 -1 -1 37  a1  b1           1     0     0
## 14 -1 -1 38  a1  b1           1     0     0
## 15 -1 -1 40  a1  b1           1     0     0
## 16 -1 -1 35  a1  b1           1     0     0

summary(m.A.B)

## 
## Call:
## lm(formula = y ~ f.A + f.B, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.750 -2.750 -0.375  2.750  6.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   34.500      1.701  20.288 3.17e-11 ***
## f.Aa2         -1.750      1.964  -0.891 0.389011    
## f.Bb2         -9.750      1.964  -4.965 0.000258 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6099 
## F-statistic: 12.72 on 2 and 13 DF,  p-value: 0.0008687

summary(m.B.A)

## 
## Call:
## lm(formula = y ~ f.B + f.A, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.750 -2.750 -0.375  2.750  6.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   34.500      1.701  20.288 3.17e-11 ***
## f.Bb2         -9.750      1.964  -4.965 0.000258 ***
## f.Aa2         -1.750      1.964  -0.891 0.389011    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6099 
## F-statistic: 12.72 on 2 and 13 DF,  p-value: 0.0008687

# in SS type 1 SS are equal, independent of sequence to enter 
anova(m.A.B)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## f.A        1  12.25   12.25  0.7943 0.3890110    
## f.B        1 380.25  380.25 24.6546 0.0002583 ***
## Residuals 13 200.50   15.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.B.A)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## f.B        1 380.25  380.25 24.6546 0.0002583 ***
## f.A        1  12.25   12.25  0.7943 0.3890110    
## Residuals 13 200.50   15.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# mean(dd$y)
# mean(dd$y[dd$f.A == "a1"])
# mean(dd$y[dd$f.A == "a1" & dd$f.B == "b1"])
# Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))
# Rmisc::summarySE(data = dd, "y", groupvars=c("A", "B"))
# coefficients(m.A.B)
# predict(m.A.B)
# dd$y

# model.matrix(m.A.B)
# mean(dd$y[dd$f.B == "b1"])
# mean(c(dd$y[dd$f.A == "a1"], dd$y[dd$f.B == "b1"]))


# difference in SumSq between m.A and m.A.B is SumSq for factor B
anova(m.A, m.A.B)

## Analysis of Variance Table
## 
## Model 1: y ~ f.A
## Model 2: y ~ f.A + f.B
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     14 580.75                                  
## 2     13 200.50  1    380.25 24.655 0.0002583 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.A)$`Sum Sq`

## [1]  12.25 580.75

anova(m.A.B)$`Sum Sq`

## [1]  12.25 380.25 200.50

anova(m.A, m.B.A)

## Analysis of Variance Table
## 
## Model 1: y ~ f.A
## Model 2: y ~ f.B + f.A
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     14 580.75                                  
## 2     13 200.50  1    380.25 24.655 0.0002583 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.B.A)$`Sum Sq`

## [1] 380.25  12.25 200.50

# add interaction
m.A.B.AB <- lm(y ~ f.A + f.B + f.A:f.B, data=dd)
cbind(dd, model.matrix(m.A.B.AB))

##     A  B  y f.A f.B (Intercept) f.Aa2 f.Bb2 f.Aa2:f.Bb2
## 1   1  1 23  a2  b2           1     1     1           1
## 2   1  1 25  a2  b2           1     1     1           1
## 3   1  1 27  a2  b2           1     1     1           1
## 4   1  1 29  a2  b2           1     1     1           1
## 5   1 -1 26  a2  b1           1     1     0           0
## 6   1 -1 32  a2  b1           1     1     0           0
## 7   1 -1 30  a2  b1           1     1     0           0
## 8   1 -1 31  a2  b1           1     1     0           0
## 9  -1  1 22  a1  b2           1     0     1           0
## 10 -1  1 23  a1  b2           1     0     1           0
## 11 -1  1 21  a1  b2           1     0     1           0
## 12 -1  1 21  a1  b2           1     0     1           0
## 13 -1 -1 37  a1  b1           1     0     0           0
## 14 -1 -1 38  a1  b1           1     0     0           0
## 15 -1 -1 40  a1  b1           1     0     0           0
## 16 -1 -1 35  a1  b1           1     0     0           0

summary(m.A.B.AB)

## 
## Call:
## lm(formula = y ~ f.A + f.B + f.A:f.B, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7500 -0.8125  0.2500  1.2500  3.0000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   37.500      1.085  34.564 2.19e-13 ***
## f.Aa2         -7.750      1.534  -5.051 0.000284 ***
## f.Bb2        -15.750      1.534 -10.265 2.70e-07 ***
## f.Aa2:f.Bb2   12.000      2.170   5.530 0.000130 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.17 on 12 degrees of freedom
## Multiple R-squared:  0.9047, Adjusted R-squared:  0.8809 
## F-statistic: 37.98 on 3 and 12 DF,  p-value: 2.102e-06

Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))

##   f.A f.B N     y        sd        se       ci
## 1  a1  b1 4 37.50 2.0816660 1.0408330 3.312395
## 2  a1  b2 4 21.75 0.9574271 0.4787136 1.523480
## 3  a2  b1 4 29.75 2.6299556 1.3149778 4.184846
## 4  a2  b2 4 26.00 2.5819889 1.2909944 4.108521

anova(m.A.B.AB)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## f.A        1  12.25   12.25  2.6018 0.1327167    
## f.B        1 380.25  380.25 80.7611 1.121e-06 ***
## f.A:f.B    1 144.00  144.00 30.5841 0.0001298 ***
## Residuals 12  56.50    4.71                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.A.B, m.A.B.AB)

## Analysis of Variance Table
## 
## Model 1: y ~ f.A + f.B
## Model 2: y ~ f.A + f.B + f.A:f.B
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1     13 200.5                                  
## 2     12  56.5  1       144 30.584 0.0001298 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# difference SumSq of regression between 2 (A, B) and 3 (A, B, A:B) predictor model is SumSq for interaction
# SumSq der Residuen im 3-Prädiktor-Modell ist SumSq error-Quadratsumme im Anova-Modell
# SS resid in 3 predictor model is SS error in Anova (residuals)
# regression weights (coefficients) don't change when new explanatory variables are entered in model
# Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren
# factors and their effects are orthogonal
# sequence of entered explanatory variables doesn't matter
summary(m.A.B)

## 
## Call:
## lm(formula = y ~ f.A + f.B, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.750 -2.750 -0.375  2.750  6.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   34.500      1.701  20.288 3.17e-11 ***
## f.Aa2         -1.750      1.964  -0.891 0.389011    
## f.Bb2         -9.750      1.964  -4.965 0.000258 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6099 
## F-statistic: 12.72 on 2 and 13 DF,  p-value: 0.0008687

summary(m.B.A)

## 
## Call:
## lm(formula = y ~ f.B + f.A, data = dd)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.750 -2.750 -0.375  2.750  6.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   34.500      1.701  20.288 3.17e-11 ***
## f.Bb2         -9.750      1.964  -4.965 0.000258 ***
## f.Aa2         -1.750      1.964  -0.891 0.389011    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6099 
## F-statistic: 12.72 on 2 and 13 DF,  p-value: 0.0008687

summary(m.A.B.AB)

## 
## Call:
## lm(formula = y ~ f.A + f.B + f.A:f.B, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7500 -0.8125  0.2500  1.2500  3.0000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   37.500      1.085  34.564 2.19e-13 ***
## f.Aa2         -7.750      1.534  -5.051 0.000284 ***
## f.Bb2        -15.750      1.534 -10.265 2.70e-07 ***
## f.Aa2:f.Bb2   12.000      2.170   5.530 0.000130 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.17 on 12 degrees of freedom
## Multiple R-squared:  0.9047, Adjusted R-squared:  0.8809 
## F-statistic: 37.98 on 3 and 12 DF,  p-value: 2.102e-06

anova(m.A.B)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## f.A        1  12.25   12.25  0.7943 0.3890110    
## f.B        1 380.25  380.25 24.6546 0.0002583 ***
## Residuals 13 200.50   15.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.B.A)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## f.B        1 380.25  380.25 24.6546 0.0002583 ***
## f.A        1  12.25   12.25  0.7943 0.3890110    
## Residuals 13 200.50   15.42                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.A.B.AB)

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## f.A        1  12.25   12.25  2.6018 0.1327167    
## f.B        1 380.25  380.25 80.7611 1.121e-06 ***
## f.A:f.B    1 144.00  144.00 30.5841 0.0001298 ***
## Residuals 12  56.50    4.71                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# a visualization
require(ggplot2)

## Lade nötiges Paket: ggplot2

ggplot(dd, aes(f.A, y, color=f.B)) +
  # scale_fill_manual(values=c_colors) +
  stat_summary(fun.y=mean, geom="bar", position=position_dodge()) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) + 
  # put points on top of bars
  stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
    geom_hline(aes(yintercept=coefficients(m.A.B.AB)[1])) + 
  # add verbose axis labels
  labs(x = "group B", y = "y")

## Warning: `fun.y` is deprecated. Use `fun` instead.

# the constant total SS is divided in more and more effects
# residual SS diminishes with every additional effect entered in the model

unbalanced design (same experimental design)

derselbe Datensatz mit ungleicher Teilstichproben-Größe (teils ein paar Beobachtungen mehr) Differenz der Regressions SumSq zwischen 1-Prädiktor-Modell und 2-Prädiktor-Modell ist SumSq für Faktor B in Anova-Modell

Wechselwirkung dazu Reihenfolge der Aufnahme ins Modell verändert die Parameter der Prädiktoren im Modell sowohl die Regressionsparameter als auch die Quadratsummen und damit schliesslich auch die statistische Beurteilung.

Die QS der Interaktion kommt hinzu. QS der Haupteffekte sind verschieden und abhängig von der Aufnahmereihenfolge. QS der Interaktion bleibt gleich, unabhängig von der Aufnahmereihenfolge (gilt für die höchste WW).

Berechnung mit aov()

# unbalanced design
# same experimental design, some observations added, unequal cell size
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/be/e-2x2-unbal.txt")

# generate factors
dd$f.A <- factor(dd$A, labels=c('a1', 'a2'))
dd$f.B <- factor(dd$B, labels=c('b1', 'b2'))

# n in cells
table(dd$f.A, dd$f.B)

##     
##      b1 b2
##   a1 13  7
##   a2  4  8

# descriptives
require(Rmisc)
Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))

##   f.A f.B  N        y       sd        se       ci
## 1  a1  b1 13 37.69231 2.657838 0.7371516 1.606115
## 2  a1  b2  7 20.85714 2.193063 0.8288998 2.028245
## 3  a2  b1  4 29.75000 2.629956 1.3149778 4.184846
## 4  a2  b2  8 26.12500 2.587746 0.9149063 2.163410

# explanatory variables are correlated
cor(dd$A, dd$B)

## [1] 0.3072118

# fit regression including explanatory variable A
mu.A <- lm(y ~ f.A, data=dd)
unique(model.matrix(mu.A))

##    (Intercept) f.Aa2
## 1            1     1
## 13           1     0

summary(mu.A)

## 
## Call:
## lm(formula = y ~ f.A, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.800  -4.333   2.167   5.450   9.200 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   31.800      1.584  20.073   <2e-16 ***
## f.Aa2         -4.467      2.587  -1.727   0.0945 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.085 on 30 degrees of freedom
## Multiple R-squared:  0.09039,    Adjusted R-squared:  0.06007 
## F-statistic: 2.981 on 1 and 30 DF,  p-value: 0.09453

anova(mu.A)

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value  Pr(>F)  
## f.A        1  149.63 149.633   2.981 0.09453 .
## Residuals 30 1505.87  50.196                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# add B to a slightly more complex model
# as we can see, input sequence of explanatory variables matters in unbalanced designs
mu.A.B <- lm(y ~ f.A + f.B, data=dd)
unique(model.matrix(mu.A.B))

##    (Intercept) f.Aa2 f.Bb2
## 1            1     1     1
## 9            1     1     0
## 13           1     0     1
## 20           1     0     0

mu.B.A <- lm(y ~ f.B + f.A, data=dd)
unique(model.matrix(mu.B.A))

##    (Intercept) f.Bb2 f.Aa2
## 1            1     1     1
## 9            1     0     1
## 13           1     1     0
## 20           1     0     0

# coefficients keep the same
summary(mu.A.B)

## 
## Call:
## lm(formula = y ~ f.A + f.B, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3025 -3.0300 -0.1663  3.1749  6.6513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  35.9838     1.0470  34.369  < 2e-16 ***
## f.Aa2        -0.6813     1.5523  -0.439    0.664    
## f.Bb2       -11.9538     1.5060  -7.938 9.39e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.046 on 29 degrees of freedom
## Multiple R-squared:  0.7133, Adjusted R-squared:  0.6935 
## F-statistic: 36.07 on 2 and 29 DF,  p-value: 1.358e-08

summary(mu.B.A)

## 
## Call:
## lm(formula = y ~ f.B + f.A, data = dd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3025 -3.0300 -0.1663  3.1749  6.6513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  35.9838     1.0470  34.369  < 2e-16 ***
## f.Bb2       -11.9538     1.5060  -7.938 9.39e-09 ***
## f.Aa2        -0.6813     1.5523  -0.439    0.664    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.046 on 29 degrees of freedom
## Multiple R-squared:  0.7133, Adjusted R-squared:  0.6935 
## F-statistic: 36.07 on 2 and 29 DF,  p-value: 1.358e-08

# SS differ due to the sequence to enter
anova(mu.A.B)

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## f.A        1  149.63  149.63  9.1422  0.005184 ** 
## f.B        1 1031.22 1031.22 63.0047 9.387e-09 ***
## Residuals 29  474.65   16.37                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(mu.B.A)

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## f.B        1 1177.70 1177.70 71.9543 2.399e-09 ***
## f.A        1    3.15    3.15  0.1926     0.664    
## Residuals 29  474.65   16.37                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# total SS stays the same
sum(anova(m.A.B)$`Sum Sq`)

## [1] 593

sum(anova(m.B.A)$`Sum Sq`)

## [1] 593

# difference SS is SS of factor B

# SS of highest interaction stays the same, also in unbalaced designs
mu.A.B.AB <- aov(y ~ f.A + f.B + f.A:f.B, data=dd)
mu.B.A.AB <- aov(y ~ f.B + f.A + f.A:f.B, data=dd)

# sequence to enter alters SS
# A and B add up to the same SS
# interaction stays unchanged
anova(mu.A.B.AB)

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## f.A        1  149.63  149.63  23.116 4.690e-05 ***
## f.B        1 1031.22 1031.22 159.304 4.484e-13 ***
## f.A:f.B    1  293.40  293.40  45.325 2.613e-07 ***
## Residuals 28  181.25    6.47                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(mu.B.A.AB)

## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## f.B        1 1177.70 1177.70 181.932 9.005e-14 ***
## f.A        1    3.15    3.15   0.487     0.491    
## f.B:f.A    1  293.40  293.40  45.325 2.613e-07 ***
## Residuals 28  181.25    6.47                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#c_colors <- c("#66ff66", "#ff6666", "#6666ff")
# sequence to enter alters coefficients and SS
# and therefore the statistical tests

ggplot(dd, aes(f.A, y, color=f.B)) +
  # scale_fill_manual(values=c_colors) +
  stat_summary(fun.y=mean, geom="bar", position=position_dodge()) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) + 
  # put points on top of bars
  stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
    geom_hline(aes(yintercept=coefficients(mu.A.B.AB)[1])) + 
  # add verbose axis labels
  labs(x = "group B", y = "y")

## Warning: `fun.y` is deprecated. Use `fun` instead.

# let us control the visualized values
coefficients(mu.A.B.AB)

## (Intercept)       f.Aa2       f.Bb2 f.Aa2:f.Bb2 
##   37.692308   -7.942308  -16.835165   13.210165

Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))

##   f.A f.B  N        y       sd        se       ci
## 1  a1  b1 13 37.69231 2.657838 0.7371516 1.606115
## 2  a1  b2  7 20.85714 2.193063 0.8288998 2.028245
## 3  a2  b1  4 29.75000 2.629956 1.3149778 4.184846
## 4  a2  b2  8 26.12500 2.587746 0.9149063 2.163410

aov(mu.A.B.AB)

## Call:
##    aov(formula = mu.A.B.AB)
## 
## Terms:
##                       f.A       f.B   f.A:f.B Residuals
## Sum of Squares   149.6333 1031.2154  293.3999  181.2514
## Deg. of Freedom         1         1         1        28
## 
## Residual standard error: 2.544261
## Estimated effects may be unbalanced

sum(summary(aov(mu.A.B.AB))[[1]]$`Sum Sq`)

## [1] 1655.5

aov(mu.B.A.AB)

## Call:
##    aov(formula = mu.B.A.AB)
## 
## Terms:
##                       f.B       f.A   f.B:f.A Residuals
## Sum of Squares  1177.6961    3.1527  293.3999  181.2514
## Deg. of Freedom         1         1         1        28
## 
## Residual standard error: 2.544261
## Estimated effects may be unbalanced

sum(summary(aov(mu.B.A.AB))[[1]]$`Sum Sq`)

## [1] 1655.5

Beispiel STAI-State Angstgruppe und Geschlecht

In bestimmten Gruppen werden die Beobachtungen ‘reduziert’ - es wird eine Zufallsauswahl gezogen. Hierdurch werden die Zellhäufigkeiten in den Faktorstufen-Kombinationen ungleich.

# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
length(dd$subj)

## [1] 180

# percentages
p.del <- c(.5, .7, .8)
s.del <- c()

# select some women from group 'av' to delete in dataframe
s.a <- dd$subj[dd$gender == 'f' & dd$group == 'av'] 
#s.d <- s.a[!is.na(sample(s.a, length(s.a) * p.del[1]))]
set.seed(42)  # to get reproducable results
s.d <- sample(s.a, length(s.a) * p.del[1])
s.del <- c(s.del, s.d)

# select some trained women to delete in dataframe
s.a <- dd$subj[dd$gender == 'f' & dd$group == 'hi'] 
set.seed(43)  # to get reproducable results
s.d <- sample(s.a, length(s.a) * p.del[2])
s.del <- c(s.del, s.d)

# select some anxious men to delete in dataframe
s.a <- dd$subj[dd$gender == 'm' & dd$group == 'lo'] 
set.seed(44)  # to get reproducable results
s.d <- sample(s.a, length(s.a) * p.del[3])
s.del <- c(s.del, s.d)


# now take all selected observations out of dataframe
dd.u <- dd[-s.del,]
# write.table(dd.u, file="v_stait_exam_wide_unbal.txt", quote=F, sep="\t", row.names=F)

# this way the unbalanced dataframe was created
# we read a reduced dataframe constructed in the above described way
dd.u <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide_unbal.txt")
# control crosstabs
table(dd.u$gender, dd.u$group)

##    
##     av hi lo
##   f 15  9 30
##   m 30 30  6

# crosstabulation including test of indepencence
mytable <- xtabs(~gender + group, data=dd.u)
ftable(mytable) # print table

##        group av hi lo
## gender               
## f            15  9 30
## m            30 30  6

summary(mytable) # chi-square test of indepedence

## Call: xtabs(formula = ~gender + group, data = dd.u)
## Number of cases in table: 120 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 31.422, df = 2, p-value = 1.503e-07

# write dummy variables to dataframe
dd.u <- cbind(dd.u, model.matrix(neutr ~ gender + group + gender:group, data=dd.u))
# now we can check correlation of dummy variables
cor(dd.u[c('genderm', 'grouphi', 'grouplo', 'genderm:grouphi', 'genderm:grouplo')])

##                    genderm    grouphi    grouplo genderm:grouphi
## genderm          1.0000000  0.3057762 -0.5044296       0.5222330
## grouphi          0.3057762  1.0000000 -0.4542568       0.8320503
## grouplo         -0.5044296 -0.4542568  1.0000000      -0.3779645
## genderm:grouphi  0.5222330  0.8320503 -0.3779645       1.0000000
## genderm:grouplo  0.2075143 -0.1591890  0.3504383      -0.1324532
##                 genderm:grouplo
## genderm               0.2075143
## grouphi              -0.1591890
## grouplo               0.3504383
## genderm:grouphi      -0.1324532
## genderm:grouplo       1.0000000

# descriptives
require(Rmisc)
summarySE(data=dd.u, "neutr", c("group","gender"))

##   group gender  N    neutr        sd       se       ci
## 1    av      f 15 49.73333 12.020617 3.103710 6.656796
## 2    av      m 30 38.86667 10.294737 1.879553 3.844118
## 3    hi      f  9 54.88889  9.130231 3.043410 7.018117
## 4    hi      m 30 47.86667 10.173438 1.857407 3.798824
## 5    lo      f 30 33.16667 12.605646 2.301465 4.707025
## 6    lo      m  6 31.50000  7.176350 2.929733 7.531118

detach("package:Rmisc", unload=TRUE)

# package(car) is required. It defines 'Anova()'. The capital 'A' is significant!
require(car)

## Lade nötiges Paket: car

## Lade nötiges Paket: carData

# the balanced and unbalanced designs compared
m.b <- (lm(neutr ~ gender*group, data=dd  ))
m.u <- (lm(neutr ~ gender*group, data=dd.u))

# model test
car::Anova(m.b)

## Anova Table (Type II tests)
## 
## Response: neutr
##               Sum Sq  Df F value    Pr(>F)    
## gender        2376.2   1 18.0261  3.54e-05 ***
## group        13220.7   2 50.1470 < 2.2e-16 ***
## gender:group  1564.0   2  5.9325  0.003218 ** 
## Residuals    22936.7 174                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

car::Anova(m.u)

## Anova Table (Type II tests)
## 
## Response: neutr
##               Sum Sq  Df F value    Pr(>F)    
## gender        1251.1   1 10.4639  0.001593 ** 
## group         6410.9   2 26.8094 2.865e-10 ***
## gender:group   285.0   2  1.1918  0.307425    
## Residuals    13630.4 114                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# a look at sums of squares
car::Anova(m.b)$`Sum Sq`; sum(car::Anova(m.b)$`Sum Sq`)

## [1]  2376.200 13220.744  1564.033 22936.667

## [1] 40097.64

car::Anova(m.u)$`Sum Sq`; sum(car::Anova(m.u)$`Sum Sq`)

## [1]  1251.1225  6410.9426   284.9988 13630.4222

## [1] 21577.49

# parameters in detail
# balanced
summary(m.b)

## 
## Call:
## lm(formula = neutr ~ gender * group, data = dd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.1667  -6.9000   0.4833   6.8500  27.9000 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       48.100      2.096  22.946  < 2e-16 ***
## genderm           -9.233      2.964  -3.115  0.00215 ** 
## grouphi           13.067      2.964   4.408 1.82e-05 ***
## grouplo          -14.933      2.964  -5.037 1.17e-06 ***
## genderm:grouphi   -4.067      4.192  -0.970  0.33339    
## genderm:grouplo    9.967      4.192   2.377  0.01852 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.48 on 174 degrees of freedom
## Multiple R-squared:  0.428,  Adjusted R-squared:  0.4115 
## F-statistic: 26.04 on 5 and 174 DF,  p-value: < 2.2e-16

# unbalanced
summary(m.u)

## 
## Call:
## lm(formula = neutr ~ gender * group, data = dd.u)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.733  -7.083   1.133   6.308  27.833 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       49.733      2.823  17.615  < 2e-16 ***
## genderm          -10.867      3.458  -3.143  0.00213 ** 
## grouphi            5.156      4.610   1.118  0.26582    
## grouplo          -16.567      3.458  -4.791 5.04e-06 ***
## genderm:grouphi    3.844      5.406   0.711  0.47846    
## genderm:grouplo    9.200      5.989   1.536  0.12728    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.93 on 114 degrees of freedom
## Multiple R-squared:  0.3302, Adjusted R-squared:  0.3008 
## F-statistic: 11.24 on 5 and 114 DF,  p-value: 8.007e-09

# ... no surprise, results differ due to the loss of data in dd.u (m.u), although it is part of dd (m.b)

# with SS Type I, there should be differences depending on sequence of entrance
summary(aov(lm(neutr ~ gender + group  + gender:group, data=dd.u)))

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## gender         1     24      24   0.201    0.655    
## group          2   6411    3205  26.809 2.87e-10 ***
## gender:group   2    285     142   1.192    0.307    
## Residuals    114  13630     120                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(lm(neutr ~ group  + gender + gender:group, data=dd.u)))

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## group          2   5184  2591.9  21.678 1.05e-08 ***
## gender         1   1251  1251.1  10.464  0.00159 ** 
## group:gender   2    285   142.5   1.192  0.30742    
## Residuals    114  13630   119.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# the highest order interaction is always the same
summary(aov(lm(neutr ~                group + gender + gender:group, data=dd.u)))

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## group          2   5184  2591.9  21.678 1.05e-08 ***
## gender         1   1251  1251.1  10.464  0.00159 ** 
## group:gender   2    285   142.5   1.192  0.30742    
## Residuals    114  13630   119.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(lm(neutr ~ gender:group + group + gender,                data=dd.u)))

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## group          2   5184  2591.9  21.678 1.05e-08 ***
## gender         1   1251  1251.1  10.464  0.00159 ** 
## gender:group   2    285   142.5   1.192  0.30742    
## Residuals    114  13630   119.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# the three types of SS result in differences
summary(aov(lm(neutr ~ gender + group + gender:group, data=dd.u)))

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## gender         1     24      24   0.201    0.655    
## group          2   6411    3205  26.809 2.87e-10 ***
## gender:group   2    285     142   1.192    0.307    
## Residuals    114  13630     120                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(lm(neutr ~ gender + group + gender:group, data=dd.u), type=2)

## Anova Table (Type II tests)
## 
## Response: neutr
##               Sum Sq  Df F value    Pr(>F)    
## gender        1251.1   1 10.4639  0.001593 ** 
## group         6410.9   2 26.8094 2.865e-10 ***
## gender:group   285.0   2  1.1918  0.307425    
## Residuals    13630.4 114                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(lm(neutr ~ gender + group + gender:group, data=dd.u), type=3)

## Anova Table (Type III tests)
## 
## Response: neutr
##              Sum Sq  Df  F value    Pr(>F)    
## (Intercept)   37101   1 310.3001 < 2.2e-16 ***
## gender         1181   1   9.8762  0.002134 ** 
## group          4713   2  19.7083 4.455e-08 ***
## gender:group    285   2   1.1918  0.307425    
## Residuals     13630 114                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# but coefficients keep the same
coefficients(lm(neutr ~ gender + group + gender:group, data=dd.u))

##     (Intercept)         genderm         grouphi         grouplo genderm:grouphi 
##       49.733333      -10.866667        5.155556      -16.566667        3.844444 
## genderm:grouplo 
##        9.200000

coefficients(lm(neutr ~ group + gender + gender:group, data=dd.u))

##     (Intercept)         grouphi         grouplo         genderm grouphi:genderm 
##       49.733333        5.155556      -16.566667      -10.866667        3.844444 
## grouplo:genderm 
##        9.200000

# the highes order interaction is always the same
coefficients(lm(neutr ~ gender:group + group + gender, data=dd.u))

##     (Intercept)         grouphi         grouplo         genderm genderm:grouphi 
##       49.733333        5.155556      -16.566667      -10.866667        3.844444 
## genderm:grouplo 
##        9.200000

require(Rmisc)

## Lade nötiges Paket: Rmisc

summarySE(data=dd.u, "neutr", c("group","gender"))

##   group gender  N    neutr        sd       se       ci
## 1    av      f 15 49.73333 12.020617 3.103710 6.656796
## 2    av      m 30 38.86667 10.294737 1.879553 3.844118
## 3    hi      f  9 54.88889  9.130231 3.043410 7.018117
## 4    hi      m 30 47.86667 10.173438 1.857407 3.798824
## 5    lo      f 30 33.16667 12.605646 2.301465 4.707025
## 6    lo      m  6 31.50000  7.176350 2.929733 7.531118

detach("package:Rmisc", unload=TRUE)

Grafik dazu

dd.u <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide_unbal.txt")
# we want SS type I parameters
m.m <- aov(lm(neutr ~ gender + group + gender:group, data=dd.u))
require(ggplot2)

ggplot(dd.u, aes(x=group, y=neutr, fill=gender)) +
  # define colors
  scale_fill_manual(values=c("#CCCCCC","#FFFFFF")) +
  # create white background etc
  #theme_bw() +
  stat_summary(fun.y=mean, geom="bar", position=position_dodge(width=0.7), colour='black') +
  # add error information
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=.2) +
  # put big points on top of bars
  stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
        geom_hline(aes(yintercept=coefficients(m.m)[1])) + 
  # add verbose axis labels
  labs(x = "Anxiety-Group", y = "STAI-Trait")

## Warning: `fun.y` is deprecated. Use `fun` instead.

mit `ezAnova()` der `library(ez)`

Wenn wir wissen, was genau geprüft werden soll und wenn es nicht um das Verständnis der Vorgänge ‘hinter den Kulissen’ geht, ist ezAnova() ein komfortabler Weg, varianzanalytische Designs zu rechnen. Wir können den Quadratsummen-Typ explizit festlegen.

dd.u <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide_unbal.txt")
require(ez)

## Lade nötiges Paket: ez

# fit model using ezANOVA
m.f <- ezANOVA(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), detailed=TRUE, type=2)

## Warning: Converting "subj" to factor for ANOVA.

## Warning: Converting "gender" to factor for ANOVA.

## Warning: Converting "group" to factor for ANOVA.

## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().

## Coefficient covariances computed by hccm()

m.f

## $ANOVA
##         Effect DFn DFd       SSn      SSd         F            p p<.05
## 1       gender   1 114 1251.1225 13630.42 10.463944 1.593089e-03     *
## 2        group   2 114 6410.9426 13630.42 26.809421 2.865332e-10     *
## 3 gender:group   2 114  284.9988 13630.42  1.191814 3.074246e-01      
##          ges
## 1 0.08407209
## 2 0.31988553
## 3 0.02048079
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn      SSd         F         p p<.05
## 1   5 114 154.5028 5472.989 0.6436453 0.6668645

#ezANOVA(data=dd.r, dv=.(anx), wid = .(subj), within = .(sit), detailed=TRUE, type=1)

# compare it to Anova
require(car)
Anova(lm(neutr ~ gender + group + gender:group, data=dd.u), type=2)

## Anova Table (Type II tests)
## 
## Response: neutr
##               Sum Sq  Df F value    Pr(>F)    
## gender        1251.1   1 10.4639  0.001593 ** 
## group         6410.9   2 26.8094 2.865e-10 ***
## gender:group   285.0   2  1.1918  0.307425    
## Residuals    13630.4 114                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ez offers ezPlot()
ezPlot(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), x=group)

## Warning: Converting "subj" to factor for ANOVA.

## Warning: Converting "gender" to factor for ANOVA.

## Warning: Converting "group" to factor for ANOVA.

## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().

## Coefficient covariances computed by hccm()

## Warning in ezStats(data = data, dv = dv, wid = wid, within = within, within_full
## = within_full, : Unbalanced groups. Mean N will be used in computation of FLSD

#ezPlot(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), x=group, split=.(group))
ezPlot(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), x=group, split=.(gender))

## Warning: Converting "subj" to factor for ANOVA.

## Warning: Converting "gender" to factor for ANOVA.

## Warning: Converting "group" to factor for ANOVA.

## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().

## Coefficient covariances computed by hccm()

## Warning in ezStats(data = data, dv = dv, wid = wid, within = within, within_full
## = within_full, : Unbalanced groups. Mean N will be used in computation of FLSD

# descriptives - the ez way
ezStats(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group))

## Warning: Converting "subj" to factor for ANOVA.

## Warning: Converting "gender" to factor for ANOVA.

## Warning: Converting "group" to factor for ANOVA.

## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().

## Coefficient covariances computed by hccm()

## Warning in ezStats(data = dd.u, dv = .(neutr), wid = .(subj), between = .
## (gender, : Unbalanced groups. Mean N will be used in computation of FLSD

##   gender group  N     Mean        SD     FLSD
## 1      f    av 15 49.73333 12.020617 6.849912
## 2      f    hi  9 54.88889  9.130231 6.849912
## 3      f    lo 30 33.16667 12.605646 6.849912
## 4      m    av 30 38.86667 10.294737 6.849912
## 5      m    hi 30 47.86667 10.173438 6.849912
## 6      m    lo  6 31.50000  7.176350 6.849912

# some visualizations in ggplot
require(ggplot2)
ggplot(dd.u, aes(x=group, y=neutr)) +
  # define colors
  scale_fill_manual(values=c("#FFFFFF", "#CCCCCC", '#AAAAAA')) +
  # create white background etc
  #theme_bw() +
  geom_point() +
  # add verbose axis labels
  labs(x = "Anxiety-Group", y = "STAI-Trait")

ggplot(dd.u, aes(x=group, y=neutr, fill=gender)) +
  # define colors
  scale_fill_manual(values=c("#FFFFFF", "#CCCCCC", '#AAAAAA')) +
  # create white background etc
  #theme_bw() +
  stat_summary(fun.y=mean, geom="bar", position=position_dodge(width=0.7), colour='black') +
  # add error information
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) +
  # put big points on top of bars
  stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
  # add verbose axis labels
  labs(x = "Anxiety-Group", y = "STAI-Trait")

## Warning: `fun.y` is deprecated. Use `fun` instead.

Beispiele Übungen / Exercises

PACS Data

Führen Sie einfache VAs durch mit den PACS-Daten, beispielsweise, ob sich die verschiedenen Gewichts-Untergruppen hinsichtlich PACS-Werte unterscheiden.

Zahn, Geschlecht und Bezahlung

Vollziehen Sie das Beispiel von Zahn (2010) nach

InteractionsAndTypesOfSS.pdf mit dem zugehörigen Datensatz

zahn

Abwägen von Vor- und Nachteilen

Überlegen Sie sich Situationen, in denen es bei unbalanced Data von Vorteil sein könnte, mit QS Type I zu rechnen d. h. in denen diese Art der Auswertung näher an einer psychologischen Fragestellung wäre, als die Anwendung von QS III

Links und Referenzen

Konzept und ein Teil der Daten entstammen dem Buch Everitt (2010)

Field (2012, p. 474 - 477) in Kapitel 11: Analysis of Covariance, ANCOVA