(Aus-)Balancierte varianzanalytische Designs sind Designs mit gleicher Zellbesetzung. Die Effekte sind dann unabhängig voneinander. Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren. Die Aufnahmereihenfolge der Prädiktoren (Faktoren) in das Modell spielt keine Rolle, auch nicht bei Quadratsummenbildung vom Typ I.
Unbalancierte varianzanalytische Designs haben ungleiche Zellbesetzungen. Die Effekte sind abhängig voneinander. Die Aufnahmereihenfolge der Prädiktoren (Faktoren) spielt eine wichtige Rolle. Ein hinzugenommener Prädiktor nimmt die maximal ihm zur Verfügung stehende Varianz, auch die mit anderen potenziellen Prädiktoren gemeinsame. Je nach Vorgehen (Aufnahmereihenfolge, verwendeter Quadratsummentyp) werden unterschiedliche Hypothesen geprüft. SumSq der Residuen im 3-Prädiktor-Modell ist SumSq error-Quadratsumme im Anova-Modell
Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren Faktoren im balancieren Modell sind unabhängig (orthogonal) Aufnahmereihenfolge der Faktoren (incl. WW) spielt keine Rolle
Die relevanten Ausführungen dazu finden sich in Field (2012, p. 474 - 477) in Kapitel 11: Analysis of Covariance, ANCOVA
Etwas über den Hintergrund unter lm_cat_unbal_ss_explained.html
[https://www.uni-kiel.de/psychologie/dwoll/r/ssTypes.php] bzw. als Rmd-File
Konzeptioneller Hintergrund und ein empfehlenswertes Tutorial mit Beispiel von Ista Zahn unter InteractionsAndTypesOfSS.pdf external link lokale Kopie Paper Ista Zahn mit dem zugehörigen Datensatz unter Zahn Data
2x2-Design, je 2 Faktorstufen, Everitt (2010)
Unterschiede zwischen balanced und unbalanced Designs. Im Beispieldatensatz sind die Faktoren A und B bereits effektkodiert bzw. kontrast-kodiert und können so direkt als Dummy-Variablen benutzt werden, um einen regressionsanalytischen Ansatz zu rechnen.
Differenz der Regressions SumSq zwischen 1-Prädiktor-Modell und 2-Prädiktor-Modell ist SumSq für Faktor B in Anova-Modell Differenz der Regressions SumSq zwischen 2-Prädiktor-Modell und 3-Prädiktor-Modell (Interaktion) ist SumSq für Interaktionsterm in Anova-Modell
Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren Faktoren im balancieren Modell sind unabhängig (orthogonal) Aufnahmereihenfolge der Faktoren (incl. WW) spielt keine Rolle
die konstante Gesamt-Quadratsumme wird auf immer mehr Effekte aufgeteilt Residual-Quadratsumme nimmt immer mehr ab.
# read datafiles
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/be/e-2x2-bal.txt")
# generate factors
dd$f.A <- factor(dd$A, labels=c('a1', 'a2'))
dd$f.B <- factor(dd$B, labels=c('b1', 'b2'))
#contrasts(dd$f.A) <- contr.sum(2)
#contrasts(dd$f.B) <- contr.sum(2)
#contrasts(dd$f.A) <- contr.treatment(2)
#contrasts(dd$f.B) <- contr.treatment(2)
# frequencies
table(dd$f.A, dd$f.B)
##
## b1 b2
## a1 4 4
## a2 4 4
require(Rmisc)
## Lade nötiges Paket: Rmisc
## Lade nötiges Paket: lattice
## Lade nötiges Paket: plyr
Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))
## f.A f.B N y sd se ci
## 1 a1 b1 4 37.50 2.0816660 1.0408330 3.312395
## 2 a1 b2 4 21.75 0.9574271 0.4787136 1.523480
## 3 a2 b1 4 29.75 2.6299556 1.3149778 4.184846
## 4 a2 b2 4 26.00 2.5819889 1.2909944 4.108521
# explanatory variables are uncorrelated
cor(dd$A, dd$B)
## [1] 0
# fit regression with explanatory variable A
m.A <- lm(y ~ f.A, data=dd)
cbind(dd, model.matrix(m.A))
## A B y f.A f.B (Intercept) f.Aa2
## 1 1 1 23 a2 b2 1 1
## 2 1 1 25 a2 b2 1 1
## 3 1 1 27 a2 b2 1 1
## 4 1 1 29 a2 b2 1 1
## 5 1 -1 26 a2 b1 1 1
## 6 1 -1 32 a2 b1 1 1
## 7 1 -1 30 a2 b1 1 1
## 8 1 -1 31 a2 b1 1 1
## 9 -1 1 22 a1 b2 1 0
## 10 -1 1 23 a1 b2 1 0
## 11 -1 1 21 a1 b2 1 0
## 12 -1 1 21 a1 b2 1 0
## 13 -1 -1 37 a1 b1 1 0
## 14 -1 -1 38 a1 b1 1 0
## 15 -1 -1 40 a1 b1 1 0
## 16 -1 -1 35 a1 b1 1 0
summary(m.A)
##
## Call:
## lm(formula = y ~ f.A, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.625 -5.312 0.125 4.438 10.375
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.625 2.277 13.010 3.29e-09 ***
## f.Aa2 -1.750 3.220 -0.543 0.595
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.441 on 14 degrees of freedom
## Multiple R-squared: 0.02066, Adjusted R-squared: -0.0493
## F-statistic: 0.2953 on 1 and 14 DF, p-value: 0.5954
# mean(dd$y[dd$f.A == "a1"])
# Rmisc::summarySE(data = dd, "y", groupvars=c("f.A"))
anova(m.A)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 12.25 12.250 0.2953 0.5954
## Residuals 14 580.75 41.482
anova(m.A)$`Sum Sq`
## [1] 12.25 580.75
sum(anova(m.A)$`Sum Sq`)
## [1] 593
# add B
m.A.B <- lm(y ~ f.A + f.B, data=dd)
cbind(dd, model.matrix(m.A.B))
## A B y f.A f.B (Intercept) f.Aa2 f.Bb2
## 1 1 1 23 a2 b2 1 1 1
## 2 1 1 25 a2 b2 1 1 1
## 3 1 1 27 a2 b2 1 1 1
## 4 1 1 29 a2 b2 1 1 1
## 5 1 -1 26 a2 b1 1 1 0
## 6 1 -1 32 a2 b1 1 1 0
## 7 1 -1 30 a2 b1 1 1 0
## 8 1 -1 31 a2 b1 1 1 0
## 9 -1 1 22 a1 b2 1 0 1
## 10 -1 1 23 a1 b2 1 0 1
## 11 -1 1 21 a1 b2 1 0 1
## 12 -1 1 21 a1 b2 1 0 1
## 13 -1 -1 37 a1 b1 1 0 0
## 14 -1 -1 38 a1 b1 1 0 0
## 15 -1 -1 40 a1 b1 1 0 0
## 16 -1 -1 35 a1 b1 1 0 0
m.B.A <- lm(y ~ f.B + f.A, data=dd)
cbind(dd, model.matrix(m.B.A))
## A B y f.A f.B (Intercept) f.Bb2 f.Aa2
## 1 1 1 23 a2 b2 1 1 1
## 2 1 1 25 a2 b2 1 1 1
## 3 1 1 27 a2 b2 1 1 1
## 4 1 1 29 a2 b2 1 1 1
## 5 1 -1 26 a2 b1 1 0 1
## 6 1 -1 32 a2 b1 1 0 1
## 7 1 -1 30 a2 b1 1 0 1
## 8 1 -1 31 a2 b1 1 0 1
## 9 -1 1 22 a1 b2 1 1 0
## 10 -1 1 23 a1 b2 1 1 0
## 11 -1 1 21 a1 b2 1 1 0
## 12 -1 1 21 a1 b2 1 1 0
## 13 -1 -1 37 a1 b1 1 0 0
## 14 -1 -1 38 a1 b1 1 0 0
## 15 -1 -1 40 a1 b1 1 0 0
## 16 -1 -1 35 a1 b1 1 0 0
summary(m.A.B)
##
## Call:
## lm(formula = y ~ f.A + f.B, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.750 -2.750 -0.375 2.750 6.000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.500 1.701 20.288 3.17e-11 ***
## f.Aa2 -1.750 1.964 -0.891 0.389011
## f.Bb2 -9.750 1.964 -4.965 0.000258 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared: 0.6619, Adjusted R-squared: 0.6099
## F-statistic: 12.72 on 2 and 13 DF, p-value: 0.0008687
summary(m.B.A)
##
## Call:
## lm(formula = y ~ f.B + f.A, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.750 -2.750 -0.375 2.750 6.000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.500 1.701 20.288 3.17e-11 ***
## f.Bb2 -9.750 1.964 -4.965 0.000258 ***
## f.Aa2 -1.750 1.964 -0.891 0.389011
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared: 0.6619, Adjusted R-squared: 0.6099
## F-statistic: 12.72 on 2 and 13 DF, p-value: 0.0008687
# in SS type 1 SS are equal, independent of sequence to enter
anova(m.A.B)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 12.25 12.25 0.7943 0.3890110
## f.B 1 380.25 380.25 24.6546 0.0002583 ***
## Residuals 13 200.50 15.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.B.A)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.B 1 380.25 380.25 24.6546 0.0002583 ***
## f.A 1 12.25 12.25 0.7943 0.3890110
## Residuals 13 200.50 15.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# mean(dd$y)
# mean(dd$y[dd$f.A == "a1"])
# mean(dd$y[dd$f.A == "a1" & dd$f.B == "b1"])
# Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))
# Rmisc::summarySE(data = dd, "y", groupvars=c("A", "B"))
# coefficients(m.A.B)
# predict(m.A.B)
# dd$y
# model.matrix(m.A.B)
# mean(dd$y[dd$f.B == "b1"])
# mean(c(dd$y[dd$f.A == "a1"], dd$y[dd$f.B == "b1"]))
# difference in SumSq between m.A and m.A.B is SumSq for factor B
anova(m.A, m.A.B)
## Analysis of Variance Table
##
## Model 1: y ~ f.A
## Model 2: y ~ f.A + f.B
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 14 580.75
## 2 13 200.50 1 380.25 24.655 0.0002583 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.A)$`Sum Sq`
## [1] 12.25 580.75
anova(m.A.B)$`Sum Sq`
## [1] 12.25 380.25 200.50
anova(m.A, m.B.A)
## Analysis of Variance Table
##
## Model 1: y ~ f.A
## Model 2: y ~ f.B + f.A
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 14 580.75
## 2 13 200.50 1 380.25 24.655 0.0002583 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.B.A)$`Sum Sq`
## [1] 380.25 12.25 200.50
# add interaction
m.A.B.AB <- lm(y ~ f.A + f.B + f.A:f.B, data=dd)
cbind(dd, model.matrix(m.A.B.AB))
## A B y f.A f.B (Intercept) f.Aa2 f.Bb2 f.Aa2:f.Bb2
## 1 1 1 23 a2 b2 1 1 1 1
## 2 1 1 25 a2 b2 1 1 1 1
## 3 1 1 27 a2 b2 1 1 1 1
## 4 1 1 29 a2 b2 1 1 1 1
## 5 1 -1 26 a2 b1 1 1 0 0
## 6 1 -1 32 a2 b1 1 1 0 0
## 7 1 -1 30 a2 b1 1 1 0 0
## 8 1 -1 31 a2 b1 1 1 0 0
## 9 -1 1 22 a1 b2 1 0 1 0
## 10 -1 1 23 a1 b2 1 0 1 0
## 11 -1 1 21 a1 b2 1 0 1 0
## 12 -1 1 21 a1 b2 1 0 1 0
## 13 -1 -1 37 a1 b1 1 0 0 0
## 14 -1 -1 38 a1 b1 1 0 0 0
## 15 -1 -1 40 a1 b1 1 0 0 0
## 16 -1 -1 35 a1 b1 1 0 0 0
summary(m.A.B.AB)
##
## Call:
## lm(formula = y ~ f.A + f.B + f.A:f.B, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7500 -0.8125 0.2500 1.2500 3.0000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.500 1.085 34.564 2.19e-13 ***
## f.Aa2 -7.750 1.534 -5.051 0.000284 ***
## f.Bb2 -15.750 1.534 -10.265 2.70e-07 ***
## f.Aa2:f.Bb2 12.000 2.170 5.530 0.000130 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.17 on 12 degrees of freedom
## Multiple R-squared: 0.9047, Adjusted R-squared: 0.8809
## F-statistic: 37.98 on 3 and 12 DF, p-value: 2.102e-06
Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))
## f.A f.B N y sd se ci
## 1 a1 b1 4 37.50 2.0816660 1.0408330 3.312395
## 2 a1 b2 4 21.75 0.9574271 0.4787136 1.523480
## 3 a2 b1 4 29.75 2.6299556 1.3149778 4.184846
## 4 a2 b2 4 26.00 2.5819889 1.2909944 4.108521
anova(m.A.B.AB)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 12.25 12.25 2.6018 0.1327167
## f.B 1 380.25 380.25 80.7611 1.121e-06 ***
## f.A:f.B 1 144.00 144.00 30.5841 0.0001298 ***
## Residuals 12 56.50 4.71
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.A.B, m.A.B.AB)
## Analysis of Variance Table
##
## Model 1: y ~ f.A + f.B
## Model 2: y ~ f.A + f.B + f.A:f.B
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 13 200.5
## 2 12 56.5 1 144 30.584 0.0001298 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# difference SumSq of regression between 2 (A, B) and 3 (A, B, A:B) predictor model is SumSq for interaction
# SumSq der Residuen im 3-Prädiktor-Modell ist SumSq error-Quadratsumme im Anova-Modell
# SS resid in 3 predictor model is SS error in Anova (residuals)
# regression weights (coefficients) don't change when new explanatory variables are entered in model
# Regressionsgewichte verändern sich nicht mit Aufnahme neuer Prädiktoren
# factors and their effects are orthogonal
# sequence of entered explanatory variables doesn't matter
summary(m.A.B)
##
## Call:
## lm(formula = y ~ f.A + f.B, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.750 -2.750 -0.375 2.750 6.000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.500 1.701 20.288 3.17e-11 ***
## f.Aa2 -1.750 1.964 -0.891 0.389011
## f.Bb2 -9.750 1.964 -4.965 0.000258 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared: 0.6619, Adjusted R-squared: 0.6099
## F-statistic: 12.72 on 2 and 13 DF, p-value: 0.0008687
summary(m.B.A)
##
## Call:
## lm(formula = y ~ f.B + f.A, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.750 -2.750 -0.375 2.750 6.000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.500 1.701 20.288 3.17e-11 ***
## f.Bb2 -9.750 1.964 -4.965 0.000258 ***
## f.Aa2 -1.750 1.964 -0.891 0.389011
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.927 on 13 degrees of freedom
## Multiple R-squared: 0.6619, Adjusted R-squared: 0.6099
## F-statistic: 12.72 on 2 and 13 DF, p-value: 0.0008687
summary(m.A.B.AB)
##
## Call:
## lm(formula = y ~ f.A + f.B + f.A:f.B, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7500 -0.8125 0.2500 1.2500 3.0000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.500 1.085 34.564 2.19e-13 ***
## f.Aa2 -7.750 1.534 -5.051 0.000284 ***
## f.Bb2 -15.750 1.534 -10.265 2.70e-07 ***
## f.Aa2:f.Bb2 12.000 2.170 5.530 0.000130 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.17 on 12 degrees of freedom
## Multiple R-squared: 0.9047, Adjusted R-squared: 0.8809
## F-statistic: 37.98 on 3 and 12 DF, p-value: 2.102e-06
anova(m.A.B)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 12.25 12.25 0.7943 0.3890110
## f.B 1 380.25 380.25 24.6546 0.0002583 ***
## Residuals 13 200.50 15.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.B.A)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.B 1 380.25 380.25 24.6546 0.0002583 ***
## f.A 1 12.25 12.25 0.7943 0.3890110
## Residuals 13 200.50 15.42
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.A.B.AB)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 12.25 12.25 2.6018 0.1327167
## f.B 1 380.25 380.25 80.7611 1.121e-06 ***
## f.A:f.B 1 144.00 144.00 30.5841 0.0001298 ***
## Residuals 12 56.50 4.71
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# a visualization
require(ggplot2)
## Lade nötiges Paket: ggplot2
ggplot(dd, aes(f.A, y, color=f.B)) +
# scale_fill_manual(values=c_colors) +
stat_summary(fun.y=mean, geom="bar", position=position_dodge()) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) +
# put points on top of bars
stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
geom_hline(aes(yintercept=coefficients(m.A.B.AB)[1])) +
# add verbose axis labels
labs(x = "group B", y = "y")
## Warning: `fun.y` is deprecated. Use `fun` instead.
# the constant total SS is divided in more and more effects
# residual SS diminishes with every additional effect entered in the model
derselbe Datensatz mit ungleicher Teilstichproben-Größe (teils ein paar Beobachtungen mehr) Differenz der Regressions SumSq zwischen 1-Prädiktor-Modell und 2-Prädiktor-Modell ist SumSq für Faktor B in Anova-Modell
Wechselwirkung dazu Reihenfolge der Aufnahme ins Modell verändert die Parameter der Prädiktoren im Modell sowohl die Regressionsparameter als auch die Quadratsummen und damit schliesslich auch die statistische Beurteilung.
Die QS der Interaktion kommt hinzu. QS der Haupteffekte sind verschieden und abhängig von der Aufnahmereihenfolge. QS der Interaktion bleibt gleich, unabhängig von der Aufnahmereihenfolge (gilt für die höchste WW).
Berechnung mit aov()
# unbalanced design
# same experimental design, some observations added, unequal cell size
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/be/e-2x2-unbal.txt")
# generate factors
dd$f.A <- factor(dd$A, labels=c('a1', 'a2'))
dd$f.B <- factor(dd$B, labels=c('b1', 'b2'))
# n in cells
table(dd$f.A, dd$f.B)
##
## b1 b2
## a1 13 7
## a2 4 8
# descriptives
require(Rmisc)
Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))
## f.A f.B N y sd se ci
## 1 a1 b1 13 37.69231 2.657838 0.7371516 1.606115
## 2 a1 b2 7 20.85714 2.193063 0.8288998 2.028245
## 3 a2 b1 4 29.75000 2.629956 1.3149778 4.184846
## 4 a2 b2 8 26.12500 2.587746 0.9149063 2.163410
# explanatory variables are correlated
cor(dd$A, dd$B)
## [1] 0.3072118
# fit regression including explanatory variable A
mu.A <- lm(y ~ f.A, data=dd)
unique(model.matrix(mu.A))
## (Intercept) f.Aa2
## 1 1 1
## 13 1 0
summary(mu.A)
##
## Call:
## lm(formula = y ~ f.A, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.800 -4.333 2.167 5.450 9.200
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.800 1.584 20.073 <2e-16 ***
## f.Aa2 -4.467 2.587 -1.727 0.0945 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.085 on 30 degrees of freedom
## Multiple R-squared: 0.09039, Adjusted R-squared: 0.06007
## F-statistic: 2.981 on 1 and 30 DF, p-value: 0.09453
anova(mu.A)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 149.63 149.633 2.981 0.09453 .
## Residuals 30 1505.87 50.196
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# add B to a slightly more complex model
# as we can see, input sequence of explanatory variables matters in unbalanced designs
mu.A.B <- lm(y ~ f.A + f.B, data=dd)
unique(model.matrix(mu.A.B))
## (Intercept) f.Aa2 f.Bb2
## 1 1 1 1
## 9 1 1 0
## 13 1 0 1
## 20 1 0 0
mu.B.A <- lm(y ~ f.B + f.A, data=dd)
unique(model.matrix(mu.B.A))
## (Intercept) f.Bb2 f.Aa2
## 1 1 1 1
## 9 1 0 1
## 13 1 1 0
## 20 1 0 0
# coefficients keep the same
summary(mu.A.B)
##
## Call:
## lm(formula = y ~ f.A + f.B, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3025 -3.0300 -0.1663 3.1749 6.6513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.9838 1.0470 34.369 < 2e-16 ***
## f.Aa2 -0.6813 1.5523 -0.439 0.664
## f.Bb2 -11.9538 1.5060 -7.938 9.39e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.046 on 29 degrees of freedom
## Multiple R-squared: 0.7133, Adjusted R-squared: 0.6935
## F-statistic: 36.07 on 2 and 29 DF, p-value: 1.358e-08
summary(mu.B.A)
##
## Call:
## lm(formula = y ~ f.B + f.A, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3025 -3.0300 -0.1663 3.1749 6.6513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.9838 1.0470 34.369 < 2e-16 ***
## f.Bb2 -11.9538 1.5060 -7.938 9.39e-09 ***
## f.Aa2 -0.6813 1.5523 -0.439 0.664
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.046 on 29 degrees of freedom
## Multiple R-squared: 0.7133, Adjusted R-squared: 0.6935
## F-statistic: 36.07 on 2 and 29 DF, p-value: 1.358e-08
# SS differ due to the sequence to enter
anova(mu.A.B)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 149.63 149.63 9.1422 0.005184 **
## f.B 1 1031.22 1031.22 63.0047 9.387e-09 ***
## Residuals 29 474.65 16.37
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mu.B.A)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.B 1 1177.70 1177.70 71.9543 2.399e-09 ***
## f.A 1 3.15 3.15 0.1926 0.664
## Residuals 29 474.65 16.37
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# total SS stays the same
sum(anova(m.A.B)$`Sum Sq`)
## [1] 593
sum(anova(m.B.A)$`Sum Sq`)
## [1] 593
# difference SS is SS of factor B
# SS of highest interaction stays the same, also in unbalaced designs
mu.A.B.AB <- aov(y ~ f.A + f.B + f.A:f.B, data=dd)
mu.B.A.AB <- aov(y ~ f.B + f.A + f.A:f.B, data=dd)
# sequence to enter alters SS
# A and B add up to the same SS
# interaction stays unchanged
anova(mu.A.B.AB)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.A 1 149.63 149.63 23.116 4.690e-05 ***
## f.B 1 1031.22 1031.22 159.304 4.484e-13 ***
## f.A:f.B 1 293.40 293.40 45.325 2.613e-07 ***
## Residuals 28 181.25 6.47
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mu.B.A.AB)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## f.B 1 1177.70 1177.70 181.932 9.005e-14 ***
## f.A 1 3.15 3.15 0.487 0.491
## f.B:f.A 1 293.40 293.40 45.325 2.613e-07 ***
## Residuals 28 181.25 6.47
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#c_colors <- c("#66ff66", "#ff6666", "#6666ff")
# sequence to enter alters coefficients and SS
# and therefore the statistical tests
ggplot(dd, aes(f.A, y, color=f.B)) +
# scale_fill_manual(values=c_colors) +
stat_summary(fun.y=mean, geom="bar", position=position_dodge()) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) +
# put points on top of bars
stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
geom_hline(aes(yintercept=coefficients(mu.A.B.AB)[1])) +
# add verbose axis labels
labs(x = "group B", y = "y")
## Warning: `fun.y` is deprecated. Use `fun` instead.
# let us control the visualized values
coefficients(mu.A.B.AB)
## (Intercept) f.Aa2 f.Bb2 f.Aa2:f.Bb2
## 37.692308 -7.942308 -16.835165 13.210165
Rmisc::summarySE(data = dd, "y", groupvars=c("f.A", "f.B"))
## f.A f.B N y sd se ci
## 1 a1 b1 13 37.69231 2.657838 0.7371516 1.606115
## 2 a1 b2 7 20.85714 2.193063 0.8288998 2.028245
## 3 a2 b1 4 29.75000 2.629956 1.3149778 4.184846
## 4 a2 b2 8 26.12500 2.587746 0.9149063 2.163410
aov(mu.A.B.AB)
## Call:
## aov(formula = mu.A.B.AB)
##
## Terms:
## f.A f.B f.A:f.B Residuals
## Sum of Squares 149.6333 1031.2154 293.3999 181.2514
## Deg. of Freedom 1 1 1 28
##
## Residual standard error: 2.544261
## Estimated effects may be unbalanced
sum(summary(aov(mu.A.B.AB))[[1]]$`Sum Sq`)
## [1] 1655.5
aov(mu.B.A.AB)
## Call:
## aov(formula = mu.B.A.AB)
##
## Terms:
## f.B f.A f.B:f.A Residuals
## Sum of Squares 1177.6961 3.1527 293.3999 181.2514
## Deg. of Freedom 1 1 1 28
##
## Residual standard error: 2.544261
## Estimated effects may be unbalanced
sum(summary(aov(mu.B.A.AB))[[1]]$`Sum Sq`)
## [1] 1655.5
In bestimmten Gruppen werden die Beobachtungen ‘reduziert’ - es wird eine Zufallsauswahl gezogen. Hierdurch werden die Zellhäufigkeiten in den Faktorstufen-Kombinationen ungleich.
# get data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide.txt")[c('subj','gender','group', 'neutr')]
length(dd$subj)
## [1] 180
# percentages
p.del <- c(.5, .7, .8)
s.del <- c()
# select some women from group 'av' to delete in dataframe
s.a <- dd$subj[dd$gender == 'f' & dd$group == 'av']
#s.d <- s.a[!is.na(sample(s.a, length(s.a) * p.del[1]))]
set.seed(42) # to get reproducable results
s.d <- sample(s.a, length(s.a) * p.del[1])
s.del <- c(s.del, s.d)
# select some trained women to delete in dataframe
s.a <- dd$subj[dd$gender == 'f' & dd$group == 'hi']
set.seed(43) # to get reproducable results
s.d <- sample(s.a, length(s.a) * p.del[2])
s.del <- c(s.del, s.d)
# select some anxious men to delete in dataframe
s.a <- dd$subj[dd$gender == 'm' & dd$group == 'lo']
set.seed(44) # to get reproducable results
s.d <- sample(s.a, length(s.a) * p.del[3])
s.del <- c(s.del, s.d)
# now take all selected observations out of dataframe
dd.u <- dd[-s.del,]
# write.table(dd.u, file="v_stait_exam_wide_unbal.txt", quote=F, sep="\t", row.names=F)
# this way the unbalanced dataframe was created
# we read a reduced dataframe constructed in the above described way
dd.u <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide_unbal.txt")
# control crosstabs
table(dd.u$gender, dd.u$group)
##
## av hi lo
## f 15 9 30
## m 30 30 6
# crosstabulation including test of indepencence
mytable <- xtabs(~gender + group, data=dd.u)
ftable(mytable) # print table
## group av hi lo
## gender
## f 15 9 30
## m 30 30 6
summary(mytable) # chi-square test of indepedence
## Call: xtabs(formula = ~gender + group, data = dd.u)
## Number of cases in table: 120
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 31.422, df = 2, p-value = 1.503e-07
# write dummy variables to dataframe
dd.u <- cbind(dd.u, model.matrix(neutr ~ gender + group + gender:group, data=dd.u))
# now we can check correlation of dummy variables
cor(dd.u[c('genderm', 'grouphi', 'grouplo', 'genderm:grouphi', 'genderm:grouplo')])
## genderm grouphi grouplo genderm:grouphi
## genderm 1.0000000 0.3057762 -0.5044296 0.5222330
## grouphi 0.3057762 1.0000000 -0.4542568 0.8320503
## grouplo -0.5044296 -0.4542568 1.0000000 -0.3779645
## genderm:grouphi 0.5222330 0.8320503 -0.3779645 1.0000000
## genderm:grouplo 0.2075143 -0.1591890 0.3504383 -0.1324532
## genderm:grouplo
## genderm 0.2075143
## grouphi -0.1591890
## grouplo 0.3504383
## genderm:grouphi -0.1324532
## genderm:grouplo 1.0000000
# descriptives
require(Rmisc)
summarySE(data=dd.u, "neutr", c("group","gender"))
## group gender N neutr sd se ci
## 1 av f 15 49.73333 12.020617 3.103710 6.656796
## 2 av m 30 38.86667 10.294737 1.879553 3.844118
## 3 hi f 9 54.88889 9.130231 3.043410 7.018117
## 4 hi m 30 47.86667 10.173438 1.857407 3.798824
## 5 lo f 30 33.16667 12.605646 2.301465 4.707025
## 6 lo m 6 31.50000 7.176350 2.929733 7.531118
detach("package:Rmisc", unload=TRUE)
# package(car) is required. It defines 'Anova()'. The capital 'A' is significant!
require(car)
## Lade nötiges Paket: car
## Lade nötiges Paket: carData
# the balanced and unbalanced designs compared
m.b <- (lm(neutr ~ gender*group, data=dd ))
m.u <- (lm(neutr ~ gender*group, data=dd.u))
# model test
car::Anova(m.b)
## Anova Table (Type II tests)
##
## Response: neutr
## Sum Sq Df F value Pr(>F)
## gender 2376.2 1 18.0261 3.54e-05 ***
## group 13220.7 2 50.1470 < 2.2e-16 ***
## gender:group 1564.0 2 5.9325 0.003218 **
## Residuals 22936.7 174
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::Anova(m.u)
## Anova Table (Type II tests)
##
## Response: neutr
## Sum Sq Df F value Pr(>F)
## gender 1251.1 1 10.4639 0.001593 **
## group 6410.9 2 26.8094 2.865e-10 ***
## gender:group 285.0 2 1.1918 0.307425
## Residuals 13630.4 114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# a look at sums of squares
car::Anova(m.b)$`Sum Sq`; sum(car::Anova(m.b)$`Sum Sq`)
## [1] 2376.200 13220.744 1564.033 22936.667
## [1] 40097.64
car::Anova(m.u)$`Sum Sq`; sum(car::Anova(m.u)$`Sum Sq`)
## [1] 1251.1225 6410.9426 284.9988 13630.4222
## [1] 21577.49
# parameters in detail
# balanced
summary(m.b)
##
## Call:
## lm(formula = neutr ~ gender * group, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.1667 -6.9000 0.4833 6.8500 27.9000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.100 2.096 22.946 < 2e-16 ***
## genderm -9.233 2.964 -3.115 0.00215 **
## grouphi 13.067 2.964 4.408 1.82e-05 ***
## grouplo -14.933 2.964 -5.037 1.17e-06 ***
## genderm:grouphi -4.067 4.192 -0.970 0.33339
## genderm:grouplo 9.967 4.192 2.377 0.01852 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.48 on 174 degrees of freedom
## Multiple R-squared: 0.428, Adjusted R-squared: 0.4115
## F-statistic: 26.04 on 5 and 174 DF, p-value: < 2.2e-16
# unbalanced
summary(m.u)
##
## Call:
## lm(formula = neutr ~ gender * group, data = dd.u)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.733 -7.083 1.133 6.308 27.833
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.733 2.823 17.615 < 2e-16 ***
## genderm -10.867 3.458 -3.143 0.00213 **
## grouphi 5.156 4.610 1.118 0.26582
## grouplo -16.567 3.458 -4.791 5.04e-06 ***
## genderm:grouphi 3.844 5.406 0.711 0.47846
## genderm:grouplo 9.200 5.989 1.536 0.12728
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.93 on 114 degrees of freedom
## Multiple R-squared: 0.3302, Adjusted R-squared: 0.3008
## F-statistic: 11.24 on 5 and 114 DF, p-value: 8.007e-09
# ... no surprise, results differ due to the loss of data in dd.u (m.u), although it is part of dd (m.b)
# with SS Type I, there should be differences depending on sequence of entrance
summary(aov(lm(neutr ~ gender + group + gender:group, data=dd.u)))
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 24 24 0.201 0.655
## group 2 6411 3205 26.809 2.87e-10 ***
## gender:group 2 285 142 1.192 0.307
## Residuals 114 13630 120
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(lm(neutr ~ group + gender + gender:group, data=dd.u)))
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 5184 2591.9 21.678 1.05e-08 ***
## gender 1 1251 1251.1 10.464 0.00159 **
## group:gender 2 285 142.5 1.192 0.30742
## Residuals 114 13630 119.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# the highest order interaction is always the same
summary(aov(lm(neutr ~ group + gender + gender:group, data=dd.u)))
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 5184 2591.9 21.678 1.05e-08 ***
## gender 1 1251 1251.1 10.464 0.00159 **
## group:gender 2 285 142.5 1.192 0.30742
## Residuals 114 13630 119.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(lm(neutr ~ gender:group + group + gender, data=dd.u)))
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 5184 2591.9 21.678 1.05e-08 ***
## gender 1 1251 1251.1 10.464 0.00159 **
## gender:group 2 285 142.5 1.192 0.30742
## Residuals 114 13630 119.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# the three types of SS result in differences
summary(aov(lm(neutr ~ gender + group + gender:group, data=dd.u)))
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 24 24 0.201 0.655
## group 2 6411 3205 26.809 2.87e-10 ***
## gender:group 2 285 142 1.192 0.307
## Residuals 114 13630 120
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(neutr ~ gender + group + gender:group, data=dd.u), type=2)
## Anova Table (Type II tests)
##
## Response: neutr
## Sum Sq Df F value Pr(>F)
## gender 1251.1 1 10.4639 0.001593 **
## group 6410.9 2 26.8094 2.865e-10 ***
## gender:group 285.0 2 1.1918 0.307425
## Residuals 13630.4 114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(neutr ~ gender + group + gender:group, data=dd.u), type=3)
## Anova Table (Type III tests)
##
## Response: neutr
## Sum Sq Df F value Pr(>F)
## (Intercept) 37101 1 310.3001 < 2.2e-16 ***
## gender 1181 1 9.8762 0.002134 **
## group 4713 2 19.7083 4.455e-08 ***
## gender:group 285 2 1.1918 0.307425
## Residuals 13630 114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# but coefficients keep the same
coefficients(lm(neutr ~ gender + group + gender:group, data=dd.u))
## (Intercept) genderm grouphi grouplo genderm:grouphi
## 49.733333 -10.866667 5.155556 -16.566667 3.844444
## genderm:grouplo
## 9.200000
coefficients(lm(neutr ~ group + gender + gender:group, data=dd.u))
## (Intercept) grouphi grouplo genderm grouphi:genderm
## 49.733333 5.155556 -16.566667 -10.866667 3.844444
## grouplo:genderm
## 9.200000
# the highes order interaction is always the same
coefficients(lm(neutr ~ gender:group + group + gender, data=dd.u))
## (Intercept) grouphi grouplo genderm genderm:grouphi
## 49.733333 5.155556 -16.566667 -10.866667 3.844444
## genderm:grouplo
## 9.200000
require(Rmisc)
## Lade nötiges Paket: Rmisc
summarySE(data=dd.u, "neutr", c("group","gender"))
## group gender N neutr sd se ci
## 1 av f 15 49.73333 12.020617 3.103710 6.656796
## 2 av m 30 38.86667 10.294737 1.879553 3.844118
## 3 hi f 9 54.88889 9.130231 3.043410 7.018117
## 4 hi m 30 47.86667 10.173438 1.857407 3.798824
## 5 lo f 30 33.16667 12.605646 2.301465 4.707025
## 6 lo m 6 31.50000 7.176350 2.929733 7.531118
detach("package:Rmisc", unload=TRUE)
dd.u <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide_unbal.txt")
# we want SS type I parameters
m.m <- aov(lm(neutr ~ gender + group + gender:group, data=dd.u))
require(ggplot2)
ggplot(dd.u, aes(x=group, y=neutr, fill=gender)) +
# define colors
scale_fill_manual(values=c("#CCCCCC","#FFFFFF")) +
# create white background etc
#theme_bw() +
stat_summary(fun.y=mean, geom="bar", position=position_dodge(width=0.7), colour='black') +
# add error information
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=.2) +
# put big points on top of bars
stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
geom_hline(aes(yintercept=coefficients(m.m)[1])) +
# add verbose axis labels
labs(x = "Anxiety-Group", y = "STAI-Trait")
## Warning: `fun.y` is deprecated. Use `fun` instead.
ezAnova()
der library(ez)
Wenn wir wissen, was genau geprüft werden soll und wenn es nicht um das Verständnis der Vorgänge ‘hinter den Kulissen’ geht, ist ezAnova()
ein komfortabler Weg, varianzanalytische Designs zu rechnen. Wir können den Quadratsummen-Typ explizit festlegen.
dd.u <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/virt/v_stait_exam_wide_unbal.txt")
require(ez)
## Lade nötiges Paket: ez
# fit model using ezANOVA
m.f <- ezANOVA(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), detailed=TRUE, type=2)
## Warning: Converting "subj" to factor for ANOVA.
## Warning: Converting "gender" to factor for ANOVA.
## Warning: Converting "group" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
m.f
## $ANOVA
## Effect DFn DFd SSn SSd F p p<.05
## 1 gender 1 114 1251.1225 13630.42 10.463944 1.593089e-03 *
## 2 group 2 114 6410.9426 13630.42 26.809421 2.865332e-10 *
## 3 gender:group 2 114 284.9988 13630.42 1.191814 3.074246e-01
## ges
## 1 0.08407209
## 2 0.31988553
## 3 0.02048079
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 5 114 154.5028 5472.989 0.6436453 0.6668645
#ezANOVA(data=dd.r, dv=.(anx), wid = .(subj), within = .(sit), detailed=TRUE, type=1)
# compare it to Anova
require(car)
Anova(lm(neutr ~ gender + group + gender:group, data=dd.u), type=2)
## Anova Table (Type II tests)
##
## Response: neutr
## Sum Sq Df F value Pr(>F)
## gender 1251.1 1 10.4639 0.001593 **
## group 6410.9 2 26.8094 2.865e-10 ***
## gender:group 285.0 2 1.1918 0.307425
## Residuals 13630.4 114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ez offers ezPlot()
ezPlot(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), x=group)
## Warning: Converting "subj" to factor for ANOVA.
## Warning: Converting "gender" to factor for ANOVA.
## Warning: Converting "group" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
## Warning in ezStats(data = data, dv = dv, wid = wid, within = within, within_full
## = within_full, : Unbalanced groups. Mean N will be used in computation of FLSD
#ezPlot(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), x=group, split=.(group))
ezPlot(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group), x=group, split=.(gender))
## Warning: Converting "subj" to factor for ANOVA.
## Warning: Converting "gender" to factor for ANOVA.
## Warning: Converting "group" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
## Warning in ezStats(data = data, dv = dv, wid = wid, within = within, within_full
## = within_full, : Unbalanced groups. Mean N will be used in computation of FLSD
# descriptives - the ez way
ezStats(data=dd.u, dv=.(neutr), wid = .(subj), between=.(gender, group))
## Warning: Converting "subj" to factor for ANOVA.
## Warning: Converting "gender" to factor for ANOVA.
## Warning: Converting "group" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
## Warning in ezStats(data = dd.u, dv = .(neutr), wid = .(subj), between = .
## (gender, : Unbalanced groups. Mean N will be used in computation of FLSD
## gender group N Mean SD FLSD
## 1 f av 15 49.73333 12.020617 6.849912
## 2 f hi 9 54.88889 9.130231 6.849912
## 3 f lo 30 33.16667 12.605646 6.849912
## 4 m av 30 38.86667 10.294737 6.849912
## 5 m hi 30 47.86667 10.173438 6.849912
## 6 m lo 6 31.50000 7.176350 6.849912
# some visualizations in ggplot
require(ggplot2)
ggplot(dd.u, aes(x=group, y=neutr)) +
# define colors
scale_fill_manual(values=c("#FFFFFF", "#CCCCCC", '#AAAAAA')) +
# create white background etc
#theme_bw() +
geom_point() +
# add verbose axis labels
labs(x = "Anxiety-Group", y = "STAI-Trait")
ggplot(dd.u, aes(x=group, y=neutr, fill=gender)) +
# define colors
scale_fill_manual(values=c("#FFFFFF", "#CCCCCC", '#AAAAAA')) +
# create white background etc
#theme_bw() +
stat_summary(fun.y=mean, geom="bar", position=position_dodge(width=0.7), colour='black') +
# add error information
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position=position_dodge(width=0.7), width=0.2) +
# put big points on top of bars
stat_summary(fun.data = mean_cl_normal, geom = "point", shape=21, size=5, position=position_dodge(width=0.7)) +
# add verbose axis labels
labs(x = "Anxiety-Group", y = "STAI-Trait")
## Warning: `fun.y` is deprecated. Use `fun` instead.
Führen Sie einfache VAs durch mit den PACS-Daten, beispielsweise, ob sich die verschiedenen Gewichts-Untergruppen hinsichtlich PACS-Werte unterscheiden.
Vollziehen Sie das Beispiel von Zahn (2010) nach
InteractionsAndTypesOfSS.pdf mit dem zugehörigen Datensatz
Überlegen Sie sich Situationen, in denen es bei unbalanced Data von Vorteil sein könnte, mit QS Type I zu rechnen d. h. in denen diese Art der Auswertung näher an einer psychologischen Fragestellung wäre, als die Anwendung von QS III
Konzept und ein Teil der Daten entstammen dem Buch Everitt (2010)
Field (2012, p. 474 - 477) in Kapitel 11: Analysis of Covariance, ANCOVA
InteractionsAndTypesOfSS.pdf mit dem zugehörigen Datensatz
ein ganzes Buch zu Anova-Designs, auch unbalanced, mit R-Code
estimated reaction variable is the mean of each subgroup that results of her factor level combination
… and is the same for every subgroup member
unequal subgroup sizes result in different overall probabilities to be member of one group compared to another
as group membership defines predicted values, some preditions are more probable than others per se, without knowing anything of an observations group membership
various manners to calculate sums of squares try to deal with this problem SS type 3 estimate the unique part a factor accounts for but does not attribute all explainable variance to effects.
sequence of taking factors into account in our lm formula in combination with type of SS matters for the effect, a factor accounts for
type argument defines type of SS in car::Anova(..., type = ...)
and ezANOVA(..., type = ...)
Version: 28 Mai, 2022 16:27