Die Präsentation als rmd
Chapter 18 of Serious Stats introduces multilevel models by considering them as an extension of repeated measures ANOVA models that can cope with missing outcomes, time-varying covariates and can relax the sphericity assumption of conventional repeated measures ANOVA. They can also deal with other – less well known – problems such as having stimuli that are random factor (e.g., see this post on my Psychological Statistics blog). Last, but not least, multilevel generalised linear models allow you to have discrete and bounded outcomes (e.g., dichotomous, ordinal or count data) rather than be constrained by as assuming a continuous response with normal errors. Quelle
unconditional model: Modell, in dem nur die Ebenen definiert werden. Null-Modell mit Ebenen. Auf jeder Ebene eine Konstante als Vorhersagewert.
Ebenenspezifische Modelle, also für jede Ausprägung von Ebene eine einzelne Analyse. Generalisierbarkeit fraglich.
Darstellungen hier mit nlme::lme()
. Eine gängige
Alternative für Multi-Level-Modelle in R ist lme4::lmer()
oder besser lmerTest::lmer()
, ein Wrapper für
lme4::lmer()
der auch Signifikanztests mit ausgibt.
Bei Convergenzfehlern, die nicht auf falschen Modellen beruhen: post
Correlation in Ausgabe bei Random Effects Da wir für jede Beobachtung auf Ebenen über Ebene 1 Parameterschätzungen für jede einzelne Beobachtung erhalten, können wir diese Parameter miteinander korrelieren. Dies ist Teil der Ausgabe.
lme4::lmer()
The general trick is, as mentioned in another answer, is that the formula follows the form dependent ~ independent | grouping. The grouping is generally a random factor, you can include fixed factors without any grouping and you can have additional random factors without any fixed factor (an intercept-only model). A + between factors indicates no interaction, a * indicates interaction.
For random factors, you have three basic variants:
Note that variant 3 has the slope and the intercept calculated in the same grouping, i.e. at the same time. If we want the slope and the intercept calculated independently, i.e. without any assumed correlation between the two, we need a fourth variant:
Intercept and slope, separately, by random factor: (1 | random.factor) + (0 + fixed.factor | random.factor)
There’s also a nice summary in response to a related question that you should look at.
If you’re up to digging into the math a bit, Barr et al. (2013) summarize the lmer syntax quite nicely in their Table 1, adapted here to meet the constraints of tableless markdown. That paper dealt with psycholinguistic data, so the two random effects are Subject and Item.
Models and equivalent lme4 formula syntax:
Nr | Formula | Syntax |
---|---|---|
1. | \(Y_{si}=β_0+β_1X_i+e_{si}\) | n/a (Not a mixed-effects model) |
2. | \(Y_{si}=β_0+S_{0s}+β_1X_i+e_{si}\) | Y ∼ X+(1∣Subject) |
3. | \(Y_{si}=β_0+S_{0s}+(β_1+S_{1s})Xi+e_{si}\) | Y ∼ X+(1 + X∣Subject) |
4. | \(Y_{si}=β_0+S_{0s}+I_{0i}+(β_1+S_{1s})Xi+e_{si}\) | Y ∼ X+(1 + X∣Subject)+(1∣Item) |
5. | \(Y_{si}=β_0+S_{0s}+I_{0i}+β_1X_i+e_{si}\) | Y ∼ X+(1∣Subject)+(1∣Item) |
6. | As (4), but S0s, S1s independent | Y ∼ X+(1∣Subject)+(0 + X∣ Subject)+(1∣Item) |
7. | \(Y_{si}=β_0+I_{0i}+(β_1+S_{1s})X_i+e_{si}\) | Y ∼ X+(0 + X∣Subject)+(1∣Item) |
References:
Barr, Dale J, R. Levy, C. Scheepers und H. J. Tily (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68:255– 278.
Einfache Regression Körpergewicht und Körpergröße,
ML-Parameterschätzung mit lme aus library(nlme)
. Modelltest
eines Nullmodells gegen ein Modell mit Slope.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
require(nlme)
## Lade nötiges Paket: nlme
m.0 <- lme(fixed= weight ~ 1, random = ~1 | subj, data=ddf, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
## Data: ddf
## AIC BIC logLik
## 273.7854 277.9889 -133.8927
##
## Random effects:
## Formula: ~1 | subj
## (Intercept) Residual
## StdDev: 19.6556 7.370852
##
## Fixed effects: weight ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 67.83333 3.898153 30 17.40141 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.4488246 -0.2439264 -0.1310233 0.2035043 1.0398234
##
## Number of Observations: 30
## Number of Groups: 30
m.1 <- lme(fixed= weight ~ 1 + height, random = ~1 | subj, data=ddf, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
## Data: ddf
## AIC BIC logLik
## 264.2429 269.8477 -128.1214
##
## Random effects:
## Formula: ~1 | subj
## (Intercept) Residual
## StdDev: 16.21583 6.080935
##
## Fixed effects: weight ~ 1 + height
## Value Std.Error DF t-value p-value
## (Intercept) -131.92913 55.2076 28 -2.389691 0.0238
## height 1.13782 0.3139 28 3.624762 0.0011
## Correlation:
## (Intr)
## height -0.998
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -0.47221438 -0.28102107 -0.07517554 0.23402698 0.88130174
##
## Number of Observations: 30
## Number of Groups: 30
anova(m.0, m.1)
## Model df AIC BIC logLik Test L.Ratio p-value
## m.0 1 3 273.7853 277.9889 -133.8927
## m.1 2 4 264.2429 269.8476 -128.1214 1 vs 2 11.54249 7e-04
# compare it with least square linear model
lm.0 <- lm(ddf$weight ~ 1)
summary(lm.0)
##
## Call:
## lm(formula = ddf$weight ~ 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.833 -14.583 -7.833 12.167 62.167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 67.833 3.898 17.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.35 on 29 degrees of freedom
mean(ddf$weight)
## [1] 67.83333
lm.1 <- lm(ddf$weight ~ ddf$height)
summary(lm.1)
##
## Call:
## lm(formula = ddf$weight ~ ddf$height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.291 -13.861 -3.708 11.543 43.469
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -131.9291 55.2076 -2.390 0.02384 *
## ddf$height 1.1378 0.3139 3.625 0.00114 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.93 on 28 degrees of freedom
## Multiple R-squared: 0.3194, Adjusted R-squared: 0.2951
## F-statistic: 13.14 on 1 and 28 DF, p-value: 0.001138
anova(lm.0, lm.1)
## Analysis of Variance Table
##
## Model 1: ddf$weight ~ 1
## Model 2: ddf$weight ~ ddf$height
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 29 13220.2
## 2 28 8997.9 1 4222.2 13.139 0.001138 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# get a plot for it
plot(ddf$height, ddf$weight)
abline(lm.1, col = "red")
proportion of the variation that is being contributed by different levels of random effects. source
Wenn in höherem Level nur noch wenig Residualvarianz übrig ist, ist der ICC hoch. Die Kontext-Einflüsse (der Level-Ausprägung) sind also stark bzw. haben großen Einfluss auf die Ausprägung der AV.
Je größer der Varianzanteil zwischen den Gruppen (Stufen auf einem Level) ist, desto weniger Varianz besteht anteilig innerhalb der Gruppen. Das bedeutet, dass sich die Werte innerhalb der Gruppen ähnlicher sind, also stärker miteinander korellieren.
Partitionierung der Residualvarianz:
todo: Grafik hierzu
ICC-Paper of Andy Field (2012) on companon site. paper
ICC and VarCorr
VarCorr()
to get the necessary variances to
calculate ICCnlme::lme()
we have to co as.numeric()
on the VarCorr()
object, because content type is
string.VarCorr()
is different in nlme::lme()
and
lme4::lmer()
Example in “check” below.
lme()
und missing dataMixed models based on likelihood methods can often handle missing observations within subjects, but they not do well with missing individual elements in the design matrices (think unit nonresponse vs item nonresponse in the survey world). Continuing with the example I recently sent to you
set.seed(0112358)
library(nlme)
# we create a dataframe withount missings
school <- factor(rep(1:20, each=4))
year <- rep(2006:2009, 20)
year.c <- year - mean(year)
tmt <- sample(0:1, 20, replace = TRUE)[school]
math <- rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school]
tmt <- factor(tmt)
df.f <- data.frame(math, school, tmt, year, year.c)
rm(math, school, year, year.c, tmt)
# Now create missing observations,
# -- note just a single obs for the first school:
df.m <- df.f[-c(2:6, sample(75, 5)), ]
table(df.m$school)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 1 2 4 4 3 4 3 3 4 3 4 4 4 4 3 4 4 4 4 4
#
# school: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# numobs: 1 2 4 3 4 4 4 4 4 3 3 4 4 4 3 4 4 4 4 4
m.f <- lme(math ~ year.c*tmt, data = df.f, random=~1|school)
m.m <- lme(math ~ year.c*tmt, data = df.m, random=~1|school)
intervals(m.f)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 3.1358160 3.84090374 4.5459915
## year.c -0.2776773 0.16550116 0.6086796
## tmt1 0.4895508 1.48740943 2.4852680
## year.c:tmt1 -0.5819210 0.01566072 0.6132424
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.3849488 0.7518102 1.468295
##
## Within-group standard error:
## lower est. upper
## 1.238062 1.485190 1.781648
intervals(m.m)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 2.9435062 3.61474585 4.2859855
## year.c -0.2744510 0.18578479 0.6460206
## tmt1 0.6135467 1.56693004 2.5203134
## year.c:tmt1 -0.5838501 0.02736675 0.6385836
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.3063612 0.6718072 1.473179
##
## Within-group standard error:
## lower est. upper
## 1.139500 1.391602 1.699479
Note that even with the missing obs, in this instance we got pretty good estimates of the variance components (parameters: within-group sd = 1.5; between-school sd = 1). However, I had to change the seed from the one I used in the last email because, as is not uncommon with unbalanced data and relatively small sample sizes, there were convergence problems. If you were using classical MoM estimators fitting a mixed model with unbalanced data would lead to very precarious inferences (not that the likelihood inference isn’t without it share of pitfalls, but that’s another email…)
formula | description |
---|---|
\(Y_{ijk} = \gamma_{000} + v_{00k} + u_{0jk} + \varepsilon_{ijk}\) | intercept |
\(+ \gamma_{100} \cdot X_{1_{ijk}} + \gamma_{010} \cdot X_{2_{0jk}} + \gamma_{001} \cdot X_{3_{00k}}\) | Fixed Effects |
\(+ v_{10k} \cdot X_{1_{ijk}} + u_{1jk} \cdot X_{1_{ijk}}\) | random slopes des Level-1-Prädiktors |
\(+ v_{20k} \cdot X_{2_{0jk}}\) | random slope des Level-2-Prädiktors |
\(+ \gamma_{110} \cdot X_{1_{ijk}} \cdot
X_{2_{0jk}}\) \(+ \gamma_{101} \cdot X_{1_{ijk}} \cdot X_{3_{00k}}\) \(+ \gamma_{011} \cdot X_{2_{0jk}} \cdot X_{3_{00k}}\) |
cross-level-2-fach-Interaktionen |
\(+ \gamma_{111} \cdot X_{1_{ijk}} \cdot X_{2_{0jk}} \cdot X_{3_{00k}}\) | cross-level-3-fach-Interaktion |
nlme::gls()
bzw. nlme::lme()
Im Summary von nlme::gls()
bzw. nlme::lme()
taucht immer auch eine Korrelationstabelle mit auf, sowohl bei den
fixed, als auch bei den random Effects.
Der Koeffizient zeigt die Korrelation zwischen Intercept und Slope, wenn der Slope variieren würde, also z. B. bei Wiederholung von Erhebungen. Für die einzelne untersuchte Stichprobe gibt es keine berechenbare Korrelation zwischen Intercept und Slope, weil sie für die Stichprobe konstant sind.
Eine Erklärung und Visualisierung
des Intercept in der Correlation Tabelle von nlme::gls()
bzw. nlme::lme()
Visualisation
Nach Zentrieren der Daten verschwindet die Korrelation, sie ist also abhängig von der Skalierung der Erklärvariablen. Meist ist diese Korrelation für die Interpretation irrelevant.
nlme::lme()
Inspired by StackExchange [https://stats.stackexchange.com/questions/367957/autoregression-in-nlme-undestanding-how-to-specify-corar1]
Daten sind simuliert. Wir können uns vorstellen,
es
value` seien die Stimmungs- bzw. Befindlichkeitswerte als
Rating zwischen -10 und 10 im Laufe einer Woche.
set.seed(100)
n.subj <- 20
n.times <- 6
# subj.c <-rep(NA, n.subj)
subj.c <- rnorm(n.subj, sd=2)
subj.s <- rnorm(n.subj, sd=1)
mat = matrix(ncol = n.times, nrow = n.subj)
# converting the matrix to data frame
dd=data.frame(mat)
# line <- rep(subj.c[1], n.times)
# line <- line + rnorm(n.subj, sd=0.2)
for (subj in 1:n.subj) {
resid <- rnorm(n.times, 0, sd=2)
for (tt in 1:n.times) {
dd[subj,tt] <- subj.c[subj] + (tt - n.times/2) * subj.s[subj] + resid[tt]
}
}
for (cc in 1:ncol(dd)) {colnames(dd)[cc] <- paste('t', as.character(cc), sep="_")}
# for (cc in 1:ncol(dd)) {colnames(dd)[cc] <- as.numeric(cc)}
dd <- cbind(1:nrow(dd), dd)
colnames(dd)[1] <- "subj"
dd.w <- dd
dd <- dd.w %>% tidyr::pivot_longer(cols = 2:ncol(dd), names_to = "time.s", values_to = "value")
dd <- dd %>% dplyr::rowwise() %>% dplyr::mutate(time = as.numeric(strsplit(time.s, split="_")[[1]][2]))
# dd <- dd %>% dplyr::mutate(time.n = as.numeric(time))
dd %>% ggplot(aes(x=time, y=value, group=subj, color=factor(subj))) +
geom_point() +
geom_line()
#value = rnorm(100)
#time = rep(1:5, each = 20)
#id = rep(letters[1:20],5)
#dd = data.frame(value, time, id)
fit.0 <- dd %>% nlme::lme(value ~ time, random=~ 1 | subj,
correlation = NULL,
data=.)
fit.1 <- dd %>% nlme::lme(value ~ time, random=~ 1 | subj,
correlation=corAR1(),
data=.)
# random intercept and slope model without correlation between time steps
fit.2a <- dd %>% nlme::lme(value ~ time, random=~ 1 + time | subj,
correlation=NULL,
data=.)
fit.2b <- dd %>% nlme::lme(value ~ time, random=~ 1 + time | subj,
correlation=corAR1(form=~1), # time is equally spaced (one daily value measured)
# correlation=corAR1(form=~1|subj), # time is equally spaced and subj are uncorrelated
data=.)
# just to show syntax and idea, corCAR1
# random intercept and slope model with time as continuous variable for correlation structure per subj
fit.2c <- dd %>% nlme::lme(value ~ time, random=~ 1 + time | subj,
correlation=corCAR1(form=~time), # this would also account for varying time lags
data=.)
anova(fit.2a, fit.2b)
## Model df AIC BIC logLik Test L.Ratio p-value
## fit.2a 1 6 569.9247 586.5488 -278.9623
## fit.2b 2 7 563.4894 582.8841 -274.7447 1 vs 2 8.435332 0.0037
anova(fit.0, fit.1, fit.2a)
## Model df AIC BIC logLik Test L.Ratio p-value
## fit.0 1 4 588.9028 599.9855 -290.4514
## fit.1 2 5 589.9272 603.7806 -289.9636 1 vs 2 0.975598 0.3233
## fit.2a 3 6 569.9247 586.5488 -278.9623 2 vs 3 22.002509 <.0001
summary(fit.0)
## Linear mixed-effects model fit by REML
## Data: .
## AIC BIC logLik
## 588.9028 599.9855 -290.4514
##
## Random effects:
## Formula: ~1 | subj
## (Intercept) Residual
## StdDev: 1.321498 2.50516
##
## Fixed effects: value ~ time
## Value Std.Error DF t-value p-value
## (Intercept) -0.12179816 0.5993916 99 -0.2032030 0.8394
## time 0.08129237 0.1339064 99 0.6070834 0.5452
## Correlation:
## (Intr)
## time -0.782
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.55525559 -0.59529315 0.05664013 0.54989467 2.84974370
##
## Number of Observations: 120
## Number of Groups: 20
summary(fit.1)
## Linear mixed-effects model fit by REML
## Data: .
## AIC BIC logLik
## 589.9272 603.7806 -289.9636
##
## Random effects:
## Formula: ~1 | subj
## (Intercept) Residual
## StdDev: 1.231883 2.56317
##
## Correlation Structure: AR(1)
## Formula: ~1 | subj
## Parameter estimate(s):
## Phi
## 0.1292435
## Fixed effects: value ~ time
## Value Std.Error DF t-value p-value
## (Intercept) -0.16008511 0.6338599 99 -0.2525560 0.8011
## time 0.09201056 0.1451945 99 0.6337054 0.5277
## Correlation:
## (Intr)
## time -0.802
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.59655646 -0.59732118 0.04912839 0.60991983 2.85002705
##
## Number of Observations: 120
## Number of Groups: 20
summary(fit.2b)
## Linear mixed-effects model fit by REML
## Data: .
## AIC BIC logLik
## 563.4894 582.8841 -274.7447
##
## Random effects:
## Formula: ~1 + time | subj
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 2.1001965 (Intr)
## time 0.8395891 -0.866
## Residual 1.8947812
##
## Correlation Structure: AR(1)
## Formula: ~1 | subj
## Parameter estimate(s):
## Phi
## -0.3971962
## Fixed effects: value ~ time
## Value Std.Error DF t-value p-value
## (Intercept) -0.02110225 0.5551392 99 -0.03801254 0.9698
## time 0.05301270 0.2030656 99 0.26106201 0.7946
## Correlation:
## (Intr)
## time -0.863
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.56356806 -0.69260877 -0.07906837 0.68254335 2.20481596
##
## Number of Observations: 120
## Number of Groups: 20
This dataset contains three features from 26 students in Oxford: student id, (measured) height, and (measurement) year.
df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/div/oxboys.txt", fileEncoding = "UTF-8")
head(df)
## subj age height occasion
## 1 1 -1.0000 140.5 1
## 2 1 -0.7479 143.4 2
## 3 1 -0.4630 144.8 3
## 4 1 -0.1643 147.1 4
## 5 1 -0.0027 147.7 5
## 6 1 0.2466 150.2 6
# Plot the relationship between height and year, and draw a linearly regressed line ignoring the id variable.
plot(df$occasion, df$height, col="blue", main="Oxford Students: Height vs. Year", cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.5, xlim = c(0,10))
abline(lm(height~occasion, data=df), col="red", lwd=5)
library(nlme)
# Plot the relationship between height and year, but this time, fit a different linear regression for each individual. In ggplot, you can do it by specifying the group variable: ggplot(oxboys, aes(x=year, y=height, group=id)) + stat_smooth(method=“lm”) Other plotting packages also should have some such facility. Comment on differences between (a) and (b).
library(ggplot2)
ggplot(data=df, aes(x=occasion, y=height, group=subj)) +
stat_smooth(method="lm", color="red") +
ggtitle("Oxford Boys: Height vs. Occasion") +
expand_limits(x=c(min(df$occasion),max(df$occasion))) +
scale_x_continuous(breaks=seq(min(df$occasion), max(df$occasion), 1)) +
theme(axis.title = element_text(face="bold", colour="blue", size=20),
plot.title = element_text(family = "Trebuchet MS", color="blue", face="bold", size=22))
## `geom_smooth()` using formula 'y ~ x'
The first graph plots the relationship between year and height for all students. The abline represents the line of best fit for a global MLR model. In contrast, the ggplot graph shows the relationship between year and height with respect to each student (group=id). This is a graphic representation of a local model (i.e., 26 different linear regressions).
Pitch hier Stimmlage
Description: This experiment (from a collaboration with Tim Wells and Andrew Dunn) involves looking at the pitch of male voices making attractiveness ratings with respect to female faces. The effect of interest (for this example) is whether average pitch goes up or done for higher ratings (and if so, by how much). A conventional ANOVA is problematic because this is a design with two fully crossed random factors – each participant (n = 30) sees each face (n = 32) and any conclusions ought to generalise both to other participants and (crucially) to other faces. Furthermore, there is a time-varying covariate – the baseline pitch of the numerical rating when no face is presented.
lme4::lmer()
# we could get the original data via
# pitch.dat <- read.csv('http://www2.ntupsychology.net/seriousstats/pitch.csv')
# we use a local copy for convenience
# write.table(pitch.dat, "pitch.txt", quote=F, sep="\t", row.names=F)
df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/div/pitch.txt")
head(df)
## Constant Participant Face Context attract pitch base
## 1 1 1 1 1 2 99.88 97.79
## 2 1 1 2 1 1 108.50 100.40
## 3 1 1 3 1 3 99.06 101.00
## 4 1 1 4 1 4 97.02 100.28
## 5 1 1 5 1 2 102.04 97.79
## 6 1 1 6 1 3 98.38 101.00
require(psych)
## Lade nötiges Paket: psych
##
## Attache Paket: 'psych'
## Die folgenden Objekte sind maskiert von 'package:ggplot2':
##
## %+%, alpha
describe(df)
## vars n mean sd median trimmed mad min max range
## Constant 1 1920 1.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00
## Participant 2 1920 15.50 8.66 15.50 15.50 11.12 1.00 30.00 29.00
## Face 3 1920 16.50 9.24 16.50 16.50 11.86 1.00 32.00 31.00
## Context 4 1920 1.50 0.50 1.50 1.50 0.74 1.00 2.00 1.00
## attract 5 1920 3.86 2.25 4.00 3.69 2.97 1.00 9.00 8.00
## pitch 6 1920 114.95 18.42 114.14 114.03 16.98 74.25 186.26 112.01
## base 7 1920 112.67 17.05 109.80 111.94 17.28 82.99 159.13 76.14
## skew kurtosis se
## Constant NaN NaN 0.00
## Participant 0.00 -1.2 0.20
## Face 0.00 -1.2 0.21
## Context 0.00 -2.0 0.01
## attract 0.46 -0.8 0.05
## pitch 0.48 0.1 0.42
## base 0.39 -0.5 0.39
require(Rmisc)
## Lade nötiges Paket: Rmisc
## Lade nötiges Paket: lattice
## Lade nötiges Paket: plyr
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attache Paket: 'plyr'
## Die folgenden Objekte sind maskiert von 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## Das folgende Objekt ist maskiert 'package:purrr':
##
## compact
summarySE(data=df, "pitch")
## .id N pitch sd se ci
## 1 <NA> 1920 114.9475 18.4154 0.420272 0.8242378
summarySE(data=df, "pitch", c("attract"))
## attract N pitch sd se ci
## 1 1 349 112.7665 17.37810 0.9302284 1.829577
## 2 2 314 114.8755 18.12165 1.0226639 2.012165
## 3 3 280 112.6467 16.07398 0.9606043 1.890953
## 4 4 264 116.1327 18.04405 1.1105343 2.186670
## 5 5 238 113.3962 17.06602 1.1062258 2.179291
## 6 6 181 120.0590 22.23088 1.6524080 3.260582
## 7 7 141 119.6528 19.95647 1.6806383 3.322712
## 8 8 109 116.1674 20.26676 1.9412037 3.847803
## 9 9 44 109.5525 17.85199 2.6912883 5.427500
summarySE(data=df, "pitch", c("Context"))
## Context N pitch sd se ci
## 1 1 960 114.7716 18.54056 0.5983939 1.174313
## 2 2 960 115.1233 18.29735 0.5905444 1.158908
summarySE(data=df, "pitch", c("attract", "Context"))
## attract Context N pitch sd se ci
## 1 1 1 177 112.5491 18.07397 1.358523 2.681092
## 2 1 2 172 112.9902 16.68154 1.271956 2.510756
## 3 2 1 167 114.1406 18.77047 1.452502 2.867759
## 4 2 2 147 115.7105 17.38127 1.433583 2.833255
## 5 3 1 147 111.3256 15.13055 1.247947 2.466374
## 6 3 2 133 114.1068 16.99413 1.473578 2.914883
## 7 4 1 128 116.7216 17.43147 1.540739 3.048845
## 8 4 2 136 115.5785 18.64952 1.599183 3.162692
## 9 5 1 121 112.4814 16.14998 1.468180 2.906894
## 10 5 2 117 114.3422 17.98501 1.662715 3.293216
## 11 6 1 90 121.7013 21.80328 2.298268 4.566609
## 12 6 2 91 118.4347 22.64857 2.374215 4.716793
## 13 7 1 66 121.2992 21.26681 2.617763 5.228034
## 14 7 2 75 118.2040 18.75236 2.165336 4.314526
## 15 8 1 45 116.3747 21.07783 3.142097 6.332481
## 16 8 2 64 116.0217 19.84412 2.480515 4.956914
## 17 9 1 19 109.8332 20.53339 4.710683 9.896778
## 18 9 2 25 109.3392 15.95831 3.191661 6.587265
require(ggplot2)
ggplot(data=df, aes(x=attract, y=pitch)) + geom_point() + geom_smooth(method = lm, se=F)
## `geom_smooth()` using formula 'y ~ x'
# regression line indicates a slightly growing pitch with growing attractivity
# require(ez)
# ezANOVA(data=)
library(lme4)
## Lade nötiges Paket: Matrix
##
## Attache Paket: 'Matrix'
## Die folgenden Objekte sind maskiert von 'package:tidyr':
##
## expand, pack, unpack
##
## Attache Paket: 'lme4'
## Das folgende Objekt ist maskiert 'package:nlme':
##
## lmList
pitch.me <- lmer(pitch ~ base + attract + (1|Face) + (1|Participant), data=df)
pitch.me
## Linear mixed model fit by REML ['lmerMod']
## Formula: pitch ~ base + attract + (1 | Face) + (1 | Participant)
## Data: df
## REML criterion at convergence: 14121.79
## Random effects:
## Groups Name Std.Dev.
## Face (Intercept) 0.6665
## Participant (Intercept) 13.1046
## Residual 9.1909
## Number of obs: 1920, groups: Face, 32; Participant, 30
## Fixed Effects:
## (Intercept) base attract
## 90.0333 0.2054 0.4591
coef(pitch.me)
## $Face
## (Intercept) base attract
## 1 90.32040 0.2054025 0.4591019
## 2 89.85338 0.2054025 0.4591019
## 3 89.86314 0.2054025 0.4591019
## 4 89.67938 0.2054025 0.4591019
## 5 90.19382 0.2054025 0.4591019
## 6 89.88064 0.2054025 0.4591019
## 7 90.91743 0.2054025 0.4591019
## 8 89.58486 0.2054025 0.4591019
## 9 90.15139 0.2054025 0.4591019
## 10 89.93114 0.2054025 0.4591019
## 11 89.74059 0.2054025 0.4591019
## 12 89.77798 0.2054025 0.4591019
## 13 90.04976 0.2054025 0.4591019
## 14 90.09177 0.2054025 0.4591019
## 15 90.05765 0.2054025 0.4591019
## 16 90.22825 0.2054025 0.4591019
## 17 89.98391 0.2054025 0.4591019
## 18 90.17051 0.2054025 0.4591019
## 19 90.07468 0.2054025 0.4591019
## 20 90.22505 0.2054025 0.4591019
## 21 89.49214 0.2054025 0.4591019
## 22 89.86455 0.2054025 0.4591019
## 23 90.77257 0.2054025 0.4591019
## 24 90.15636 0.2054025 0.4591019
## 25 90.63089 0.2054025 0.4591019
## 26 89.48716 0.2054025 0.4591019
## 27 89.78225 0.2054025 0.4591019
## 28 90.04786 0.2054025 0.4591019
## 29 89.98818 0.2054025 0.4591019
## 30 90.01073 0.2054025 0.4591019
## 31 90.21006 0.2054025 0.4591019
## 32 89.84846 0.2054025 0.4591019
##
## $Participant
## (Intercept) base attract
## 1 79.24162 0.2054025 0.4591019
## 2 114.29585 0.2054025 0.4591019
## 3 94.88277 0.2054025 0.4591019
## 4 94.88836 0.2054025 0.4591019
## 5 86.91915 0.2054025 0.4591019
## 6 91.32206 0.2054025 0.4591019
## 7 93.85191 0.2054025 0.4591019
## 8 66.57846 0.2054025 0.4591019
## 9 96.38930 0.2054025 0.4591019
## 10 103.64559 0.2054025 0.4591019
## 11 110.36171 0.2054025 0.4591019
## 12 94.21022 0.2054025 0.4591019
## 13 89.44698 0.2054025 0.4591019
## 14 89.51116 0.2054025 0.4591019
## 15 79.57811 0.2054025 0.4591019
## 16 87.44646 0.2054025 0.4591019
## 17 110.05654 0.2054025 0.4591019
## 18 96.62059 0.2054025 0.4591019
## 19 86.64688 0.2054025 0.4591019
## 20 96.25067 0.2054025 0.4591019
## 21 80.82324 0.2054025 0.4591019
## 22 77.52477 0.2054025 0.4591019
## 23 74.52455 0.2054025 0.4591019
## 24 69.88148 0.2054025 0.4591019
## 25 71.22787 0.2054025 0.4591019
## 26 103.50483 0.2054025 0.4591019
## 27 114.56861 0.2054025 0.4591019
## 28 88.38784 0.2054025 0.4591019
## 29 85.04500 0.2054025 0.4591019
## 30 73.36771 0.2054025 0.4591019
##
## attr(,"class")
## [1] "coef.mer"
fixef(pitch.me)
## (Intercept) base attract
## 90.0333423 0.2054025 0.4591019
ranef(pitch.me)
## $Face
## (Intercept)
## 1 0.28706115
## 2 -0.17996662
## 3 -0.17020403
## 4 -0.35396199
## 5 0.16048244
## 6 -0.15270484
## 7 0.88409253
## 8 -0.44848593
## 9 0.11805208
## 10 -0.10219812
## 11 -0.29275053
## 12 -0.25536038
## 13 0.01642258
## 14 0.05842830
## 15 0.02430616
## 16 0.19490993
## 17 -0.04943207
## 18 0.13716879
## 19 0.04133560
## 20 0.19170746
## 21 -0.54120048
## 22 -0.16879624
## 23 0.73922783
## 24 0.12302105
## 25 0.59754718
## 26 -0.54618038
## 27 -0.25108817
## 28 0.01451355
## 29 -0.04516416
## 30 -0.02261380
## 31 0.17671811
## 32 -0.18488701
##
## $Participant
## (Intercept)
## 1 -10.7917253
## 2 24.2625050
## 3 4.8494235
## 4 4.8550207
## 5 -3.1141924
## 6 1.2887170
## 7 3.8185653
## 8 -23.4548810
## 9 6.3559594
## 10 13.6122507
## 11 20.3283647
## 12 4.1768759
## 13 -0.5863578
## 14 -0.5221842
## 15 -10.4552290
## 16 -2.5868870
## 17 20.0232022
## 18 6.5872516
## 19 -3.3864650
## 20 6.2173245
## 21 -9.2101024
## 22 -12.5085737
## 23 -15.5087923
## 24 -20.1518667
## 25 -18.8054717
## 26 13.4714843
## 27 24.5352670
## 28 -1.6455045
## 29 -4.9883462
## 30 -16.6656326
##
## with conditional variances for "Face" "Participant"
head(predict(pitch.me))
## 1 2 3 4 5 6
## 100.5332 100.1432 101.1944 101.3218 100.4066 101.2119
Der Datensatz ist Teil der library(lme4)
. Beschreibung
des Datensatzes findet sich unter help(sleepstudy)
require(lme4)
df <- sleepstudy
head(sleepstudy)
## Reaction Days Subject
## 1 249.5600 0 308
## 2 258.7047 1 308
## 3 250.8006 2 308
## 4 321.4398 3 308
## 5 356.8519 4 308
## 6 414.6901 5 308
# some descriptives
require(psych)
psych::describe(sleepstudy)
## vars n mean sd median trimmed mad min max range skew
## Reaction 1 180 298.51 56.33 288.65 295.17 59.34 194.33 466.35 272.02 0.58
## Days 2 180 4.50 2.88 4.50 4.50 3.71 0.00 9.00 9.00 0.00
## Subject* 3 180 9.50 5.20 9.50 9.50 6.67 1.00 18.00 17.00 0.00
## kurtosis se
## Reaction 0.03 4.20
## Days -1.24 0.21
## Subject* -1.23 0.39
require(Rmisc)
summarySE(data=df, "Reaction")
## .id N Reaction sd se ci
## 1 <NA> 180 298.5079 56.32876 4.198498 8.284918
summarySE(data=df, "Reaction", "Days")
## Days N Reaction sd se ci
## 1 0 18 256.6518 32.12945 7.572984 15.97760
## 2 1 18 264.4958 33.43033 7.879605 16.62451
## 3 2 18 265.3619 29.47342 6.946953 14.65679
## 4 3 18 282.9920 38.85774 9.158857 19.32350
## 5 4 18 288.6494 42.53789 10.026276 21.15359
## 6 5 18 308.5185 51.76962 12.202218 25.74443
## 7 6 18 312.1783 63.17372 14.890189 31.41555
## 8 7 18 318.7506 50.10396 11.809617 24.91611
## 9 8 18 336.6295 60.19972 14.189209 29.93661
## 10 9 18 350.8512 66.98616 15.788788 33.31143
summarySE(data=df, "Reaction", "Subject")
## Subject N Reaction sd se ci
## 1 308 10 342.1338 79.82176 25.241858 57.101050
## 2 309 10 215.2330 10.81219 3.419116 7.734577
## 3 310 10 231.0013 21.85600 6.911473 15.634838
## 4 330 10 303.2214 22.90920 7.244525 16.388254
## 5 331 10 309.4361 27.24261 8.614869 19.488187
## 6 332 10 307.3021 64.30613 20.335382 46.001831
## 7 333 10 316.1583 30.06821 9.508402 21.509500
## 8 334 10 295.3021 41.85561 13.235905 29.941698
## 9 335 10 250.0700 13.83385 4.374648 9.896142
## 10 337 10 375.7210 59.62379 18.854698 42.652289
## 11 349 10 275.8345 42.93794 13.578170 30.715954
## 12 350 10 313.6027 63.36056 20.036368 45.325413
## 13 351 10 290.0978 28.97881 9.163905 20.730192
## 14 352 10 337.4215 47.60238 15.053194 34.052691
## 15 369 10 306.0346 37.46043 11.846028 26.797577
## 16 370 10 291.7018 59.20821 18.723281 42.355004
## 17 371 10 294.9840 36.50599 11.544207 26.114811
## 18 372 10 317.8861 35.82280 11.328165 25.626090
require(ggplot2)
ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) + geom_point(aes(color=Subject)) + geom_smooth(method = lm, se=F)
## `geom_smooth()` using formula 'y ~ x'
# we convert Days to type factor
df$f.Days <- factor(df$Days)
# the classical anova approach
require(ez)
## Lade nötiges Paket: ez
m.ez <- ezANOVA(data=df, dv=.(Reaction), wid = .(Subject), within = .(f.Days), detailed=TRUE, type=2, return_aov = T)
m.ez
## $ANOVA
## Effect DFn DFd SSn SSd F p p<.05 ges
## 1 (Intercept) 1 17 16039253.0 250618.1 1087.9793 7.475644e-17 * 0.975566
## 2 f.Days 9 153 166235.1 151101.0 18.7027 8.995345e-21 * 0.292691
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 f.Days 0.0002190204 5.154879e-08 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
## 2 f.Days 0.3690298 5.463092e-09 * 0.4693648 7.076552e-11 *
##
## $aov
##
## Call:
## aov(formula = formula(aov_formula), data = data)
##
## Grand Mean: 298.5079
##
## Stratum 1: Subject
##
## Terms:
## Residuals
## Sum of Squares 250618.1
## Deg. of Freedom 17
##
## Residual standard error: 121.4176
##
## Stratum 2: Subject:f.Days
##
## Terms:
## f.Days Residuals
## Sum of Squares 166235.1 151101.0
## Deg. of Freedom 9 153
##
## Residual standard error: 31.42592
## Estimated effects may be unbalanced
# descriptives
ezStats(data=df, dv=.(Reaction), wid = .(Subject), within = .(f.Days))
## f.Days N Mean SD FLSD
## 1 0 18 256.6518 32.12945 20.69491
## 2 1 18 264.4958 33.43033 20.69491
## 3 2 18 265.3619 29.47342 20.69491
## 4 3 18 282.9920 38.85774 20.69491
## 5 4 18 288.6494 42.53789 20.69491
## 6 5 18 308.5185 51.76962 20.69491
## 7 6 18 312.1783 63.17372 20.69491
## 8 7 18 318.7506 50.10396 20.69491
## 9 8 18 336.6295 60.19972 20.69491
## 10 9 18 350.8512 66.98616 20.69491
# and a plot
ezPlot(data=df, dv=.(Reaction), wid = .(Subject), within = .(f.Days), x=.(f.Days))
# we store aov element for convenience
o.aov <- m.ez[['aov']][['Subject:f.Days']]
# we calculate predicted values and store them for further use
# reference group is mean at day zero 256.6518
mean.day.0 <- 256.6518
df$pred.ez <- NA
adds.one <- c(0, o.aov[['coefficients']])
adds <- rep(adds.one, length(unique(df$Subject)))
# predicted values are group means
# we get to group means by adding the coefficients to the mean of the reference group
df$pred.ez <-adds + mean.day.0
# correlation of predicted values and reaction variable
cor(df$Reaction, df$pred.ez)
## [1] 0.5410093
# standard deviation of residuals
sd(df$Reaction - df$pred.ez)
## [1] 47.37342
require(lattice)
xyplot(Reaction ~ Days | Subject, sleepstudy, type = c("g","p","r"),
index = function(x,y) coef(lm(y ~ x))[1],
xlab = "Days of sleep deprivation",
ylab = "Average reaction time (ms)", aspect = "xy")
(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
## REML criterion at convergence: 1743.628
## Random effects:
## Groups Name Std.Dev. Corr
## Subject (Intercept) 24.741
## Days 5.922 0.07
## Residual 25.592
## Number of obs: 180, groups: Subject, 18
## Fixed Effects:
## (Intercept) Days
## 251.41 10.47
(fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
## Data: sleepstudy
## REML criterion at convergence: 1743.669
## Random effects:
## Groups Name Std.Dev.
## Subject (Intercept) 25.051
## Subject.1 Days 5.988
## Residual 25.565
## Number of obs: 180, groups: Subject, 18
## Fixed Effects:
## (Intercept) Days
## 251.41 10.47
require(lme4)
df <- sleepstudy
# detach package lme4, we only need it's sleepstudy data
# detach("package:lme4", unload=TRUE)
require(nlme)
m.0 <- lme(fixed= Reaction ~ 1, random = ~1 | Subject, data=df, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
## Data: df
## AIC BIC logLik
## 1916.541 1926.12 -955.2705
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 34.58954 44.25907
##
## Fixed effects: Reaction ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 298.5079 8.81949 162 33.84639 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.4914591 -0.5481683 -0.1470795 0.5178317 3.3461410
##
## Number of Observations: 180
## Number of Groups: 18
m.1 <- lme(fixed= Reaction ~ 1 + Days, random = ~1 | Subject, data=df, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
## Data: df
## AIC BIC logLik
## 1802.079 1814.85 -897.0393
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 36.01208 30.89543
##
## Fixed effects: Reaction ~ 1 + Days
## Value Std.Error DF t-value p-value
## (Intercept) 251.40510 9.559442 161 26.29914 0
## Days 10.46729 0.806227 161 12.98305 0
## Correlation:
## (Intr)
## Days -0.38
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.23470320 -0.55439805 0.01549172 0.52570857 4.26476610
##
## Number of Observations: 180
## Number of Groups: 18
m.2 <- lme(fixed= Reaction ~ 1 + Days, random = ~1 + Days | Subject, data=df, method='ML')
summary(m.2)
## Linear mixed-effects model fit by maximum likelihood
## Data: df
## AIC BIC logLik
## 1763.939 1783.097 -875.9697
##
## Random effects:
## Formula: ~1 + Days | Subject
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 23.780376 (Intr)
## Days 5.716807 0.081
## Residual 25.591842
##
## Fixed effects: Reaction ~ 1 + Days
## Value Std.Error DF t-value p-value
## (Intercept) 251.40510 6.669396 161 37.69533 0
## Days 10.46729 1.510647 161 6.92901 0
## Correlation:
## (Intr)
## Days -0.138
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.94156355 -0.46559311 0.02894656 0.46361051 5.17933587
##
## Number of Observations: 180
## Number of Groups: 18
anova(m.0, m.1, m.2)
## Model df AIC BIC logLik Test L.Ratio p-value
## m.0 1 3 1916.541 1926.120 -955.2705
## m.1 2 4 1802.079 1814.851 -897.0393 1 vs 2 116.4624 <.0001
## m.2 3 6 1763.939 1783.097 -875.9697 2 vs 3 42.1393 <.0001
# an lme object "understands" predict
df$pred.m.1 <- predict(m.1)
cor(df$Reaction, df$pred.m.1)
## [1] 0.852554
df$pred.m.2 <- predict(m.2)
cor(df$Reaction, df$pred.m.2)
## [1] 0.9090176
# variance of residuals
sd(m.1$residuals)
## [1] 39.53289
# ... is smaller than in classical analysis
# we can get at the coefficients via
fixef(m.1)
## (Intercept) Days
## 251.40510 10.46729
ranef(m.1)
## (Intercept)
## 308 40.635097
## 309 -77.565875
## 310 -62.878604
## 330 4.390385
## 331 10.178962
## 332 8.191280
## 333 16.440367
## 334 -2.986060
## 335 -45.117122
## 337 71.919657
## 349 -21.119011
## 350 14.059942
## 351 -7.833572
## 352 36.245865
## 369 7.010741
## 370 -6.339518
## 371 -3.282269
## 372 18.049734
# we extend the data with effects of model m.1
df$inter <- NA
df$slope <- NA
require(dplyr)
df <- df %>% dplyr::group_by(Subject) %>% dplyr::mutate(inter=fixef(m.1)[1] + ranef(m.1)[Subject,1])
df <- df %>% dplyr::mutate(slope=fixef(m.1)[2])
# we visualize the random intercept fixed slope model
require(ggplot2)
pplot <- ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) +
geom_point(aes(color=Subject)) +
geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
geom_smooth(method = lm, se=F)
pplot
## `geom_smooth()` using formula 'y ~ x'
# we extend the data with effects of model m.2
df$inter <- NA
df$slope <- NA
require(dplyr)
df <- df %>% dplyr::group_by(Subject) %>% dplyr::mutate(inter=fixef(m.2)[1] + ranef(m.2)[Subject,1])
df <- df %>% dplyr::group_by(Subject) %>% dplyr::mutate(slope=fixef(m.2)[2] + ranef(m.2)[Subject,2])
# we visualize the random intercept random slope model
pplot <- ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) +
geom_point(aes(color=Subject)) +
geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
geom_smooth(method = lm, se=F, size=3)
pplot
## `geom_smooth()` using formula 'y ~ x'
# pplot <- ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) + geom_point(aes(color=Subject))
# pplot + geom_smooth(method = lm, se=F)
#
# # data for ggplot have to be in one dataframe
# require(dplyr)
# df.t <- dplyr::data_frame(subj = 1:length(ranef(m.2)[,1]), r.int = ranef(m.2)[,1], r.slope = ranef(m.2)[,2], f.int <- fixef(m.2)[1], f.slope <- fixef(m.2)[2])
# names(df.t) <- c("subj", "r.int", "r.slope", "f.int", "f.slope")
# df.t <- df.t %>% dplyr::mutate(intercept = f.int + r.int, slope = f.slope + r.slope)
# df.t <- df.t %>% dplyr::mutate(x0 = 0, y0 = intercept, x9 = 9, y9 = intercept + 9 * slope, Days = 0, Reaction=300)
# head(df.t)
#
# # df.t$subj <- 1:length(ranef(m.2)[1])
# # df.t <- cbind(df.t, ranef(m.2))
# # df.t$fix.intercept <- fixef(m.2)[1]
# # df.t$fix.slope <- fixef(m.2)[2]
# # names(df.t) <- c("subj", "r.int", "r.slope", "f.int", "f.slope")
# # df.t <- df.t %>% dplyr::mutate(intercept = f.int + r.int, slope = f.slope + f.int)
# # head(df.t)
#
# # we extend the data with effects
# df$inter <- NA
# df$slope <- NA
# for (ss in rownames(ranef(m.2))){
# df$inter[df$Subject == ss] <- fixef(m.2)[1] + ranef(m.2)[ss,1]
# df$slope[df$Subject == ss] <- fixef(m.2)[2] + ranef(m.2)[ss,2]
# }
#
# #pplot <- pplot + geom_line(data=df.t, aes(intercept, slope), color=subj)
# #pplot <- ggplot(data=df, aes(x=Days, y=Reaction), color=Subject, show.legend=F) + geom_point(aes(color=Subject), show.legend=F)
# pplot <- ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) + geom_point(aes(color=Subject))
# pplot <- pplot + geom_abline(aes(intercept=inter, slope=slope, color = Subject))
# #pplot <- pplot + geom_line()
# #pplot <- pplot + geom_segment(data=df.t, mapping=aes(x=x0, y=y0, xend=x9, yend=y9))
# #pplot <- pplot + geom_segment(data=df.t, mapping=aes(x=x0, y=y0, xend=x9, yend=y9, color=factor(subj)))
# pplot <- pplot + geom_smooth(method = lm, se=F)
# pplot
#
# #pplot <- pplot + geom_line(data=df.t, aes(mapping = NULL, x=x0, y=intercept), color=subj)
# #pplot <- pplot + geom_line(data=df.t, aes(intercept = intercept, slope = slope, color=factor(subj)))
# pplot
#
#
#
#
# pplot + geom_abline(intercept = fixef(m.2)[1], slope = fixef(m.2)[2])
# for (nn in 1:length(ranef(m.2)[,1])){
# #cat(ranef(m.2)[nn])
# c.int <- ranef(m.2)[nn,1] + fixef(m.2)[1]
# #cat(c.int)
# c.slope <- ranef(m.2)[nn,2] + fixef(m.2)[2]
# #cat(c.slope)
# pplot <- pplot + geom_abline(intercept = c.int, slope = c.slope)
# }
# pplot <- pplot + geom_abline(intercept = fixef(m.2)[1], slope = fixef(m.2)[2], size=2)
# pplot
Im britischen Schulsystem gibt es beim Übergang von Primary School zu Secondary School einen standardisierten Test, in Nordirland beispielsweise der “Common Entrance Assessment (CEA)” durchgeführt von der AQE (Association for Quality Education). Auf Basis der Ergebnisse dieses Tests können Eltern ihre Kinder in den Highschools anmelden. Die Highschools haben unterschiedliche Minimal-Punktwerte, also Cut-Off-Werte, um Schüler zu akzeptieren. Die Schuleingangstests haben einen Mittelwert von 100.
Ein Beispiel: Belfast High School. Hier wurden 2016 nur überdurchschnittliche Schüler aufgenommen.
AQE Highest 2016: 124
AQE Lowest 2016: 101
AQE Applications 2016: 220
AQE 1st Preference 2016: 179
AQE Admitted 2016: 136
Die Abschlussnote von Schülern soll vorhergesagt werden
normexam
. Die Schüler sind an verschiedenen Schulen.
Vorhersagevariable auf Level Schüler ist standLRT
(Literarisches Textverständnis). Vorhersagevariable auf Schul-Level ist
schavg
(Durchschnittliches Aufnahmeniveau der jeweiligen
Schule).
Die Schul-Level Variablen erkennen wir daran, dass ihre Ausprägung für die jeweilige Schule gleich ist, also innerhalb einer Schule für alle Schülerinnen und Schüler gleich bleiben (sich wiederholen).
Der verwendete Datensatz beschreibt 65 Schulen in London von
insgesamt 4059 Schülern. Infos hierzu gibt es mit
?mlmRev::Exam
A data frame with 4059 observations on the following 9 variables.
var | description |
---|---|
school | School ID - a factor. |
normexam | Normalized exam score. |
schgend | School gender - a factor. Levels are mixed, boys, and girls. |
schavg | School average of intake score. |
vr | Student level Verbal Reasoning (VR) score band at intake - a factor. Levels are bottom 25%, mid 50%, and top 25%. |
intake | Band of student’s intake score - a factor. Levels are bottom 25%, mid 50% and top 25%./ |
standLRT | Standardised LR test score (London Reading Test - a reading test taken when they were 11 years old) |
sex | Sex of the student - levels are F and M. |
type | School type - levels are Mxd and Sngl. |
student | Student id (within school) - a factor |
require(nlme)
require(mlmRev)
## Lade nötiges Paket: mlmRev
##
## Attache Paket: 'mlmRev'
## Die folgenden Objekte sind maskiert von 'package:nlme':
##
## bdf, Oxboys
# mlmRev hat data Exam
dd <- mlmRev::Exam
names(dd)
## [1] "school" "normexam" "schgend" "schavg" "vr" "intake"
## [7] "standLRT" "sex" "type" "student"
head(dd)
## school normexam schgend schavg vr intake standLRT sex type
## 1 1 0.2613242 mixed 0.1661752 mid 50% bottom 25% 0.6190592 F Mxd
## 2 1 0.1340672 mixed 0.1661752 mid 50% mid 50% 0.2058022 F Mxd
## 3 1 -1.7238820 mixed 0.1661752 mid 50% top 25% -1.3645760 M Mxd
## 4 1 0.9675862 mixed 0.1661752 mid 50% mid 50% 0.2058022 F Mxd
## 5 1 0.5443412 mixed 0.1661752 mid 50% mid 50% 0.3711052 F Mxd
## 6 1 1.7348992 mixed 0.1661752 mid 50% bottom 25% 2.1894372 M Mxd
## student
## 1 143
## 2 145
## 3 142
## 4 141
## 5 138
## 6 155
require(Rmisc)
summarySE(dd, "normexam", c("school"))
## school N normexam sd se ci
## 1 1 73 0.501209573 1.1158114 0.13059584 0.2603381
## 2 2 55 0.783102291 1.2844851 0.17319993 0.3472450
## 3 3 52 0.855444696 0.9581753 0.13287501 0.2667577
## 4 4 79 0.073628516 1.0866424 0.12225682 0.2433946
## 5 5 35 0.403608663 0.7978350 0.13485872 0.2740659
## 6 6 80 0.944579967 0.8352044 0.09337869 0.1858656
## 7 7 88 0.391500023 0.7877530 0.08397475 0.1669089
## 8 8 102 -0.048192541 1.0484182 0.10380887 0.2059289
## 9 9 34 -0.435682141 0.8291304 0.14219469 0.2892973
## 10 10 50 -0.269390664 0.6613442 0.09352820 0.1879520
## 11 11 62 0.557222542 0.8758982 0.11123918 0.2224365
## 12 12 47 -0.072478545 0.7990063 0.11654705 0.2345970
## 13 13 64 -0.245749669 0.9855739 0.12319674 0.2461890
## 14 14 198 0.015569462 0.9443966 0.06711534 0.1323568
## 15 15 91 -0.040602934 0.9607184 0.10071065 0.2000793
## 16 16 88 -0.254118025 0.7288980 0.07770078 0.1544387
## 17 17 126 -0.245425479 0.9231316 0.08223910 0.1627614
## 18 18 120 -0.006150902 0.8295298 0.07572537 0.1499438
## 19 19 55 0.206285044 0.9124906 0.12304020 0.2466808
## 20 20 39 0.499090605 0.8537705 0.13671269 0.2767604
## 21 21 73 0.380324951 0.7738230 0.09056914 0.1805463
## 22 22 90 -0.498201307 0.8950668 0.09434832 0.1874681
## 23 23 28 -0.737632557 1.0780764 0.20373729 0.4180344
## 24 24 37 0.012641427 1.0217807 0.16797972 0.3406787
## 25 25 73 -0.613106307 0.8802210 0.10302207 0.2053708
## 26 26 75 -0.392018208 0.8688488 0.10032601 0.1999039
## 27 27 39 -0.328896456 0.8753218 0.14016366 0.2837465
## 28 28 57 -0.872710309 0.6198061 0.08209533 0.1644567
## 29 29 79 0.063508116 0.7452036 0.08384196 0.1669165
## 30 30 42 0.335456290 1.3144203 0.20281946 0.4096022
## 31 31 49 -0.236679098 0.7157689 0.10225270 0.2055928
## 32 32 42 -0.371331538 1.0198787 0.15737070 0.3178166
## 33 33 77 0.077111969 0.7779287 0.08865320 0.1765682
## 34 34 26 -0.370894600 1.3554668 0.26582892 0.5474849
## 35 35 38 0.083951826 0.7335080 0.11899071 0.2410981
## 36 36 70 -0.249203923 0.9643744 0.11526478 0.2299469
## 37 37 22 -0.665348082 0.7122275 0.15184741 0.3157840
## 38 38 54 -0.287545137 0.9875188 0.13438429 0.2695408
## 39 39 48 0.038325454 0.7469371 0.10781108 0.2168879
## 40 40 71 -0.259676242 1.2444905 0.14769385 0.2945661
## 41 41 60 0.072922377 0.8958935 0.11565936 0.2314338
## 42 42 58 0.022684469 0.6780823 0.08903657 0.1782927
## 43 43 61 0.147594741 0.9470311 0.12125491 0.2425459
## 44 44 29 -0.433727697 0.6801447 0.12629971 0.2587132
## 45 45 53 -0.225505181 0.8165485 0.11216156 0.2250686
## 46 46 83 -0.457403342 0.6929915 0.07606570 0.1513189
## 47 47 82 -0.122257644 0.9812088 0.10835639 0.2155952
## 48 48 2 -0.414295000 0.4033479 0.28521000 3.6239367
## 49 49 113 0.046892811 0.7810911 0.07347887 0.1455890
## 50 50 73 -0.321780110 0.9139714 0.10697226 0.2132453
## 51 51 58 -0.278070169 0.7557403 0.09923356 0.1987118
## 52 52 61 0.533366836 1.1360822 0.14546043 0.2909642
## 53 53 70 1.003548429 1.2838470 0.15344907 0.3061225
## 54 54 8 -0.633628850 0.6522696 0.23061214 0.5453111
## 55 55 51 0.717123745 0.9759933 0.13666640 0.2745025
## 56 56 38 -0.034280063 1.2405573 0.20124496 0.4077610
## 57 57 63 0.001081203 0.9535847 0.12014037 0.2401572
## 58 58 37 0.281887811 0.7257508 0.11931269 0.2419773
## 59 59 47 -1.049087153 0.7428851 0.10836093 0.2181192
## 60 60 80 0.195428218 0.9267497 0.10361376 0.2062380
## 61 61 64 -0.052334084 1.0869840 0.13587300 0.2715205
## 62 62 71 0.037990287 0.8509089 0.10098431 0.2014068
## 63 63 30 0.735678180 0.7038345 0.12850201 0.2628161
## 64 64 59 0.343793698 0.9803036 0.12762466 0.2554685
## 65 65 80 -0.308687153 0.8915633 0.09967981 0.1984077
require(psych)
psych::describe (dd)
## vars n mean sd median trimmed mad min max range skew
## school* 1 4059 31.01 18.94 29.00 30.51 23.72 1.00 65.00 64.00 0.20
## normexam 2 4059 0.00 1.00 0.00 0.00 1.00 -3.67 3.67 7.33 0.00
## schgend* 3 4059 1.80 0.91 1.00 1.76 0.00 1.00 3.00 2.00 0.39
## schavg 4 4059 0.00 0.31 -0.02 0.01 0.32 -0.76 0.64 1.39 -0.20
## vr* 5 4059 2.13 0.65 2.00 2.16 0.00 1.00 3.00 2.00 -0.13
## intake* 6 4059 1.84 0.63 2.00 1.80 0.00 1.00 3.00 2.00 0.14
## standLRT 7 4059 0.00 0.99 0.04 0.02 0.98 -2.93 3.02 5.95 -0.13
## sex* 8 4059 1.40 0.49 1.00 1.37 0.00 1.00 2.00 1.00 0.41
## type* 9 4059 1.47 0.50 1.00 1.46 0.00 1.00 2.00 1.00 0.14
## student* 10 4059 121.67 134.99 78.00 92.20 69.68 1.00 650.00 649.00 2.12
## kurtosis se
## school* -1.23 0.30
## normexam -0.02 0.02
## schgend* -1.69 0.01
## schavg -0.35 0.00
## vr* -0.69 0.01
## intake* -0.57 0.01
## standLRT 0.07 0.02
## sex* -1.83 0.01
## type* -1.98 0.01
## student* 4.19 2.12
# we specify the null model
m.0 <- lme(fixed = normexam ~ 1, # 1, a constant, predicts and results in mean
data = dd,
random = ~ 1 | school,
method="ML") # levels are defined, school defines level two
m.0
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## Log-likelihood: -5505.324
## Fixed: normexam ~ 1
## (Intercept)
## -0.01316707
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 0.4106567 0.9207391
##
## Number of Observations: 4059
## Number of Groups: 65
# ... we find mean in fixed (Intercept)
# ... we find sd of residuals for (Intercept) level 2 and level 1 Residual
# we add a level 1 predictor
# random intercept, fixed predictor in individual level
m.1 <- lme(fixed = normexam ~ standLRT, random = ~ 1 | school, data = dd, method="ML")
m.1
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## Log-likelihood: -4678.622
## Fixed: normexam ~ standLRT
## (Intercept) standLRT
## 0.002390757 0.563371165
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 0.303528 0.7521509
##
## Number of Observations: 4059
## Number of Groups: 65
# ... sd of Intercept goes down
# ... sd of Residuals goes down
fixef(m.1)
## (Intercept) standLRT
## 0.002390757 0.563371165
head(ranef(m.1))
## (Intercept)
## 1 0.37376075
## 2 0.50204393
## 3 0.50388972
## 4 0.01813123
## 5 0.24043139
## 6 0.54139588
# we add a level 2 predictor
# random intercept, random slope
m.2 <- lme(fixed = normexam ~ standLRT, random = ~ standLRT | school, data = dd, method="ML")
m.2
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## Log-likelihood: -4658.435
## Fixed: normexam ~ standLRT
## (Intercept) standLRT
## -0.01150385 0.55672932
##
## Random effects:
## Formula: ~standLRT | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.3007325 (Intr)
## standLRT 0.1205756 0.497
## Residual 0.7440805
##
## Number of Observations: 4059
## Number of Groups: 65
fixef(m.2)
## (Intercept) standLRT
## -0.01150385 0.55672932
head(ranef(m.2))
## (Intercept) standLRT
## 1 0.37492945 0.12497525
## 2 0.47020445 0.16472812
## 3 0.47978037 0.08084307
## 4 0.03500977 0.12720641
## 5 0.24627676 0.07205334
## 6 0.51840503 0.05859459
# we can add a level 2 predictor on level 1 also
# random intercept, individual and group level predictor. There is a random slope also.
m.3 <- lme(fixed = normexam ~ standLRT + schavg, random = ~ standLRT | school, data = dd, method="ML")
m.3
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## Log-likelihood: -4655.215
## Fixed: normexam ~ standLRT + schavg
## (Intercept) standLRT schavg
## -0.001238744 0.552391343 0.294788775
##
## Random effects:
## Formula: ~standLRT | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.2729164 (Intr)
## standLRT 0.1220248 0.379
## Residual 0.7440531
##
## Number of Observations: 4059
## Number of Groups: 65
fixef(m.3)
## (Intercept) standLRT schavg
## -0.001238744 0.552391343 0.294788775
head(ranef(m.3))
## (Intercept) standLRT
## 1 0.316023289 0.12335833
## 2 0.352660821 0.15783280
## 3 0.342163770 0.05858984
## 4 -0.002440117 0.13442592
## 5 0.181470405 0.06233838
## 6 0.349439693 0.03347502
# we can also add cross-level interaction
# random intercept, cross-level interaction
m.4 <- lme(fixed = normexam ~ standLRT * schavg, random = ~ standLRT | school, data=dd, method="ML")
fixef(m.4)
## (Intercept) standLRT schavg standLRT:schavg
## -0.007008473 0.558239557 0.373510221 0.162219019
head(ranef(m.4))
## (Intercept) standLRT
## 1 0.307074694 0.09386613
## 2 0.320212211 0.10349247
## 3 0.288467624 0.01089639
## 4 -0.004713493 0.10997334
## 5 0.166694142 0.04226494
## 6 0.277333995 -0.02955381
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
deviance(m.0)
## 'log Lik.' 11010.65 (df=3)
deviance(m.1)
## 'log Lik.' 9357.243 (df=4)
anova(m.0)
## numDF denDF F-value p-value
## (Intercept) 1 3994 0.06026954 0.8061
anova(m.1, test="F")
## numDF denDF F-value p-value
## (Intercept) 1 3993 0.0924 0.7612
## standLRT 1 3993 2041.5623 <.0001
# which is default
anova(m.1)
## numDF denDF F-value p-value
## (Intercept) 1 3993 0.0924 0.7612
## standLRT 1 3993 2041.5623 <.0001
Ecologic fallacy: Systematiken auf höheren Ebenen werden als Effekt auf level 1 fehlinterpretiert.
y
ist Reaktionsvariable x
ist
Vorhersagevariable auf Level 1 treatment
ist
Gruppierungsvariable für Level 2 g.eff
ist Variable auf
Level 2 ev. subj
kann benutzt werden als zweite Level 2
Variable (Betrachtung als Messwiederholungsdesign)
Szenario:
Repeated-Measures-ANOVA würde Mittelwertsunterschiede zwischen den Schultypen prüfen. Schultyp würde als kategoriale Variable aufgefasst.
(Es handelt sich um simulierte Daten)
# we read data like generated above
df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/div/schooltype_future.txt")
df$schooltype <- factor(df$schooltype)
# correlation is
cor(df$x, df$y)
## [1] 0.7397159
# complete scatterplot seems to visualize this correlation, we might notice some asymmetry in distribution
require(ggplot2)
# we visualize the random intercept random slope model
pplot <- ggplot(data=df, aes(x=x, y=y)) +
#geom_point(aes(color=Subject)) +
geom_point() +
#geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
geom_smooth(method = lm, se=F, size=1) +
xlab('cognitive abilities') +
ylab('future perspectives')
pplot
## `geom_smooth()` using formula 'y ~ x'
# but if we look at the groups ...
pplot <- ggplot(data=df, aes(x=x, y=y, color=schooltype, shape=schooltype)) +
#geom_point(aes(color=Subject)) +
geom_point() +
#geom_smooth(method="lm", fill=NA)
#geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
#geom_smooth(method = lm, se=F, size=2, )
xlab('cognitive abilities') +
ylab('future perspectives')
pplot
# group specific correlations show huge differences
cor(df[df$schooltype == 'a', 'x'], df[df$schooltype=='a', 'y'])
## [1] 0.7694075
cor(df[df$schooltype == 'b', 'x'], df[df$schooltype=='b', 'y'])
## [1] 0.2675309
cor(df[df$schooltype == 'c', 'x'], df[df$schooltype=='c', 'y'])
## [1] -0.763508
# and the descriptives
require(Rmisc)
summarySE(data=df, "x", c("schooltype"))
## schooltype N x sd se ci
## 1 a 100 2.93823842 1.775213 0.1775213 0.3522409
## 2 b 100 -0.01171805 1.679583 0.1679583 0.3332657
## 3 c 100 -2.90880335 1.736970 0.1736970 0.3446526
summarySE(data=df, "y", c("schooltype"))
## schooltype N y sd se ci
## 1 a 100 11.1699977 4.427624 0.4427624 0.8785366
## 2 b 100 -0.0644219 2.941416 0.2941416 0.5836408
## 3 c 100 -9.2367355 4.755953 0.4755953 0.9436843
summarySE(data=df, "y")
## .id N y sd se ci
## 1 <NA> 300 0.6229468 9.312347 0.5376486 1.058055
# require(psych)
# psych::describeBy(df, group="schooltype")
# psych::describe(df)
# we fit a repeated measures anova
require(ez)
m.ez <- ezANOVA(data=data.frame(df), dv=y, wid=subj, within=schooltype, detailed=T, return_aov = T)
## Warning: Converting "subj" to factor for ANOVA.
m.ez
## $ANOVA
## Effect DFn DFd SSn SSd F p p<.05
## 1 (Intercept) 1 99 116.4188 1561.714 7.380009 7.785155e-03 *
## 2 schooltype 2 198 20892.6094 3474.899 595.231235 1.815405e-84 *
## ges
## 1 0.02259229
## 2 0.80575535
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 schooltype 0.9606459 0.1398308
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
## 2 schooltype 0.962136 2.224446e-81 * 0.9808178 6.660063e-83 *
##
## $aov
##
## Call:
## aov(formula = formula(aov_formula), data = data)
##
## Grand Mean: 0.6229468
##
## Stratum 1: subj
##
## Terms:
## Residuals
## Sum of Squares 1561.714
## Deg. of Freedom 99
##
## Residual standard error: 3.971761
##
## Stratum 2: subj:schooltype
##
## Terms:
## schooltype Residuals
## Sum of Squares 20892.609 3474.899
## Deg. of Freedom 2 198
##
## Residual standard error: 4.189271
## Estimated effects may be unbalanced
require(nlme)
# for demonstration reasons we fit a level 1 only zero model
df.t <- cbind(df, 1)
names(df.t) <- c("subj", "schooltype", "g.eff", "x", "y", 'const')
m.00 <- lme(fixed = y ~ 1, random = ~ 1 | const, data=df.t, method='ML')
# fitted value is the same for all subjects
unique(predict(m.00))
## [1] 0.6229468
# correlation cannot be computed due to the missing variance in the predicted values
# zero multi level model
m.0 <- lme(fixed = y ~ 1, random = ~1 | schooltype, data=df, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
## Data: df
## AIC BIC logLik
## 1724.644 1735.755 -859.3218
##
## Random effects:
## Formula: ~1 | schooltype
## (Intercept) Residual
## StdDev: 8.335013 4.118045
##
## Fixed effects: y ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 0.6229468 4.826142 297 0.1290776 0.8974
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.383275000 -0.669782463 -0.006185372 0.624697233 2.799833738
##
## Number of Observations: 300
## Number of Groups: 3
# get colors
g <- ggplot_build(pplot)
c.colors <- unique(g$data[[1]]["colour"])
# fitted value is the same for all subjects within a level2 group
unique(predict(m.0))
## [1] 11.14431494 -0.06274811 -9.21272654
# ... random intercept of schooltype 'explains' variance and reduces residual variance
pplot +
#geom_hline(yintercept=m.0$coefficients$random[['schooltype']][1], color=c1)
geom_hline(yintercept=m.0$coefficients$random[['schooltype']][1], color=c.colors[1,1]) +
geom_hline(yintercept=m.0$coefficients$random[['schooltype']][2], color=c.colors[2,1]) +
geom_hline(yintercept=m.0$coefficients$random[['schooltype']][3], color=c.colors[3,1])
# level 1 predictor, random intercept
m.1 <- lme(fixed = y ~ 1 + x, random = ~1 | schooltype, data=df, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
## Data: df
## AIC BIC logLik
## 1725.689 1740.504 -858.8447
##
## Random effects:
## Formula: ~1 | schooltype
## (Intercept) Residual
## StdDev: 8.012678 4.113069
##
## Fixed effects: y ~ 1 + x
## Value Std.Error DF t-value p-value
## (Intercept) 0.6221491 4.647731 296 0.1338608 0.8936
## x 0.1350708 0.137988 296 0.9788598 0.3284
## Correlation:
## (Intr)
## x 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.43382547 -0.69250697 -0.01532247 0.56627952 2.71538500
##
## Number of Observations: 300
## Number of Groups: 3
pplot +
geom_abline(intercept = fixef(m.1)[1] + ranef(m.1)[1,1], slope = fixef(m.1)[2], color=c.colors[1,1]) +
geom_abline(intercept = fixef(m.1)[1] + ranef(m.1)[2,1], slope = fixef(m.1)[2], color=c.colors[2,1]) +
geom_abline(intercept = fixef(m.1)[1] + ranef(m.1)[3,1], slope = fixef(m.1)[2], color=c.colors[3,1]) +
geom_abline(intercept = fixef(m.1)[1], slope = fixef(m.1)[2], size = 2)
# we don't want to run into an optimization error ("iteration limit reached without convergence")
ctrl <- lmeControl(opt='optim');
# flow.lme <- lme(rate ~ nozzle, error= nozzle|operator, control=ctrl, data=Flow);
# level 2 variable, random intercept and random slope
m.2 <- lme(fixed = y ~ 1 + x, random = ~ 1 + x | schooltype, data=df, control=ctrl, method='ML')
summary(m.2)
## Linear mixed-effects model fit by maximum likelihood
## Data: df
## AIC BIC logLik
## 1527.571 1549.793 -757.7853
##
## Random effects:
## Formula: ~1 + x | schooltype
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 8.966983 (Intr)
## x 1.623816 1
## Residual 2.923098
##
## Fixed effects: y ~ 1 + x
## Value Std.Error DF t-value p-value
## (Intercept) -3.1403554 5.198685 296 -0.6040673 0.5463
## x 0.1005276 0.944947 296 0.1063844 0.9153
## Correlation:
## (Intr)
## x 0.996
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.4952414 -0.7107150 0.0242678 0.7058308 2.9059657
##
## Number of Observations: 300
## Number of Groups: 3
# we visualize the group specific random intercept and random slope
pplot +
geom_abline(intercept = fixef(m.2)[1] + ranef(m.2)[1,1], slope = fixef(m.2)[2] + ranef(m.2)[1,2], color=c.colors[1,1]) +
geom_abline(intercept = fixef(m.2)[1] + ranef(m.2)[2,1], slope = fixef(m.2)[2] + ranef(m.2)[2,2], color=c.colors[2,1]) +
geom_abline(intercept = fixef(m.2)[1] + ranef(m.2)[3,1], slope = fixef(m.2)[2] + ranef(m.2)[3,2], color=c.colors[3,1]) +
geom_abline(intercept = fixef(m.2)[1], slope = fixef(m.2)[2], size = 2)
# we compare the models using liklihood ratio test
anova(m.0, m.1, m.2)
## Model df AIC BIC logLik Test L.Ratio p-value
## m.0 1 3 1724.643 1735.755 -859.3218
## m.1 2 4 1725.689 1740.505 -858.8447 1 vs 2 0.95419 0.3287
## m.2 3 6 1527.571 1549.793 -757.7853 2 vs 3 202.11867 <.0001
anova(m.0, m.2)
## Model df AIC BIC logLik Test L.Ratio p-value
## m.0 1 3 1724.643 1735.755 -859.3218
## m.2 2 6 1527.571 1549.793 -757.7853 1 vs 2 203.0729 <.0001
# we compare the explained variance of the various models
cor(df$y, predict(m.0))
## [1] 0.8976388
cor(df$y, predict(m.1))
## [1] 0.8978998
cor(df$y, predict(m.2))
## [1] 0.9498179
# ... the school-type specific random slope increases model fit a lot
# we compare the fixed effects
fixef(m.0)
## (Intercept)
## 0.6229468
fixef(m.1)
## (Intercept) x
## 0.6221491 0.1350708
fixef(m.2)
## (Intercept) x
## -3.1403554 0.1005276
# ... in m.1 and m.2 we see the fixed parameters (intercept and slope)
# ... for model m.2 this is the fixed part of the parameters, random part has to be added to this
# we compare the random effects
ranef(m.0)
## (Intercept)
## a 10.5213682
## b -0.6856949
## c -9.8356733
ranef(m.1)
## (Intercept)
## a 10.124301
## b -0.683188
## c -9.441113
ranef(m.2)
## (Intercept) x
## a 9.169431 1.6660904
## b 2.996174 0.5337865
## c -12.165606 -2.1998769
# ... for every school-type we get a specific intercept
# ... random intercepts of m.1 are almost the same when we add level 1 predictor x
# ... we can see the level 2 random slope component to be added to the fixed slope
# ... we can see the big differences in random slope, due to the school-type
# coefficients() works also with mle result objects
# it returns the aggregated coefficients
# fixed and random parts are already added
coefficients(m.0)
## (Intercept)
## a 11.14431494
## b -0.06274811
## c -9.21272654
coefficients(m.1)
## (Intercept) x
## a 10.74645018 0.1350708
## b -0.06103895 0.1350708
## c -8.81896400 0.1350708
coefficients(m.2)
## (Intercept) x
## a 6.0290760 1.7666181
## b -0.1441811 0.6343141
## c -15.3059610 -2.0993493
# a look at the variances
VarCorr(m.0)
## schooltype = pdLogChol(1)
## Variance StdDev
## (Intercept) 69.47245 8.335013
## Residual 16.95829 4.118045
VarCorr(m.1)
## schooltype = pdLogChol(1)
## Variance StdDev
## (Intercept) 64.20301 8.012678
## Residual 16.91734 4.113069
VarCorr(m.2)
## schooltype = pdLogChol(1 + x)
## Variance StdDev Corr
## (Intercept) 80.406782 8.966983 (Intr)
## x 2.636777 1.623816 1
## Residual 8.544504 2.923098
# ... variances of residuals go down
# we look at ICC
VarCorr(m.0)
## schooltype = pdLogChol(1)
## Variance StdDev
## (Intercept) 69.47245 8.335013
## Residual 16.95829 4.118045
var.u0 = as.numeric(VarCorr(m.0)[1])
var.epsilon = as.numeric(VarCorr(m.0)[2])
ICC = var.u0 / (var.u0 + var.epsilon)
ICC
## [1] 0.8037933
#require(psychometric)
#ICC1.lme(y, grp=schooltype, data=df)
Zur Interpretation der Ausgabe, speziell zu random StdDev post
sowie
Verschiedene Herangehensweise an denselben Datensatz.
Variable | Beschreibung |
---|---|
school | Schule, 1…65 |
student | Schüler-Nummer, 1 … 198, unter Schule genested, maximal 198 Schüler aus einer Schule |
y | Score (of what?) vielleicht Abschlussnote |
X1 | Reading test |
x2 | binär Female = 1 |
x3 | Type of school Schulform: |
# the data file comes in Stata format *.dta
require(foreign)
## Lade nötiges Paket: foreign
dd <- read.dta("http://md.psych.bio.uni-goettingen.de/mv/data/div/schools.dta")
attributes(dd)$var.labels
## [1] "School id" "Student id" "Score" "Reading test"
## [5] "Female=1" "Type of school"
# we rename vars according to labels
require(dplyr)
dd <- dplyr::rename(dd, score=y, reading.test=x1, gender.f=x2, schooltype=x3)
# and make school and student factors
dd$school <- factor(dd$school)
dd$student <- factor(dd$student)
dd$gender <- factor(dd$gender)
require(ggplot2)
# get a plot of variability
ggplot(data=dd, aes(x=school, y=score)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# ... schools seem to be different in their score
# individual regressions
ggplot(data=dd, aes(x=reading.test, y=score, color=school)) + geom_point() + geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula 'y ~ x'
# or a bit clearer
ggplot(data=dd, aes(x=reading.test, y=score, color=school)) + geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula 'y ~ x'
# ... slopes seem to vary considerably
# global correlation
cor(dd$reading.test, dd$score)
## [1] 0.5916495
# school specific correlation
require(dplyr)
cors <- dd %>% dplyr::group_by(school) %>% dplyr::summarize(COR=cor(score,reading.test))
# we sort correlations
cors <- dplyr::arrange(cors, COR)
head(cors)
## # A tibble: 6 × 2
## school COR
## <fct> <dbl>
## 1 48 -1
## 2 54 0.0357
## 3 7 0.270
## 4 18 0.283
## 5 10 0.336
## 6 63 0.358
# something is strange with school 48, we take a look at it
dd[dd$school == 48,]
## school student score reading.test gender.f schooltype gender
## 3055 48 1 -6.9951 -3.7276 1 Girls only 1
## 3056 48 2 -1.2908 -4.5541 1 Girls only 1
# we exclude it, only two students took part
cors <- cors[2:length(cors$school),]
cors$o.nr <- 1:length(cors$school)
head(cors)
## # A tibble: 6 × 3
## school COR o.nr
## <fct> <dbl> <int>
## 1 54 0.0357 1
## 2 7 0.270 2
## 3 18 0.283 3
## 4 10 0.336 4
## 5 63 0.358 5
## 6 24 0.366 6
tail(cors)
## # A tibble: 6 × 3
## school COR o.nr
## <fct> <dbl> <int>
## 1 30 0.706 59
## 2 4 0.708 60
## 3 50 0.730 61
## 4 32 0.731 62
## 5 34 0.736 63
## 6 53 0.785 64
# a graph of the correlations in schools
ggplot(data=cors, aes(x=o.nr, y=COR)) + geom_point() + geom_smooth(se=F)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
# our correlations vary substancially
# we fit a simple anova
require(ez)
m.ez <- ezANOVA(data=dd, dv=score, wid=student, between=school, detailed=T, return_aov = T)
## Warning: The column supplied as the wid variable contains non-unique values
## across levels of the supplied between-Ss variables. Automatically fixing this by
## generating unique wid labels.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
m.ez
## $ANOVA
## Effect DFn DFd SSn SSd F p p<.05 ges
## 1 school 64 3994 66357.44 338582.9 12.23074 9.338734e-112 * 0.1638697
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 64 3994 7518.794 118575.1 3.957152 2.200363e-23 *
##
## $aov
## Call:
## aov(formula = formula(aov_formula), data = data)
##
## Terms:
## school Residuals
## Sum of Squares 66357.4 338582.9
## Deg. of Freedom 64 3994
##
## Residual standard error: 9.207219
## Estimated effects may be unbalanced
summary(m.ez$aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## school 64 66357 1036.8 12.23 <2e-16 ***
## Residuals 3994 338583 84.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ... shows a significant factor school. And ...
# multiple correlation
cor(m.ez[["aov"]]$fitted.values, dd$score)
## [1] -0.03577007
# squared multiple correlation
cor(m.ez[["aov"]]$fitted.values, dd$score)^2
## [1] 0.001279498
# ... not too much explained variance in score
rm(m.ez)
require(nlme)
# zero multi level model
m.0 <- lme(fixed = score ~ 1, random = ~1 | school, data=dd, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## AIC BIC logLik
## 29709 29727.93 -14851.5
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 4.10655 9.207357
##
## Fixed effects: score ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) -0.1317103 0.5363377 3994 -0.2455735 0.806
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.94714971 -0.64855061 0.01168497 0.69918956 3.65785029
##
## Number of Observations: 4059
## Number of Groups: 65
# get colors
# g <- ggplot_build(pplot)
# c.colors <- unique(g$data[[1]]["colour"])
# fitted value is the same for all subjects within a level2 group
head(predict(m.0), n=100)
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 1 1 1 1 1 1 1
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637
## 1 2 2 2 2 2 2 2
## 4.680637 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112
## 2 2 2 2 2 2 2 2
## 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112
## 2 2 2 2 2 2 2 2
## 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112
## 2 2 2 2
## 7.164112 7.164112 7.164112 7.164112
unique(predict(m.0))
## [1] 4.6806368626 7.1641117492 7.7886860696 0.6843289745 3.5125920556
## [6] 8.8794480695 3.6962759755 -0.4655025204 -3.8125809168 -2.4598620714
## [11] 5.1443700298 -0.6674877850 -2.2881692109 0.1485128656 -0.3917076145
## [16] -2.4109877309 -2.3651756403 -0.0643696476 1.8789986489 4.4059230567
## [21] 3.5496526677 -4.7254338739 -6.2736306171 0.0955018007 -5.7445284331
## [26] -3.6822445431 -2.9285276235 -8.0305067022 0.5891802575 2.9818667380
## [31] -2.1588729661 -3.3304637328 0.7157623990 -3.1294199488 0.7260163062
## [36] -2.3339162499 -5.4404178590 -2.6418102507 0.3343883874 -2.4337806121
## [41] 0.6626199905 0.1982296764 1.3535040525 -3.7159620819 -2.0711088565
## [46] -4.3203510148 -1.1595917088 -1.2733624538 0.4433082501 -3.0189551959
## [51] -2.5694542961 4.9174902526 9.3541940739 -3.9419503186 6.5158852595
## [56] -0.3181838079 0.0002605729 2.4658954630 -9.4899661148 1.8308870588
## [61] -0.4948790548 0.3460303496 6.2819733913 3.1576121006 -2.9121524697
# but means vary
unique(predict(m.0))[order(unique(predict(m.0)))]
## [1] -9.4899661148 -8.0305067022 -6.2736306171 -5.7445284331 -5.4404178590
## [6] -4.7254338739 -4.3203510148 -3.9419503186 -3.8125809168 -3.7159620819
## [11] -3.6822445431 -3.3304637328 -3.1294199488 -3.0189551959 -2.9285276235
## [16] -2.9121524697 -2.6418102507 -2.5694542961 -2.4598620714 -2.4337806121
## [21] -2.4109877309 -2.3651756403 -2.3339162499 -2.2881692109 -2.1588729661
## [26] -2.0711088565 -1.2733624538 -1.1595917088 -0.6674877850 -0.4948790548
## [31] -0.4655025204 -0.3917076145 -0.3181838079 -0.0643696476 0.0002605729
## [36] 0.0955018007 0.1485128656 0.1982296764 0.3343883874 0.3460303496
## [41] 0.4433082501 0.5891802575 0.6626199905 0.6843289745 0.7157623990
## [46] 0.7260163062 1.3535040525 1.8308870588 1.8789986489 2.4658954630
## [51] 2.9818667380 3.1576121006 3.5125920556 3.5496526677 3.6962759755
## [56] 4.4059230567 4.6806368626 4.9174902526 5.1443700298 6.2819733913
## [61] 6.5158852595 7.1641117492 7.7886860696 8.8794480695 9.3541940739
mean(dd$score[dd$school == 1])
## [1] 5.012035
# ... not surprising: each school has its own mean score which is the predicted value
ggplot(data=dd, aes(x=reading.test, y=predict(m.0), color=school)) + geom_point()
# level 1 predictor, random intercept
m.1 <- lme(fixed = score ~ 1 + reading.test, random = ~1 | school, data=dd, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## AIC BIC logLik
## 28057.6 28082.83 -14024.8
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 3.035269 7.521481
##
## Fixed effects: score ~ 1 + reading.test
## Value Std.Error DF t-value p-value
## (Intercept) 0.0238706 0.4003241 3993 0.05963 0.9525
## reading.test 0.5633697 0.0124684 3993 45.18366 0.0000
## Correlation:
## (Intr)
## reading.test 0.008
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.7161717 -0.6304245 0.0286690 0.6844298 3.2680305
##
## Number of Observations: 4059
## Number of Groups: 65
ggplot(data=dd, aes(x=reading.test, y=predict(m.1), color=school)) + geom_point()
ggplot(data=dd, aes(x=predict(m.1), y=score, color=school)) + geom_point()
# level 2 variable, random intercept and random slope
#m.2 <- lme(fixed = score ~ 1 + reading.test, random = ~ 1 + reading.test | school, data=dd, control=ctrl, method='ML')
m.2 <- lme(fixed = score ~ 1 + reading.test, random = ~ 1 + reading.test | school, data=dd, method='ML')
summary(m.2)
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## AIC BIC logLik
## 28021.23 28059.08 -14004.61
##
## Random effects:
## Formula: ~1 + reading.test | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 3.0073339 (Intr)
## reading.test 0.1205752 0.497
## Residual 7.4407756
##
## Fixed effects: score ~ 1 + reading.test
## Value Std.Error DF t-value p-value
## (Intercept) -0.1150742 0.3979198 3993 -0.289189 0.7725
## reading.test 0.5567278 0.0199429 3993 27.916142 0.0000
## Correlation:
## (Intr)
## reading.test 0.365
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.83123290 -0.63247504 0.03404161 0.68320566 3.45617377
##
## Number of Observations: 4059
## Number of Groups: 65
ggplot(data=dd, aes(x=reading.test, y=predict(m.2), color=school)) + geom_point()
# we compare the models using liklihood ratio test
anova(m.0, m.1, m.2)
## Model df AIC BIC logLik Test L.Ratio p-value
## m.0 1 3 29709.00 29727.93 -14851.50
## m.1 2 4 28057.60 28082.83 -14024.80 1 vs 2 1653.4061 <.0001
## m.2 3 6 28021.23 28059.08 -14004.61 2 vs 3 40.3722 <.0001
anova(m.0, m.2)
## Model df AIC BIC logLik Test L.Ratio p-value
## m.0 1 3 29709.00 29727.93 -14851.50
## m.2 2 6 28021.23 28059.08 -14004.61 1 vs 2 1693.778 <.0001
# we compare the explained variance of the various models
cor(dd$score, predict(m.0))
## [1] 0.4044815
cor(dd$score, predict(m.1))
## [1] 0.6641689
cor(dd$score, predict(m.2))
## [1] 0.6764199
# ... the school-type specific random slope increases model fit a lot
# we compare the fixed effects
fixef(m.0)
## (Intercept)
## -0.1317103
fixef(m.1)
## (Intercept) reading.test
## 0.02387062 0.56336973
fixef(m.2)
## (Intercept) reading.test
## -0.1150742 0.5567278
# ... in m.1 and m.2 we see the fixed parameters (intercept and slope)
# ... for model m.2 this is the fixed part of the parameters, random part has to be added to this
# we compare the random effects
head(ranef(m.0))
## (Intercept)
## 1 4.8123472
## 2 7.2958221
## 3 7.9203964
## 4 0.8160393
## 5 3.6443024
## 6 9.0111584
head(ranef(m.1))
## (Intercept)
## 1 3.7375959
## 2 5.0204297
## 3 5.0388872
## 4 0.1813312
## 5 2.4043096
## 6 5.4139221
head(ranef(m.2))
## (Intercept) reading.test
## 1 3.7492877 0.12497492
## 2 4.7020428 0.16472764
## 3 4.7978126 0.08084055
## 4 0.3501122 0.12720817
## 5 2.4627687 0.07205185
## 6 5.1840346 0.05859261
# ... for every school we get a specific intercept
# ... random intercepts of m.1 are almost the same when we add level 1 predictor x
# ... we can see the level 2 random slope component to be added to the fixed slope
# ... we can see the big differences in random slope, due to the school
# coefficients() works also with mle result objects
# it returns the aggregated coefficients
# fixed and random parts are already added
head(coefficients(m.0))
## (Intercept)
## 1 4.680637
## 2 7.164112
## 3 7.788686
## 4 0.684329
## 5 3.512592
## 6 8.879448
head(coefficients(m.1))
## (Intercept) reading.test
## 1 3.7614666 0.5633697
## 2 5.0443003 0.5633697
## 3 5.0627578 0.5633697
## 4 0.2052018 0.5633697
## 5 2.4281802 0.5633697
## 6 5.4377927 0.5633697
head(coefficients(m.2))
## (Intercept) reading.test
## 1 3.634213 0.6817028
## 2 4.586969 0.7214555
## 3 4.682738 0.6375684
## 4 0.235038 0.6839360
## 5 2.347694 0.6287797
## 6 5.068960 0.6153205
# a look at the variances
VarCorr(m.0)
## school = pdLogChol(1)
## Variance StdDev
## (Intercept) 16.86375 4.106550
## Residual 84.77542 9.207357
VarCorr(m.1)
## school = pdLogChol(1)
## Variance StdDev
## (Intercept) 9.212858 3.035269
## Residual 56.572672 7.521481
VarCorr(m.2)
## school = pdLogChol(1 + reading.test)
## Variance StdDev Corr
## (Intercept) 9.04405725 3.0073339 (Intr)
## reading.test 0.01453838 0.1205752 0.497
## Residual 55.36514147 7.4407756
# ... variances of residuals go down
# we look at ICC
VarCorr(m.0)
## school = pdLogChol(1)
## Variance StdDev
## (Intercept) 16.86375 4.106550
## Residual 84.77542 9.207357
var.u0 = as.numeric(VarCorr(m.0)[1])
var.epsilon = as.numeric(VarCorr(m.0)[2])
ICC = var.u0 / (var.u0 + var.epsilon)
ICC
## [1] 0.1659178
# ICC m.1
as.numeric(VarCorr(m.1)[1]) / (as.numeric(VarCorr(m.1)[1]) + as.numeric(VarCorr(m.1)[2]))
## [1] 0.1400438
# ICC m.2
as.numeric(VarCorr(m.2)[1]) / (as.numeric(VarCorr(m.2)[1]) + as.numeric(VarCorr(m.2)[2]))
## [1] 0.9983951
require(psychometric)
## Lade nötiges Paket: psychometric
## Lade nötiges Paket: multilevel
## Lade nötiges Paket: MASS
##
## Attache Paket: 'MASS'
## Das folgende Objekt ist maskiert 'package:dplyr':
##
## select
##
## Attache Paket: 'psychometric'
## Das folgende Objekt ist maskiert 'package:psych':
##
## alpha
## Das folgende Objekt ist maskiert 'package:ggplot2':
##
## alpha
ICC1.lme(score, grp=school, data=dd)
## [1] 0.1683408
Leider keine Dokumentation zu den Daten.
Inhalt
Die SES erlaubt die Messung und differenzierte Beschreibung der subjektiv wahrgenommenen Schmerzen. Der Fragebogen besteht aus 24 Items, die sich 5 Teilskalen zuordnen lassen. Zwei Skalen beschreiben affektive Merkmale der Schmerzempfindung: «allgemeine affektive Schmerzangabe» und «Schmerzangabe der Hartnäckigkeit». Beide Merkmale aufsummiert bilden die Globalskala «SES-affektiv». Die drei weiteren Skalen beschreiben sensorische Aspekte der Schmerzempfindung: «sensorische Schmerzangabe der Rhythmik», «sensorische Schmerzangabe des lokalen Eindringens» und «sensorische Schmerzangabe der Temperatur». Die drei Merkmale aufsummiert bilden die Globalskala «SES-sensorisch». Um die Ökonomie der Durchführung bei umfassenden Studien zu erhöhen, sind drei Survey-Versionen, getrennt nach dem jeweiligen Beurteilungszeitraum, beziehbar. Instruktion und Items sind auf nur einem DIN-A4-Bogen abgebildet.
### !/usr/local/bin/R
# Xavier Fernández-i-Marín
# December 2004
# http://xavier-fim.net/
# load the example datasets from HLM/AppendixA/Examples
# Download them directly. Requires Internet connection
hsb1 <- read.table(file="http://xavier-fim.net/R/scripts/hsb1.dat")
hsb2 <- read.table(file="http://xavier-fim.net/R/scripts/hsb2.dat")
# get the variables and basic information of each dataset
names(hsb1)
length(hsb1$ID)
summary(as.data.frame(hsb1))
names(hsb2)
length(hsb2$ID)
summary(as.data.frame(hsb2))
# load the package to do Linear Mixed-Effects Models
library(nlme)
# create a new dataset with the merged data
ds <- merge(as.data.frame(hsb1), as.data.frame(hsb2))
### model specification
# 1
mod.1 <- lme(fixed = MATHACH ~ 1,
random = ~ 1 | ID,
data=ds, method="ML"
)
# 2
mod.2 <- lme(fixed = MATHACH ~ SES,
random = ~ 1 | ID,
data=ds, method="ML"
)
# 3
mod.3 <- lme(fixed = MATHACH ~ SES + SECTOR,
random = ~ 1 | ID,
data=ds, method="ML"
)
# 4
mod.4 <- lme(fixed = MATHACH ~ SES + SECTOR,
random = ~ 1 + SES | ID,
data=ds, method="ML"
)
# 5
# todo: convergence error
# mod.5 <- lme(fixed = MATHACH ~ SES * SECTOR,
# random = ~ 1 + SES | ID,
# data=ds, method="ML"
# )
# Extract information about the models
print(mod.1) # basic information
summary(mod.1) # complete information
summary(mod.2)
summary(mod.3)
summary(mod.4)
# summary(mod.5)
# attributes of an 'lmeObject'
mod.1$dims # dimensions
mod.1$coef # coefficients
mod.1$call # the formula used to produce the object
head(mod.1$residuals) # residuals of the model
head(mod.1$varFix) # matrix with the fixed effects
mod.1$sigma # estimated within-group error standard deviation
anova(mod.1)
# Plots
plot(mod.1)
# by default, the plot of an lme object is
# form=resid(., type = "p") ~ fitted(.)
# that corresponds to the standardized residuals vs predicted values
plot(mod.1, resid(.) ~ fitted(.) | SECTOR)
# idem, but comparing the sectors
qqnorm(mod.1) # normality of the residuals
interaction.plot(ds$SES, ds$SECTOR, ds$MATHACH, col=c("blue", "red"))
# show one of the plotting abilities of R
# standardized residuals vs predicted values, by sector
plot(mod.3, resid(., type = "p") ~ fitted(.) | SECTOR, abline = 0)
# boxplots with the residuals by school
# (it's just an example, there are too many objects in level 2)
plot(mod.4, ID ~ resid(.))
dd = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13,
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07,
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L,
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score",
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
# or using data.frame()
dd <- data.frame(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L))
# or more realistic
dd = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41,
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01,
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E",
"F", "G", "H"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"),
Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM",
"2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject",
"Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
# a first plot
library(ggplot2)
qplot(Time, Score, data = dd, geom = "line", group = Subject, colour = factor(Condition))
# using mle
library(nlme)
m1 <- lme(Score ~ Condition + Time + Condition*Time, data = dd, random = ~ 1 | Subject)
summary(m1)
## Linear mixed-effects model fit by REML
## Data: dd
## AIC BIC logLik
## 41.6678 48.79077 -12.8339
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 0.2370475 0.3365843
##
## Fixed effects: Score ~ Condition + Time + Condition * Time
## Value Std.Error DF t-value p-value
## (Intercept) 1.7450 0.2058400 12 8.477457 0.0000
## ConditionYes 1.5600 0.2911018 6 5.358951 0.0017
## Time2PM 0.4925 0.2380010 12 2.069319 0.0608
## Time3PM 1.1500 0.2380010 12 4.831912 0.0004
## ConditionYes:Time2PM -0.2250 0.3365843 12 -0.668480 0.5165
## ConditionYes:Time3PM -0.3950 0.3365843 12 -1.173555 0.2633
## Correlation:
## (Intr) CndtnY Tim2PM Tim3PM CY:T2P
## ConditionYes -0.707
## Time2PM -0.578 0.409
## Time3PM -0.578 0.409 0.500
## ConditionYes:Time2PM 0.409 -0.578 -0.707 -0.354
## ConditionYes:Time3PM 0.409 -0.578 -0.354 -0.707 0.500
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.4009179 -0.5396982 0.1391771 0.4146175 1.2662191
##
## Number of Observations: 24
## Number of Groups: 8
# The intercept variance is basically 0, indicating no within subject effect, so this model is not capturing the between time relationship well. A random intercept model is rarely the type of model you want for a repeated measures design. A random intercept model assumes that the correlations between all time points are equal. i.e. the correlation between time 1 and time 2 is the same as between time 1 and time 3. Under normal circumstances (perhaps not those generating your fake data) we would expect the later to be less than the former. An auto regressive structure is usually a better way to go.
### m2<-gls(Score ~ Condition + Time + Condition*Time, data = dd, correlation = corAR1(form = ~ Time | Subject))
### summary(m2)
# Your data is showing a -.596 between time point correlation, which seems odd. normally there should, at a minimum be a positive correlation between time points. How was this data generated?
# addendum:
# With your new data we know that the data generating process is equivalent to a random intercept model (though that is not the most realistic for a longitudinal study. The visualization shows that the effect of time seems to be fairly linear, so we should feel comfortable treating it as a numeric variable.
library(nlme)
m1 <- lme(Score ~ Condition + as.numeric(Time) + Condition*as.numeric(Time), data = dd, random = ~ 1 | Subject)
summary(m1)
## Linear mixed-effects model fit by REML
## Data: dd
## AIC BIC logLik
## 38.15055 44.12494 -13.07527
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 0.2457354 0.3173421
##
## Fixed effects: Score ~ Condition + as.numeric(Time) + Condition * as.numeric(Time)
## Value Std.Error DF t-value p-value
## (Intercept) 1.142500 0.2717382 14 4.204415 0.0009
## ConditionYes 1.748333 0.3842958 6 4.549447 0.0039
## as.numeric(Time) 0.575000 0.1121974 14 5.124898 0.0002
## ConditionYes:as.numeric(Time) -0.197500 0.1586710 14 -1.244714 0.2337
## Correlation:
## (Intr) CndtnY as.(T)
## ConditionYes -0.707
## as.numeric(Time) -0.826 0.584
## ConditionYes:as.numeric(Time) 0.584 -0.826 -0.707
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.44560365 -0.65018589 0.01864076 0.52930927 1.40824841
##
## Number of Observations: 24
## Number of Groups: 8
# We see a significant Condition effect, indicating that the 'yes' condition tends to have higher scores (by about 1.7), and a significant time effect, indicating that both groups go up over time. Supporting the plot, we find no differential effect of time between the two groups (the interaction). i.e. the slopes are the same.
# (using lme4 library) This fits your subject effect as random and also the variable that your random effects are grouped under. In this model the random effect is the intercept varying by subject.
require(lme4)
m <- lmer( Score ~ Condition + Time + (1|Subject), data=dd )
# To see the random effects you can just use
ranef(m)
## $Subject
## (Intercept)
## A -0.15956787
## B 0.22011131
## C 0.01282700
## D -0.32785811
## E 0.15033243
## F 0.01077468
## G -0.00359156
## H 0.09697212
##
## with conditional variances for "Subject"
# As Ian Fellows mentioned, your data may also have random Condition and Time components. You can test that with another model. In the one below Condition, Time, and the intercept are allowed to vary randomly by subject. It also evaluates their correlations.
### mi <- lmer( Score ~ Condition + Time + (Condition + Time|Subject), data=dd )
### summary(mi)
### ranef(mi)
# You could also test for this without correlations with the intercept, with interactions between Condition and Time, and numerous other models to see which best fits your data and / or your theory. Your question is a bit vague but these few commands should get you started.
# Note that Subject is your grouping factor so it's what you fit other effects as random under. It's not something you then explicitly fit as a predictor as well.
require(tidyverse)
require(foreign)
dd <- read.dta("https://stats.idre.ucla.edu/stat/stata/examples/mlm_ma_hox/popular.dta")
write_csv(dd, "popular.csv")
# ... see code and explanation in answers
library(tidyverse)
library(nlme)
dd <- read_delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/cosmetic_surgery.dat", "\t")
## Rows: 276 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): Surgery_Text, Reason_Text, Gender_Text
## dbl (9): particnu, Post_QoL, Base_QoL, Clinic, Surgery, Reason, Age, Gender,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# or
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/div/cosmetic_surgery.dat")
## Rows: 276 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): Surgery_Text, Reason_Text, Gender_Text
## dbl (9): particnu, Post_QoL, Base_QoL, Clinic, Surgery, Reason, Age, Gender,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd$Gender <- factor(dd$Gender, levels = c(0, 1), labels = c("f", "m"))
dd$Surgery_Text <- factor(dd$Surgery_Text)
dd$Reason_Text <- factor(dd$Reason_Text)
dd$Gender_Text <- factor(dd$Gender_Text)
# model without interaction
m.test <- nlme::lme(Post_QoL ~ Surgery,
data = dd,
random = ~Surgery | Clinic,
method = "ML")
# we look at ICC
VarCorr(m.test)
## Clinic = pdLogChol(Surgery)
## Variance StdDev Corr
## (Intercept) 74.75235 8.645944 (Intr)
## Surgery 65.02213 8.063630 -0.947
## Residual 37.22614 6.101323
var.clinic = as.numeric(VarCorr(m.test)[1])
var.surgery = as.numeric(VarCorr(m.test)[2])
var.residual = as.numeric(VarCorr(m.test)[3])
cat("ICC clinic: ", var.clinic/(var.clinic + var.surgery + var.residual))
## ICC clinic: 0.4223282
cat("ICC surgery: ", var.surgery/(var.clinic + var.surgery + var.residual))
## ICC surgery: 0.3673554
# package psychometric offers a ICC calculation
require(psychometric)
cat("ICC clinic by package psychometric: ", ICC1.lme(Post_QoL, grp=Clinic, data=dd))
## ICC clinic by package psychometric: 0.4269201
m.interaction <- update(m.test, .~. + Surgery:Reason)
m.interaction <- nlme::lme(Post_QoL ~ Surgery + Surgery:Reason,
data = dd,
random = ~Surgery | Clinic,
method = "ML")
# we look at ICC, they only refer to random effects ...
VarCorr(m.interaction)
## Clinic = pdLogChol(Surgery)
## Variance StdDev Corr
## (Intercept) 74.75147 8.645893 (Intr)
## Surgery 63.92678 7.995422 -0.944
## Residual 37.16650 6.096433
var.clinic = as.numeric(VarCorr(m.interaction)[1])
var.surgery = as.numeric(VarCorr(m.interaction)[2])
var.residual = as.numeric(VarCorr(m.interaction)[3])
cat("ICC clinic: ", var.clinic/(var.clinic + var.surgery + var.residual))
## ICC clinic: 0.4250992
cat("ICC surgery: ", var.surgery/(var.clinic + var.surgery + var.residual))
## ICC surgery: 0.363541
cat("Random effect surgery binds slightly less variance compared to m.test.\n")
## Random effect surgery binds slightly less variance compared to m.test.
cat("because the fixed interaction effect Surgery:Reason 'explains' part of the surgery variance.")
## because the fixed interaction effect Surgery:Reason 'explains' part of the surgery variance.
# the additional fixed effect
fixef(m.test)
## (Intercept) Surgery
## 59.387461 0.367755
head(ranef(m.test))
## (Intercept) Surgery
## 1 9.451884 -8.540927
## 2 14.369838 -9.301715
## 3 5.463431 -7.091419
## 4 8.579993 -9.552092
## 5 -0.332467 -1.424968
## 6 -3.013192 5.380795
Matthew Crump PSYC 7709: Using R for Reproducible Research source
Psychologische Rundschau (2006): Mehrebenen.pdf
Torres paper
Zur Frage nach p-values und lme4 oder generell Signifikanztests bei Mixed-Designs: see
blog auf R-Bloggers.
Erklärung zu Akaike’s weight bzw. zu Modellvergleichen via AIC.
Siehe hierzu auch den Artikel von Wagenmakers & Farrel (2004)
Multi level online course of university of Bristol.
Vergleich lme4 nlme formulae und Erklärungen dazu. Leider ohne die Daten.
Rense Nieuwenhuis
Einführung von Fabio Votta
Good post that explains varCorr and the output of lmer
We present library(nlme)
here, alternatively we
might use library(lme4)
in combination with
lmerTest()
nlme::gls()
vs. nlme::lme()
non random
effect models vs. random effect models
level specific variables: values are the same for all observations of a specific level
nlme::gls(..., method = "ML")
and
nlme:lme(..., method = "ML")
useful for comparison of non
MLM and MLM via anova()
Formulae 2 parts: fixed and random effects have their own formula random: | separats random effects from the definition of the level structure
3 or more levels, separated by “/”, the more right runs faster,
lme(... | country/city )
fixef() und ranef() fixed effects: coefficients are equal for all levels (random) random effects: coefficients for every level additional to the fixed effect (corrects the fixed effect)
modeling random effects binds (explains) variance in the outcome and thus reduces the residuals due to level specific coefficients, but we don’t get level specific significance tests
including a variable as fixed effect reveals in a significance test for the coefficient (for all dummy variables in categorial predictors)
variables that define levels have to be categorial (usually of type factor)
it does not make sense to use level defining variables as explanatory variables (or even reaction variables)
we can model dependencies using the corr
parameter
this is especially relevant for time series (growth models) examlple above
we can use intervals(model)
to get informations on
explanatory variables in form of confidence intervals, a variable might
be relevant, if 0 is not included in the interval
we can use ICC to get an idea whether a categorial variable might serve as level defining variable
Vergleich Äquivalenz WW und random effects (passiert differenziert im sheet zu MLM)
predict ohne die originalen Daten, die den (Random-)Koeffizienten zugrunde liegen, basieren die Vorhersagen bei MLM nur auf den fixed Effects
Variablen in Beispiel unten: #Beispiel_Schule,_Abschlussnote_und_Schuleingangstest-Niveau
library(tidyverse)
library(nlme)
# data come from library(mlmRev)
dd <- mlmRev::Exam
# normexam Normalized exam score.
# standLRT Standardised LR test score (London Reading Test - a reading test taken when they were 11 years old)
# categorial predictor school
mg.0 <- gls(normexam ~ 1, data=dd, method="ML")
mg.1 <- gls(normexam ~ school, data=dd, method="ML")
car::Anova(mg.1, type=3)
## Analysis of Deviance Table (Type III tests)
##
## Response: normexam
## Df Chisq Pr(>Chisq)
## (Intercept) 1 21.632 3.303e-06 ***
## school 64 782.768 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# additional continuous predictor reading test: standLRT
mg.2 <- gls(normexam ~ school + standLRT, data=dd, method="ML", correlation=NULL)
car::Anova(mg.2, type=3)
## Analysis of Deviance Table (Type III tests)
##
## Response: normexam
## Df Chisq Pr(>Chisq)
## (Intercept) 1 21.495 3.547e-06 ***
## school 64 659.759 < 2.2e-16 ***
## standLRT 1 1992.549 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(mg.2)
# null model
m.0 <- lme(fixed = normexam ~ 1,
random = ~ 1 | school,
data=dd,
method="ML")
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
## Data: dd
## AIC BIC logLik
## 11016.65 11035.58 -5505.324
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 0.4106567 0.9207391
##
## Fixed effects: normexam ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) -0.01316707 0.05363399 3994 -0.2454986 0.8061
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.94711015 -0.64856012 0.01168106 0.69918450 3.65782427
##
## Number of Observations: 4059
## Number of Groups: 65
nlme::intervals(m.0)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) -0.1183067 -0.01316707 0.09197253
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.3393035 0.4106567 0.4970149
##
## Within-group standard error:
## lower est. upper
## 0.9007837 0.9207391 0.9411366
anova(mg.0, m.0)
## Model df AIC BIC logLik Test L.Ratio p-value
## mg.0 1 2 11513.37 11525.98 -5754.683
## m.0 2 3 11016.65 11035.58 -5505.324 1 vs 2 498.7168 <.0001
# MLM
m.1 <- lme(fixed = normexam ~ standLRT,
random = ~ 1 | school,
data=dd,
method="ML")
m.2 <- lme(fixed = normexam ~ standLRT,
random = ~ standLRT | school,
data=dd,
method="ML")
# visualization {}todo
dd.s <- dd %>% dplyr::filter(school %in% c(1,9))
p_m.1 <- predict(m.1)
p_m.2 <- predict(m.2)
dd.s %>% ggplot(aes(x=standLRT, y=normexam, color = school)) +
geom_point() +
# geom_point(data = dd.s, aes(color=school))
# geom_line(data=predict(m.1)) +
# geom_line(data=predict(m.2))
ggtitle("School exit")
# fixed effects coefficients
fixef(m.2)
## (Intercept) standLRT
## -0.01150385 0.55672932
# random effects coefficients, head, because we have them for every second level observation (school)
head(ranef(m.2))
## (Intercept) standLRT
## 1 0.37492945 0.12497525
## 2 0.47020445 0.16472812
## 3 0.47978037 0.08084307
## 4 0.03500977 0.12720641
## 5 0.24627676 0.07205334
## 6 0.51840503 0.05859459
# confidence intervals
nlme::intervals(m.2)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) -0.08949896 -0.01150385 0.06649125
## standLRT 0.51763969 0.55672932 0.59581894
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.24660747 0.3007325 0.3667368
## sd(standLRT) 0.08856276 0.1205756 0.1641601
## cor((Intercept),standLRT) 0.15216356 0.4972754 0.7343044
##
## Within-group standard error:
## lower est. upper
## 0.7278074 0.7440805 0.7607174
# log. likelhood deviance
m.2$logLik
## [1] -4658.435
-2*m.2$logLik
## [1] 9316.871
deviance(m.2) # -2ll only available with method="ML" in lme()
## 'log Lik.' 9316.871 (df=6)
# a look at the variances
VarCorr(m.0)
## school = pdLogChol(1)
## Variance StdDev
## (Intercept) 0.1686389 0.4106567
## Residual 0.8477605 0.9207391
VarCorr(m.1)
## school = pdLogChol(1)
## Variance StdDev
## (Intercept) 0.09212927 0.3035280
## Residual 0.56573100 0.7521509
VarCorr(m.2)
## school = pdLogChol(standLRT)
## Variance StdDev Corr
## (Intercept) 0.09044004 0.3007325 (Intr)
## standLRT 0.01453847 0.1205756 0.497
## Residual 0.55365575 0.7440805
# ... variances of residuals go down
# using ICC to decide on levels
# we look at variance decoposition: proportion that variance of interest holds of total variance
# school
as.numeric(VarCorr(m.0)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC schools as level 2 on random intercepts
## [1] 0.1659179
# schgend (mixed, boys, girls)
mm <- lme(fixed = normexam ~ 1, random = ~ 1 | schgend, data=dd, method="ML")
as.numeric(VarCorr(mm)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC schgend (boys, girls, mixed) as level 2 on random intercepts
## [1] 0.009612604
# vr (verbal reasoning bottom 25%, mid 50%, and top 25%.)
mm <- lme(fixed = normexam ~ 1, random = ~ 1 | vr, data=dd, method="ML")
as.numeric(VarCorr(mm)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC vr (schools level in verbal reasoning at intake) as level 2 on random intercepts
## [1] 0.09377866
# sex (F or M)
mm <- lme(fixed = normexam ~ 1, random = ~ 1 | sex, data=dd, method="ML")
as.numeric(VarCorr(mm)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC pupils gender as level 2 on random intercepts
## [1] 0.01293193
# require(psychometric)
# school on level 2
psychometric::ICC1.lme(normexam, grp=school, data=dd)
## [1] 0.1683409
# ... about the same as above
# predict is possible
head(predict(m.2))
## 1 1 1 1 1 1
## 0.7854411 0.5037219 -0.5668121 0.5037219 0.6164097 1.8559749
# ... and therefore some manual check of "explained" variance
cor(predict(m.2), dd$normexam)
## [1] 0.6764198
cor(predict(m.2), dd$normexam)^2
## [1] 0.4575438
# using library(rsq) which takes into account the model used ...
rsq::rsq(m.2, adj=F) # same as rsq::rsq(m.2)
## $model
## [1] 0.4491454
##
## $fixed
## [1] 0.348489
##
## $random
## [1] 0.1006564
rsq::rsq(m.2, adj=T)
## $model
## [1] 0.4490096
##
## $fixed
## [1] 0.3483284
##
## $random
## [1] 0.1006812
# some conceptual stuff
# including a categorial explanatory variable to have a continuous (normexam) and a categorial (vr) explanatory variable
# vr: Student level Verbal Reasoning (VR) score band at intake - a factor. Levels are bottom 25%, mid 50%, and top 25%.
m.f <- lme(fixed = normexam ~ standLRT + vr,
random = ~ 1 | school,
data=dd,
method="ML")
m.fr <- lme(fixed = normexam ~ standLRT + vr,
random = ~ standLRT + vr | school,
data=dd,
method="ML")
# ... in m.fr we have slope for continuous standLRT for every school which corrects fixed slope for standLRT
# ... we also have an intercept for categorial vr which corrects the fixed intercept for every school. As vr has 3 levels, we deal with two dummy variables.
# fixed effects coefficients (parameters) are the same for all observations (level 1)
fixef(m.f)
## (Intercept) standLRT vrmid 50% vrtop 25%
## -0.1349020 0.5603018 0.1021208 0.3051488
fixef(m.fr)
## (Intercept) standLRT vrmid 50% vrtop 25%
## -0.08053503 0.55318447 0.04612312 0.17180483
# random effects coefficients (parameters) are valid for a single unit of a level, here for every school
# they "correct" the fixed effects
# categorial random variables correct the fixed intercept, correct the level
# continuous random variables correct the fixed slope for their unit
head(ranef(m.f))
## (Intercept)
## 1 0.40137258
## 2 0.34621804
## 3 0.34890046
## 4 0.05040893
## 5 0.09574633
## 6 0.38267936
head(ranef(m.fr))
## (Intercept) standLRT vrmid 50% vrtop 25%
## 1 0.3264611 0.11793019 0.0509535 -0.09911205
## 2 0.3285482 0.15448034 -0.2665773 0.06084519
## 3 0.2423554 0.06471981 -0.1822759 0.16236549
## 4 0.2317168 0.14145486 -0.1886661 -0.04171440
## 5 0.1311447 0.05712286 -0.1050833 0.03512109
## 6 0.2355876 0.04322645 -0.1714415 0.20480701
# we have as many random coefficients as units in level 2 (schools)
nrow(ranef(m.fr))
## [1] 65
Unter https://md.psych.bio.uni-goettingen.de/mv/unit/parms/parms.html#Some_points_to_remember_regarding_MLM_and_an_example findet sich eine Sammlung von beachtenswerten Punkten zu MLM.
MLM - check - data structure and the concept of level variables StudIP - ownCloud
MLM - check - running lme() and looking at the output, also model tests StudIP - ownCloud
MLM - check - ICC, it helps to decide on which levels should be entered in the model StudIP - ownCloud
MLM - check - something more conceptual, fixef and ranef in action StudIP - ownCloud
Version: 07 Juli, 2022 10:12