
Theory: Multilevel Models as a Replacement for Repeated-Measures ANOVA Designs

Introductory presentation by Thomas Schultze-Gerlach and Christian Treffenstädt

The presentation as rmd

Chapter 18 of Serious Stats introduces multilevel models by considering them as an extension of repeated measures ANOVA models that can cope with missing outcomes and time-varying covariates, and can relax the sphericity assumption of conventional repeated measures ANOVA. They can also deal with other – less well known – problems such as having stimuli that are a random factor (e.g., see this post on my Psychological Statistics blog). Last, but not least, multilevel generalised linear models allow you to have discrete and bounded outcomes (e.g., dichotomous, ordinal or count data) rather than being constrained to a continuous response with normal errors. Source

A few points

unconditional model: a model in which only the levels are defined; a null model with levels. On each level, a constant serves as the predicted value.
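A minimal sketch of such a null model in nlme syntax (the data frame d, outcome y, and grouping factor g are purely illustrative):

library(nlme)
# illustrative two-level data: 10 groups with 10 observations each
d <- data.frame(y = rnorm(100), g = factor(rep(1:10, each = 10)))
# unconditional model: a constant on each level, no predictors
m0 <- lme(fixed = y ~ 1, random = ~ 1 | g, data = d)
summary(m0)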

Level-specific models, i.e. a separate analysis for each unit of a level. Generalisability is questionable.

A few practical tips

The examples here use nlme::lme(). A common alternative for multilevel models in R is lme4::lmer(), or better lmerTest::lmer(), a wrapper for lme4::lmer() that also reports significance tests.
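A quick sketch of the difference (using the sleepstudy data that ships with lme4):

library(lmerTest)  # masks lme4::lmer()
m <- lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)
summary(m)  # fixed-effects table now includes df and Pr(>|t|) (Satterthwaite approximation)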

In case of convergence errors that are not caused by a misspecified model: post

Correlation in the output for random effects: because we obtain parameter estimates for every unit on the levels above level 1, we can correlate these parameters with one another. This correlation is part of the output.

Background: Formulae in lme4::lmer()

Source

The general trick, as mentioned in another answer, is that the formula follows the form dependent ~ independent | grouping. The grouping is generally a random factor, you can include fixed factors without any grouping and you can have additional random factors without any fixed factor (an intercept-only model). A + between factors indicates no interaction, a * indicates interaction.

For random factors, you have three basic variants:

  • Intercepts only by random factor: (1 | random.factor)
  • Slopes only by random factor: (0 + fixed.factor | random.factor)
  • Intercepts and slopes by random factor: (1 + fixed.factor | random.factor)

Note that variant 3 has the slope and the intercept calculated in the same grouping, i.e. at the same time. If we want the slope and the intercept calculated independently, i.e. without any assumed correlation between the two, we need a fourth variant:

Intercept and slope, separately, by random factor: (1 | random.factor) + (0 + fixed.factor | random.factor)
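A compact sketch of these four variants (using the sleepstudy data from lme4, with Days as the fixed factor and Subject as the random factor):

library(lme4)
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)                         # 1: random intercepts only
m2 <- lmer(Reaction ~ Days + (0 + Days | Subject), sleepstudy)                  # 2: random slopes only
m3 <- lmer(Reaction ~ Days + (1 + Days | Subject), sleepstudy)                  # 3: correlated intercepts and slopes
m4 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)  # 4: independent intercepts and slopes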

There’s also a nice summary in response to a related question that you should look at.

If you’re up to digging into the math a bit, Barr et al. (2013) summarize the lmer syntax quite nicely in their Table 1, adapted here to meet the constraints of tableless markdown. That paper dealt with psycholinguistic data, so the two random effects are Subject and Item.

Models and equivalent lme4 formula syntax:

1. \(Y_{si} = \beta_0 + \beta_1 X_i + e_{si}\)
   n/a (not a mixed-effects model)
2. \(Y_{si} = \beta_0 + S_{0s} + \beta_1 X_i + e_{si}\)
   Y ~ X + (1 | Subject)
3. \(Y_{si} = \beta_0 + S_{0s} + (\beta_1 + S_{1s}) X_i + e_{si}\)
   Y ~ X + (1 + X | Subject)
4. \(Y_{si} = \beta_0 + S_{0s} + I_{0i} + (\beta_1 + S_{1s}) X_i + e_{si}\)
   Y ~ X + (1 + X | Subject) + (1 | Item)
5. \(Y_{si} = \beta_0 + S_{0s} + I_{0i} + \beta_1 X_i + e_{si}\)
   Y ~ X + (1 | Subject) + (1 | Item)
6. As (4), but \(S_{0s}\) and \(S_{1s}\) independent
   Y ~ X + (1 | Subject) + (0 + X | Subject) + (1 | Item)
7. \(Y_{si} = \beta_0 + I_{0i} + (\beta_1 + S_{1s}) X_i + e_{si}\)
   Y ~ X + (0 + X | Subject) + (1 | Item)

References:

Barr, Dale J., R. Levy, C. Scheepers, and H. J. Tily (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278.

Background: Multiple regression as a multilevel model with ML estimation and likelihood ratio test

Simple regression of body weight on body height, with ML parameter estimation via lme from library(nlme). Model test of a null model against a model that includes a slope.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")

require(nlme)
## Lade nötiges Paket: nlme
m.0 <- lme(fixed= weight ~ 1, random = ~1 | subj, data=ddf, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
##   Data: ddf 
##        AIC      BIC    logLik
##   273.7854 277.9889 -133.8927
## 
## Random effects:
##  Formula: ~1 | subj
##         (Intercept) Residual
## StdDev:     19.6556 7.370852
## 
## Fixed effects:  weight ~ 1 
##                Value Std.Error DF  t-value p-value
## (Intercept) 67.83333  3.898153 30 17.40141       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -0.4488246 -0.2439264 -0.1310233  0.2035043  1.0398234 
## 
## Number of Observations: 30
## Number of Groups: 30
m.1 <- lme(fixed= weight ~ 1 + height, random = ~1 | subj, data=ddf, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
##   Data: ddf 
##        AIC      BIC    logLik
##   264.2429 269.8477 -128.1214
## 
## Random effects:
##  Formula: ~1 | subj
##         (Intercept) Residual
## StdDev:    16.21583 6.080935
## 
## Fixed effects:  weight ~ 1 + height 
##                  Value Std.Error DF   t-value p-value
## (Intercept) -131.92913   55.2076 28 -2.389691  0.0238
## height         1.13782    0.3139 28  3.624762  0.0011
##  Correlation: 
##        (Intr)
## height -0.998
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -0.47221438 -0.28102107 -0.07517554  0.23402698  0.88130174 
## 
## Number of Observations: 30
## Number of Groups: 30
anova(m.0, m.1)
##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m.0     1  3 273.7853 277.9889 -133.8927                        
## m.1     2  4 264.2429 269.8476 -128.1214 1 vs 2 11.54249   7e-04
# compare with the least-squares linear model
lm.0 <- lm(ddf$weight ~ 1)
summary(lm.0)
## 
## Call:
## lm(formula = ddf$weight ~ 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.833 -14.583  -7.833  12.167  62.167 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   67.833      3.898    17.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.35 on 29 degrees of freedom
mean(ddf$weight)
## [1] 67.83333
lm.1 <- lm(ddf$weight ~ ddf$height)
summary(lm.1)
## 
## Call:
## lm(formula = ddf$weight ~ ddf$height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.291 -13.861  -3.708  11.543  43.469 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -131.9291    55.2076  -2.390  0.02384 * 
## ddf$height     1.1378     0.3139   3.625  0.00114 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.93 on 28 degrees of freedom
## Multiple R-squared:  0.3194, Adjusted R-squared:  0.2951 
## F-statistic: 13.14 on 1 and 28 DF,  p-value: 0.001138
anova(lm.0, lm.1)
## Analysis of Variance Table
## 
## Model 1: ddf$weight ~ 1
## Model 2: ddf$weight ~ ddf$height
##   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
## 1     29 13220.2                                
## 2     28  8997.9  1    4222.2 13.139 0.001138 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# get a plot for it
plot(ddf$height, ddf$weight)
abline(lm.1, col = "red")

Intraclass Correlation Coefficient (ICC)

proportion of the variation that is being contributed by different levels of random effects. source

If only little level-1 residual variance remains once the higher level is accounted for, the ICC is high. The contextual influences (of the level units) are then strong, i.e. they have a large influence on the values of the outcome variable.

The larger the share of variance between the groups (units of a level), the smaller the proportional variance within the groups. This means that values within a group are more similar to each other, i.e. they correlate more strongly.

Partitioning the residual variance:

  • the residual variance is the variance that the (overall) model leaves unexplained (variance of the residuals)
  • the intercept variance of a factor is the variance around the grand mean, i.e. the residual variance of the grand mean
  • the factor's share of the total residual variance is the ICC of this factor
  • ICC = residual variance(factor) / (residual variance(factor) + residual variance(model))
  • the smaller the share of residual variance(model), the closer the ICC is to 1 and the more worthwhile it is to include the factor as a level
  • the larger the share of residual variance(model), the closer the ICC is to 0 and the less worthwhile it is to include the factor as a level
  • the further apart the factor levels lie (large residual variance of the grand mean) and the less residual variance remains within the factor levels (residual variance of the model), the larger the ICC
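In formula terms, writing \(\sigma^2_{u_0}\) for the level-2 intercept variance and \(\sigma^2_e\) for the level-1 residual variance:

\(ICC = \frac{\sigma^2_{u_0}}{\sigma^2_{u_0} + \sigma^2_e}\)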

todo: add a figure illustrating this

ICC paper by Andy Field (2012) on the companion site. paper

ICC and VarCorr

  • The ICC tells us whether a MLM is necessary: the variance bound to level (group) membership should be considerable (recommendation: > 10%) to justify a MLM instead of a GLM
  • We use VarCorr() to get the variances needed to calculate the ICC
  • with nlme::lme() we have to apply as.numeric() to the VarCorr() output, because its entries are stored as strings
  • VarCorr() output differs between nlme::lme() and lme4::lmer()

Example in “check” below.
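A minimal sketch of the calculation (assuming a random-intercept null model such as m.0 from the sleepstudy example further down; there the ICC comes out at roughly 0.38):

vc <- VarCorr(m.0)                        # character matrix, hence as.numeric()
var.l2 <- as.numeric(vc[1, "Variance"])   # level-2 (between-group) variance
var.l1 <- as.numeric(vc[2, "Variance"])   # level-1 (residual) variance
icc <- var.l2 / (var.l2 + var.l1)
icc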

lme() and missing data

source

Mixed models based on likelihood methods can often handle missing observations within subjects, but they do not do well with missing individual elements in the design matrices (think unit nonresponse vs item nonresponse in the survey world). Continuing with the example I recently sent to you:

set.seed(0112358)
library(nlme)

# we create a dataframe without missings
school <- factor(rep(1:20, each=4))
year <- rep(2006:2009, 20)
year.c <- year - mean(year)
tmt <- sample(0:1, 20, replace = TRUE)[school]
math <- rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school]
tmt <- factor(tmt)
df.f <- data.frame(math, school, tmt, year, year.c)
rm(math, school, year, year.c, tmt)

# Now create missing observations,
# -- note just a single obs for the first school:
df.m <- df.f[-c(2:6, sample(75, 5)), ]
table(df.m$school)
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
##  1  2  4  4  3  4  3  3  4  3  4  4  4  4  3  4  4  4  4  4

m.f <- lme(math ~ year.c*tmt, data = df.f, random=~1|school)
m.m <- lme(math ~ year.c*tmt, data = df.m, random=~1|school)

intervals(m.f)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                  lower       est.     upper
## (Intercept)  3.1358160 3.84090374 4.5459915
## year.c      -0.2776773 0.16550116 0.6086796
## tmt1         0.4895508 1.48740943 2.4852680
## year.c:tmt1 -0.5819210 0.01566072 0.6132424
## 
##  Random Effects:
##   Level: school 
##                     lower      est.    upper
## sd((Intercept)) 0.3849488 0.7518102 1.468295
## 
##  Within-group standard error:
##    lower     est.    upper 
## 1.238062 1.485190 1.781648
intervals(m.m)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                  lower       est.     upper
## (Intercept)  2.9435062 3.61474585 4.2859855
## year.c      -0.2744510 0.18578479 0.6460206
## tmt1         0.6135467 1.56693004 2.5203134
## year.c:tmt1 -0.5838501 0.02736675 0.6385836
## 
##  Random Effects:
##   Level: school 
##                     lower      est.    upper
## sd((Intercept)) 0.3063612 0.6718072 1.473179
## 
##  Within-group standard error:
##    lower     est.    upper 
## 1.139500 1.391602 1.699479

Note that even with the missing obs, in this instance we got pretty good estimates of the variance components (parameters: within-group sd = 1.5; between-school sd = 1). However, I had to change the seed from the one I used in the last email because, as is not uncommon with unbalanced data and relatively small sample sizes, there were convergence problems. If you were using classical MoM estimators, fitting a mixed model with unbalanced data would lead to very precarious inferences (not that the likelihood inference is without its share of pitfalls, but that’s another email…)

Multilevel models with more than two levels

| formula | description |
| --- | --- |
| \(Y_{ijk} = \gamma_{000} + v_{00k} + u_{0jk} + \varepsilon_{ijk}\) | intercept |
| \(+ \gamma_{100} \cdot X_{1_{ijk}} + \gamma_{010} \cdot X_{2_{0jk}} + \gamma_{001} \cdot X_{3_{00k}}\) | fixed effects |
| \(+ v_{10k} \cdot X_{1_{ijk}} + u_{1jk} \cdot X_{1_{ijk}}\) | random slopes of the level-1 predictor |
| \(+ v_{20k} \cdot X_{2_{0jk}}\) | random slope of the level-2 predictor |
| \(+ \gamma_{110} \cdot X_{1_{ijk}} \cdot X_{2_{0jk}}\) | cross-level two-way interactions |
| \(+ \gamma_{101} \cdot X_{1_{ijk}} \cdot X_{3_{00k}}\) | |
| \(+ \gamma_{011} \cdot X_{2_{0jk}} \cdot X_{3_{00k}}\) | |
| \(+ \gamma_{111} \cdot X_{1_{ijk}} \cdot X_{2_{0jk}} \cdot X_{3_{00k}}\) | cross-level three-way interaction |

Background: Correlation in summaries of nlme::gls() and nlme::lme()

The summary output of nlme::gls() and nlme::lme() always includes a correlation table, both for the fixed and for the random effects.

The coefficient shows the correlation between intercept and slope if the slope were allowed to vary, e.g. across repeated samplings. For the single sample under study, no correlation between intercept and slope can be computed, because both are constants for this sample.

An explanation and visualisation of the intercept in the correlation table of nlme::gls() and nlme::lme()

Visualisation

After centring the data, the correlation disappears; it thus depends on the scaling of the explanatory variables. In most cases this correlation is irrelevant for the interpretation.
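A small sketch of this effect (reusing the ddf weight/height data from the example above):

require(nlme)
m.raw <- gls(weight ~ height, data = ddf)
ddf$height.c <- ddf$height - mean(ddf$height)  # centre the predictor
m.cen <- gls(weight ~ height.c, data = ddf)
summary(m.raw)  # correlation of (Intercept) and height is close to -1
summary(m.cen)  # after centring, the correlation is practically 0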

The correlation parameter in nlme::lme()

Inspired by StackExchange [https://stats.stackexchange.com/questions/367957/autoregression-in-nlme-undestanding-how-to-specify-corar1]

The data are simulated. We can imagine that `value` contains mood or well-being ratings between -10 and 10 over the course of a week.

# packages used below (pipe, pivot_longer(), ggplot())
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(100)
n.subj <- 20
n.times <- 6
# subj.c <-rep(NA, n.subj)
subj.c <- rnorm(n.subj, sd=2)
subj.s <- rnorm(n.subj, sd=1)
mat = matrix(ncol = n.times, nrow = n.subj)
# converting the matrix to data frame
dd=data.frame(mat)
# line <- rep(subj.c[1], n.times)
# line <- line + rnorm(n.subj, sd=0.2)
for (subj in 1:n.subj) {
  resid <- rnorm(n.times, 0, sd=2)
  for (tt in 1:n.times) {
    dd[subj,tt] <- subj.c[subj] +  (tt - n.times/2) * subj.s[subj] + resid[tt]
  }
}
for (cc in 1:ncol(dd)) {colnames(dd)[cc] <- paste('t', as.character(cc), sep="_")}
# for (cc in 1:ncol(dd)) {colnames(dd)[cc] <- as.numeric(cc)}
dd <- cbind(1:nrow(dd), dd)
colnames(dd)[1] <- "subj"
dd.w <- dd
dd <- dd.w %>% tidyr::pivot_longer(cols = 2:ncol(dd), names_to = "time.s", values_to = "value")
dd <- dd %>% dplyr::rowwise() %>% dplyr::mutate(time = as.numeric(strsplit(time.s, split="_")[[1]][2]))
# dd <- dd %>% dplyr::mutate(time.n = as.numeric(time))
dd %>% ggplot(aes(x=time, y=value, group=subj, color=factor(subj))) +
  geom_point() + 
  geom_line()

#value = rnorm(100)
#time = rep(1:5, each = 20)
#id = rep(letters[1:20],5)
#dd = data.frame(value, time, id)

fit.0 <- dd %>% nlme::lme(value ~ time, random=~ 1 | subj,
              correlation = NULL,
              data=.)
fit.1 <- dd %>% nlme::lme(value ~ time, random=~ 1 | subj,
              correlation=corAR1(),
              data=.)

# random intercept and slope model without correlation between time steps
fit.2a <- dd %>% nlme::lme(value ~ time, random=~ 1 + time | subj,
              correlation=NULL,
              data=.)
fit.2b <- dd %>% nlme::lme(value ~ time, random=~ 1 + time | subj,
              correlation=corAR1(form=~1), # time is equally spaced (one daily value measured)
              # correlation=corAR1(form=~1|subj), # time is equally spaced and subj are uncorrelated
              data=.)
# just to show syntax and idea, corCAR1
# random intercept and slope model with time as continuous variable for correlation structure per subj
fit.2c <- dd %>% nlme::lme(value ~ time, random=~ 1 + time | subj,
              correlation=corCAR1(form=~time),  # this would also account for varying time lags
              data=.)


anova(fit.2a, fit.2b)
##        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## fit.2a     1  6 569.9247 586.5488 -278.9623                        
## fit.2b     2  7 563.4894 582.8841 -274.7447 1 vs 2 8.435332  0.0037
anova(fit.0, fit.1, fit.2a)
##        Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## fit.0      1  4 588.9028 599.9855 -290.4514                         
## fit.1      2  5 589.9272 603.7806 -289.9636 1 vs 2  0.975598  0.3233
## fit.2a     3  6 569.9247 586.5488 -278.9623 2 vs 3 22.002509  <.0001
summary(fit.0)
## Linear mixed-effects model fit by REML
##   Data: . 
##        AIC      BIC    logLik
##   588.9028 599.9855 -290.4514
## 
## Random effects:
##  Formula: ~1 | subj
##         (Intercept) Residual
## StdDev:    1.321498  2.50516
## 
## Fixed effects:  value ~ time 
##                   Value Std.Error DF    t-value p-value
## (Intercept) -0.12179816 0.5993916 99 -0.2032030  0.8394
## time         0.08129237 0.1339064 99  0.6070834  0.5452
##  Correlation: 
##      (Intr)
## time -0.782
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.55525559 -0.59529315  0.05664013  0.54989467  2.84974370 
## 
## Number of Observations: 120
## Number of Groups: 20
summary(fit.1)
## Linear mixed-effects model fit by REML
##   Data: . 
##        AIC      BIC    logLik
##   589.9272 603.7806 -289.9636
## 
## Random effects:
##  Formula: ~1 | subj
##         (Intercept) Residual
## StdDev:    1.231883  2.56317
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | subj 
##  Parameter estimate(s):
##       Phi 
## 0.1292435 
## Fixed effects:  value ~ time 
##                   Value Std.Error DF    t-value p-value
## (Intercept) -0.16008511 0.6338599 99 -0.2525560  0.8011
## time         0.09201056 0.1451945 99  0.6337054  0.5277
##  Correlation: 
##      (Intr)
## time -0.802
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.59655646 -0.59732118  0.04912839  0.60991983  2.85002705 
## 
## Number of Observations: 120
## Number of Groups: 20
summary(fit.2b)
## Linear mixed-effects model fit by REML
##   Data: . 
##        AIC      BIC    logLik
##   563.4894 582.8841 -274.7447
## 
## Random effects:
##  Formula: ~1 + time | subj
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 2.1001965 (Intr)
## time        0.8395891 -0.866
## Residual    1.8947812       
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | subj 
##  Parameter estimate(s):
##        Phi 
## -0.3971962 
## Fixed effects:  value ~ time 
##                   Value Std.Error DF     t-value p-value
## (Intercept) -0.02110225 0.5551392 99 -0.03801254  0.9698
## time         0.05301270 0.2030656 99  0.26106201  0.7946
##  Correlation: 
##      (Intr)
## time -0.863
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.56356806 -0.69260877 -0.07906837  0.68254335  2.20481596 
## 
## Number of Observations: 120
## Number of Groups: 20

Examples

Example: OxBoys

source

This dataset contains measurements of 26 boys in Oxford: a student id (subj), standardised age, (measured) height, and the measurement occasion.

df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/div/oxboys.txt", fileEncoding = "UTF-8")
head(df)
##   subj     age height occasion
## 1    1 -1.0000  140.5        1
## 2    1 -0.7479  143.4        2
## 3    1 -0.4630  144.8        3
## 4    1 -0.1643  147.1        4
## 5    1 -0.0027  147.7        5
## 6    1  0.2466  150.2        6
# Plot the relationship between height and occasion, and draw a single regression line ignoring the subj variable.
plot(df$occasion, df$height, col="blue", main="Oxford Students: Height vs. Occasion", cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.5, xlim = c(0,10))
abline(lm(height~occasion, data=df), col="red", lwd=5)

library(nlme)

# Plot the relationship between height and occasion, but this time fit a
# separate linear regression for each individual. In ggplot, you can do this
# by specifying the group variable:
#   ggplot(oxboys, aes(x = year, y = height, group = id)) + stat_smooth(method = "lm")
# Other plotting packages should offer a similar facility.
# Comment on differences between (a) and (b).
library(ggplot2)

ggplot(data=df, aes(x=occasion, y=height, group=subj)) + 
  stat_smooth(method="lm", color="red") + 
  ggtitle("Oxford Boys: Height vs. Occasion") +
  expand_limits(x=c(min(df$occasion),max(df$occasion))) +
  scale_x_continuous(breaks=seq(min(df$occasion), max(df$occasion), 1)) + 
  theme(axis.title = element_text(face="bold", colour="blue", size=20), 
    plot.title = element_text(family = "Trebuchet MS", color="blue", face="bold", size=22))
## `geom_smooth()` using formula 'y ~ x'

The first graph plots the relationship between occasion and height for all students. The abline represents the line of best fit of a single global regression model. In contrast, the ggplot graph shows the relationship between occasion and height for each student separately (group=subj). This is a graphical representation of a local model (i.e., 26 different linear regressions).

Example: Pitch

Source or see

“Pitch” here refers to vocal pitch.

Description: This experiment (from a collaboration with Tim Wells and Andrew Dunn) involves looking at the pitch of male voices making attractiveness ratings with respect to female faces. The effect of interest (for this example) is whether average pitch goes up or down for higher ratings (and if so, by how much). A conventional ANOVA is problematic because this is a design with two fully crossed random factors – each participant (n = 30) sees each face (n = 32) and any conclusions ought to generalise both to other participants and (crucially) to other faces. Furthermore, there is a time-varying covariate – the baseline pitch of the numerical rating when no face is presented.

Using lme4::lmer()

# we could get the original data via
# pitch.dat <- read.csv('http://www2.ntupsychology.net/seriousstats/pitch.csv')
# we use a local copy for convenience
# write.table(pitch.dat, "pitch.txt", quote=F, sep="\t", row.names=F)
df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/div/pitch.txt")

head(df)
##   Constant Participant Face Context attract  pitch   base
## 1        1           1    1       1       2  99.88  97.79
## 2        1           1    2       1       1 108.50 100.40
## 3        1           1    3       1       3  99.06 101.00
## 4        1           1    4       1       4  97.02 100.28
## 5        1           1    5       1       2 102.04  97.79
## 6        1           1    6       1       3  98.38 101.00
require(psych)
## Lade nötiges Paket: psych
## 
## Attache Paket: 'psych'
## Die folgenden Objekte sind maskiert von 'package:ggplot2':
## 
##     %+%, alpha
describe(df)
##             vars    n   mean    sd median trimmed   mad   min    max  range
## Constant       1 1920   1.00  0.00   1.00    1.00  0.00  1.00   1.00   0.00
## Participant    2 1920  15.50  8.66  15.50   15.50 11.12  1.00  30.00  29.00
## Face           3 1920  16.50  9.24  16.50   16.50 11.86  1.00  32.00  31.00
## Context        4 1920   1.50  0.50   1.50    1.50  0.74  1.00   2.00   1.00
## attract        5 1920   3.86  2.25   4.00    3.69  2.97  1.00   9.00   8.00
## pitch          6 1920 114.95 18.42 114.14  114.03 16.98 74.25 186.26 112.01
## base           7 1920 112.67 17.05 109.80  111.94 17.28 82.99 159.13  76.14
##             skew kurtosis   se
## Constant     NaN      NaN 0.00
## Participant 0.00     -1.2 0.20
## Face        0.00     -1.2 0.21
## Context     0.00     -2.0 0.01
## attract     0.46     -0.8 0.05
## pitch       0.48      0.1 0.42
## base        0.39     -0.5 0.39
require(Rmisc)
## Lade nötiges Paket: Rmisc
## Lade nötiges Paket: lattice
## Lade nötiges Paket: plyr
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attache Paket: 'plyr'
## Die folgenden Objekte sind maskiert von 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## Das folgende Objekt ist maskiert 'package:purrr':
## 
##     compact
summarySE(data=df, "pitch")
##    .id    N    pitch      sd       se        ci
## 1 <NA> 1920 114.9475 18.4154 0.420272 0.8242378
summarySE(data=df, "pitch", c("attract"))
##   attract   N    pitch       sd        se       ci
## 1       1 349 112.7665 17.37810 0.9302284 1.829577
## 2       2 314 114.8755 18.12165 1.0226639 2.012165
## 3       3 280 112.6467 16.07398 0.9606043 1.890953
## 4       4 264 116.1327 18.04405 1.1105343 2.186670
## 5       5 238 113.3962 17.06602 1.1062258 2.179291
## 6       6 181 120.0590 22.23088 1.6524080 3.260582
## 7       7 141 119.6528 19.95647 1.6806383 3.322712
## 8       8 109 116.1674 20.26676 1.9412037 3.847803
## 9       9  44 109.5525 17.85199 2.6912883 5.427500
summarySE(data=df, "pitch", c("Context"))
##   Context   N    pitch       sd        se       ci
## 1       1 960 114.7716 18.54056 0.5983939 1.174313
## 2       2 960 115.1233 18.29735 0.5905444 1.158908
summarySE(data=df, "pitch", c("attract", "Context"))
##    attract Context   N    pitch       sd       se       ci
## 1        1       1 177 112.5491 18.07397 1.358523 2.681092
## 2        1       2 172 112.9902 16.68154 1.271956 2.510756
## 3        2       1 167 114.1406 18.77047 1.452502 2.867759
## 4        2       2 147 115.7105 17.38127 1.433583 2.833255
## 5        3       1 147 111.3256 15.13055 1.247947 2.466374
## 6        3       2 133 114.1068 16.99413 1.473578 2.914883
## 7        4       1 128 116.7216 17.43147 1.540739 3.048845
## 8        4       2 136 115.5785 18.64952 1.599183 3.162692
## 9        5       1 121 112.4814 16.14998 1.468180 2.906894
## 10       5       2 117 114.3422 17.98501 1.662715 3.293216
## 11       6       1  90 121.7013 21.80328 2.298268 4.566609
## 12       6       2  91 118.4347 22.64857 2.374215 4.716793
## 13       7       1  66 121.2992 21.26681 2.617763 5.228034
## 14       7       2  75 118.2040 18.75236 2.165336 4.314526
## 15       8       1  45 116.3747 21.07783 3.142097 6.332481
## 16       8       2  64 116.0217 19.84412 2.480515 4.956914
## 17       9       1  19 109.8332 20.53339 4.710683 9.896778
## 18       9       2  25 109.3392 15.95831 3.191661 6.587265
require(ggplot2)
ggplot(data=df, aes(x=attract, y=pitch)) + geom_point() + geom_smooth(method = lm, se=F)
## `geom_smooth()` using formula 'y ~ x'

# regression line indicates a slightly growing pitch with growing attractivity

# require(ez)
# ezANOVA(data=)

library(lme4)
## Lade nötiges Paket: Matrix
## 
## Attache Paket: 'Matrix'
## Die folgenden Objekte sind maskiert von 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attache Paket: 'lme4'
## Das folgende Objekt ist maskiert 'package:nlme':
## 
##     lmList
pitch.me <- lmer(pitch ~ base + attract + (1|Face) + (1|Participant), data=df)

pitch.me
## Linear mixed model fit by REML ['lmerMod']
## Formula: pitch ~ base + attract + (1 | Face) + (1 | Participant)
##    Data: df
## REML criterion at convergence: 14121.79
## Random effects:
##  Groups      Name        Std.Dev.
##  Face        (Intercept)  0.6665 
##  Participant (Intercept) 13.1046 
##  Residual                 9.1909 
## Number of obs: 1920, groups:  Face, 32; Participant, 30
## Fixed Effects:
## (Intercept)         base      attract  
##     90.0333       0.2054       0.4591
coef(pitch.me)
## $Face
##    (Intercept)      base   attract
## 1     90.32040 0.2054025 0.4591019
## 2     89.85338 0.2054025 0.4591019
## 3     89.86314 0.2054025 0.4591019
## 4     89.67938 0.2054025 0.4591019
## 5     90.19382 0.2054025 0.4591019
## 6     89.88064 0.2054025 0.4591019
## 7     90.91743 0.2054025 0.4591019
## 8     89.58486 0.2054025 0.4591019
## 9     90.15139 0.2054025 0.4591019
## 10    89.93114 0.2054025 0.4591019
## 11    89.74059 0.2054025 0.4591019
## 12    89.77798 0.2054025 0.4591019
## 13    90.04976 0.2054025 0.4591019
## 14    90.09177 0.2054025 0.4591019
## 15    90.05765 0.2054025 0.4591019
## 16    90.22825 0.2054025 0.4591019
## 17    89.98391 0.2054025 0.4591019
## 18    90.17051 0.2054025 0.4591019
## 19    90.07468 0.2054025 0.4591019
## 20    90.22505 0.2054025 0.4591019
## 21    89.49214 0.2054025 0.4591019
## 22    89.86455 0.2054025 0.4591019
## 23    90.77257 0.2054025 0.4591019
## 24    90.15636 0.2054025 0.4591019
## 25    90.63089 0.2054025 0.4591019
## 26    89.48716 0.2054025 0.4591019
## 27    89.78225 0.2054025 0.4591019
## 28    90.04786 0.2054025 0.4591019
## 29    89.98818 0.2054025 0.4591019
## 30    90.01073 0.2054025 0.4591019
## 31    90.21006 0.2054025 0.4591019
## 32    89.84846 0.2054025 0.4591019
## 
## $Participant
##    (Intercept)      base   attract
## 1     79.24162 0.2054025 0.4591019
## 2    114.29585 0.2054025 0.4591019
## 3     94.88277 0.2054025 0.4591019
## 4     94.88836 0.2054025 0.4591019
## 5     86.91915 0.2054025 0.4591019
## 6     91.32206 0.2054025 0.4591019
## 7     93.85191 0.2054025 0.4591019
## 8     66.57846 0.2054025 0.4591019
## 9     96.38930 0.2054025 0.4591019
## 10   103.64559 0.2054025 0.4591019
## 11   110.36171 0.2054025 0.4591019
## 12    94.21022 0.2054025 0.4591019
## 13    89.44698 0.2054025 0.4591019
## 14    89.51116 0.2054025 0.4591019
## 15    79.57811 0.2054025 0.4591019
## 16    87.44646 0.2054025 0.4591019
## 17   110.05654 0.2054025 0.4591019
## 18    96.62059 0.2054025 0.4591019
## 19    86.64688 0.2054025 0.4591019
## 20    96.25067 0.2054025 0.4591019
## 21    80.82324 0.2054025 0.4591019
## 22    77.52477 0.2054025 0.4591019
## 23    74.52455 0.2054025 0.4591019
## 24    69.88148 0.2054025 0.4591019
## 25    71.22787 0.2054025 0.4591019
## 26   103.50483 0.2054025 0.4591019
## 27   114.56861 0.2054025 0.4591019
## 28    88.38784 0.2054025 0.4591019
## 29    85.04500 0.2054025 0.4591019
## 30    73.36771 0.2054025 0.4591019
## 
## attr(,"class")
## [1] "coef.mer"
fixef(pitch.me)
## (Intercept)        base     attract 
##  90.0333423   0.2054025   0.4591019
ranef(pitch.me)
## $Face
##    (Intercept)
## 1   0.28706115
## 2  -0.17996662
## 3  -0.17020403
## 4  -0.35396199
## 5   0.16048244
## 6  -0.15270484
## 7   0.88409253
## 8  -0.44848593
## 9   0.11805208
## 10 -0.10219812
## 11 -0.29275053
## 12 -0.25536038
## 13  0.01642258
## 14  0.05842830
## 15  0.02430616
## 16  0.19490993
## 17 -0.04943207
## 18  0.13716879
## 19  0.04133560
## 20  0.19170746
## 21 -0.54120048
## 22 -0.16879624
## 23  0.73922783
## 24  0.12302105
## 25  0.59754718
## 26 -0.54618038
## 27 -0.25108817
## 28  0.01451355
## 29 -0.04516416
## 30 -0.02261380
## 31  0.17671811
## 32 -0.18488701
## 
## $Participant
##    (Intercept)
## 1  -10.7917253
## 2   24.2625050
## 3    4.8494235
## 4    4.8550207
## 5   -3.1141924
## 6    1.2887170
## 7    3.8185653
## 8  -23.4548810
## 9    6.3559594
## 10  13.6122507
## 11  20.3283647
## 12   4.1768759
## 13  -0.5863578
## 14  -0.5221842
## 15 -10.4552290
## 16  -2.5868870
## 17  20.0232022
## 18   6.5872516
## 19  -3.3864650
## 20   6.2173245
## 21  -9.2101024
## 22 -12.5085737
## 23 -15.5087923
## 24 -20.1518667
## 25 -18.8054717
## 26  13.4714843
## 27  24.5352670
## 28  -1.6455045
## 29  -4.9883462
## 30 -16.6656326
## 
## with conditional variances for "Face" "Participant"
head(predict(pitch.me))
##        1        2        3        4        5        6 
## 100.5332 100.1432 101.1944 101.3218 100.4066 101.2119

Example: Effects of sleep deprivation on reaction times

The dataset is part of library(lme4). A description of the dataset is available via help(sleepstudy)

require(lme4)
df <- sleepstudy
head(sleepstudy)
##   Reaction Days Subject
## 1 249.5600    0     308
## 2 258.7047    1     308
## 3 250.8006    2     308
## 4 321.4398    3     308
## 5 356.8519    4     308
## 6 414.6901    5     308
# some descriptives
require(psych)
psych::describe(sleepstudy)
##          vars   n   mean    sd median trimmed   mad    min    max  range skew
## Reaction    1 180 298.51 56.33 288.65  295.17 59.34 194.33 466.35 272.02 0.58
## Days        2 180   4.50  2.88   4.50    4.50  3.71   0.00   9.00   9.00 0.00
## Subject*    3 180   9.50  5.20   9.50    9.50  6.67   1.00  18.00  17.00 0.00
##          kurtosis   se
## Reaction     0.03 4.20
## Days        -1.24 0.21
## Subject*    -1.23 0.39
require(Rmisc)
summarySE(data=df, "Reaction")
##    .id   N Reaction       sd       se       ci
## 1 <NA> 180 298.5079 56.32876 4.198498 8.284918
summarySE(data=df, "Reaction", "Days")
##    Days  N Reaction       sd        se       ci
## 1     0 18 256.6518 32.12945  7.572984 15.97760
## 2     1 18 264.4958 33.43033  7.879605 16.62451
## 3     2 18 265.3619 29.47342  6.946953 14.65679
## 4     3 18 282.9920 38.85774  9.158857 19.32350
## 5     4 18 288.6494 42.53789 10.026276 21.15359
## 6     5 18 308.5185 51.76962 12.202218 25.74443
## 7     6 18 312.1783 63.17372 14.890189 31.41555
## 8     7 18 318.7506 50.10396 11.809617 24.91611
## 9     8 18 336.6295 60.19972 14.189209 29.93661
## 10    9 18 350.8512 66.98616 15.788788 33.31143
summarySE(data=df, "Reaction", "Subject")
##    Subject  N Reaction       sd        se        ci
## 1      308 10 342.1338 79.82176 25.241858 57.101050
## 2      309 10 215.2330 10.81219  3.419116  7.734577
## 3      310 10 231.0013 21.85600  6.911473 15.634838
## 4      330 10 303.2214 22.90920  7.244525 16.388254
## 5      331 10 309.4361 27.24261  8.614869 19.488187
## 6      332 10 307.3021 64.30613 20.335382 46.001831
## 7      333 10 316.1583 30.06821  9.508402 21.509500
## 8      334 10 295.3021 41.85561 13.235905 29.941698
## 9      335 10 250.0700 13.83385  4.374648  9.896142
## 10     337 10 375.7210 59.62379 18.854698 42.652289
## 11     349 10 275.8345 42.93794 13.578170 30.715954
## 12     350 10 313.6027 63.36056 20.036368 45.325413
## 13     351 10 290.0978 28.97881  9.163905 20.730192
## 14     352 10 337.4215 47.60238 15.053194 34.052691
## 15     369 10 306.0346 37.46043 11.846028 26.797577
## 16     370 10 291.7018 59.20821 18.723281 42.355004
## 17     371 10 294.9840 36.50599 11.544207 26.114811
## 18     372 10 317.8861 35.82280 11.328165 25.626090
require(ggplot2)
ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) + geom_point(aes(color=Subject)) + geom_smooth(method = lm, se=F)
## `geom_smooth()` using formula 'y ~ x'

# we convert Days to type factor
df$f.Days <- factor(df$Days)

# the classical anova approach
require(ez)
## Lade nötiges Paket: ez
m.ez <- ezANOVA(data=df, dv=.(Reaction), wid = .(Subject), within = .(f.Days), detailed=TRUE, type=2, return_aov = T)
m.ez
## $ANOVA
##        Effect DFn DFd        SSn      SSd         F            p p<.05      ges
## 1 (Intercept)   1  17 16039253.0 250618.1 1087.9793 7.475644e-17     * 0.975566
## 2      f.Days   9 153   166235.1 151101.0   18.7027 8.995345e-21     * 0.292691
## 
## $`Mauchly's Test for Sphericity`
##   Effect            W            p p<.05
## 2 f.Days 0.0002190204 5.154879e-08     *
## 
## $`Sphericity Corrections`
##   Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
## 2 f.Days 0.3690298 5.463092e-09         * 0.4693648 7.076552e-11         *
## 
## $aov
## 
## Call:
## aov(formula = formula(aov_formula), data = data)
## 
## Grand Mean: 298.5079
## 
## Stratum 1: Subject
## 
## Terms:
##                 Residuals
## Sum of Squares   250618.1
## Deg. of Freedom        17
## 
## Residual standard error: 121.4176
## 
## Stratum 2: Subject:f.Days
## 
## Terms:
##                   f.Days Residuals
## Sum of Squares  166235.1  151101.0
## Deg. of Freedom        9       153
## 
## Residual standard error: 31.42592
## Estimated effects may be unbalanced
# descriptives
ezStats(data=df, dv=.(Reaction), wid = .(Subject), within = .(f.Days))
##    f.Days  N     Mean       SD     FLSD
## 1       0 18 256.6518 32.12945 20.69491
## 2       1 18 264.4958 33.43033 20.69491
## 3       2 18 265.3619 29.47342 20.69491
## 4       3 18 282.9920 38.85774 20.69491
## 5       4 18 288.6494 42.53789 20.69491
## 6       5 18 308.5185 51.76962 20.69491
## 7       6 18 312.1783 63.17372 20.69491
## 8       7 18 318.7506 50.10396 20.69491
## 9       8 18 336.6295 60.19972 20.69491
## 10      9 18 350.8512 66.98616 20.69491
# and a plot
ezPlot(data=df,  dv=.(Reaction), wid = .(Subject), within = .(f.Days), x=.(f.Days))

# we store aov element for convenience
o.aov <- m.ez[['aov']][['Subject:f.Days']]
# we calculate predicted values and store them for further use
# reference group is mean at day zero 256.6518
mean.day.0 <- 256.6518
df$pred.ez <- NA
adds.one <- c(0, o.aov[['coefficients']])
adds <- rep(adds.one, length(unique(df$Subject)))
# predicted values are group means
# we get to group means by adding the coefficients to the mean of the reference group
df$pred.ez <-adds + mean.day.0 

# correlation of predicted values and reaction variable 
cor(df$Reaction, df$pred.ez)
## [1] 0.5410093
# standard deviation of residuals
sd(df$Reaction - df$pred.ez)
## [1] 47.37342
require(lattice)
xyplot(Reaction ~ Days | Subject, sleepstudy, type = c("g","p","r"),
       index = function(x,y) coef(lm(y ~ x))[1],
       xlab = "Days of sleep deprivation",
       ylab = "Average reaction time (ms)", aspect = "xy")

(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## REML criterion at convergence: 1743.628
## Random effects:
##  Groups   Name        Std.Dev. Corr
##  Subject  (Intercept) 24.741       
##           Days         5.922   0.07
##  Residual             25.592       
## Number of obs: 180, groups:  Subject, 18
## Fixed Effects:
## (Intercept)         Days  
##      251.41        10.47
(fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
##    Data: sleepstudy
## REML criterion at convergence: 1743.669
## Random effects:
##  Groups    Name        Std.Dev.
##  Subject   (Intercept) 25.051  
##  Subject.1 Days         5.988  
##  Residual              25.565  
## Number of obs: 180, groups:  Subject, 18
## Fixed Effects:
## (Intercept)         Days  
##      251.41        10.47
require(lme4)
df <- sleepstudy
# detach package lme4, we only need its sleepstudy data 
# detach("package:lme4", unload=TRUE)

require(nlme)
m.0 <- lme(fixed= Reaction ~ 1, random = ~1 | Subject, data=df, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
##   Data: df 
##        AIC     BIC    logLik
##   1916.541 1926.12 -955.2705
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept) Residual
## StdDev:    34.58954 44.25907
## 
## Fixed effects:  Reaction ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 298.5079   8.81949 162 33.84639       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.4914591 -0.5481683 -0.1470795  0.5178317  3.3461410 
## 
## Number of Observations: 180
## Number of Groups: 18
m.1 <- lme(fixed= Reaction ~ 1 + Days, random = ~1 | Subject, data=df, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
##   Data: df 
##        AIC     BIC    logLik
##   1802.079 1814.85 -897.0393
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept) Residual
## StdDev:    36.01208 30.89543
## 
## Fixed effects:  Reaction ~ 1 + Days 
##                 Value Std.Error  DF  t-value p-value
## (Intercept) 251.40510  9.559442 161 26.29914       0
## Days         10.46729  0.806227 161 12.98305       0
##  Correlation: 
##      (Intr)
## Days -0.38 
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.23470320 -0.55439805  0.01549172  0.52570857  4.26476610 
## 
## Number of Observations: 180
## Number of Groups: 18
m.2 <- lme(fixed= Reaction ~ 1 + Days, random = ~1 + Days | Subject, data=df, method='ML')
summary(m.2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: df 
##        AIC      BIC    logLik
##   1763.939 1783.097 -875.9697
## 
## Random effects:
##  Formula: ~1 + Days | Subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 23.780376 (Intr)
## Days         5.716807 0.081 
## Residual    25.591842       
## 
## Fixed effects:  Reaction ~ 1 + Days 
##                 Value Std.Error  DF  t-value p-value
## (Intercept) 251.40510  6.669396 161 37.69533       0
## Days         10.46729  1.510647 161  6.92901       0
##  Correlation: 
##      (Intr)
## Days -0.138
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.94156355 -0.46559311  0.02894656  0.46361051  5.17933587 
## 
## Number of Observations: 180
## Number of Groups: 18
anova(m.0, m.1, m.2)
##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m.0     1  3 1916.541 1926.120 -955.2705                        
## m.1     2  4 1802.079 1814.851 -897.0393 1 vs 2 116.4624  <.0001
## m.2     3  6 1763.939 1783.097 -875.9697 2 vs 3  42.1393  <.0001
# an lme object "understands" predict 
df$pred.m.1 <- predict(m.1)
cor(df$Reaction, df$pred.m.1)
## [1] 0.852554
df$pred.m.2 <- predict(m.2)
cor(df$Reaction, df$pred.m.2)
## [1] 0.9090176
# standard deviation of residuals
sd(m.1$residuals)
## [1] 39.53289
# ... is smaller than in classical analysis

# we can get at the coefficients via
fixef(m.1)
## (Intercept)        Days 
##   251.40510    10.46729
ranef(m.1)
##     (Intercept)
## 308   40.635097
## 309  -77.565875
## 310  -62.878604
## 330    4.390385
## 331   10.178962
## 332    8.191280
## 333   16.440367
## 334   -2.986060
## 335  -45.117122
## 337   71.919657
## 349  -21.119011
## 350   14.059942
## 351   -7.833572
## 352   36.245865
## 369    7.010741
## 370   -6.339518
## 371   -3.282269
## 372   18.049734
# we extend the data with effects of model m.1
df$inter <- NA
df$slope <- NA
require(dplyr)
df <- df %>% dplyr::group_by(Subject) %>% dplyr::mutate(inter=fixef(m.1)[1] + ranef(m.1)[Subject,1])
df <- df %>% dplyr::mutate(slope=fixef(m.1)[2])

# we visualize the random intercept fixed slope model
require(ggplot2)
pplot <- ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) + 
  geom_point(aes(color=Subject)) +
  geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
  geom_smooth(method = lm, se=F)
pplot
## `geom_smooth()` using formula 'y ~ x'

# we extend the data with effects of model m.2
df$inter <- NA
df$slope <- NA
require(dplyr)
df <- df %>% dplyr::group_by(Subject) %>% dplyr::mutate(inter=fixef(m.2)[1] + ranef(m.2)[Subject,1])
df <- df %>% dplyr::group_by(Subject) %>% dplyr::mutate(slope=fixef(m.2)[2] + ranef(m.2)[Subject,2])


# we visualize the random intercept random slope model
pplot <- ggplot(data=df, aes(x=Days, y=Reaction), color=Subject) + 
  geom_point(aes(color=Subject)) +
  geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
  geom_smooth(method = lm, se=F, size=3) 
pplot
## `geom_smooth()` using formula 'y ~ x'


Example: School, final exam score, and school entrance test level

Source

In the British school system there is a standardised test at the transition from primary school to secondary school; in Northern Ireland, for example, the “Common Entrance Assessment (CEA)”, administered by the AQE (Association for Quality Education). Based on the results of this test, parents can register their children at the high schools. The high schools use different minimum scores, i.e. cut-off values, for accepting students. The entrance tests have a mean of 100.

An example: Belfast High School. In 2016 only above-average students were admitted here.

AQE Highest 2016: 124
AQE Lowest 2016: 101
AQE Applications 2016: 220
AQE 1st Preference 2016: 179
AQE Admitted 2016: 136

The students’ final exam score, normexam, is to be predicted. The students attend different schools. The predictor at the student level is standLRT (standardised reading test score). The predictor at the school level is schavg (average intake level of the respective school).

We can recognise school-level variables by the fact that their value is the same for a given school, i.e. it stays the same (repeats) for all students within one school. A quick check of this property is sketched below.
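A one-liner to verify this (using the Exam data from mlmRev that is loaded just below):

# every school should show exactly one distinct value of the school-level variable schavg
table(tapply(mlmRev::Exam$schavg, mlmRev::Exam$school, function(x) length(unique(x))))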

The dataset used here describes 4059 students from 65 schools in London. More information is available via ?mlmRev::Exam

A data frame with 4059 observations on the following 10 variables.

| var | description |
| --- | --- |
| school | School ID - a factor. |
| normexam | Normalized exam score. |
| schgend | School gender - a factor. Levels are mixed, boys, and girls. |
| schavg | School average of intake score. |
| vr | Student level Verbal Reasoning (VR) score band at intake - a factor. Levels are bottom 25%, mid 50%, and top 25%. |
| intake | Band of student’s intake score - a factor. Levels are bottom 25%, mid 50%, and top 25%. |
| standLRT | Standardised LR test score (London Reading Test, a reading test taken when the students were 11 years old). |
| sex | Sex of the student - levels are F and M. |
| type | School type - levels are Mxd and Sngl. |
| student | Student id (within school) - a factor. |

require(nlme)
require(mlmRev)
## Lade nötiges Paket: mlmRev
## 
## Attache Paket: 'mlmRev'
## Die folgenden Objekte sind maskiert von 'package:nlme':
## 
##     bdf, Oxboys
# mlmRev provides the data Exam
dd <- mlmRev::Exam
names(dd) 
##  [1] "school"   "normexam" "schgend"  "schavg"   "vr"       "intake"  
##  [7] "standLRT" "sex"      "type"     "student"
head(dd)
##   school   normexam schgend    schavg      vr     intake   standLRT sex type
## 1      1  0.2613242   mixed 0.1661752 mid 50% bottom 25%  0.6190592   F  Mxd
## 2      1  0.1340672   mixed 0.1661752 mid 50%    mid 50%  0.2058022   F  Mxd
## 3      1 -1.7238820   mixed 0.1661752 mid 50%    top 25% -1.3645760   M  Mxd
## 4      1  0.9675862   mixed 0.1661752 mid 50%    mid 50%  0.2058022   F  Mxd
## 5      1  0.5443412   mixed 0.1661752 mid 50%    mid 50%  0.3711052   F  Mxd
## 6      1  1.7348992   mixed 0.1661752 mid 50% bottom 25%  2.1894372   M  Mxd
##   student
## 1     143
## 2     145
## 3     142
## 4     141
## 5     138
## 6     155
require(Rmisc)
summarySE(dd, "normexam", c("school"))
##    school   N     normexam        sd         se        ci
## 1       1  73  0.501209573 1.1158114 0.13059584 0.2603381
## 2       2  55  0.783102291 1.2844851 0.17319993 0.3472450
## 3       3  52  0.855444696 0.9581753 0.13287501 0.2667577
## 4       4  79  0.073628516 1.0866424 0.12225682 0.2433946
## 5       5  35  0.403608663 0.7978350 0.13485872 0.2740659
## 6       6  80  0.944579967 0.8352044 0.09337869 0.1858656
## 7       7  88  0.391500023 0.7877530 0.08397475 0.1669089
## 8       8 102 -0.048192541 1.0484182 0.10380887 0.2059289
## 9       9  34 -0.435682141 0.8291304 0.14219469 0.2892973
## 10     10  50 -0.269390664 0.6613442 0.09352820 0.1879520
## 11     11  62  0.557222542 0.8758982 0.11123918 0.2224365
## 12     12  47 -0.072478545 0.7990063 0.11654705 0.2345970
## 13     13  64 -0.245749669 0.9855739 0.12319674 0.2461890
## 14     14 198  0.015569462 0.9443966 0.06711534 0.1323568
## 15     15  91 -0.040602934 0.9607184 0.10071065 0.2000793
## 16     16  88 -0.254118025 0.7288980 0.07770078 0.1544387
## 17     17 126 -0.245425479 0.9231316 0.08223910 0.1627614
## 18     18 120 -0.006150902 0.8295298 0.07572537 0.1499438
## 19     19  55  0.206285044 0.9124906 0.12304020 0.2466808
## 20     20  39  0.499090605 0.8537705 0.13671269 0.2767604
## 21     21  73  0.380324951 0.7738230 0.09056914 0.1805463
## 22     22  90 -0.498201307 0.8950668 0.09434832 0.1874681
## 23     23  28 -0.737632557 1.0780764 0.20373729 0.4180344
## 24     24  37  0.012641427 1.0217807 0.16797972 0.3406787
## 25     25  73 -0.613106307 0.8802210 0.10302207 0.2053708
## 26     26  75 -0.392018208 0.8688488 0.10032601 0.1999039
## 27     27  39 -0.328896456 0.8753218 0.14016366 0.2837465
## 28     28  57 -0.872710309 0.6198061 0.08209533 0.1644567
## 29     29  79  0.063508116 0.7452036 0.08384196 0.1669165
## 30     30  42  0.335456290 1.3144203 0.20281946 0.4096022
## 31     31  49 -0.236679098 0.7157689 0.10225270 0.2055928
## 32     32  42 -0.371331538 1.0198787 0.15737070 0.3178166
## 33     33  77  0.077111969 0.7779287 0.08865320 0.1765682
## 34     34  26 -0.370894600 1.3554668 0.26582892 0.5474849
## 35     35  38  0.083951826 0.7335080 0.11899071 0.2410981
## 36     36  70 -0.249203923 0.9643744 0.11526478 0.2299469
## 37     37  22 -0.665348082 0.7122275 0.15184741 0.3157840
## 38     38  54 -0.287545137 0.9875188 0.13438429 0.2695408
## 39     39  48  0.038325454 0.7469371 0.10781108 0.2168879
## 40     40  71 -0.259676242 1.2444905 0.14769385 0.2945661
## 41     41  60  0.072922377 0.8958935 0.11565936 0.2314338
## 42     42  58  0.022684469 0.6780823 0.08903657 0.1782927
## 43     43  61  0.147594741 0.9470311 0.12125491 0.2425459
## 44     44  29 -0.433727697 0.6801447 0.12629971 0.2587132
## 45     45  53 -0.225505181 0.8165485 0.11216156 0.2250686
## 46     46  83 -0.457403342 0.6929915 0.07606570 0.1513189
## 47     47  82 -0.122257644 0.9812088 0.10835639 0.2155952
## 48     48   2 -0.414295000 0.4033479 0.28521000 3.6239367
## 49     49 113  0.046892811 0.7810911 0.07347887 0.1455890
## 50     50  73 -0.321780110 0.9139714 0.10697226 0.2132453
## 51     51  58 -0.278070169 0.7557403 0.09923356 0.1987118
## 52     52  61  0.533366836 1.1360822 0.14546043 0.2909642
## 53     53  70  1.003548429 1.2838470 0.15344907 0.3061225
## 54     54   8 -0.633628850 0.6522696 0.23061214 0.5453111
## 55     55  51  0.717123745 0.9759933 0.13666640 0.2745025
## 56     56  38 -0.034280063 1.2405573 0.20124496 0.4077610
## 57     57  63  0.001081203 0.9535847 0.12014037 0.2401572
## 58     58  37  0.281887811 0.7257508 0.11931269 0.2419773
## 59     59  47 -1.049087153 0.7428851 0.10836093 0.2181192
## 60     60  80  0.195428218 0.9267497 0.10361376 0.2062380
## 61     61  64 -0.052334084 1.0869840 0.13587300 0.2715205
## 62     62  71  0.037990287 0.8509089 0.10098431 0.2014068
## 63     63  30  0.735678180 0.7038345 0.12850201 0.2628161
## 64     64  59  0.343793698 0.9803036 0.12762466 0.2554685
## 65     65  80 -0.308687153 0.8915633 0.09967981 0.1984077
require(psych)
psych::describe (dd)
##          vars    n   mean     sd median trimmed   mad   min    max  range  skew
## school*     1 4059  31.01  18.94  29.00   30.51 23.72  1.00  65.00  64.00  0.20
## normexam    2 4059   0.00   1.00   0.00    0.00  1.00 -3.67   3.67   7.33  0.00
## schgend*    3 4059   1.80   0.91   1.00    1.76  0.00  1.00   3.00   2.00  0.39
## schavg      4 4059   0.00   0.31  -0.02    0.01  0.32 -0.76   0.64   1.39 -0.20
## vr*         5 4059   2.13   0.65   2.00    2.16  0.00  1.00   3.00   2.00 -0.13
## intake*     6 4059   1.84   0.63   2.00    1.80  0.00  1.00   3.00   2.00  0.14
## standLRT    7 4059   0.00   0.99   0.04    0.02  0.98 -2.93   3.02   5.95 -0.13
## sex*        8 4059   1.40   0.49   1.00    1.37  0.00  1.00   2.00   1.00  0.41
## type*       9 4059   1.47   0.50   1.00    1.46  0.00  1.00   2.00   1.00  0.14
## student*   10 4059 121.67 134.99  78.00   92.20 69.68  1.00 650.00 649.00  2.12
##          kurtosis   se
## school*     -1.23 0.30
## normexam    -0.02 0.02
## schgend*    -1.69 0.01
## schavg      -0.35 0.00
## vr*         -0.69 0.01
## intake*     -0.57 0.01
## standLRT     0.07 0.02
## sex*        -1.83 0.01
## type*       -1.98 0.01
## student*     4.19 2.12
# we specify the null model
m.0 <- lme(fixed = normexam ~ 1,   # 1, a constant, predicts and results in mean
         data = dd,
         random = ~ 1 | school,
         method="ML")      # levels are defined, school defines level two
m.0
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##   Log-likelihood: -5505.324
##   Fixed: normexam ~ 1 
## (Intercept) 
## -0.01316707 
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept)  Residual
## StdDev:   0.4106567 0.9207391
## 
## Number of Observations: 4059
## Number of Groups: 65
# ... we find mean in fixed (Intercept)
# ... we find sd of residuals for (Intercept) level 2 and level 1 Residual
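# intraclass correlation from these standard deviations:
0.4106567^2 / (0.4106567^2 + 0.9207391^2)   # about 0.166
# roughly 17% of the variance in normexam is bound to school membership,
# clearly more than the 10% rule of thumb, so the multilevel approach pays off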


# we add a level 1 predictor
# random intercept, fixed predictor in individual level
m.1 <- lme(fixed = normexam ~ standLRT, random = ~ 1 | school, data = dd, method="ML") 
m.1
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##   Log-likelihood: -4678.622
##   Fixed: normexam ~ standLRT 
## (Intercept)    standLRT 
## 0.002390757 0.563371165 
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept)  Residual
## StdDev:    0.303528 0.7521509
## 
## Number of Observations: 4059
## Number of Groups: 65
# ... sd of Intercept goes down
# ... sd of Residuals goes down
fixef(m.1)
## (Intercept)    standLRT 
## 0.002390757 0.563371165
head(ranef(m.1))
##   (Intercept)
## 1  0.37376075
## 2  0.50204393
## 3  0.50388972
## 4  0.01813123
## 5  0.24043139
## 6  0.54139588
# we add a level 2 predictor
# random intercept, random slope
m.2 <- lme(fixed = normexam ~ standLRT, random = ~ standLRT | school, data = dd, method="ML")
m.2
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##   Log-likelihood: -4658.435
##   Fixed: normexam ~ standLRT 
## (Intercept)    standLRT 
## -0.01150385  0.55672932 
## 
## Random effects:
##  Formula: ~standLRT | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.3007325 (Intr)
## standLRT    0.1205756 0.497 
## Residual    0.7440805       
## 
## Number of Observations: 4059
## Number of Groups: 65
fixef(m.2)
## (Intercept)    standLRT 
## -0.01150385  0.55672932
head(ranef(m.2))
##   (Intercept)   standLRT
## 1  0.37492945 0.12497525
## 2  0.47020445 0.16472812
## 3  0.47978037 0.08084307
## 4  0.03500977 0.12720641
## 5  0.24627676 0.07205334
## 6  0.51840503 0.05859459
# we can also add a level 2 predictor (schavg, the school average)
# random intercept, individual and group level predictors; the random slope is kept as well
m.3 <- lme(fixed = normexam ~ standLRT + schavg, random = ~ standLRT | school, data = dd, method="ML")
m.3
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##   Log-likelihood: -4655.215
##   Fixed: normexam ~ standLRT + schavg 
##  (Intercept)     standLRT       schavg 
## -0.001238744  0.552391343  0.294788775 
## 
## Random effects:
##  Formula: ~standLRT | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.2729164 (Intr)
## standLRT    0.1220248 0.379 
## Residual    0.7440531       
## 
## Number of Observations: 4059
## Number of Groups: 65
fixef(m.3)
##  (Intercept)     standLRT       schavg 
## -0.001238744  0.552391343  0.294788775
head(ranef(m.3))
##    (Intercept)   standLRT
## 1  0.316023289 0.12335833
## 2  0.352660821 0.15783280
## 3  0.342163770 0.05858984
## 4 -0.002440117 0.13442592
## 5  0.181470405 0.06233838
## 6  0.349439693 0.03347502
# we can also add cross-level interaction
# random intercept, cross-level interaction
m.4 <- lme(fixed = normexam ~ standLRT * schavg, random = ~ standLRT | school, data=dd, method="ML") 
fixef(m.4)
##     (Intercept)        standLRT          schavg standLRT:schavg 
##    -0.007008473     0.558239557     0.373510221     0.162219019
head(ranef(m.4))
##    (Intercept)    standLRT
## 1  0.307074694  0.09386613
## 2  0.320212211  0.10349247
## 3  0.288467624  0.01089639
## 4 -0.004713493  0.10997334
## 5  0.166694142  0.04226494
## 6  0.277333995 -0.02955381
# model checks: since the models were fit with method="ML", deviances can be extracted and compared
deviance(m.0)
## 'log Lik.' 11010.65 (df=3)
deviance(m.1)
## 'log Lik.' 9357.243 (df=4)
anova(m.0)
##             numDF denDF    F-value p-value
## (Intercept)     1  3994 0.06026954  0.8061
anova(m.1, test="F")
##             numDF denDF   F-value p-value
## (Intercept)     1  3993    0.0924  0.7612
## standLRT        1  3993 2041.5623  <.0001
# ... which gives the same result as the default call
anova(m.1)
##             numDF denDF   F-value p-value
## (Intercept)     1  3993    0.0924  0.7612
## standLRT        1  3993 2041.5623  <.0001
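Since all models above were fit with method="ML", the nested sequence m.0 … m.4 can also be compared directly via likelihood-ratio tests; a minimal sketch (output omitted):

# likelihood-ratio comparison of the nested models (requires method="ML")
anova(m.0, m.1, m.2, m.3, m.4)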

Example: school type and future perspective

Ecological fallacy: patterns at higher levels are misinterpreted as effects at level 1.

  • y is the response variable
  • x is the predictor variable at level 1
  • treatment is the grouping variable for level 2
  • g.eff is a variable at level 2
  • optionally, subj can be used as a second level 2 variable (treating the design as repeated measures)

Scenario:

  • x is potential cognitive ability
  • y is future prospects, confidence in the future
  • the groups are school types: Hauptschule, Realschule, Gymnasium
  • the effect: the awareness of poor prospects at Hauptschulen

A repeated-measures ANOVA would test mean differences between the school types; school type would be treated as a categorical variable.

(The data are simulated.)
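The file read below was generated elsewhere in this document; as a self-contained sketch (with purely illustrative parameter values, not the ones actually used), data showing this pattern could be simulated like this:

# illustrative simulation: the school-type means produce a positive overall correlation
# while the within-group slopes are negative (the ecological fallacy in its pure form)
set.seed(1)
n <- 100                                          # students per school type
sim <- do.call(rbind, lapply(1:3, function(g) {
  x <- rnorm(n, mean = c(-3, 0, 3)[g])            # school types differ in mean ability
  y <- c(-9, 0, 11)[g] - (x - c(-3, 0, 3)[g]) + rnorm(n, sd = 2)  # negative slope within
  data.frame(schooltype = letters[g], x = x, y = y)
}))
cor(sim$x, sim$y)                                 # overall: clearly positive
by(sim, sim$schooltype, function(d) cor(d$x, d$y))  # within each group: negative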

# we read data like generated above
df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/div/schooltype_future.txt")
df$schooltype <- factor(df$schooltype)
# correlation is
cor(df$x, df$y)
## [1] 0.7397159
# the overall scatterplot seems to confirm this correlation; we might notice some asymmetry in the distribution
require(ggplot2)
# we plot all observations with a single overall regression line
pplot <- ggplot(data=df, aes(x=x, y=y)) + 
  #geom_point(aes(color=Subject)) +
  geom_point() + 
  #geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
  geom_smooth(method = lm, se=F, size=1) +
    xlab('cognitive abilities') + 
    ylab('future perspectives')
pplot
## `geom_smooth()` using formula 'y ~ x'

# but if we look at the groups ...
pplot <- ggplot(data=df, aes(x=x, y=y, color=schooltype, shape=schooltype)) + 
  #geom_point(aes(color=Subject)) +
  geom_point() +
  #geom_smooth(method="lm", fill=NA)
  #geom_abline(aes(intercept=inter, slope=slope, color = Subject)) +
  #geom_smooth(method = lm, se=F, size=2, ) 
    xlab('cognitive abilities') + 
    ylab('future perspectives')
pplot

# group specific correlations show huge differences
cor(df[df$schooltype == 'a', 'x'], df[df$schooltype=='a', 'y'])
## [1] 0.7694075
cor(df[df$schooltype == 'b', 'x'], df[df$schooltype=='b', 'y'])
## [1] 0.2675309
cor(df[df$schooltype == 'c', 'x'], df[df$schooltype=='c', 'y'])
## [1] -0.763508
# and the descriptives
require(Rmisc)
summarySE(data=df, "x", c("schooltype"))
##   schooltype   N           x       sd        se        ci
## 1          a 100  2.93823842 1.775213 0.1775213 0.3522409
## 2          b 100 -0.01171805 1.679583 0.1679583 0.3332657
## 3          c 100 -2.90880335 1.736970 0.1736970 0.3446526
summarySE(data=df, "y", c("schooltype"))
##   schooltype   N          y       sd        se        ci
## 1          a 100 11.1699977 4.427624 0.4427624 0.8785366
## 2          b 100 -0.0644219 2.941416 0.2941416 0.5836408
## 3          c 100 -9.2367355 4.755953 0.4755953 0.9436843
summarySE(data=df, "y")
##    .id   N         y       sd        se       ci
## 1 <NA> 300 0.6229468 9.312347 0.5376486 1.058055
# require(psych)
# psych::describeBy(df, group="schooltype")
# psych::describe(df)

# we fit a repeated measures anova
require(ez)
m.ez <- ezANOVA(data=data.frame(df), dv=y, wid=subj, within=schooltype, detailed=T, return_aov = T)
## Warning: Converting "subj" to factor for ANOVA.
m.ez
## $ANOVA
##        Effect DFn DFd        SSn      SSd          F            p p<.05
## 1 (Intercept)   1  99   116.4188 1561.714   7.380009 7.785155e-03     *
## 2  schooltype   2 198 20892.6094 3474.899 595.231235 1.815405e-84     *
##          ges
## 1 0.02259229
## 2 0.80575535
## 
## $`Mauchly's Test for Sphericity`
##       Effect         W         p p<.05
## 2 schooltype 0.9606459 0.1398308      
## 
## $`Sphericity Corrections`
##       Effect      GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
## 2 schooltype 0.962136 2.224446e-81         * 0.9808178 6.660063e-83         *
## 
## $aov
## 
## Call:
## aov(formula = formula(aov_formula), data = data)
## 
## Grand Mean: 0.6229468
## 
## Stratum 1: subj
## 
## Terms:
##                 Residuals
## Sum of Squares   1561.714
## Deg. of Freedom        99
## 
## Residual standard error: 3.971761
## 
## Stratum 2: subj:schooltype
## 
## Terms:
##                 schooltype Residuals
## Sum of Squares   20892.609  3474.899
## Deg. of Freedom          2       198
## 
## Residual standard error: 4.189271
## Estimated effects may be unbalanced
require(nlme)
# for demonstration purposes we fit a null model with level 1 only (the grouping variable const has a single value, i.e. no real level 2)
df.t <- cbind(df, 1)
names(df.t) <- c("subj", "schooltype", "g.eff", "x", "y", 'const')
m.00 <- lme(fixed = y ~ 1, random = ~ 1 | const, data=df.t, method='ML')
# fitted value is the same for all subjects
unique(predict(m.00))
## [1] 0.6229468
# a correlation of predictions with the outcome cannot be computed because the predicted values have zero variance
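# a quick check (sketch): the predictions are constant, so their sd is 0
sd(predict(m.00))
# cor(df$y, predict(m.00)) would accordingly warn that the standard deviation is zero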

# null multilevel model
m.0 <- lme(fixed = y ~ 1, random = ~1 | schooltype, data=df, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
##   Data: df 
##        AIC      BIC    logLik
##   1724.644 1735.755 -859.3218
## 
## Random effects:
##  Formula: ~1 | schooltype
##         (Intercept) Residual
## StdDev:    8.335013 4.118045
## 
## Fixed effects:  y ~ 1 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 0.6229468  4.826142 297 0.1290776  0.8974
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -2.383275000 -0.669782463 -0.006185372  0.624697233  2.799833738 
## 
## Number of Observations: 300
## Number of Groups: 3
# get colors
g <- ggplot_build(pplot)
c.colors <- unique(g$data[[1]]["colour"])
# fitted value is the same for all subjects within a level2 group
unique(predict(m.0))
## [1] 11.14431494 -0.06274811 -9.21272654
# ... random intercept of schooltype 'explains' variance and reduces residual variance
pplot + 
  #geom_hline(yintercept=m.0$coefficients$random[['schooltype']][1], color=c1) 
  geom_hline(yintercept=m.0$coefficients$random[['schooltype']][1], color=c.colors[1,1]) +
  geom_hline(yintercept=m.0$coefficients$random[['schooltype']][2], color=c.colors[2,1]) +
  geom_hline(yintercept=m.0$coefficients$random[['schooltype']][3], color=c.colors[3,1])

# level 1 predictor, random intercept
m.1 <- lme(fixed = y ~ 1 + x, random = ~1 | schooltype, data=df, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
##   Data: df 
##        AIC      BIC    logLik
##   1725.689 1740.504 -858.8447
## 
## Random effects:
##  Formula: ~1 | schooltype
##         (Intercept) Residual
## StdDev:    8.012678 4.113069
## 
## Fixed effects:  y ~ 1 + x 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 0.6221491  4.647731 296 0.1338608  0.8936
## x           0.1350708  0.137988 296 0.9788598  0.3284
##  Correlation: 
##   (Intr)
## x 0     
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.43382547 -0.69250697 -0.01532247  0.56627952  2.71538500 
## 
## Number of Observations: 300
## Number of Groups: 3
pplot + 
  geom_abline(intercept = fixef(m.1)[1] + ranef(m.1)[1,1], slope = fixef(m.1)[2], color=c.colors[1,1]) + 
  geom_abline(intercept = fixef(m.1)[1] + ranef(m.1)[2,1], slope = fixef(m.1)[2], color=c.colors[2,1]) + 
  geom_abline(intercept = fixef(m.1)[1] + ranef(m.1)[3,1], slope = fixef(m.1)[2], color=c.colors[3,1]) + 
    geom_abline(intercept = fixef(m.1)[1], slope = fixef(m.1)[2], size = 2)

# we don't want to run into an optimization error ("iteration limit reached without convergence")
ctrl <- lmeControl(opt='optim');
# flow.lme <- lme(rate ~ nozzle, error= nozzle|operator, control=ctrl, data=Flow);
# level 2 variable, random intercept and random slope
m.2 <- lme(fixed = y ~ 1 + x, random = ~ 1 + x | schooltype, data=df, control=ctrl, method='ML')
summary(m.2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: df 
##        AIC      BIC    logLik
##   1527.571 1549.793 -757.7853
## 
## Random effects:
##  Formula: ~1 + x | schooltype
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 8.966983 (Intr)
## x           1.623816 1     
## Residual    2.923098       
## 
## Fixed effects:  y ~ 1 + x 
##                  Value Std.Error  DF    t-value p-value
## (Intercept) -3.1403554  5.198685 296 -0.6040673  0.5463
## x            0.1005276  0.944947 296  0.1063844  0.9153
##  Correlation: 
##   (Intr)
## x 0.996 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.4952414 -0.7107150  0.0242678  0.7058308  2.9059657 
## 
## Number of Observations: 300
## Number of Groups: 3
# we visualize the group specific random intercept and random slope
pplot + 
  geom_abline(intercept = fixef(m.2)[1] + ranef(m.2)[1,1], slope = fixef(m.2)[2] + ranef(m.2)[1,2], color=c.colors[1,1]) + 
  geom_abline(intercept = fixef(m.2)[1] + ranef(m.2)[2,1], slope = fixef(m.2)[2] + ranef(m.2)[2,2], color=c.colors[2,1]) + 
  geom_abline(intercept = fixef(m.2)[1] + ranef(m.2)[3,1], slope = fixef(m.2)[2] + ranef(m.2)[3,2], color=c.colors[3,1]) + 
    geom_abline(intercept = fixef(m.2)[1], slope = fixef(m.2)[2], size = 2)

# we compare the models using a likelihood ratio test
anova(m.0, m.1, m.2)
##     Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## m.0     1  3 1724.643 1735.755 -859.3218                         
## m.1     2  4 1725.689 1740.505 -858.8447 1 vs 2   0.95419  0.3287
## m.2     3  6 1527.571 1549.793 -757.7853 2 vs 3 202.11867  <.0001
anova(m.0, m.2)
##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m.0     1  3 1724.643 1735.755 -859.3218                        
## m.2     2  6 1527.571 1549.793 -757.7853 1 vs 2 203.0729  <.0001
# we compare the explained variance of the various models
cor(df$y, predict(m.0))
## [1] 0.8976388
cor(df$y, predict(m.1))
## [1] 0.8978998
cor(df$y, predict(m.2))
## [1] 0.9498179
# ... the school-type specific random slope increases model fit a lot

# we compare the fixed effects
fixef(m.0)
## (Intercept) 
##   0.6229468
fixef(m.1)
## (Intercept)           x 
##   0.6221491   0.1350708
fixef(m.2)
## (Intercept)           x 
##  -3.1403554   0.1005276
# ... in m.1 and m.2 we see the fixed parameters (intercept and slope)
# ... for model m.2 this is the fixed part of the parameters, random part has to be added to this

# we compare the random effects
ranef(m.0)
##   (Intercept)
## a  10.5213682
## b  -0.6856949
## c  -9.8356733
ranef(m.1)
##   (Intercept)
## a   10.124301
## b   -0.683188
## c   -9.441113
ranef(m.2)
##   (Intercept)          x
## a    9.169431  1.6660904
## b    2.996174  0.5337865
## c  -12.165606 -2.1998769
# ... for every school-type we get a specific intercept
# ... random intercepts of m.1 are almost the same when we add level 1 predictor x
# ... we can see the level 2 random slope component to be added to the fixed slope
# ... we can see the big differences in random slope, due to the school-type

# coefficients() also works with lme result objects
# it returns the aggregated coefficients:
# fixed and random parts are already added
coefficients(m.0)
##   (Intercept)
## a 11.14431494
## b -0.06274811
## c -9.21272654
coefficients(m.1)
##   (Intercept)         x
## a 10.74645018 0.1350708
## b -0.06103895 0.1350708
## c -8.81896400 0.1350708
coefficients(m.2)
##   (Intercept)          x
## a   6.0290760  1.7666181
## b  -0.1441811  0.6343141
## c -15.3059610 -2.0993493
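# consistency check (sketch): coefficients() should equal fixef() + ranef() row-wise
chk <- sweep(as.matrix(ranef(m.2)), 2, fixef(m.2), "+")
all.equal(unname(chk), unname(as.matrix(coefficients(m.2))))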
# a look at the variances
VarCorr(m.0)
## schooltype = pdLogChol(1) 
##             Variance StdDev  
## (Intercept) 69.47245 8.335013
## Residual    16.95829 4.118045
VarCorr(m.1)
## schooltype = pdLogChol(1) 
##             Variance StdDev  
## (Intercept) 64.20301 8.012678
## Residual    16.91734 4.113069
VarCorr(m.2)
## schooltype = pdLogChol(1 + x) 
##             Variance  StdDev   Corr  
## (Intercept) 80.406782 8.966983 (Intr)
## x            2.636777 1.623816 1     
## Residual     8.544504 2.923098
# ... variances of residuals go down


# we look at ICC
VarCorr(m.0)
## schooltype = pdLogChol(1) 
##             Variance StdDev  
## (Intercept) 69.47245 8.335013
## Residual    16.95829 4.118045
var.u0 = as.numeric(VarCorr(m.0)[1])
var.epsilon = as.numeric(VarCorr(m.0)[2])
ICC = var.u0 / (var.u0 + var.epsilon)
ICC
## [1] 0.8037933
#require(psychometric)
#ICC1.lme(y, grp=schooltype, data=df)

On the interpretation of the output, especially of the random-effects StdDev: post

as well as

post2

Example Kowles: Personality and School

example_personality_school

Different approaches to the same data set.

Example: schools, from Torres Reyna

source

Variable   Description
school     school id, 1…65
student    student number, 1…198, nested within school; at most 198 students per school
y          score (of what? perhaps a final grade)
x1         reading test
x2         binary, Female = 1
x3         type of school
# the data file comes in Stata format *.dta
require(foreign)
## Lade nötiges Paket: foreign
dd <- read.dta("http://md.psych.bio.uni-goettingen.de/mv/data/div/schools.dta")
attributes(dd)$var.labels
## [1] "School id"      "Student id"     "Score"          "Reading test"  
## [5] "Female=1"       "Type of school"
# we rename vars according to labels
require(dplyr)
dd <- dplyr::rename(dd, score=y, reading.test=x1, gender.f=x2, schooltype=x3)
# and make school and student factors
dd$school <- factor(dd$school)
dd$student <- factor(dd$student)
dd$gender <- factor(dd$gender)

require(ggplot2)
# get a plot of variability
ggplot(data=dd, aes(x=school, y=score)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# ... schools seem to be different in their score




# individual regressions
ggplot(data=dd, aes(x=reading.test, y=score, color=school)) + geom_point() + geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula 'y ~ x'

# or a bit clearer
ggplot(data=dd, aes(x=reading.test, y=score, color=school))                + geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula 'y ~ x'

# ... slopes seem to vary considerably


# global correlation
cor(dd$reading.test, dd$score)
## [1] 0.5916495
# school specific correlation
require(dplyr)
cors <- dd %>% dplyr::group_by(school) %>% dplyr::summarize(COR=cor(score,reading.test))
# we sort correlations
cors <- dplyr::arrange(cors, COR)
head(cors)
## # A tibble: 6 × 2
##   school     COR
##   <fct>    <dbl>
## 1 48     -1     
## 2 54      0.0357
## 3 7       0.270 
## 4 18      0.283 
## 5 10      0.336 
## 6 63      0.358
# something is strange with school 48, we take a look at it
dd[dd$school == 48,]
##      school student   score reading.test gender.f schooltype gender
## 3055     48       1 -6.9951      -3.7276        1 Girls only      1
## 3056     48       2 -1.2908      -4.5541        1 Girls only      1
# we exclude it, only two students took part
cors <- cors[2:length(cors$school),]
cors$o.nr <- 1:length(cors$school)
head(cors)
## # A tibble: 6 × 3
##   school    COR  o.nr
##   <fct>   <dbl> <int>
## 1 54     0.0357     1
## 2 7      0.270      2
## 3 18     0.283      3
## 4 10     0.336      4
## 5 63     0.358      5
## 6 24     0.366      6
tail(cors)
## # A tibble: 6 × 3
##   school   COR  o.nr
##   <fct>  <dbl> <int>
## 1 30     0.706    59
## 2 4      0.708    60
## 3 50     0.730    61
## 4 32     0.731    62
## 5 34     0.736    63
## 6 53     0.785    64
# a graph of the correlations in schools
ggplot(data=cors, aes(x=o.nr, y=COR)) + geom_point() + geom_smooth(se=F)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# our correlations vary substantially




# we fit a simple anova
require(ez)
m.ez <- ezANOVA(data=dd, dv=score, wid=student, between=school, detailed=T, return_aov = T)
## Warning: The column supplied as the wid variable contains non-unique values
## across levels of the supplied between-Ss variables. Automatically fixing this by
## generating unique wid labels.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
m.ez
## $ANOVA
##   Effect DFn  DFd      SSn      SSd        F             p p<.05       ges
## 1 school  64 3994 66357.44 338582.9 12.23074 9.338734e-112     * 0.1638697
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn  DFd      SSn      SSd        F            p p<.05
## 1  64 3994 7518.794 118575.1 3.957152 2.200363e-23     *
## 
## $aov
## Call:
##    aov(formula = formula(aov_formula), data = data)
## 
## Terms:
##                   school Residuals
## Sum of Squares   66357.4  338582.9
## Deg. of Freedom       64      3994
## 
## Residual standard error: 9.207219
## Estimated effects may be unbalanced
summary(m.ez$aov)
##               Df Sum Sq Mean Sq F value Pr(>F)    
## school        64  66357  1036.8   12.23 <2e-16 ***
## Residuals   3994 338583    84.8                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ... shows a significant factor school. And ...
# multiple correlation
cor(m.ez[["aov"]]$fitted.values, dd$score)
## [1] -0.03577007
# squared multiple correlation
cor(m.ez[["aov"]]$fitted.values, dd$score)^2
## [1] 0.001279498
# ... not too much explained variance in score
rm(m.ez)

require(nlme)
# null multilevel model
m.0 <- lme(fixed = score ~ 1, random = ~1 | school, data=dd, method='ML')
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##     AIC      BIC   logLik
##   29709 29727.93 -14851.5
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept) Residual
## StdDev:     4.10655 9.207357
## 
## Fixed effects:  score ~ 1 
##                  Value Std.Error   DF    t-value p-value
## (Intercept) -0.1317103 0.5363377 3994 -0.2455735   0.806
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.94714971 -0.64855061  0.01168497  0.69918956  3.65785029 
## 
## Number of Observations: 4059
## Number of Groups: 65
# get colors
# g <- ggplot_build(pplot)
# c.colors <- unique(g$data[[1]]["colour"])

# fitted value is the same for all subjects within a level2 group
head(predict(m.0), n=100)
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        1        1        1        1        1        1        1 
## 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 4.680637 
##        1        2        2        2        2        2        2        2 
## 4.680637 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 
##        2        2        2        2        2        2        2        2 
## 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 
##        2        2        2        2        2        2        2        2 
## 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 7.164112 
##        2        2        2        2 
## 7.164112 7.164112 7.164112 7.164112
unique(predict(m.0))
##  [1]  4.6806368626  7.1641117492  7.7886860696  0.6843289745  3.5125920556
##  [6]  8.8794480695  3.6962759755 -0.4655025204 -3.8125809168 -2.4598620714
## [11]  5.1443700298 -0.6674877850 -2.2881692109  0.1485128656 -0.3917076145
## [16] -2.4109877309 -2.3651756403 -0.0643696476  1.8789986489  4.4059230567
## [21]  3.5496526677 -4.7254338739 -6.2736306171  0.0955018007 -5.7445284331
## [26] -3.6822445431 -2.9285276235 -8.0305067022  0.5891802575  2.9818667380
## [31] -2.1588729661 -3.3304637328  0.7157623990 -3.1294199488  0.7260163062
## [36] -2.3339162499 -5.4404178590 -2.6418102507  0.3343883874 -2.4337806121
## [41]  0.6626199905  0.1982296764  1.3535040525 -3.7159620819 -2.0711088565
## [46] -4.3203510148 -1.1595917088 -1.2733624538  0.4433082501 -3.0189551959
## [51] -2.5694542961  4.9174902526  9.3541940739 -3.9419503186  6.5158852595
## [56] -0.3181838079  0.0002605729  2.4658954630 -9.4899661148  1.8308870588
## [61] -0.4948790548  0.3460303496  6.2819733913  3.1576121006 -2.9121524697
# but means vary
unique(predict(m.0))[order(unique(predict(m.0)))]
##  [1] -9.4899661148 -8.0305067022 -6.2736306171 -5.7445284331 -5.4404178590
##  [6] -4.7254338739 -4.3203510148 -3.9419503186 -3.8125809168 -3.7159620819
## [11] -3.6822445431 -3.3304637328 -3.1294199488 -3.0189551959 -2.9285276235
## [16] -2.9121524697 -2.6418102507 -2.5694542961 -2.4598620714 -2.4337806121
## [21] -2.4109877309 -2.3651756403 -2.3339162499 -2.2881692109 -2.1588729661
## [26] -2.0711088565 -1.2733624538 -1.1595917088 -0.6674877850 -0.4948790548
## [31] -0.4655025204 -0.3917076145 -0.3181838079 -0.0643696476  0.0002605729
## [36]  0.0955018007  0.1485128656  0.1982296764  0.3343883874  0.3460303496
## [41]  0.4433082501  0.5891802575  0.6626199905  0.6843289745  0.7157623990
## [46]  0.7260163062  1.3535040525  1.8308870588  1.8789986489  2.4658954630
## [51]  2.9818667380  3.1576121006  3.5125920556  3.5496526677  3.6962759755
## [56]  4.4059230567  4.6806368626  4.9174902526  5.1443700298  6.2819733913
## [61]  6.5158852595  7.1641117492  7.7886860696  8.8794480695  9.3541940739
mean(dd$score[dd$school == 1])
## [1] 5.012035
# ... not surprising: each school has its own mean score which is the predicted value
ggplot(data=dd, aes(x=reading.test, y=predict(m.0), color=school)) + geom_point()

# level 1 predictor, random intercept
m.1 <- lme(fixed = score ~ 1 + reading.test, random = ~1 | school, data=dd, method='ML')
summary(m.1)
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##       AIC      BIC   logLik
##   28057.6 28082.83 -14024.8
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept) Residual
## StdDev:    3.035269 7.521481
## 
## Fixed effects:  score ~ 1 + reading.test 
##                  Value Std.Error   DF  t-value p-value
## (Intercept)  0.0238706 0.4003241 3993  0.05963  0.9525
## reading.test 0.5633697 0.0124684 3993 45.18366  0.0000
##  Correlation: 
##              (Intr)
## reading.test 0.008 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.7161717 -0.6304245  0.0286690  0.6844298  3.2680305 
## 
## Number of Observations: 4059
## Number of Groups: 65
ggplot(data=dd, aes(x=reading.test, y=predict(m.1), color=school)) + geom_point()

ggplot(data=dd, aes(x=predict(m.1), y=score, color=school)) + geom_point()

# level 2 variable, random intercept and random slope
#m.2 <- lme(fixed = score ~ 1 + reading.test, random = ~ 1 + reading.test | school, data=dd, control=ctrl, method='ML')
m.2 <- lme(fixed = score ~ 1 + reading.test, random = ~ 1 + reading.test | school, data=dd, method='ML')
summary(m.2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##        AIC      BIC    logLik
##   28021.23 28059.08 -14004.61
## 
## Random effects:
##  Formula: ~1 + reading.test | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##              StdDev    Corr  
## (Intercept)  3.0073339 (Intr)
## reading.test 0.1205752 0.497 
## Residual     7.4407756       
## 
## Fixed effects:  score ~ 1 + reading.test 
##                   Value Std.Error   DF   t-value p-value
## (Intercept)  -0.1150742 0.3979198 3993 -0.289189  0.7725
## reading.test  0.5567278 0.0199429 3993 27.916142  0.0000
##  Correlation: 
##              (Intr)
## reading.test 0.365 
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.83123290 -0.63247504  0.03404161  0.68320566  3.45617377 
## 
## Number of Observations: 4059
## Number of Groups: 65
ggplot(data=dd, aes(x=reading.test, y=predict(m.2), color=school)) + geom_point()

# we compare the models using a likelihood ratio test
anova(m.0, m.1, m.2)
##     Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## m.0     1  3 29709.00 29727.93 -14851.50                         
## m.1     2  4 28057.60 28082.83 -14024.80 1 vs 2 1653.4061  <.0001
## m.2     3  6 28021.23 28059.08 -14004.61 2 vs 3   40.3722  <.0001
anova(m.0, m.2)
##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m.0     1  3 29709.00 29727.93 -14851.50                        
## m.2     2  6 28021.23 28059.08 -14004.61 1 vs 2 1693.778  <.0001
# we compare the explained variance of the various models
cor(dd$score, predict(m.0))
## [1] 0.4044815
cor(dd$score, predict(m.1))
## [1] 0.6641689
cor(dd$score, predict(m.2))
## [1] 0.6764199
# ... here it is mainly the level 1 predictor that increases model fit; the school specific random slope adds only a little

# we compare the fixed effects
fixef(m.0)
## (Intercept) 
##  -0.1317103
fixef(m.1)
##  (Intercept) reading.test 
##   0.02387062   0.56336973
fixef(m.2)
##  (Intercept) reading.test 
##   -0.1150742    0.5567278
# ... in m.1 and m.2 we see the fixed parameters (intercept and slope)
# ... for model m.2 this is the fixed part of the parameters, random part has to be added to this

# we compare the random effects
head(ranef(m.0))
##   (Intercept)
## 1   4.8123472
## 2   7.2958221
## 3   7.9203964
## 4   0.8160393
## 5   3.6443024
## 6   9.0111584
head(ranef(m.1))
##   (Intercept)
## 1   3.7375959
## 2   5.0204297
## 3   5.0388872
## 4   0.1813312
## 5   2.4043096
## 6   5.4139221
head(ranef(m.2))
##   (Intercept) reading.test
## 1   3.7492877   0.12497492
## 2   4.7020428   0.16472764
## 3   4.7978126   0.08084055
## 4   0.3501122   0.12720817
## 5   2.4627687   0.07205185
## 6   5.1840346   0.05859261
# ... for every school we get a specific intercept
# ... the random intercepts shrink somewhat when we add the level 1 predictor reading.test
# ... we can see the level 2 random slope component to be added to the fixed slope
# ... we can see the big differences in random slope, due to the school

# coefficients() also works with lme result objects
# it returns the aggregated coefficients:
# fixed and random parts are already added
head(coefficients(m.0))
##   (Intercept)
## 1    4.680637
## 2    7.164112
## 3    7.788686
## 4    0.684329
## 5    3.512592
## 6    8.879448
head(coefficients(m.1))
##   (Intercept) reading.test
## 1   3.7614666    0.5633697
## 2   5.0443003    0.5633697
## 3   5.0627578    0.5633697
## 4   0.2052018    0.5633697
## 5   2.4281802    0.5633697
## 6   5.4377927    0.5633697
head(coefficients(m.2))
##   (Intercept) reading.test
## 1    3.634213    0.6817028
## 2    4.586969    0.7214555
## 3    4.682738    0.6375684
## 4    0.235038    0.6839360
## 5    2.347694    0.6287797
## 6    5.068960    0.6153205
# a look at the variances
VarCorr(m.0)
## school = pdLogChol(1) 
##             Variance StdDev  
## (Intercept) 16.86375 4.106550
## Residual    84.77542 9.207357
VarCorr(m.1)
## school = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept)  9.212858 3.035269
## Residual    56.572672 7.521481
VarCorr(m.2)
## school = pdLogChol(1 + reading.test) 
##              Variance    StdDev    Corr  
## (Intercept)   9.04405725 3.0073339 (Intr)
## reading.test  0.01453838 0.1205752 0.497 
## Residual     55.36514147 7.4407756
# ... variances of residuals go down


# we look at ICC
VarCorr(m.0)
## school = pdLogChol(1) 
##             Variance StdDev  
## (Intercept) 16.86375 4.106550
## Residual    84.77542 9.207357
var.u0 = as.numeric(VarCorr(m.0)[1])
var.epsilon = as.numeric(VarCorr(m.0)[2])
ICC = var.u0 / (var.u0 + var.epsilon)
ICC
## [1] 0.1659178
# ICC m.1
as.numeric(VarCorr(m.1)[1]) / (as.numeric(VarCorr(m.1)[1]) + as.numeric(VarCorr(m.1)[2]))
## [1] 0.1400438
# ICC m.2 -- careful: for m.2, VarCorr element [2] is the random-slope variance, not the residual, so the value below is not a valid ICC
as.numeric(VarCorr(m.2)[1]) / (as.numeric(VarCorr(m.2)[1]) + as.numeric(VarCorr(m.2)[2]))
## [1] 0.9983951
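# corrected version (sketch): use the Residual variance, which is element [3] of VarCorr(m.2)
as.numeric(VarCorr(m.2)[1]) / (as.numeric(VarCorr(m.2)[1]) + as.numeric(VarCorr(m.2)[3]))
# ... about 0.14, in line with the ICC of m.1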
require(psychometric)
## Lade nötiges Paket: psychometric
## Lade nötiges Paket: multilevel
## Lade nötiges Paket: MASS
## 
## Attache Paket: 'MASS'
## Das folgende Objekt ist maskiert 'package:dplyr':
## 
##     select
## 
## Attache Paket: 'psychometric'
## Das folgende Objekt ist maskiert 'package:psych':
## 
##     alpha
## Das folgende Objekt ist maskiert 'package:ggplot2':
## 
##     alpha
ICC1.lme(score, grp=school, data=dd)
## [1] 0.1683408

Example: Xavier Fernández-i-Marín (no longer available)

source

Unfortunately there is no documentation for the data.

Contents

The SES allows the measurement and differentiated description of subjectively perceived pain. The questionnaire consists of 24 items that can be assigned to 5 subscales. Two scales describe affective characteristics of the pain experience: "general affective pain report" and "pain report of persistence". Summed, both characteristics form the global scale "SES-affective". The three further scales describe sensory aspects of the pain experience: "sensory pain report of rhythmicity", "sensory pain report of local penetration" and "sensory pain report of temperature". Summed, these three characteristics form the global scale "SES-sensory". To make administration more economical in large studies, three survey versions, separated by rating period, are available. Instructions and items fit on a single DIN A4 sheet.

#!/usr/local/bin/R

# Xavier Fernández-i-Marín
# December 2004
# http://xavier-fim.net/

# load the example datasets from HLM/AppendixA/Examples
# Download them directly. Requires Internet connection
hsb1 <- read.table(file="http://xavier-fim.net/R/scripts/hsb1.dat")
hsb2 <- read.table(file="http://xavier-fim.net/R/scripts/hsb2.dat")

# get the variables and basic information of each dataset
names(hsb1)
length(hsb1$ID)
summary(as.data.frame(hsb1))

names(hsb2)
length(hsb2$ID)
summary(as.data.frame(hsb2))

# load the package to do Linear Mixed-Effects Models
library(nlme)

# create a new dataset with the merged data
ds <- merge(as.data.frame(hsb1), as.data.frame(hsb2))

### model specification

# 1
mod.1 <- lme(fixed = MATHACH ~ 1,
             random = ~ 1 | ID,
           data=ds, method="ML"
           )

# 2
mod.2 <- lme(fixed = MATHACH ~ SES,
            random = ~ 1 | ID,
             data=ds, method="ML"
             )

# 3
mod.3 <- lme(fixed = MATHACH ~ SES + SECTOR,
            random = ~ 1 | ID,
             data=ds, method="ML"
             )

# 4
mod.4 <- lme(fixed = MATHACH ~ SES + SECTOR,
             random = ~ 1 + SES | ID,
             data=ds, method="ML"
            )

# 5
# todo: convergence error
# mod.5 <- lme(fixed = MATHACH ~ SES * SECTOR,
#             random = ~ 1 + SES | ID,
#             data=ds, method="ML"
#              )


# Extract information about the models
print(mod.1)      # basic information
summary(mod.1)    # complete information
summary(mod.2)
summary(mod.3)
summary(mod.4)
# summary(mod.5)

# attributes of an 'lmeObject'
mod.1$dims     # dimensions 
mod.1$coef     # coefficients
mod.1$call     # the formula used to produce the object
head(mod.1$residuals)     # residuals of the model
head(mod.1$varFix)   # matrix with the fixed effects
mod.1$sigma    # estimated within-group error standard deviation

anova(mod.1)


# Plots
plot(mod.1)
    # by default, the plot of an lme object is 
    # form=resid(., type = "p") ~ fitted(.)
    # that corresponds to the standardized residuals vs predicted values

plot(mod.1, resid(.) ~ fitted(.) | SECTOR)
    # idem, but comparing the sectors 

qqnorm(mod.1)     # normality of the residuals

interaction.plot(ds$SES, ds$SECTOR, ds$MATHACH, col=c("blue", "red"))
     # show one of the plotting abilities of R

# standardized residuals vs predicted values, by sector
plot(mod.3, resid(., type = "p") ~ fitted(.) | SECTOR, abline = 0)

# boxplots with the residuals by school 
#  (it's just an example, there are too many objects in level 2)
plot(mod.4, ID ~ resid(.))
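# the script stops at diagnostics; a sketch of how the estimated effects themselves
# could be inspected (same objects as above):
fixef(mod.4)                        # fixed coefficients
head(ranef(mod.4))                  # school-specific deviations from the fixed effects
intervals(mod.4, which = "fixed")   # confidence intervals for the fixed effects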

Example …

Source

dd = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score", 
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA, 
-24L))

# or using data.frame()
dd <- data.frame(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13, 2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07, 1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L))


# or more realistic
dd = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41, 
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01, 
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E", 
"F", "G", "H"), class = "factor"), Condition = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"), 
    Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM", 
    "2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject", 
"Condition", "Time"), class = "data.frame", row.names = c(NA, 
-24L))


# a first plot
library(ggplot2)
qplot(Time, Score, data = dd, geom = "line", group = Subject, colour = factor(Condition))

# using mle

library(nlme)
m1 <- lme(Score ~ Condition + Time + Condition*Time, data = dd, random = ~ 1 | Subject)
summary(m1)
## Linear mixed-effects model fit by REML
##   Data: dd 
##       AIC      BIC   logLik
##   41.6678 48.79077 -12.8339
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept)  Residual
## StdDev:   0.2370475 0.3365843
## 
## Fixed effects:  Score ~ Condition + Time + Condition * Time 
##                        Value Std.Error DF   t-value p-value
## (Intercept)           1.7450 0.2058400 12  8.477457  0.0000
## ConditionYes          1.5600 0.2911018  6  5.358951  0.0017
## Time2PM               0.4925 0.2380010 12  2.069319  0.0608
## Time3PM               1.1500 0.2380010 12  4.831912  0.0004
## ConditionYes:Time2PM -0.2250 0.3365843 12 -0.668480  0.5165
## ConditionYes:Time3PM -0.3950 0.3365843 12 -1.173555  0.2633
##  Correlation: 
##                      (Intr) CndtnY Tim2PM Tim3PM CY:T2P
## ConditionYes         -0.707                            
## Time2PM              -0.578  0.409                     
## Time3PM              -0.578  0.409  0.500              
## ConditionYes:Time2PM  0.409 -0.578 -0.707 -0.354       
## ConditionYes:Time3PM  0.409 -0.578 -0.354 -0.707  0.500
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.4009179 -0.5396982  0.1391771  0.4146175  1.2662191 
## 
## Number of Observations: 24
## Number of Groups: 8
# The intercept variance is basically 0, indicating no within subject effect, so this
# model is not capturing the between time relationship well. A random intercept model
# is rarely the type of model you want for a repeated measures design: it assumes that
# the correlations between all time points are equal, i.e. the correlation between
# time 1 and time 2 is the same as between time 1 and time 3. Under normal circumstances
# (perhaps not those generating your fake data) we would expect the latter to be less
# than the former. An autoregressive structure is usually a better way to go.

### m2<-gls(Score ~ Condition + Time + Condition*Time, data = dd, correlation = corAR1(form = ~ Time | Subject))
### summary(m2)
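# a runnable variant (sketch): corAR1 needs an integer-valued time covariate, so we
# convert the Time factor inside the correlation formula
m2 <- gls(Score ~ Condition + Time + Condition*Time, data = dd,
          correlation = corAR1(form = ~ as.numeric(Time) | Subject))
summary(m2)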

# Your data is showing a -.596 between time point correlation, which seems odd:
# normally there should, at a minimum, be a positive correlation between time points.
# How was this data generated?
# Addendum:
# With your new data we know that the data generating process is equivalent to a random
# intercept model (though that is not the most realistic for a longitudinal study).
# The visualization shows that the effect of time seems to be fairly linear, so we
# should feel comfortable treating it as a numeric variable.

library(nlme)
m1 <- lme(Score ~ Condition + as.numeric(Time) + Condition*as.numeric(Time), data = dd, random = ~ 1 | Subject)
summary(m1)
## Linear mixed-effects model fit by REML
##   Data: dd 
##        AIC      BIC    logLik
##   38.15055 44.12494 -13.07527
## 
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept)  Residual
## StdDev:   0.2457354 0.3173421
## 
## Fixed effects:  Score ~ Condition + as.numeric(Time) + Condition * as.numeric(Time) 
##                                   Value Std.Error DF   t-value p-value
## (Intercept)                    1.142500 0.2717382 14  4.204415  0.0009
## ConditionYes                   1.748333 0.3842958  6  4.549447  0.0039
## as.numeric(Time)               0.575000 0.1121974 14  5.124898  0.0002
## ConditionYes:as.numeric(Time) -0.197500 0.1586710 14 -1.244714  0.2337
##  Correlation: 
##                               (Intr) CndtnY as.(T)
## ConditionYes                  -0.707              
## as.numeric(Time)              -0.826  0.584       
## ConditionYes:as.numeric(Time)  0.584 -0.826 -0.707
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -1.44560365 -0.65018589  0.01864076  0.52930927  1.40824841 
## 
## Number of Observations: 24
## Number of Groups: 8
# We see a significant Condition effect, indicating that the 'yes' condition tends to
# have higher scores (by about 1.7), and a significant time effect, indicating that
# both groups go up over time. Supporting the plot, we find no differential effect of
# time between the two groups (the interaction), i.e. the slopes are the same.


# (using the lme4 library) This fits the subject effect as random; Subject is the
# variable that the random effects are grouped under. In this model the random effect
# is the intercept varying by subject.
require(lme4)
m <- lmer( Score ~ Condition + Time + (1|Subject), data=dd )
# To see the random effects you can just use
ranef(m)
## $Subject
##   (Intercept)
## A -0.15956787
## B  0.22011131
## C  0.01282700
## D -0.32785811
## E  0.15033243
## F  0.01077468
## G -0.00359156
## H  0.09697212
## 
## with conditional variances for "Subject"
# As Ian Fellows mentioned, your data may also have random Condition and Time
# components. You can test that with another model: in the one below Condition, Time,
# and the intercept are allowed to vary randomly by subject. It also evaluates their
# correlations.
### mi <- lmer( Score ~ Condition + Time + (Condition + Time|Subject), data=dd )
### summary(mi)
### ranef(mi)
# You could also test for this without correlations with the intercept, with
# interactions between Condition and Time, and numerous other models to see which best
# fits your data and/or your theory. Your question is a bit vague but these few
# commands should get you started.
# Note that Subject is your grouping factor, so it's what you fit other effects as
# random under; it's not something you then explicitly fit as a predictor as well.

Example

Source

require(tidyverse)
require(foreign)
dd <- read.dta("https://stats.idre.ucla.edu/stat/stata/examples/mlm_ma_hox/popular.dta")
write_csv(dd, "popular.csv")
# ... see code and explanation in answers

ICC in surgery example

library(tidyverse)
library(nlme)
dd <- read_delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/cosmetic_surgery.dat", "\t")
## Rows: 276 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): Surgery_Text, Reason_Text, Gender_Text
## dbl (9): particnu, Post_QoL, Base_QoL, Clinic, Surgery, Reason, Age, Gender,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# or
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/div/cosmetic_surgery.dat")
## Rows: 276 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): Surgery_Text, Reason_Text, Gender_Text
## dbl (9): particnu, Post_QoL, Base_QoL, Clinic, Surgery, Reason, Age, Gender,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd$Gender <- factor(dd$Gender, levels = c(0, 1), labels = c("f", "m"))
dd$Surgery_Text <- factor(dd$Surgery_Text)
dd$Reason_Text <- factor(dd$Reason_Text)
dd$Gender_Text <- factor(dd$Gender_Text)

# model without interaction 
m.test <- nlme::lme(Post_QoL ~ Surgery,
                      data = dd,
                      random = ~Surgery | Clinic,
                      method = "ML")

# we look at ICC
VarCorr(m.test)
## Clinic = pdLogChol(Surgery) 
##             Variance StdDev   Corr  
## (Intercept) 74.75235 8.645944 (Intr)
## Surgery     65.02213 8.063630 -0.947
## Residual    37.22614 6.101323
var.clinic = as.numeric(VarCorr(m.test)[1])
var.surgery = as.numeric(VarCorr(m.test)[2])
var.residual = as.numeric(VarCorr(m.test)[3])
cat("ICC clinic: ", var.clinic/(var.clinic + var.surgery + var.residual))
## ICC clinic:  0.4223282
cat("ICC surgery: ", var.surgery/(var.clinic + var.surgery + var.residual))
## ICC surgery:  0.3673554
# package psychometric offers a ICC calculation
require(psychometric)
cat("ICC clinic by package psychometric: ", ICC1.lme(Post_QoL, grp=Clinic, data=dd))
## ICC clinic by package psychometric:  0.4269201
m.interaction <- update(m.test, .~. + Surgery:Reason)
m.interaction <- nlme::lme(Post_QoL ~ Surgery + Surgery:Reason,
                      data = dd,
                      random = ~Surgery | Clinic,
                      method = "ML")

# we look at the ICCs; they refer only to the random effects ...
VarCorr(m.interaction)
## Clinic = pdLogChol(Surgery) 
##             Variance StdDev   Corr  
## (Intercept) 74.75147 8.645893 (Intr)
## Surgery     63.92678 7.995422 -0.944
## Residual    37.16650 6.096433
var.clinic = as.numeric(VarCorr(m.interaction)[1])
var.surgery = as.numeric(VarCorr(m.interaction)[2])
var.residual = as.numeric(VarCorr(m.interaction)[3])
cat("ICC clinic: ", var.clinic/(var.clinic + var.surgery + var.residual))
## ICC clinic:  0.4250992
cat("ICC surgery: ", var.surgery/(var.clinic + var.surgery + var.residual))
## ICC surgery:  0.363541
cat("Random effect surgery binds slightly less variance compared to m.test.\n")
## Random effect surgery binds slightly less variance compared to m.test.
cat("because the fixed interaction effect Surgery:Reason 'explains' part of the surgery variance.")
## because the fixed interaction effect Surgery:Reason 'explains' part of the surgery variance.
# ... the additional fixed effect binds (explains) part of the surgery variance


fixef(m.test)
## (Intercept)     Surgery 
##   59.387461    0.367755
head(ranef(m.test))
##   (Intercept)   Surgery
## 1    9.451884 -8.540927
## 2   14.369838 -9.301715
## 3    5.463431 -7.091419
## 4    8.579993 -9.552092
## 5   -0.332467 -1.424968
## 6   -3.013192  5.380795

Example

Matthew Crump PSYC 7709: Using R for Reproducible Research source

Check

  • We present library(nlme) here; alternatively we might use library(lme4) in combination with library(lmerTest)

  • nlme::gls() vs. nlme::lme(): models without random effects vs. models with random effects

  • level specific variables: their values are the same for all observations within a given unit of that level

  • nlme::gls(..., method = "ML") and nlme::lme(..., method = "ML") are useful for comparing non-MLM and MLM fits via anova()

  • formulae come in two parts: fixed and random effects each have their own formula; within random, | separates the random-effects terms from the definition of the level structure (see the sketch after this list)

  • with 3 or more levels, grouping variables are separated by “/”; the right-most level varies fastest, i.e. is nested within the one to its left, as in lme(... | country/city )

  • fixef() and ranef(): fixed effects are coefficients that are equal across all units; random effects are per-unit coefficients added on top of the fixed effect (they “correct” the fixed effect)

  • modeling random effects binds (explains) variance in the outcome and thus reduces the residuals via level specific coefficients, but we don’t get level specific significance tests

  • including a variable as a fixed effect results in a significance test for its coefficient (one per dummy variable for categorical predictors)

  • variables that define levels have to be categorical (usually of type factor)

  • it does not make sense to use level defining variables as explanatory variables (or even as response variables)

  • we can model dependencies using the correlation parameter; this is especially relevant for time series (growth models), see the example above

  • we can use intervals(model) to get information on explanatory variables in the form of confidence intervals; a variable might be relevant if 0 is not included in its interval

  • we can use the ICC to get an idea of whether a categorical variable might serve as a level defining variable

  • comparison of the equivalence of interactions and random effects (treated in detail in the sheet on MLM)

  • predict: without the original data underlying the (random) coefficients, MLM predictions are based on the fixed effects only
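A compact sketch tying several of these points together, using the Exam data that is also loaded below (all modelling choices here are illustrative; mlmRev and lmerTest are assumed to be installed):

library(nlme)
dd <- mlmRev::Exam                     # school exam data, as in the example below

# gls(): no random effects; lme(): random effects; method="ML" makes them comparable
m.gls <- gls(normexam ~ standLRT, data = dd, method = "ML")
m.lme <- lme(fixed  = normexam ~ standLRT,    # fixed-effects formula
             random = ~ standLRT | school,    # random effects | level structure
             data = dd, method = "ML")
anova(m.gls, m.lme)                    # LRT: do the random effects improve the model?

# three or more levels would be written e.g. random = ~ 1 | school/class
# (class nested within school; Exam has no class variable, this is just the syntax)

intervals(m.lme)                       # confidence intervals for fixed and random parts

# the lme4/lmerTest equivalent of m.lme (lmerTest adds significance tests):
# lmerTest::lmer(normexam ~ standLRT + (1 + standLRT | school), data = dd)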

Variables in the example below: #Beispiel_Schule,_Abschlussnote_und_Schuleingangstest-Niveau

library(tidyverse)
library(nlme)
# data come from library(mlmRev)
dd <- mlmRev::Exam
# normexam  Normalized exam score.
# standLRT  Standardised LR test score (London Reading Test - a reading test taken when they were 11 years old)

# categorical predictor school
mg.0 <- gls(normexam ~ 1,      data=dd, method="ML")
mg.1 <- gls(normexam ~ school, data=dd, method="ML")
car::Anova(mg.1, type=3)
## Analysis of Deviance Table (Type III tests)
## 
## Response: normexam
##             Df   Chisq Pr(>Chisq)    
## (Intercept)  1  21.632  3.303e-06 ***
## school      64 782.768  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# additional continuous predictor reading test: standLRT
mg.2 <- gls(normexam ~ school + standLRT, data=dd, method="ML", correlation=NULL)
car::Anova(mg.2, type=3)
## Analysis of Deviance Table (Type III tests)
## 
## Response: normexam
##             Df    Chisq Pr(>Chisq)    
## (Intercept)  1   21.495  3.547e-06 ***
## school      64  659.759  < 2.2e-16 ***
## standLRT     1 1992.549  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(mg.2)

# null model
m.0 <- lme(fixed = normexam ~ 1,
       random = ~ 1 | school,
       data=dd,
       method="ML")
summary(m.0)
## Linear mixed-effects model fit by maximum likelihood
##   Data: dd 
##        AIC      BIC    logLik
##   11016.65 11035.58 -5505.324
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept)  Residual
## StdDev:   0.4106567 0.9207391
## 
## Fixed effects:  normexam ~ 1 
##                   Value  Std.Error   DF    t-value p-value
## (Intercept) -0.01316707 0.05363399 3994 -0.2454986  0.8061
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.94711015 -0.64856012  0.01168106  0.69918450  3.65782427 
## 
## Number of Observations: 4059
## Number of Groups: 65
nlme::intervals(m.0)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                  lower        est.      upper
## (Intercept) -0.1183067 -0.01316707 0.09197253
## 
##  Random Effects:
##   Level: school 
##                     lower      est.     upper
## sd((Intercept)) 0.3393035 0.4106567 0.4970149
## 
##  Within-group standard error:
##     lower      est.     upper 
## 0.9007837 0.9207391 0.9411366
anova(mg.0, m.0)
##      Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## mg.0     1  2 11513.37 11525.98 -5754.683                        
## m.0      2  3 11016.65 11035.58 -5505.324 1 vs 2 498.7168  <.0001
# MLM 
m.1 <- lme(fixed = normexam ~ standLRT,
       random = ~ 1 | school,
       data=dd,
       method="ML")
m.2 <- lme(fixed = normexam ~ standLRT,
       random = ~ standLRT | school,
       data=dd,
       method="ML")

# visualization {}todo
dd.s <- dd %>% dplyr::filter(school %in% c(1,9))
p_m.1 <- predict(m.1)
p_m.2 <- predict(m.2)

dd.s %>% ggplot(aes(x=standLRT, y=normexam, color = school)) +
  geom_point() + 
  # geom_point(data = dd.s, aes(color=school))
  # geom_line(data=predict(m.1)) +
  # geom_line(data=predict(m.2))
  ggtitle("School exit")

# fixed effects coefficients
fixef(m.2)
## (Intercept)    standLRT 
## -0.01150385  0.55672932
# random effects coefficients; head(), because we get them for every level 2 unit (school)
head(ranef(m.2))
##   (Intercept)   standLRT
## 1  0.37492945 0.12497525
## 2  0.47020445 0.16472812
## 3  0.47978037 0.08084307
## 4  0.03500977 0.12720641
## 5  0.24627676 0.07205334
## 6  0.51840503 0.05859459
# confidence intervals
nlme::intervals(m.2)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                   lower        est.      upper
## (Intercept) -0.08949896 -0.01150385 0.06649125
## standLRT     0.51763969  0.55672932 0.59581894
## 
##  Random Effects:
##   Level: school 
##                                lower      est.     upper
## sd((Intercept))           0.24660747 0.3007325 0.3667368
## sd(standLRT)              0.08856276 0.1205756 0.1641601
## cor((Intercept),standLRT) 0.15216356 0.4972754 0.7343044
## 
##  Within-group standard error:
##     lower      est.     upper 
## 0.7278074 0.7440805 0.7607174
# log-likelihood and deviance
m.2$logLik
## [1] -4658.435
-2*m.2$logLik
## [1] 9316.871
deviance(m.2) # -2ll   only available with method="ML" in lme()
## 'log Lik.' 9316.871 (df=6)
# a look at the variances
VarCorr(m.0)
## school = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept) 0.1686389 0.4106567
## Residual    0.8477605 0.9207391
VarCorr(m.1)
## school = pdLogChol(1) 
##             Variance   StdDev   
## (Intercept) 0.09212927 0.3035280
## Residual    0.56573100 0.7521509
VarCorr(m.2)
## school = pdLogChol(standLRT) 
##             Variance   StdDev    Corr  
## (Intercept) 0.09044004 0.3007325 (Intr)
## standLRT    0.01453847 0.1205756 0.497 
## Residual    0.55365575 0.7440805
# ... variances of residuals go down



# using ICC to decide on levels
# we look at the variance decomposition: the proportion of total variance held by the variance of interest
# (note: in the lines below the denominator, the total variance, is always taken from m.0)
# school
as.numeric(VarCorr(m.0)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC schools as level 2 on random intercepts
## [1] 0.1659179
# schgend (mixed, boys, girls)
mm <- lme(fixed = normexam ~ 1, random = ~ 1 | schgend, data=dd, method="ML")
as.numeric(VarCorr(mm)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC schgend (boys, girls, mixed) as level 2 on random intercepts
## [1] 0.009612604
# vr (verbal reasoning bottom 25%, mid 50%, and top 25%.)
mm <- lme(fixed = normexam ~ 1, random = ~ 1 | vr, data=dd, method="ML")
as.numeric(VarCorr(mm)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC vr (schools level in verbal reasoning at intake) as level 2 on random intercepts
## [1] 0.09377866
# sex (F or M)
mm <- lme(fixed = normexam ~ 1, random = ~ 1 | sex, data=dd, method="ML")
as.numeric(VarCorr(mm)['(Intercept)',1]) / sum(as.numeric(VarCorr(m.0)[,1])) # ICC pupils gender as level 2 on random intercepts
## [1] 0.01293193
# require(psychometric)
# school on level 2
psychometric::ICC1.lme(normexam, grp=school, data=dd)
## [1] 0.1683409
# ... about the same as above

# predict is possible
head(predict(m.2))
##          1          1          1          1          1          1 
##  0.7854411  0.5037219 -0.5668121  0.5037219  0.6164097  1.8559749
# ... and therefore some manual check of "explained" variance
cor(predict(m.2), dd$normexam)
## [1] 0.6764198
cor(predict(m.2), dd$normexam)^2
## [1] 0.4575438
# using library(rsq) which takes into account the model used ...
rsq::rsq(m.2, adj=F) # same as rsq::rsq(m.2)
## $model
## [1] 0.4491454
## 
## $fixed
## [1] 0.348489
## 
## $random
## [1] 0.1006564
rsq::rsq(m.2, adj=T)
## $model
## [1] 0.4490096
## 
## $fixed
## [1] 0.3483284
## 
## $random
## [1] 0.1006812
# some conceptual stuff
# including a categorical explanatory variable, so that we have a continuous (standLRT) and a categorical (vr) explanatory variable
# vr: Student level Verbal Reasoning (VR) score band at intake - a factor. Levels are bottom 25%, mid 50%, and top 25%.
m.f <- lme(fixed = normexam ~ standLRT + vr,
       random = ~ 1 | school,
       data=dd,
       method="ML")
m.fr <- lme(fixed = normexam ~ standLRT + vr,
       random = ~ standLRT + vr | school,
       data=dd,
       method="ML")
# ... in m.fr we have a slope for the continuous standLRT for every school, which corrects the fixed slope for standLRT
# ... we also have intercept corrections for the categorical vr for every school; as vr has 3 levels, we deal with two dummy variables

# fixed effects coefficients (parameters) are the same for all observations (level 1)
fixef(m.f)
## (Intercept)    standLRT   vrmid 50%   vrtop 25% 
##  -0.1349020   0.5603018   0.1021208   0.3051488
fixef(m.fr)
## (Intercept)    standLRT   vrmid 50%   vrtop 25% 
## -0.08053503  0.55318447  0.04612312  0.17180483
# random effects coefficients (parameters) are valid for a single unit of a level, here for every school
# they "correct" the fixed effects
# categorical random variables correct the fixed intercept, i.e. the level
# continuous random variables correct the fixed slope for their unit
head(ranef(m.f))
##   (Intercept)
## 1  0.40137258
## 2  0.34621804
## 3  0.34890046
## 4  0.05040893
## 5  0.09574633
## 6  0.38267936
head(ranef(m.fr))
##   (Intercept)   standLRT  vrmid 50%   vrtop 25%
## 1   0.3264611 0.11793019  0.0509535 -0.09911205
## 2   0.3285482 0.15448034 -0.2665773  0.06084519
## 3   0.2423554 0.06471981 -0.1822759  0.16236549
## 4   0.2317168 0.14145486 -0.1886661 -0.04171440
## 5   0.1311447 0.05712286 -0.1050833  0.03512109
## 6   0.2355876 0.04322645 -0.1714415  0.20480701
# we have as many random coefficients as units in level 2 (schools) 
nrow(ranef(m.fr))
## [1] 65
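# a property worth noting (sketch): random effects are deviations around the fixed
# effects, so their column means are close to zero
colMeans(ranef(m.fr))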

check parms

At https://md.psych.bio.uni-goettingen.de/mv/unit/parms/parms.html#Some_points_to_remember_regarding_MLM_and_an_example there is a collection of points worth noting about MLM.

Screencast

  • MLM - check - data structure and the concept of level variables StudIP - ownCloud

  • MLM - check - running lme() and looking at the output, also model tests StudIP - ownCloud

  • MLM - check - ICC, it helps to decide on which levels should be entered in the model StudIP - ownCloud

  • MLM - check - something more conceptual, fixef and ranef in action StudIP - ownCloud

Version: 07 July 2022, 10:12