parms.Rmd

Some Considerations on Parameters

In statistics we try to estimate characteristics of observations based on other characteristics of those same observations. For this we use models: mathematical formulae that contain parameters. We adjust the values of the parameters by fitting models to samples of observations. We can then use a model, including its parameters, to estimate a characteristic of new observations.

In other words, we try to find parameters of mathematical models that describe the outcome variable(s) of a sample in a way that minimizes the differences between the observed outcome values and the estimated outcome values (the residuals). The scale level of the variables involved determines which models we use.

Parameters in statistical models are useful in several ways:

  1. We estimate them from a sample of observations. The parameters are estimated to fit our model to the given data as well as possible.

  2. We apply parameters to estimate unknown reaction variable values in new observations (with known explanatory variables).

  3. We visualize statistical effects using parameters.

This unit concentrates on parameters of statistical models and where to find them in R output. We show how to use them to predict reaction variable values for new observations based on their explanatory variable values. We also show how to use them to create graphical visualizations of the model and its parameters.

Where to find parameters in R model result objects

When models are fitted using lm(), we find the parameters in the model's result object. Given a result object called ro, we can extract the parameters via coefficients(ro).
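A minimal sketch using the built-in cars data (not part of this unit's example data):

```r
# fit a simple regression and extract its parameters from the result object
ro <- lm(dist ~ speed, data = cars)
coefficients(ro)  # named vector: (Intercept) and slope for speed
```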

In MLM (multilevel models) fitted via lme(), given a result object called ro, we find the fixed-effect parameters via fixef(ro) and the random-effect parameters via ranef(ro). For gls() objects, which contain no random effects, we use coef(ro) or the coefficients element. fixef(), ranef(), and coef() also work for lme4 result objects (e.g. from lmer()).

Find examples for that below.

Examples of models and their parameters

** nothing.Rmd **

No information to predict an outcome variable

We have a continuous reaction variable - no explanatory variable(s)

The reaction variable, the one we measure, is interval scaled. What can we do if we want to predict the reaction variable for a new observation? Which predicted value would be as close as possible to the real value of that observation? Theoretically, we could use any value in the range of the observed reaction values, or even values outside that range. Without knowing anything else, we might use the mean of the observed reaction variable. With it we characterize our sample and estimate the value for any new observation the sample is representative of.

Here, we compare using the minimum of the observed values as the “predicted” value with using the arithmetic mean. The minimum might even be useful, e.g. if we want to guarantee a minimal level for a whole group, like a minimal swimming distance for a surfing course. Which of the two “explains” more variance, or leaves us with less “unexplained” variance?

dd.a <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_dat.rds")))
# lets see the reaction variable
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ dplyr   1.0.5
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
set.seed(43)
to_use <- sample(1:nrow(dd.a))
#dd <- dd[sample(dd$subj, size=300), ]
dd <- dd.a[to_use, ]
head(dd)
## # A tibble: 6 x 10
##   subj   gender grp   level   time     rv    cv    ev1     ev2    ev3
##   <chr>  <chr>  <chr> <chr>   <chr> <dbl> <dbl>  <dbl>   <dbl>  <dbl>
## 1 0i1_2  m      0i1   normal  tf2   0.847 0.309  0.107  0.133   0.199
## 2 0cg_7  f      0cg   normal  tf1   0.858 0.226  0.146 -0.938   0.445
## 3 ei2_10 m      ei2   extreme t-1   1.19  0.917  0.733  1.14    1.20 
## 4 0i1_1  f      0i1   normal  tf2   0.830 0.115  0.640 -0.0901  0.235
## 5 ecg_4  m      ecg   extreme t-1   0.694 0.169 -0.424 -0.165   1.37 
## 6 0cg_1  f      0cg   normal  t1    0.745 0.252 -0.322 -0.886  -0.425
dd$xx <- 1:nrow(dd)

# all observations in one "Group"
dd %>% ggplot(aes(x='G', y=rv)) + geom_point()

# with jitter
dd %>% ggplot(aes(x='1', y=rv)) + geom_jitter(width=0.05)

dd %>% ggplot(aes(x=xx, y=rv)) + geom_point()

# pseudo var as accumulated distances from minimal value
# to calculate a variation index to be explained, we use the distance to the minimum value of our reaction variable
min(dd$rv)
## [1] -0.2191001
dd$diff.a <- dd$rv - min(dd$rv)
# mean
# comparison: reduction of unexplained variance
# we calculate the sd of our artificial difference index
sd.a <- sqrt(sum(dd$diff.a ^ 2) / length(dd$diff.a))
# we see a reduction
cat(sd.a, " > ", sd(dd$rv))
## 0.8653931  >  0.2780206
# we could conclude that the arithmetic mean is a better estimator than the minimum value of our reaction variable
# for predicting our reaction variable for new observations
# ... but this comparison is somewhat arbitrary

# in statistics we therefore use a constant explanatory variable
# as a baseline against which to compare more complex models with more parameters
# we often call this the NULL model
# in a model formula it is defined by a constant explanatory variable, usually 1
m.0 <- lm(dd$rv ~ 1)
# model.matrix(result object) shows us what we use to predict our reaction variable, here for the first 6 observations
head(model.matrix(m.0))
##   (Intercept)
## 1           1
## 2           1
## 3           1
## 4           1
## 5           1
## 6           1
# the estimated parameter is
coefficients(m.0)
## (Intercept) 
##   0.6005489
# residuals are the distances of our observations from their estimated value, the error we make by estimating their value on the reaction variable
head(m.0$residuals)
##           1           2           3           4           5           6 
## -0.04526138 -0.08977260  0.05382642 -0.28609654  0.08695575 -0.28606443
# their sd is
sd(m.0$residuals)
## [1] 0.2780206
# ... standard deviation of the residuals (unexplained variation of reaction variable) is the same as above
cat(sd.a, " > ", sd(m.0$residuals))
## 0.8653931  >  0.2780206
# we plot that pseudo reduction to visualize the effect
library(tidyverse)
psych::describe(m.0$residuals)
##    vars   n mean   sd median trimmed  mad   min  max range skew kurtosis   se
## X1    1 360    0 0.28  -0.01   -0.01 0.22 -0.82 0.88   1.7 0.36     1.09 0.01
dd.g <- tibble(
  group = as.factor(c('a', 'm.0a', 'm.0')),
    mm = c(0, 0, mean(dd$rv)),
  se = c(sqrt(sum(dd$diff.a ^ 2) / length(dd$diff.a)), sd(m.0$residuals), sd(m.0$residuals))
)


  
ggplot(dd.g, aes(x=group)) +
  geom_point(data=dd.g, stat='identity', aes(group, y=mm, size=10), show.legend=F, alpha = 0.4) +
    geom_errorbar(data=dd.g, stat='identity', aes(x=group, ymax = mm + se, ymin = mm - se), width = 0.2)

(pp <- ggplot(dd, aes(x=xx, y=rv)) +
  geom_point(alpha = 0.4) +
  geom_point(data=dd.g, stat='identity', aes(x=(as.numeric(group) - 1) * nrow(dd)/3, y=mm, size=10, color = group), show.legend=F) +
  geom_errorbar(data=dd.g, stat='identity', aes(x=(as.numeric(group) - 1) * nrow(dd)/3, y=se, ymax = mm + se, ymin = mm - se, color = group), size = 1, width = 20, show.legend=F)
)

pp + 
    geom_segment(aes(x = 20, y = 0, xend = 20, yend = max(dd$rv)), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[1]) +
    geom_segment(aes(x = 30, y = 0, xend = 30, yend = min(dd$rv)), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[1]) +
        geom_segment(aes(x = 40, y = 0, xend = 40, yend = mean(dd$rv)), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[1]) +

    geom_segment(aes(x = 170, y = mean(dd$rv), xend = 170, yend = max(dd$rv)), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[2]) +
    geom_segment(aes(x = 180, y = mean(dd$rv), xend = 180, yend = min(dd$rv)), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[2]) +
        geom_segment(aes(x = 190, y = mean(dd$rv), xend = 190, yend = mean(dd$rv)), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[2]) +

  geom_hline(aes(yintercept=min(dd$rv)), size=0.3, alpha=0.3, show.legend=F) +
  #geom_text(aes(nrow(dd),min(dd$rv),label = 'min', vjust = -1)) +
  geom_hline(aes(yintercept=max(dd$rv)), size=0.3, alpha=0.3, show.legend=F) 
## Warning: Use of `dd$rv` is discouraged. Use `rv` instead.


  #geom_text(aes(nrow(dd) - 10, max(dd$rv) - 0.02,label = 'max', vjust = 1))



# two groups, each with "prediction", we fake data to see the effect
dd.t <- dd %>% dplyr::mutate(rvm = if_else(gender == "m", rv - 0.5, rv))
dd.t %>% dplyr::group_by(gender) %>% dplyr::select(rvm) %>% dplyr::summarize(m.gender = mean(rvm)) -> mm
## Adding missing grouping variables: `gender`
(pp <- ggplot(dd.t, aes(x=gender, y=rvm, color=gender)) +
  geom_jitter(alpha = 0.3, width=0.1) +
  stat_summary(fun = mean, geom="point", size=3) +
  # stat_summary(fun = mean, geom="hline", size=0.2) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width=0.1, size=1)
  )

pp + 
  geom_segment(aes(x = 1, y = mm$m.gender[1], xend = 0, yend = mm$m.gender[1]), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[1]) +
  geom_segment(aes(x = 2, y = mm$m.gender[2], xend = 0, yend = mm$m.gender[2]), arrow = arrow(length = unit(0.5, "cm")), color = scales::hue_pal()(2)[2])

Conclusion: Without any information to predict the reaction value of a new observation, we do best using the mean of the observed values of our sample. We can consider this mean a parameter that helps us with this prediction. Predicting the mean for new observations minimizes the sum of squared deviations across all observations.
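This claim can be checked numerically; a small sketch with made-up values:

```r
# the sum of squared deviations is minimized by the arithmetic mean
x <- c(2, 3, 7, 10)
rss <- function(const) sum((x - const)^2)
rss(mean(x)) < rss(min(x))       # TRUE: the mean beats the minimum
optimize(rss, range(x))$minimum  # numeric minimizer, close to mean(x) = 5.5
```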

Version: 06 July, 2021 14:38

** oneway.Rmd **

Nominal explanatory variable: We know subgroups our observations belong to

If group membership is the only information we have, we can use it to describe our observations better, in the sense of reducing the unexplained variance in our reaction variable (the differences in DV values). The parameters then reflect differences between subgroup means. Group membership is a nominal variable.
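A minimal sketch of this coding with made-up values:

```r
# with a two-level factor, lm() estimates the reference-group mean
# and the difference to the other group (treatment/dummy coding)
g <- factor(c("a", "a", "b", "b"))
y <- c(1, 3, 6, 8)
coef(lm(y ~ g))  # (Intercept) = mean of "a" = 2; gb = mean("b") - mean("a") = 5
```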

dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_data.rds")))

# grand mean
cat("grand mean: ", mean(dd$rv))
## grand mean:  47.55762
# subgroup means and their difference
psych::describeBy(dd$rv, group=dd$pers)
## 
##  Descriptive statistics by group 
## group: hi
##    vars   n  mean    sd median trimmed   mad min   max range  skew kurtosis
## X1    1 150 45.43 18.19   49.9   46.32 18.19 4.2 77.75 73.55 -0.43    -0.78
##      se
## X1 1.49
## ------------------------------------------------------------ 
## group: lo
##    vars   n  mean    sd median trimmed   mad  min   max range skew kurtosis  se
## X1    1 150 49.68 11.02  49.85   49.84 10.46 9.84 75.72 65.88 -0.3     0.38 0.9
cat("difference is: ", mean(dd$rv[dd$pers == "hi"]), " - ", mean(dd$rv[dd$pers == "lo"]), " = ", mean(dd$rv[dd$pers == "hi"]) - mean(dd$rv[dd$pers == "lo"]))
## difference is:  45.43242  -  49.68281  =  -4.250385
# test the relevance of this difference
# MLM to reflect this
m.0 <- nlme::gls(rv ~ 1, data=dd, method = "ML")
# we see one parameter in m.0$coefficients, 
# which is the mean of dd$rv as calculated above
m.0$coefficients
## (Intercept) 
##    47.55762
m.1 <- nlme::gls(rv ~ pers, data=dd, method = "ML")
# m.1 gives us 2 parameters:
m.1$coefficients
## (Intercept)      perslo 
##   45.432423    4.250385
# we can check, whether using group membership "pers" can explain a significant part of our unexplained variance in dd$rv
anova(m.0, m.1)
##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m.0     1  2 2485.641 2493.049 -1240.821                        
## m.1     2  3 2481.668 2492.780 -1237.834 1 vs 2 5.972617  0.0145
# T-Test does basically the same
t.test(rv ~ pers, data = dd)
## 
##  Welch Two Sample t-test
## 
## data:  rv by pers
## t = -2.4479, df = 245.34, p-value = 0.01507
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.6704072 -0.8303621
## sample estimates:
## mean in group hi mean in group lo 
##         45.43242         49.68281
# ANOVA also
car::Anova(lm(rv ~ pers, data = dd))
## Anova Table (Type II tests)
## 
## Response: rv
##           Sum Sq  Df F value  Pr(>F)  
## pers        1355   1  5.9923 0.01495 *
## Residuals  67382 298                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# parameter 1
cat("parameter 1 is the mean of the reference group pers == 'hi' ", m.1$coefficients[[1]])
## parameter 1 is the mean of the reference group pers == 'hi'  45.43242
cat("parameter 2 is the difference between the mean of the reference group pers == 'hi' and the mean of group pers == 'lo': ", m.1$coefficients[[2]] )
## parameter 2 is the difference between the mean of the reference group pers == 'hi' and the mean of group pers == 'lo':   4.250385
# a plot to visualize that
library(tidyverse)
se <- function(x) sqrt(var(x)/length(x))
dd.g <- tibble(
  group = as.factor(c("grand_mean", "hi", "lo")),
  # mm is ordered hi then lo (model coefficients), so se must follow the same order
  mm = c(mean(dd$rv), m.1$coefficients[[1]], m.1$coefficients[[1]] + m.1$coefficients[[2]]),
  se = c(se(dd$rv), se(dd$rv[dd$pers == "hi"]), se(dd$rv[dd$pers == "lo"]))
)
  
ggplot(dd.g, aes(x=group)) +
  geom_point(data=dd.g, stat='identity', aes(group, y=mm, size=10), show.legend=F) +
    geom_errorbar(data=dd.g, stat='identity', aes(x=group, ymax = mm + se, ymin = mm - se), width = 0.2) +
  geom_point(stat='identity', aes(x = 2.35, y=m.1$coefficients[[1]], size=10, color = "red"), show.legend=F) +
  geom_segment(aes(x = 2.5, y = m.1$coefficients[[1]], xend = 2.5, yend = m.1$coefficients[[1]] + m.1$coefficients[[2]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F)

# cat("unexplained variance Null-model: ", var(m.0$residuals), " => ", var(m.0$residuals), "/", var(m.0$residuals), " = ", var(m.0$residuals)/var(m.0$residuals)*100, "%")
cat("unexplained variance Null-model: ", var(m.0$residuals))
## unexplained variance Null-model:  229.8895
# cat("unexplained variance model 1: ", var(m.1$residuals), " => ", (1 - var(m.1$residuals)/var(dd$rv)) * 100, "%")
cat("unexplained variance model 1: ", var(m.1$residuals))
## unexplained variance model 1:  225.3579

Version: 06 July, 2021 14:39

** factorial.Rmd **

Discrete explanatory variables (nominal scale), more than one

We might have more than one group membership that we can use to predict reaction variable values for new observations. We can then use each membership dimension and, moreover, their combinations.

Visualize simpler models

dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_data.rds")))
dd$f.pers  <- factor(dd$pers,  levels = c("lo", "hi"))
dd$f.group <- factor(dd$group, levels = c("cc", "tl", "th"))

# grand mean
cat("grand mean: ", mean(dd$rv))
## grand mean:  47.55762
# subgroup means and their difference in explanatory grouping variable pers (personality)
psych::describeBy(dd$rv, group=dd$f.pers)
## 
##  Descriptive statistics by group 
## group: lo
##    vars   n  mean    sd median trimmed   mad  min   max range skew kurtosis  se
## X1    1 150 49.68 11.02  49.85   49.84 10.46 9.84 75.72 65.88 -0.3     0.38 0.9
## ------------------------------------------------------------ 
## group: hi
##    vars   n  mean    sd median trimmed   mad min   max range  skew kurtosis
## X1    1 150 45.43 18.19   49.9   46.32 18.19 4.2 77.75 73.55 -0.43    -0.78
##      se
## X1 1.49
psych::describeBy(dd$rv, group=dd$f.group)
## 
##  Descriptive statistics by group 
## group: cc
##    vars   n  mean    sd median trimmed   mad  min   max range  skew kurtosis
## X1    1 100 51.22 13.41  50.36   51.23 13.38 9.84 77.75 67.91 -0.17    -0.11
##      se
## X1 1.34
## ------------------------------------------------------------ 
## group: tl
##    vars   n  mean    sd median trimmed   mad min   max range  skew kurtosis
## X1    1 100 38.36 17.41  38.85   38.42 21.24 4.2 72.12 67.93 -0.07    -1.02
##      se
## X1 1.74
## ------------------------------------------------------------ 
## group: th
##    vars   n  mean   sd median trimmed  mad   min   max range  skew kurtosis  se
## X1    1 100 53.09 9.03  53.56   53.32 9.39 32.74 70.78 38.04 -0.18    -0.57 0.9
# we adapt increasingly complex models
# first we have models with fixed effects only
m.0 <- nlme::gls(rv ~ 1, data=dd, method = "ML")
m.0$coefficients
## (Intercept) 
##    47.55762
m.p <- nlme::gls(rv ~ f.pers, data=dd, method = "ML")
m.g <- nlme::gls(rv ~ f.group, data=dd, method = "ML")
# two factorial model
m.p.g <- nlme::gls(rv ~ f.pers + f.group, data=dd, method = "ML")
m.p.g$coefficients
## (Intercept)    f.pershi   f.grouptl   f.groupth 
##   53.348099   -4.250385  -12.866574    1.870700
# now including interaction
m.pg <- nlme::gls(rv ~ f.pers * f.group, data=dd, method = "ML")
m.pg$coefficients
##        (Intercept)           f.pershi          f.grouptl          f.groupth 
##          43.343315          15.759184           9.069837           9.948642 
## f.pershi:f.grouptl f.pershi:f.groupth 
##         -43.872821         -16.155885
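Continuing with m.pg from above, the mean of any cell can be reconstructed by summing the coefficients that apply to it (a sketch relying on the objects defined in this section):

```r
b <- m.pg$coefficients
# predicted cell mean for f.pers == "hi", f.group == "tl":
b[["(Intercept)"]] + b[["f.pershi"]] + b[["f.grouptl"]] + b[["f.pershi:f.grouptl"]]
# compare with the observed cell mean
mean(dd$rv[dd$f.pers == "hi" & dd$f.group == "tl"])
```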
# we shuffle dd for clearer graphs
library(tidyverse)
set.seed(43)
dd$r.id <- sample(dd$id)
head(dd$id)
## [1] 1 2 3 4 5 6
dd <- dd[order(dd$r.id),]
head(dd$id)
## [1] 156 205 210 228  67 261
# show the mean of the reference group pers=lo, group=cc of model m.pg
dd %>% ggplot(aes(x=r.id, y=rv, color=f.group, shape=f.pers)) + geom_point() + geom_hline(yintercept=range(m.pg$coefficients[1]), color='coral')

dodge <- position_dodge(width = 0.9)
# dd %>% ggplot(aes(x=group, y=rv, group=f.pers, color=f.group, shape=f.pers)) +
#         geom_bar(position=dodge, stat="identity") +
#         # geom_errorbar(aes(ymax=mean+sd, ymin=mean-sd), position = dodge, width = 0.25) +
#         geom_errorbar(position = dodge, width = 0.25) 
#       #facet_grid(. ~ group)

#dd %>% ggplot(aes(cut, price, fill = color)) +
dd %>% ggplot(aes(x=f.group, y=rv, group=f.pers, color=f.group, fill=f.pers)) +
  stat_summary(geom = "bar", fun = mean, position = "dodge") +
  stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge", width = 0.25)

dd %>% ggplot(aes(x=f.group, y=rv, group=f.pers, color=f.pers, fill=f.group)) +
  stat_summary(geom = "bar", fun = mean, position = "dodge") +
  stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge", width = 0.25)

xnu = c(-0.3, -0.3, 0, 0, 0.3, 0.3)
sed <- Rmisc::summarySE(dd, measurevar="rv", groupvars=c("f.pers","f.group"))
(g.o <- dd %>% ggplot(aes(x=f.pers, y=rv, group=f.group, color=f.pers, fill=f.group)) +
  stat_summary(geom = "bar", fun = mean, position = "dodge", alpha = 0.2) +
  #stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge2", padding = 0.3, width = 0.25)
  stat_summary(geom = "errorbar", fun.data = mean_se, position=position_nudge(x = xnu, y = 0), width = 0.1)
  # geom_bar(fun.y = "mean", position = "dodge2") +
  # geom_errorbar(position = "dodge2")
  #stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.25)
  )

xx <- c(1, 1, 1, 2, 2, 2) + xnu[c(1, 3, 5, 2, 4, 6)]
x.sh <- 0.05
a.size = 1
eff.col <- c("black", "blue", "red", "violet", "olivedrab4", "green")
# Arrows that show the effects
g.o +
  geom_hline(yintercept = m.pg$coefficients[1]) +
  # three arrows that visualize main effect f.pershi
  geom_segment(aes(x = xx[4]-x.sh, y = m.pg$coefficients[1], xend = xx[4]-x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[2]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[2]) +
  geom_segment(aes(x = xx[5]-x.sh, y = m.pg$coefficients[1], xend = xx[5]-x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[2]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[2], size=a.size) +
  geom_segment(aes(x = xx[6]-x.sh, y = m.pg$coefficients[1], xend = xx[6]-x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[2]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[2], size=a.size) +
  # arrows that visualize dummy main effect f.grouptl
  geom_segment(aes(x = xx[2]-x.sh, y = m.pg$coefficients[1], xend = xx[2]-x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[3]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[3], size=a.size) +
  geom_segment(aes(x = xx[5]-x.sh, y = m.pg$coefficients[1] + m.pg$coefficients[2], xend = xx[5]-x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[2] + m.pg$coefficients[3]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[3], size=a.size) +
  # arrows that visualize dummy main effect f.groupth
  geom_segment(aes(x = xx[3]-x.sh, y = m.pg$coefficients[1], xend = xx[3]-x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[4]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[4], size=a.size) +
  geom_segment(aes(x = xx[6]-x.sh, y = m.pg$coefficients[1] + m.pg$coefficients[2], xend = xx[6]-x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[2] + m.pg$coefficients[4]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[4], size=a.size) +
  # arrow to visualize interaction effect f.pershi:f.grouptl
  geom_segment(aes(x = xx[5]+x.sh, y = m.pg$coefficients[1] + m.pg$coefficients[2] + m.pg$coefficients[3], xend = xx[5]+x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[2] + m.pg$coefficients[3] + m.pg$coefficients[5]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[5], size=a.size) +
  # arrow to visualize interaction effect f.pershi:f.groupth
  geom_segment(aes(x = xx[6]+x.sh, y = m.pg$coefficients[1] + m.pg$coefficients[2] + m.pg$coefficients[4], xend = xx[6]+x.sh, yend = m.pg$coefficients[1] + m.pg$coefficients[2] + m.pg$coefficients[4] + m.pg$coefficients[6]), arrow = arrow(length = unit(0.2, "cm")), color = eff.col[6], size=a.size) 

Version: 06 July, 2021 14:39

** simple.Rmd **

Continuous explanatory variable (interval scaled)

With a continuous explanatory variable we can use more information to describe our sample's reaction variable. We can use the degree to which the reaction variable and the explanatory variable are interconnected. Our estimation can then be represented graphically by a line: we estimate an rv value by looking where the ev value of an observation hits this line.
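As a sketch with the built-in cars data: the intercept is the predicted value at ev = 0 and the slope is the change per unit of ev, so the prediction rule can be written out by hand:

```r
m <- lm(dist ~ speed, data = cars)
b <- coef(m)
# predicted value = intercept + slope * ev
manual <- unname(b[1] + b[2] * 15)
all.equal(manual, unname(predict(m, newdata = data.frame(speed = 15))))
```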

dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_dat.rds")))
# we use simple regression to estimate the reaction variable on base of our explanatory variable
m.0 <- lm(rv ~ 1, data = dd)
m.1 <- lm(rv ~ ev1, data = dd)
head(model.matrix(m.1))
##   (Intercept)        ev1
## 1           1  0.2098109
## 2           1 -0.5972289
## 3           1 -0.2936077
## 4           1  0.1657957
## 5           1  0.1467773
## 6           1  0.2862212
# expressed as MLM without a level variable
m.0 <- nlme::gls(rv ~ 1, data=dd, method = "ML")
m.1 <- nlme::gls(rv ~ ev1, data=dd, method = "ML")
# parameters
m.0$coefficients
## (Intercept) 
##   0.6005489
m.1$coefficients
## (Intercept)         ev1 
##   0.6341041   0.5520612
# with gls() objects fixef() and ranef() don't work

# we could have used the mean of the reaction variable
# but using regression leaves way less unexplained variance of the reaction variable

# graphs to visualize the parameters meaning
ggplot(dd, aes(x=ev1, y=rv)) +
  geom_point() +
  geom_smooth(method = lm, se=F) + 
  # xlim(c(0, 10)) + 
  # ylim(c(0, 100)) +
  geom_hline(yintercept=mean(dd$rv), color = "blue", linetype="dashed") + 
  geom_abline(intercept = m.1$coefficients[[1]], slope = m.1$coefficients[[2]], color = "blue", linetype="dashed") +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = m.1$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) + 
  geom_segment(aes(x = 0, y = m.1$coefficients[[1]], xend = 1, yend = m.1$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.01, "cm")), show.legend=F) +
  geom_segment(aes(x = 1, y = m.1$coefficients[[1]], xend = 1, yend = m.1$coefficients[[1]] + m.1$coefficients[[2]]), color = "red", arrow = arrow(length = unit(0.1, "cm")), show.legend=F)
## `geom_smooth()` using formula 'y ~ x'

cat("unexplained variance Null-model: ", var(m.0$residuals))
## unexplained variance Null-model:  0.07729544
cat("unexplained variance model 1: ", var(m.1$residuals))
## unexplained variance model 1:  0.04122013
# or expressed in log Likelihood, another measure of quality of fit
logLik(m.0)
## 'log Lik.' -49.49552 (df=2)
logLik(m.1)
## 'log Lik.' 63.67198 (df=3)
# ... the better model is the one with the higher log likelihood, in our case m.1
anova(m.0, m.1)
##     Model df      AIC       BIC    logLik   Test L.Ratio p-value
## m.0     1  2  102.991  110.7632 -49.49552                       
## m.1     2  3 -121.344 -109.6856  63.67198 1 vs 2 226.335  <.0001
#cat("unexplained variance Null-model: ", var(m.0$residuals), " => ", var(m.0$residuals), "/", var(m.0$residuals), " = ", var(m.0$residuals)/var(m.0$residuals)*100,)
#cat("unexplained variance model 1: ", var(m.1$residuals), " => ", (1 - var(m.1$residuals)/var(dd$rv)) * 100, "%")

dn <- tibble(ev1 = c(0.1, 5.1, 9.6))
# zero model predicts the same reaction variable values for all new observations
predict(m.0, dn)
## [1] 0.6005489 0.6005489 0.6005489
## attr(,"label")
## [1] "Predicted values"
# model m.1 predicts different reaction variable values due to the relation of explanatory and reaction variable
predict(m.1, dn)
## [1] 0.6893102 3.4496164 5.9338919
## attr(,"label")
## [1] "Predicted values"
dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_data.rds")))
# we use simple regression to estimate the reaction variable on base of our explanatory variable
m.0 <- lm(rv ~ 1, data = dd)
m.1 <- lm(rv ~ ev, data = dd)
head(model.matrix(m.1))
##   (Intercept)         ev
## 1           1  0.9765073
## 2           1  1.7265529
## 3           1 -0.9915356
## 4           1  2.7613048
## 5           1  2.1553962
## 6           1  2.1970925
# expressed as MLM
m.0 <- nlme::gls(rv ~ 1, data=dd, method = "ML")
m.1 <- nlme::gls(rv ~ ev, data=dd, method = "ML")
# parameters
m.0$coefficients
## (Intercept) 
##    47.55762
m.1$coefficients
## (Intercept)          ev 
##   31.481874    3.126868
# with gls() objects fixef() and ranef() don't work

# we could have used the mean of the reaction variable
# but using regression leaves way less unexplained variance of the reaction variable

# graphs to visualize the parameters meaning
library(ggplot2)
ggplot(dd, aes(x=ev, y=rv)) +
  geom_point() +
  geom_smooth(method = lm, se=F) + 
  xlim(c(0, 10)) + 
  ylim(c(0, 100)) +
  geom_hline(yintercept=mean(dd$rv), color = "blue", linetype="dashed") + 
  geom_abline(intercept = m.1$coefficients[[1]], slope = m.1$coefficients[[2]], color = "blue", linetype="dashed") +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = m.1$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) + 
  geom_segment(aes(x = 0, y = m.1$coefficients[[1]], xend = 1, yend = m.1$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.01, "cm")), show.legend=F) +
  geom_segment(aes(x = 1, y = m.1$coefficients[[1]], xend = 1, yend = m.1$coefficients[[1]] + m.1$coefficients[[2]]), color = "red", arrow = arrow(length = unit(0.1, "cm")), show.legend=F)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).

cat("unexplained variance Null-model: ", var(m.0$residuals))
## unexplained variance Null-model:  229.8895
cat("unexplained variance model 1: ", var(m.1$residuals))
## unexplained variance model 1:  169.205
# or expressed in log Likelihood, another measure of quality of fit
logLik(m.0)
## 'log Lik.' -1240.821 (df=2)
logLik(m.1)
## 'log Lik.' -1194.847 (df=3)
# ... the better model is the one with the higher log likelihood, in our case m.1
anova(m.0, m.1)
##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m.0     1  2 2485.641 2493.049 -1240.821                        
## m.1     2  3 2395.695 2406.806 -1194.847 1 vs 2 91.94629  <.0001
#cat("unexplained variance Null-model: ", var(m.0$residuals), " => ", var(m.0$residuals), "/", var(m.0$residuals), " = ", var(m.0$residuals)/var(m.0$residuals)*100,)
#cat("unexplained variance model 1: ", var(m.1$residuals), " => ", (1 - var(m.1$residuals)/var(dd$rv)) * 100, "%")

dn <- tibble(ev = c(0.1, 5.1, 9.6))
# zero model predicts the same reaction variable values for all new observations
predict(m.0, dn)
## [1] 47.55762 47.55762 47.55762
## attr(,"label")
## [1] "Predicted values"
# model m.1 predicts different reaction variable values due to the relation of explanatory and reaction variable
predict(m.1, dn)
## [1] 31.79456 47.42890 61.49981
## attr(,"label")
## [1] "Predicted values"

Conclusion: In our null-model the predicted value for a new observation is always the same, independent of the explanatory variable: it is the mean of our reaction variable.

In model m.1 we predict the reaction variable of a new observation using the value of the observation's explanatory variable, i.e. the predicted value depends on the value of the explanatory variable.

Version: 06 July, 2021 14:39

** cov.Rmd **

We have both a nominal scaled and an interval scaled explanatory variable

This is sort of a combination of the above.

library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
# hue_pal()(5)
# show_col(hue_pal()(5))
dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_data.rds")))
dd$group.f <- factor(dd$group, levels = c("cc", "tl", "th")) 
# using lm()
m.0  <- lm(rv ~ 1, data = dd)
m.1a <- lm(rv ~ group.f, data = dd)
m.1  <- lm(rv ~ ev + group.f, data = dd)
# expressed as MLM
m.0 <- nlme::gls(rv ~ 1, data=dd, method = "ML")
m.1 <- nlme::gls(rv ~ ev + group.f, data=dd, method = "ML")
# parameters
m.0$coefficients
## (Intercept) 
##    47.55762
m.1$coefficients
## (Intercept)          ev   group.ftl   group.fth 
##   36.810259    2.632425   -9.689687    1.330574
# graphical representation of our parameters
se <- function(x) sqrt(var(x)/length(x))
dd.g <- tibble(
  group = factor(c("cc", "tl", "th"), levels = levels(dd$group.f)),
  cols = scales::hue_pal()(3),
  # mm in coefficient order: intercept (cc), cc + tl effect, cc + th effect
  mm = c(m.1$coefficients[[1]], m.1$coefficients[[1]] + m.1$coefficients[[3]], m.1$coefficients[[1]] + m.1$coefficients[[4]]),
  mc = c(mean(dd$rv[dd$group.f == "cc"]), mean(dd$rv[dd$group.f == "tl"]), mean(dd$rv[dd$group.f == "th"])),
  se = c(se(dd$rv[dd$group.f == "cc"]), se(dd$rv[dd$group.f == "tl"]), se(dd$rv[dd$group.f == "th"])),
  slope = rep(m.1$coefficients[[2]], 3)
)

ggplot(dd, aes(x=ev, y=rv, group=group.f, color = group.f)) +
  geom_point() +
  geom_abline(slope=dd.g$slope, intercept = dd.g$mm, color=dd.g$cols, show.legend=F) +
  geom_hline(yintercept = dd.g$mc, color=dd.g$cols, linetype="dashed", show.legend=F)

# comparison of unexplained variance
cat("unexplained variance Null-model: ", var(m.0$residuals))
## unexplained variance Null-model:  229.8895
cat("unexplained variance model 1: ", var(m.1$residuals))
## unexplained variance model 1:  146.5187
# significance test
anova(m.0, m.1)
##     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## m.0     1  2 2485.641 2493.049 -1240.821                        
## m.1     2  5 2356.507 2375.026 -1173.254 1 vs 2 135.1336  <.0001
# prediction
dn <- tibble(ev = c(0.1, 5.1, 5.1, 5.1,  9.6), group.f = factor(c("tl", "tl", "cc", "th", "th"), levels = c("cc", "tl", "th")))
# zero model predicts the same reaction variable values for all new observations
predict(m.0, dn)
## [1] 47.55762 47.55762 47.55762 47.55762 47.55762
## attr(,"label")
## [1] "Predicted values"
# in m.1a only group membership matters
predict(m.1a, dn)
##        1        2        3        4        5 
## 38.35633 38.35633 51.22291 53.09361 53.09361
# model m.1 predicts different reaction variable values in the same group due to the relation of explanatory and reaction variable
predict(m.1, dn)
## [1] 27.38381 40.54594 50.23562 51.56620 63.41211
## attr(,"label")
## [1] "Predicted values"

The dashed lines would be the estimated rv values for the different groups, ignoring the influence of ev. The solid lines take the influence of ev into account in addition to membership in group.f. We can see that the estimated rv value depends on the value of ev.

Note: The slope is constant, i.e. equal for all members of group.f, because ev is included as a fixed effect in the model formula. Estimated values are influenced by group membership (group.f) and by the value of ev.
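The predicted values of m.1 above can be reproduced by hand from its coefficients; here for the second new observation in dn (ev = 5.1, group tl), hard-coding the coefficient values printed above:

```r
# coefficients of m.1 as printed above
b <- c(intercept = 36.810259, ev = 2.632425, group.ftl = -9.689687, group.fth = 1.330574)
# new observation: ev = 5.1, group.f = "tl"
y.hat <- b[["intercept"]] + b[["ev"]] * 5.1 + b[["group.ftl"]]
round(y.hat, 5)  # 40.54594, matching predict(m.1, dn)[2]
```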

Version: 06 July 2021, 14:39

Several interval-scaled explanatory variables

This is called multiple regression. An additive combination of explanatory variables results in a prediction plane. Adding interaction by multiplying the explanatory variables results in a twisted prediction plane.
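A small simulated sketch (names d, m, b are illustrative, not from the data set above) of how the interaction coefficient twists the plane: with an interaction term, the slope of ev depends on the value of evh.

```r
# simulate data with a true interaction effect of 0.3
set.seed(2)
d <- data.frame(ev = runif(100, 0, 10), evh = runif(100, 0, 10))
d$rv <- 1 + 2 * d$ev + 0.5 * d$evh + 0.3 * d$ev * d$evh + rnorm(100)
m <- lm(rv ~ ev * evh, data = d)
b <- coef(m)
# slope of ev at a given evh value: b["ev"] + b["ev:evh"] * evh
slope.at.0  <- b[["ev"]]
slope.at.10 <- b[["ev"]] + b[["ev:evh"]] * 10
c(slope.at.0, slope.at.10)  # different slopes: the prediction plane is twisted
```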

dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_data.rds")))
# dd <- readRDS("../parms/parms_data.rds") # {}
library(tidyverse)
m.e_h <- lm(rv ~ ev + evh, data=dd)
rockchalk::plotPlane(model = m.e_h, plotx1 = "ev", plotx2 = "evh", theta = 40, phi = 10)
ggplot(dd, aes(x=ev, y=rv)) + 
  stat_smooth(method = "lm", se=FALSE) + 
  geom_point(aes(x=dd$evh), alpha = 0.1)  + geom_point(size=3) +
  # geom_hline(yintercept=m.e_h$coefficients[1], color = "red", linetype="dashed") +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = m.e_h$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_segment(aes(x = 0, y = m.e_h$coefficients[[1]], xend = 10, yend = m.e_h$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_segment(aes(x = 10, y = m.e_h$coefficients[[1]], xend = 10, yend = m.e_h$coefficients[[1]] + m.e_h$coefficients[[2]] * 10), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_abline(intercept = m.e_h$coefficients[[1]], slope = m.e_h$coefficients[[2]], color = "red") + 
  geom_abline(intercept = m.e_h$coefficients[[1]] + m.e_h$coefficients[[3]] * 10, 
              slope = m.e_h$coefficients[[2]], color = "red", linetype = "dashed") 
  
ggplot(dd, aes(x=evh, y=rv)) + 
  stat_smooth(method = "lm", se=FALSE) + 
  geom_point(aes(x=dd$ev), alpha = 0.1)  + geom_point(size=3) +
  # geom_hline(yintercept=m.e_h$coefficients[1], color = "red", linetype="dashed") +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = m.e_h$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_segment(aes(x = 0, y = m.e_h$coefficients[[1]], xend = 10, yend = m.e_h$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_segment(aes(x = 10, y = m.e_h$coefficients[[1]], xend = 10, yend = m.e_h$coefficients[[1]] + m.e_h$coefficients[[3]] * 10), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_abline(intercept = m.e_h$coefficients[[1]], slope = m.e_h$coefficients[[3]], color = "red") + 
  geom_abline(intercept = m.e_h$coefficients[[1]] + m.e_h$coefficients[[2]] * 10, 
              slope = m.e_h$coefficients[[3]], color = "red", linetype = "dashed") 




# interaction included
m.eh <- lm(rv ~ ev * evh, data=dd)
# 3d plot
rockchalk::plotPlane(model = m.eh, plotx1 = "ev", plotx2 = "evh", theta = 40, phi = 10)
rockchalk::plotPlane(model = m.eh, plotx1 = "ev", plotx2 = "evh", theta = 10, phi = 10)

# a 2d ggplot does not make much sense for visualizing a twisted plane ...

ggplot(dd, aes(x=evh, y=rv)) + 
  stat_smooth(method = "lm", se=FALSE) + 
  geom_point(aes(x=dd$ev), alpha = 0.1)  + geom_point(size=3) +
  # geom_hline(yintercept=m.e_h$coefficients[1], color = "red", linetype="dashed") +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = m.eh$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_segment(aes(x = 0, y = m.eh$coefficients[[1]], xend = 10, yend = m.eh$coefficients[[1]]), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_segment(aes(x = 10, y = m.eh$coefficients[[1]], xend = 10, yend = m.eh$coefficients[[1]] + m.eh$coefficients[[3]] * 10), color = "red", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  #geom_abline(intercept = m.eh$coefficients[[1]], slope = m.eh$coefficients[[3]], color = "red") + 
  geom_abline(intercept = m.eh$coefficients[[1]] + m.eh$coefficients[[2]] * 10, 
              slope = m.eh$coefficients[[3]], color = "red", linetype = "dashed") +
  # interaction effect
  geom_segment(aes(x = 10.2, y = m.eh$coefficients[[1]] + m.eh$coefficients[[3]] * 10, 
                   xend = 10.2, yend = m.eh$coefficients[[1]] + m.eh$coefficients[[3]] * 10 + m.eh$coefficients[[4]] * 10), 
               color = "green", arrow = arrow(length = unit(0.5, "cm")), show.legend=F) +
  geom_abline(intercept = m.eh$coefficients[[1]], 
              slope = m.eh$coefficients[[3]] + m.eh$coefficients[[4]], color = "green", linetype = "dashed")
  


dd$f.group <- factor(dd$group, levels = c("cc", "tl", "th")) 
library("plot3D")
scatter3D(dd$ev, dd$rv, dd$evh, colvar = dd$group)
scatter3D(dd$ev, dd$evh, dd$rv, bty = "b2", colkey = FALSE, ticktype = "detailed",
          xlab = "ev",
          ylab ="evh", 
          zlab = "rv",
          colvar = dd$rv,
          theta = 50, phi = 30)

#fit <- lm(z ~ x + y)
# predict values on regular xy grid
grid.lines = 10
x.pred <- seq(min(dd$ev), max(dd$ev), length.out = grid.lines)
y.pred <- seq(min(dd$evh), max(dd$evh), length.out = grid.lines)
xy <- expand.grid( ev = x.pred, evh = y.pred)
z.pred <- matrix(predict(m.eh, newdata = xy), 
                 nrow = grid.lines, ncol = grid.lines)
# fitted points for droplines to surface
fitpoints <- predict(m.eh)
# scatter plot with regression plane
scatter3D(dd$ev, dd$evh, dd$rv, 
    theta = 80, phi = 10, ticktype = "detailed",
    xlab = "ev", ylab = "evh", zlab = "rv",  
    surf = list(x = x.pred, y = y.pred, z = z.pred,  
    facets = NA, fit = fitpoints), main = "virt")

scatter3D(dd$ev, dd$evh, dd$rv, pch = 18, cex = 2, 
    theta = 80, phi = 10, ticktype = "detailed",
    xlab = "ev", ylab = "evh", zlab = "rv",  
    surf = list(x = x.pred, y = y.pred, z = z.pred,  
    facets = NA, fit = fitpoints), main = "virt")


x0 <- c(0, 0, 10)
y0 <- c(0, 0, 0)
z0 <- c(0, m.eh$coefficients[[1]], m.eh$coefficients[[1]])
x1 <- c(0, 10, 10)
y1 <- c(0, 0,  0)
z1 <- c(m.eh$coefficients[[1]], m.eh$coefficients[[1]], m.eh$coefficients[[1]] + m.eh$coefficients[[2]] * 10)
# cols <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")
cols <- c("black", "blue", "red")
arrows3D(x0, y0, z0, x1, y1, z1, colvar = x1^2, col = cols,
         lwd = 2, d = 3, add=FALSE)

arrows3D(x0, y0, z0, x1, y1, z1, colvar = x1^2, col = cols,
         lwd = 2, d = 3, add=TRUE)


# plot3D usage templates (not run here; x, y, z are not defined):
# scatter3D(x, y, z, ..., colvar = z, col = NULL, add = FALSE)
# text3D(x, y, z, labels, colvar = NULL, add = FALSE)
# points3D(x, y, z, ...)
# lines3D(x, y, z, ...)
# scatter2D(x, y, colvar = NULL, col = NULL, add = FALSE)
# text2D(x, y, labels, colvar = NULL, col = NULL, add = FALSE)


m.e_h$coefficients
m.eh$coefficients


m.e_l <- lm(rv ~ ev + evl, data=dd)

We have several explanatory variables, both nominal-scaled and interval-scaled

We have data in nested groups (MLM)

** example_school.Rmd **

Example school

Real data from

source

data source

hsb A subset of data from the 1982 High School and Beyond survey used
as examples for HLM software

Format
A data frame with 7,185 observations on the following 8 variables.
schid a numeric vector, 160 unique values
mathach a numeric vector for the performance on a standardized math assessment
female a numeric vector coded 0 for male and 1 for female
ses a numeric measure of student socio-economic status
minority a numeric vector coded 0 for white and 1 for non-white students
schtype a numeric vector coded 0 for public and 1 for private schools
meanses a numeric, the average SES for each school in the data set
size a numeric for the number of students in the school

schid: school id (level 2 grouping variable)
female: level 1 predictor
minority: level 1 predictor
ses: level 1 predictor, socio-economic status
mathach: mathematics achievement, level 1 outcome variable
schtype: level 2 predictor
meanses: level 2 predictor

mathach is our reaction variable

# we load library merTools that includes dataset hsb
library(merTools)
## Loading required package: arm
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## Loading required package: lme4
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car
## 
## arm (Version 1.11-2, built: 2020-7-27)
## Working directory is /Users/pzezula/ownCloud/lehre_ss_2021/unit/parms
## 
## Attaching package: 'arm'
## The following object is masked from 'package:scales':
## 
##     rescale
# we define dd, a data object containing the data of hsb, just for convenience. 
# We could have used hsb also, a data object that exists in the merTools namespace
dd <- hsb
# we take a look at our reaction variable using describe() of library(psych)
psych::describe(dd$mathach - min(dd$mathach))
##    vars    n  mean   sd median trimmed  mad min   max range  skew kurtosis   se
## X1    1 7185 15.58 6.88  15.96   15.75 8.12   0 27.82 27.82 -0.18    -0.92 0.08
# we inspect the nominal-scaled variables and their combinations
unique(dd[,c("minority", "female", "schtype")])
##     minority female schtype
## 1          0      1       0
## 3          0      0       0
## 18         1      0       0
## 25         1      1       0
## 121        0      0       1
## 123        1      0       1
## 141        0      1       1
## 142        1      1       1
# hm, this score seems to indicate achievement: the higher the score, the better
# ... but what does a negative score mean?
# anyway,
# to calculate a variation index to be explained, we use the distance to 0
dd$diff.a <- dd$mathach - min(dd$mathach)
var.a <- sum(dd$diff.a ^ 2) / length(dd$diff.a)
sd.a <- sqrt(var.a)
var.mean <- var(dd$mathach)
sd.mean <- sqrt(var.mean)

# we see a reduction
cat(sd.a, " > ", sd.mean)
## 17.03043  >  6.878246
# we could conclude that the arithmetic mean is a better estimator than 0 for predicting our reaction variable in new observations
# ... but this comparison is somewhat arbitrary, as there is no accepted reference value for calculating the individual diff

# in statistics we therefore use the usual variance/sd and the arithmetic mean when comparing models with more parameters
# we often call this our null model
# in a model formula it is defined by an explanatory variable that is constant, usually 1
m.0 <- lm(dd$mathach ~ 1)
sd(m.0$residuals)
## [1] 6.878246
# ... the standard deviation of the residuals (the unexplained variation of the reaction variable) equals sd.mean from above
cat(sd.a, " > ", sd(m.0$residuals))
## 17.03043  >  6.878246
# we plot that pseudo reduction
library(tidyverse)
psych::describe(m.0$residuals)
#tibble(sd.a = sqrt(dd$diff.a ^ 2), sd.0 = sqrt(m.0$residuals ^ 2)) %>% 
se <- function(x) sqrt(var(x)/length(x))
tibble(d.sd.a = dd$diff.a, d.sd.0 = m.0$residuals) -> dd.g0
#tibble(sd.a = sqrt(dd$diff.a ^ 2), sd.0 = m.0$residuals) %>% 

dd.g <- tibble(
  group = as.factor(c('a', 'm.0')),
    mm = c(0, 0),
  se = c(sum(sqrt(dd$diff.a ^ 2)) / length(dd$diff.a), sd(m.0$residuals))
)
  
ggplot(dd.g, aes(x=group)) +
  geom_point(data=dd.g, stat='identity', aes(group, y=mm, size=10), show.legend=F) +
    geom_errorbar(data=dd.g, stat='identity', aes(x=group, ymax = mm + se, ymin = mm - se), width = 0.2)

Version: 06 July 2021, 14:39

Some points to remember regarding MLM and an example

Rmd

  • we have fixed and random effects

  • fixed effects work the same ("fixed") for all observations; random effects work specifically, depending on level-group membership

  • the way effects work and can be interpreted depends on the scale level of the variables

  • we concentrate on categorical and interval-scaled variables here

  • in R, interval-scaled variables are of type numeric(), categorical variables are usually of type factor()

  • the estimated parameters for interval-scaled variables are slopes (which define a line in combination with an intercept)

  • for categorical explanatory variables the estimated coefficients are differences that are added for every observation in a subgroup.

  • consequently, within a sub-category the estimated outcome is equal for all members of this sub-category

  • it is the arithmetic mean of the outcome values in this sub-group

  • therefore we talk about "mean differences" that are tested

  • categorical explanatory variables with more than two sub-groups are converted into dummy variables (number of sub-groups - 1)

  • these dummy variables replace the original explanatory variable in the model

  • sub-group membership is expressed by a combination of 1 (member) or 0 (not a member) across all the dummy variables

  • there are various types of dummy coding. A common one is reference coding: observations in the reference category have 0 in all dummy variables. This is the default in R's lm() and others.

  • we get one effect and a specific coefficient for each dummy variable. With reference coding it expresses the difference in means compared with the reference category.

  • fixed effects result in a coefficient for every explanatory variable, with a significance test included. In the case of categorical explanatory variables this holds for every dummy variable used in the model in place of the categorical variable itself.

  • random effects are effects that are valid in a sub-group of observations only

  • these sub-groups are defined by one or more leveling variables; their levels define the sub-level membership of our observations.

  • with more than one leveling variable, the levels are defined by a combination of sub-levels

  • in MLM we get coefficients that are specific for every sub-level

  • random-effect coefficients have no significance test

  • random effects "bind" or "explain" residual variability not explained by fixed effects

  • we get an impression of the importance of random effects via model tests or via confidence intervals (confint())

  • random coefficients can be seen as correctors of the fixed-effect coefficients; they are added when predicting an observation's outcome

  • the interpretation of random coefficients is the same as for fixed coefficients; it also depends on the scale level.

  • in nlme::lme() we can use fixef() and ranef() to output fixed and random coefficients, respectively

  • the dummy-variable decomposition of categorical variables also holds for random effects. The mean differences they express are valid for the sub-group they refer to.

  • categorical random effects generate (number of sub-levels × number of dummy variables) coefficients. This can be a lot compared to the number of observations.

  • it doesn't make sense to also include a level-defining variable as a fixed effect

  • fixed coefficients can be used to predict the outcome for new observations; we can use predict() to do so

  • random coefficients can only be used for prediction if the new observations have the same leveling-variable levels as were used for model estimation.

  • in nlme::lme() the sub-levels are defined after the | in the random part of our model formula

  • in front of the | we specify the random effects we want to integrate into the model

  • the syntax for the fixed part of the model formula is the same as in lm()

  • we also have a summary() for nlme::lme() result objects
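The reference coding described above can be inspected directly in R with model.matrix() (a small self-contained sketch using an illustrative factor f):

```r
# reference coding: R's default contr.treatment for factors
f <- factor(c("cc", "cc", "tl", "th"), levels = c("cc", "tl", "th"))
mm <- model.matrix(~ f)
mm
# the reference category "cc" gets 0 in both dummy columns (ftl, fth);
# with k = 3 sub-groups we get k - 1 = 2 dummy variables plus the intercept
```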

the above again, in more detail

  • There are fixed and random effects.

  • These effects either work the same for all observations ("fixed"), or they work specifically for certain subgroups, i.e. "random" from the perspective of level 1 (the base level).

  • Effects and the interpretation of parameters depend on the scale level of the explanatory variables.

  • In our context, categorical and interval-scaled (continuous) variables are relevant.

  • In R, interval-scaled variables have data type numeric(), categorical variables data type factor().

  • For interval-scaled explanatory variables the fitted parameter is a slope over their range of values (combined with an intercept that anchors the line).

  • For categorical explanatory variables the parameters are values that are added to or subtracted from the estimate for all observations of a sub-category. So per sub-category there is one constant estimate for all observations belonging to that sub-category. This is also the mean of the predicted values in this sub-category. That is why we speak of mean differences.

  • For categorical explanatory variables with more than two sub-categories, new explanatory variables are generated and used in place of the actual values of the variable. They are called dummy variables.

  • Dummy variables encode membership in a sub-category via 0 (not a member) and 1 (member) in combination with the other dummy variables. There are (number of sub-categories - 1) dummy variables for a categorical explanatory variable. Usually one of the sub-categories is used as the reference category (reference coding). Observations from this reference group have a 0 in all dummy variables.

  • A separate effect is estimated for each dummy variable. With reference coding this is the deviation of the sub-category from the mean of the reference category.

  • For fixed effects, a coefficient (parameter) is estimated and statistically tested for every explanatory variable. If categorical variables are involved, this applies to every dummy variable.

  • Random effects are effects that work only in specific sub-levels. Sub-levels are subgroups of observations formed according to one (or more) categorical variables. In MLM these sub-levels are not meant to be tested as categorical explanatory variables among the fixed effects. Effects are estimated per sub-level as well, but these parameters are not tested individually.

  • Random effects correct the corresponding fixed effects for the respective sub-level.

  • Statistically, random effects enter the testing by explaining variance that would otherwise be residual variance, i.e. variance not explained by the model.

  • For random effects too, the interpretation of the coefficients is comparable to that of fixed effects and depends on the scale level of the explanatory variables.

  • The random effects, i.e. their coefficients for each sub-level, can be displayed for nlme::lme() with the command ranef(...).

  • For interval-scaled explanatory variables the reported coefficients are corrections of the slope of the corresponding fixed effect.

  • For categorical explanatory variables, corresponding dummy effects are generated as well, and the coefficients are determined per sub-level.

  • For categorical explanatory variables too, the random coefficients can be interpreted as the deviation of the respective sub-level from the analogous fixed effects that apply to all observations.

  • Fixed coefficients can be used for the prediction of new observations, for example with the command predict(...).

  • Random effects can only be used for the prediction of new observations if the same sub-level was also present during parameter estimation.

  • If a subject defines the sub-level (usually in repeated-measures designs), the random coefficients can therefore only be used for predictions for that same subject.

  • In nlme::lme() the sub-levels are defined after the | in the random part of the formula.

  • In front of the | in the random part are the random effects that are to be estimated for the sub-levels.

  • The fixed part corresponds to the formulae of the lm(), glm() etc. commands.

  • summary(...) also works for lme() result objects.

  • The exact parameters can be requested with fixef(...) and ranef(...).

The same effect as fixed or random, with categorical explanatory variables

This is shown using a four-level categorical explanatory variable grp. The same grouping variable is used as a fixed effect in one model and as a random effect in another. To account for covariation effects we include interaction in the fixed-effect model.

library(tidyverse)
dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_dat.rds")))
#dd <- readRDS("parms_dat.rds")

# we begin by simplifying things: we read in pre-post data only and treat it as a between-subjects design
dd <- dd %>% dplyr::filter(time %in% c("t0", "t1")) %>% dplyr::filter(grp %in% c("0cg", "0i1", "ecg", "ei1"))
dd$grp <- factor(dd$grp)
dd$time <- factor(dd$time)
dd$gender <- factor(dd$gender)

#psych::describe.by()
Rmisc::summarySE(dd, "rv", c("grp", "time"))
##   grp time  N         rv         sd         se         ci
## 1 0cg   t0 10  0.7033487 0.11979747 0.03788329 0.08569795
## 2 0cg   t1 10  0.5653026 0.15860233 0.05015446 0.11345727
## 3 0i1   t0 10  0.5568098 0.13325198 0.04213798 0.09532273
## 4 0i1   t1 10  1.1251133 0.08448893 0.02671775 0.06043974
## 5 ecg   t0 10  0.6070138 0.07976234 0.02522307 0.05705854
## 6 ecg   t1 10  0.5528961 0.09901235 0.03131046 0.07082917
## 7 ei1   t0 10 -0.0281138 0.13601622 0.04301210 0.09730014
## 8 ei1   t1 10  0.3768876 0.08264336 0.02613412 0.05911950
library(nlme)
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:lme4':
## 
##     lmList
## The following object is masked from 'package:dplyr':
## 
##     collapse
# seeing it as a pure fixed-effects model, no repeated-measures design
m.a0 <- nlme::gls(rv ~ 1, data=dd, method='ML')
m.a1 <- nlme::gls(rv ~        grp, data=dd, method='ML')
m.a2 <- nlme::gls(rv ~ time*grp, data=dd, method='ML')
anova(m.a0, m.a1, m.a2)
##      Model df        AIC       BIC    logLik   Test   L.Ratio p-value
## m.a0     1  2   48.49202  53.25608 -22.24601                         
## m.a1     2  5  -13.35735  -1.44721  11.67867 1 vs 2  67.84937  <.0001
## m.a2     3  9 -109.37751 -87.93927  63.68875 2 vs 3 104.02016  <.0001
# we might be interested in
# summary(m.a1)
# but for now we are more interested in the coefficients
m.a1$coefficients
## (Intercept)      grp0i1      grpecg      grpei1 
##   0.6343256   0.2066359  -0.0543707  -0.4599387
# so we see: alphabetical order applies to the grp levels, our reference subgroup is 0cg, and the coefficients are relative to 0cg
# coefficients can be interpreted as differences between group means
m.a2$coefficients
##   (Intercept)        timet1        grp0i1        grpecg        grpei1 
##    0.70334870   -0.13804611   -0.14653886   -0.09633493   -0.73146250 
## timet1:grp0i1 timet1:grpecg timet1:grpei1 
##    0.70634955    0.08392846    0.54304753
# visualization of m.a1
cc <- scales::hue_pal()(length(unique(dd$grp)))
displ <- 0.2
dd %>% 
  ggplot(aes(x=grp, y=rv, group=grp, color=grp)) +
  #geom_point(position="jitter") + 
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.2) + 
  # we add the effects based on coefficients
  geom_hline(aes(yintercept=m.a1$coefficients['(Intercept)']), color = cc[1], linetype="dashed") +
  geom_hline(aes(yintercept=m.a1$coefficients['(Intercept)'] + m.a1$coefficients['grp0i1']), color = cc[2], linetype="dashed") +
  geom_hline(aes(yintercept=m.a1$coefficients['(Intercept)'] + m.a1$coefficients['grpecg']), color = cc[3], linetype="dashed") +
  geom_hline(aes(yintercept=m.a1$coefficients['(Intercept)'] + m.a1$coefficients['grpei1']), color = cc[4], linetype="dashed") +
  geom_segment(aes(x = 1.8, y = m.a1$coefficients['(Intercept)'], xend = 2-displ, yend = m.a1$coefficients['(Intercept)'] + m.a1$coefficients['grp0i1']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  geom_segment(aes(x = 2.8, y = m.a1$coefficients['(Intercept)'], xend = 3-displ, yend = m.a1$coefficients['(Intercept)'] + m.a1$coefficients['grpecg']), color = cc[3], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  geom_segment(aes(x = 3.8, y = m.a1$coefficients['(Intercept)'], xend = 4-displ, yend = m.a1$coefficients['(Intercept)'] + m.a1$coefficients['grpei1']), color = cc[4], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  ggtitle('Visualization m.a1')
## `geom_smooth()` using formula 'y ~ x'
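That the m.a1 coefficients are mean differences can be verified by hand against the cell means from the Rmisc::summarySE output above (the cells are balanced with N = 10, so each group mean is the average of its two cell means; the values below are hard-coded from that output):

```r
# group means of rv collapsed over time (cell means from the summarySE output)
m.0cg <- (0.7033487 + 0.5653026) / 2   # mean of reference group 0cg
m.0i1 <- (0.5568098 + 1.1251133) / 2
# the intercept of m.a1 is the mean of the reference group,
# intercept + grp0i1 reproduces the mean of group 0i1
b0 <- 0.6343256
b.0i1 <- 0.2066359
abs(b0 - m.0cg) < 1e-6           # TRUE
abs(b0 + b.0i1 - m.0i1) < 1e-6   # TRUE
```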

# visualization of m.a2
m.a2$coefficients
##   (Intercept)        timet1        grp0i1        grpecg        grpei1 
##    0.70334870   -0.13804611   -0.14653886   -0.09633493   -0.73146250 
## timet1:grp0i1 timet1:grpecg timet1:grpei1 
##    0.70634955    0.08392846    0.54304753
# cc <- scales::hue_pal()(length(m.a2$coefficients))
cc <- scales::hue_pal()(length(unique(dd$grp)))
dd %>% 
  ggplot(aes(x=time, y=rv, group=grp, color=grp)) +
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.1) +
  stat_summary(fun = mean, geom="line") +
  geom_hline(aes(yintercept=m.a2$coefficients['(Intercept)']), color = cc[1], linetype="dashed") +
  geom_segment(aes(x = 1.8, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['timet1'], xend = 2.2, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['timet1']), color = cc[1], linetype="dotted") +
  geom_segment(aes(x = 1.8, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['timet1'] + m.a2$coefficients['grp0i1'] + m.a2$coefficients['timet1:grp0i1'], xend = 2.2, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['timet1'] +  m.a2$coefficients['grp0i1'] + m.a2$coefficients['timet1:grp0i1']), color = cc[2], linetype="dotted") +
  # geom_hline(aes(yintercept=m.a2$coefficients[1] + m.a2$coefficients[2]), color = cc[2], linetype="dashed") +
  #geom_hline(aes(yintercept=m.a2$coefficients[1] + m.a2$coefficients[3]), color = cc[3], linetype="dashed") +
  #geom_hline(aes(yintercept=m.a2$coefficients[1] + m.a2$coefficients[4]), color = cc[4], linetype="dashed") +
  #geom_hline(aes(yintercept=m.a2$coefficients[1] + m.a2$coefficients[5]), color = cc[5], linetype="dashed") +
  #
  # 0i1
  geom_segment(aes(x = 0.7, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grp0i1'], xend = 2.3, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grp0i1']), color = cc[2], linetype="dotted") +
  geom_segment(aes(x = 0.9, y = m.a2$coefficients['(Intercept)'], xend = 0.9, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grp0i1']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  geom_segment(aes(x = 2.1, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grp0i1'], xend = 2.1, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grp0i1'] + m.a2$coefficients['timet1']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  geom_segment(aes(x = 2.2, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grp0i1'] + m.a2$coefficients['timet1'], xend = 2.2, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grp0i1'] + m.a2$coefficients['timet1'] + m.a2$coefficients['timet1:grp0i1']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  # we show details for grp ei1 time t1
  geom_segment(aes(x = 0.7, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grpei1'], xend = 2.3, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grpei1']), color = cc[4], linetype="dotted") +
  geom_segment(aes(x = 0.8, y = m.a2$coefficients['(Intercept)'], xend = 0.8, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grpei1']), color = cc[4], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  # annotate(geom="text", x=0.7, y=0, label="grpei1", color=cc[4]) +
  geom_text(label="grpei1", x=0.9, y=0, angle=270, size=3, color=cc[4]) +
  geom_segment(aes(x = 2.1, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grpei1'], xend = 2.1, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grpei1'] + m.a2$coefficients['timet1']), color = cc[4], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  geom_text(label="timet1", x=2, y=-0.30, angle=270, size=3, color=cc[4]) +
    geom_segment(aes(x = 2.2, y = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grpei1'] + m.a2$coefficients['timet1'], xend = 2.2, yend = m.a2$coefficients['(Intercept)'] + m.a2$coefficients['grpei1'] + m.a2$coefficients['timet1'] + m.a2$coefficients['timet1:grpei1']), color = cc[4], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  geom_text(label="timet1:grpei1", x=2.3, y=-0.20, angle=90, size=3, color=cc[4]) +
    #geom_segment(aes(x = 2.8, y = m.a2$coefficients[1], xend = 3-displ, yend = m.a2$coefficients[1] + m.a2$coefficients[3]), color = cc[3], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  #geom_segment(aes(x = 3.8, y = m.a2$coefficients[1], xend = 4-displ, yend = m.a2$coefficients[1] + m.a2$coefficients[4]), color = cc[4], arrow = arrow(length = unit(0.3, "cm")), show.legend=F) +
  ggtitle('Visualization m.a2')
## `geom_smooth()` using formula 'y ~ x'

m.a2$coefficients
##   (Intercept)        timet1        grp0i1        grpecg        grpei1 
##    0.70334870   -0.13804611   -0.14653886   -0.09633493   -0.73146250 
## timet1:grp0i1 timet1:grpecg timet1:grpei1 
##    0.70634955    0.08392846    0.54304753
# cc <- scales::hue_pal()(length(m1$coefficients) - 1)
# dd %>% 
#   ggplot(aes(x=time, y=rv, group=gender, color=gender)) +
#   geom_jitter(width = 0.2, alpha=0.3) +
#   geom_smooth(method=lm, se=FALSE, size=0.2) +
#   stat_summary(fun = mean, geom="point", size=3) +
#   stat_summary(fun.data = mean_se, geom = "errorbar", width=0.2) +
#   stat_summary(fun = mean, geom="line") +
#   geom_hline(aes(yintercept=m1$coefficients[1]), color = cc[1], linetype="dashed") +
#   geom_hline(aes(yintercept=m1$coefficients[1] + m1$coefficients[2]), color = cc[2], linetype="dashed") +
#   geom_hline(aes(yintercept=m1$coefficients[1] + m1$coefficients[3]), color = cc[3], linetype="dashed") +
#   ggtitle('Visualization m1')
# 


# how about considering grp as random effect?
m.r0 <- lme(fixed = rv ~ 1, random = ~1 | grp, data=dd, method='ML')
# m.r0 already has a fixed-effect intercept, which can be seen as the grand mean of all observations
# the m.r0 random effects are per-subgroup variations of the fixed effect
m.r0$coefficients
## $fixed
## (Intercept) 
##   0.5574073 
## 
## $random
## $random$grp
##     (Intercept)
## 0cg  0.07388648
## 0i1  0.27237739
## ecg  0.02165891
## ei1 -0.36792278
# m.r1 adds time (pre vs. post) as a fixed effect; m.r2 additionally lets time vary randomly by group
m.r1 <- lme(fixed = rv ~ time, random = ~1  | grp, data=dd, method='ML')  
m.r2 <- lme(fixed = rv ~ time, random = ~1 + time | grp, data=dd, method='ML')
m.r2$coefficients
## $fixed
## (Intercept)      timet1 
##   0.4597646   0.1952853 
## 
## $random
## $random$grp
##     (Intercept)     timet1
## 0cg  0.23784828 -0.3234811
## 0i1  0.09940877  0.3617214
## ecg  0.14334157 -0.2420069
## ei1 -0.48059862  0.2037666
anova(m.r0, m.r1, m.r2)
##      Model df       AIC       BIC   logLik   Test  L.Ratio p-value
## m.r0     1  3  -0.31967   6.82641  3.15983                        
## m.r1     2  4 -17.01281  -7.48471 12.50641 1 vs 2 18.69315  <.0001
## m.r2     3  6 -74.96623 -60.67407 43.48312 2 vs 3 61.95342  <.0001
# visualization of m.r2
# we find coefficients for fixed effects in m.r2$coefficients[[1]]
# and coefficients for random effects in m.r2$coefficients[[2]]
#cc <- scales::hue_pal()(length(m.r2$coefficients))
cc <- scales::hue_pal()(length(unique(dd$grp)))
dd %>% 
  ggplot(aes(x=time, y=rv, group=grp, color=grp)) +
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.1) +
  stat_summary(fun = mean, geom="line") +
  geom_hline(aes(yintercept=m.r2$coefficients[['fixed']]['(Intercept)']), color = "darkblue") +
  geom_text(label="fixed (Intercept)", x=0.7, y=m.r2$coefficients[['fixed']]['(Intercept)'] + 0.02, color="darkblue") +
  geom_hline(aes(yintercept=m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']), color = "darkblue", linetype="dashed") +
  geom_segment(aes(x = 1.9, y = m.r2$coefficients[['fixed']]['(Intercept)'], xend = 1.9, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']), color = "darkblue", arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_text(label="fixed timet1", x=1.9, y=m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1'] + 0.1, angle=90, color="darkblue") +
  geom_segment(aes(x = 1.1, y = m.r2$coefficients[['fixed']]['(Intercept)'], xend = 1.1, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['random']][[1]]['0cg','(Intercept)']), color = cc[1], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_segment(aes(x = 1.2, y = m.r2$coefficients[['fixed']]['(Intercept)'], xend = 1.2, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['random']][[1]]['0i1','(Intercept)']), color = cc[2], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_segment(aes(x = 1.3, y = m.r2$coefficients[['fixed']]['(Intercept)'], xend = 1.3, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['random']][[1]]['ecg','(Intercept)']), color = cc[3], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_segment(aes(x = 1.4, y = m.r2$coefficients[['fixed']]['(Intercept)'], xend = 1.4, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['random']][[1]]['ei1','(Intercept)']), color = cc[4], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_segment(aes(x = 1, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['random']][[1]]['ei1','(Intercept)'], 
                   xend = 1.4, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['random']][[1]]['ei1','(Intercept)']), color = cc[4], linetype="dotted", show.legend=F) +
  geom_text(label="random intercept ei1", x=1.4, y=-0.35, angle=270, size=3, color=cc[4]) +
  geom_segment(aes(x = 2.05, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1'], 
                   xend = 2.05, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['0cg','(Intercept)']), 
                     color = cc[1], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_segment(aes(x = 2.1, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']+ m.r2$coefficients[['random']][[1]]['0cg','(Intercept)'], 
                   xend = 2.1, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['0cg','(Intercept)'] + m.r2$coefficients[['random']][[1]]['0cg','timet1']), 
                     color = cc[1], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  # 0i1
  geom_segment(aes(x = 2.2, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1'], 
                   xend = 2.2, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['0i1','(Intercept)']), 
                     color = cc[2], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_text(label="random intercept 0i1", x=2.2, y=0.55, angle=90, size=3, color=cc[2]) +
  geom_segment(aes(x = 2.25, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']+ m.r2$coefficients[['random']][[1]]['0i1','(Intercept)'], 
                   xend = 2.25, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['0i1','(Intercept)'] + m.r2$coefficients[['random']][[1]]['0i1','timet1']), 
                     color = cc[2], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_segment(aes(x = 2, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']+ m.r2$coefficients[['random']][[1]]['0i1','(Intercept)'] + m.r2$coefficients[['random']][[1]]['0i1','timet1'], 
                   xend = 2.4, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['0i1','(Intercept)'] + m.r2$coefficients[['random']][[1]]['0i1','timet1']), 
                     color = cc[2], linetype="dotted", show.legend=F) +  
  geom_text(label="random timet1 0i1", x=2.25, y=0.82, angle=90, size=3, color=cc[2]) +
  # ei1
  geom_segment(aes(x = 2.35, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1'], 
                   xend = 2.35, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['ei1','(Intercept)']), 
                     color = cc[4], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_text(label="random intercept ei1", x=2.35, y=-0.2, angle=270, size=3, color=cc[4]) +
  geom_segment(aes(x = 2.4, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']+ m.r2$coefficients[['random']][[1]]['ei1','(Intercept)'], 
                   xend = 2.4, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['ei1','(Intercept)'] + m.r2$coefficients[['random']][[1]]['ei1','timet1']), 
                     color = cc[4], arrow = arrow(length = unit(0.1, "cm")), show.legend=F) +
  geom_segment(aes(x = 2, y = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']+ m.r2$coefficients[['random']][[1]]['ei1','(Intercept)'] + m.r2$coefficients[['random']][[1]]['ei1','timet1'], 
                   xend = 2.4, yend = m.r2$coefficients[['fixed']]['(Intercept)'] + m.r2$coefficients[['fixed']]['timet1']  + m.r2$coefficients[['random']][[1]]['ei1','(Intercept)'] + m.r2$coefficients[['random']][[1]]['ei1','timet1']), 
                     color = cc[4], linetype="dotted", show.legend=F) +
  geom_text(label="random timet1 ei1", x=2.4, y=0.1, angle=90, size=3, color=cc[4]) +
  ggtitle('Visualization m.r2')  
## `geom_smooth()` using formula 'y ~ x'

m.r2$coefficients
## $fixed
## (Intercept)      timet1 
##   0.4597646   0.1952853 
## 
## $random
## $random$grp
##     (Intercept)     timet1
## 0cg  0.23784828 -0.3234811
## 0i1  0.09940877  0.3617214
## ecg  0.14334157 -0.2420069
## ei1 -0.48059862  0.2037666

We can interpret the parameters m.a2$coefficients of model m.a2 as differences of group means: we start at the mean of the reference group and add the effects (the coefficients) associated with membership in the respective sub-category.
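As a minimal standalone sketch of this logic (simulated data, not the dd data used above; the groups a/b are made up), treatment-coded lm() coefficients reconstruct group means like this:

```r
# simulate two groups with true means near 1 and 3
set.seed(1)
d <- data.frame(grp = rep(c("a", "b"), each = 5))
d$y <- ifelse(d$grp == "a", 1, 3) + rnorm(10, sd = 0.1)
m <- lm(y ~ grp, data = d)
# intercept = mean of the reference group "a"
coef(m)["(Intercept)"]
# mean of group "b" = intercept + coefficient of the dummy grpb
coef(m)["(Intercept)"] + coef(m)["grpb"]
# identical to the observed group means
tapply(d$y, d$grp, mean)
```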

With MLM, in model m.r2 the fixed intercept represents the mean at t0 across all sub-groups. Each sub-group (sub-level) has a random intercept that expresses its deviation from the fixed intercept. t1 values result from the fixed intercept plus the fixed effect timet1; we then add the random effects of the respective sub-group, i.e. its random intercept plus its random timet1 effect.
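Using the m.r2 coefficient values printed above (copied by hand from the output), the cell means can be reassembled like this:

```r
# fixed and random coefficients of m.r2, copied from the output above
fx <- c("(Intercept)" = 0.4597646, timet1 = 0.1952853)
rn <- rbind("0i1" = c(0.09940877, 0.3617214),
            "ei1" = c(-0.48059862, 0.2037666))
colnames(rn) <- c("(Intercept)", "timet1")
# estimated mean of group 0i1 at t0: fixed intercept + its random intercept
fx[["(Intercept)"]] + rn["0i1", "(Intercept)"]
# estimated mean of group 0i1 at t1: additionally add the fixed and random time effects
fx[["(Intercept)"]] + fx[["timet1"]] + rn["0i1", "(Intercept)"] + rn["0i1", "timet1"]
```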

Conclusion

We arrive at approximately the same cell means in models m.a2 and m.r2, so we can express the same effects either as fixed effects with an interaction or as random effects. The agreement is not exact, because random effects are shrunken toward zero. With fixed effects, every coefficient comes with its own significance test.
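We can check this numerically with the coefficient values printed above for m.a2 and m.r2, taking group ei1 at t1 as an example:

```r
# group ei1 at t1 from the fixed-effects model m.a2:
# intercept + timet1 + grpei1 + timet1:grpei1
a2 <- 0.70334870 - 0.13804611 - 0.73146250 + 0.54304753
# the same cell from the mixed model m.r2:
# fixed intercept + fixed timet1 + random intercept ei1 + random timet1 ei1
r2 <- 0.4597646 + 0.1952853 - 0.48059862 + 0.2037666
c(m.a2 = a2, m.r2 = r2)
# the values agree only approximately: random effects are shrunken
# toward zero, so the MLM cell estimates differ slightly from raw means
```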

MLM continuous explanatory variable and levels

A fixed effect of a continuous explanatory variable has two parameters: an intercept and a slope. These parameters are the same across all levels. If we additionally enter the same explanatory variable as a random effect, we get per-level corrections that are added to the fixed effects.
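As a sketch with made-up numbers (purely illustrative, not taken from any model above): the per-level regression line is obtained by adding the level's random corrections to the fixed intercept and slope.

```r
# hypothetical coefficients
fixed  <- c(intercept = 0.5,  slope = -0.1)   # shared by all levels
random <- c(intercept = 0.2,  slope = 0.05)   # correction for one level
# regression line of that level
level_intercept <- fixed[["intercept"]] + random[["intercept"]]  # 0.7
level_slope     <- fixed[["slope"]]     + random[["slope"]]      # -0.05
predict_level <- function(x) level_intercept + level_slope * x
predict_level(2)   # 0.7 - 0.05 * 2 = 0.6
```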

One continuous explanatory variable and its fixed and random effects with a categorical level variable

  • rv - reaction variable
  • ntime - explanatory variable, type numeric, fixed
  • ntime - explanatory variable, type numeric, random
  • grp - level variable, four sublevels, type categorical

We visualize the parameters and show how to use and interpret them.

# dd.w <- read.delim("http://md.psych.bio.uni-goettingen.de/data/virt/v_stait_exam_wide.txt")
dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_dat.rds")))
#dd <- readRDS("parms_dat.rds")

# we simplify things by keeping only the post-intervention data (t1 and the follow-ups) and consider it a between-subjects design
#dd <- dd %>% dplyr::filter(time %in% c("t0", "t1")) %>% dplyr::filter(grp %in% c("0cg", "0i1", "ecg", "ei1"))
dd <- dd %>% dplyr::filter(time %in% c("t1", "tf1", "tf2")) %>% dplyr::filter(grp %in% c("0cg", "0i1", "ecg", "ei1"))
dd$grp <- factor(dd$grp)
# dd$time <- factor(dd$time) # treating time as numeric below assumes (at least) equal measurement intervals
dd$time <- factor(dd$time, levels=c("0", "t1", "tf1", "tf2"))
dd$ntime <- as.numeric(factor(dd$time))
dd$gender <- factor(dd$gender)

#psych::describe.by()
Rmisc::summarySE(dd, "rv", c("grp", "time"))
##    grp time  N        rv         sd         se         ci
## 1  0cg   t1 10 0.5653026 0.15860233 0.05015446 0.11345727
## 2  0cg  tf1 10 0.6120282 0.14071816 0.04449899 0.10066371
## 3  0cg  tf2 10 0.6072571 0.12564241 0.03973162 0.08987916
## 4  0i1   t1 10 1.1251133 0.08448893 0.02671775 0.06043974
## 5  0i1  tf1 10 0.8705165 0.10621002 0.03358656 0.07597807
## 6  0i1  tf2 10 0.7824625 0.08434038 0.02667077 0.06033348
## 7  ecg   t1 10 0.5528961 0.09901235 0.03131046 0.07082917
## 8  ecg  tf1 10 0.5982876 0.13026683 0.04119399 0.09318728
## 9  ecg  tf2 10 0.6450731 0.10683574 0.03378443 0.07642569
## 10 ei1   t1 10 0.3768876 0.08264336 0.02613412 0.05911950
## 11 ei1  tf1 10 0.3462474 0.10115884 0.03198923 0.07236467
## 12 ei1  tf2 10 0.4498176 0.09536539 0.03015718 0.06822029
m.c0  <- lme(fixed = rv ~ 1, random = ~1 | grp, data=dd, method='ML')
m.c0a <- lme(fixed = rv ~ 1, random = ~1 | subj, data=dd, method='ML')

# m.c0 already has a fixed effect (Intercept), which can be seen as the grand mean of all observations
# the m.c0 random effects are per-subgroup deviations from the fixed intercept
m.c0$coefficients
## $fixed
## (Intercept) 
##   0.6276575 
## 
## $random
## $random$grp
##     (Intercept)
## 0cg -0.03225586
## 0i1  0.29346956
## ecg -0.02843014
## ei1 -0.23278356
# m.c1a adds ntime (numeric time) as a fixed effect; m.c1 additionally lets ntime vary randomly by group
m.c1a  <- lme(fixed = rv ~ ntime, random = ~1  | grp, data=dd, method='ML')
ctrl <- lmeControl(opt='optim')
m.c1 <- lme(fixed = rv ~ ntime, random = ~1 + ntime | grp,      data=dd, control=ctrl, method='ML')
cc <- scales::hue_pal()(length(unique(dd$grp)))

# one fixed continuous explanatory variable, the same variable as random effect, one level
mm  <- lme(fixed = rv ~ ntime, random = ~1 + ntime | grp, data=dd, control=ctrl, method='ML')
mm$coefficients
## $fixed
## (Intercept)       ntime 
##  0.66155478 -0.01694866 
## 
## $random
## $random$grp
##     (Intercept)       ntime
## 0cg -0.09089703  0.02782686
## 0i1  0.58282181 -0.14226280
## ecg -0.11803254  0.04174906
## ei1 -0.37389224  0.07268687
dd %>% 
  ggplot(aes(x=time, y=rv, group=grp, color=grp)) + 
  # scale_x_discrete(limits=c('0','t1', 'tf1', 'tf2')) +
  #ggplot(aes(x=time, y=rv, group=grp, color=grp), scale_x_discrete(limits=c('0','t1', 'tf1', 'tf2'))) +  #xlim('0','t1', 'tf1', 'tf2')
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3, alpha=0.3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.1, alpha=0.3) +
  stat_summary(fun = mean, geom="line", alpha=0.3) +
  # geom_abline(aes(intercept=a, slope=b), data=df3) +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = mm$coefficients$fixed['(Intercept)']), color = "darkblue", arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'], slope=mm$coefficients$fixed['ntime']), color = "darkblue", size=1.5) +
  geom_text(label="f (Intercept), f slope ntime", x=0.8, y=mm$coefficients$fixed['(Intercept)'] + 0.02, angle=-5, color="darkblue") +
  geom_text(label='0,0', x=-0, y=0, color="black", show.legend=F) +
  geom_segment(aes(x=-0.3, y=0, xend=0, yend=0), alpha=0, show.legend=F) +
  # we visualize random effect of grp 0i1
  geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'] + mm$coefficients$random$grp['0i1','(Intercept)'], slope=mm$coefficients$fixed['ntime'] + mm$coefficients$random$grp['0i1','ntime']), color = cc[2], size=1.5, linetype="dotted") +
  geom_segment(aes(x = 0, y = mm$coefficients$fixed['(Intercept)'], xend = 0, yend = mm$coefficients$fixed['(Intercept)'] + mm$coefficients$random$grp['0i1','(Intercept)']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  ggtitle('Visualization of Parameters of mm')
## Warning: Ignoring unknown aesthetics: x

## Warning: Ignoring unknown aesthetics: x
## `geom_smooth()` using formula 'y ~ x'

m.c2 <- mm
m.c2$coefficients
## $fixed
## (Intercept)       ntime 
##  0.66155478 -0.01694866 
## 
## $random
## $random$grp
##     (Intercept)       ntime
## 0cg -0.09089703  0.02782686
## 0i1  0.58282181 -0.14226280
## ecg -0.11803254  0.04174906
## ei1 -0.37389224  0.07268687

Blue line

  • the blue line visualizes the fixed effect of ntime
  • it is valid for all four sublevels
  • it is composed of the fixed intercept 0.6615548 and the fixed slope -0.0169487
  • the line has slope -0.0169487 and is anchored at x=0, y=0.6615548

Green arrow and green dotted line

  • this visualizes the effect of ntime for sublevel grp == '0i1'

  • it is composed of fixed and random parameters

  • the slope is the sum of fixed effect slope and random effect slope

  • we can see the random parameters as a way of adjusting the fixed effect to a specific sublevel (subgroup)
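Putting numbers to this, using the mm coefficients printed above, the line for sublevel grp == '0i1' can be computed by hand:

```r
# values copied from the mm output above
b0 <- 0.66155478 + 0.58282181    # fixed intercept + random intercept of 0i1
b1 <- -0.01694866 - 0.14226280   # fixed slope + random slope of 0i1
c(intercept = b0, slope = b1)
```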

One continuous explanatory variable and its fixed and random effects with two nested categorical level variables

  • rv - reaction variable
  • ntime - explanatory variable, type numeric, fixed
  • ntime - explanatory variable, type numeric, random
  • grp - level variable, four sublevels, type categorical, level 3
  • subj - level variable nested under grp, level 2

We visualize the parameters and show how to use and interpret them.

We show only one random group effect and the effects of one subject in that group, just to illustrate the idea.
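As a sketch with made-up numbers (hypothetical, not from the fitted model): with two nested levels, a subject's regression line stacks three layers — the fixed effects, the random correction of the subject's grp, and the subject's own random correction nested within that grp.

```r
# hypothetical coefficients for one subject in one group
fixed    <- c(intercept = 0.66, slope = -0.02)
ran_grp  <- c(intercept = 0.58, slope = -0.14)  # correction for the subject's group
ran_subj <- c(intercept = 0.05, slope =  0.01)  # correction nested within that group
# the subject-specific line simply adds up all three layers
subj_line <- fixed + ran_grp + ran_subj
subj_line
```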

dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_dat.rds")))

# we use only the data after the intervention, i.e. post treatment (t1) and two follow-ups
# we also treat it as a repeated-measures design because we account for the shared influence of each person on their observations
dd <- dd %>% dplyr::filter(time %in% c("t1", "tf1", "tf2")) %>% dplyr::filter(grp %in% c("0cg", "0i1", "ecg", "ei1"))
dd$grp <- factor(dd$grp)
# dd$time <- factor(dd$time, levels=c("0", "t1", "tf1", "tf2"))
dd$time <- factor(dd$time)
dd$ntime <- as.numeric(factor(dd$time))
dd$gender <- factor(dd$gender)
# we prepare colors
cc <- scales::hue_pal()(length(unique(dd$grp)))
# and the data of one subject
#ss <- '0i1_1'
ss <- '0i1_9'
gs <- '0i1/0i1_9'
s.data <- c(dd$rv[dd$subj==ss & dd$time=='t1'],
            dd$rv[dd$subj==ss & dd$time=='tf1'],
            dd$rv[dd$subj==ss & dd$time=='tf2']
            )
# Rmisc::summarySE(dd, "rv", c("grp", "time"))

# one fixed continuous explanatory variable, the same variable as random effect, one level
mm  <- lme(fixed = rv ~ ntime, random = ~1 + ntime | grp/subj, data=dd, control=ctrl, method='ML')
mm$coefficients$fixed
mm$coefficients$random$grp
head(mm$coefficients$random$subj)
dd %>% 
  ggplot(aes(x=time, y=rv, group=grp, color=grp)) + 
  # scale_x_discrete(limits=c('0','t1', 'tf1', 'tf2')) +
  #ggplot(aes(x=time, y=rv, group=grp, color=grp), scale_x_discrete(limits=c('0','t1', 'tf1', 'tf2'))) +  #xlim('0','t1', 'tf1', 'tf2')
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3, alpha=0.3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.1, alpha=0.3) +
  stat_summary(fun = mean, geom="line", alpha=0.3) +
  # geom_abline(aes(intercept=a, slope=b), data=df3) +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = mm$coefficients$fixed['(Intercept)']), color = "darkblue", arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'], slope=mm$coefficients$fixed['ntime']), color = "darkblue", size=1.5) +
  geom_text(label="f (Intercept), f slope ntime", x=0.8, y=mm$coefficients$fixed['(Intercept)'] + 0.02, angle=-5, color="darkblue") +
  geom_text(label='0,0', x=-0, y=0, color="black", show.legend=F) +
  geom_segment(aes(x=-0.3, y=0, xend=0, yend=0), alpha=0, show.legend=F) +
  # we visualize random effect of grp 0i1
  geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'] + mm$coefficients$random$grp['0i1','(Intercept)'], slope=mm$coefficients$fixed['ntime'] + mm$coefficients$random$grp['0i1','ntime']), color = cc[2], size=1.5, linetype="dashed") +
  geom_segment(aes(x = 0, y = mm$coefficients$fixed['(Intercept)'], xend = 0, yend = mm$coefficients$fixed['(Intercept)'] + mm$coefficients$random$grp['0i1','(Intercept)']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  #geom_point(data=dd[subj==0i1_1], size=3, color=cc[2])
  # geom_point(stat = "identity", aes(y=s.data, x=c("t1", "tf1", "tf2")), size=3, color=cc[2]) +
  geom_point(stat = "identity", aes(y=s.data[1], x="t1"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[2], x="tf1"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[3], x="tf2"), size=3, color="lightgreen") +
  # geom_line(aes())
  # intercept for one subject
  geom_segment(aes(x = 0, 
                   y = mm$coefficients$fixed['(Intercept)'] + 
                     mm$coefficients$random$grp['0i1', '(Intercept)'],
                   xend = 0, 
                   yend = mm$coefficients$fixed['(Intercept)'] + 
                     mm$coefficients$random$grp['0i1','(Intercept)'] + 
                     mm$coefficients$random$subj[gs,'(Intercept)']), 
                   #color = cc[2], 
                color="lightgreen",
                arrow = arrow(length = unit(0.3, "cm")), 
                show.legend=F) +
  # regression line with slope for the same subject
  geom_abline(aes(x=ntime, 
                  intercept=mm$coefficients$fixed['(Intercept)'] + 
                    mm$coefficients$random$grp['0i1','(Intercept)'] + 
                    mm$coefficients$random$subj[gs,'(Intercept)'],
                  slope=mm$coefficients$fixed['ntime'] + 
                    mm$coefficients$random$grp['0i1','ntime'] +
                    mm$coefficients$random$subj[gs,'ntime']), 
                  # color = cc[2], 
                color="lightgreen",
                size=1.2,
                linetype="dashed") +
    ggtitle('Visualization of Parameters of mm')
  

m.c2 <- mm
m.c2$coefficients

# two nested levels
mm  <- lme(fixed = rv ~ ntime, random = ~1 + ntime | grp/subj, data=dd, control=ctrl, method='ML')


# examples to access fixed effects using names (coefficient names here refer to model m.c2 as refitted below with rv ~ time + ev1)
m.c2$coefficients$fixed['ev1']
m.c2$coefficients$fixed['(Intercept)']  # caution: brackets are part of the name, R names are case sensitive (capital I in Intercept)

# examples to access random effects by name
m.c2$coefficients$random$grp['0i1', 'timetf2']             #random effect of timetf2 for grp 0i1 of model m.c2
m.c2$coefficients$random$subj['ei1/ei1_9', 'timetf1']  # random effect of timetf1 for subject ei1_9 in grp ei1 of model m.c2


m.c2  <- lme(fixed = rv ~ time + ev1, random = ~1 + time | grp/subj, data=dd, control=ctrl, method='ML')
dd %>% 
  ggplot(aes(x=time, y=rv, group=grp, color=grp)) +
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.1) +
  stat_summary(fun = mean, geom="line", alpha=0.3) +
  geom_hline(aes(yintercept=m.c2$coefficients$fixed['(Intercept)']), color = "darkblue") +
  geom_text(label="f (Intercept)", x=0.7, y=m.c2$coefficients$fixed['(Intercept)'] + 0.02, color="darkblue") +
  geom_segment(aes(x = 1.7, y=m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf1'], xend=2.3, yend=m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf1']), color = "darkblue", linetype="dashed") +
  geom_segment(aes(x = 1.9, y = m.c2$coefficients$fixed['(Intercept)'], xend = 1.9, yend = m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf1']), color = "darkblue", arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_text(label="f timetf1", x=1.9, y=m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf1'] + 0.1, angle=90, color="darkblue") +
  #geom_hline(aes(yintercept=m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf2']), color = "darkblue", linetype="dashed") +
  geom_segment(aes(x = 2.7, y=m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf2'], xend=3.3, yend=m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf2']), color = "darkblue", linetype="dashed") +
  geom_segment(aes(x = 2.9, y = m.c2$coefficients$fixed['(Intercept)'], xend = 2.9, yend = m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf2']), color = "darkblue", arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_text(label="f timetf2", x=2.9, y=m.c2$coefficients$fixed['(Intercept)'] + m.c2$coefficients$fixed['timetf2'] + 0.1, angle=90, color="darkblue") +
  ggtitle('Visualization m.c2')  
  
# m.c2$coefficients[['random']][['random$subj']]['ei1/ei1_9']

m.c2$coefficients
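A side note on the name-based access used in the chunk above: the coefficient containers are ordinary named vectors and matrices, so the parentheses and the capital I in '(Intercept)' are literally part of the names. A small standalone illustration (fx.demo and rn.demo are made-up values):

```r
fx.demo <- c("(Intercept)" = 0.64, ntime = -0.02)
fx.demo["(Intercept)"]   # works
fx.demo["Intercept"]     # NA: the parentheses belong to the name
rn.demo <- matrix(c(0.23, -0.09), nrow = 2,
                  dimnames = list(c("0i1/0i1_9", "ei1/ei1_9"), "(Intercept)"))
rn.demo["ei1/ei1_9", "(Intercept)"]  # nested random effects use 'grp/subj' row names
```

For lme result objects, nlme also provides fixef() and ranef() as accessors for the same fixed and random values.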

Only parameters are used to visualize the regression lines, effects, etc. in the graph above; this is the reason for the somewhat lengthy code.

Visualization of the modelling of one single observation

dd <- readRDS(gzcon(url("http://md.psych.bio.uni-goettingen.de/mv/data/div/parms_dat.rds")))
##readRDS("parms_dat.rds")
# we use only the data after the intervention, i.e. post treatment (t1) and the follow-ups (tf1-tf3)
# we also treat it as a repeated-measures design because we account for the shared influence of each person on their observations
dd <- dd %>% dplyr::filter(time %in% c("t1", "tf1", "tf2", "tf3")) %>% dplyr::filter(grp %in% c("0cg", "0i1", "ecg", "ei1"))
dd$grp <- factor(dd$grp)
# dd$time <- factor(dd$time, levels=c("0", "t1", "tf1", "tf2"))
dd$time <- factor(dd$time)
dd$ntime <- as.numeric(factor(dd$time))
dd$gender <- factor(dd$gender)
# we prepare colors
cc <- scales::hue_pal()(length(unique(dd$grp)))
# and the data of one subject
#ss <- '0i1_1'
#ss <- '0i1_9'
ss <- '0i1_3'
gs <- '0i1/0i1_9'
s.data <- c(dd$rv[dd$subj==ss & dd$time=='t1'],
            dd$rv[dd$subj==ss & dd$time=='tf1'],
            dd$rv[dd$subj==ss & dd$time=='tf2'],
            dd$rv[dd$subj==ss & dd$time=='tf3']
            )
s.dd <- dd[dd$subj == ss,]
# Rmisc::summarySE(dd, "rv", c("grp", "time"))

# several lmeControl() settings were tried here; the model call below passes its own control list, so these remain unused
ctrl <- lmeControl(opt='optim')
ctrl <- lmeControl(opt="nlminb")
ctrl <- lmeControl(maxIter = 500, msMaxIter = 500, tolerance = 1e-12, niterEM = 250,
          msMaxEval = 2000,
          msTol = 1e-14, msVerbose = FALSE,
          returnObject = FALSE, gradHess = TRUE, apVar = TRUE,
          .relStep = .Machine$double.eps^(1/3), minAbsParApVar = 0.005,
          #opt = "optim",
          optimMethod = "BFGS", natural = TRUE,
          sigma = NULL,
          allow.n.lt.q = FALSE
      )
##mm  <- lme(fixed = rv ~ ntime, random = ~1 + ntime | subj, data=dd, control=ctrl, method='ML')
##mm  <- lme(fixed = rv ~ ntime, random = ~1 | subj, data=dd, method='ML')

mm  <- lme(fixed = rv ~ ntime, random = ~1 + ntime | subj, control=list(opt="optim"), data=dd, method='ML')
mm$coefficients$fixed
## (Intercept)       ntime 
##  0.67153244 -0.02293525
##mm$coefficients$random$grp
##head(mm$coefficients$random$subj)
# random parameters for our selected subject
mm$coefficients$random$subj[ss,]
## (Intercept)       ntime 
##  0.31343587 -0.05016188
dd %>% 
  ggplot(aes(x=time, y=rv, group=grp, color=grp)) + 
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3, alpha=0.3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.1, alpha=0.3) +
  stat_summary(fun = mean, geom="line", alpha=0.3) +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = mm$coefficients$fixed['(Intercept)']), color = "darkblue", arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'], slope=mm$coefficients$fixed['ntime']), color = "darkblue", size=1.5) +
  geom_text(label='0,0', x=0, y=0, color="black", show.legend=F) +
  geom_segment(aes(x=-0.3, y=0, xend=0, yend=0), alpha=0, show.legend=F) +
  # we visualize random effect of grp 0i1
  #geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'] + mm$coefficients$random$grp['0i1','(Intercept)'], slope=mm$coefficients$fixed['ntime'] + mm$coefficients$random$grp['0i1','ntime']), color = cc[2], size=1.5, linetype="dashed") +
  #geom_segment(aes(x = 0, y = mm$coefficients$fixed['(Intercept)'], xend = 0, yend = mm$coefficients$fixed['(Intercept)'] + mm$coefficients$random$grp['0i1','(Intercept)']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  #geom_point(data=dd[subj==0i1_1], size=3, color=cc[2])
  # geom_point(stat = "identity", aes(y=s.data, x=c("t1", "tf1", "tf2")), size=3, color=cc[2]) +
  geom_point(stat = "identity", aes(y=s.data[1], x="t1"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[2], x="tf1"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[3], x="tf2"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[4], x="tf3"), size=3, color="lightgreen") +
  geom_point(data=s.dd, color="lightblue") +
  geom_smooth(data=s.dd, method=lm, se=FALSE, size=0.2, color="lightblue") +
  # geom_line(aes())
  # intercept for one subject
  geom_segment(aes(x = 0, 
                   y = mm$coefficients$fixed['(Intercept)'],
                     #+ mm$coefficients$random$grp['0i1', '(Intercept)'],
                   xend = 0, 
                   yend = mm$coefficients$fixed['(Intercept)'] + 
                     #mm$coefficients$random$grp['0i1','(Intercept)'] + 
                     mm$coefficients$random$subj[ss,'(Intercept)']), 
                   #color = cc[2], 
                color="lightgreen",
                arrow = arrow(length = unit(0.3, "cm")), 
                show.legend=F) +
  # regression line with slope for the same subject
  geom_abline(aes(x=ntime, 
                  intercept=mm$coefficients$fixed['(Intercept)'] + 
                    #mm$coefficients$random$grp['0i1','(Intercept)'] + 
                    mm$coefficients$random$subj[ss,'(Intercept)'],
                  slope=mm$coefficients$fixed['ntime'] + 
                    #mm$coefficients$random$grp['0i1','ntime'] +
                    mm$coefficients$random$subj[ss,'ntime']), 
                  # color = cc[2], 
                color="lightgreen",
                size=1.2,
                linetype="dashed") +
    ggtitle('Fixed time, random time, level subj')
## Warning: Ignoring unknown aesthetics: x

## Warning: Ignoring unknown aesthetics: x
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

# ... we see:
# blue arrow is fixed intercept
# blue line is fixed time slope
# light green arrow is random intercept for one chosen subject
# light green dashed line is random slope over time for selected subject 

# we add group membership as fixed effect

ss <- '0i1_4'
mm  <- lme(fixed = rv ~ ntime + grp, random = ~1 + ntime | subj, data=dd, control=list(opt="optim"), method='ML')
mm$coefficients$fixed
## (Intercept)       ntime      grp0i1      grpecg      grpei1 
##  0.64030424 -0.02293525  0.29267154  0.01437014 -0.18212887
##mm$coefficients$random$grp
##head(mm$coefficients$random$subj)
# random parameters for our selected subject
mm$coefficients$random$subj[ss,]
## (Intercept)       ntime 
##  0.23107596 -0.08626512
dd %>% 
  ggplot(aes(x=time, y=rv, group=grp, color=grp)) + 
  geom_jitter(width = 0.2, alpha=0.3) +
  geom_smooth(method=lm, se=FALSE, size=0.2) +
  stat_summary(fun = mean, geom="point", size=3, alpha=0.3) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.1, alpha=0.3) +
  stat_summary(fun = mean, geom="line", alpha=0.3) +
  geom_segment(aes(x = 0, y = 0, xend = 0, yend = mm$coefficients$fixed['(Intercept)']), color = "darkblue", arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'], slope=mm$coefficients$fixed['ntime']), color = "darkblue", size=1.5) +
  geom_text(label='0,0', x=0, y=0, color="black", show.legend=F) +
  geom_segment(aes(x=-0.3, y=0, xend=0, yend=0), alpha=0, show.legend=F) +
  # we visualize the fixed effect of grp 0i1
  geom_segment(aes(x = 0, y = mm$coefficients$fixed['(Intercept)'], xend = 0, yend = mm$coefficients$fixed['(Intercept)'] + mm$coefficients$fixed['grp0i1']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  geom_abline(aes(x=ntime, intercept=mm$coefficients$fixed['(Intercept)'] + mm$coefficients$fixed['grp0i1'], slope=mm$coefficients$fixed['ntime']), color = cc[2], size=1.5, linetype="dashed") +
  #geom_segment(aes(x = 0, y = mm$coefficients$fixed['(Intercept)'], xend = 0, yend = mm$coefficients$fixed['(Intercept)'] + mm$coefficients$random$grp['0i1','(Intercept)']), color = cc[2], arrow = arrow(length = unit(0.3, "cm")), size=1.5, show.legend=F) +
  #geom_point(data=dd[subj==0i1_1], size=3, color=cc[2])
  # geom_point(stat = "identity", aes(y=s.data, x=c("t1", "tf1", "tf2")), size=3, color=cc[2]) +
  geom_point(stat = "identity", aes(y=s.data[1], x="t1"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[2], x="tf1"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[3], x="tf2"), size=3, color="lightgreen") +
  geom_point(stat = "identity", aes(y=s.data[4], x="tf3"), size=3, color="lightgreen") +
  geom_point(data=s.dd, color="lightblue") +
  geom_smooth(data=s.dd, method=lm, se=FALSE, size=0.2, color="lightblue") +
  # geom_line(aes())
  # intercept for one subject
  geom_segment(aes(x = 0, 
                   y = mm$coefficients$fixed['(Intercept)'] + mm$coefficients$fixed['grp0i1'], 
                     #+ mm$coefficients$random$grp['0i1', '(Intercept)'],
                   xend = 0, 
                   yend = mm$coefficients$fixed['(Intercept)'] + mm$coefficients$fixed['grp0i1'] +
                     #mm$coefficients$random$grp['0i1','(Intercept)'] + 
                     mm$coefficients$random$subj[ss,'(Intercept)']), 
                   #color = cc[2], 
                color="lightgreen",
                arrow = arrow(length = unit(0.3, "cm")), 
                show.legend=F) +
  # regression line with slope for the same subject
  geom_abline(aes(intercept=mm$coefficients$fixed['(Intercept)'] + 
                    mm$coefficients$fixed['grp0i1'] +
                    #mm$coefficients$random$grp['0i1','(Intercept)'] + 
                    mm$coefficients$random$subj[ss,'(Intercept)'],
                  slope=mm$coefficients$fixed['ntime'] + 
                    #mm$coefficients$random$grp['0i1','ntime'] +
                    mm$coefficients$random$subj[ss,'ntime']), 
                  # color = cc[2], 
                color="lightgreen",
                size=1.2,
                linetype="dashed") +
    ggtitle('Fixed group, fixed time, random time, level subj')
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

# ... we see:
# blue arrow is fixed intercept
# blue line is fixed time slope
# olive arrow is fixed group intercept for selected group 0i1
# olive dashed line is fixed group slope for selected group 0i1
# light green arrow is random intercept for one chosen subject
# light green dashed line is random slope over time for selected subject 

# prediction
predict(mm, s.dd)
##     0i1_3     0i1_3     0i1_3     0i1_3 
## 0.9206766 0.8908139 0.8609512 0.8310885 
## attr(,"label")
## [1] "Predicted values"
m.c2 <- mm
head(m.c2$coefficients)
## $fixed
## (Intercept)       ntime      grp0i1      grpecg      grpei1 
##  0.64030424 -0.02293525  0.29267154  0.01437014 -0.18212887 
## 
## $random
## $random$subj
##         (Intercept)        ntime
## 0cg_1   0.087069296 -0.031960138
## 0cg_10 -0.178430257  0.065749749
## 0cg_2   0.139760242 -0.052185384
## 0cg_3  -0.039174003  0.014740702
## 0cg_4   0.184158339 -0.068641560
## 0cg_5  -0.135869614  0.050526052
## 0cg_6  -0.155384849  0.058111456
## 0cg_7  -0.083752090  0.031252333
## 0cg_8   0.062165340 -0.022874156
## 0cg_9  -0.132149928  0.048811019
## 0i1_1   0.164203286 -0.060330831
## 0i1_10  0.194540818 -0.072745625
## 0i1_2   0.169782825 -0.063080830
## 0i1_3   0.017563490 -0.006927429
## 0i1_4   0.231075965 -0.086265117
## 0i1_5   0.210267959 -0.077573057
## 0i1_6   0.073838382 -0.027514626
## 0i1_7   0.143337018 -0.052641404
## 0i1_8   0.064495241 -0.024322408
## 0i1_9   0.296093794 -0.110430059
## ecg_1   0.012971439 -0.004588042
## ecg_10 -0.218842048  0.081549808
## ecg_2  -0.145530198  0.052916802
## ecg_3  -0.156428091  0.057771886
## ecg_4   0.047983153 -0.018331512
## ecg_5   0.043186870 -0.015814099
## ecg_6  -0.027774799  0.010517919
## ecg_7  -0.006006634  0.002750455
## ecg_8  -0.073474899  0.027085498
## ecg_9  -0.061505894  0.023759888
## ei1_1  -0.111445761  0.040709472
## ei1_10  0.016203475 -0.005502966
## ei1_2  -0.046821869  0.017604469
## ei1_3   0.093613518 -0.035513116
## ei1_4  -0.052221657  0.020165921
## ei1_5  -0.016071676  0.006511728
## ei1_6  -0.077487469  0.028810201
## ei1_7  -0.188762358  0.070151355
## ei1_8  -0.227055385  0.083908839
## ei1_9  -0.118120971  0.043836807

The two graphs above illustrate how the prediction for a specific observation is composed. The first graph is based on a model with a fixed effect of time and a random effect of time. The individual regression line (light green, dashed) is therefore a “correction” of the dark blue regression line, which is constructed from the fixed parameters, by the random parameters. You can clearly see that it is adapted to the highlighted individual data points of the selected observation (large light green points).

The second graph shows an extension. Here the regression line for a single observation is composed of the fixed effect of time and the fixed effect of grp, which yields a group-specific line (olive green). The selected observation’s regression line is then built from these two fixed effects plus its own random coefficients. The final individual line is the same as before, just constructed differently.

The ggplot() code is a bit lengthy in order to show that all the visualizations can be built from the coefficients provided by our model mm.
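As noted at the beginning of this unit, instead of indexing mm$coefficients directly we can use the accessor functions of the nlme package (assuming mm was fitted with lme()):

```r
library(nlme)

fixef(mm)       # fixed effects, same as mm$coefficients$fixed
ranef(mm)       # random effects, same as mm$coefficients$random$subj
head(coef(mm))  # fixed and random effects combined, per subject
```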

Check

  • Continuous reaction variable (interval scaled) and …
    • discrete explanatory variables (nominal scaled): parameters represent means in (sub-)groups and the differences between them.
    • more than one nominal scaled explanatory variable: additional parameters describe differences in the means of one explanatory variable that depend on the level of another explanatory variable.
    • an interval scaled explanatory variable: parameters reflect relationships, i.e. the estimation of values in one variable depends on the value (level) of the other variable; when this dependency itself changes with another explanatory variable, this is called an interaction.
    • in linear relationships (a line), parameters reflect the value of the reaction variable at 0 (intercept) and the inclination (slope).
    • multiple continuous explanatory variables are combined into a single explanatory variable by a linear combination; the parameters reflect the weights of the explanatory variables in this combination.
  • Dichotomous reaction variable (nominal scaled) and …
    • continuous explanatory variable(s): parameters describe a logistic function that estimates the probability of having one of the two possible values (i.e. the probability of group membership).
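A minimal sketch of the last point, using hypothetical simulated data (the variable names x and y are ours, not from the examples above):

```r
# dichotomous reaction variable y, continuous explanatory variable x:
# glm() with family = binomial estimates the parameters of a logistic function
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(-0.5 + 1.2 * x))  # simulated group membership

m <- glm(y ~ x, family = binomial)
coefficients(m)  # intercept and slope, on the logit scale

# estimated probability of y == 1 for a new observation with x = 0.3,
# rebuilt from the parameters
plogis(coefficients(m)['(Intercept)'] + coefficients(m)['x'] * 0.3)
```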

Version: 06 July 2021, 14:39