Bei der logistischen Regression ist die Reaktionsvariable binär. Über intervallskalierte oder kategoriale Erklärvariablen wird versucht, das Eintreffen bzw. Nicht-Eintreffen eines Zielzustandes zu erklären.
Beim generalisierten linearen Modell kommt eine Link-Funktion für die Reaktionsvariable mit ins Spiel:
\[ f(\pi) = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_qx_q \]
glm(formula, family=familytype(link=linkfunction), data=)
Family | Default Link Function |
---|---|
binomial | (link = “logit”) |
gaussian | (link = “identity”) |
Gamma | (link = “inverse”) |
inverse.gaussian | (link = “1/mu^2”) |
poisson | (link = “log”) |
quasi | (link = “identity”, variance = “constant”) |
quasibinomial | (link = “logit”) |
quasipoisson | (link = “log”) |
Allgemeine Lineare Modelle (ALM) können ausgedrückt werden als GLM mit der Link-Funktion “identity”, also einer Funktion, die schlicht den Eingabewert zurückgibt.
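Ein kurzer Sketch mit simulierten Daten (Variablen x.sim und y.sim nur zur Illustration): lm() und glm() mit family=gaussian(link="identity") liefern dieselben Koeffizienten.
# a linear model is a glm with gaussian family and identity link (simulated data)
set.seed(42)
x.sim <- rnorm(50)
y.sim <- 2 + 3 * x.sim + rnorm(50)
coef(lm(y.sim ~ x.sim))
coef(glm(y.sim ~ x.sim, family = gaussian(link = "identity")))   # same estimates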
Die Reaktionsvariable ist binär (dichotomous response variable). Die erklärenden Variablen sind stetig oder binär (explanatory variables dichotomous or continuous).
Als Reaktionsvariable wird nicht die erhobene binäre Variable direkt geschätzt, sondern der Logit der Auftretenswahrscheinlichkeit bzw. die logarithmierten odds der Variablen. Die Wahrscheinlichkeit, der einen oder der anderen Ausprägung anzugehören, kann dann berechnet werden. Die Logit-Parameter stehen im Exponenten einer e-Funktion. Im Exponenten finden wir dann eine “ganz normale” lineare Funktion. Die Logit-Funktion (S-Form) wird benutzt, um die zu schätzende Größe (Auftretenswahrscheinlichkeit) in bestimmten Grenzen zu halten (0-1). Die für diesen Zweck wichtigen Parameter können über eine lineare Gleichung beschrieben werden, trotz der nicht-linearen Form.
vgl. [https://md.psych.bio.uni-goettingen.de/mv/unit/glm_logistic/log-reg.pdf]
odds: \[ odds = \frac{p}{1 - p} \]
In der logistischen Regression wird nicht die Wahrscheinlichkeit des Auftretens einer 1 (Zutreffen) einer Beobachtung direkt berechnet/geschätzt, sondern der Logit davon.
Es findet also eine Link-Funktion Anwendung.
\[ L = logit(p) = log(odds) = log(\frac{p}{1 - p}) \]
Umkehrung [Achtung auf den Vorzeichenwechsel im Exponenten]:
\[ p = \frac{e^L}{1 + e^L} = \frac{1}{1 + e^{-L}} \]
Die Umformung erfolgt durch Erweitern des Bruchs mit \(e^{-L}\), denn \(e^{-L} \cdot e^{L} = 1\).
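Ein kurzer numerischer Check der Hin- und Rücktransformation (qlogis() und plogis() sind Logit- bzw. logistische Funktion in Base R):
# logit and inverse logit undo each other
p <- c(0.2, 0.5, 0.8)
L <- log(p / (1 - p))                 # logit by hand
all.equal(L, qlogis(p))               # TRUE
all.equal(p, 1 / (1 + exp(-L)))       # back-transformation
all.equal(p, exp(L) / (1 + exp(L)))   # equivalent form
all.equal(p, plogis(L))               # TRUE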
bei einem Vorhersagewert: \[ logit(p) = log(\frac{p}{1 - p}) = \beta_0 + \beta_1x + \epsilon \]
ausgedrückt als Wahrscheinlichkeitsverhältnis (odds) der Wahrscheinlichkeit p zur Gegenwahrscheinlichkeit 1-p
\[ odds = \frac{p}{1 - p} = e^{\beta_0 + \beta_1x + \epsilon} = e^{\beta_0} \cdot e^{\beta_1x} \cdot e^{\epsilon} \]
bei q Vorhersagevariablen:
\[ logit(\pi) = log(\frac{\pi}{1 - \pi}) = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_qx_q = \beta_0 +\sum_{k=1}^{q} {\beta_kx_k} \]
Die geschätzten Parameter lassen sich mit einem Wald-Test darauf testen, ob sie einen signifikanten Einfluss auf das Kriterium ausüben. Die Wald-Statistik berechnet sich, indem man eine Parameterschätzung durch ihren Standardfehler teilt:
\[ W = \frac{b}{se(b)} \sim N(0,1) \]
Der W-Wert ist standardnormalverteilt, weshalb er mit einem einfachen z-Test getestet werden kann.
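Ein minimaler Sketch mit simulierten Daten (Variablen x.sim, y.sim und m.sim nur zur Illustration), der die Wald-Statistik von Hand berechnet und mit den z-Werten aus summary() vergleicht:
# Wald statistic by hand: estimate divided by its standard error
set.seed(1)
x.sim <- rnorm(100)
y.sim <- rbinom(100, 1, plogis(-0.5 + 0.8 * x.sim))
m.sim <- glm(y.sim ~ x.sim, family = binomial())
b  <- coef(summary(m.sim))[, "Estimate"]
se <- coef(summary(m.sim))[, "Std. Error"]
W  <- b / se                 # approximately N(0,1)
p.w <- 2 * pnorm(-abs(W))    # two-sided z-test
cbind(W, p.w, summary(m.sim)$coefficients[, c("z value", "Pr(>|z|)")])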
In der logistischen Regression werden Logits geschätzt. Der Logit ist der Logarithmus eines Odds. Ein Odd ist das Verhältnis des Eintreffens eines Ereignisses zum Nicht-Eintreffen des Ereignisses, also eine andere Art, Wahrscheinlichkeiten auszudrücken. Bei einem Odd (Eintreffen/Nicht-Eintreffen) von 1:4 oder von 0.25 ist die Wahrscheinlichkeit des Eintreffens = 1/5 oder 0.2.
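Zur Illustration die Umrechnung von odds in Wahrscheinlichkeiten (die Hilfsfunktion odds_to_p ist hier nur für das Beispiel definiert):
# convert odds back to probabilities: p = odds / (1 + odds)
odds_to_p <- function(odds) odds / (1 + odds)
odds_to_p(0.25)   # odds of 1:4 -> p = 0.2
odds_to_p(1)      # odds of 1:1 -> p = 0.5
odds_to_p(99)     # odds of 99:1 -> p = 0.99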
exp(x) ist eine andere Schreibweise für \(e^x\). log(exp(x)) ist x; log (Logarithmus naturalis) ist die Umkehrfunktion zu exp.
log(0)
## [1] -Inf
log(1)
## [1] 0
# prob of event 1 is high: 0.99
pe <- .99
# odds
pe / (1 - pe)
## [1] 99
# log odds
log(pe/(1-pe))
## [1] 4.59512
# library(psych) knows this as command logit
require(psych)
## Lade nötiges Paket: psych
logit(pe)
## [1] 4.59512
# fifty fifty
pe <- .5
cat(paste('odds', pe / (1-pe)))
## odds 1
cat(paste('logit or log(odds)', logit(pe)))
## logit or log(odds) 0
# low
pe <- .1
cat(paste('odds', pe / (1-pe)))
## odds 0.111111111111111
cat(paste('logit or log(odds)', logit(pe)))
## logit or log(odds) -2.19722457733622
# very low
pe <- .000000001
cat(paste('odds', pe / (1-pe)))
## odds 1.000000001e-09
cat(paste('logit or log(odds)', logit(pe)))
## logit or log(odds) -20.7232658359464
# the known linear model is put to the power of base e in logistic regression
odds <- function(x){
return( x / (1-x))
}
p | 1-p | log(p) | odds(p) = p/(1-p) | logit(p) = log(odds(p)) |
---|---|---|---|---|
.99 | .01 | -0.01 | 99 | 4.59 |
.90 | .10 | -0.10 | 9 | 2.20 |
.70 | .30 | -0.36 | 2.33 | 0.85 |
.50 | .50 | -0.69 | 1 | 0 |
.30 | .70 | -1.20 | .43 | -0.85 |
.10 | .90 | -2.30 | .11 | -2.20 |
.01 | .99 | -4.60 | .01 | -4.60 |
Die Logit-Werte werden negativ, wenn die Wahrscheinlichkeit eines Ereignisses unter .5 fällt.
Die logistische Regression schätzt die Logit-Werte mit einer linearen Funktion. Für jede Beobachtung können wir über predict() einen persönlichen Logit-Wert berechnen. Wir können die Logit-Werte in Wahrscheinlichkeiten umrechnen, der Zielgruppe anzugehören, also der mit 1 als Ausprägung. Dies geht mit p = 1/(1 + exp(-logitWert)) bzw. mit p = exp(logitWert)/(1 + exp(logitWert)).
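Ein minimaler Sketch mit erfundenen Daten (Variablen d.sim und m.sim nur zur Illustration), der beide Umrechnungswege zeigt:
# per-observation logits via predict() and their conversion to probabilities
set.seed(7)
d.sim <- data.frame(x = rnorm(30))
d.sim$y <- rbinom(30, 1, plogis(0.3 + 1.2 * d.sim$x))
m.sim <- glm(y ~ x, family = binomial(), data = d.sim)
lgt <- predict(m.sim, type = "link")          # individual logit values
p1 <- 1 / (1 + exp(-lgt))
p2 <- exp(lgt) / (1 + exp(lgt))
head(cbind(p1, p2, fitted = fitted(m.sim)))   # all three columns agree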
Quelle der Idee: tutorial
Kann das Entdecken im Visual-Attention-Test durch die Leistungen im Stroop-Test vorhergesagt werden?
Der “visual attention test”
oder: youtube
Zur besseren Veranschaulichung werden die Daten ein wenig frisiert. Die Beobachtungen werden verdoppelt und die Interferenz etwas verstärkt. (Das Vorgehen findet sich ganz unten unter dem Punkt ‘Daten generieren’.)
In his experiments, Stroop administered several variations of the same test for which three different kinds of stimuli were created. In the first one, names of colors appeared in black ink. In the second, names of colors appeared in a different ink than the color named. Finally in the third one, there were squares of a given color.
[Abbildung: stroop-mat]
Original figure 1 from experiment 2 of the article “Studies of interference in serial verbal reactions” (1935) by Stroop, J.R., Journal of Experimental Psychology, 18, 643-662; the original article that described the Stroop effect in English. The diagram shows that it takes less time to name the colors of squares (no conflict) than to name the ink color of a word when the word names a different color (conflict).
[Abbildung: stroop-fig]
Es sind viele Varianten im Einsatz.
Quelle: [https://www.sciencedirect.com/science/article/pii/S0887617703001215]
In Golden’s (1978) method, 45 s are given to read each page of color words (W) printed in black ink, color hues (C) printed as XXXX, and color hues printed as competing color words (CW) (e.g., ‘red’ printed in blue ink). W is the number of color words read in 45 s. C is the number of color hues named in 45 s, and CW is the number of color hues named (while ignoring the color words) in 45 s. Each of these tasks has its own normative values and table look-up, but as one can easily see the CW score will be dependent upon the relative strengths of word reading and of color naming.
Golden’s (1978) ipsative interference score is then based upon an assertion that the time to read a CW item is an additive function of the time to read a word plus the time to name a color. The addition of the time to read a word (45/W) and the time to name a color (45/C) gives the algebraically simplified formula of (W×C)/(W+C) for the number of predicted CW items completed in 45 s ( Golden, 1978). In using this function, the investigator thus makes a hidden assumption that the brain adds word reading to color naming processes to produce the results on the CW card.
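Der Rechenweg dahinter als Formel: Die Zeit pro CW-Item wird als Summe der Zeit für das Wortlesen (45/W) und der Zeit für das Farbbenennen (45/C) angenommen; daraus folgt für die vorhergesagte Zahl der CW-Items in 45 s:
\[ \widehat{CW} = \frac{45}{\frac{45}{W} + \frac{45}{C}} = \frac{W \cdot C}{W + C} \]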
Code | Erklärung
--- | ---
C | Anzahl genannter Farben (keine Worte)
W | Anzahl gelesener Farbworte in schwarz
CW | Anzahl genannter Farben von Farbworten mit anderer Bedeutung als Druckfarbe (Interferenz)
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
head(dd)
## nr seen W C CW d.cw.c q.cw.c
## 1 1 0 126 86 64 -22 0.7441860
## 2 2 0 118 76 48 -28 0.6315789
## 3 3 0 61 66 38 -28 0.5757576
## 4 4 0 69 48 32 -16 0.6666667
## 5 5 0 57 59 42 -17 0.7118644
## 6 6 0 78 64 53 -11 0.8281250
### a correlation matrix
cor(dd[2:5])
## seen W C CW
## seen 1.00000000 -0.04511388 0.05742984 0.2107604
## W -0.04511388 1.00000000 0.43757398 0.3288920
## C 0.05742984 0.43757398 1.00000000 0.6327925
## CW 0.21076037 0.32889199 0.63279254 1.0000000
Die Stroop-Leistungen sind moderat positiv korreliert, aber keine scheint bedeutsam mit der Erkennens-Leistung korreliert zu sein. Insgesamt scheint es hier nicht sonderlich viel zu sehen zu geben.
Start mit der CW (Leistung in der Interferenz-Bedingung) als Prädiktor. Die normale lineare Regression ist nur bedingt nützlich. Sie ergibt u. U. vorhergesagte Werte außerhalb des möglichen Bereichs.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# predict seen by CW, classical lm
m.lm <- lm(seen ~ CW, data=dd)
summary(m.lm)
##
## Call:
## lm(formula = seen ~ CW, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5924 -0.3966 -0.2716 0.5707 0.7883
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.103663 0.237652 -0.436 0.6637
## CW 0.010876 0.005148 2.112 0.0372 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4812 on 96 degrees of freedom
## Multiple R-squared: 0.04442, Adjusted R-squared: 0.03447
## F-statistic: 4.463 on 1 and 96 DF, p-value: 0.03724
m.lm$fitted.values[order(m.lm$fitted.values)]
## 67 18 37 74 86 26 75 25
## 0.1464852 0.2008652 0.2117412 0.2117412 0.2117412 0.2226173 0.2226173 0.2334933
## 4 53 58 9 56 7 94 42
## 0.2443693 0.2443693 0.2443693 0.2552453 0.2661213 0.2769973 0.2769973 0.2878733
## 45 91 19 33 3 64 82 89
## 0.2878733 0.2878733 0.2878733 0.2987494 0.3096254 0.3096254 0.3096254 0.3096254
## 40 10 29 59 78 68 8 16
## 0.3205014 0.3205014 0.3205014 0.3205014 0.3205014 0.3313774 0.3422534 0.3422534
## 49 98 5 52 54 62 63 11
## 0.3422534 0.3422534 0.3531294 0.3531294 0.3531294 0.3531294 0.3531294 0.3640054
## 41 57 90 15 12 20 27 17
## 0.3640054 0.3640054 0.3640054 0.3748815 0.3857575 0.3857575 0.3857575 0.3966335
## 34 61 65 83 35 84 87 55
## 0.3966335 0.3966335 0.3966335 0.3966335 0.4075095 0.4075095 0.4075095 0.4075095
## 71 2 13 21 31 80 85 14
## 0.4075095 0.4183855 0.4183855 0.4183855 0.4183855 0.4183855 0.4183855 0.4292615
## 60 70 36 38 46 95 23 51
## 0.4292615 0.4292615 0.4292615 0.4292615 0.4292615 0.4292615 0.4401375 0.4401375
## 76 79 30 66 28 69 77 6
## 0.4401375 0.4401375 0.4510136 0.4510136 0.4618896 0.4618896 0.4618896 0.4727656
## 22 48 81 97 73 32 24 44
## 0.4727656 0.4836416 0.4836416 0.4836416 0.4945176 0.4945176 0.5053936 0.5053936
## 72 93 43 92 1 50 96 47
## 0.5053936 0.5053936 0.5706497 0.5815257 0.5924017 0.5924017 0.6032778 0.6141538
## 88 39
## 0.6141538 0.6250298
# extreme values lead to impossible probabilities (predicted values)
m.lm$coefficients
## (Intercept) CW
## -0.10366318 0.01087601
# very low but possible CW: we divide minimum CW by 10
cat(m.lm$coefficients[1] + 0.1 * min(dd$CW) * m.lm$coefficients[2])
## -0.07864835
# very high but possible CW: we multiply maximum CW by 2
cat(m.lm$coefficients[1] + 2 * max(dd$CW) * m.lm$coefficients[2])
## 1.353723
Eine Grafik dazu.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = lm(seen ~ CW, data=dd)
# base plot
plot(dd$seen ~ dd$CW, ylim=c(-.5,1.5), xlim=c(-5,110))
abline(m.1$coefficients)
require("ggplot2")
## Lade nötiges Paket: ggplot2
##
## Attache Paket: 'ggplot2'
## Die folgenden Objekte sind maskiert von 'package:psych':
##
## %+%, alpha
# check the fitted values (predicted seen probabilities)
pplot <- ggplot(data=dd, aes(x=CW, y=seen))
pplot +
geom_point() +
# stat_smooth(method = "lm", se=TRUE) +
scale_x_continuous(limits = c(1, 140)) +
geom_abline(intercept = m.1$coefficients[1], slope = m.1$coefficients[2])
# predictions of values below 0 and above 1 are possible
Es ist klar erkennbar, dass die Regressionsgerade Vorhersagen über 1 und unter 0 erlaubt, die nicht möglich sind.
Die Frage, um wieviel besser ein Modell die Reaktionsvariable erklärt als ein Vergleichsmodell, kann mit der Modell-Chi-Quadrat-Statistik angegangen werden. Ausgedrückt wird die Differenz zwischen dem einen und dem anderen Modell: Von der Devianz (-2 Log-Likelihood) des kleineren Modells bzw. des Null-Modells wird die Devianz des größeren Modells abgezogen; das ergibt den Modell-Vergleichs-Chi^2-Wert. Aus Everitt (2010), S. 119: The lack of fit of a logistic regression model can be measured by a term known as the deviance, which is essentially the ratio of the likelihoods of the model of interest to the saturated model that fits the data perfectly.
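Als Formel, mit \(D_0\) für die Devianz des kleineren bzw. Null-Modells und \(D_1\) für die Devianz des größeren Modells:
\[ \chi^2_{Modell} = D_0 - D_1 = (-2LL_0) - (-2LL_1), \quad df = df_0 - df_1 \]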
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
Im Summary finden sich u. a. die Koeffizienten mit ihren z-Tests, die Null- und die Residual-Deviance sowie der AIC.
anova()
Das Berechnen der Log-Likelihood-Ratio und das explizite Ermitteln und Prüfen des Chi^2-Wertes kann auch durch den Befehl anova() erledigt werden. Wie beim Vergleich von lm()-Ergebnisobjekten können zwei genestete glm-Ergebnisobjekte mit dem Befehl anova() verglichen werden. Wird nur ein Modell übergeben, wird das Null-Modell implizit als Vergleichsmodell angenommen. Die Chi^2-Prüfgröße wird über test="Chisq" angefordert.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# fit null model and model CW
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# check model explicitly using anova()
anova(m.0, m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ 1
## Model 2: seen ~ CW
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 97 130.88
## 2 96 126.45 1 4.4226 0.03547 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
m.1$deviance
## [1] 126.4531
m.1$null.deviance
## [1] 130.8757
anova(m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.03547 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Die Deviance eines Modells ist die log(likelihood) des Modells multipliziert mit -2. Die Multiplikation mit -2 ist eine Konvention. Die Deviances tauchen im Summary als “Null deviance” bzw. “Residual deviance” auf. Der Befehl logLik(), der als Parameter ein glm-Ergebnisobjekt erwartet, gibt uns die log(likelihood) des Modells zurück.
Unterschiede in der Deviance können benutzt werden, um genestete Modelle gegeneinander zu testen. Dies erreichen wir, indem wir die Deviances der zu vergleichenden Modelle voneinander abziehen (Chi^2-Wert) und die Differenz der jeweiligen Modell-Freiheitsgrade benutzen, um in der Chi^2-Verteilung den zugehörigen p-Wert nachzuschlagen (Funktion: pchisq()).
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# we can see the deviances in the model summary
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
# the likelihood of a model is a very small value
exp(logLik(m.1))
## 'log Lik.' 3.475788e-28 (df=2)
# so we usually work with the log of it
logLik(m.1)
## 'log Lik.' -63.22656 (df=2)
# or more common with -2*log likelihood or deviances
# the deviances are the -2LL
logLik(m.0)
## 'log Lik.' -65.43787 (df=1)
-2*logLik(m.0)
## 'log Lik.' 130.8757 (df=1)
logLik(m.1)
## 'log Lik.' -63.22656 (df=2)
-2*logLik(m.1)
## 'log Lik.' 126.4531 (df=2)
# deviances are properties of the model result object
m.0$deviance
## [1] 130.8757
m.1$deviance
## [1] 126.4531
m.1$null.deviance
## [1] 130.8757
m.1.chi <- m.1$null.deviance - m.1$deviance
m.1.chi
## [1] 4.422608
# degrees of freedom are attributes of model-object
m.1.chidf <- m.1$df.null - m.1$df.residual
m.1.chidf
## [1] 1
# get the probability of this chisquare
chisq.prob <- 1 - pchisq(m.1.chi, m.1.chidf)
chisq.prob
## [1] 0.0354658
# anova() does this all for us
anova(m.0, m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ 1
## Model 2: seen ~ CW
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 97 130.88
## 2 96 126.45 1 4.4226 0.03547 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Die z-Statistik kann ansatzweise interpretiert werden wie der t-Wert im lm(). Allerdings hat sie vor allem bei großen b-Koeffizienten eine Tendenz zum Fehler Typ II: Potenziell bedeutsame erklärende Variablen werden dann nicht identifiziert. Die z-Statistik und ihre Prüfung finden sich im Summary des Modells.
Wald-Statistik: (z-Statistik)^2
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
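Die z-Werte lassen sich auch direkt aus der Koeffizienten-Tabelle des Summary ziehen; ihr Quadrat ist die Wald-Statistik. Ein kurzer Sketch:
# z values and Wald statistics from the coefficient table of m.1
z <- summary(m.1)$coefficients[, "z value"]
z     # about -2.41 (Intercept) and 2.04 (CW), see the summary above
z^2   # Wald statistic as squared z value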
Vorschläge für Analoga zum R^2 der linearen Regression in der logistischen Regression sind u. a. die Pseudo-R^2 nach Hosmer & Lemeshow, Cox & Snell und Nagelkerke:
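Die Formeln, wie sie auch in der Funktion unten umgesetzt sind (vgl. Field, 2012), mit \(D_M\) für die Devianz des Modells, \(D_0\) für die Null-Devianz und \(n\) für die Zahl der Beobachtungen:
\[ R^2_{HL} = \frac{D_0 - D_M}{D_0} \qquad R^2_{CS} = 1 - e^{-(D_0 - D_M)/n} \qquad R^2_{N} = \frac{R^2_{CS}}{1 - e^{-D_0/n}} \]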
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# define a function to report R^2
# cf. Field (2012, p. 334)
pseudoR2 <- function(LogModel){
dev <- LogModel$deviance
nullDev <- LogModel$null.deviance
modelN <- length(LogModel$fitted.values)
R.l <- 1 - dev / nullDev
R.cs <- 1 - exp( -(nullDev - dev) / modelN)
R.n <- R.cs / ( 1 - ( exp (- (nullDev / modelN))))
cat("Pseudo R^2 for logistic regression\n")
cat("Hosmer and Lemeshow R^2", round(R.l, 3), "\n")
cat("Cox and Snell R^2 ", round(R.cs, 3), "\n")
cat("Nagelkerke R^2 ", round(R.n, 3), "\n")
}
# fit the model
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# get the parameter tests
pseudoR2(m.1)
## Pseudo R^2 for logistic regression
## Hosmer and Lemeshow R^2 0.034
## Cox and Snell R^2 0.044
## Nagelkerke R^2 0.06
# package DescTools offers PseudoR2, take care of capital P
require(DescTools)
## Lade nötiges Paket: DescTools
##
## Attache Paket: 'DescTools'
## Die folgenden Objekte sind maskiert von 'package:psych':
##
## AUC, ICC, SD
DescTools::PseudoR2(m.1)
## McFadden
## 0.03379242
DescTools::PseudoR2(m.1, which="Nagelkerke")
## Nagelkerke
## 0.05987465
Der Koeffizient für CW, m.1$coefficients[2], kann ähnlich interpretiert werden wie im lm(). Er drückt aus, um wieviel sich der Logit der Reaktionsvariablen ändert, wenn sich die erklärende Variable um eine Einheit ändert. Der Logit der Reaktionsvariablen ist der natürliche Logarithmus der odds, dass die Reaktionsvariable 1 wird.
Einsetzen der errechneten Koeffizienten in die Formel der Auftretenswahrscheinlichkeit.
\[ p(y_i) = \frac{1}{1 + e^{-(b_0 + b_1x_{1i})}} \]
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# coefficients
m.1$coefficients
## (Intercept) CW
## -2.62100185 0.04744238
# predict probability to see of first observation
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * dd$CW[1]))))
## 0.6023605
cat(dd$seen[1], dd$CW[1], m.1$fitted.values[1])
## 0 64 0.6023605
# very small value of CW
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 0.1 * min(dd$CW)))))
## 0.07502923
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 0.001 * min(dd$CW)))))
## 0.06786794
# very high value of CW
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 2 * max(dd$CW)))))
## 0.9767125
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 5 * max(dd$CW)))))
## 0.9999983
# check the fitted values (predicted probabilities to see the gorilla)
m.1$fitted.values[order(m.1$fitted.values)]
## 67 18 37 74 86 26 75 25
## 0.1780190 0.2154105 0.2235369 0.2235369 0.2235369 0.2318792 0.2318792 0.2404365
## 4 53 58 9 56 7 94 19
## 0.2492071 0.2492071 0.2492071 0.2581889 0.2673791 0.2767744 0.2767744 0.2863708
## 42 45 91 33 3 64 82 89
## 0.2863708 0.2863708 0.2863708 0.2961636 0.3061477 0.3061477 0.3061477 0.3061477
## 10 29 40 59 78 68 8 16
## 0.3163171 0.3163171 0.3163171 0.3163171 0.3163171 0.3266652 0.3371850 0.3371850
## 49 98 5 52 54 62 63 11
## 0.3371850 0.3371850 0.3478684 0.3478684 0.3478684 0.3478684 0.3478684 0.3587072
## 41 57 90 15 12 20 27 17
## 0.3587072 0.3587072 0.3587072 0.3696923 0.3808140 0.3808140 0.3808140 0.3920621
## 34 61 65 83 35 55 71 84
## 0.3920621 0.3920621 0.3920621 0.3920621 0.4034260 0.4034260 0.4034260 0.4034260
## 87 2 13 21 31 80 85 14
## 0.4034260 0.4148945 0.4148945 0.4148945 0.4148945 0.4148945 0.4148945 0.4264560
## 36 38 46 60 70 95 23 51
## 0.4264560 0.4264560 0.4264560 0.4264560 0.4264560 0.4264560 0.4380985 0.4380985
## 76 79 30 66 28 69 77 6
## 0.4380985 0.4380985 0.4498095 0.4498095 0.4615764 0.4615764 0.4615764 0.4733862
## 22 48 81 97 32 73 24 44
## 0.4733862 0.4852259 0.4852259 0.4852259 0.4970823 0.4970823 0.5089419 0.5089419
## 72 93 43 92 1 50 96 47
## 0.5089419 0.5089419 0.5794280 0.5909437 0.6023605 0.6023605 0.6136670 0.6248522
## 88 39
## 0.6248522 0.6359057
# change in odds, calculated on base of CW = 50 and 51
p.50 <- 1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 50)))
p.51 <- 1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 51)))
# odds.50 is p.50 / (1-p.50)
# odds.51 is p.51 / (1-p.51)
# diff.odds is odds.51/odds.50
cat((p.51 / (1-p.51))/(p.50/(1-p.50)))
## 1.048586
cat(m.1$coefficients[2])
## 0.04744238
cat(exp(m.1$coefficients[2]))
## 1.048586
# log() as inverse of exp()
log(exp(m.1$coefficients[2]))
## CW
## 0.04744238
# we get
head(m.1$fitted)
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
head(m.1$fitted.values)
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
# we can also use predict() and get the probabilities to see the gorilla, but we have to set type=response
head(predict(m.1, type="response"))
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
# default is type "link" and returns the log(odds)
head(predict(m.1, type="link"))
## 1 2 3 4 5 6
## 0.4153103 -0.3437677 -0.8181915 -1.1028458 -0.6284220 -0.1065558
# calculate p for the first observations using the log(odds)
head(1/(1 + exp(-predict(m.1, type="link"))))
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
Dargestellt wird zunächst nur der eigentliche Wertebereich der Ausgangsdaten. Die S-Form ist erkennbar.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# check the fitted values (predicted seen probabilities)
dd.plot <- data.frame(cw.o = dd$CW[order(dd$CW)],
cw.p = m.1$fitted.values[order(m.1$fitted.values)])
ggplot(data=dd.plot, aes(x=cw.o, y=cw.p)) + geom_line() + geom_point()
Der angepasste Verlauf über einen größeren Wertebereich:
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * dd$CW[1]))))
## 0.6023605
dd.plot <- data.frame(
a.o = -100:200,
a.p = 1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * -100:200)))
)
# check the fitted values (predicted seen probabilities)
dd.plot2 <- data.frame(cw.o = dd$CW[order(dd$CW)],
cw.p = m.1$fitted.values[order(m.1$fitted.values)])
ggplot(data=dd.plot, aes(x=a.o, y=a.p)) +
geom_line() +
geom_point(data=dd.plot2, aes(x=cw.o, y=cw.p))
# including the real answers and a cutoff line at p=0.5
ggplot(data=dd.plot[dd.plot$a.o > 0 & dd.plot$a.o < 100,], aes(x=a.o, y=a.p)) +
geom_line() +
geom_point(data=dd.plot2, aes(x=cw.o, y=cw.p)) +
geom_point(data=dd, aes(x=CW, y=seen, color=seen)) +
geom_vline(xintercept = 55, color="red")
Der Befehl table() gibt die Möglichkeit, als zweites Argument eine Bedingung mit anzugeben, nach der die Tabelle in zutreffend (‘TRUE’) und nicht zutreffend (‘FALSE’) aufgeteilt wird.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
cutoff = 0.5
table(m.1$y, fitted(m.1) > cutoff)
##
## FALSE TRUE
## 0 56 4
## 1 30 8
Bei 56 Beobachtungen, die den Gorilla nicht sehen, liegt der vorhergesagte Wert unter dem Cutoff, der Vergleich Wert > Cutoff ist also FALSE. Dies wäre eine korrekte Vorhersage des Nicht-Entdeckens. Für 4 Personen sagen wir das Entdecken vorher, weil der Vorhersagewert über dem Cutoff liegt; diese Vorhersage ist aber falsch. Bei 8 Beobachtungen, die den Gorilla gesehen haben, liegt der Vorhersagewert über dem Cutoff; für diese Personen sagen wir das Entdecken korrekt vorher. Bei 30 Beobachtungen, die den Gorilla gesehen haben, liegen wir dagegen mit der Vorhersage des Nicht-Entdeckens falsch.
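Ein kleiner Sketch zur Trefferquote (Anteil korrekt klassifizierter Beobachtungen) aus dieser Tabelle:
# hit rate: proportion of correctly classified observations
ctab <- table(m.1$y, fitted(m.1) > cutoff)
sum(diag(ctab)) / sum(ctab)   # (56 + 8) / 98, about 0.65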
Der oben erläuterte Grundansatz, logit-Werte durch ein lineares Modell vorherzusagen, funktioniert mit allen anderen Möglichkeiten, die das lineare Modell ansonsten auch bietet.
Alle Basis-Leistungsmaße (CW, W, C) in das Modell aufnehmen.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# explicit null model
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
require(DescTools)
PseudoR2(m.1)
## McFadden
## 0.03379242
# predict seen by CW plus the raw measures W and C
m.2 = glm(seen ~ CW + W + C, family=binomial(), data=dd)
summary(m.2)
##
## Call:
## glm(formula = seen ~ CW + W + C, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3834 -0.9807 -0.7631 1.1841 1.7874
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.38347 1.39513 -0.992 0.3214
## CW 0.06983 0.03102 2.251 0.0244 *
## W -0.01322 0.01337 -0.989 0.3228
## C -0.01272 0.02121 -0.600 0.5485
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 124.44 on 94 degrees of freedom
## AIC: 132.44
##
## Number of Fisher Scoring iterations: 4
PseudoR2(m.2)
## McFadden
## 0.04916736
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
anova(m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.03547 *
## W 1 1.6457 95 124.81 0.19955
## C 1 0.3665 94 124.44 0.54491
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check improvement of expanding model1 to model2
anova(m.1, m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ CW
## Model 2: seen ~ CW + W + C
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 96 126.45
## 2 94 124.44 2 2.0122 0.3656
CW und die abgeleiteten Interferenz-Werte in das Modell aufnehmen: die Differenz von Interferenz- und Farb-Benennleistung (d.cw.c = CW - C) sowie der Quotient aus beiden (q.cw.c = CW / C).
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# explicit null model
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW and all derived measures of interference
m.2 = glm(seen ~ CW + d.cw.c + q.cw.c, family=binomial(), data=dd)
summary(m.2)
##
## Call:
## glm(formula = seen ~ CW + d.cw.c + q.cw.c, family = binomial(),
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4681 -0.9866 -0.8022 1.2512 1.7676
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.72871 7.34360 0.235 0.814
## CW 0.07779 0.06436 1.209 0.227
## d.cw.c 0.06721 0.09462 0.710 0.478
## q.cw.c -6.13666 11.83685 -0.518 0.604
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 125.18 on 94 degrees of freedom
## AIC: 133.18
##
## Number of Fisher Scoring iterations: 4
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
anova(m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.03547 *
## d.cw.c 1 0.9978 95 125.45 0.31785
## q.cw.c 1 0.2762 94 125.18 0.59923
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check improvement of expanding model1 to model2
anova(m.1, m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ CW
## Model 2: seen ~ CW + d.cw.c + q.cw.c
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 96 126.45
## 2 94 125.18 2 1.2739 0.5289
require(DescTools)
DescTools::PseudoR2(m.1)
## McFadden
## 0.03379242
DescTools::PseudoR2(m.2)
## McFadden
## 0.04352635
Es können auch Interaktionsterme mit eingehen. Alle Rohmaße und die zweifachen Interaktionen in das Modell aufnehmen.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# explicit null model
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW, W, and C
m.1 = glm(seen ~ CW + W + C, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW + W + C, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3834 -0.9807 -0.7631 1.1841 1.7874
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.38347 1.39513 -0.992 0.3214
## CW 0.06983 0.03102 2.251 0.0244 *
## W -0.01322 0.01337 -0.989 0.3228
## C -0.01272 0.02121 -0.600 0.5485
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 124.44 on 94 degrees of freedom
## AIC: 132.44
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW, W, C and the two way interactions
m.2 = glm(seen ~ CW + W + C + CW:W + CW:C + W:C, family=binomial(), data=dd)
summary(m.2)
##
## Call:
## glm(formula = seen ~ CW + W + C + CW:W + CW:C + W:C, family = binomial(),
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7359 -0.9266 -0.6442 1.1180 1.9020
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.917e+01 9.893e+00 -1.938 0.0526 .
## CW -1.328e-01 2.388e-01 -0.556 0.5782
## W 2.443e-01 1.059e-01 2.307 0.0211 *
## C 2.645e-01 1.598e-01 1.655 0.0978 .
## CW:W 2.608e-04 2.114e-03 0.123 0.9018
## CW:C 2.720e-03 1.713e-03 1.588 0.1124
## W:C -3.981e-03 1.736e-03 -2.293 0.0218 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 115.47 on 91 degrees of freedom
## AIC: 129.47
##
## Number of Fisher Scoring iterations: 4
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
anova(m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.035466 *
## W 1 1.6457 95 124.81 0.199547
## C 1 0.3665 94 124.44 0.544908
## CW:W 1 1.5055 93 122.94 0.219825
## CW:C 1 0.8185 92 122.12 0.365624
## W:C 1 6.6449 91 115.47 0.009944 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check improvement of expanding model1 to model2
anova(m.1, m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ CW + W + C
## Model 2: seen ~ CW + W + C + CW:W + CW:C + W:C
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 94 124.44
## 2 91 115.47 3 8.9689 0.02971 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
require(DescTools)
DescTools::PseudoR2(m.1)
## McFadden
## 0.04916736
DescTools::PseudoR2(m.2)
## McFadden
## 0.1176969
stepwise() (wenn Rcmdr vorhanden ist) oder stepAIC() funktioniert auch bei logistischer Regression.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
require("MASS")
## Lade nötiges Paket: MASS
m.step <- glm(seen ~ W+C+CW, family = binomial, data = dd)
#stepwise(m.step, trace = FALSE)
#Rcmdr::stepwise(m.step, direction="backward/forward", criterion="AIC")
stepAIC(m.step, direction='both', scope = list(upper = ~ W + C + CW, lower = ~ 1))
## Start: AIC=132.44
## seen ~ W + C + CW
##
## Df Deviance AIC
## - C 1 124.81 130.81
## - W 1 125.45 131.46
## <none> 124.44 132.44
## - CW 1 129.94 135.94
##
## Step: AIC=130.81
## seen ~ W + CW
##
## Df Deviance AIC
## - W 1 126.45 130.45
## <none> 124.81 130.81
## + C 1 124.44 132.44
## - CW 1 130.68 134.68
##
## Step: AIC=130.45
## seen ~ CW
##
## Df Deviance AIC
## <none> 126.45 130.45
## + W 1 124.81 130.81
## + C 1 125.45 131.46
## - CW 1 130.88 132.88
##
## Call: glm(formula = seen ~ CW, family = binomial, data = dd)
##
## Coefficients:
## (Intercept) CW
## -2.62100 0.04744
##
## Degrees of Freedom: 97 Total (i.e. Null); 96 Residual
## Null Deviance: 130.9
## Residual Deviance: 126.5 AIC: 130.5
#detach("package:MASS", unload=TRUE)
Auch Interaktionsterme können mit auftauchen. Die kompakte model formula mit “*” funktioniert nicht. Die Interaktionsterme müssen explizit mit ‘+’ angehängt werden.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
require("MASS")
# run a model with two way interactions included
m.step <- glm(seen ~ W + C + CW + CW:W + CW:C + C:W, family = binomial, data = dd)
summary(m.step)
##
## Call:
## glm(formula = seen ~ W + C + CW + CW:W + CW:C + C:W, family = binomial,
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7359 -0.9266 -0.6442 1.1180 1.9020
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.917e+01 9.893e+00 -1.938 0.0526 .
## W 2.443e-01 1.059e-01 2.307 0.0211 *
## C 2.645e-01 1.598e-01 1.655 0.0978 .
## CW -1.328e-01 2.388e-01 -0.556 0.5782
## W:CW 2.608e-04 2.114e-03 0.123 0.9018
## C:CW 2.720e-03 1.713e-03 1.588 0.1124
## W:C -3.981e-03 1.736e-03 -2.293 0.0218 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 115.47 on 91 degrees of freedom
## AIC: 129.47
##
## Number of Fisher Scoring iterations: 4
#stepwise(m.step, trace = FALSE)
#stepwise(m.step, direction="backward/forward", criterion="AIC")
## Rcmdr::stepwise(m.step, direction="backward/forward", criterion="AIC")
stepAIC(m.step, direction='both', scope = list(upper = ~ W + C + CW + CW:W + CW:C + C:W, lower = ~ 1))
## Start: AIC=129.47
## seen ~ W + C + CW + CW:W + CW:C + C:W
##
## Df Deviance AIC
## - W:CW 1 115.49 127.49
## <none> 115.47 129.47
## - C:CW 1 118.09 130.09
## - W:C 1 122.12 134.12
##
## Step: AIC=127.49
## seen ~ W + C + CW + C:CW + W:C
##
## Df Deviance AIC
## <none> 115.49 127.49
## - C:CW 1 118.19 128.19
## + W:CW 1 115.47 129.47
## - W:C 1 124.34 134.34
##
## Call: glm(formula = seen ~ W + C + CW + C:CW + W:C, family = binomial,
## data = dd)
##
## Coefficients:
## (Intercept) W C CW C:CW W:C
## -19.548108 0.248176 0.253177 -0.107634 0.002739 -0.003872
##
## Degrees of Freedom: 97 Total (i.e. Null); 92 Residual
## Null Deviance: 130.9
## Residual Deviance: 115.5 AIC: 127.5
Im Original-Beispiel kamen nur halb so viele Probanden wie in den obigen Beispielen zum Einsatz.
# read data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att.txt")
# create inhibition score
dd['d.cw.c'] <- dd$CW - dd$C
dd['q.cw.c'] <- dd$CW / dd$C
# some descriptive impression
require(psych)
describe(dd)
## vars n mean sd median trimmed mad min max range skew
## nr 1 49 25.00 14.29 25.00 25.00 17.79 1.00 49.0 48.00 0.00
## seen 2 49 0.39 0.49 0.00 0.37 0.00 0.00 1.0 1.00 0.45
## W 3 49 99.82 18.89 100.00 99.98 17.79 57.00 156.0 99.00 0.05
## C 4 49 74.39 14.50 73.00 73.93 14.83 46.00 112.0 66.00 0.41
## CW 5 49 47.14 8.92 48.00 47.05 8.90 28.00 67.0 39.00 0.00
## d.cw.c 6 49 -27.24 11.10 -24.00 -26.44 8.90 -63.00 -5.0 58.00 -0.90
## q.cw.c 7 49 0.64 0.10 0.65 0.64 0.09 0.43 0.9 0.47 -0.15
## kurtosis se
## nr -1.27 2.04
## seen -1.84 0.07
## W 0.40 2.70
## C 0.00 2.07
## CW -0.24 1.27
## d.cw.c 0.71 1.59
## q.cw.c 0.15 0.01
# full model, only raw scores
m.a = glm(seen ~ W * C * CW, family=binomial(logit), data=dd)
summary(m.a)
##
## Call:
## glm(formula = seen ~ W * C * CW, family = binomial(logit), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8073 -0.9897 -0.5740 1.2368 1.7362
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.323e+02 8.037e+01 -1.646 0.0998 .
## W 1.316e+00 7.514e-01 1.751 0.0799 .
## C 2.129e+00 1.215e+00 1.753 0.0797 .
## CW 2.206e+00 1.659e+00 1.329 0.1837
## W:C -2.128e-02 1.140e-02 -1.866 0.0621 .
## W:CW -2.201e-02 1.530e-02 -1.439 0.1502
## C:CW -3.582e-02 2.413e-02 -1.485 0.1376
## W:C:CW 3.579e-04 2.225e-04 1.608 0.1078
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.438 on 48 degrees of freedom
## Residual deviance: 57.281 on 41 degrees of freedom
## AIC: 73.281
##
## Number of Fisher Scoring iterations: 5
# check full model (Null-Model is automatically generated)
anova(m.a, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 48 65.438
## W 1 0.0755 47 65.362 0.78351
## C 1 0.3099 46 65.052 0.57775
## CW 1 0.1061 45 64.946 0.74467
## W:C 1 2.3632 44 62.583 0.12423
## W:CW 1 0.5681 43 62.015 0.45103
## C:CW 1 1.4290 42 60.586 0.23193
## W:C:CW 1 3.3053 41 57.281 0.06906 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# full model, only CW and dependent scores
m.a = glm(seen ~ CW * d.cw.c * q.cw.c , family=binomial(logit), data=dd)
summary(m.a)
##
## Call:
## glm(formula = seen ~ CW * d.cw.c * q.cw.c, family = binomial(logit),
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7960 -0.8733 -0.7998 1.2038 1.5481
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -12.49993 35.24189 -0.355 0.723
## CW -2.00115 1.48911 -1.344 0.179
## d.cw.c -1.26358 1.15364 -1.095 0.273
## q.cw.c 23.07736 52.91598 0.436 0.663
## CW:d.cw.c 0.02908 0.02547 1.141 0.254
## CW:q.cw.c 1.73861 1.17523 1.479 0.139
## d.cw.c:q.cw.c NA NA NA NA
## CW:d.cw.c:q.cw.c -0.05223 0.04294 -1.216 0.224
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.438 on 48 degrees of freedom
## Residual deviance: 61.030 on 42 degrees of freedom
## AIC: 75.03
##
## Number of Fisher Scoring iterations: 4
# check full model (Null-Model is automatically generated)
anova(m.a, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 48 65.438
## CW 1 0.19491 47 65.243 0.6589
## d.cw.c 1 0.01587 46 65.227 0.8997
## q.cw.c 1 1.20412 45 64.023 0.2725
## CW:d.cw.c 1 0.65144 44 63.372 0.4196
## CW:q.cw.c 1 0.78125 43 62.590 0.3768
## d.cw.c:q.cw.c 0 0.00000 43 62.590
## CW:d.cw.c:q.cw.c 1 1.56062 42 61.030 0.2116
Die Originaldaten wurden etwas “frisiert”. Zu den Originaldaten wurden ‘passende’ Beobachtungen hinzugefügt, um die Power zu erhöhen. Für Interessierte hier das Vorgehen:
# double the observations with a little noise
dd.n <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att.txt")
dd.n$nr <- dd.n$nr + length(dd.n$nr)
dd.n$W <- round(dd.n$W + rnorm(length(dd.n$nr), 0, sd(dd.n$W)/10))
dd.n$C <- round(dd.n$C + rnorm(length(dd.n$nr), 0, sd(dd.n$C)/10))
dd.n$CW <- round(dd.n$CW + rnorm(length(dd.n$nr), 0, sd(dd.n$CW)/10))
# 'double' the observations
dd <- rbind(read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att.txt"), dd.n)
# give CW a little more predictive power
blind <- dd$nr[dd$seen == 0]
s.blind <- sample(blind, length(blind)/2)
# dd[s.blind,]
# dd[s.blind, c("nr", "CW", "seen")]
dd$CW[s.blind] <- round(dd$CW[s.blind] - sd(dd$CW) / 1.5)
# dd[s.blind, c("nr","CW", "seen")]
# create inhibition scores
dd['d.cw.c'] <- dd$CW - dd$C
dd['q.cw.c'] <- dd$CW / dd$C
# a descriptive impression
require(psych)
describe(dd)
## vars n mean sd median trimmed mad min max range skew
## nr 1 98 49.50 28.43 49.50 49.50 36.32 1.00 98.0 97.00 0.00
## seen 2 98 0.39 0.49 0.00 0.36 0.00 0.00 1.0 1.00 0.45
## W 3 98 99.91 18.85 100.00 100.12 18.53 57.00 156.0 99.00 0.06
## C 4 98 74.59 14.37 73.00 74.03 14.83 46.00 113.0 67.00 0.46
## CW 5 98 45.16 9.11 45.00 44.95 8.90 26.00 67.0 41.00 0.24
## d.cw.c 6 98 -29.43 11.36 -27.00 -28.69 8.90 -63.00 -5.0 58.00 -0.72
## q.cw.c 7 98 0.61 0.10 0.61 0.61 0.10 0.41 0.9 0.49 0.17
## kurtosis se
## nr -1.24 2.87
## seen -1.81 0.05
## W 0.37 1.90
## C 0.08 1.45
## CW -0.16 0.92
## d.cw.c 0.18 1.15
## q.cw.c -0.09 0.01
# save the data
##write.table(dd, "stroop_vis_att_double.txt", quote=F, sep="\t", row.names=F)
Je nach Kodierung und Festlegung der Faktorstufen sind die Interpretationen der Koeffizienten verschieden. In R dient die erste Faktorstufe als Referenz (Nicht-Eintreten); modelliert werden das Eintreten bzw. die log(odds) der zweiten Faktorstufe. Ein Vertauschen der beiden Faktorstufen invertiert lediglich das Vorzeichen der Koeffizienten.
require(tidyverse)
## Lade nötiges Paket: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble 3.1.6 ✔ dplyr 1.0.8
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ✔ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%() masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>%
rowwise() %>% # wir wollen zeilenweise vorgehen
mutate(stats_joy = mean(c(stats_joy1, stats_joy2, # mittelwert bilden
stats_joy3, stats_joy4,
stats_joy5)))
dd <- dd %>% mutate(aon_a = factor(attend_or_not, levels = c("course attended", "course not attended")))
dd <- dd %>% mutate(aon_n = factor(attend_or_not, levels = c("course not attended", "course attended")))
head(dd$aon_a)
## [1] course attended course attended course not attended
## [4] course attended course attended course attended
## Levels: course attended course not attended
head(dd$aon_n)
## [1] course attended course attended course not attended
## [4] course attended course attended course attended
## Levels: course not attended course attended
m.a <- glm(aon_a ~ stats_joy, data = dd, family = binomial())
m.n <- glm(aon_n ~ stats_joy, data = dd, family = binomial())
head(model.matrix(m.a))
## (Intercept) stats_joy
## 1 1 6.6
## 2 1 4.6
## 3 1 3.8
## 4 1 4.0
## 5 1 6.0
## 6 1 5.6
head(model.matrix(m.n))
## (Intercept) stats_joy
## 1 1 6.6
## 2 1 4.6
## 3 1 3.8
## 4 1 4.0
## 5 1 6.0
## 6 1 5.6
summary(m.a)
##
## Call:
## glm(formula = aon_a ~ stats_joy, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8789 -0.6735 -0.3730 -0.1323 2.4889
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.6812 0.5830 6.315 2.71e-10 ***
## stats_joy -1.0519 0.1301 -8.085 6.21e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 433.82 on 399 degrees of freedom
## Residual deviance: 337.01 on 398 degrees of freedom
## AIC: 341.01
##
## Number of Fisher Scoring iterations: 5
summary(m.n)
##
## Call:
## glm(formula = aon_n ~ stats_joy, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4889 0.1323 0.3730 0.6735 1.8789
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6812 0.5830 -6.315 2.71e-10 ***
## stats_joy 1.0519 0.1301 8.085 6.21e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 433.82 on 399 degrees of freedom
## Residual deviance: 337.01 on 398 degrees of freedom
## AIC: 341.01
##
## Number of Fisher Scoring iterations: 5
Vollziehen Sie das in log-reg.pdf dargestellte Rechenbeispiel für die GHQ-Daten aus Everitt nach. Die Daten und das Beispiel entstammen dem Buch Everitt (2010). In den Beispielen findet sich das in Everitt hierzu vorgestellte GHQ-Beispiel.
UCLA (University of California, Los Angeles)
A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into graduate school. The response variable, admit/don’t admit, is a binary variable.
Variable | Bedeutung |
---|---|
admit | Zulassung an UCLA (binär) |
gre | Graduate Record Examinations (Zulassungsprüfungspunkte) |
gpa | grade point average (in etwa Notendurchschnitt: hier nach descriptives mit Maximum 4 (Bestnote) (A=4, B=3, C=2, D=1, F=0)) |
rank | Bewertung der Qualität der Institution, an der man vorher gelernt hat (vier Stufen, 1..4) |
Fragen:
#df <- read.csv("https://www.ats.ucla.edu/stat/dataa/binary.csv")
df <- read.csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/example_ucla.csv")
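Eine mögliche Modellierung (hier nur als Skizze; der Modellname m.ucla ist frei gewählt) könnte so aussehen, wobei rank als Faktor behandelt wird:
# sketch: admission predicted by gre, gpa and rank (rank treated as a factor)
df$rank <- factor(df$rank)
m.ucla <- glm(admit ~ gre + gpa + rank, family = binomial(), data = df)
summary(m.ucla)
exp(coef(m.ucla))   # odds ratios per one-unit change of each predictor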
Beispiel Diabetes
If we have frequencies we can also fit a glm(). We need a column with proportion and a column with total number of observations for every combination of predictor values.
glm(..., family=binomial, weights=...)
dd.t <- readr::read_table("y x1 x2 freq
0 No 0 268
0 No 1 14
0 Yes 0 109
0 Yes 1 1
1 No 0 31
1 No 1 6
1 Yes 0 45
1 Yes 1 6
")
total<-with(dd.t,freq[1:4]+freq[5:8])
logit=glm(freq/total~x1+x2, data=dd.t[1:4,], family=binomial, weights=total)
# pearson chi square value
chi_test<-sum(residuals(logit, type = "pearson")^2);
# so the data are finally
dd <- cbind(dd.t[1:4,], total)
logit2=glm(freq/total~x1+x2, data=dd, family=binomial, weights=total)
summary(logit2)
##
## Call:
## glm(formula = freq/total ~ x1 + x2, family = binomial, data = dd,
## weights = total)
##
## Deviance Residuals:
## 1 2 3 4
## -0.2307 0.5677 0.2119 -1.0507
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.2010 0.1892 11.632 < 2e-16 ***
## x1Yes -1.3538 0.2516 -5.381 7.39e-08 ***
## x2 -1.6261 0.4313 -3.770 0.000163 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41.9209 on 3 degrees of freedom
## Residual deviance: 1.5244 on 1 degrees of freedom
## AIC: 23.149
##
## Number of Fisher Scoring iterations: 3
Idee für das einführende Beispiel: [https://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html]
example including gorilla experiment
Field (2012, p. 312) Kapitel 8: Logistic Regression
Reaktionsvariable ist dichotom bzw. binär.
Geschätzt werden die Logits der Wahrscheinlichkeit, der Zielgruppe anzugehören, bzw. die logarithmierten Chancen log(odds).
Logits lassen sich in Wahrscheinlichkeiten umrechnen, der Zielgruppe anzugehören: 1/(1 + exp(-predict(m.1, type="link")))
Die Interpretation von exp(model$coefficients) ist: Zu welcher Änderung der Chance (odds) führt eine Erhöhung des Prädiktors um 1. Die Roh-Koeffizienten in model$coefficients beziehen sich also auf die logarithmierten odds.
fitted gibt Wahrscheinlichkeiten aus, zur Zielgruppe zu gehören. predict hat als Default predict(type='link') und gibt die Logits aus (log(odds)). Mit predict(type='response') erhalten wir direkt die Wahrscheinlichkeiten, wie mit fitted.
Beispiel unter glm_logistic.html#Vorhersage_der_Werte
Wir vergleichen Modelle mit per Maximum Likelihood geschätzten Parametern. Deren Model-Fit wird u. a. durch auf der Likelihood basierende Kenngrößen ausgedrückt: -2LL bzw. Deviances. Wir benutzen anova(..., test = "Chisq"), da die Likelihood-Unterschiede Chi^2-verteilt sind; dies müssen wir im Befehl anova() mit dem Parameter test = "Chisq" angeben.
\(p(y_i) = \frac{1}{1 + e^{-(b_0 + b_1x_{1i})}}\)
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>% rowwise() %>% mutate(stats_joy = mean(c(stats_joy1, stats_joy2, stats_joy3, stats_joy4, stats_joy5)))
dd <- dd %>% mutate(attend_or_not.f = factor(attend_or_not, levels = c("course not attended", "course attended")))
dd <- dd %>% mutate(att.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.0 <- glm(att.bin ~ 1, data = dd, family = binomial())
m.1 <- glm(att.bin ~ stats_joy, data = dd, family = binomial())
m.1$coefficients
## (Intercept) stats_joy
## -3.681209 1.051926
exp(m.1$coefficients)
## (Intercept) stats_joy
## 0.02519251 2.86315905
# we use our model to see the change in the outcome that is produced by 1 step in the predictor
# the difference is expressed in logit or log(odds) units.
# we define some data
new.d <- data.frame(stats_joy=c(1, 2, 3, 4))
# and predict the values
predict(m.1, newdata = new.d) -> n.d
n.d
## 1 2 3 4
## -2.6292831 -1.5773575 -0.5254319 0.5264937
n.d[1] - n.d[2]
## 1
## -1.051926
n.d[2] - n.d[3]
## 2
## -1.051926
n.d[3] - n.d[4]
## 3
## -1.051926
# logit(p) or log(odds) or log(p/(1-p)) changes by 1.0519 when the average joy on statistics increases by 1
# psych package offers commands logit() and logistic() to convert probabilities to logits and vice versa
# logistic function (1/(1+exp(-x)) # logits given, probabilities result
# logit function (log(p/(1-p)) # probabilities given, logits result
# so we can calculate the actual probabilities given the logits
ll <- predict(m.1, type="link")
head(ll)
## 1 2 3 4 5 6
## 3.2615002 1.1576490 0.3161086 0.5264937 2.6303448 2.2095746
head(psych::logistic(ll))
## 1 2 3 4 5 6
## 0.9630842 0.7609053 0.5783756 0.6286649 0.9327892 0.9011060
pp <- 1/(1 + exp(-predict(m.1, type="link")))
head(pp)
## 1 2 3 4 5 6
## 0.9630842 0.7609053 0.5783756 0.6286649 0.9327892 0.9011060
pp <- (predict(m.1, type="response"))
head(pp)
## 1 2 3 4 5 6
## 0.9630842 0.7609053 0.5783756 0.6286649 0.9327892 0.9011060
head(psych::logit(pp))
## 1 2 3 4 5 6
## 3.2615002 1.1576490 0.3161086 0.5264937 2.6303448 2.2095746
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>% rowwise() %>% mutate(stats_joy = mean(c(stats_joy1, stats_joy2, stats_joy3, stats_joy4, stats_joy5)))
dd <- dd %>% mutate(attend_or_not.f = factor(attend_or_not, levels = c("course not attended", "course attended")))
dd <- dd %>% mutate(att.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.0 <- glm(att.bin ~ 1, data = dd, family = binomial())
m.1 <- glm(att.bin ~ stats_joy, data = dd, family = binomial())
# we want to see the likelihood
exp(logLik(m.1))
## 'log Lik.' 6.588736e-74 (df=2)
# this is very small and not easy to handle. So we use log(likelihood) usually.
logLik(m.1)
## 'log Lik.' -168.5059 (df=2)
# or even more common -2LL
-2*logLik(m.1)
## 'log Lik.' 337.0119 (df=2)
# by the way, this is, what we call deviance
m.1$deviance
## [1] 337.0119
# ... and it is
sum(residuals(m.1)^2)
## [1] 337.0119
# in model comparison we compare it with a model that uses a constant as predictor
# m.0 <- glm(attend_or_not ~ 1, data = stats_lectures, family = binomial())
m.0$deviance
## [1] 433.8236
# this is done automatically and saved to an attribute in the model
m.1$null.deviance
## [1] 433.8236
# we compare these deviances to see, whether the expanded model is better
anova(m.0, m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: att.bin ~ 1
## Model 2: att.bin ~ stats_joy
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 399 433.82
## 2 398 337.01 1 96.812 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# AIC is just a modified -2LL, where the number of parameters penalizes the model fit
AIC(m.1)
## [1] 341.0119
m.1$deviance + length(m.1$coefficients) * 2
## [1] 341.0119
AIC(m.0)
## [1] 435.8236
# BIC accounts for sample size on top of AIC
BIC(m.1)
## [1] 348.9948
m.1$deviance + length(m.1$coefficients) * log(nrow(dd))
## [1] 348.9948
BIC(m.0)
## [1] 439.8151
# the smaller AIC or BIC, the better data are fitted by the model when comparing hierarchical models
# package DescTools offers PseudoR2, take care of capital P
DescTools::PseudoR2(m.0, which="Nagelkerke")
## Nagelkerke
## 0
DescTools::PseudoR2(m.1, which="Nagelkerke")
## Nagelkerke
## 0.3247483
# and the real check of correctly identified attending students
cutoff = 0.5
table(dd$att.bin, fitted(m.1) > cutoff)
##
## FALSE TRUE
## 0 35 58
## 1 17 290
chisq.test(table(dd$att.bin, fitted(m.1) > cutoff))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(dd$att.bin, fitted(m.1) > cutoff)
## X-squared = 62.21, df = 1, p-value = 3.087e-15
AIC ist eine Modifikation von -2LL. Bestraft wird für die Aufnahme von Parametern:
\(AIC = -2LL + 2k\), wobei k die Anzahl der geschätzten Parameter (inkl. Intercept) ist.
Beispiel: AIC(m.1) = 341.0118708 = m.1$deviance + 2*2.
Gegenüber AIC bezieht BIC auch noch die Stichprobengröße n in die Bewertung des Modells mit ein:
\(BIC = -2LL + k \cdot log(n)\)
Bei hierarchischen Modellen ist das Modell mit dem geringeren AIC bzw. BIC das bessere Modell, also das Modell, das weniger “unerklärte Varianz” in der Reaktionsvariable übriglässt.
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>% mutate(attend.bin = ifelse(attend_or_not == "course attended", 1, 0))
dd <- dd %>% rowwise() %>%
mutate(stats_joy = mean(c(stats_joy1, stats_joy2, stats_joy3, stats_joy4, stats_joy5)))
dd <- dd %>% mutate(attend_or_not = factor(attend_or_not, levels = c("course not attended", "course attended")))
dd <- dd %>% mutate(attend.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.1 <- glm(attend.bin ~ stats_joy, data = dd, family = binomial())
# this can be plotted by a 2 dimensional plot
ggplot(dd, aes(x=stats_joy, y=fitted(m.1))) + geom_point()
# more complex models might be visualized by 3d plots
m.4 <- glm(attend.bin ~ stats_joy + temperature +
sun_joy + temperature:sun_joy,
data = dd, family = binomial())
dd$fitted <- fitted(m.4)
require(rockchalk)
## Lade nötiges Paket: rockchalk
##
## Attache Paket: 'rockchalk'
##
## Das folgende Objekt ist maskiert 'package:dplyr':
##
## summarize
##
## Das folgende Objekt ist maskiert 'package:MASS':
##
## mvrnorm
plotPlane(model = m.4, plotx1 = "temperature", plotx2 = "sun_joy")
plotPlane(model = m.4, plotx1 = "temperature", plotx2 = "sun_joy", theta = 30, phi =30)
The interactive graph is not rendered when the document is knitted, so evaluating the chunk there makes no sense. We therefore set eval = FALSE in the R chunk header.
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
dd <- dd %>% mutate(attend.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.i <- glm(attend.bin ~ temperature + sun_joy + temperature:sun_joy, data = dd, family = binomial())
require(rgl)
open3d()
plot3d(x=dd$temperature, y=dd$sun_joy, z=dd$attend.bin, col="red", size=4)
grd <- expand.grid(temperature = sort(unique(dd$temperature)), sun_joy = sort(unique(dd$sun_joy)) )
# predicted values are logits, we have to convert them to probabilities that we want to display
grd$pred <- 1/(1 + exp(-predict(m.i, type="link", newdata = grd)))
persp3d(x=unique(grd[[1]]), y=unique(grd[[2]]),
z=matrix(grd[[3]],length(unique(grd[[1]])),length(unique(grd[[2]]))), add=TRUE)