Bei der logistischen Regression ist die Reaktionsvariable binär. Über intervallskalierte oder kategoriale Erklärvariablen wird versucht, das Eintreffen bzw. Nicht-Eintreffen eines Zielzustandes zu erklären.
Beim generalisierten linearen Modell kommt eine Link-Funktion für die Reaktionsvariable mit ins Spiel:
\[ f(\pi) = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_qx_q \]
glm(formula, family=familytype(link=linkfunction), data=)
Family | Default Link Function |
---|---|
binomial | (link = “logit”) |
gaussian | (link = “identity”) |
Gamma | (link = “inverse”) |
inverse.gaussian | (link = “1/mu^2”) |
poisson | (link = “log”) |
quasi | (link = “identity”, variance = “constant”) |
quasibinomial | (link = “logit”) |
quasipoisson | (link = “log”) |
Allgemeine Lineare Modelle (ALM) können ausgedrückt werden als GLM mit der Link-Funktion “identity”, also einer Funktion, die schlicht den Eingabewert zurückgibt.
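Ein kurzer Sketch mit simulierten Daten (Variablen x.sim und y.sim nur zur Illustration): lm() und glm() mit family=gaussian(link="identity") liefern dieselben Koeffizienten.
# a linear model is a glm with gaussian family and identity link (simulated data)
set.seed(42)
x.sim <- rnorm(50)
y.sim <- 2 + 3 * x.sim + rnorm(50)
coef(lm(y.sim ~ x.sim))
coef(glm(y.sim ~ x.sim, family = gaussian(link = "identity")))   # same estimates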
Die Reaktionsvariable ist binär (dichotomous response variable). Die erklärenden Variablen sind stetig oder binär (explanatory variables dichotomous or continuous).
Als Reaktionsvariable wird nicht die erhobene binäre Variable direkt geschätzt, sondern der Logit der Auftretenswahrscheinlichkeit bzw. die logarithmierten odds der Variablen. Die Wahrscheinlichkeit, der einen oder der anderen Ausprägung anzugehören, kann dann berechnet werden. Die Logit-Parameter stehen im Exponenten einer e-Funktion. Im Exponenten finden wir dann eine “ganz normale” lineare Funktion. Die Logit-Funktion (S-Form) wird benutzt, um die zu schätzende Größe (Auftretenswahrscheinlichkeit) in bestimmten Grenzen zu halten (0-1). Die für diesen Zweck wichtigen Parameter können über eine lineare Gleichung beschrieben werden, trotz der nicht-linearen Form.
vgl. [https://md.psych.bio.uni-goettingen.de/mv/unit/glm_logistic/log-reg.pdf]
odds: \[ odds = \frac{p}{1 - p} \]
In der logistischen Regression wird nicht die Wahrscheinlichkeit des Auftretens einer 1 (Zutreffen) einer Beobachtung direkt berechnet/geschätzt, sondern der Logit davon.
Es findet also eine Link-Funktion Anwendung.
\[ L = logit(p) = log(odds) = log(\frac{p}{1 - p}) \]
Umkehrung [Achtung auf den Vorzeichenwechsel im Exponenten]:
\[ p = \frac{e^L}{1 + e^L} = \frac{1}{1 + e^{-L}} \]
Die Umformung erfolgt durch Erweitern des Bruchs mit \(e^{-L}\), denn \(e^{-L} \cdot e^{L} = 1\).
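Ein kurzer numerischer Check der Hin- und Rücktransformation (qlogis() und plogis() sind Logit- bzw. logistische Funktion in Base R):
# logit and inverse logit undo each other
p <- c(0.2, 0.5, 0.8)
L <- log(p / (1 - p))                 # logit by hand
all.equal(L, qlogis(p))               # TRUE
all.equal(p, 1 / (1 + exp(-L)))       # back-transformation
all.equal(p, exp(L) / (1 + exp(L)))   # equivalent form
all.equal(p, plogis(L))               # TRUE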
bei einem Vorhersagewert: \[ logit(p) = log(\frac{p}{1 - p}) = \beta_0 + \beta_1x + \epsilon \]
ausgedrückt als Wahrscheinlichkeitsverhältnis (odds) der Wahrscheinlichkeit p zur Gegenwahrscheinlichkeit 1-p
\[ odds = \frac{p}{1 - p} = e^{\beta_0 + \beta_1x + \epsilon} = e^{\beta_0} \cdot e^{\beta_1x} \cdot e^{\epsilon} \]
bei q Vorhersagevariablen:
\[ logit(\pi) = log(\frac{\pi}{1 - \pi}) = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_qx_q = \beta_0 +\sum_{k=1}^{q} {\beta_kx_k} \]
Die geschätzten Parameter lassen sich mit einem Wald-Test darauf testen, ob sie einen signifikanten Einfluss auf das Kriterium ausüben. Die Wald-Statistik berechnet sich, indem man eine Parameterschätzung durch ihren Standardfehler teilt:
\[ W = \frac{b}{se(b)} \sim N(0,1) \]
Der W-Wert ist standardnormalverteilt, weshalb er mit einem einfachen z-Test getestet werden kann.
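Ein minimaler Sketch mit simulierten Daten (Variablen x.sim, y.sim und m.sim nur zur Illustration), der die Wald-Statistik von Hand berechnet und mit den z-Werten aus summary() vergleicht:
# Wald statistic by hand: estimate divided by its standard error
set.seed(1)
x.sim <- rnorm(100)
y.sim <- rbinom(100, 1, plogis(-0.5 + 0.8 * x.sim))
m.sim <- glm(y.sim ~ x.sim, family = binomial())
b  <- coef(summary(m.sim))[, "Estimate"]
se <- coef(summary(m.sim))[, "Std. Error"]
W  <- b / se                 # approximately N(0,1)
p.w <- 2 * pnorm(-abs(W))    # two-sided z-test
cbind(W, p.w, summary(m.sim)$coefficients[, c("z value", "Pr(>|z|)")])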
In der logistischen Regression werden Logits geschätzt. Der Logit ist der Logarithmus eines Odds. Ein Odd ist das Verhältnis des Eintreffens eines Ereignisses zum Nicht-Eintreffen des Ereignisses, also eine andere Art, Wahrscheinlichkeiten auszudrücken. Bei einem Odd (Eintreffen/Nicht-Eintreffen) von 1:4 oder von 0.25 ist die Wahrscheinlichkeit des Eintreffens = 1/5 oder 0.2.
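Zur Illustration die Umrechnung von odds in Wahrscheinlichkeiten (die Hilfsfunktion odds_to_p ist hier nur für das Beispiel definiert):
# convert odds back to probabilities: p = odds / (1 + odds)
odds_to_p <- function(odds) odds / (1 + odds)
odds_to_p(0.25)   # odds of 1:4 -> p = 0.2
odds_to_p(1)      # odds of 1:1 -> p = 0.5
odds_to_p(99)     # odds of 99:1 -> p = 0.99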
exp(x) ist eine andere Schreibweise für \(e^x\). log(exp(x)) ist x; log (Logarithmus naturalis) ist die Umkehrfunktion zu exp.
log(0)
## [1] -Inf
log(1)
## [1] 0
# prob of event 1 is high: 0.99
pe <- .99
# odds
pe / (1 - pe)
## [1] 99
# log odds
log(pe/(1-pe))
## [1] 4.59512
# library(psych) knows this as command logit
require(psych)
## Lade nötiges Paket: psych
logit(pe)
## [1] 4.59512
# fifty fifty
pe <- .5
cat(paste('odds', pe / (1-pe)))
## odds 1
cat(paste('logit or log(odds)', logit(pe)))
## logit or log(odds) 0
# low
pe <- .1
cat(paste('odds', pe / (1-pe)))
## odds 0.111111111111111
cat(paste('logit or log(odds)', logit(pe)))
## logit or log(odds) -2.19722457733622
# very low
pe <- .000000001
cat(paste('odds', pe / (1-pe)))
## odds 1.000000001e-09
cat(paste('logit or log(odds)', logit(pe)))
## logit or log(odds) -20.7232658359464
# the known linear model is put to the power of base e in logistic regression
odds <- function(x){
return( x / (1-x))
}
p | 1-p | log(p) | odds(p) = p/(1-p) | logit(p) = log(odds(p)) |
---|---|---|---|---|
.99 | .01 | -0.01 | 99 | 4.59 |
.90 | .10 | -0.10 | 9 | 2.20 |
.70 | .30 | -0.36 | 2.33 | 0.85 |
.50 | .50 | -0.69 | 1 | 0 |
.30 | .70 | -1.20 | .43 | -0.85 |
.10 | .90 | -2.30 | .11 | -2.20 |
.01 | .99 | -4.60 | .01 | -4.60 |
Die Logit-Werte werden negativ, wenn die Wahrscheinlichkeit eines Ereignisses unter .5 fällt.
Die logistische Regression schätzt die Logit-Werte mit einer linearen Funktion. Für jede Beobachtung können wir über predict() einen persönlichen Logit-Wert berechnen. Wir können die Logit-Werte in Wahrscheinlichkeiten umrechnen, der Zielgruppe anzugehören, also der mit 1 als Ausprägung. Dies geht mit p = 1/(1 + exp(-logitWert)) bzw. mit p = exp(logitWert)/(1 + exp(logitWert)).
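Ein minimaler Sketch mit erfundenen Daten (Variablen d.sim und m.sim nur zur Illustration), der beide Umrechnungswege zeigt:
# per-observation logits via predict() and their conversion to probabilities
set.seed(7)
d.sim <- data.frame(x = rnorm(30))
d.sim$y <- rbinom(30, 1, plogis(0.3 + 1.2 * d.sim$x))
m.sim <- glm(y ~ x, family = binomial(), data = d.sim)
lgt <- predict(m.sim, type = "link")          # individual logit values
p1 <- 1 / (1 + exp(-lgt))
p2 <- exp(lgt) / (1 + exp(lgt))
head(cbind(p1, p2, fitted = fitted(m.sim)))   # all three columns agree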
Quelle der Idee: tutorial
Kann das Entdecken im Visual-Attention-Test durch die Leistungen im Stroop-Test vorhergesagt werden?
Der “visual attention test”
oder: youtube
Zur besseren Veranschaulichung werden die Daten ein wenig frisiert. Die Beobachtungen werden verdoppelt und die Interferenz etwas verstärkt. (Das Vorgehen findet sich ganz unten unter dem Punkt ‘Daten generieren’.)
In his experiments, Stroop administered several variations of the same test for which three different kinds of stimuli were created. In the first one, names of colors appeared in black ink. In the second, names of colors appeared in a different ink than the color named. Finally in the third one, there were squares of a given color.
[Abbildung: stroop-mat]
Original figure 1 from experiment 2 of the article “Studies of interference in serial verbal reactions” (1935) by Stroop, J.R., Journal of Experimental Psychology, 18, 643-662; the original article that described the Stroop effect in English. The diagram shows that it takes less time to name the colors of squares (no conflict) than to name the ink color of a word when the word names a different color (conflict).
[Abbildung: stroop-fig]
Es sind viele Varianten im Einsatz.
Quelle: [https://www.sciencedirect.com/science/article/pii/S0887617703001215]
In Golden’s (1978) method, 45 s are given to read each page of color words (W) printed in black ink, color hues (C) printed as XXXX, and color hues printed as competing color words (CW) (e.g., ‘red’ printed in blue ink). W is the number of color words read in 45 s. C is the number of color hues named in 45 s, and CW is the number of color hues named (while ignoring the color words) in 45 s. Each of these tasks has its own normative values and table look-up, but as one can easily see the CW score will be dependent upon the relative strengths of word reading and of color naming.
Golden’s (1978) ipsative interference score is then based upon an assertion that the time to read a CW item is an additive function of the time to read a word plus the time to name a color. The addition of the time to read a word (45/W) and the time to name a color (45/C) gives the algebraically simplified formula of (W×C)/(W+C) for the number of predicted CW items completed in 45 s ( Golden, 1978). In using this function, the investigator thus makes a hidden assumption that the brain adds word reading to color naming processes to produce the results on the CW card.
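Der Rechenweg dahinter als Formel: Die Zeit pro CW-Item wird als Summe der Zeit für das Wortlesen (45/W) und der Zeit für das Farbbenennen (45/C) angenommen; daraus folgt für die vorhergesagte Zahl der CW-Items in 45 s:
\[ \widehat{CW} = \frac{45}{\frac{45}{W} + \frac{45}{C}} = \frac{W \cdot C}{W + C} \]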
Code | Erklärung
--- | ---
C | Anzahl genannter Farben (keine Worte)
W | Anzahl gelesener Farbworte in schwarz
CW | Anzahl genannter Farben von Farbworten mit anderer Bedeutung als Druckfarbe (Interferenz)
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
head(dd)
## nr seen W C CW d.cw.c q.cw.c
## 1 1 0 126 86 64 -22 0.7441860
## 2 2 0 118 76 48 -28 0.6315789
## 3 3 0 61 66 38 -28 0.5757576
## 4 4 0 69 48 32 -16 0.6666667
## 5 5 0 57 59 42 -17 0.7118644
## 6 6 0 78 64 53 -11 0.8281250
### a correlation matrix
cor(dd[2:5])
## seen W C CW
## seen 1.00000000 -0.04511388 0.05742984 0.2107604
## W -0.04511388 1.00000000 0.43757398 0.3288920
## C 0.05742984 0.43757398 1.00000000 0.6327925
## CW 0.21076037 0.32889199 0.63279254 1.0000000
Die Stroop-Leistungen sind moderat positiv korreliert, aber keine scheint bedeutsam mit der Erkennens-Leistung korreliert zu sein. Insgesamt scheint es hier nicht sonderlich viel zu sehen zu geben.
Start mit der CW (Leistung in der Interferenz-Bedingung) als Prädiktor. Die normale lineare Regression ist nur bedingt nützlich. Sie ergibt u. U. vorhergesagte Werte außerhalb des möglichen Bereichs.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# predict seen by CW, classical lm
m.lm <- lm(seen ~ CW, data=dd)
summary(m.lm)
##
## Call:
## lm(formula = seen ~ CW, data = dd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5924 -0.3966 -0.2716 0.5707 0.7883
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.103663 0.237652 -0.436 0.6637
## CW 0.010876 0.005148 2.112 0.0372 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4812 on 96 degrees of freedom
## Multiple R-squared: 0.04442, Adjusted R-squared: 0.03447
## F-statistic: 4.463 on 1 and 96 DF, p-value: 0.03724
m.lm$fitted.values[order(m.lm$fitted.values)]
## 67 18 37 74 86 26 75 25
## 0.1464852 0.2008652 0.2117412 0.2117412 0.2117412 0.2226173 0.2226173 0.2334933
## 4 53 58 9 56 7 94 42
## 0.2443693 0.2443693 0.2443693 0.2552453 0.2661213 0.2769973 0.2769973 0.2878733
## 45 91 19 33 3 64 82 89
## 0.2878733 0.2878733 0.2878733 0.2987494 0.3096254 0.3096254 0.3096254 0.3096254
## 40 10 29 59 78 68 8 16
## 0.3205014 0.3205014 0.3205014 0.3205014 0.3205014 0.3313774 0.3422534 0.3422534
## 49 98 5 52 54 62 63 11
## 0.3422534 0.3422534 0.3531294 0.3531294 0.3531294 0.3531294 0.3531294 0.3640054
## 41 57 90 15 12 20 27 17
## 0.3640054 0.3640054 0.3640054 0.3748815 0.3857575 0.3857575 0.3857575 0.3966335
## 34 61 65 83 35 84 87 55
## 0.3966335 0.3966335 0.3966335 0.3966335 0.4075095 0.4075095 0.4075095 0.4075095
## 71 2 13 21 31 80 85 14
## 0.4075095 0.4183855 0.4183855 0.4183855 0.4183855 0.4183855 0.4183855 0.4292615
## 60 70 36 38 46 95 23 51
## 0.4292615 0.4292615 0.4292615 0.4292615 0.4292615 0.4292615 0.4401375 0.4401375
## 76 79 30 66 28 69 77 6
## 0.4401375 0.4401375 0.4510136 0.4510136 0.4618896 0.4618896 0.4618896 0.4727656
## 22 48 81 97 73 32 24 44
## 0.4727656 0.4836416 0.4836416 0.4836416 0.4945176 0.4945176 0.5053936 0.5053936
## 72 93 43 92 1 50 96 47
## 0.5053936 0.5053936 0.5706497 0.5815257 0.5924017 0.5924017 0.6032778 0.6141538
## 88 39
## 0.6141538 0.6250298
# extreme values lead to impossible probabilities (predicted values)
m.lm$coefficients
## (Intercept) CW
## -0.10366318 0.01087601
# very low but possible CW: we divide minimum CW by 10
cat(m.lm$coefficients[1] + 0.1 * min(dd$CW) * m.lm$coefficients[2])
## -0.07864835
# very high but possible CW: we multiply maximum CW by 2
cat(m.lm$coefficients[1] + 2 * max(dd$CW) * m.lm$coefficients[2])
## 1.353723
Eine Grafik dazu.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = lm(seen ~ CW, data=dd)
# base plot
plot(dd$seen ~ dd$CW, ylim=c(-.5,1.5), xlim=c(-5,110))
abline(m.1$coefficients)
require("ggplot2")
## Lade nötiges Paket: ggplot2
##
## Attache Paket: 'ggplot2'
## Die folgenden Objekte sind maskiert von 'package:psych':
##
## %+%, alpha
# check the fitted values (predicted seen probabilities)
pplot <- ggplot(data=dd, aes(x=CW, y=seen))
pplot +
geom_point() +
# stat_smooth(method = "lm", se=TRUE) +
scale_x_continuous(limits = c(1, 140)) +
geom_abline(intercept = m.1$coefficients[1], slope = m.1$coefficients[2])
# predictions of values below 0 and above 1 are possible
Es ist klar erkennbar, dass die Regressionsgerade Vorhersagen über 1 und unter 0 erlaubt, die nicht möglich sind.
Die Frage, um wieviel besser ein Modell die Reaktionsvariable erklärt als ein Vergleichsmodell, kann mit der Modell-Chi-Quadrat-Statistik angegangen werden. Ausgedrückt wird die Differenz zwischen dem einen und dem anderen Modell: Von der Devianz (-2 Log-Likelihood) des kleineren Modells bzw. des Null-Modells wird die Devianz des größeren Modells abgezogen; das ergibt den Modell-Vergleichs-Chi^2-Wert. Aus Everitt (2010), S. 119: The lack of fit of a logistic regression model can be measured by a term known as the deviance, which is essentially the ratio of the likelihoods of the model of interest to the saturated model that fits the data perfectly.
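Als Formel, mit \(D_0\) für die Devianz des kleineren bzw. Null-Modells und \(D_1\) für die Devianz des größeren Modells:
\[ \chi^2_{Modell} = D_0 - D_1 = (-2LL_0) - (-2LL_1), \quad df = df_0 - df_1 \]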
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
Im Summary finden sich u. a. die Koeffizienten mit ihren z-Tests, die Null- und die Residual-Deviance sowie der AIC.
anova()
Das Berechnen der Log-Likelihood-Ratio und das explizite Ermitteln und Prüfen des Chi^2-Wertes kann auch durch den Befehl anova() erledigt werden. Wie beim Vergleich von lm()-Ergebnisobjekten können zwei genestete glm-Ergebnisobjekte mit dem Befehl anova() verglichen werden. Wird nur ein Modell übergeben, wird das Null-Modell implizit als Vergleichsmodell angenommen. Die Chi^2-Prüfgröße wird über test="Chisq" angefordert.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# fit null model and model CW
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# check model explicitly using anova()
anova(m.0, m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ 1
## Model 2: seen ~ CW
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 97 130.88
## 2 96 126.45 1 4.4226 0.03547 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
m.1$deviance
## [1] 126.4531
m.1$null.deviance
## [1] 130.8757
anova(m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.03547 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Die Deviance eines Modells ist die log(likelihood) des Modells multipliziert mit -2. Die Multiplikation mit -2 ist eine Konvention. Die Deviances tauchen im Summary als “Null deviance” bzw. “Residual deviance” auf. Der Befehl logLik(), der als Parameter ein glm-Ergebnisobjekt erwartet, gibt uns die log(likelihood) des Modells zurück.
Unterschiede in der Deviance können benutzt werden, um genestete Modelle gegeneinander zu testen. Dies erreichen wir, indem wir die Deviances der zu vergleichenden Modelle voneinander abziehen (Chi^2-Wert) und die Differenz der jeweiligen Modell-Freiheitsgrade benutzen, um in der Chi^2-Verteilung den zugehörigen p-Wert nachzuschlagen (Funktion: pchisq()).
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# we can see the deviances in the model summary
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
# the likelihood of a model is a very small value
exp(logLik(m.1))
## 'log Lik.' 3.475788e-28 (df=2)
# so we usually work with the log of it
logLik(m.1)
## 'log Lik.' -63.22656 (df=2)
# or more common with -2*log likelihood or deviances
# the deviances are the -2LL
logLik(m.0)
## 'log Lik.' -65.43787 (df=1)
-2*logLik(m.0)
## 'log Lik.' 130.8757 (df=1)
logLik(m.1)
## 'log Lik.' -63.22656 (df=2)
-2*logLik(m.1)
## 'log Lik.' 126.4531 (df=2)
# deviances are properties of the model result object
m.0$deviance
## [1] 130.8757
m.1$deviance
## [1] 126.4531
m.1$null.deviance
## [1] 130.8757
m.1.chi <- m.1$null.deviance - m.1$deviance
m.1.chi
## [1] 4.422608
# degrees of freedom are attributes of model-object
m.1.chidf <- m.1$df.null - m.1$df.residual
m.1.chidf
## [1] 1
# get the probability of this chisquare
chisq.prob <- 1 - pchisq(m.1.chi, m.1.chidf)
chisq.prob
## [1] 0.0354658
# anova() does this all for us
anova(m.0, m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ 1
## Model 2: seen ~ CW
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 97 130.88
## 2 96 126.45 1 4.4226 0.03547 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Die z-Statistik kann ansatzweise interpretiert werden wie der t-Wert im lm(). Allerdings hat sie vor allem bei großen b-Koeffizienten eine Tendenz zum Fehler Typ II: Potenziell bedeutsame erklärende Variablen werden dann nicht identifiziert. Die z-Statistik und ihre Prüfung finden sich im Summary des Modells.
Wald-Statistik: (z-Statistik)^2
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
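Die z-Werte lassen sich auch direkt aus der Koeffizienten-Tabelle des Summary ziehen; ihr Quadrat ist die Wald-Statistik. Ein kurzer Sketch:
# z values and Wald statistics from the coefficient table of m.1
z <- summary(m.1)$coefficients[, "z value"]
z     # about -2.41 (Intercept) and 2.04 (CW), see the summary above
z^2   # Wald statistic as squared z value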
Vorschläge für Analoga zum R^2 der linearen Regression in der logistischen Regression sind u. a. die Pseudo-R^2 nach Hosmer & Lemeshow, Cox & Snell und Nagelkerke:
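Die Formeln, wie sie auch in der Funktion unten umgesetzt sind (vgl. Field, 2012), mit \(D_M\) für die Devianz des Modells, \(D_0\) für die Null-Devianz und \(n\) für die Zahl der Beobachtungen:
\[ R^2_{HL} = \frac{D_0 - D_M}{D_0} \qquad R^2_{CS} = 1 - e^{-(D_0 - D_M)/n} \qquad R^2_{N} = \frac{R^2_{CS}}{1 - e^{-D_0/n}} \]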
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# define a function to report R^2
# cf. Field (2012, p. 334)
pseudoR2 <- function(LogModel){
dev <- LogModel$deviance
nullDev <- LogModel$null.deviance
modelN <- length(LogModel$fitted.values)
R.l <- 1 - dev / nullDev
R.cs <- 1 - exp( -(nullDev - dev) / modelN)
R.n <- R.cs / ( 1 - ( exp (- (nullDev / modelN))))
cat("Pseudo R^2 for logistic regression\n")
cat("Hosmer and Lemeshow R^2", round(R.l, 3), "\n")
cat("Cox and Snell R^2 ", round(R.cs, 3), "\n")
cat("Nagelkerke R^2 ", round(R.n, 3), "\n")
}
# fit the model
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# get the parameter tests
pseudoR2(m.1)
## Pseudo R^2 for logistic regression
## Hosmer and Lemeshow R^2 0.034
## Cox and Snell R^2 0.044
## Nagelkerke R^2 0.06
# package DescTools offers PseudoR2, take care of capital P
require(DescTools)
## Lade nötiges Paket: DescTools
##
## Attache Paket: 'DescTools'
## Die folgenden Objekte sind maskiert von 'package:psych':
##
## AUC, ICC, SD
DescTools::PseudoR2(m.1)
## McFadden
## 0.03379242
DescTools::PseudoR2(m.1, which="Nagelkerke")
## Nagelkerke
## 0.05987465
Der Koeffizient für CW, m.1$coefficients[2], kann ähnlich interpretiert werden wie im lm(). Er drückt aus, um wieviel sich der Logit der Reaktionsvariablen ändert, wenn sich die erklärende Variable um eine Einheit ändert. Der Logit der Reaktionsvariablen ist der natürliche Logarithmus der odds, dass die Reaktionsvariable 1 wird.
Einsetzen der errechneten Koeffizienten in die Formel der Auftretenswahrscheinlichkeit.
\[ p(y_i) = \frac{1}{1 + e^{-(b_0 + b_1x_{1i})}} \]
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# coefficients
m.1$coefficients
## (Intercept) CW
## -2.62100185 0.04744238
# predict probability to see of first observation
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * dd$CW[1]))))
## 0.6023605
cat(dd$seen[1], dd$CW[1], m.1$fitted.values[1])
## 0 64 0.6023605
# very small value of CW
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 0.1 * min(dd$CW)))))
## 0.07502923
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 0.001 * min(dd$CW)))))
## 0.06786794
# very high value of CW
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 2 * max(dd$CW)))))
## 0.9767125
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 5 * max(dd$CW)))))
## 0.9999983
# check the fitted values (predicted probabilities to see the gorilla)
m.1$fitted.values[order(m.1$fitted.values)]
## 67 18 37 74 86 26 75 25
## 0.1780190 0.2154105 0.2235369 0.2235369 0.2235369 0.2318792 0.2318792 0.2404365
## 4 53 58 9 56 7 94 19
## 0.2492071 0.2492071 0.2492071 0.2581889 0.2673791 0.2767744 0.2767744 0.2863708
## 42 45 91 33 3 64 82 89
## 0.2863708 0.2863708 0.2863708 0.2961636 0.3061477 0.3061477 0.3061477 0.3061477
## 10 29 40 59 78 68 8 16
## 0.3163171 0.3163171 0.3163171 0.3163171 0.3163171 0.3266652 0.3371850 0.3371850
## 49 98 5 52 54 62 63 11
## 0.3371850 0.3371850 0.3478684 0.3478684 0.3478684 0.3478684 0.3478684 0.3587072
## 41 57 90 15 12 20 27 17
## 0.3587072 0.3587072 0.3587072 0.3696923 0.3808140 0.3808140 0.3808140 0.3920621
## 34 61 65 83 35 55 71 84
## 0.3920621 0.3920621 0.3920621 0.3920621 0.4034260 0.4034260 0.4034260 0.4034260
## 87 2 13 21 31 80 85 14
## 0.4034260 0.4148945 0.4148945 0.4148945 0.4148945 0.4148945 0.4148945 0.4264560
## 36 38 46 60 70 95 23 51
## 0.4264560 0.4264560 0.4264560 0.4264560 0.4264560 0.4264560 0.4380985 0.4380985
## 76 79 30 66 28 69 77 6
## 0.4380985 0.4380985 0.4498095 0.4498095 0.4615764 0.4615764 0.4615764 0.4733862
## 22 48 81 97 32 73 24 44
## 0.4733862 0.4852259 0.4852259 0.4852259 0.4970823 0.4970823 0.5089419 0.5089419
## 72 93 43 92 1 50 96 47
## 0.5089419 0.5089419 0.5794280 0.5909437 0.6023605 0.6023605 0.6136670 0.6248522
## 88 39
## 0.6248522 0.6359057
# change in odds, calculated on base of CW = 50 and 51
p.50 <- 1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 50)))
p.51 <- 1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * 51)))
# odds.50 is p.50 / (1-p.50)
# odds.51 is p.51 / (1-p.51)
# diff.odds is odds.51/odds.50
cat((p.51 / (1-p.51))/(p.50/(1-p.50)))
## 1.048586
cat(m.1$coefficients[2])
## 0.04744238
cat(exp(m.1$coefficients[2]))
## 1.048586
# log() as inverse of exp()
log(exp(m.1$coefficients[2]))
## CW
## 0.04744238
# we get
head(m.1$fitted)
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
head(m.1$fitted.values)
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
# we can also use predict() and get the probabilities to see the gorilla, but we have to set type=response
head(predict(m.1, type="response"))
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
# default is type "link" and returns the log(odds)
head(predict(m.1, type="link"))
## 1 2 3 4 5 6
## 0.4153103 -0.3437677 -0.8181915 -1.1028458 -0.6284220 -0.1065558
# calculate p for the first observations using the log(odds)
head(1/(1 + exp(-predict(m.1, type="link"))))
## 1 2 3 4 5 6
## 0.6023605 0.4148945 0.3061477 0.2492071 0.3478684 0.4733862
Dargestellt wird zunächst nur der eigentliche Wertebereich der Ausgangsdaten. Die S-Form ist erkennbar.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
# check the fitted values (predicted seen probabilities)
dd.plot <- data.frame(cw.o = dd$CW[order(dd$CW)],
cw.p = m.1$fitted.values[order(m.1$fitted.values)])
ggplot(data=dd.plot, aes(x=cw.o, y=cw.p)) + geom_line() + geom_point()
Der angepasste Verlauf über einen größeren Wertebereich:
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
cat(1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * dd$CW[1]))))
## 0.6023605
dd.plot <- data.frame(
a.o = -100:200,
a.p = 1 /(1 + exp(-(m.1$coefficients[1] + m.1$coefficients[2] * -100:200)))
)
# check the fitted values (predicted seen probabilities)
dd.plot2 <- data.frame(cw.o = dd$CW[order(dd$CW)],
cw.p = m.1$fitted.values[order(m.1$fitted.values)])
ggplot(data=dd.plot, aes(x=a.o, y=a.p)) +
geom_line() +
geom_point(data=dd.plot2, aes(x=cw.o, y=cw.p))
# including the real answers and a cutoff line at p=0.5
ggplot(data=dd.plot[dd.plot$a.o > 0 & dd.plot$a.o < 100,], aes(x=a.o, y=a.p)) +
geom_line() +
geom_point(data=dd.plot2, aes(x=cw.o, y=cw.p)) +
geom_point(data=dd, aes(x=CW, y=seen, color=seen)) +
geom_vline(xintercept = 55, color="red")
Der Befehl table() gibt die Möglichkeit, als zweites Argument eine Bedingung mit anzugeben, nach der die Tabelle in zutreffend (‘TRUE’) und nicht zutreffend (‘FALSE’) aufgeteilt wird.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# model to fit
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
cutoff = 0.5
table(m.1$y, fitted(m.1) > cutoff)
##
## FALSE TRUE
## 0 56 4
## 1 30 8
Bei 56 Beobachtungen, die den Gorilla nicht sehen, liegt der vorhergesagte Wert unter dem Cutoff, der Vergleich Wert > Cutoff ist also FALSE. Dies wäre eine korrekte Vorhersage des Nicht-Entdeckens. Für 4 Personen sagen wir das Entdecken vorher, weil der Vorhersagewert über dem Cutoff liegt; diese Vorhersage ist aber falsch. Bei 8 Beobachtungen, die den Gorilla gesehen haben, liegt der Vorhersagewert über dem Cutoff; für diese Personen sagen wir das Entdecken korrekt vorher. Bei 30 Beobachtungen, die den Gorilla gesehen haben, liegen wir dagegen mit der Vorhersage des Nicht-Entdeckens falsch.
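Ein kleiner Sketch zur Trefferquote (Anteil korrekt klassifizierter Beobachtungen) aus dieser Tabelle:
# hit rate: proportion of correctly classified observations
ctab <- table(m.1$y, fitted(m.1) > cutoff)
sum(diag(ctab)) / sum(ctab)   # (56 + 8) / 98, about 0.65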
Der oben erläuterte Grundansatz, logit-Werte durch ein lineares Modell vorherzusagen, funktioniert mit allen anderen Möglichkeiten, die das lineare Modell ansonsten auch bietet.
Alle Basis-Leistungsmaße (CW, W, C) in das Modell aufnehmen.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# explicit null model
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
require(DescTools)
PseudoR2(m.1)
## McFadden
## 0.03379242
# predict seen by CW plus the raw measures W and C
m.2 = glm(seen ~ CW + W + C, family=binomial(), data=dd)
summary(m.2)
##
## Call:
## glm(formula = seen ~ CW + W + C, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3834 -0.9807 -0.7631 1.1841 1.7874
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.38347 1.39513 -0.992 0.3214
## CW 0.06983 0.03102 2.251 0.0244 *
## W -0.01322 0.01337 -0.989 0.3228
## C -0.01272 0.02121 -0.600 0.5485
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 124.44 on 94 degrees of freedom
## AIC: 132.44
##
## Number of Fisher Scoring iterations: 4
PseudoR2(m.2)
## McFadden
## 0.04916736
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
anova(m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.03547 *
## W 1 1.6457 95 124.81 0.19955
## C 1 0.3665 94 124.44 0.54491
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check improvement of expanding model1 to model2
anova(m.1, m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ CW
## Model 2: seen ~ CW + W + C
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 96 126.45
## 2 94 124.44 2 2.0122 0.3656
CW und die abgeleiteten Interferenz-Werte in das Modell aufnehmen: die Differenz von Interferenz- und Farb-Benennleistung (d.cw.c = CW - C) sowie der Quotient aus beiden (q.cw.c = CW / C).
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# explicit null model
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW
m.1 = glm(seen ~ CW, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3581 -0.9977 -0.7969 1.3056 1.7310
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.62100 1.08997 -2.405 0.0162 *
## CW 0.04744 0.02324 2.041 0.0412 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 126.45 on 96 degrees of freedom
## AIC: 130.45
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW and all derived measures of interference
m.2 = glm(seen ~ CW + d.cw.c + q.cw.c, family=binomial(), data=dd)
summary(m.2)
##
## Call:
## glm(formula = seen ~ CW + d.cw.c + q.cw.c, family = binomial(),
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4681 -0.9866 -0.8022 1.2512 1.7676
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.72871 7.34360 0.235 0.814
## CW 0.07779 0.06436 1.209 0.227
## d.cw.c 0.06721 0.09462 0.710 0.478
## q.cw.c -6.13666 11.83685 -0.518 0.604
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 125.18 on 94 degrees of freedom
## AIC: 133.18
##
## Number of Fisher Scoring iterations: 4
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
anova(m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.03547 *
## d.cw.c 1 0.9978 95 125.45 0.31785
## q.cw.c 1 0.2762 94 125.18 0.59923
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check improvement of expanding model1 to model2
anova(m.1, m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ CW
## Model 2: seen ~ CW + d.cw.c + q.cw.c
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 96 126.45
## 2 94 125.18 2 1.2739 0.5289
require(DescTools)
DescTools::PseudoR2(m.1)
## McFadden
## 0.03379242
DescTools::PseudoR2(m.2)
## McFadden
## 0.04352635
Es können auch Interaktionsterme mit eingehen. Alle Rohmaße und die zweifachen Interaktionen in das Modell aufnehmen.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
# explicit null model
m.0 = glm(seen ~ 1, family=binomial(), data=dd)
summary(m.0)
##
## Call:
## glm(formula = seen ~ 1, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9906 -0.9906 -0.9906 1.3765 1.3765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4568 0.2073 -2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 130.88 on 97 degrees of freedom
## AIC: 132.88
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW, W, and C
m.1 = glm(seen ~ CW + W + C, family=binomial(), data=dd)
summary(m.1)
##
## Call:
## glm(formula = seen ~ CW + W + C, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3834 -0.9807 -0.7631 1.1841 1.7874
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.38347 1.39513 -0.992 0.3214
## CW 0.06983 0.03102 2.251 0.0244 *
## W -0.01322 0.01337 -0.989 0.3228
## C -0.01272 0.02121 -0.600 0.5485
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 124.44 on 94 degrees of freedom
## AIC: 132.44
##
## Number of Fisher Scoring iterations: 4
# predict seen by CW, W, C and the two way interactions
m.2 = glm(seen ~ CW + W + C + CW:W + CW:C + W:C, family=binomial(), data=dd)
summary(m.2)
##
## Call:
## glm(formula = seen ~ CW + W + C + CW:W + CW:C + W:C, family = binomial(),
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7359 -0.9266 -0.6442 1.1180 1.9020
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.917e+01 9.893e+00 -1.938 0.0526 .
## CW -1.328e-01 2.388e-01 -0.556 0.5782
## W 2.443e-01 1.059e-01 2.307 0.0211 *
## C 2.645e-01 1.598e-01 1.655 0.0978 .
## CW:W 2.608e-04 2.114e-03 0.123 0.9018
## CW:C 2.720e-03 1.713e-03 1.588 0.1124
## W:C -3.981e-03 1.736e-03 -2.293 0.0218 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 115.47 on 91 degrees of freedom
## AIC: 129.47
##
## Number of Fisher Scoring iterations: 4
# check model (Null-Model is automatically generated, deviance of null-model can be found in model$null.deviance)
anova(m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 97 130.88
## CW 1 4.4226 96 126.45 0.035466 *
## W 1 1.6457 95 124.81 0.199547
## C 1 0.3665 94 124.44 0.544908
## CW:W 1 1.5055 93 122.94 0.219825
## CW:C 1 0.8185 92 122.12 0.365624
## W:C 1 6.6449 91 115.47 0.009944 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# check improvement of expanding model1 to model2
anova(m.1, m.2, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: seen ~ CW + W + C
## Model 2: seen ~ CW + W + C + CW:W + CW:C + W:C
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 94 124.44
## 2 91 115.47 3 8.9689 0.02971 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
require(DescTools)
DescTools::PseudoR2(m.1)
## McFadden
## 0.04916736
DescTools::PseudoR2(m.2)
## McFadden
## 0.1176969
stepwise() (wenn Rcmdr vorhanden ist) oder stepAIC() funktioniert auch bei logistischer Regression.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
require("MASS")
## Lade nötiges Paket: MASS
m.step <- glm(seen ~ W+C+CW, family = binomial, data = dd)
#stepwise(m.step, trace = FALSE)
#Rcmdr::stepwise(m.step, direction="backward/forward", criterion="AIC")
stepAIC(m.step, direction='both', scope = list(upper = ~ W + C + CW, lower = ~ 1))
## Start: AIC=132.44
## seen ~ W + C + CW
##
## Df Deviance AIC
## - C 1 124.81 130.81
## - W 1 125.45 131.46
## <none> 124.44 132.44
## - CW 1 129.94 135.94
##
## Step: AIC=130.81
## seen ~ W + CW
##
## Df Deviance AIC
## - W 1 126.45 130.45
## <none> 124.81 130.81
## + C 1 124.44 132.44
## - CW 1 130.68 134.68
##
## Step: AIC=130.45
## seen ~ CW
##
## Df Deviance AIC
## <none> 126.45 130.45
## + W 1 124.81 130.81
## + C 1 125.45 131.46
## - CW 1 130.88 132.88
##
## Call: glm(formula = seen ~ CW, family = binomial, data = dd)
##
## Coefficients:
## (Intercept) CW
## -2.62100 0.04744
##
## Degrees of Freedom: 97 Total (i.e. Null); 96 Residual
## Null Deviance: 130.9
## Residual Deviance: 126.5 AIC: 130.5
#detach("package:MASS", unload=TRUE)
Auch Interaktionsterme können mit auftauchen. Die kompakte model formula mit “*” funktioniert nicht. Die Interaktionsterme müssen explizit mit ‘+’ angehängt werden.
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att_doubled.txt")
require("MASS")
# run a model with two way interactions included
m.step <- glm(seen ~ W + C + CW + CW:W + CW:C + C:W, family = binomial, data = dd)
summary(m.step)
##
## Call:
## glm(formula = seen ~ W + C + CW + CW:W + CW:C + C:W, family = binomial,
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7359 -0.9266 -0.6442 1.1180 1.9020
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.917e+01 9.893e+00 -1.938 0.0526 .
## W 2.443e-01 1.059e-01 2.307 0.0211 *
## C 2.645e-01 1.598e-01 1.655 0.0978 .
## CW -1.328e-01 2.388e-01 -0.556 0.5782
## W:CW 2.608e-04 2.114e-03 0.123 0.9018
## C:CW 2.720e-03 1.713e-03 1.588 0.1124
## W:C -3.981e-03 1.736e-03 -2.293 0.0218 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.88 on 97 degrees of freedom
## Residual deviance: 115.47 on 91 degrees of freedom
## AIC: 129.47
##
## Number of Fisher Scoring iterations: 4
#stepwise(m.step, trace = FALSE)
#stepwise(m.step, direction="backward/forward", criterion="AIC")
## Rcmdr::stepwise(m.step, direction="backward/forward", criterion="AIC")
stepAIC(m.step, direction='both', scope = list(upper = ~ W + C + CW + CW:W + CW:C + C:W, lower = ~ 1))
## Start: AIC=129.47
## seen ~ W + C + CW + CW:W + CW:C + C:W
##
## Df Deviance AIC
## - W:CW 1 115.49 127.49
## <none> 115.47 129.47
## - C:CW 1 118.09 130.09
## - W:C 1 122.12 134.12
##
## Step: AIC=127.49
## seen ~ W + C + CW + C:CW + W:C
##
## Df Deviance AIC
## <none> 115.49 127.49
## - C:CW 1 118.19 128.19
## + W:CW 1 115.47 129.47
## - W:C 1 124.34 134.34
##
## Call: glm(formula = seen ~ W + C + CW + C:CW + W:C, family = binomial,
## data = dd)
##
## Coefficients:
## (Intercept) W C CW C:CW W:C
## -19.548108 0.248176 0.253177 -0.107634 0.002739 -0.003872
##
## Degrees of Freedom: 97 Total (i.e. Null); 92 Residual
## Null Deviance: 130.9
## Residual Deviance: 115.5 AIC: 127.5
Im Original-Beispiel kamen nur halb so viele Probanden wie in den obigen Beispielen zum Einsatz.
# read data
dd <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att.txt")
# create inhibition score
dd['d.cw.c'] <- dd$CW - dd$C
dd['q.cw.c'] <- dd$CW / dd$C
# some descriptive impression
require(psych)
describe(dd)
## vars n mean sd median trimmed mad min max range skew
## nr 1 49 25.00 14.29 25.00 25.00 17.79 1.00 49.0 48.00 0.00
## seen 2 49 0.39 0.49 0.00 0.37 0.00 0.00 1.0 1.00 0.45
## W 3 49 99.82 18.89 100.00 99.98 17.79 57.00 156.0 99.00 0.05
## C 4 49 74.39 14.50 73.00 73.93 14.83 46.00 112.0 66.00 0.41
## CW 5 49 47.14 8.92 48.00 47.05 8.90 28.00 67.0 39.00 0.00
## d.cw.c 6 49 -27.24 11.10 -24.00 -26.44 8.90 -63.00 -5.0 58.00 -0.90
## q.cw.c 7 49 0.64 0.10 0.65 0.64 0.09 0.43 0.9 0.47 -0.15
## kurtosis se
## nr -1.27 2.04
## seen -1.84 0.07
## W 0.40 2.70
## C 0.00 2.07
## CW -0.24 1.27
## d.cw.c 0.71 1.59
## q.cw.c 0.15 0.01
# full model, only raw scores
m.a = glm(seen ~ W * C * CW, family=binomial(logit), data=dd)
summary(m.a)
##
## Call:
## glm(formula = seen ~ W * C * CW, family = binomial(logit), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8073 -0.9897 -0.5740 1.2368 1.7362
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.323e+02 8.037e+01 -1.646 0.0998 .
## W 1.316e+00 7.514e-01 1.751 0.0799 .
## C 2.129e+00 1.215e+00 1.753 0.0797 .
## CW 2.206e+00 1.659e+00 1.329 0.1837
## W:C -2.128e-02 1.140e-02 -1.866 0.0621 .
## W:CW -2.201e-02 1.530e-02 -1.439 0.1502
## C:CW -3.582e-02 2.413e-02 -1.485 0.1376
## W:C:CW 3.579e-04 2.225e-04 1.608 0.1078
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.438 on 48 degrees of freedom
## Residual deviance: 57.281 on 41 degrees of freedom
## AIC: 73.281
##
## Number of Fisher Scoring iterations: 5
# check full model (Null-Model is automatically generated)
anova(m.a, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 48 65.438
## W 1 0.0755 47 65.362 0.78351
## C 1 0.3099 46 65.052 0.57775
## CW 1 0.1061 45 64.946 0.74467
## W:C 1 2.3632 44 62.583 0.12423
## W:CW 1 0.5681 43 62.015 0.45103
## C:CW 1 1.4290 42 60.586 0.23193
## W:C:CW 1 3.3053 41 57.281 0.06906 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# full model, only CW and dependent scores
m.a = glm(seen ~ CW * d.cw.c * q.cw.c , family=binomial(logit), data=dd)
summary(m.a)
##
## Call:
## glm(formula = seen ~ CW * d.cw.c * q.cw.c, family = binomial(logit),
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7960 -0.8733 -0.7998 1.2038 1.5481
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -12.49993 35.24189 -0.355 0.723
## CW -2.00115 1.48911 -1.344 0.179
## d.cw.c -1.26358 1.15364 -1.095 0.273
## q.cw.c 23.07736 52.91598 0.436 0.663
## CW:d.cw.c 0.02908 0.02547 1.141 0.254
## CW:q.cw.c 1.73861 1.17523 1.479 0.139
## d.cw.c:q.cw.c NA NA NA NA
## CW:d.cw.c:q.cw.c -0.05223 0.04294 -1.216 0.224
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.438 on 48 degrees of freedom
## Residual deviance: 61.030 on 42 degrees of freedom
## AIC: 75.03
##
## Number of Fisher Scoring iterations: 4
# check full model (Null-Model is automatically generated)
anova(m.a, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: seen
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 48 65.438
## CW 1 0.19491 47 65.243 0.6589
## d.cw.c 1 0.01587 46 65.227 0.8997
## q.cw.c 1 1.20412 45 64.023 0.2725
## CW:d.cw.c 1 0.65144 44 63.372 0.4196
## CW:q.cw.c 1 0.78125 43 62.590 0.3768
## d.cw.c:q.cw.c 0 0.00000 43 62.590
## CW:d.cw.c:q.cw.c 1 1.56062 42 61.030 0.2116
Die Originaldaten wurden etwas “frisiert”. Zu den Originaldaten wurden ‘passende’ Beobachtungen hinzugefügt, um die Power zu erhöhen. Für Interessierte hier das Vorgehen:
# double the observations with a little noise
dd.n <- read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att.txt")
dd.n$nr <- dd.n$nr + length(dd.n$nr)
dd.n$W <- round(dd.n$W + rnorm(length(dd.n$nr), 0, sd(dd.n$W)/10))
dd.n$C <- round(dd.n$C + rnorm(length(dd.n$nr), 0, sd(dd.n$C)/10))
dd.n$CW <- round(dd.n$CW + rnorm(length(dd.n$nr), 0, sd(dd.n$CW)/10))
# 'double' the observations
dd <- rbind(read.delim("https://md.psych.bio.uni-goettingen.de/mv/data/div/stroop_vis_att.txt"), dd.n)
# give CW a little more predictive power
blind <- dd$nr[dd$seen == 0]
s.blind <- sample(blind, length(blind)/2)
# dd[s.blind,]
# dd[s.blind, c("nr", "CW", "seen")]
dd$CW[s.blind] <- round(dd$CW[s.blind] - sd(dd$CW) / 1.5)
# dd[s.blind, c("nr","CW", "seen")]
# create inhibition scores
dd['d.cw.c'] <- dd$CW - dd$C
dd['q.cw.c'] <- dd$CW / dd$C
# a descriptive impression
require(psych)
describe(dd)
## vars n mean sd median trimmed mad min max range skew
## nr 1 98 49.50 28.43 49.50 49.50 36.32 1.00 98.0 97.00 0.00
## seen 2 98 0.39 0.49 0.00 0.36 0.00 0.00 1.0 1.00 0.45
## W 3 98 99.91 18.85 100.00 100.12 18.53 57.00 156.0 99.00 0.06
## C 4 98 74.59 14.37 73.00 74.03 14.83 46.00 113.0 67.00 0.46
## CW 5 98 45.16 9.11 45.00 44.95 8.90 26.00 67.0 41.00 0.24
## d.cw.c 6 98 -29.43 11.36 -27.00 -28.69 8.90 -63.00 -5.0 58.00 -0.72
## q.cw.c 7 98 0.61 0.10 0.61 0.61 0.10 0.41 0.9 0.49 0.17
## kurtosis se
## nr -1.24 2.87
## seen -1.81 0.05
## W 0.37 1.90
## C 0.08 1.45
## CW -0.16 0.92
## d.cw.c 0.18 1.15
## q.cw.c -0.09 0.01
# save the data
##write.table(dd, "stroop_vis_att_double.txt", quote=F, sep="\t", row.names=F)
Je nach Kodierung und Festlegung der Faktorstufen sind die Interpretationen der Koeffizienten verschieden. In R dient die erste Faktorstufe als Referenz (Nicht-Eintreten); modelliert werden das Eintreten bzw. die log(odds) der zweiten Faktorstufe. Ein Vertauschen der beiden Faktorstufen invertiert lediglich das Vorzeichen der Koeffizienten.
require(tidyverse)
## Lade nötiges Paket: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble 3.1.6 ✔ dplyr 1.0.8
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ✔ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%() masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>%
rowwise() %>% # wir wollen zeilenweise vorgehen
mutate(stats_joy = mean(c(stats_joy1, stats_joy2, # mittelwert bilden
stats_joy3, stats_joy4,
stats_joy5)))
dd <- dd %>% mutate(aon_a = factor(attend_or_not, levels = c("course attended", "course not attended")))
dd <- dd %>% mutate(aon_n = factor(attend_or_not, levels = c("course not attended", "course attended")))
head(dd$aon_a)
## [1] course attended course attended course not attended
## [4] course attended course attended course attended
## Levels: course attended course not attended
head(dd$aon_n)
## [1] course attended course attended course not attended
## [4] course attended course attended course attended
## Levels: course not attended course attended
m.a <- glm(aon_a ~ stats_joy, data = dd, family = binomial())
m.n <- glm(aon_n ~ stats_joy, data = dd, family = binomial())
head(model.matrix(m.a))
## (Intercept) stats_joy
## 1 1 6.6
## 2 1 4.6
## 3 1 3.8
## 4 1 4.0
## 5 1 6.0
## 6 1 5.6
head(model.matrix(m.n))
## (Intercept) stats_joy
## 1 1 6.6
## 2 1 4.6
## 3 1 3.8
## 4 1 4.0
## 5 1 6.0
## 6 1 5.6
summary(m.a)
##
## Call:
## glm(formula = aon_a ~ stats_joy, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8789 -0.6735 -0.3730 -0.1323 2.4889
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.6812 0.5830 6.315 2.71e-10 ***
## stats_joy -1.0519 0.1301 -8.085 6.21e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 433.82 on 399 degrees of freedom
## Residual deviance: 337.01 on 398 degrees of freedom
## AIC: 341.01
##
## Number of Fisher Scoring iterations: 5
summary(m.n)
##
## Call:
## glm(formula = aon_n ~ stats_joy, family = binomial(), data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4889 0.1323 0.3730 0.6735 1.8789
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6812 0.5830 -6.315 2.71e-10 ***
## stats_joy 1.0519 0.1301 8.085 6.21e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 433.82 on 399 degrees of freedom
## Residual deviance: 337.01 on 398 degrees of freedom
## AIC: 341.01
##
## Number of Fisher Scoring iterations: 5
Vollziehen Sie das in log-reg.pdf dargestellte Rechenbeispiel für die GHQ-Daten aus Everitt nach. Die Daten und das Beispiel entstammen dem Buch Everitt (2010). In den Beispielen findet sich das in Everitt hierzu vorgestellte GHQ-Beispiel.
UCLA (University of California, Los Angeles)
A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into graduate school. The response variable, admit/don’t admit, is a binary variable.
Variable | Bedeutung |
---|---|
admit | Zulassung an UCLA (binär) |
gre | Graduate Record Examinations (Zulassungsprüfungspunkte) |
gpa | grade point average (in etwa Notendurchschnitt: hier nach descriptives mit Maximum 4 (Bestnote) (A=4, B=3, C=2, D=1, F=0)) |
rank | Bewertung der Qualität der Institution, an der man vorher gelernt hat (vier Stufen, 1..4) |
Fragen:
#df <- read.csv("https://www.ats.ucla.edu/stat/dataa/binary.csv")
df <- read.csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/example_ucla.csv")
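Eine mögliche Modellierung (hier nur als Skizze; der Modellname m.ucla ist frei gewählt) könnte so aussehen, wobei rank als Faktor behandelt wird:
# sketch: admission predicted by gre, gpa and rank (rank treated as a factor)
df$rank <- factor(df$rank)
m.ucla <- glm(admit ~ gre + gpa + rank, family = binomial(), data = df)
summary(m.ucla)
exp(coef(m.ucla))   # odds ratios per one-unit change of each predictor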
Beispiel Diabetes
If we have frequencies we can also fit a glm(). We need a column with proportion and a column with total number of observations for every combination of predictor values.
glm(..., family=binomial, weights=...)
dd.t <- readr::read_table("y x1 x2 freq
0 No 0 268
0 No 1 14
0 Yes 0 109
0 Yes 1 1
1 No 0 31
1 No 1 6
1 Yes 0 45
1 Yes 1 6
")
total<-with(dd.t,freq[1:4]+freq[5:8])
logit=glm(freq/total~x1+x2, data=dd.t[1:4,], family=binomial, weights=total)
# pearson chi square value
chi_test<-sum(residuals(logit, type = "pearson")^2);
# so the data are finally
dd <- cbind(dd.t[1:4,], total)
logit2=glm(freq/total~x1+x2, data=dd, family=binomial, weights=total)
summary(logit2)
##
## Call:
## glm(formula = freq/total ~ x1 + x2, family = binomial, data = dd,
## weights = total)
##
## Deviance Residuals:
## 1 2 3 4
## -0.2307 0.5677 0.2119 -1.0507
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.2010 0.1892 11.632 < 2e-16 ***
## x1Yes -1.3538 0.2516 -5.381 7.39e-08 ***
## x2 -1.6261 0.4313 -3.770 0.000163 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41.9209 on 3 degrees of freedom
## Residual deviance: 1.5244 on 1 degrees of freedom
## AIC: 23.149
##
## Number of Fisher Scoring iterations: 3
Idee für das einführende Beispiel: [https://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html]
example including gorilla experiment
Field (2012, p. 312) Kapitel 8: Logistic Regression
Reaktionsvariable ist dichotom bzw. binär.
Geschätzt werden die Logits der Wahrscheinlichkeit, der Zielgruppe anzugehören, bzw. die logarithmierten Chancen log(odds).
Logits lassen sich in Wahrscheinlichkeiten umrechnen, der Zielgruppe anzugehören: 1/(1 + exp(-predict(m.1, type="link")))
Die Interpretation von exp(model$coefficients) ist: Zu welcher Änderung der Chance (odds) führt eine Erhöhung des Prädiktors um 1. Die Roh-Koeffizienten in model$coefficients beziehen sich also auf die logarithmierten odds.
fitted gibt Wahrscheinlichkeiten aus, zur Zielgruppe zu gehören. predict hat als Default predict(type='link') und gibt die Logits aus (log(odds)). Mit predict(type='response') erhalten wir direkt die Wahrscheinlichkeiten, wie mit fitted.
Beispiel unter glm_logistic.html#Vorhersage_der_Werte
Wir vergleichen Modelle mit per Maximum Likelihood geschätzten Parametern. Deren Model-Fit wird u. a. durch auf der Likelihood basierende Kenngrößen ausgedrückt: -2LL bzw. Deviances. Wir benutzen anova(..., test = "Chisq"), da die Likelihood-Unterschiede Chi^2-verteilt sind; dies müssen wir im Befehl anova() mit dem Parameter test = "Chisq" angeben.
\(p(y_i) = \frac{1}{1 + e^{-(b_0 + b_1x_{1i})}}\)
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>% rowwise() %>% mutate(stats_joy = mean(c(stats_joy1, stats_joy2, stats_joy3, stats_joy4, stats_joy5)))
dd <- dd %>% mutate(attend_or_not.f = factor(attend_or_not, levels = c("course not attended", "course attended")))
dd <- dd %>% mutate(att.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.0 <- glm(att.bin ~ 1, data = dd, family = binomial())
m.1 <- glm(att.bin ~ stats_joy, data = dd, family = binomial())
m.1$coefficients
## (Intercept) stats_joy
## -3.681209 1.051926
exp(m.1$coefficients)
## (Intercept) stats_joy
## 0.02519251 2.86315905
# we use our model to see the change in the outcome that is produced by 1 step in the predictor
# the difference is expressed in logit or log(odds) units.
# we define some data
new.d <- data.frame(stats_joy=c(1, 2, 3, 4))
# and predict the values
predict(m.1, newdata = new.d) -> n.d
n.d
## 1 2 3 4
## -2.6292831 -1.5773575 -0.5254319 0.5264937
n.d[1] - n.d[2]
## 1
## -1.051926
n.d[2] - n.d[3]
## 2
## -1.051926
n.d[3] - n.d[4]
## 3
## -1.051926
# logit(p) or log(odds) or log(p/(1-p)) changes by 1.0519 when the average joy on statistics increases by 1
# psych package offers commands logit() and logistic() to convert probabilities to logits and vice versa
# logistic function (1/(1+exp(-x)) # logits given, probabilities result
# logit function (log(p/(1-p)) # probabilities given, logits result
# so we can calculate the actual probabilities given the logits
ll <- predict(m.1, type="link")
head(ll)
## 1 2 3 4 5 6
## 3.2615002 1.1576490 0.3161086 0.5264937 2.6303448 2.2095746
head(psych::logistic(ll))
## 1 2 3 4 5 6
## 0.9630842 0.7609053 0.5783756 0.6286649 0.9327892 0.9011060
pp <- 1/(1 + exp(-predict(m.1, type="link")))
head(pp)
## 1 2 3 4 5 6
## 0.9630842 0.7609053 0.5783756 0.6286649 0.9327892 0.9011060
pp <- (predict(m.1, type="response"))
head(pp)
## 1 2 3 4 5 6
## 0.9630842 0.7609053 0.5783756 0.6286649 0.9327892 0.9011060
head(psych::logit(pp))
## 1 2 3 4 5 6
## 3.2615002 1.1576490 0.3161086 0.5264937 2.6303448 2.2095746
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>% rowwise() %>% mutate(stats_joy = mean(c(stats_joy1, stats_joy2, stats_joy3, stats_joy4, stats_joy5)))
dd <- dd %>% mutate(attend_or_not.f = factor(attend_or_not, levels = c("course not attended", "course attended")))
dd <- dd %>% mutate(att.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.0 <- glm(att.bin ~ 1, data = dd, family = binomial())
m.1 <- glm(att.bin ~ stats_joy, data = dd, family = binomial())
# we want to see the likelihood
exp(logLik(m.1))
## 'log Lik.' 6.588736e-74 (df=2)
# this is very small and not easy to handle. So we use log(likelihood) usually.
logLik(m.1)
## 'log Lik.' -168.5059 (df=2)
# or even more common -2LL
-2*logLik(m.1)
## 'log Lik.' 337.0119 (df=2)
# by the way, this is, what we call deviance
m.1$deviance
## [1] 337.0119
# ... and it is
sum(residuals(m.1)^2)
## [1] 337.0119
# in model comparison we compare it with a model that uses a constant as predictor
# m.0 <- glm(attend_or_not ~ 1, data = stats_lectures, family = binomial())
m.0$deviance
## [1] 433.8236
# this is done automatically and saved to an attribute in the model
m.1$null.deviance
## [1] 433.8236
# we compare these deviances to see, whether the expanded model is better
anova(m.0, m.1, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: att.bin ~ 1
## Model 2: att.bin ~ stats_joy
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 399 433.82
## 2 398 337.01 1 96.812 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# AIC is just a modified -2LL, where the number of parameters penalizes the model fit
AIC(m.1)
## [1] 341.0119
m.1$deviance + length(m.1$coefficients) * 2
## [1] 341.0119
AIC(m.0)
## [1] 435.8236
# BIC accounts for sample size on top of AIC
BIC(m.1)
## [1] 348.9948
m.1$deviance + length(m.1$coefficients) * log(nrow(dd))
## [1] 348.9948
BIC(m.0)
## [1] 439.8151
# the smaller AIC or BIC, the better data are fitted by the model when comparing hierarchical models
# package DescTools offers PseudoR2, take care of capital P
DescTools::PseudoR2(m.0, which="Nagelkerke")
## Nagelkerke
## 0
DescTools::PseudoR2(m.1, which="Nagelkerke")
## Nagelkerke
## 0.3247483
# and the real check of correctly identified attending students
cutoff = 0.5
table(dd$att.bin, fitted(m.1) > cutoff)
##
## FALSE TRUE
## 0 35 58
## 1 17 290
chisq.test(table(dd$att.bin, fitted(m.1) > cutoff))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(dd$att.bin, fitted(m.1) > cutoff)
## X-squared = 62.21, df = 1, p-value = 3.087e-15
AIC ist eine Modifikation von -2LL. Bestraft wird für die Aufnahme von Parametern:
\(AIC = -2LL + 2k\), wobei k die Anzahl der geschätzten Parameter (inkl. Intercept) ist.
Beispiel: AIC(m.1) = 341.0118708 = m.1$deviance + 2*2.
Gegenüber AIC bezieht BIC auch noch die Stichprobengröße n in die Bewertung des Modells mit ein:
\(BIC = -2LL + k \cdot log(n)\)
Bei hierarchischen Modellen ist das Modell mit dem geringeren AIC bzw. BIC das bessere Modell, also das Modell, das weniger “unerklärte Varianz” in der Reaktionsvariable übriglässt.
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
## Rows: 400 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): attend_or_not
## dbl (7): temperature, sun_joy, stats_joy1, stats_joy2, stats_joy3, stats_joy...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dd <- dd %>% mutate(attend.bin = ifelse(attend_or_not == "course attended", 1, 0))
dd <- dd %>% rowwise() %>%
mutate(stats_joy = mean(c(stats_joy1, stats_joy2, stats_joy3, stats_joy4, stats_joy5)))
dd <- dd %>% mutate(attend_or_not = factor(attend_or_not, levels = c("course not attended", "course attended")))
dd <- dd %>% mutate(attend.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.1 <- glm(attend.bin ~ stats_joy, data = dd, family = binomial())
# this can be plotted by a 2 dimensional plot
ggplot(dd, aes(x=stats_joy, y=fitted(m.1))) + geom_point()
# more complex models might be visualized by 3d plots
m.4 <- glm(attend.bin ~ stats_joy + temperature +
sun_joy + temperature:sun_joy,
data = dd, family = binomial())
dd$fitted <- fitted(m.4)
require(rockchalk)
## Lade nötiges Paket: rockchalk
##
## Attache Paket: 'rockchalk'
##
## Das folgende Objekt ist maskiert 'package:dplyr':
##
## summarize
##
## Das folgende Objekt ist maskiert 'package:MASS':
##
## mvrnorm
plotPlane(model = m.4, plotx1 = "temperature", plotx2 = "sun_joy")
plotPlane(model = m.4, plotx1 = "temperature", plotx2 = "sun_joy", theta = 30, phi =30)
The interactive graph is not rendered when the document is knitted, so evaluating the chunk there makes no sense. We therefore set eval = FALSE in the R chunk header.
library(tidyverse)
dd <- read_csv("https://md.psych.bio.uni-goettingen.de/mv/data/div/stats_lectures.csv")
dd <- dd %>% mutate(attend.bin = ifelse(attend_or_not == "course attended", 1, 0))
m.i <- glm(attend.bin ~ temperature + sun_joy + temperature:sun_joy, data = dd, family = binomial())
require(rgl)
open3d()
plot3d(x=dd$temperature, y=dd$sun_joy, z=dd$attend.bin, col="red", size=4)
grd <- expand.grid(temperature = sort(unique(dd$temperature)), sun_joy = sort(unique(dd$sun_joy)) )
# predicted values are logits, we have to convert them to probabilities that we want to display
grd$pred <- 1/(1 + exp(-predict(m.i, type="link", newdata = grd)))
persp3d(x=unique(grd[[1]]), y=unique(grd[[2]]),
z=matrix(grd[[3]],length(unique(grd[[1]])),length(unique(grd[[2]]))), add=TRUE)