Erfasst wird die Intensität des Vergleichs der eigenen, körperlichen Attraktivität mit der anderer Personen.
Sie fragen sich, ob sich übergewichtige bzw. untergewichtige Personen intensivere Körpervergleiche anstellen als normalgewichtige Personen. Für Übergewicht bzw. Untergewicht hat sich international der body-mass-index (BMI) durchgesetzt. Der BMI und seine Bewertung definiert sich wie folgt:
BMI = Körpergewicht / (Körpergröße * Körpergröße)
BMI = kg / (m * m)
BMI Klassen:
BMI unter 17.5 -> Untergewicht, Anorexie
BMI über 24 -> Übergewicht
BMI über 30 -> adipös
Der maximale, bisher gemessene BMI beträgt 204 Quelle
Sie haben von einer Reihe von Personen Körpergröße und -gewicht erfasst die in der Datei v_bmi.txt aufgelistet sind. Das Ausmaß des Körpervergleichs wurde über die 5 Items der PACS erfasst.
Von der Physical Appearance Comparison Scale PACS Website:
THE PHYSICAL APPEARANCE COMPARISON SCALE (PACS)
Using the following scale please select a number that comes closest to how you feel:
Never Seldom Sometimes Often Always 1 2 3 4 5
- At parties or other social events, I compare my physical appearance to the physical appearance of others.
- The best way for a person to know if they are overweight or underweight is to compare their figure to the figure of others.
- At parties or other social events, I compare how I am dressed to how other people are dressed.
- Comparing your “looks” to the “looks” of others is a bad way to determine if you are attractive or unattractive.
- In social situations, I sometimes compare my figure to the figures of other people.
Item 4 is reverse-scored
Übersetzungsversuch:
Benennungsversuche der Skala:
bilden Sie den BMI und fügen Sie ihn als neue Spalte an den DataFrame
bilden Sie eine neue Variable ‘bmi_class’ im Dataframe, in der die Klassifikation in ‘low,’ ‘high,’ ‘obese’ und ‘normal’ codiert wird
machen Sie diese Variable zu einem Faktor
bilden Sie eine neue Variable ‘bmi_dichotom’ als neue Spalte im Dataframe, in der die Klassifikation ‘normal’ und ‘extreme’ gemacht wird
bilden Sie eine neue Variable ‘bmi_normal’ als neue Spalte im Dataframe, in der die Klassifikation ‘normal’ als wahr oder falsch codiert wird
erstellen Sie eine neue Variable ‘pacs_sum’ in der Sie den PACS-Score als Summe der PACS-Items speichern. Achten Sie auf die Polung.
erstellen Sie eine neue Variable ‘pacs’ in der Sie den PACS-Score als Mittelwert der PACS-Items speichern. Achten Sie auf die Polung.
fügen Sie die neu berechneten Variablen an den Dataframe an
benutzen Sie die neu gebildeten Variablen, um sich Teil-Datensätze ausgeben zu lassen
speichern Sie den erweiterten Dataframe in ihrem Working Directory
[res_begin:pacs1]
# read data
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt")
# ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt", fileEncoding = "UTF-8") # read like that if there is coding mess on your platform
# dplyr must be available
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# check classes
# invert negativ item
# aggregate to scale
# compute bmi
#ddf["bmi"] <- ddf$weight / (ddf$height / 100) ** 2
ddf <- ddf %>% dplyr::mutate(bmi = weight / (height / 100) ** 2)
# control data
head(ddf)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 1 Meier 2 180 60 2.02 2 1 2
## 2 2 Müller 2 176 54 1.00 3 4 3
## 3 3 Strauß 1 180 110 2.03 5 5 5
## 4 4 Übeling 1 182 52 1.00 4 5 4
## 5 5 Ärger 1 160 62 1.00 3 2 2
## 6 6 Östro 1 156 41 1.00 3 4 3
## c_bad_way c_figure filling_time bmi
## 1 4 1 39.63 18.51852
## 2 1 5 26.00 17.43285
## 3 1 5 26.00 33.95062
## 4 2 3 26.00 15.69859
## 5 4 1 26.00 24.21875
## 6 3 4 26.00 16.84747
# get all persons with bmi < 17.5
#ddf$bmi[ddf$bmi < 17.5]
dplyr::filter(ddf, bmi < 17.5)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 2 Müller 2 176 54 1.00 3 4 3
## 2 4 Übeling 1 182 52 1.00 4 5 4
## 3 6 Östro 1 156 41 1.00 3 4 3
## 4 9 Lully 1 159 44 2.26 4 3 3
## 5 10 Lumière 2 189 60 1.21 5 4 5
## 6 16 Grimm 1 165 47 2.06 3 2 3
## 7 18 Panik 2 175 53 1.00 4 3 3
## 8 26 Cyrill 2 180 54 2.48 4 5 5
## 9 27 Compu 2 184 58 1.00 3 4 3
## 10 29 Miller 2 183 53 1.00 5 5 4
## c_bad_way c_figure filling_time bmi
## 1 1 5 26.00 17.43285
## 2 2 3 26.00 15.69859
## 3 3 4 26.00 16.84747
## 4 2 4 31.41 17.40437
## 5 1 4 32.82 16.79684
## 6 4 1 48.15 17.26354
## 7 2 4 29.48 17.30612
## 8 1 5 29.56 16.66667
## 9 2 3 40.12 17.13138
## 10 1 5 26.00 15.82609
# or those with overweight and obese ones
#ddf$bmi[ddf$bmi > 24]
dplyr::filter(ddf, bmi > 24)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 3 Strauß 1 180 110 2.03 5 5 5
## 2 5 Ärger 1 160 62 1.00 3 2 2
## 3 7 Valentin 1 165 71 2.47 4 3 4
## 4 8 Marquéz 1 160 70 2.37 4 4 4
## 5 12 Russki 2 192 130 2.97 4 4 5
## 6 13 Berking 1 187 91 1.87 3 4 3
## 7 19 Ütülümek 2 180 90 1.00 3 4 4
## 8 20 Chorvátc 1 177 80 3.24 4 3 5
## 9 21 Clánku 2 179 90 1.00 3 5 4
## 10 22 Eiland 2 182 100 1.60 5 5 5
## c_bad_way c_figure filling_time bmi
## 1 1 5 26.00 33.95062
## 2 4 1 26.00 24.21875
## 3 2 3 26.00 26.07897
## 4 2 4 26.00 27.34375
## 5 1 3 33.33 35.26476
## 6 2 5 26.16 26.02305
## 7 2 3 26.77 27.77778
## 8 1 4 113.48 25.53545
## 9 2 4 29.63 28.08901
## 10 1 4 34.28 30.18959
#ddf$bmi[ddf$bmi > 30]
dplyr::filter(ddf, bmi > 30)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 3 Strauß 1 180 110 2.03 5 5 5
## 2 12 Russki 2 192 130 2.97 4 4 5
## 3 22 Eiland 2 182 100 1.60 5 5 5
## c_bad_way c_figure filling_time bmi
## 1 1 5 26.00 33.95062
## 2 1 3 33.33 35.26476
## 3 1 4 34.28 30.18959
# compute factor bmi class
ddf["bmi_class"] <- NA # default is missing
dplyr::mutate(ddf, bmi_class = NA)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 1 Meier 2 180 60 2.02 2 1 2
## 2 2 Müller 2 176 54 1.00 3 4 3
## 3 3 Strauß 1 180 110 2.03 5 5 5
## 4 4 Übeling 1 182 52 1.00 4 5 4
## 5 5 Ärger 1 160 62 1.00 3 2 2
## 6 6 Östro 1 156 41 1.00 3 4 3
## 7 7 Valentin 1 165 71 2.47 4 3 4
## 8 8 Marquéz 1 160 70 2.37 4 4 4
## 9 9 Lully 1 159 44 2.26 4 3 3
## 10 10 Lumière 2 189 60 1.21 5 4 5
## 11 11 Bâtist 1 158 45 1.27 2 2 2
## 12 12 Russki 2 192 130 2.97 4 4 5
## 13 13 Berking 1 187 91 1.87 3 4 3
## 14 14 Møllar 2 193 86 1.62 2 2 1
## 15 15 Abdullah 2 173 56 1.00 1 2 1
## 16 16 Grimm 1 165 47 2.06 3 2 3
## 17 17 Guevarra 1 167 52 1.00 2 1 1
## 18 18 Panik 2 175 53 1.00 4 3 3
## 19 19 Ütülümek 2 180 90 1.00 3 4 4
## 20 20 Chorvátc 1 177 80 3.24 4 3 5
## 21 21 Clánku 2 179 90 1.00 3 5 4
## 22 22 Eiland 2 182 100 1.60 5 5 5
## 23 23 Kromer 2 178 60 1.60 1 1 1
## 24 24 Brauer 2 181 75 1.30 2 1 3
## 25 25 Büstisch 1 185 80 1.62 1 2 1
## 26 26 Cyrill 2 180 54 2.48 4 5 5
## 27 27 Compu 2 184 58 1.00 3 4 3
## 28 28 Messer 2 179 57 1.00 3 3 4
## 29 29 Miller 2 183 53 1.00 5 5 4
## 30 30 Terminal 1 162 54 2.15 2 1 1
## c_bad_way c_figure filling_time bmi bmi_class
## 1 4 1 39.63 18.51852 NA
## 2 1 5 26.00 17.43285 NA
## 3 1 5 26.00 33.95062 NA
## 4 2 3 26.00 15.69859 NA
## 5 4 1 26.00 24.21875 NA
## 6 3 4 26.00 16.84747 NA
## 7 2 3 26.00 26.07897 NA
## 8 2 4 26.00 27.34375 NA
## 9 2 4 31.41 17.40437 NA
## 10 1 4 32.82 16.79684 NA
## 11 4 2 26.00 18.02596 NA
## 12 1 3 33.33 35.26476 NA
## 13 2 5 26.16 26.02305 NA
## 14 5 1 26.00 23.08787 NA
## 15 5 2 40.00 18.71095 NA
## 16 4 1 48.15 17.26354 NA
## 17 5 3 39.05 18.64534 NA
## 18 2 4 29.48 17.30612 NA
## 19 2 3 26.77 27.77778 NA
## 20 1 4 113.48 25.53545 NA
## 21 2 4 29.63 28.08901 NA
## 22 1 4 34.28 30.18959 NA
## 23 5 1 26.00 18.93700 NA
## 24 5 2 37.44 22.89307 NA
## 25 4 2 30.51 23.37473 NA
## 26 1 5 29.56 16.66667 NA
## 27 2 3 40.12 17.13138 NA
## 28 1 5 29.99 17.78971 NA
## 29 1 5 26.00 15.82609 NA
## 30 4 1 88.70 20.57613 NA
#ddf$bmi_class[ddf$bmi < 17.5] = 'low'
ddf <- ddf %>% mutate(bmi_class = replace(bmi_class, bmi < 17.5, 'low'))
# check what happened
head(ddf)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 1 Meier 2 180 60 2.02 2 1 2
## 2 2 Müller 2 176 54 1.00 3 4 3
## 3 3 Strauß 1 180 110 2.03 5 5 5
## 4 4 Übeling 1 182 52 1.00 4 5 4
## 5 5 Ärger 1 160 62 1.00 3 2 2
## 6 6 Östro 1 156 41 1.00 3 4 3
## c_bad_way c_figure filling_time bmi bmi_class
## 1 4 1 39.63 18.51852 <NA>
## 2 1 5 26.00 17.43285 low
## 3 1 5 26.00 33.95062 <NA>
## 4 2 3 26.00 15.69859 low
## 5 4 1 26.00 24.21875 <NA>
## 6 3 4 26.00 16.84747 low
# ddf$bmi_class[ddf$bmi > 24] <- 'high'
# ddf$bmi_class[ddf$bmi > 30] <- 'obese'
ddf <- ddf %>% mutate(bmi_class = replace(bmi_class, bmi > 24, 'high'))
ddf <- ddf %>% mutate(bmi_class = replace(bmi_class, bmi > 30, 'obese'))
# set normal weight
#ddf$bmi_class[ddf$bmi >= 17.5 & ddf$bmi <= 24] <- 'normal'
ddf <- ddf %>% mutate(bmi_class = replace(bmi_class, bmi >= 17.5 & bmi <= 24, 'normal'))
# make it a factor
ddf <- ddf %>% mutate(bmi_class = as.factor(bmi_class))
# control it
class(ddf$bmi_class)
## [1] "factor"
str(ddf$bmi_class)
## Factor w/ 4 levels "high","low","normal",..: 3 2 4 2 1 2 1 1 2 2 ...
head(ddf)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 1 Meier 2 180 60 2.02 2 1 2
## 2 2 Müller 2 176 54 1.00 3 4 3
## 3 3 Strauß 1 180 110 2.03 5 5 5
## 4 4 Übeling 1 182 52 1.00 4 5 4
## 5 5 Ärger 1 160 62 1.00 3 2 2
## 6 6 Östro 1 156 41 1.00 3 4 3
## c_bad_way c_figure filling_time bmi bmi_class
## 1 4 1 39.63 18.51852 normal
## 2 1 5 26.00 17.43285 low
## 3 1 5 26.00 33.95062 obese
## 4 2 3 26.00 15.69859 low
## 5 4 1 26.00 24.21875 high
## 6 3 4 26.00 16.84747 low
# create dichotomous variable bmi_normal asl TRUE/FALSE vector
# compute factor bmi normal vs extreme
ddf <- ddf %>% mutate(bmi_normal = NA)
ddf <- ddf %>% mutate(bmi_normal = bmi_class == 'normal')
ddf <- ddf %>% mutate(bmi_dichotom = NA)
ddf <- ddf %>% mutate(bmi_dichotom = replace(bmi_dichotom, bmi_class == 'normal', 'normal'))
ddf <- ddf %>% mutate(bmi_dichotom = replace(bmi_dichotom, bmi_class != 'normal', 'extreme'))
# make it a factor
ddf <- ddf %>% mutate(bmi_dichotom = as.factor(bmi_dichotom))
#ddf$bmi_dichotom <- as.factor(ddf$bmi_dichotom)
head(ddf)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 1 Meier 2 180 60 2.02 2 1 2
## 2 2 Müller 2 176 54 1.00 3 4 3
## 3 3 Strauß 1 180 110 2.03 5 5 5
## 4 4 Übeling 1 182 52 1.00 4 5 4
## 5 5 Ärger 1 160 62 1.00 3 2 2
## 6 6 Östro 1 156 41 1.00 3 4 3
## c_bad_way c_figure filling_time bmi bmi_class bmi_normal bmi_dichotom
## 1 4 1 39.63 18.51852 normal TRUE normal
## 2 1 5 26.00 17.43285 low FALSE extreme
## 3 1 5 26.00 33.95062 obese FALSE extreme
## 4 2 3 26.00 15.69859 low FALSE extreme
## 5 4 1 26.00 24.21875 high FALSE extreme
## 6 3 4 26.00 16.84747 low FALSE extreme
# create an inverted pacs item no 4 c_bad_way
# ddf$i_c_bad_way = 6 - ddf$c_bad_way
ddf <- ddf %>% dplyr::mutate(i_c_bad_way = 6 - c_bad_way)
# define names of items in array
#p_items <- c('c_phys_app', 'c_good_way', 'c_dress', 'i_c_bad_way', 'c_figure')
# aggregate the items
# sum the items
ddf %>% rowwise() %>% dplyr::mutate(pacs_sum = sum(c(c_phys_app, c_good_way, c_dress, i_c_bad_way, c_figure))) -> ddf
# better build mean
ddf %>% rowwise() %>% dplyr::mutate(pacs = mean(c(c_phys_app, c_good_way, c_dress, i_c_bad_way, c_figure))) -> ddf
# look at newly generated variables
head(ddf %>% select(subj, bmi:pacs))
## # A tibble: 6 x 8
## # Rowwise:
## subj bmi bmi_class bmi_normal bmi_dichotom i_c_bad_way pacs_sum pacs
## <int> <dbl> <fct> <lgl> <fct> <dbl> <dbl> <dbl>
## 1 1 18.5 normal TRUE normal 2 8 1.6
## 2 2 17.4 low FALSE extreme 5 20 4
## 3 3 34.0 obese FALSE extreme 5 25 5
## 4 4 15.7 low FALSE extreme 4 20 4
## 5 5 24.2 high FALSE extreme 2 10 2
## 6 6 16.8 low FALSE extreme 3 17 3.4
# we safe the dataframe to our working directory
# we change default values of write.data() to have the format read.delim() needs
# write.table(ddf, quote=R, sep='\t', row.names=F)
[res_end:pacs1]
dplyr()[res_begin:pacs2]
# read data
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt")
# ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt", fileEncoding = "UTF-8") # read like that if there is coding mess on your platform
# check classes
# invert negativ item
# aggregate to scale
# compute bmi
ddf["bmi"] <- ddf$weight / (ddf$height / 100) ** 2
# control data
head(ddf)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 1 Meier 2 180 60 2.02 2 1 2
## 2 2 Müller 2 176 54 1.00 3 4 3
## 3 3 Strauß 1 180 110 2.03 5 5 5
## 4 4 Übeling 1 182 52 1.00 4 5 4
## 5 5 Ärger 1 160 62 1.00 3 2 2
## 6 6 Östro 1 156 41 1.00 3 4 3
## c_bad_way c_figure filling_time bmi
## 1 4 1 39.63 18.51852
## 2 1 5 26.00 17.43285
## 3 1 5 26.00 33.95062
## 4 2 3 26.00 15.69859
## 5 4 1 26.00 24.21875
## 6 3 4 26.00 16.84747
# get all persons with bmi < 17.5
ddf$bmi[ddf$bmi < 17.5]
## [1] 17.43285 15.69859 16.84747 17.40437 16.79684 17.26354 17.30612 16.66667
## [9] 17.13138 15.82609
# or those with overweight and obese ones
ddf$bmi[ddf$bmi > 24]
## [1] 33.95062 24.21875 26.07897 27.34375 35.26476 26.02305 27.77778 25.53545
## [9] 28.08901 30.18959
ddf$bmi[ddf$bmi > 30]
## [1] 33.95062 35.26476 30.18959
# compute factor bmi class
ddf["bmi_class"] <- NA # default is missing
ddf$bmi_class[ddf$bmi < 17.5] = 'low'
ddf$bmi_class[ddf$bmi > 24] <- 'high'
ddf$bmi_class[ddf$bmi > 30] <- 'obese'
# set normal weight
ddf$bmi_class[ddf$bmi >= 17.5 & ddf$bmi <= 24] <- 'normal'
# make it a factor
ddf$bmi_class <- as.factor(ddf$bmi_class)
# control it
str(ddf$bmi_class)
## Factor w/ 4 levels "high","low","normal",..: 3 2 4 2 1 2 1 1 2 2 ...
# compute factor bmi normal vs extreme
ddf$bmi_dichotom <- NA
ddf$bmi_dichotom[ddf$bmi_class == 'normal'] <- 'normal'
ddf$bmi_dichotom[ddf$bmi_class != 'normal'] <- 'extreme'
# make it a factor
ddf$bmi_dichotom <- as.factor(ddf$bmi_dichotom)
# create dichotomous variable bmi_normal asl TRUE/FALSE vector
ddf$bmi_normal <- NA
ddf$bmi_normal[ddf$bmi_class == 'normal'] <- TRUE
ddf$bmi_normal[ddf$bmi_class != 'normal'] <- FALSE
# create an inverted pacs item no 4 c_bad_way
ddf$i_c_bad_way = 6 - ddf$c_bad_way
# define names of items in array
p_items <- c('c_phys_app', 'c_good_way', 'c_dress', 'i_c_bad_way', 'c_figure')
# aggregate the items
ddf$pacs_sum <- apply(ddf[p_items], 1, sum)
ddf$pacs <- apply(ddf[p_items], 1, mean)
# look at it
head(ddf)
## subj name gender height weight grade c_phys_app c_good_way c_dress
## 1 1 Meier 2 180 60 2.02 2 1 2
## 2 2 Müller 2 176 54 1.00 3 4 3
## 3 3 Strauß 1 180 110 2.03 5 5 5
## 4 4 Übeling 1 182 52 1.00 4 5 4
## 5 5 Ärger 1 160 62 1.00 3 2 2
## 6 6 Östro 1 156 41 1.00 3 4 3
## c_bad_way c_figure filling_time bmi bmi_class bmi_dichotom bmi_normal
## 1 4 1 39.63 18.51852 normal normal TRUE
## 2 1 5 26.00 17.43285 low extreme FALSE
## 3 1 5 26.00 33.95062 obese extreme FALSE
## 4 2 3 26.00 15.69859 low extreme FALSE
## 5 4 1 26.00 24.21875 high extreme FALSE
## 6 3 4 26.00 16.84747 low extreme FALSE
## i_c_bad_way pacs_sum pacs
## 1 2 8 1.6
## 2 5 20 4.0
## 3 5 25 5.0
## 4 4 20 4.0
## 5 2 10 2.0
## 6 3 17 3.4
# we safe the dataframe to our working directory
# we change default values of write.data() to have the format read.delim() needs
# write.table(ddf, quote=R, sep='\t', row.names=F)
[res_end:pacs2]
Das Beispielmodell:
\[ y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \beta_3x_{1i}x_{2i} + \epsilon_i \]
Interaktion zweier stetiger erklärender Variablen: Vergleichsintensität und Schulnoten als Vorhersage der Zufriedenheit mit dem eigenen Körper
Idee bzw. Fragen: - Ist die Vorhersage von bodysatisf durch die Vergleichsintensität abhängig von der schulischen Leistungsfähigkeit, also von den Schulnoten?
Äquivalenz verschiedener Formulae.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
# create new variable pacs * grade
ddf$pacs_grade <- ddf$pacs * ddf$grade
# create various models
m.prod <- lm(bodysatisf ~ pacs + grade + pacs_grade, data=ddf)
summary(m.prod)
##
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs_grade, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.121 -5.193 -0.333 4.027 17.742
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.796 14.572 0.810 0.426
## pacs 18.176 3.942 4.611 9.39e-05 ***
## grade -5.560 9.084 -0.612 0.546
## pacs_grade -1.779 2.371 -0.750 0.460
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared: 0.8806, Adjusted R-squared: 0.8668
## F-statistic: 63.91 on 3 and 26 DF, p-value: 3.958e-12
m.expl <- lm(bodysatisf ~ pacs + grade + pacs:grade, data=ddf)
summary(m.expl)
##
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs:grade, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.121 -5.193 -0.333 4.027 17.742
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.796 14.572 0.810 0.426
## pacs 18.176 3.942 4.611 9.39e-05 ***
## grade -5.560 9.084 -0.612 0.546
## pacs:grade -1.779 2.371 -0.750 0.460
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared: 0.8806, Adjusted R-squared: 0.8668
## F-statistic: 63.91 on 3 and 26 DF, p-value: 3.958e-12
m.impl <- lm(bodysatisf ~ pacs * grade, data=ddf)
summary(m.impl)
##
## Call:
## lm(formula = bodysatisf ~ pacs * grade, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.121 -5.193 -0.333 4.027 17.742
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.796 14.572 0.810 0.426
## pacs 18.176 3.942 4.611 9.39e-05 ***
## grade -5.560 9.084 -0.612 0.546
## pacs:grade -1.779 2.371 -0.750 0.460
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared: 0.8806, Adjusted R-squared: 0.8668
## F-statistic: 63.91 on 3 and 26 DF, p-value: 3.958e-12
m.prod
##
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs_grade, data = ddf)
##
## Coefficients:
## (Intercept) pacs grade pacs_grade
## 11.796 18.176 -5.560 -1.779
m.expl
##
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs:grade, data = ddf)
##
## Coefficients:
## (Intercept) pacs grade pacs:grade
## 11.796 18.176 -5.560 -1.779
m.impl
##
## Call:
## lm(formula = bodysatisf ~ pacs * grade, data = ddf)
##
## Coefficients:
## (Intercept) pacs grade pacs:grade
## 11.796 18.176 -5.560 -1.779
# and the model tests
# create model without interaction
m.add <- lm(bodysatisf ~ pacs + grade, data=ddf)
# test of interaction via model comparison
anova(m.add, m.prod)
## Analysis of Variance Table
##
## Model 1: bodysatisf ~ pacs + grade
## Model 2: bodysatisf ~ pacs + grade + pacs_grade
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 27 1609.2
## 2 26 1575.1 1 34.094 0.5628 0.4599
anova(m.add, m.expl)
## Analysis of Variance Table
##
## Model 1: bodysatisf ~ pacs + grade
## Model 2: bodysatisf ~ pacs + grade + pacs:grade
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 27 1609.2
## 2 26 1575.1 1 34.094 0.5628 0.4599
anova(m.add, m.impl)
## Analysis of Variance Table
##
## Model 1: bodysatisf ~ pacs + grade
## Model 2: bodysatisf ~ pacs * grade
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 27 1609.2
## 2 26 1575.1 1 34.094 0.5628 0.4599
# and the default summaries
summary(m.add)
##
## Call:
## lm(formula = bodysatisf ~ pacs + grade, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.8970 -5.1541 -0.0602 3.9541 18.0670
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.096 4.842 4.563 9.85e-05 ***
## pacs 15.346 1.137 13.494 1.62e-13 ***
## grade -12.171 2.189 -5.560 6.80e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.72 on 27 degrees of freedom
## Multiple R-squared: 0.878, Adjusted R-squared: 0.869
## F-statistic: 97.16 on 2 and 27 DF, p-value: 4.629e-13
summary(m.impl)
##
## Call:
## lm(formula = bodysatisf ~ pacs * grade, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.121 -5.193 -0.333 4.027 17.742
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.796 14.572 0.810 0.426
## pacs 18.176 3.942 4.611 9.39e-05 ***
## grade -5.560 9.084 -0.612 0.546
## pacs:grade -1.779 2.371 -0.750 0.460
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared: 0.8806, Adjusted R-squared: 0.8668
## F-statistic: 63.91 on 3 and 26 DF, p-value: 3.958e-12
# get standardized coefficients
require(QuantPsyc)
## Loading required package: QuantPsyc
## Loading required package: boot
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
##
## norm
lm.beta(m.add)
## pacs grade
## 0.9182542 -0.3783276
lm.beta(m.impl)
## Warning in b * sx: Länge des längeren Objektes
## ist kein Vielfaches der Länge des kürzeren Objektes
## pacs grade pacs:grade
## 1.0875504 -0.1728437 -0.1064520
rm(m.add, m.prod, m.expl, m.impl)
Hier dargestellt an dem einfachen Fall von zwei Vorhersagevariablen, so dass die Verhältnisse noch in einem dreidimensionalen Raum vorstellbar sind. Wenn ein Interaktionsterm mit im Spiel ist, ist die Vorhersage abhängig von der Kombination der Vorhersagevariablen. Halten wir Vorhersagevariable fest und lassen die andere laufen, bekommen wir eine Regressionsgerade mit einer bestimmten Steigung. Halten wir diese Vorhersagevariablen auf einem anderen Wert fest und lassen die andere laufen, bekommen wir eine Regressionsgerade mit einer anderen Steigung.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)
# take a look at the relevant dat columns
head(cbind(ddf$subj, ddf$bodysatisf, ddf$pacs, ddf$grade))
## [,1] [,2] [,3] [,4]
## [1,] 1 16 1.6 2.02
## [2,] 2 76 4.0 1.00
## [3,] 3 68 5.0 2.03
## [4,] 4 70 4.0 1.00
## [5,] 5 45 2.0 1.00
## [6,] 6 58 3.4 1.00
# we predict subj 4 'manually'
m.i$coefficients['(Intercept)'] +
ddf$pacs[4] * m.i$coefficients['pacs'] +
ddf$grade[4] * m.i$coefficients['grade'] +
ddf$pacs[4] * ddf$grade[4] * m.i$coefficients['pacs:grade']
## (Intercept)
## 71.82169
head(predict.lm(m.i))
## 1 2 3 4 5 6
## 23.89501 71.82169 73.32874 71.82169 39.02869 61.98379
head(residuals.lm(m.i))
## 1 2 3 4 5 6
## -7.895006 4.178311 -5.328739 -1.821689 5.971309 -3.983789
head(ddf$bodysatisf)
## [1] 16 76 68 70 45 58
# coefficient in model summary is inclination for mean
Mit der library(rockchalk)
df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=df)
require(rockchalk)
## Loading required package: rockchalk
##
## Attaching package: 'rockchalk'
## The following object is masked from 'package:QuantPsyc':
##
## meanCenter
## The following object is masked from 'package:MASS':
##
## mvrnorm
## The following object is masked from 'package:dplyr':
##
## summarize
plotPlane(model = m.i, plotx1 = "pacs", plotx2 = "grade")
# or to emphasize the view of the interaction
plotPlane(model = m.i, plotx1 = "pacs", plotx2 = "grade", theta = 0, phi = 10)
plotPlane(model = m.i, plotx1 = "pacs", plotx2 = "grade", drawArrows = T)
Mit rgl()
df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
require(rgl)
## Loading required package: rgl
open3d()
## glX
## 1
plot3d(x=df$pacs, y=df$grade, z=df$bodysatisf, col="red", size=4)
m.i <- lm(bodysatisf ~ pacs * grade, data=df)
grd <- expand.grid(pacs = sort(unique(df$pacs)), grade = sort(unique(df$grade)) )
grd$pred <-predict(m.i, newdata=grd)
persp3d(x=unique(grd[[1]]), y=unique(grd[[2]]),
z=matrix(grd[[3]],length(unique(grd[[1]])),length(unique(grd[[2]]))), add=TRUE)
Problem: Bei vorliegender Interaktion hängt die Steigung der Vorhersageebene (im Fall von zwei Vorhersagevariablen) ab von der Ausprägung der Variable, die als Vorhersagevariable graphisch dargestellt werden soll.
Eine Visualisierungsidee ist, die zweite Vorhersagevariable, die in die Tiefe geht, je nach Ausprägung in unterschiedlicher Farbe darzustellen.
Eine weitere Idee ist, mehrere Steigungsgeraden einzuzeichnen, eine beim Mittelwert und je eine bei +/- eine Standardabweichung um den Mittelwert.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)
require("ggplot2")
## Loading required package: ggplot2
# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
#m.s <- lm(bodysatisf ~ pacs + gender, data=ddf)
# create base plot object
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf, color=grade))
# color management
cbPalette = c('blue', 'red')
# add scatterplot and gender specific regression lines as estimated by model above
# gender has levels 0 and 1
pplot +
#geom_point(aes(size=grade)) +
geom_point() +
# scale_color_manual(values=cbPalette) +
#geom_abline(intercept = m.i$coefficients['(Intercept)'], slope = m.i$coefficients['pacs'], color=cbPalette[1])
geom_abline(intercept = mean(ddf$pacs), slope = m.i$coefficients['pacs'], color=cbPalette[1]) +
#geom_abline(intercept = m.s$coefficients['(Intercept)'] + 1 * m.s$coefficients['sex'], slope = m.s$coefficients['extro'], color=cbPalette[2])
geom_abline(intercept = mean(ddf$pacs), slope = m.i$coefficients['pacs:grade'], color=cbPalette[2])
Gut zu erkennen ist die vermeintliche Asymmetrie der Regressionsgeraden zur Punktwolke. Wenn wir uns, wie hier, nur isoliert die Regressionsgerade von bodysatisf auf pacs einzeichnen, fehlen die zusätzlichen Einflüsse von grade und der Wechselwirkung. Die “korrekte” Regressionsgerade wird also durch die beiden weiteren Einflüsse ‘moderiert,’ in unserem Fall also ‘etwas nach unten gezogen’ (s.u.).
Wir können die Daten zentrieren, um die Steigungskoeffizienten vergleichend darstellen zu können. Vorteil: Der Intercept fällt weg.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)
# we center the data for visualization
ddf$c.pacs <- ddf$pacs - mean(ddf$pacs)
ddf$c.bodysatisf <- ddf$bodysatisf - mean(ddf$bodysatisf)
ddf$c.grade <- ddf$grade - mean(ddf$grade)
m.i.c <- lm(c.bodysatisf ~ c.pacs * c.grade, data=ddf)
#c.v.pacs <- seq(-2, 2, step=.1)
c.v.pacs <- seq(-2, 2, length.out=length(ddf$pacs))
c.pred.lo <- c.v.pacs * m.i.c$coefficients['c.pacs'] +
m.i.c$coefficients['c.pacs:c.grade'] * c.v.pacs * -1
c.pred.hi <- c.v.pacs * m.i.c$coefficients['c.pacs'] +
m.i.c$coefficients['c.pacs:c.grade'] * c.v.pacs * +1
ggplot(ddf, aes(x=c.pacs, y=c.bodysatisf, color=c.grade)) +
geom_point() +
stat_smooth(aes(color=c.grade), method = "lm", se=FALSE) +
geom_abline(intercept = 0, slope = m.i.c$coefficients['c.pacs'], color="red") +
geom_line(mapping=aes(x=c.v.pacs, y=c.pred.lo), color="green") +
geom_line(mapping=aes(x=c.v.pacs, y=c.pred.hi), color="green")
## `geom_smooth()` using formula 'y ~ x'
Mehrere Regressionsgerade einzeichnen, um relevante Stellen auf der bebogenen Vorhersageebene darzustellen. Eine Möglichkeit ist, die Regressionsgeraden beim Mittelwert und eine Standardabweichung um den Mittelwert für die in die Tiefe gehende Variable auszugeben.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)
# we define some virtual pacs values to create the 'regression lines' on different levels of grade
v.pacs <- seq(min(ddf$pacs), max(ddf$pacs), length.out=length(ddf$pacs))
# we create prediction points at mean grade
pred.mean <- m.i$coefficients['(Intercept)'] +
m.i$coefficients['pacs'] * v.pacs +
m.i$coefficients['grade'] * mean(ddf$grade) +
m.i$coefficients['pacs:grade'] * v.pacs * mean(ddf$grade)
# prediction of grade of mean + 1 sd
pred.m.hi <- m.i$coefficients['(Intercept)'] +
m.i$coefficients['pacs'] * v.pacs +
m.i$coefficients['grade'] * (mean(ddf$grade) + sd(ddf$grade)) +
m.i$coefficients['pacs:grade'] * v.pacs * (mean(ddf$grade) + sd(ddf$grade))
# prediction of grade of mean - 1 sd
pred.m.lo <- m.i$coefficients['(Intercept)'] +
m.i$coefficients['pacs'] * v.pacs +
m.i$coefficients['grade'] * (mean(ddf$grade) - sd(ddf$grade)) +
m.i$coefficients['pacs:grade'] * v.pacs * (mean(ddf$grade) - sd(ddf$grade))
# l.b and h.b are limits to cut down grade values in three classes
#l.b <- mean(ddf$grade) - sd(ddf$grade)
l.b <- 1.1
h.b <- mean(ddf$grade) + sd(ddf$grade)
# we may mark grade 'regions' or 'classes' with different colors
ggplot(ddf, aes(x=pacs, y=bodysatisf)) +
geom_point(aes(color = cut(grade, c(-Inf, l.b, h.b, Inf))), size = 3) +
scale_color_manual(name = "grade", values = c("green", "black", "red"), labels = c("grade good", "medium", "poor")) +
#geom_abline(intercept = 0, slope = m.i$coefficients['pacs'], color="grey") +
geom_line(mapping=aes(x=v.pacs, y=pred.mean), color="black") +
geom_line(mapping=aes(x=v.pacs, y=pred.m.lo), color="green") +
geom_line(mapping=aes(x=v.pacs, y=pred.m.hi), color="red")
# we might prefer to use a smooth color transition
ggplot(ddf, aes(x=pacs, y=bodysatisf, color=grade)) +
#geom_point(aes(color = cut(grade, c(-Inf, l.b, h.b, Inf))), size = 3) +
#scale_color_manual(name = "grade", values = c("green", "black", "red"), labels = c("grade good", "medium", "poor")) +
geom_point() +
#geom_abline(intercept = 0, slope = m.i$coefficients['pacs'], color="grey") +
# geom_abline(intercept = mean(ddf$grade), slope = m.i$coefficients['pacs'], color="grey") +
#geom_line(mapping=aes(x=v.pacs, y=pred.mean), color="black") +
geom_line(mapping=aes(x=v.pacs, y=pred.mean), color="#5555CC") +
geom_line(mapping=aes(x=v.pacs, y=pred.m.lo), color="#444499") +
geom_line(mapping=aes(x=v.pacs, y=pred.m.hi), color="#7777FF")
rm(v.pacs, pred.mean, pred.m.lo, pred.m.hi, l.b, h.b)
regression line at mean +/- sd
Zufriedenheit mit dem eigenen Körper und Vergleichsintensität in Abhängigkeit vom Geschlecht
Idee bzw. Fragen: - Ist die Vorhersage von bodysatisf bei Frauen und Männern verschieden? - Ist die Vorhersage vielleicht unterschiedlich genau? - spezifischer: Ist die Steigung der Regressionsgeraden unterschiedlich in Abhängigkeit vom Geschlecht?
Problem: binärer (nicht stetiger) Prädiktor: Geschlecht.
Äquivalenz bei verschiedenen Ausführungsformeln. Kleiner Trick: Um die Vergleichbarkeit der Ansätze zeigen zu können bzw. um die Interpretierbarkeit zu erhöhen werden die Faktorstufen von Geschlecht auf 0 1 gesetzt.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
# we will use factor gender
ddf$gender <- factor(ddf$gender)
as.numeric(ddf$gender)
## [1] 2 2 1 1 1 1 1 1 1 2 1 2 1 2 2 1 1 2 2 1 2 2 2 2 1 2 2 2 2 1
# for better interpretation of coefficients we can use 0,1
ddf$n.gender <- as.numeric(ddf$gender) - 1
# create new variable gender * pacs
ddf$gender_pacs <- ddf$n.gender * ddf$pacs
# create various models
m.prod <- lm(bodysatisf ~ pacs + gender + gender_pacs, data=ddf)
summary(m.prod)
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender_pacs, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.1982 -5.4794 0.7841 6.2039 15.4892
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.996 6.149 4.553 0.000109 ***
## pacs 6.629 1.901 3.487 0.001753 **
## genderm -37.003 8.184 -4.521 0.000119 ***
## gender_pacs 12.468 2.429 5.133 2.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared: 0.8713, Adjusted R-squared: 0.8564
## F-statistic: 58.67 on 3 and 26 DF, p-value: 1.044e-11
m.expl <- lm(bodysatisf ~ pacs + gender + gender:pacs, data=ddf)
summary(m.expl)
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender:pacs, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.1982 -5.4794 0.7841 6.2039 15.4892
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.996 6.149 4.553 0.000109 ***
## pacs 6.629 1.901 3.487 0.001753 **
## genderm -37.003 8.184 -4.521 0.000119 ***
## pacs:genderm 12.468 2.429 5.133 2.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared: 0.8713, Adjusted R-squared: 0.8564
## F-statistic: 58.67 on 3 and 26 DF, p-value: 1.044e-11
m.impl <- lm(bodysatisf ~ pacs * gender, data=ddf)
summary(m.impl)
##
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.1982 -5.4794 0.7841 6.2039 15.4892
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.996 6.149 4.553 0.000109 ***
## pacs 6.629 1.901 3.487 0.001753 **
## genderm -37.003 8.184 -4.521 0.000119 ***
## pacs:genderm 12.468 2.429 5.133 2.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared: 0.8713, Adjusted R-squared: 0.8564
## F-statistic: 58.67 on 3 and 26 DF, p-value: 1.044e-11
m.prod
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender_pacs, data = ddf)
##
## Coefficients:
## (Intercept) pacs genderm gender_pacs
## 27.996 6.629 -37.003 12.468
m.expl
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender:pacs, data = ddf)
##
## Coefficients:
## (Intercept) pacs genderm pacs:genderm
## 27.996 6.629 -37.003 12.468
m.impl
##
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
##
## Coefficients:
## (Intercept) pacs genderm pacs:genderm
## 27.996 6.629 -37.003 12.468
rm(m.prod, m.expl, m.impl)
Die Hinzunahme eines Interaktionsterms kann geprüft werden wie die Hinzunahme einer zusätzlichen Variable, die die Multiplikation der jeweiligen Einzel-Vorhersagevariablen miteinander enthält.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
ddf$gender <- factor(ddf$gender)
ddf$gender_pacs <- (as.numeric(ddf$gender) - 1) * ddf$pacs
# create model without interaction
m.add <- lm(bodysatisf ~ pacs + gender, data=ddf)
summary(m.add)
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.782 -7.961 1.016 8.504 18.307
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.869 5.826 0.836 0.411
## pacs 14.265 1.648 8.657 2.84e-09 ***
## genderm 2.128 4.144 0.514 0.612
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.25 on 27 degrees of freedom
## Multiple R-squared: 0.7409, Adjusted R-squared: 0.7217
## F-statistic: 38.6 on 2 and 27 DF, p-value: 1.208e-08
# create various models
m.prod <- lm(bodysatisf ~ pacs + gender + gender_pacs, data=ddf)
m.expl <- lm(bodysatisf ~ pacs + gender + pacs:gender, data=ddf)
m.impl <- lm(bodysatisf ~ pacs * gender, data=ddf)
# test of interaction
anova(m.add, m.prod)
## Analysis of Variance Table
##
## Model 1: bodysatisf ~ pacs + gender
## Model 2: bodysatisf ~ pacs + gender + gender_pacs
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 27 3418.0
## 2 26 1697.7 1 1720.3 26.346 2.368e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.add, m.expl)
## Analysis of Variance Table
##
## Model 1: bodysatisf ~ pacs + gender
## Model 2: bodysatisf ~ pacs + gender + pacs:gender
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 27 3418.0
## 2 26 1697.7 1 1720.3 26.346 2.368e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.add, m.impl)
## Analysis of Variance Table
##
## Model 1: bodysatisf ~ pacs + gender
## Model 2: bodysatisf ~ pacs * gender
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 27 3418.0
## 2 26 1697.7 1 1720.3 26.346 2.368e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rm(m.add, m.prod, m.expl, m.impl)
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
# we want to use gender as predictive variable
ddf$gender <- factor(ddf$gender)
ddf$gender
## [1] m m f f f f f f f m f m f m m f f m m f m m m m f m m m m f
## Levels: f m
# gender is of type factor
class(ddf$gender)
## [1] "factor"
as.numeric(ddf$gender)
## [1] 2 2 1 1 1 1 1 1 1 2 1 2 1 2 2 1 1 2 2 1 2 2 2 2 1 2 2 2 2 1
# in order to use gender as numeric predictor, we convert it to 0,1 values
ddf$n.gender <- as.numeric(ddf$gender)
ddf$n.gender[ddf$n.gender == 2] <- 0
class(ddf$n.gender)
## [1] "numeric"
# additive model
m.ia <- lm(bodysatisf ~ pacs + n.gender, data=ddf)
summary(m.ia)
##
## Call:
## lm(formula = bodysatisf ~ pacs + n.gender, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.782 -7.961 1.016 8.504 18.307
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.997 6.140 1.140 0.264
## pacs 14.265 1.648 8.657 2.84e-09 ***
## n.gender -2.128 4.144 -0.514 0.612
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.25 on 27 degrees of freedom
## Multiple R-squared: 0.7409, Adjusted R-squared: 0.7217
## F-statistic: 38.6 on 2 and 27 DF, p-value: 1.208e-08
m.1a <- lm(bodysatisf ~ pacs + gender, data=ddf)
summary(m.1a)
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.782 -7.961 1.016 8.504 18.307
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.869 5.826 0.836 0.411
## pacs 14.265 1.648 8.657 2.84e-09 ***
## genderm 2.128 4.144 0.514 0.612
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.25 on 27 degrees of freedom
## Multiple R-squared: 0.7409, Adjusted R-squared: 0.7217
## F-statistic: 38.6 on 2 and 27 DF, p-value: 1.208e-08
# models with interaction
m.i <- lm(bodysatisf ~ pacs * n.gender, data=ddf)
summary(m.i)
##
## Call:
## lm(formula = bodysatisf ~ pacs * n.gender, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.1982 -5.4794 0.7841 6.2039 15.4892
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.007 5.401 -1.668 0.107366
## pacs 19.097 1.512 12.629 1.34e-12 ***
## n.gender 37.003 8.184 4.521 0.000119 ***
## pacs:n.gender -12.468 2.429 -5.133 2.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared: 0.8713, Adjusted R-squared: 0.8564
## F-statistic: 58.67 on 3 and 26 DF, p-value: 1.044e-11
m.1 <- lm(bodysatisf ~ pacs * gender, data=ddf)
summary(m.1)
##
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.1982 -5.4794 0.7841 6.2039 15.4892
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.996 6.149 4.553 0.000109 ***
## pacs 6.629 1.901 3.487 0.001753 **
## genderm -37.003 8.184 -4.521 0.000119 ***
## pacs:genderm 12.468 2.429 5.133 2.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared: 0.8713, Adjusted R-squared: 0.8564
## F-statistic: 58.67 on 3 and 26 DF, p-value: 1.044e-11
# old
# for better interpretation of coefficients we can use 0,1
ddf$n.gender <- as.numeric(ddf$gender) - 1
# create new variable gender * pacs
ddf$gender_pacs <- ddf$n.gender * ddf$pacs
m.i <- lm(bodysatisf ~ pacs * gender, data=ddf)
summary(m.i)
##
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.1982 -5.4794 0.7841 6.2039 15.4892
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.996 6.149 4.553 0.000109 ***
## pacs 6.629 1.901 3.487 0.001753 **
## genderm -37.003 8.184 -4.521 0.000119 ***
## pacs:genderm 12.468 2.429 5.133 2.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared: 0.8713, Adjusted R-squared: 0.8564
## F-statistic: 58.67 on 3 and 26 DF, p-value: 1.044e-11
m.i
##
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
##
## Coefficients:
## (Intercept) pacs genderm pacs:genderm
## 27.996 6.629 -37.003 12.468
#relevant values
head(ddf[c('bodysatisf', 'pacs', 'n.gender')])
## bodysatisf pacs n.gender
## 1 16 1.6 1
## 2 76 4.0 1
## 3 68 5.0 0
## 4 70 4.0 0
## 5 45 2.0 0
## 6 58 3.4 0
head(predict(m.i))
## 1 2 3 4 5 6
## 21.54714 67.37888 61.13947 54.51075 41.25332 50.53352
# we calculate first value by hand
# first observation (man)
27.996 + 6.629 * 1.6 -37.003 * 1 + 12.468 * 1.6 * 1
## [1] 21.5482
# third observation (woman)
27.996 + 6.629 * 5 -37.003 * 0 + 12.468 * 5 * 0
## [1] 61.141
# or, using the coefficients:
names(m.i$coefficients)
## [1] "(Intercept)" "pacs" "genderm" "pacs:genderm"
m.i$coefficients['(Intercept)'] + m.i$coefficients['genderm'] * 0 + m.i$coefficients['pacs'] * ddf$pacs[3] + m.i$coefficients['pacs:genderm'] * 0 * ddf$pacs[3]
## (Intercept)
## 61.13947
# rm(m.i)
m.i$coefficients enthält die Parameter des Modells: 27.9958887, 6.628716, -37.0032423, 12.4678436. Die Namen (Intercept), pacs, genderm, pacs:genderm geben uns Hinweise, wie die Parameter interpretiert werden können. genderm ist der Wert, um den ein weiblicher Wert erhöht werden muss, um zu einem männlichen Wert zu kommen. pacs:genderm ist der Wechselwirkungseffekt, der zusätzlich hinzugezählt wird.
Scatterplot mit unterschiedlichen Werten für Geschlecht und den beiden Regressionsgeraden für das jeweilige Geschlecht, ohne Interaktionsterm. Die unterschiedlichen Regressionslinien entstehen durch die Multiplikation des jeweiligen Levels von Geschlecht mit dem Gewicht für Geschlecht. Die Gewichte fügen wir ein, indem wir direkt auf die Attribute des Ergebnisobjekts zugreifen (m.s$coefficients).
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
ddf$gender_pacs <- (as.numeric(ddf$gender) - 1) * ddf$pacs
## Warning: NAs durch Umwandlung erzeugt
require("ggplot2")
# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
m.s <- lm(bodysatisf ~ pacs + gender, data=ddf)
# create base plot object
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# color management
cbPalette = c('blue', 'red')
# add scatterplot and gender specific regression lines as estimated by model above
# gender has levels 0 and 1
pplot +
geom_point(aes(color=gender, shape=gender)) +
scale_color_manual(values=cbPalette) +
geom_abline(intercept = m.s$coefficients['(Intercept)'] + 0 * m.s$coefficients['sex'], slope = m.s$coefficients['extro'], color=cbPalette[1]) +
geom_abline(intercept = m.s$coefficients['(Intercept)'] + 1 * m.s$coefficients['sex'], slope = m.s$coefficients['extro'], color=cbPalette[2])
## Warning: Removed 1 rows containing missing values (geom_abline).
## Warning: Removed 1 rows containing missing values (geom_abline).
rm(m.s)
Scatterplot mit unterschiedlichen Werten für Geschlecht und den beiden Regressionsgeraden für das jeweilige Geschlecht, inkl. Interaktionsterm. Random slope model. Die unterschiedlichen Regressionslinien entstehen durch die Multiplikation des jeweiligen Levels von Geschlecht mit dem Gewicht für Geschlecht plus der Multiplikation des jeweiligen Levels von Geschlecht mit dem Gewicht des Interaktionsterms.
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
#ddf$gender <- as.numeric(ddf$gender) - 1
require("ggplot2")
# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
m.s <- lm(bodysatisf ~ pacs + gender, data=ddf)
m.s
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender, data = ddf)
##
## Coefficients:
## (Intercept) pacs genderm
## 4.869 14.265 2.128
m.i <- lm(bodysatisf ~ pacs + gender + pacs:gender, data=ddf)
m.i
##
## Call:
## lm(formula = bodysatisf ~ pacs + gender + pacs:gender, data = ddf)
##
## Coefficients:
## (Intercept) pacs genderm pacs:genderm
## 27.996 6.629 -37.003 12.468
# create base plot object
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# color management
cbPalette = c('blue', 'red')
# add scatterplot and gender specific regression lines as estimated by model above - interaction included in model, less distance in parallel regression lines
pplot +
geom_point(aes(color=gender, shape=gender)) +
scale_color_manual(values=cbPalette) +
geom_abline(intercept = (m.s$coefficients['(Intercept)'] + 0 * m.s$coefficients['genderm']), slope = m.s$coefficients['pacs'], color=cbPalette[1]) +
geom_abline(intercept = (m.s$coefficients['(Intercept)'] + 1 * m.s$coefficients['genderm']), slope = m.s$coefficients['pacs'], color=cbPalette[2])
# add scatterplot and gender specific regression lines as estimated by model above, different slope included
pplot +
geom_point(aes(color=gender, shape=gender)) +
scale_color_manual(values=cbPalette) +
geom_abline(intercept = m.i$coefficients['(Intercept)'] + 0 * m.i$coefficients['genderm'], slope = m.i$coefficients['pacs'] + 0 * m.i$coefficients['pacs:genderm'], color=cbPalette[1]) +
geom_abline(intercept = m.i$coefficients['(Intercept)'] + 1 * m.i$coefficients['genderm'], slope = m.i$coefficients['pacs'] + 1 * m.i$coefficients['pacs:genderm'], color=cbPalette[2])
rm(m.s, m.i)
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
require("ggplot2")
# raw scatterplot
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
pplot + geom_point()
# graphical look at gender specifity. Gender must have type factor to be used adequately. This is given already
ddf$gender
## [1] "m" "m" "f" "f" "f" "f" "f" "f" "f" "m" "f" "m" "f" "m" "m" "f" "f" "m" "m"
## [20] "f" "m" "m" "m" "m" "f" "m" "m" "m" "m" "f"
# plot values in different color and shape
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# add a geom_point to base plot object
pplot + geom_point(aes(color=ddf$gender, shape=ddf$gender))
## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.
## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.
# include global regression line
pplot +
geom_point(aes(color=ddf$gender, shape=ddf$gender)) +
stat_smooth(method = "lm", se=FALSE)
## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.
## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.
## `geom_smooth()` using formula 'y ~ x'
# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
attach(ddf)
# the model
m.2 <- lm(bodysatisf ~ pacs + gender, data=ddf)
#m.e_s <- lm(time ~ extro + sex)
# create base plot object
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# color management
cbPalette = c('blue', 'red')
# add scatterplot and gender specific regression lines as estimated by model above
pplot +
geom_point(aes(color=gender, shape=gender)) +
scale_color_manual(values=cbPalette) +
geom_abline(intercept = m.2$coefficients['(Intercept)'], slope = m.2$coefficients['pacs'], color=cbPalette[1]) +
geom_abline(intercept = m.2$coefficients['(Intercept)'] + m.2$coefficients['genderm'], slope = m.2$coefficients['pacs'], color=cbPalette[2])
# include group specific regression lines - this is NOT the statistical model 'time~extro+sex' discussed here
pplot +
geom_point(aes(color=gender, shape=gender)) +
stat_smooth(aes(fill = gender), method = "lm")
## `geom_smooth()` using formula 'y ~ x'
# include both regression lines
pplot +
geom_point(aes(color=gender, shape=gender)) +
stat_smooth(method = "lm", se=FALSE) +
stat_smooth(aes(fill = gender, color=gender), method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
# include both regression lines, global confidence interval
pplot +
geom_point(aes(color=gender, shape=gender)) +
stat_smooth(method = "lm") +
stat_smooth(aes(fill = gender, color=gender), method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
# include both regression lines and confidence intervals for gender
pplot +
geom_point(aes(color=gender, shape=gender)) +
stat_smooth(method = "lm", se=FALSE) +
stat_smooth(aes(fill = gender, color=gender), method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Die Steigung der Regressionsgeraden für die Vorhersage der Zufriedenheit mit dem eigenen Körper aufgrund der Vergleichsintensität hängt also ab von der Ausprägung einer dritten Variablen - dem Geschlecht.
Hier handelt es sich um eine dichotome Variable. Allerdings gilt das Prinzip der Abhängigkeit der Vorhersage von einer dritten Variable auch für stetige Variablen.
Version: 06 Mai, 2021 07:55