Rmd

Erfasst wird die Intensität des Vergleichs der eigenen, körperlichen Attraktivität mit der anderer Personen.

Setting

Sie fragen sich, ob sich übergewichtige bzw. untergewichtige Personen intensivere Körpervergleiche anstellen als normalgewichtige Personen. Für Übergewicht bzw. Untergewicht hat sich international der body-mass-index (BMI) durchgesetzt. Der BMI und seine Bewertung definiert sich wie folgt:

BMI = Körpergewicht / (Körpergröße * Körpergröße)
BMI = kg / (m * m)
BMI Klassen:
BMI unter 17.5 -> Untergewicht, Anorexie
BMI über 24 -> Übergewicht
BMI über 30 -> adipös

Der maximale, bisher gemessene BMI beträgt 204 Quelle

Sie haben von einer Reihe von Personen Körpergröße und -gewicht erfasst die in der Datei v_bmi.txt aufgelistet sind. Das Ausmaß des Körpervergleichs wurde über die 5 Items der PACS erfasst.

Von der Physical Appearance Comparison Scale PACS Website:

THE PHYSICAL APPEARANCE COMPARISON SCALE (PACS)

Using the following scale please select a number that comes closest to how you feel:

Never Seldom Sometimes Often Always
1 2 3 4 5
  1. At parties or other social events, I compare my physical appearance to the physical appearance of others.
  2. The best way for a person to know if they are overweight or underweight is to compare their figure to the figure of others.
  3. At parties or other social events, I compare how I am dressed to how other people are dressed.
  4. Comparing your “looks” to the “looks” of others is a bad way to determine if you are attractive or unattractive.
  5. In social situations, I sometimes compare my figure to the figures of other people.

Item 4 is reverse-scored

Übersetzungsversuch:

  1. Auf Parties oder anderen sozialen Ereignissen vergleiche ich meine körperliches Aussehen mit dem der anderen.
  2. Ob man über- oder untergewichtig ist findet man am besten heraus, wenn man seinen Körper mit dem von anderen vergleicht.
  3. Auf Parties oder anderen sozialen Ereignissen vergleiche ich, wie ich im Vergleich mit anderen angezogen bin.
  4. Meine Aussehen mit dem anderer zu vergleichen ist nicht geeignet um herauszufinden, wie attraktiv oder unattraktiv ich bin.
  5. Manchmal vergleiche ich in sozialen Situationen meine Figur mit der von anderen.

Benennungsversuche der Skala:

  • Attraktivitätsvergleichstendenz
  • Vergleich körperlicher Attrakivität
  • Vergleichsintensität körperlicher Attraktivität
  • Vergleichsintensität der körperlichen Erscheinung
  • Intensität des Vergleichs der äußeren Erscheinung
  • Intensität des Vergleichs der eigenen äußeren Erscheinung mit der anderer

Datentransformationen

Aufgaben Transformationen

  • bilden Sie den BMI und fügen Sie ihn als neue Spalte an den DataFrame

  • bilden Sie eine neue Variable ‘bmi_class’ im Dataframe, in der die Klassifikation in ‘low,’ ‘high,’ ‘obese’ und ‘normal’ codiert wird

  • machen Sie diese Variable zu einem Faktor

  • bilden Sie eine neue Variable ‘bmi_dichotom’ als neue Spalte im Dataframe, in der die Klassifikation ‘normal’ und ‘extreme’ gemacht wird

  • bilden Sie eine neue Variable ‘bmi_normal’ als neue Spalte im Dataframe, in der die Klassifikation ‘normal’ als wahr oder falsch codiert wird

  • erstellen Sie eine neue Variable ‘pacs_sum’ in der Sie den PACS-Score als Summe der PACS-Items speichern. Achten Sie auf die Polung.

  • erstellen Sie eine neue Variable ‘pacs’ in der Sie den PACS-Score als Mittelwert der PACS-Items speichern. Achten Sie auf die Polung.

  • fügen Sie die neu berechneten Variablen an den Dataframe an

  • benutzen Sie die neu gebildeten Variablen, um sich Teil-Datensätze ausgeben zu lassen

  • speichern Sie den erweiterten Dataframe in ihrem Working Directory

Lösungen Transformationen

Lösung mit dplyr

[res_begin:pacs1]

# read data
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt")
# ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt", fileEncoding = "UTF-8")    # read like that if there is coding mess on your platform
# dplyr must be available
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# check classes
# invert negativ item
# aggregate to scale
# compute bmi
#ddf["bmi"] <- ddf$weight / (ddf$height / 100) ** 2
ddf <- ddf %>% dplyr::mutate(bmi = weight / (height / 100) ** 2)
# control data
head(ddf)
##   subj    name gender height weight grade c_phys_app c_good_way c_dress
## 1    1   Meier      2    180     60  2.02          2          1       2
## 2    2  Müller      2    176     54  1.00          3          4       3
## 3    3  Strauß      1    180    110  2.03          5          5       5
## 4    4 Übeling      1    182     52  1.00          4          5       4
## 5    5   Ärger      1    160     62  1.00          3          2       2
## 6    6   Östro      1    156     41  1.00          3          4       3
##   c_bad_way c_figure filling_time      bmi
## 1         4        1        39.63 18.51852
## 2         1        5        26.00 17.43285
## 3         1        5        26.00 33.95062
## 4         2        3        26.00 15.69859
## 5         4        1        26.00 24.21875
## 6         3        4        26.00 16.84747
# get all persons with bmi < 17.5
#ddf$bmi[ddf$bmi < 17.5]
dplyr::filter(ddf, bmi < 17.5)
##    subj    name gender height weight grade c_phys_app c_good_way c_dress
## 1     2  Müller      2    176     54  1.00          3          4       3
## 2     4 Übeling      1    182     52  1.00          4          5       4
## 3     6   Östro      1    156     41  1.00          3          4       3
## 4     9   Lully      1    159     44  2.26          4          3       3
## 5    10 Lumière      2    189     60  1.21          5          4       5
## 6    16   Grimm      1    165     47  2.06          3          2       3
## 7    18   Panik      2    175     53  1.00          4          3       3
## 8    26  Cyrill      2    180     54  2.48          4          5       5
## 9    27   Compu      2    184     58  1.00          3          4       3
## 10   29  Miller      2    183     53  1.00          5          5       4
##    c_bad_way c_figure filling_time      bmi
## 1          1        5        26.00 17.43285
## 2          2        3        26.00 15.69859
## 3          3        4        26.00 16.84747
## 4          2        4        31.41 17.40437
## 5          1        4        32.82 16.79684
## 6          4        1        48.15 17.26354
## 7          2        4        29.48 17.30612
## 8          1        5        29.56 16.66667
## 9          2        3        40.12 17.13138
## 10         1        5        26.00 15.82609
# or those with overweight and obese ones
#ddf$bmi[ddf$bmi > 24]
dplyr::filter(ddf, bmi > 24)
##    subj     name gender height weight grade c_phys_app c_good_way c_dress
## 1     3   Strauß      1    180    110  2.03          5          5       5
## 2     5    Ärger      1    160     62  1.00          3          2       2
## 3     7 Valentin      1    165     71  2.47          4          3       4
## 4     8  Marquéz      1    160     70  2.37          4          4       4
## 5    12   Russki      2    192    130  2.97          4          4       5
## 6    13  Berking      1    187     91  1.87          3          4       3
## 7    19 Ütülümek      2    180     90  1.00          3          4       4
## 8    20 Chorvátc      1    177     80  3.24          4          3       5
## 9    21   Clánku      2    179     90  1.00          3          5       4
## 10   22   Eiland      2    182    100  1.60          5          5       5
##    c_bad_way c_figure filling_time      bmi
## 1          1        5        26.00 33.95062
## 2          4        1        26.00 24.21875
## 3          2        3        26.00 26.07897
## 4          2        4        26.00 27.34375
## 5          1        3        33.33 35.26476
## 6          2        5        26.16 26.02305
## 7          2        3        26.77 27.77778
## 8          1        4       113.48 25.53545
## 9          2        4        29.63 28.08901
## 10         1        4        34.28 30.18959
#ddf$bmi[ddf$bmi > 30]
dplyr::filter(ddf, bmi > 30)
##   subj   name gender height weight grade c_phys_app c_good_way c_dress
## 1    3 Strauß      1    180    110  2.03          5          5       5
## 2   12 Russki      2    192    130  2.97          4          4       5
## 3   22 Eiland      2    182    100  1.60          5          5       5
##   c_bad_way c_figure filling_time      bmi
## 1         1        5        26.00 33.95062
## 2         1        3        33.33 35.26476
## 3         1        4        34.28 30.18959
# compute factor bmi class
ddf["bmi_class"] <- NA   # default is missing
dplyr::mutate(ddf, bmi_class = NA)
##    subj     name gender height weight grade c_phys_app c_good_way c_dress
## 1     1    Meier      2    180     60  2.02          2          1       2
## 2     2   Müller      2    176     54  1.00          3          4       3
## 3     3   Strauß      1    180    110  2.03          5          5       5
## 4     4  Übeling      1    182     52  1.00          4          5       4
## 5     5    Ärger      1    160     62  1.00          3          2       2
## 6     6    Östro      1    156     41  1.00          3          4       3
## 7     7 Valentin      1    165     71  2.47          4          3       4
## 8     8  Marquéz      1    160     70  2.37          4          4       4
## 9     9    Lully      1    159     44  2.26          4          3       3
## 10   10  Lumière      2    189     60  1.21          5          4       5
## 11   11   Bâtist      1    158     45  1.27          2          2       2
## 12   12   Russki      2    192    130  2.97          4          4       5
## 13   13  Berking      1    187     91  1.87          3          4       3
## 14   14   Møllar      2    193     86  1.62          2          2       1
## 15   15 Abdullah      2    173     56  1.00          1          2       1
## 16   16    Grimm      1    165     47  2.06          3          2       3
## 17   17 Guevarra      1    167     52  1.00          2          1       1
## 18   18    Panik      2    175     53  1.00          4          3       3
## 19   19 Ütülümek      2    180     90  1.00          3          4       4
## 20   20 Chorvátc      1    177     80  3.24          4          3       5
## 21   21   Clánku      2    179     90  1.00          3          5       4
## 22   22   Eiland      2    182    100  1.60          5          5       5
## 23   23   Kromer      2    178     60  1.60          1          1       1
## 24   24   Brauer      2    181     75  1.30          2          1       3
## 25   25 Büstisch      1    185     80  1.62          1          2       1
## 26   26   Cyrill      2    180     54  2.48          4          5       5
## 27   27    Compu      2    184     58  1.00          3          4       3
## 28   28   Messer      2    179     57  1.00          3          3       4
## 29   29   Miller      2    183     53  1.00          5          5       4
## 30   30 Terminal      1    162     54  2.15          2          1       1
##    c_bad_way c_figure filling_time      bmi bmi_class
## 1          4        1        39.63 18.51852        NA
## 2          1        5        26.00 17.43285        NA
## 3          1        5        26.00 33.95062        NA
## 4          2        3        26.00 15.69859        NA
## 5          4        1        26.00 24.21875        NA
## 6          3        4        26.00 16.84747        NA
## 7          2        3        26.00 26.07897        NA
## 8          2        4        26.00 27.34375        NA
## 9          2        4        31.41 17.40437        NA
## 10         1        4        32.82 16.79684        NA
## 11         4        2        26.00 18.02596        NA
## 12         1        3        33.33 35.26476        NA
## 13         2        5        26.16 26.02305        NA
## 14         5        1        26.00 23.08787        NA
## 15         5        2        40.00 18.71095        NA
## 16         4        1        48.15 17.26354        NA
## 17         5        3        39.05 18.64534        NA
## 18         2        4        29.48 17.30612        NA
## 19         2        3        26.77 27.77778        NA
## 20         1        4       113.48 25.53545        NA
## 21         2        4        29.63 28.08901        NA
## 22         1        4        34.28 30.18959        NA
## 23         5        1        26.00 18.93700        NA
## 24         5        2        37.44 22.89307        NA
## 25         4        2        30.51 23.37473        NA
## 26         1        5        29.56 16.66667        NA
## 27         2        3        40.12 17.13138        NA
## 28         1        5        29.99 17.78971        NA
## 29         1        5        26.00 15.82609        NA
## 30         4        1        88.70 20.57613        NA
#ddf$bmi_class[ddf$bmi < 17.5] = 'low'
ddf <-  ddf %>% mutate(bmi_class = replace(bmi_class, bmi < 17.5, 'low'))
# check what happened
head(ddf)
##   subj    name gender height weight grade c_phys_app c_good_way c_dress
## 1    1   Meier      2    180     60  2.02          2          1       2
## 2    2  Müller      2    176     54  1.00          3          4       3
## 3    3  Strauß      1    180    110  2.03          5          5       5
## 4    4 Übeling      1    182     52  1.00          4          5       4
## 5    5   Ärger      1    160     62  1.00          3          2       2
## 6    6   Östro      1    156     41  1.00          3          4       3
##   c_bad_way c_figure filling_time      bmi bmi_class
## 1         4        1        39.63 18.51852      <NA>
## 2         1        5        26.00 17.43285       low
## 3         1        5        26.00 33.95062      <NA>
## 4         2        3        26.00 15.69859       low
## 5         4        1        26.00 24.21875      <NA>
## 6         3        4        26.00 16.84747       low
# ddf$bmi_class[ddf$bmi > 24] <- 'high'
# ddf$bmi_class[ddf$bmi > 30] <- 'obese'
ddf <-  ddf %>% mutate(bmi_class = replace(bmi_class, bmi > 24, 'high'))
ddf <-  ddf %>% mutate(bmi_class = replace(bmi_class, bmi > 30, 'obese'))
# set normal weight
#ddf$bmi_class[ddf$bmi >= 17.5 & ddf$bmi <= 24] <- 'normal'
ddf <-  ddf %>% mutate(bmi_class = replace(bmi_class, bmi >= 17.5 & bmi <= 24, 'normal'))
# make it a factor
ddf <- ddf %>% mutate(bmi_class = as.factor(bmi_class))
# control it
class(ddf$bmi_class)
## [1] "factor"
str(ddf$bmi_class)
##  Factor w/ 4 levels "high","low","normal",..: 3 2 4 2 1 2 1 1 2 2 ...
head(ddf)
##   subj    name gender height weight grade c_phys_app c_good_way c_dress
## 1    1   Meier      2    180     60  2.02          2          1       2
## 2    2  Müller      2    176     54  1.00          3          4       3
## 3    3  Strauß      1    180    110  2.03          5          5       5
## 4    4 Übeling      1    182     52  1.00          4          5       4
## 5    5   Ärger      1    160     62  1.00          3          2       2
## 6    6   Östro      1    156     41  1.00          3          4       3
##   c_bad_way c_figure filling_time      bmi bmi_class
## 1         4        1        39.63 18.51852    normal
## 2         1        5        26.00 17.43285       low
## 3         1        5        26.00 33.95062     obese
## 4         2        3        26.00 15.69859       low
## 5         4        1        26.00 24.21875      high
## 6         3        4        26.00 16.84747       low
# create dichotomous variable bmi_normal asl TRUE/FALSE vector
# compute factor bmi normal vs extreme
ddf <- ddf %>% mutate(bmi_normal = NA)
ddf <- ddf %>% mutate(bmi_normal = bmi_class == 'normal')
ddf <- ddf %>% mutate(bmi_dichotom = NA)
ddf <- ddf %>% mutate(bmi_dichotom = replace(bmi_dichotom, bmi_class == 'normal', 'normal'))
ddf <- ddf %>% mutate(bmi_dichotom = replace(bmi_dichotom, bmi_class != 'normal', 'extreme'))
# make it a factor
ddf <- ddf %>% mutate(bmi_dichotom = as.factor(bmi_dichotom))
#ddf$bmi_dichotom <- as.factor(ddf$bmi_dichotom) 
head(ddf)
##   subj    name gender height weight grade c_phys_app c_good_way c_dress
## 1    1   Meier      2    180     60  2.02          2          1       2
## 2    2  Müller      2    176     54  1.00          3          4       3
## 3    3  Strauß      1    180    110  2.03          5          5       5
## 4    4 Übeling      1    182     52  1.00          4          5       4
## 5    5   Ärger      1    160     62  1.00          3          2       2
## 6    6   Östro      1    156     41  1.00          3          4       3
##   c_bad_way c_figure filling_time      bmi bmi_class bmi_normal bmi_dichotom
## 1         4        1        39.63 18.51852    normal       TRUE       normal
## 2         1        5        26.00 17.43285       low      FALSE      extreme
## 3         1        5        26.00 33.95062     obese      FALSE      extreme
## 4         2        3        26.00 15.69859       low      FALSE      extreme
## 5         4        1        26.00 24.21875      high      FALSE      extreme
## 6         3        4        26.00 16.84747       low      FALSE      extreme
# create an inverted pacs item no 4 c_bad_way
# ddf$i_c_bad_way = 6 - ddf$c_bad_way
ddf <- ddf %>% dplyr::mutate(i_c_bad_way = 6 - c_bad_way)

# define names of items in array
#p_items <- c('c_phys_app', 'c_good_way', 'c_dress', 'i_c_bad_way', 'c_figure')

# aggregate the items
# sum the items
ddf %>% rowwise() %>% dplyr::mutate(pacs_sum = sum(c(c_phys_app, c_good_way, c_dress, i_c_bad_way, c_figure))) -> ddf
# better build mean
ddf %>% rowwise() %>% dplyr::mutate(pacs = mean(c(c_phys_app, c_good_way, c_dress, i_c_bad_way, c_figure))) -> ddf

# look at newly generated variables
head(ddf %>% select(subj, bmi:pacs))
## # A tibble: 6 x 8
## # Rowwise: 
##    subj   bmi bmi_class bmi_normal bmi_dichotom i_c_bad_way pacs_sum  pacs
##   <int> <dbl> <fct>     <lgl>      <fct>              <dbl>    <dbl> <dbl>
## 1     1  18.5 normal    TRUE       normal                 2        8   1.6
## 2     2  17.4 low       FALSE      extreme                5       20   4  
## 3     3  34.0 obese     FALSE      extreme                5       25   5  
## 4     4  15.7 low       FALSE      extreme                4       20   4  
## 5     5  24.2 high      FALSE      extreme                2       10   2  
## 6     6  16.8 low       FALSE      extreme                3       17   3.4
# we safe the dataframe to our working directory
# we change default values of write.data() to have the format read.delim() needs
# write.table(ddf, quote=R, sep='\t', row.names=F)

[res_end:pacs1]

Lösung ohne dplyr()

[res_begin:pacs2]

# read data
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt")
# ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi.txt", fileEncoding = "UTF-8")    # read like that if there is coding mess on your platform
# check classes
# invert negativ item
# aggregate to scale
# compute bmi
ddf["bmi"] <- ddf$weight / (ddf$height / 100) ** 2
# control data
head(ddf)
##   subj    name gender height weight grade c_phys_app c_good_way c_dress
## 1    1   Meier      2    180     60  2.02          2          1       2
## 2    2  Müller      2    176     54  1.00          3          4       3
## 3    3  Strauß      1    180    110  2.03          5          5       5
## 4    4 Übeling      1    182     52  1.00          4          5       4
## 5    5   Ärger      1    160     62  1.00          3          2       2
## 6    6   Östro      1    156     41  1.00          3          4       3
##   c_bad_way c_figure filling_time      bmi
## 1         4        1        39.63 18.51852
## 2         1        5        26.00 17.43285
## 3         1        5        26.00 33.95062
## 4         2        3        26.00 15.69859
## 5         4        1        26.00 24.21875
## 6         3        4        26.00 16.84747
# get all persons with bmi < 17.5
ddf$bmi[ddf$bmi < 17.5]
##  [1] 17.43285 15.69859 16.84747 17.40437 16.79684 17.26354 17.30612 16.66667
##  [9] 17.13138 15.82609
# or those with overweight and obese ones
ddf$bmi[ddf$bmi > 24]
##  [1] 33.95062 24.21875 26.07897 27.34375 35.26476 26.02305 27.77778 25.53545
##  [9] 28.08901 30.18959
ddf$bmi[ddf$bmi > 30]
## [1] 33.95062 35.26476 30.18959
# compute factor bmi class
ddf["bmi_class"] <- NA   # default is missing
ddf$bmi_class[ddf$bmi < 17.5] = 'low'
ddf$bmi_class[ddf$bmi > 24] <- 'high'
ddf$bmi_class[ddf$bmi > 30] <- 'obese'
# set normal weight
ddf$bmi_class[ddf$bmi >= 17.5 & ddf$bmi <= 24] <- 'normal'
# make it a factor
ddf$bmi_class <- as.factor(ddf$bmi_class)
# control it
str(ddf$bmi_class)
##  Factor w/ 4 levels "high","low","normal",..: 3 2 4 2 1 2 1 1 2 2 ...
# compute factor bmi normal vs extreme
ddf$bmi_dichotom <- NA
ddf$bmi_dichotom[ddf$bmi_class == 'normal'] <- 'normal'
ddf$bmi_dichotom[ddf$bmi_class != 'normal'] <- 'extreme'
# make it a factor
ddf$bmi_dichotom <- as.factor(ddf$bmi_dichotom) 

# create dichotomous variable bmi_normal asl TRUE/FALSE vector
ddf$bmi_normal <- NA
ddf$bmi_normal[ddf$bmi_class == 'normal'] <- TRUE
ddf$bmi_normal[ddf$bmi_class != 'normal'] <- FALSE

# create an inverted pacs item no 4 c_bad_way
ddf$i_c_bad_way = 6 - ddf$c_bad_way

# define names of items in array
p_items <- c('c_phys_app', 'c_good_way', 'c_dress', 'i_c_bad_way', 'c_figure')

# aggregate the items
ddf$pacs_sum <- apply(ddf[p_items], 1, sum)

ddf$pacs <- apply(ddf[p_items], 1, mean)

# look at it
head(ddf)
##   subj    name gender height weight grade c_phys_app c_good_way c_dress
## 1    1   Meier      2    180     60  2.02          2          1       2
## 2    2  Müller      2    176     54  1.00          3          4       3
## 3    3  Strauß      1    180    110  2.03          5          5       5
## 4    4 Übeling      1    182     52  1.00          4          5       4
## 5    5   Ärger      1    160     62  1.00          3          2       2
## 6    6   Östro      1    156     41  1.00          3          4       3
##   c_bad_way c_figure filling_time      bmi bmi_class bmi_dichotom bmi_normal
## 1         4        1        39.63 18.51852    normal       normal       TRUE
## 2         1        5        26.00 17.43285       low      extreme      FALSE
## 3         1        5        26.00 33.95062     obese      extreme      FALSE
## 4         2        3        26.00 15.69859       low      extreme      FALSE
## 5         4        1        26.00 24.21875      high      extreme      FALSE
## 6         3        4        26.00 16.84747       low      extreme      FALSE
##   i_c_bad_way pacs_sum pacs
## 1           2        8  1.6
## 2           5       20  4.0
## 3           5       25  5.0
## 4           4       20  4.0
## 5           2       10  2.0
## 6           3       17  3.4
# we safe the dataframe to our working directory
# we change default values of write.data() to have the format read.delim() needs
# write.table(ddf, quote=R, sep='\t', row.names=F)

[res_end:pacs2]

Moderationsanalyse zwei stetige Erklärvariablen

Beispiel: Zufriedenheit mit dem eigenen Körper und Vergleichsintensität

Das Beispielmodell:

\[ y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \beta_3x_{1i}x_{2i} + \epsilon_i \]

Interaktion

Interaktion zweier stetiger erklärender Variablen: Vergleichsintensität und Schulnoten als Vorhersage der Zufriedenheit mit dem eigenen Körper

Idee bzw. Fragen: - Ist die Vorhersage von bodysatisf durch die Vergleichsintensität abhängig von der schulischen Leistungsfähigkeit, also von den Schulnoten?

Äquivalenz verschiedener Formulae.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")

# create new variable pacs * grade
ddf$pacs_grade <- ddf$pacs * ddf$grade

# create various models
m.prod <- lm(bodysatisf ~ pacs + grade + pacs_grade, data=ddf)
summary(m.prod)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs_grade, data = ddf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.121  -5.193  -0.333   4.027  17.742 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.796     14.572   0.810    0.426    
## pacs          18.176      3.942   4.611 9.39e-05 ***
## grade         -5.560      9.084  -0.612    0.546    
## pacs_grade    -1.779      2.371  -0.750    0.460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared:  0.8806, Adjusted R-squared:  0.8668 
## F-statistic: 63.91 on 3 and 26 DF,  p-value: 3.958e-12
m.expl <- lm(bodysatisf ~ pacs + grade + pacs:grade, data=ddf)
summary(m.expl)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs:grade, data = ddf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.121  -5.193  -0.333   4.027  17.742 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.796     14.572   0.810    0.426    
## pacs          18.176      3.942   4.611 9.39e-05 ***
## grade         -5.560      9.084  -0.612    0.546    
## pacs:grade    -1.779      2.371  -0.750    0.460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared:  0.8806, Adjusted R-squared:  0.8668 
## F-statistic: 63.91 on 3 and 26 DF,  p-value: 3.958e-12
m.impl <- lm(bodysatisf ~ pacs * grade, data=ddf)
summary(m.impl)
## 
## Call:
## lm(formula = bodysatisf ~ pacs * grade, data = ddf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.121  -5.193  -0.333   4.027  17.742 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.796     14.572   0.810    0.426    
## pacs          18.176      3.942   4.611 9.39e-05 ***
## grade         -5.560      9.084  -0.612    0.546    
## pacs:grade    -1.779      2.371  -0.750    0.460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared:  0.8806, Adjusted R-squared:  0.8668 
## F-statistic: 63.91 on 3 and 26 DF,  p-value: 3.958e-12
m.prod
## 
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs_grade, data = ddf)
## 
## Coefficients:
## (Intercept)         pacs        grade   pacs_grade  
##      11.796       18.176       -5.560       -1.779
m.expl
## 
## Call:
## lm(formula = bodysatisf ~ pacs + grade + pacs:grade, data = ddf)
## 
## Coefficients:
## (Intercept)         pacs        grade   pacs:grade  
##      11.796       18.176       -5.560       -1.779
m.impl
## 
## Call:
## lm(formula = bodysatisf ~ pacs * grade, data = ddf)
## 
## Coefficients:
## (Intercept)         pacs        grade   pacs:grade  
##      11.796       18.176       -5.560       -1.779
# and the model tests
# create model without interaction
m.add <- lm(bodysatisf ~ pacs + grade, data=ddf)
# test of interaction via model comparison
anova(m.add, m.prod)
## Analysis of Variance Table
## 
## Model 1: bodysatisf ~ pacs + grade
## Model 2: bodysatisf ~ pacs + grade + pacs_grade
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     27 1609.2                           
## 2     26 1575.1  1    34.094 0.5628 0.4599
anova(m.add, m.expl)
## Analysis of Variance Table
## 
## Model 1: bodysatisf ~ pacs + grade
## Model 2: bodysatisf ~ pacs + grade + pacs:grade
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     27 1609.2                           
## 2     26 1575.1  1    34.094 0.5628 0.4599
anova(m.add, m.impl)
## Analysis of Variance Table
## 
## Model 1: bodysatisf ~ pacs + grade
## Model 2: bodysatisf ~ pacs * grade
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     27 1609.2                           
## 2     26 1575.1  1    34.094 0.5628 0.4599
# and the default summaries
summary(m.add)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + grade, data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.8970  -5.1541  -0.0602   3.9541  18.0670 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   22.096      4.842   4.563 9.85e-05 ***
## pacs          15.346      1.137  13.494 1.62e-13 ***
## grade        -12.171      2.189  -5.560 6.80e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.72 on 27 degrees of freedom
## Multiple R-squared:  0.878,  Adjusted R-squared:  0.869 
## F-statistic: 97.16 on 2 and 27 DF,  p-value: 4.629e-13
summary(m.impl)
## 
## Call:
## lm(formula = bodysatisf ~ pacs * grade, data = ddf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.121  -5.193  -0.333   4.027  17.742 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   11.796     14.572   0.810    0.426    
## pacs          18.176      3.942   4.611 9.39e-05 ***
## grade         -5.560      9.084  -0.612    0.546    
## pacs:grade    -1.779      2.371  -0.750    0.460    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.783 on 26 degrees of freedom
## Multiple R-squared:  0.8806, Adjusted R-squared:  0.8668 
## F-statistic: 63.91 on 3 and 26 DF,  p-value: 3.958e-12
# get standardized coefficients
require(QuantPsyc)
## Loading required package: QuantPsyc
## Loading required package: boot
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:base':
## 
##     norm
lm.beta(m.add)
##       pacs      grade 
##  0.9182542 -0.3783276
lm.beta(m.impl)
## Warning in b * sx: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
##       pacs      grade pacs:grade 
##  1.0875504 -0.1728437 -0.1064520
rm(m.add, m.prod, m.expl, m.impl)

Vorhersage bei Modellen mit Interaktionsterm

Hier dargestellt an dem einfachen Fall von zwei Vorhersagevariablen, so dass die Verhältnisse noch in einem dreidimensionalen Raum vorstellbar sind. Wenn ein Interaktionsterm mit im Spiel ist, ist die Vorhersage abhängig von der Kombination der Vorhersagevariablen. Halten wir Vorhersagevariable fest und lassen die andere laufen, bekommen wir eine Regressionsgerade mit einer bestimmten Steigung. Halten wir diese Vorhersagevariablen auf einem anderen Wert fest und lassen die andere laufen, bekommen wir eine Regressionsgerade mit einer anderen Steigung.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)

# take a look at the relevant dat columns
head(cbind(ddf$subj, ddf$bodysatisf, ddf$pacs, ddf$grade))
##      [,1] [,2] [,3] [,4]
## [1,]    1   16  1.6 2.02
## [2,]    2   76  4.0 1.00
## [3,]    3   68  5.0 2.03
## [4,]    4   70  4.0 1.00
## [5,]    5   45  2.0 1.00
## [6,]    6   58  3.4 1.00
# we predict subj 4 'manually'
m.i$coefficients['(Intercept)'] + 
  ddf$pacs[4]  * m.i$coefficients['pacs'] +
  ddf$grade[4] * m.i$coefficients['grade'] +
  ddf$pacs[4]  * ddf$grade[4] * m.i$coefficients['pacs:grade']
## (Intercept) 
##    71.82169
head(predict.lm(m.i))
##        1        2        3        4        5        6 
## 23.89501 71.82169 73.32874 71.82169 39.02869 61.98379
head(residuals.lm(m.i))
##         1         2         3         4         5         6 
## -7.895006  4.178311 -5.328739 -1.821689  5.971309 -3.983789
head(ddf$bodysatisf)
## [1] 16 76 68 70 45 58
# coefficient in model summary is inclination for mean

Grafische Darstellung als 3D-Plot

Mit der library(rockchalk)

df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=df)
require(rockchalk)
## Loading required package: rockchalk
## 
## Attaching package: 'rockchalk'
## The following object is masked from 'package:QuantPsyc':
## 
##     meanCenter
## The following object is masked from 'package:MASS':
## 
##     mvrnorm
## The following object is masked from 'package:dplyr':
## 
##     summarize
plotPlane(model = m.i, plotx1 = "pacs", plotx2 = "grade")

# or to emphasize the view of the interaction
plotPlane(model = m.i, plotx1 = "pacs", plotx2 = "grade", theta = 0, phi = 10)

plotPlane(model = m.i, plotx1 = "pacs", plotx2 = "grade", drawArrows = T)

Mit rgl()

df <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
require(rgl)
## Loading required package: rgl
open3d()
## glX 
##   1
plot3d(x=df$pacs, y=df$grade, z=df$bodysatisf, col="red", size=4)

m.i <- lm(bodysatisf ~ pacs * grade, data=df)
grd <- expand.grid(pacs = sort(unique(df$pacs)), grade = sort(unique(df$grade)) )
grd$pred <-predict(m.i, newdata=grd)
persp3d(x=unique(grd[[1]]), y=unique(grd[[2]]), 
              z=matrix(grd[[3]],length(unique(grd[[1]])),length(unique(grd[[2]]))), add=TRUE)

Grafische Darstellung als 2D-Plot

Problem: Bei vorliegender Interaktion hängt die Steigung der Vorhersageebene (im Fall von zwei Vorhersagevariablen) ab von der Ausprägung der Variable, die als Vorhersagevariable graphisch dargestellt werden soll.

Eine Visualisierungsidee ist, die zweite Vorhersagevariable, die in die Tiefe geht, je nach Ausprägung in unterschiedlicher Farbe darzustellen.

Eine weitere Idee ist, mehrere Steigungsgeraden einzuzeichnen, eine beim Mittelwert und je eine bei +/- eine Standardabweichung um den Mittelwert.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)

require("ggplot2")
## Loading required package: ggplot2
# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
#m.s <- lm(bodysatisf ~ pacs + gender, data=ddf)
# create base plot object 
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf, color=grade))
# color management
cbPalette = c('blue', 'red')
# add scatterplot and gender specific regression lines as estimated by model above
# gender has levels 0 and 1
pplot + 
  #geom_point(aes(size=grade)) +
  geom_point() +
  # scale_color_manual(values=cbPalette) +
  #geom_abline(intercept = m.i$coefficients['(Intercept)'], slope =  m.i$coefficients['pacs'], color=cbPalette[1]) 
  geom_abline(intercept = mean(ddf$pacs), slope =  m.i$coefficients['pacs'], color=cbPalette[1]) + 
  #geom_abline(intercept = m.s$coefficients['(Intercept)'] + 1 * m.s$coefficients['sex'], slope =  m.s$coefficients['extro'], color=cbPalette[2])
  geom_abline(intercept = mean(ddf$pacs), slope =  m.i$coefficients['pacs:grade'], color=cbPalette[2]) 

Gut zu erkennen ist die vermeintliche Asymmetrie der Regressionsgeraden zur Punktwolke. Wenn wir uns, wie hier, nur isoliert die Regressionsgerade von bodysatisf auf pacs einzeichnen, fehlen die zusätzlichen Einflüsse von grade und der Wechselwirkung. Die “korrekte” Regressionsgerade wird also durch die beiden weiteren Einflüsse ‘moderiert,’ in unserem Fall also ‘etwas nach unten gezogen’ (s.u.).

Darstellung mit zentrierten Daten

Wir können die Daten zentrieren, um die Steigungskoeffizienten vergleichend darstellen zu können. Vorteil: Der Intercept fällt weg.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)

# we center the data for visualization
ddf$c.pacs <- ddf$pacs - mean(ddf$pacs)
ddf$c.bodysatisf <- ddf$bodysatisf - mean(ddf$bodysatisf)
ddf$c.grade <- ddf$grade - mean(ddf$grade)
m.i.c <- lm(c.bodysatisf ~ c.pacs * c.grade, data=ddf)

#c.v.pacs <- seq(-2, 2, step=.1)
c.v.pacs <- seq(-2, 2, length.out=length(ddf$pacs))
c.pred.lo <- c.v.pacs * m.i.c$coefficients['c.pacs'] +
   m.i.c$coefficients['c.pacs:c.grade'] * c.v.pacs * -1
c.pred.hi <- c.v.pacs * m.i.c$coefficients['c.pacs'] +
   m.i.c$coefficients['c.pacs:c.grade'] * c.v.pacs * +1

ggplot(ddf, aes(x=c.pacs, y=c.bodysatisf, color=c.grade)) +
  geom_point() +
  stat_smooth(aes(color=c.grade), method = "lm", se=FALSE) +
  geom_abline(intercept = 0, slope =  m.i.c$coefficients['c.pacs'], color="red") +
  geom_line(mapping=aes(x=c.v.pacs, y=c.pred.lo), color="green") +
  geom_line(mapping=aes(x=c.v.pacs, y=c.pred.hi), color="green")
## `geom_smooth()` using formula 'y ~ x'

Darstellung mit mehreren Regressionsgeraden

Mehrere Regressionsgerade einzeichnen, um relevante Stellen auf der bebogenen Vorhersageebene darzustellen. Eine Möglichkeit ist, die Regressionsgeraden beim Mittelwert und eine Standardabweichung um den Mittelwert für die in die Tiefe gehende Variable auszugeben.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
m.i <- lm(bodysatisf ~ pacs * grade, data=ddf)

# we define some virtual pacs values to create the 'regression lines' on different levels of grade
v.pacs <- seq(min(ddf$pacs), max(ddf$pacs), length.out=length(ddf$pacs))
# we create prediction points at mean grade
pred.mean <- m.i$coefficients['(Intercept)'] +
  m.i$coefficients['pacs'] * v.pacs +
  m.i$coefficients['grade'] * mean(ddf$grade) +
  m.i$coefficients['pacs:grade'] * v.pacs * mean(ddf$grade)
# prediction of grade of mean + 1 sd
pred.m.hi <- m.i$coefficients['(Intercept)'] +
  m.i$coefficients['pacs'] * v.pacs +
  m.i$coefficients['grade'] * (mean(ddf$grade) + sd(ddf$grade)) +
  m.i$coefficients['pacs:grade'] * v.pacs * (mean(ddf$grade) + sd(ddf$grade))
# prediction of grade of mean - 1 sd
pred.m.lo <- m.i$coefficients['(Intercept)'] +
  m.i$coefficients['pacs'] * v.pacs +
  m.i$coefficients['grade'] * (mean(ddf$grade) - sd(ddf$grade)) +
  m.i$coefficients['pacs:grade'] * v.pacs * (mean(ddf$grade) - sd(ddf$grade))

# l.b and h.b are limits to cut down grade values in three classes
#l.b <- mean(ddf$grade) - sd(ddf$grade)
l.b <- 1.1
h.b <- mean(ddf$grade) + sd(ddf$grade)
# we may mark grade 'regions' or 'classes' with different colors
ggplot(ddf, aes(x=pacs, y=bodysatisf)) +
  geom_point(aes(color = cut(grade, c(-Inf, l.b, h.b, Inf))),  size = 3) +
  scale_color_manual(name = "grade", values = c("green", "black", "red"), labels = c("grade good", "medium", "poor")) +
  #geom_abline(intercept = 0, slope =  m.i$coefficients['pacs'], color="grey") +
  geom_line(mapping=aes(x=v.pacs, y=pred.mean), color="black") +
  geom_line(mapping=aes(x=v.pacs, y=pred.m.lo), color="green") +
  geom_line(mapping=aes(x=v.pacs, y=pred.m.hi), color="red")

# we might prefer to use a smooth color transition
ggplot(ddf, aes(x=pacs, y=bodysatisf, color=grade)) +
  #geom_point(aes(color = cut(grade, c(-Inf, l.b, h.b, Inf))),  size = 3) +
  #scale_color_manual(name = "grade", values = c("green", "black", "red"), labels = c("grade good", "medium", "poor")) +
  geom_point() +
  #geom_abline(intercept = 0, slope =  m.i$coefficients['pacs'], color="grey") +
  # geom_abline(intercept = mean(ddf$grade), slope =  m.i$coefficients['pacs'], color="grey") +
  #geom_line(mapping=aes(x=v.pacs, y=pred.mean), color="black") +
  geom_line(mapping=aes(x=v.pacs, y=pred.mean), color="#5555CC") +
  geom_line(mapping=aes(x=v.pacs, y=pred.m.lo), color="#444499") +
  geom_line(mapping=aes(x=v.pacs, y=pred.m.hi), color="#7777FF")

rm(v.pacs, pred.mean, pred.m.lo, pred.m.hi, l.b, h.b)

regression line at mean +/- sd

Moderationsanalyse eine stetige und eine binäre Erklärvariable

Zufriedenheit mit dem eigenen Körper und Vergleichsintensität in Abhängigkeit vom Geschlecht

Idee bzw. Fragen: - Ist die Vorhersage von bodysatisf bei Frauen und Männern verschieden? - Ist die Vorhersage vielleicht unterschiedlich genau? - spezifischer: Ist die Steigung der Regressionsgeraden unterschiedlich in Abhängigkeit vom Geschlecht?

Problem: binärer (nicht stetiger) Prädiktor: Geschlecht.

Äquivalenz bei verschiedenen Ausführungsformeln. Kleiner Trick: Um die Vergleichbarkeit der Ansätze zeigen zu können bzw. um die Interpretierbarkeit zu erhöhen werden die Faktorstufen von Geschlecht auf 0 1 gesetzt.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
# we will use factor gender
ddf$gender <- factor(ddf$gender)
as.numeric(ddf$gender)
##  [1] 2 2 1 1 1 1 1 1 1 2 1 2 1 2 2 1 1 2 2 1 2 2 2 2 1 2 2 2 2 1
# for better interpretation of coefficients we can use 0,1 
ddf$n.gender <- as.numeric(ddf$gender) - 1
# create new variable gender * pacs
ddf$gender_pacs <- ddf$n.gender * ddf$pacs

# create various models
m.prod <- lm(bodysatisf ~ pacs + gender + gender_pacs, data=ddf)
summary(m.prod)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender_pacs, data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1982  -5.4794   0.7841   6.2039  15.4892 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   27.996      6.149   4.553 0.000109 ***
## pacs           6.629      1.901   3.487 0.001753 ** 
## genderm      -37.003      8.184  -4.521 0.000119 ***
## gender_pacs   12.468      2.429   5.133 2.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared:  0.8713, Adjusted R-squared:  0.8564 
## F-statistic: 58.67 on 3 and 26 DF,  p-value: 1.044e-11
m.expl <- lm(bodysatisf ~ pacs + gender + gender:pacs, data=ddf)
summary(m.expl)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender:pacs, data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1982  -5.4794   0.7841   6.2039  15.4892 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    27.996      6.149   4.553 0.000109 ***
## pacs            6.629      1.901   3.487 0.001753 ** 
## genderm       -37.003      8.184  -4.521 0.000119 ***
## pacs:genderm   12.468      2.429   5.133 2.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared:  0.8713, Adjusted R-squared:  0.8564 
## F-statistic: 58.67 on 3 and 26 DF,  p-value: 1.044e-11
m.impl <- lm(bodysatisf ~ pacs * gender, data=ddf)
summary(m.impl)
## 
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1982  -5.4794   0.7841   6.2039  15.4892 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    27.996      6.149   4.553 0.000109 ***
## pacs            6.629      1.901   3.487 0.001753 ** 
## genderm       -37.003      8.184  -4.521 0.000119 ***
## pacs:genderm   12.468      2.429   5.133 2.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared:  0.8713, Adjusted R-squared:  0.8564 
## F-statistic: 58.67 on 3 and 26 DF,  p-value: 1.044e-11
m.prod
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender_pacs, data = ddf)
## 
## Coefficients:
## (Intercept)         pacs      genderm  gender_pacs  
##      27.996        6.629      -37.003       12.468
m.expl
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender + gender:pacs, data = ddf)
## 
## Coefficients:
##  (Intercept)          pacs       genderm  pacs:genderm  
##       27.996         6.629       -37.003        12.468
m.impl
## 
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
## 
## Coefficients:
##  (Intercept)          pacs       genderm  pacs:genderm  
##       27.996         6.629       -37.003        12.468
rm(m.prod, m.expl, m.impl)

Die Hinzunahme eines Interaktionsterms kann geprüft werden wie die Hinzunahme einer zusätzlichen Variable, die die Multiplikation der jeweiligen Einzel-Vorhersagevariablen miteinander enthält.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
ddf$gender <- factor(ddf$gender)
ddf$gender_pacs <- (as.numeric(ddf$gender) - 1) * ddf$pacs
# create model without interaction
m.add <- lm(bodysatisf ~ pacs + gender, data=ddf)
summary(m.add)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender, data = ddf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.782  -7.961   1.016   8.504  18.307 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.869      5.826   0.836    0.411    
## pacs          14.265      1.648   8.657 2.84e-09 ***
## genderm        2.128      4.144   0.514    0.612    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.25 on 27 degrees of freedom
## Multiple R-squared:  0.7409, Adjusted R-squared:  0.7217 
## F-statistic:  38.6 on 2 and 27 DF,  p-value: 1.208e-08
# create various models
m.prod <- lm(bodysatisf ~ pacs + gender + gender_pacs, data=ddf)
m.expl <- lm(bodysatisf ~ pacs + gender + pacs:gender, data=ddf)
m.impl <- lm(bodysatisf ~ pacs * gender, data=ddf)
# test of interaction
anova(m.add, m.prod)
## Analysis of Variance Table
## 
## Model 1: bodysatisf ~ pacs + gender
## Model 2: bodysatisf ~ pacs + gender + gender_pacs
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     27 3418.0                                  
## 2     26 1697.7  1    1720.3 26.346 2.368e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.add, m.expl)
## Analysis of Variance Table
## 
## Model 1: bodysatisf ~ pacs + gender
## Model 2: bodysatisf ~ pacs + gender + pacs:gender
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     27 3418.0                                  
## 2     26 1697.7  1    1720.3 26.346 2.368e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.add, m.impl)
## Analysis of Variance Table
## 
## Model 1: bodysatisf ~ pacs + gender
## Model 2: bodysatisf ~ pacs * gender
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     27 3418.0                                  
## 2     26 1697.7  1    1720.3 26.346 2.368e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rm(m.add, m.prod, m.expl, m.impl)

Vorhersage bei Modell mit Interaktionsterm:

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
# we want to use gender as predictive variable
ddf$gender <- factor(ddf$gender)
ddf$gender
##  [1] m m f f f f f f f m f m f m m f f m m f m m m m f m m m m f
## Levels: f m
# gender is of type factor
class(ddf$gender)
## [1] "factor"
as.numeric(ddf$gender)
##  [1] 2 2 1 1 1 1 1 1 1 2 1 2 1 2 2 1 1 2 2 1 2 2 2 2 1 2 2 2 2 1
# in order to use gender as numeric predictor, we convert it to 0,1 values
ddf$n.gender <- as.numeric(ddf$gender)
ddf$n.gender[ddf$n.gender == 2] <- 0
class(ddf$n.gender)
## [1] "numeric"
# additive model
m.ia <- lm(bodysatisf ~ pacs + n.gender, data=ddf)
summary(m.ia)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + n.gender, data = ddf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.782  -7.961   1.016   8.504  18.307 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.997      6.140   1.140    0.264    
## pacs          14.265      1.648   8.657 2.84e-09 ***
## n.gender      -2.128      4.144  -0.514    0.612    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.25 on 27 degrees of freedom
## Multiple R-squared:  0.7409, Adjusted R-squared:  0.7217 
## F-statistic:  38.6 on 2 and 27 DF,  p-value: 1.208e-08
m.1a <- lm(bodysatisf ~ pacs + gender, data=ddf)
summary(m.1a)
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender, data = ddf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.782  -7.961   1.016   8.504  18.307 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.869      5.826   0.836    0.411    
## pacs          14.265      1.648   8.657 2.84e-09 ***
## genderm        2.128      4.144   0.514    0.612    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.25 on 27 degrees of freedom
## Multiple R-squared:  0.7409, Adjusted R-squared:  0.7217 
## F-statistic:  38.6 on 2 and 27 DF,  p-value: 1.208e-08
# models with interaction

m.i <- lm(bodysatisf ~ pacs * n.gender, data=ddf)
summary(m.i)
## 
## Call:
## lm(formula = bodysatisf ~ pacs * n.gender, data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1982  -5.4794   0.7841   6.2039  15.4892 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -9.007      5.401  -1.668 0.107366    
## pacs            19.097      1.512  12.629 1.34e-12 ***
## n.gender        37.003      8.184   4.521 0.000119 ***
## pacs:n.gender  -12.468      2.429  -5.133 2.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared:  0.8713, Adjusted R-squared:  0.8564 
## F-statistic: 58.67 on 3 and 26 DF,  p-value: 1.044e-11
m.1 <- lm(bodysatisf ~ pacs * gender, data=ddf)
summary(m.1)
## 
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1982  -5.4794   0.7841   6.2039  15.4892 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    27.996      6.149   4.553 0.000109 ***
## pacs            6.629      1.901   3.487 0.001753 ** 
## genderm       -37.003      8.184  -4.521 0.000119 ***
## pacs:genderm   12.468      2.429   5.133 2.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared:  0.8713, Adjusted R-squared:  0.8564 
## F-statistic: 58.67 on 3 and 26 DF,  p-value: 1.044e-11
# old

# for better interpretation of coefficients we can use 0,1 
ddf$n.gender <- as.numeric(ddf$gender) - 1
# create new variable gender * pacs
ddf$gender_pacs <- ddf$n.gender * ddf$pacs

m.i <- lm(bodysatisf ~ pacs * gender, data=ddf)
summary(m.i)
## 
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1982  -5.4794   0.7841   6.2039  15.4892 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    27.996      6.149   4.553 0.000109 ***
## pacs            6.629      1.901   3.487 0.001753 ** 
## genderm       -37.003      8.184  -4.521 0.000119 ***
## pacs:genderm   12.468      2.429   5.133 2.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.081 on 26 degrees of freedom
## Multiple R-squared:  0.8713, Adjusted R-squared:  0.8564 
## F-statistic: 58.67 on 3 and 26 DF,  p-value: 1.044e-11
m.i
## 
## Call:
## lm(formula = bodysatisf ~ pacs * gender, data = ddf)
## 
## Coefficients:
##  (Intercept)          pacs       genderm  pacs:genderm  
##       27.996         6.629       -37.003        12.468
#relevant values
head(ddf[c('bodysatisf', 'pacs', 'n.gender')])
##   bodysatisf pacs n.gender
## 1         16  1.6        1
## 2         76  4.0        1
## 3         68  5.0        0
## 4         70  4.0        0
## 5         45  2.0        0
## 6         58  3.4        0
head(predict(m.i))
##        1        2        3        4        5        6 
## 21.54714 67.37888 61.13947 54.51075 41.25332 50.53352
# we calculate first value by hand
# first observation (man)
27.996 + 6.629 * 1.6 -37.003 * 1 + 12.468 * 1.6 * 1  
## [1] 21.5482
# third observation (woman)
27.996 + 6.629 * 5 -37.003 * 0 + 12.468 * 5 * 0    
## [1] 61.141
# or, using the coefficients:
names(m.i$coefficients)
## [1] "(Intercept)"  "pacs"         "genderm"      "pacs:genderm"
m.i$coefficients['(Intercept)'] + m.i$coefficients['genderm'] * 0 +  m.i$coefficients['pacs'] * ddf$pacs[3] + m.i$coefficients['pacs:genderm'] * 0 * ddf$pacs[3]
## (Intercept) 
##    61.13947
# rm(m.i)

m.i$coefficients enthält die Parameter des Modells: 27.9958887, 6.628716, -37.0032423, 12.4678436. Die Namen (Intercept), pacs, genderm, pacs:genderm geben uns Hinweise, wie die Parameter interpretiert werden können. genderm ist der Wert, um den ein weiblicher Wert erhöht werden muss, um zu einem männlichen Wert zu kommen. pacs:genderm ist der Wechselwirkungseffekt, der zusätzlich hinzugezählt wird.

Visualisierung

Scatterplot mit unterschiedlichen Werten für Geschlecht und den beiden Regressionsgeraden für das jeweilige Geschlecht, ohne Interaktionsterm. Die unterschiedlichen Regressionslinien entstehen durch die Multiplikation des jeweiligen Levels von Geschlecht mit dem Gewicht für Geschlecht. Die Gewichte fügen wir ein, indem wir direkt auf die Attribute des Ergebnisobjekts zugreifen (m.s$coefficients).

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
ddf$gender_pacs <- (as.numeric(ddf$gender) - 1) * ddf$pacs
## Warning: NAs durch Umwandlung erzeugt
require("ggplot2")
# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
m.s <- lm(bodysatisf ~ pacs + gender, data=ddf)
# create base plot object 
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# color management
cbPalette = c('blue', 'red')
# add scatterplot and gender specific regression lines as estimated by model above
# gender has levels 0 and 1
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  scale_color_manual(values=cbPalette) +
  geom_abline(intercept = m.s$coefficients['(Intercept)'] + 0 * m.s$coefficients['sex'], slope =  m.s$coefficients['extro'], color=cbPalette[1]) + 
  geom_abline(intercept = m.s$coefficients['(Intercept)'] + 1 * m.s$coefficients['sex'], slope =  m.s$coefficients['extro'], color=cbPalette[2])
## Warning: Removed 1 rows containing missing values (geom_abline).
## Warning: Removed 1 rows containing missing values (geom_abline).

rm(m.s)

Scatterplot mit unterschiedlichen Werten für Geschlecht und den beiden Regressionsgeraden für das jeweilige Geschlecht, inkl. Interaktionsterm. Random slope model. Die unterschiedlichen Regressionslinien entstehen durch die Multiplikation des jeweiligen Levels von Geschlecht mit dem Gewicht für Geschlecht plus der Multiplikation des jeweiligen Levels von Geschlecht mit dem Gewicht des Interaktionsterms.

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
#ddf$gender <- as.numeric(ddf$gender) - 1
require("ggplot2")

# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
m.s <- lm(bodysatisf ~ pacs + gender, data=ddf)
m.s
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender, data = ddf)
## 
## Coefficients:
## (Intercept)         pacs      genderm  
##       4.869       14.265        2.128
m.i <- lm(bodysatisf ~ pacs + gender + pacs:gender, data=ddf)
m.i
## 
## Call:
## lm(formula = bodysatisf ~ pacs + gender + pacs:gender, data = ddf)
## 
## Coefficients:
##  (Intercept)          pacs       genderm  pacs:genderm  
##       27.996         6.629       -37.003        12.468
# create base plot object 
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# color management
cbPalette = c('blue', 'red')
# add scatterplot and gender specific regression lines as estimated by model above - interaction included in model, less distance in parallel regression lines
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  scale_color_manual(values=cbPalette) +
  geom_abline(intercept = (m.s$coefficients['(Intercept)'] + 0 * m.s$coefficients['genderm']), slope =  m.s$coefficients['pacs'], color=cbPalette[1]) + 
  geom_abline(intercept = (m.s$coefficients['(Intercept)'] + 1 * m.s$coefficients['genderm']), slope =  m.s$coefficients['pacs'], color=cbPalette[2]) 

# add scatterplot and gender specific regression lines as estimated by model above, different slope included
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  scale_color_manual(values=cbPalette) +
  geom_abline(intercept = m.i$coefficients['(Intercept)'] + 0 * m.i$coefficients['genderm'], slope =  m.i$coefficients['pacs'] + 0 * m.i$coefficients['pacs:genderm'], color=cbPalette[1]) + 
  geom_abline(intercept = m.i$coefficients['(Intercept)'] + 1 * m.i$coefficients['genderm'], slope =  m.i$coefficients['pacs'] + 1 * m.i$coefficients['pacs:genderm'], color=cbPalette[2]) 

rm(m.s, m.i)

Grafische Annäherung

ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
require("ggplot2")
# raw scatterplot 
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
pplot + geom_point() 

# graphical look at gender specifity. Gender must have type factor to be used adequately. This is given already
ddf$gender
##  [1] "m" "m" "f" "f" "f" "f" "f" "f" "f" "m" "f" "m" "f" "m" "m" "f" "f" "m" "m"
## [20] "f" "m" "m" "m" "m" "f" "m" "m" "m" "m" "f"
# plot values in different color and shape 
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# add a geom_point to base plot object
pplot + geom_point(aes(color=ddf$gender, shape=ddf$gender))
## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.

## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.

# include global regression line
pplot + 
  geom_point(aes(color=ddf$gender, shape=ddf$gender)) +
  stat_smooth(method = "lm", se=FALSE)
## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.

## Warning: Use of `ddf$gender` is discouraged. Use `gender` instead.
## `geom_smooth()` using formula 'y ~ x'

# include group specific regression lines without differences in slope (linear model without interaction)
# for this we need the coefficients of the model to plot
ddf <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/virt/v_bmi_preprocessed.txt", fileEncoding = "UTF-8")
attach(ddf)
# the model
m.2 <- lm(bodysatisf ~ pacs + gender, data=ddf)
#m.e_s <- lm(time ~ extro + sex)
# create base plot object 
pplot <- ggplot(ddf, aes(x=pacs, y=bodysatisf))
# color management
cbPalette = c('blue', 'red')

# add scatterplot and gender specific regression lines as estimated by model above 
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  scale_color_manual(values=cbPalette) +
  geom_abline(intercept = m.2$coefficients['(Intercept)'], slope =  m.2$coefficients['pacs'], color=cbPalette[1]) + 
  geom_abline(intercept = m.2$coefficients['(Intercept)'] + m.2$coefficients['genderm'], slope =  m.2$coefficients['pacs'], color=cbPalette[2]) 

# include group specific regression lines - this is NOT the statistical model 'time~extro+sex' discussed here
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  stat_smooth(aes(fill = gender), method = "lm")
## `geom_smooth()` using formula 'y ~ x'

# include both regression lines
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  stat_smooth(method = "lm", se=FALSE) +
  stat_smooth(aes(fill = gender, color=gender), method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

# include both regression lines, global confidence interval
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  stat_smooth(method = "lm") +
  stat_smooth(aes(fill = gender, color=gender), method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

# include both regression lines and confidence intervals for gender
pplot + 
  geom_point(aes(color=gender, shape=gender)) +
  stat_smooth(method = "lm", se=FALSE) +
  stat_smooth(aes(fill = gender, color=gender), method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Die Steigung der Regressionsgeraden für die Vorhersage der Zufriedenheit mit dem eigenen Körper aufgrund der Vergleichsintensität hängt also ab von der Ausprägung einer dritten Variablen - dem Geschlecht.

Hier handelt es sich um eine dichotome Variable. Allerdings gilt das Prinzip der Abhängigkeit der Vorhersage von einer dritten Variable auch für stetige Variablen.

Version: 06 Mai, 2021 07:55