Chapter 9 Group-wise Operations

Often the same operation has to be carried out separately for each subgroup of the data.

9.1 Base R

library(tidyverse)
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/div/df_dplyr.txt")
## Rows: 12 Columns: 7
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): gender
## dbl (6): subj, age, grp, v1, v2, v3
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# we would like to know the mean of v1 for every gender and for every group

# using tapply()
tapply(dd$v1, dd$gender, mean)
##        f        m 
## 48.66667 55.50000
tapply(dd$v1, dd$grp, mean)
##     1     2     3 
## 43.75 47.50 65.00
# using a for loop to store the group mean with every observation
# initialise the new columns first to avoid the "Unknown or uninitialised column" warning
dd$ms_v1 <- NA_real_
for (i in unique(dd$gender)) {
  dd$ms_v1[dd$gender == i] <- mean(dd$v1[dd$gender == i])
}
dd$mgr_v1 <- NA_real_
for (i in unique(dd$grp)) {
  dd$mgr_v1[dd$grp == i] <- mean(dd$v1[dd$grp == i])
}
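
The loop above writes the group mean into every row. As a possible alternative, base R's ave() does the same in a single call, and aggregate() returns one row per group as a data frame. The following is only a sketch: it builds a small inline data frame from the gender, grp and v1 values printed in this chapter instead of downloading the file, and the names agg_gender and agg_grp are chosen here for illustration.

```r
# Inline copy of three columns of the example data (values as printed in this chapter)
dd <- data.frame(
  gender = rep(c("f", "m"), each = 6),
  grp    = rep(1:3, times = 4),
  v1     = c(9, 67, 86, 64, 40, 26, 64, 61, 67, 38, 22, 81)
)

# ave() returns one value per row: the mean of v1 within that row's group
dd$ms_v1  <- ave(dd$v1, dd$gender, FUN = mean)
dd$mgr_v1 <- ave(dd$v1, dd$grp, FUN = mean)

# aggregate() returns one row per group, like tapply() but as a data frame
agg_gender <- aggregate(v1 ~ gender, data = dd, FUN = mean)
agg_grp    <- aggregate(v1 ~ grp, data = dd, FUN = mean)
```

Note that FUN has to be passed by name in ave(), because all unnamed arguments after the first are treated as grouping variables.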

9.2 Tidyverse: dplyr::group_by()

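dplyr::group_by() splits a data table (tibble) into subgroups of observations. All subsequent computations are carried out separately for each subgroup. In the examples below, dplyr::summarize() collapses each group to a single row, whereas dplyr::mutate() keeps all rows and attaches the group value to every observation.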

library(tidyverse)
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/div/df_dplyr.txt")
## Rows: 12 Columns: 7
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): gender
## dbl (6): subj, age, grp, v1, v2, v3
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# library(dplyr) alone would also be sufficient for the examples below

dd %>% dplyr::group_by(gender) %>% dplyr::summarize(ms_v1 = mean(v1))
## # A tibble: 2 × 2
##   gender ms_v1
##   <chr>  <dbl>
## 1 f       48.7
## 2 m       55.5
dd %>% dplyr::group_by(grp) %>% dplyr::summarize(mg_v1 = mean(v1))
## # A tibble: 3 × 2
##     grp mg_v1
##   <dbl> <dbl>
## 1     1  43.8
## 2     2  47.5
## 3     3  65
dd %>% dplyr::group_by(gender) %>% dplyr::mutate(ms_v1 = mean(v1))
## # A tibble: 12 × 8
## # Groups:   gender [2]
##     subj gender   age   grp    v1    v2    v3 ms_v1
##    <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 f         17     1     9    16     8  48.7
##  2     2 f         33     2    67    73    66  48.7
##  3     3 f         47     3    86    87    91  48.7
##  4     4 f         10     1    64    68    67  48.7
##  5     5 f         21     2    40    46    44  48.7
##  6     6 f         30     3    26    34    28  48.7
##  7     7 m         51     1    64    66    64  55.5
##  8     8 m         13     2    61    66    64  55.5
##  9     9 m         17     3    67    67    67  55.5
## 10    10 m         25     1    38    36    35  55.5
## 11    11 m         33     2    22    25    21  55.5
## 12    12 m         27     3    81    86    81  55.5
dd %>% dplyr::group_by(grp) %>% dplyr::mutate(mg_v1 = mean(v1))
## # A tibble: 12 × 8
## # Groups:   grp [3]
##     subj gender   age   grp    v1    v2    v3 mg_v1
##    <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 f         17     1     9    16     8  43.8
##  2     2 f         33     2    67    73    66  47.5
##  3     3 f         47     3    86    87    91  65  
##  4     4 f         10     1    64    68    67  43.8
##  5     5 f         21     2    40    46    44  47.5
##  6     6 f         30     3    26    34    28  65  
##  7     7 m         51     1    64    66    64  43.8
##  8     8 m         13     2    61    66    64  47.5
##  9     9 m         17     3    67    67    67  65  
## 10    10 m         25     1    38    36    35  43.8
## 11    11 m         33     2    22    25    21  47.5
## 12    12 m         27     3    81    86    81  65
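
Two further points may be worth illustrating: group_by() accepts more than one grouping variable, and the grouping stays attached to the result of mutate() until it is removed with ungroup(). The following is a minimal sketch under the assumption that a small inline tibble (values copied from the output above) stands in for the downloaded file; the names cell_means, centered and v1_c are hypothetical.

```r
library(dplyr)

# Inline copy of four columns of the example data (values as printed above)
dd <- tibble(
  gender = rep(c("f", "m"), each = 6),
  grp    = rep(1:3, times = 4),
  v1     = c(9, 67, 86, 64, 40, 26, 64, 61, 67, 38, 22, 81),
  v2     = c(16, 73, 87, 68, 46, 34, 66, 66, 67, 36, 25, 86)
)

# One row per gender x grp combination; n() counts the observations in each cell
cell_means <- dd %>%
  group_by(gender, grp) %>%
  summarize(n = n(), m_v1 = mean(v1), m_v2 = mean(v2), .groups = "drop")

# Center v1 within each grp; ungroup() removes the grouping afterwards
centered <- dd %>%
  group_by(grp) %>%
  mutate(v1_c = v1 - mean(v1)) %>%
  ungroup()
```

Dropping the grouping explicitly (via .groups = "drop" or ungroup()) avoids surprises when later steps are meant to operate on the whole table again.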

9.3 References