Chapter 9 Group-wise operations
Often the same operation has to be carried out separately for each subgroup of the data.
9.1 Base R
require(tidyverse)
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/div/df_dplyr.txt")
## Rows: 12 Columns: 7
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): gender
## dbl (6): subj, age, grp, v1, v2, v3
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# we would like to know the mean of v1 for every gender and for every group
# using tapply()
tapply(dd$v1, dd$gender, mean)
## f m
## 48.66667 55.50000
tapply(dd$v1, dd$grp, mean)
## 1 2 3
## 43.75 47.50 65.00
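tapply() returns a named vector. When a data frame is more convenient, base R's aggregate() with a formula is one alternative. The following is only a sketch: dd_demo is a hypothetical stand-in built from the gender, grp and v1 values printed in this chapter, so that the example runs without downloading the file.

```r
# Hypothetical stand-in for dd (values copied from the output shown in this chapter)
dd_demo <- data.frame(
  gender = rep(c("f", "m"), each = 6),
  grp    = rep(1:3, times = 4),
  v1     = c(9, 67, 86, 64, 40, 26, 64, 61, 67, 38, 22, 81)
)

# one row per gender, holding the mean of v1 in that subgroup
m_gender <- aggregate(v1 ~ gender, data = dd_demo, FUN = mean)

# one row per combination of gender and grp
m_both <- aggregate(v1 ~ gender + grp, data = dd_demo, FUN = mean)

print(m_gender)
print(m_both)
```

The formula interface also takes several grouping variables, which gets awkward to read with tapply().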
# using a for loop to insert the subgroup mean for every observation
for(i in unique(dd$gender)){
  dd$ms_v1[dd$gender == i] <- mean(dd$v1[dd$gender == i])
}
## Warning: Unknown or uninitialised column: `ms_v1`.
for(i in unique(dd$grp)){
  dd$mgr_v1[dd$grp == i] <- mean(dd$v1[dd$grp == i])
}
## Warning: Unknown or uninitialised column: `mgr_v1`.
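The loops above write the subgroup mean into every row. Base R's ave() can do the same in a single call and may be easier to read. A minimal sketch, again on the hypothetical stand-in dd_demo (a plain data.frame built from the values printed in this chapter, not the downloaded tibble):

```r
# Hypothetical stand-in for dd (values copied from the output shown in this chapter)
dd_demo <- data.frame(
  gender = rep(c("f", "m"), each = 6),
  grp    = rep(1:3, times = 4),
  v1     = c(9, 67, 86, 64, 40, 26, 64, 61, 67, 38, 22, 81)
)

# ave() returns a vector as long as v1, filled with the subgroup statistic;
# its default FUN is mean
dd_demo$ms_v1  <- ave(dd_demo$v1, dd_demo$gender)
dd_demo$mgr_v1 <- ave(dd_demo$v1, dd_demo$grp)

print(head(dd_demo))
```

Because the new column is assigned as a whole, no partially filled column is created along the way.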
9.2 Tidyverse: dplyr::group_by()
dplyr::group_by() splits a data table (tibble) into subgroups of observations.
All subsequent computations are carried out separately for each subgroup.
require(tidyverse)
dd <- readr::read_tsv("https://md.psych.bio.uni-goettingen.de/mv/data/div/df_dplyr.txt")
## Rows: 12 Columns: 7
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): gender
## dbl (6): subj, age, grp, v1, v2, v3
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# require(dplyr)
dd %>% dplyr::group_by(gender) %>% dplyr::summarize(ms_v1 = mean(v1))
## # A tibble: 2 × 2
## gender ms_v1
## <chr> <dbl>
## 1 f 48.7
## 2 m 55.5
dd %>% dplyr::group_by(grp) %>% dplyr::summarize(mg_v1 = mean(v1))
## # A tibble: 3 × 2
## grp mg_v1
## <dbl> <dbl>
## 1 1 43.8
## 2 2 47.5
## 3 3 65
dd %>% dplyr::group_by(gender) %>% dplyr::mutate(ms_v1 = mean(v1))
## # A tibble: 12 × 8
## # Groups: gender [2]
## subj gender age grp v1 v2 v3 ms_v1
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 f 17 1 9 16 8 48.7
## 2 2 f 33 2 67 73 66 48.7
## 3 3 f 47 3 86 87 91 48.7
## 4 4 f 10 1 64 68 67 48.7
## 5 5 f 21 2 40 46 44 48.7
## 6 6 f 30 3 26 34 28 48.7
## 7 7 m 51 1 64 66 64 55.5
## 8 8 m 13 2 61 66 64 55.5
## 9 9 m 17 3 67 67 67 55.5
## 10 10 m 25 1 38 36 35 55.5
## 11 11 m 33 2 22 25 21 55.5
## 12 12 m 27 3 81 86 81 55.5
dd %>% dplyr::group_by(grp) %>% dplyr::mutate(mg_v1 = mean(v1))
## # A tibble: 12 × 8
## # Groups: grp [3]
## subj gender age grp v1 v2 v3 mg_v1
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 f 17 1 9 16 8 43.8
## 2 2 f 33 2 67 73 66 47.5
## 3 3 f 47 3 86 87 91 65
## 4 4 f 10 1 64 68 67 43.8
## 5 5 f 21 2 40 46 44 47.5
## 6 6 f 30 3 26 34 28 65
## 7 7 m 51 1 64 66 64 43.8
## 8 8 m 13 2 61 66 64 47.5
## 9 9 m 17 3 67 67 67 65
## 10 10 m 25 1 38 36 35 43.8
## 11 11 m 33 2 22 25 21 47.5
## 12 12 m 27 3 81 86 81 65
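The mutate() results above are still grouped, as the "# Groups:" line in the output shows, so later steps would also run per subgroup. The sketch below illustrates two related points: grouping by more than one variable, and removing the grouping again with ungroup(). The names dd_demo, cell_stats, dd_centered and v1_c are hypothetical and only serve this illustration. dd_demo is built from the values printed in this chapter.

```r
library(dplyr)

# Hypothetical stand-in for dd (values copied from the output shown in this chapter)
dd_demo <- tibble(
  gender = rep(c("f", "m"), each = 6),
  grp    = rep(1:3, times = 4),
  v1     = c(9, 67, 86, 64, 40, 26, 64, 61, 67, 38, 22, 81)
)

# several summaries per combination of gender and grp;
# .groups = "drop" returns an ungrouped tibble
cell_stats <- dd_demo %>%
  group_by(gender, grp) %>%
  summarize(n = n(), m_v1 = mean(v1), .groups = "drop")

# center v1 within gender, then drop the grouping so that
# later computations use all rows again
dd_centered <- dd_demo %>%
  group_by(gender) %>%
  mutate(v1_c = v1 - mean(v1)) %>%
  ungroup()

print(cell_stats)
print(dd_centered)
```

Calling ungroup() at the end of a grouped pipeline is a common precaution against accidentally computing later statistics per subgroup.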
9.3 References
- Data Wrangling Cheatsheet
- Examples and explanations: Unit transformation base
- Examples and explanations: Unit transformation dplyr