
This exercise is about the phenomenon that data sets of very different kinds and distributions can produce the same correlation.
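
The definition of Pearson's correlation coefficient helps to see why this can happen. It is given here for reference only; the notation is the standard textbook one and is not taken from the course material:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The coefficient depends on the data only through the two variances and the covariance, so data sets whose scatter plots look entirely different can still share the same value of $r_{xy}$.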

You can find the data for Anscombe's quartet at [http://md.psych.bio.uni-goettingen.de/mv/data/div/ascombe_quartet.txt]

Compute the correlations, fit the corresponding models, and create informative plots to demonstrate the phenomenon.

[res_begin:res_aq]

The solution computes the correlation coefficients and draws scatter plots with the fitted regression line using ggplot().

# read Anscombe's quartet data
df.aq <- read.delim("http://md.psych.bio.uni-goettingen.de/mv/data/div/ascombe_quartet.txt")
require("psych")
## Loading required package: psych
# descriptive statistics of the four x variables (describe() is exported, so :: suffices)
psych::describe(df.aq[grep("x",names(df.aq))])
##    vars  n mean   sd median trimmed  mad min max range skew kurtosis se
## x1    1 11    9 3.32      9       9 4.45   4  14    10 0.00    -1.53  1
## x2    2 11    9 3.32      9       9 4.45   4  14    10 0.00    -1.53  1
## x3    3 11    9 3.32      9       9 4.45   4  14    10 0.00    -1.53  1
## x4    4 11    9 3.32      8       8 0.00   8  19    11 2.47     4.52  1
# descriptive statistics of the four y variables
psych::describe(df.aq[grep("y",names(df.aq))])
##    vars  n mean   sd median trimmed  mad  min   max range  skew kurtosis   se
## y1    1 11  7.5 2.03   7.58    7.49 1.82 4.26 10.84  6.58 -0.05    -1.20 0.61
## y2    2 11  7.5 2.03   8.14    7.79 1.47 3.10  9.26  6.16 -0.98    -0.51 0.61
## y3    3 11  7.5 2.03   7.11    7.15 1.53 5.39 12.74  7.35  1.38     1.24 0.61
## y4    4 11  7.5 2.03   7.04    7.20 1.90 5.25 12.50  7.25  1.12     0.63 0.61
# correlation of each x/y pair
message(paste('x1*y1: ', round(cor(df.aq['x1'], df.aq['y1'] ), digits=3)))
## x1*y1:  0.816
message(paste('x2*y2: ', round(cor(df.aq['x2'], df.aq['y2'] ), digits=3)))
## x2*y2:  0.816
message(paste('x3*y3: ', round(cor(df.aq['x3'], df.aq['y3'] ), digits=3)))
## x3*y3:  0.816
message(paste('x4*y4: ', round(cor(df.aq['x4'], df.aq['y4'] ), digits=3)))
## x4*y4:  0.817
# compare the linear models
# note: these calls regress x on y, whereas the plots below fit y on x
lm(x1~y1, data=df.aq)$coefficients
## (Intercept)          y1 
##  -0.9975311   1.3328426
lm(x2~y2, data=df.aq)$coefficients
## (Intercept)          y2 
##  -0.9948419   1.3324841
lm(x3~y3, data=df.aq)$coefficients
## (Intercept)          y3 
##   -1.000315    1.333375
lm(x4~y4, data=df.aq)$coefficients
## (Intercept)          y4 
##   -1.003640    1.333657
require("ggplot2")
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
# create a base plot object for the first data set (not reused below)
pplot <- ggplot(df.aq, aes(x=x1, y=y1))
# plot each of the four data sets, including its overall regression line
ggplot(df.aq, aes(x=x1, y=y1)) + geom_point(size=4) + stat_smooth(method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

ggplot(df.aq, aes(x=x2, y=y2)) + geom_point(size=4) + stat_smooth(method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

ggplot(df.aq, aes(x=x3, y=y3)) + geom_point(size=4) + stat_smooth(method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'

ggplot(df.aq, aes(x=x4, y=y4)) + geom_point(size=4) + stat_smooth(method = "lm", se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
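
The same comparison can be sketched more compactly by reshaping the data to long format and letting ggplot2 draw one panel per data set. This is only an illustrative variant, not the course's own solution. It assumes that R's built-in `anscombe` data frame has the same columns `x1`..`x4`, `y1`..`y4` as the file read into `df.aq`; the names `aq`, `aq.long` and `aq.stats` are made up for this example. Unlike the `lm()` calls above, it regresses y on x, matching the lines drawn in the plots.

```r
# Assumption: the built-in `anscombe` data frame (package datasets) matches
# the columns x1..x4, y1..y4 of the course file read into df.aq above.
library(ggplot2)

aq <- datasets::anscombe

# Reshape to long format: one row per point, plus a column naming the data set.
aq.long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(set = paste0("set ", i),
             x   = aq[[paste0("x", i)]],
             y   = aq[[paste0("y", i)]])
}))

# Correlation and regression coefficients (y on x, as in the plots) per set.
aq.stats <- do.call(rbind, lapply(split(aq.long, aq.long$set), function(d) {
  fit <- lm(y ~ x, data = d)
  data.frame(set       = d$set[1],
             r         = cor(d$x, d$y),
             intercept = unname(coef(fit)[1]),
             slope     = unname(coef(fit)[2]))
}))
print(round(aq.stats[, -1], 3))

# One panel per data set, each with its own fitted regression line.
p <- ggplot(aq.long, aes(x = x, y = y)) +
  geom_point(size = 3) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ set, ncol = 2)
print(p)
```

Putting the four panels side by side on common axes makes the contrast easier to see: the fitted lines are nearly identical while the point clouds clearly are not.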

[res_end]