Cool with statistical independence

Basic bivariate statistical tests (and false positives)

Gina Reynolds

September/October, 2019


Statistical independence

In probability theory, two events are independent, statistically independent, or stochastically independent if the occurrence of one does not affect the probability of occurrence of the other.

Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.
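In symbols (standard notation, not taken from the slides): events A and B are independent when the probability that both occur is the product of their individual probabilities, or equivalently when knowing that B occurred leaves the probability of A unchanged. For random variables X and Y, the same factorization must hold for every pair of sets of values.

```latex
P(A \cap B) = P(A)\,P(B)
\quad\Longleftrightarrow\quad
P(A \mid B) = P(A) \qquad (\text{when } P(B) > 0)

P(X \in S,\; Y \in T) = P(X \in S)\,P(Y \in T) \qquad \text{for all sets } S, T
```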


"A statistically significant relationship"

  • When you have evidence of statistical dependence: the probability of observing a relationship this strong, or stronger, under statistical independence (often the null) is small.

https://www.youtube.com/watch?v=gSyGVDMcg-U


Hypotheses

Hypothesis testing:

  • A null hypothesis: H0 (often statistical independence)
  • An alternative hypothesis: Ha (statistical dependence)

Rejecting the null hypothesis

  • We compare the data with what we would expect to observe under the null (no relationship). We reject the null when a relationship as strong as the observed one would be unlikely to arise by chance in a sample if the variables were independent.
  • If the data deviate from what we would expect under the null

    • we reject the null (no relationship: statistical independence)
    • and accept the alternative (a relationship: statistical dependence)

P-Value

The probability of observing a relationship as strong as the one in the sample, or stronger, if the null hypothesis were true.

http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/ - but they all can...
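To make the definition concrete, here is a minimal sketch in base R of a p-value computed directly from it, using a permutation test (a technique chosen here for illustration; the slides do not use it). The data `x` and `y` are hypothetical, simulated with a built-in relationship. Shuffling `y` destroys any dependence on `x`, so the shuffled correlations show what "as strong or stronger" looks like under the null of independence.

```r
# Permutation p-value for a correlation (illustrative sketch, base R only)
set.seed(1)                    # arbitrary seed, for reproducibility
x <- rnorm(50)
y <- 0.3 * x + rnorm(50)       # hypothetical data with a built-in relationship

observed_r <- cor(x, y)

# Correlations after shuffling y: what chance alone produces under independence
null_r <- replicate(5000, cor(x, sample(y)))

# Share of null correlations at least as strong (in absolute value) as observed
p_value <- mean(abs(null_r) >= abs(observed_r))
p_value
```

The absolute values make this a two-sided p-value, matching the default `alternative = "two.sided"` in `cor.test()`.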


Fisher's exact test

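# Lady tasting tea: simulate a taster who is only guessing.
# Each trial has 8 cups: 4 with tea poured first, 4 with milk first.
number_correct <- rep(NA, 1000000)
for (i in 1:1000000) {
  guess <- sample(c(rep("tea first", 4), rep("milk first", 4)))  # random guess
  truth <- sample(c(rep("tea first", 4), rep("milk first", 4)))  # true order
  number_correct[i] <- sum(guess == truth)  # cups labelled correctly (0 to 8)
}
table(number_correct)
table(number_correct) / 10000  # percent of the 1,000,000 trials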
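The simulated percentages can be checked against the exact null distribution. A minimal sketch in base R, assuming the setup of the simulation (8 cups, 4 of each kind, the taster labels exactly 4 as "tea first"): the number of "tea first" cups labelled correctly follows a hypergeometric distribution (`dhyper()`), and every correct "tea first" label implies a correct "milk first" label, so the total number correct is twice that count. The `cups` table is a hypothetical perfect score; `fisher.test()` gives its exact one-sided p-value, 1/70 (about 0.014).

```r
# Exact null distribution for the tea-tasting experiment
x <- 0:4                                  # "tea first" cups labelled correctly
p_x <- dhyper(x, m = 4, n = 4, k = 4)     # 4 tea-first, 4 milk-first, 4 picked
exact <- data.frame(number_correct = 2 * x,   # total cups correct out of 8
                    percent = 100 * p_x)
exact

# Fisher's exact test for a (hypothetical) taster who gets all 8 cups right
cups <- matrix(c(4, 0, 0, 4), nrow = 2,
               dimnames = list(guess = c("tea first", "milk first"),
                               truth = c("tea first", "milk first")))
fisher.test(cups, alternative = "greater")$p.value
```

The `percent` column should sit close to the simulated `table(number_correct) / 10000` values, since a million trials leave little simulation error.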

Two statistically independent continuous variables

library(tidyverse)
set.seed(98735)
tibble(tail_length = rnorm(n = 50),
brain_weight = rnorm(n = 50)) ->
my_data; my_data
ggplot(data = my_data) +
aes(x = tail_length) +
aes(y = brain_weight) +
geom_point(color = "steelblue",
alpha = .8)
cor.test(x = my_data$tail_length,
y = my_data$brain_weight)
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -1.64 -0.530
2 -0.107 -0.744
3 0.185 0.476
4 1.36 1.32
5 0.210 -0.0852
6 -0.658 -1.36
7 1.53 -1.18
8 1.40 -0.867
9 1.54 0.115
10 0.131 0.960
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.14569, df = 48, p-value = 0.8848
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2976302 0.2588382
sample estimates:
cor
-0.02102428
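The `t` and `p-value` that `cor.test()` reports follow from the sample correlation alone. A sketch in base R (plain vectors instead of a tibble) using the standard formula t = r * sqrt((n - 2) / (1 - r^2)), which under independence (and bivariate normality) follows a t distribution with n - 2 degrees of freedom. The manual and built-in values should agree.

```r
# Reproduce cor.test()'s t statistic and two-sided p-value by hand
set.seed(98735)
tail_length  <- rnorm(n = 50)
brain_weight <- rnorm(n = 50)

n <- length(tail_length)
r <- cor(tail_length, brain_weight)
t_manual <- r * sqrt((n - 2) / (1 - r^2))      # test statistic
p_manual <- 2 * pt(-abs(t_manual), df = n - 2) # two-sided p-value

ct <- cor.test(tail_length, brain_weight)
c(t_manual = t_manual, t_cor_test = unname(ct$statistic))
c(p_manual = p_manual, p_cor_test = ct$p.value)
```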
library(tidyverse)
# no set.seed() here: every run draws a fresh, independent sample
tibble(tail_length = rnorm(n = 50),
       brain_weight = rnorm(n = 50)) ->
  my_data; my_data

# run the test once and keep its p-value
p_value <- cor.test(my_data$tail_length, my_data$brain_weight)$p.value

# thick red panel border when p < .05, thin black border otherwise
ggplot(data = my_data) +
  aes(x = tail_length) +
  aes(y = brain_weight) +
  geom_point(color = "steelblue",
             alpha = .8) +
  theme(panel.background =
          element_rect(color = ifelse(p_value < .05, "red", "black"),
                       linewidth = ifelse(p_value < .05, 8, 3)))  # `size` before ggplot2 3.4.0
cor.test(x = my_data$tail_length,
         y = my_data$brain_weight)
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 1.14 0.271
2 0.252 0.359
3 1.27 0.704
4 -0.332 -3.60
5 -0.555 0.0577
6 0.632 -0.812
7 0.202 -0.206
8 -0.819 -0.866
9 1.08 0.319
10 0.403 -1.62
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 0.15018, df = 48, p-value = 0.8812
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2582334 0.2982208
sample estimates:
cor
0.02167209
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 0.555 0.689
2 1.37 0.408
3 -1.20 0.0598
4 -0.824 0.0650
5 0.398 0.109
6 -0.713 -0.918
7 0.267 0.293
8 0.0750 0.308
9 0.231 -0.183
10 -0.837 0.868
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.53715, df = 48, p-value = 0.5936
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3481556 0.2054699
sample estimates:
cor
-0.07729874
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -1.93 0.591
2 1.37 1.60
3 -0.728 0.625
4 -0.0645 -0.812
5 0.297 -0.337
6 -0.912 -1.63
7 -1.13 -1.09
8 -2.04 -0.756
9 -1.32 1.28
10 0.409 2.17
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 1.073, df = 48, p-value = 0.2886
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1308697 0.4137734
sample estimates:
cor
0.1530533
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.231 0.291
2 -0.575 0.113
3 0.251 -1.28
4 -1.38 -3.90
5 0.129 0.405
6 0.677 -1.29
7 -0.532 -1.13
8 -0.699 -0.00250
9 -1.51 -0.845
10 0.992 -0.560
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 2.5832, df = 48, p-value = 0.01289
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.07865544 0.57207274
sample estimates:
cor
0.3493545
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -1.25 0.429
2 0.754 0.143
3 -0.0323 -0.173
4 0.353 -0.931
5 0.872 0.226
6 -0.963 0.293
7 0.178 0.890
8 0.694 -2.32
9 -0.717 0.470
10 0.866 -1.29
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 0.3823, df = 48, p-value = 0.7039
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2267286 0.3284075
sample estimates:
cor
0.05509622
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.390 0.383
2 -0.577 2.63
3 0.639 0.0420
4 -0.921 0.0238
5 -0.195 1.01
6 0.934 0.209
7 -1.55 0.261
8 -1.05 1.04
9 1.32 0.478
10 -1.74 -1.21
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.83665, df = 48, p-value = 0.4069
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3853765 0.1639290
sample estimates:
cor
-0.1198892
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -1.29 1.83
2 -0.538 -0.658
3 -0.439 0.410
4 0.884 0.383
5 -0.826 0.593
6 -0.645 -0.643
7 -0.277 0.526
8 1.33 -0.0301
9 -1.26 2.05
10 -0.329 -1.18
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.43527, df = 48, p-value = 0.6653
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3351996 0.2194761
sample estimates:
cor
-0.06270212
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 1.73 -0.403
2 0.769 0.460
3 0.625 -1.42
4 0.251 0.940
5 -0.139 -0.347
6 1.85 -1.04
7 -0.538 0.448
8 1.14 0.727
9 -0.00658 -2.39
10 0.226 0.783
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.95076, df = 48, p-value = 0.3465
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3991969 0.1479922
sample estimates:
cor
-0.135956
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -1.36 0.237
2 -0.674 -1.29
3 0.820 0.439
4 0.126 -1.36
5 0.531 -0.706
6 -0.255 -0.168
7 0.620 -0.152
8 0.0415 -0.0826
9 0.977 -1.27
10 1.88 1.38
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.038026, df = 48, p-value = 0.9698
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2834033 0.2732767
sample estimates:
cor
-0.005488532
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.785 0.111
2 0.674 1.56
3 0.0216 0.976
4 0.223 1.04
5 -0.575 -0.0359
6 0.796 -1.27
7 -2.24 -0.921
8 0.817 1.73
9 -2.11 2.84
10 0.453 -1.24
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 0.53597, df = 48, p-value = 0.5945
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2056319 0.3480069
sample estimates:
cor
0.07713057
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 0.631 -0.544
2 0.236 -1.47
3 -0.583 1.90
4 -0.148 -0.00957
5 -0.616 1.11
6 1.08 -1.39
7 0.325 0.557
8 -1.24 0.323
9 -0.0400 -0.637
10 -0.726 1.64
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -1.9265, df = 48, p-value = 0.05998
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.50834031 0.01129057
sample estimates:
cor
-0.2678991
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.470 -0.470
2 -1.46 -0.600
3 -0.134 -0.0555
4 -1.39 0.183
5 0.874 -1.34
6 -1.12 -1.46
7 -0.208 1.53
8 -1.01 -0.175
9 -0.209 0.402
10 -0.222 -0.242
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 1.3943, df = 48, p-value = 0.1697
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.08576768 0.45087766
sample estimates:
cor
0.19729
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -1.55 1.33
2 0.320 0.882
3 -0.0949 0.483
4 0.756 -1.69
5 0.804 -0.390
6 -0.251 1.00
7 -1.86 0.358
8 1.06 -0.00587
9 0.912 -0.265
10 -2.25 -0.117
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 0.52319, df = 48, p-value = 0.6032
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2073927 0.3463893
sample estimates:
cor
0.07530195
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.491 0.554
2 -0.351 -0.724
3 0.995 -0.559
4 -1.80 0.949
5 -0.798 0.337
6 0.582 -0.0307
7 -0.415 -0.574
8 -1.07 1.24
9 1.13 -0.163
10 -0.788 0.201
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 1.5728, df = 48, p-value = 0.1223
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.06071377 0.47071754
sample estimates:
cor
0.2213751
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.745 0.0542
2 -0.625 -0.667
3 0.343 -0.405
4 -0.672 1.08
5 0.631 -0.909
6 -0.312 -0.718
7 0.796 -0.437
8 0.712 0.141
9 -1.28 0.238
10 0.181 0.0345
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.43976, df = 48, p-value = 0.6621
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3357731 0.2188609
sample estimates:
cor
-0.06334581
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -1.24 0.123
2 -1.61 0.382
3 0.511 0.507
4 -0.593 0.637
5 -1.15 -0.474
6 0.228 1.22
7 -0.232 -0.509
8 -0.102 -0.244
9 0.367 -1.80
10 -0.300 0.101
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = 0.25132, df = 48, p-value = 0.8026
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2445639 0.3114566
sample estimates:
cor
0.03625162
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 1.33 1.17
2 1.09 0.724
3 -0.651 -1.40
4 -1.05 1.61
5 -0.334 -1.50
6 -0.216 -2.62
7 0.839 -0.204
8 -1.34 2.19
9 -0.578 0.490
10 -1.42 -0.466
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.22808, df = 48, p-value = 0.8206
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3084259 0.2477135
sample estimates:
cor
-0.03290287
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -2.18 0.751
2 -0.254 0.185
3 -0.484 0.471
4 0.742 -0.597
5 1.45 0.672
6 1.56 1.18
7 1.34 2.17
8 2.67 -0.410
9 0.877 0.266
10 0.0153 1.09
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.50468, df = 48, p-value = 0.6161
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3440425 0.2099411
sample estimates:
cor
-0.07265216
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.145 -0.873
2 2.29 0.622
3 -1.47 -0.162
4 -0.285 -1.73
5 -0.598 -0.736
6 0.782 0.662
7 0.102 -0.312
8 0.970 -1.42
9 0.422 0.370
10 -1.24 2.63
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -0.78761, df = 48, p-value = 0.4348
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3793744 0.1707621
sample estimates:
cor
-0.1129545
# A tibble: 50 x 2
tail_length brain_weight
<dbl> <dbl>
1 -0.0717 -0.736
2 0.724 0.141
3 -0.562 -1.57
4 -0.189 0.808
5 0.957 -0.553
6 -0.489 -0.244
7 1.09 0.141
8 0.437 -0.872
9 -0.0162 0.0422
10 -2.04 -0.420
# … with 40 more rows

Pearson's product-moment correlation
data: my_data$tail_length and my_data$brain_weight
t = -1.4716, df = 48, p-value = 0.1477
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.45954213 0.07490899
sample estimates:
cor
-0.2077709
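Every draw above is generated with no relationship at all, yet one of them (p = 0.01289) crosses the .05 line: a false positive. A sketch in base R that repeats the same draw many times and records how often `cor.test()` returns p < .05; under independence the share should come out near 5%. The number of simulations (2000) and the seed are arbitrary choices made here.

```r
# False-positive rate of cor.test() when the null (independence) is true
set.seed(2019)   # arbitrary seed, for reproducibility
n_sims <- 2000
p_values <- replicate(n_sims, {
  tail_length  <- rnorm(n = 50)   # independent by construction
  brain_weight <- rnorm(n = 50)
  cor.test(tail_length, brain_weight)$p.value
})

false_positive_rate <- mean(p_values < .05)
false_positive_rate
```

Under the null, p-values are spread evenly between 0 and 1, which is why the share below .05 settles near .05; `hist(p_values)` shows the flat shape.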
# simulate 50 people; height is generated independently of sex
tibble(sex = sample(x = c("male", "female"),
                    size = 50,
                    replace = TRUE)) %>%
  mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
  height_data; height_data

# visualization
ggplot(height_data) +
  aes(x = sex) +
  aes(y = height) +
  aes(group = sex) +
  geom_boxplot() +
  geom_jitter(height = 0,
              width = .02) +
  stat_summary(fun = mean,  # `fun.y` before ggplot2 3.3.0
               geom = "point",
               col = "goldenrod3",
               size = 5)
# statistical test
t.test(height_data$height ~  # the continuous variable, by (~)
         height_data$sex)    # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 6.05
2 female 5.76
3 female 5.18
4 female 4.40
5 female 4.66
6 male 6.07
7 female 6.66
8 female 5.78
9 female 6.39
10 male 5.47
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = 0.17273, df = 46.923, p-value = 0.8636
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.277471 0.329592
sample estimates:
mean in group female mean in group male
5.569354 5.543294
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 4.80
2 male 5.75
3 male 5.35
4 female 5.43
5 male 5.07
6 male 5.88
7 male 5.00
8 female 4.07
9 male 5.83
10 female 6.18
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.92503, df = 47.468, p-value = 0.3596
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4019456 0.1486889
sample estimates:
mean in group female mean in group male
5.414265 5.540893
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.00
2 female 5.09
3 female 5.05
4 female 5.11
5 male 4.95
6 male 5.83
7 female 5.28
8 female 5.16
9 male 5.42
10 male 5.52
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.69666, df = 46.762, p-value = 0.4895
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3692719 0.1793219
sample estimates:
mean in group female mean in group male
5.399995 5.494970
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 6.40
2 male 5.95
3 male 6.93
4 female 5.58
5 female 5.68
6 female 5.46
7 female 5.37
8 male 5.43
9 male 5.86
10 female 5.05
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -1.4178, df = 35.013, p-value = 0.1651
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.48788008 0.08663595
sample estimates:
mean in group female mean in group male
5.428950 5.629572
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.04
2 female 5.91
3 male 5.05
4 female 6.51
5 male 5.42
6 male 5.58
7 female 5.67
8 female 5.10
9 male 5.68
10 female 5.52
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = 2.6639, df = 40.441, p-value = 0.01104
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.08008769 0.58298641
sample estimates:
mean in group female mean in group male
5.671535 5.339998
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 male 5.84
2 male 6.31
3 female 5.65
4 male 5.29
5 male 5.80
6 male 5.84
7 female 5.72
8 male 5.46
9 female 5.50
10 male 5.72
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -1.0138, df = 47.933, p-value = 0.3158
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4228784 0.1393819
sample estimates:
mean in group female mean in group male
5.469518 5.611266
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 male 5.01
2 female 5.68
3 female 5.44
4 female 5.10
5 female 5.81
6 male 5.19
7 male 5.12
8 male 6.21
9 male 5.56
10 male 5.26
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -1.0251, df = 47.828, p-value = 0.3105
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4501601 0.1461517
sample estimates:
mean in group female mean in group male
5.567211 5.719215
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.66
2 female 5.72
3 female 5.63
4 female 5.48
5 male 5.47
6 male 5.89
7 male 6.00
8 female 5.57
9 male 5.41
10 male 4.77
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -1.58, df = 47.842, p-value = 0.1207
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.55526540 0.06662354
sample estimates:
mean in group female mean in group male
5.347316 5.591637
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 4.23
2 male 5.99
3 male 5.43
4 female 5.40
5 male 6.16
6 male 5.97
7 female 5.78
8 female 6.03
9 female 5.52
10 female 5.15
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.80309, df = 41.183, p-value = 0.4265
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4026129 0.1734895
sample estimates:
mean in group female mean in group male
5.387418 5.501980
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 6.20
2 male 5.04
3 female 5.22
4 female 5.69
5 female 5.97
6 female 5.64
7 male 5.48
8 female 5.64
9 male 5.33
10 female 5.83
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.19366, df = 47.782, p-value = 0.8473
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3081422 0.2540043
sample estimates:
mean in group female mean in group male
5.500872 5.527941
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.02
2 male 6.63
3 male 5.35
4 male 5.46
5 male 5.53
6 female 5.55
7 male 6.33
8 female 5.02
9 male 5.49
10 male 6.09
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -2.0574, df = 47.703, p-value = 0.04513
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.569594465 -0.006505921
sample estimates:
mean in group female mean in group male
5.491601 5.779651
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.47
2 female 5.71
3 male 6.34
4 male 5.81
5 male 4.99
6 female 4.89
7 female 6.02
8 male 5.77
9 male 5.47
10 male 5.42
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.025281, df = 48, p-value = 0.9799
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2517547 0.2455024
sample estimates:
mean in group female mean in group male
5.495759 5.498885
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.76
2 female 4.89
3 female 5.64
4 male 5.47
5 female 6.03
6 male 6.47
7 female 5.97
8 female 5.50
9 male 6.20
10 male 5.62
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.81926, df = 35.107, p-value = 0.4182
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3814890 0.1620987
sample estimates:
mean in group female mean in group male
5.550585 5.660281
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 male 5.74
2 female 4.58
3 male 5.47
4 male 5.26
5 male 5.01
6 male 4.95
7 female 5.86
8 male 5.98
9 female 5.51
10 male 6.12
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.39696, df = 47.977, p-value = 0.6932
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3193852 0.2140676
sample estimates:
mean in group female mean in group male
5.374303 5.426962
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 male 4.97
2 male 5.02
3 female 4.54
4 female 5.16
5 female 4.75
6 female 5.71
7 male 5.69
8 male 6.76
9 male 5.10
10 male 5.73
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -0.28826, df = 39.644, p-value = 0.7747
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3477257 0.2609390
sample estimates:
mean in group female mean in group male
5.460529 5.503923
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.11
2 male 6.06
3 female 5.21
4 female 4.66
5 male 5.61
6 male 5.05
7 male 5.36
8 male 5.47
9 male 5.43
10 male 5.42
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = 0.36447, df = 47.974, p-value = 0.7171
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2070206 0.2986901
sample estimates:
mean in group female mean in group male
5.447395 5.401561
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 male 6.16
2 female 5.33
3 male 6.05
4 female 5.52
5 female 5.36
6 male 4.47
7 male 5.40
8 female 5.78
9 female 5.38
10 male 4.96
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = 0.43687, df = 37.401, p-value = 0.6647
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2420049 0.3751106
sample estimates:
mean in group female mean in group male
5.467101 5.400549
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 male 5.36
2 male 4.54
3 male 5.56
4 male 5.19
5 female 5.13
6 male 5.08
7 male 6.45
8 male 4.87
9 male 4.93
10 female 5.59
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = 0.54448, df = 36.519, p-value = 0.5894
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2020496 0.3504542
sample estimates:
mean in group female mean in group male
5.442526 5.368323
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 male 5.44
2 male 5.01
3 female 5.36
4 female 5.22
5 female 5.10
6 female 5.04
7 female 6.61
8 male 6.52
9 female 5.36
10 female 5.35
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -1.5886, df = 43.93, p-value = 0.1193
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.51175262 0.06060527
sample estimates:
mean in group female mean in group male
5.504991 5.730564
13 / 20
tibble(sex = sample(x = c("male","female"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50, sd = .5) + 5.5) ->
height_data; height_data
# visualization
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
t.test(height_data$height ~ # the continuous variable by (~)
height_data$sex) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 female 5.18
2 female 5.06
3 male 5.97
4 male 4.80
5 female 5.91
6 male 5.65
7 male 4.61
8 female 5.30
9 female 5.17
10 female 5.36
# … with 40 more rows

Welch Two Sample t-test
data: height_data$height by height_data$sex
t = -1.6498, df = 45.988, p-value = 0.1058
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.45846363 0.04544978
sample estimates:
mean in group female mean in group male
5.348123 5.554630
13 / 20
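Each slide above is one draw from a world where sex and height are independent. Even so, one of them (p = 0.04513) clears the 0.05 bar. The sketch below automates the repetition, assuming an arbitrary seed and 2,000 simulated datasets (both illustrative choices). Under the null, the share of p-values below 0.05 should land close to 0.05, and every one of those "significant" results is a false positive.

```r
# Sketch: repeat the slide's null simulation many times and count how often
# the Welch t-test gives p < 0.05 although sex and height are independent.
set.seed(2019)   # arbitrary seed, for reproducibility only
n_sims <- 2000   # number of simulated datasets (an arbitrary choice)

p_values <- replicate(n_sims, {
  sex    <- sample(x = c("male", "female"), size = 50, replace = TRUE)
  height <- rnorm(n = 50, sd = .5) + 5.5   # same distribution for both groups
  t.test(height ~ sex)$p.value
})

# share of "statistically significant" results: all of them are false positives
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```

Plotting `hist(p_values)` should show a roughly flat distribution, since p-values are approximately uniform when the null is true.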

ANOVA (analysis of variance)

14 / 20
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 2nd 0.0646
2 3rd -0.280
3 3rd 1.01
4 2nd 1.21
5 3rd 0.358
6 1st 0.483
7 2nd 0.420
8 Crew 0.739
9 3rd -1.01
10 3rd 0.688
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.1690278 -1.6428611 1.3048055 0.9899535
3rd-1st 0.1520384 -1.3724568 1.6765336 0.9933337
Crew-1st -0.2781325 -1.7785234 1.2222585 0.9600103
3rd-2nd 0.3210662 -0.6493905 1.2915229 0.8142269
Crew-2nd -0.1091047 -1.0412387 0.8230293 0.9893361
Crew-3rd -0.4301709 -1.4405049 0.5801631 0.6701215
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew 0.184
2 2nd -0.549
3 Crew 1.10
4 1st -1.08
5 Crew -0.236
6 3rd 0.260
7 2nd 0.336
8 Crew 1.44
9 2nd 0.860
10 1st -1.20
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.4340208 -0.6991114 1.5671529 0.7380964
3rd-1st 0.9033191 -0.1924056 1.9990438 0.1391734
Crew-1st 0.6144885 -0.4812362 1.7102132 0.4488395
3rd-2nd 0.4692983 -0.5718003 1.5103970 0.6290138
Crew-2nd 0.1804677 -0.8606310 1.2215664 0.9669081
Crew-3rd -0.2888306 -1.2890859 0.7114246 0.8676859
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew -1.79
2 Crew -0.276
3 2nd 0.341
4 3rd 1.48
5 3rd -0.942
6 3rd -2.00
7 2nd -0.0154
8 2nd 0.243
9 Crew 0.127
10 Crew 0.291
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.02417633 -1.087864 1.1362170 0.9999291
3rd-1st -0.21625886 -1.373707 0.9411898 0.9591147
Crew-1st -0.61673597 -1.834915 0.6014435 0.5368909
3rd-2nd -0.24043519 -1.397884 0.9170134 0.9450743
Crew-2nd -0.64091230 -1.859092 0.5772672 0.5043507
Crew-3rd -0.40047711 -1.660245 0.8592904 0.8315205
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 3rd 1.00
2 3rd 0.506
3 2nd 0.321
4 3rd -1.49
5 Crew 0.676
6 Crew -0.890
7 3rd -0.247
8 2nd 0.278
9 3rd 0.299
10 2nd 0.243
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.31775146 -1.0543358 1.689839 0.9260481
3rd-1st 0.24293466 -0.9923450 1.478214 0.9528147
Crew-1st 0.51033616 -0.8617511 1.882423 0.7550174
3rd-2nd -0.07481679 -1.2313865 1.081753 0.9981509
Crew-2nd 0.19258470 -1.1090916 1.494261 0.9789518
Crew-3rd 0.26740149 -0.8891682 1.423971 0.9263765
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 1st -0.508
2 3rd -0.794
3 2nd 0.469
4 2nd -1.14
5 Crew 0.975
6 2nd 0.762
7 3rd -1.69
8 Crew 0.0958
9 Crew -1.93
10 3rd 0.190
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.06671792 -1.0922126 1.2256484 0.9986935
3rd-1st -0.08218738 -1.2549426 1.0905678 0.9976538
Crew-1st 0.46171866 -0.8162617 1.7396990 0.7709304
3rd-2nd -0.14890529 -1.1485458 0.8507352 0.9785367
Crew-2nd 0.39500074 -0.7262308 1.5162323 0.7840762
Crew-3rd 0.54390603 -0.5916093 1.6794214 0.5821262
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 2nd -0.569
2 2nd 0.361
3 1st -0.292
4 Crew -0.483
5 2nd -0.160
6 3rd 1.58
7 2nd 0.301
8 3rd 0.224
9 2nd -0.427
10 2nd -0.445
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.4976309 -1.4783810 0.4831193 0.5350414
3rd-1st -0.3340012 -1.3719295 0.7039272 0.8263871
Crew-1st -0.1880390 -1.2087153 0.8326373 0.9606999
3rd-2nd 0.1636297 -0.7352426 1.0625020 0.9620011
Crew-2nd 0.3095919 -0.5693030 1.1884868 0.7841348
Crew-3rd 0.1459622 -0.7963121 1.0882364 0.9759842
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew -0.0726
2 3rd -1.14
3 1st 0.540
4 3rd 0.0692
5 1st -0.481
6 3rd 1.32
7 2nd 0.153
8 3rd 0.197
9 2nd 0.120
10 2nd -0.0825
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.73135897 -1.5974410 0.1347231 0.1249395
3rd-1st -0.28882958 -1.2317003 0.6540411 0.8463537
Crew-1st -0.03505996 -0.8660633 0.7959434 0.9994835
3rd-2nd 0.44252939 -0.5435484 1.4286072 0.6323153
Crew-2nd 0.69629901 -0.1834230 1.5760210 0.1652531
Crew-3rd 0.25376962 -0.7016454 1.2091846 0.8934204
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew -0.264
2 2nd -0.280
3 1st -0.191
4 3rd -0.154
5 3rd -0.902
6 3rd -1.61
7 3rd -1.01
8 2nd 0.315
9 3rd 0.762
10 1st 0.612
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.16275305 -0.9710578 1.2965639 0.9807077
3rd-1st -0.65568092 -1.7682972 0.4569353 0.4050145
Crew-1st -0.02476586 -1.2824543 1.2329226 0.9999473
3rd-2nd -0.81843397 -1.9522448 0.3153769 0.2323237
Crew-2nd -0.18751891 -1.4639954 1.0889576 0.9793780
Crew-3rd 0.63091506 -0.6267734 1.8886035 0.5445379
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew -0.178
2 1st 0.332
3 3rd -0.350
4 1st -0.0232
5 1st -0.731
6 Crew -1.50
7 2nd -1.15
8 3rd 0.671
9 1st -1.18
10 3rd 0.281
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.08690935 -0.9197195 1.0935382 0.9956444
3rd-1st 0.11918501 -0.8874438 1.1258139 0.9889722
Crew-1st -0.02914881 -1.1112423 1.0529447 0.9998652
3rd-2nd 0.03227565 -0.9928250 1.0573763 0.9997849
Crew-2nd -0.11605816 -1.2153561 0.9832398 0.9921183
Crew-3rd -0.14833382 -1.2476318 0.9509642 0.9838674
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew -0.709
2 3rd 1.27
3 Crew -1.57
4 2nd -0.763
5 1st -0.0208
6 1st -0.784
7 3rd 0.644
8 3rd 1.23
9 2nd 0.224
10 Crew 0.623
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.72308614 -1.792492 0.3463200 0.2853879
3rd-1st -0.64836299 -1.780116 0.4833901 0.4300447
Crew-1st -0.88161058 -1.951017 0.1877955 0.1391813
3rd-2nd 0.07472315 -0.994683 1.1441293 0.9976743
Crew-2nd -0.15852444 -1.161716 0.8446674 0.9745747
Crew-3rd -0.23324759 -1.302654 0.8361585 0.9371878
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew -1.17
2 1st 2.08
3 3rd -0.628
4 1st -0.872
5 3rd 0.537
6 1st -2.35
7 3rd 1.02
8 2nd -1.97
9 Crew 1.31
10 1st 0.931
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.08727205 -1.1507634 0.9762194 0.9962523
3rd-1st -0.16521339 -1.2287048 0.8982780 0.9757855
Crew-1st 0.17966865 -0.8179747 1.1773120 0.9631347
3rd-2nd -0.07794134 -1.2034348 1.0475522 0.9977354
Crew-2nd 0.26694070 -0.7965507 1.3304321 0.9082050
Crew-3rd 0.34488204 -0.7186094 1.4083734 0.8230653
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew 0.433
2 2nd -0.0234
3 2nd -1.69
4 1st 0.0612
5 1st 0.266
6 1st -0.403
7 3rd -0.762
8 Crew -1.09
9 3rd -0.0799
10 1st -0.412
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.49469057 -1.2611645 0.2717834 0.3250737
3rd-1st -0.03844713 -0.9414338 0.8645395 0.9994692
Crew-1st -0.15392662 -0.9355000 0.6276467 0.9526267
3rd-2nd 0.45624344 -0.4578916 1.3703785 0.5487530
Crew-2nd 0.34076396 -0.4536635 1.1351914 0.6649455
Crew-3rd -0.11547948 -1.0423115 0.8113525 0.9872030
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 2nd -0.793
2 3rd 0.574
3 1st 0.479
4 3rd -1.03
5 Crew 0.484
6 2nd 0.562
7 1st -0.0160
8 3rd 2.51
9 1st 0.0715
10 3rd 1.17
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.06107997 -1.0764645 1.1986244 0.9989386
3rd-1st -0.06861325 -1.1088647 0.9716382 0.9980408
Crew-1st 0.49735172 -0.6027324 1.5974358 0.6268092
3rd-2nd -0.12969321 -1.0666063 0.8072199 0.9826290
Crew-2nd 0.43627175 -0.5666580 1.4392015 0.6550294
Crew-3rd 0.56596497 -0.3250927 1.4570227 0.3389832
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew 0.689
2 Crew 1.05
3 2nd 0.769
4 1st 0.302
5 3rd 1.17
6 Crew 0.616
7 3rd 0.435
8 2nd -0.143
9 1st 1.10
10 3rd 1.72
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.2637570 -1.326638 0.7991240 0.9109553
3rd-1st 0.1261369 -1.010699 1.2629724 0.9908810
Crew-1st -0.4605313 -1.488838 0.5677751 0.6338422
3rd-2nd 0.3898939 -0.726890 1.5066778 0.7886407
Crew-2nd -0.1967743 -1.202868 0.8093198 0.9535344
Crew-3rd -0.5866682 -1.670598 0.4972619 0.4799025
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 2nd 1.03
2 1st -0.306
3 3rd 0.939
4 3rd 1.04
5 2nd -0.974
6 1st 0.0136
7 Crew 0.117
8 1st 1.13
9 Crew -0.444
10 1st 0.0134
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.5442886 -0.4866955 1.5752728 0.5014093
3rd-1st 0.1250536 -0.7715174 1.0216246 0.9822438
Crew-1st 0.3688955 -0.8731682 1.6109593 0.8578628
3rd-2nd -0.4192350 -1.3639132 0.5254432 0.6405627
Crew-2nd -0.1753931 -1.4526166 1.1018303 0.9830263
Crew-3rd 0.2438419 -0.9275722 1.4152560 0.9447585
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 2nd -0.868
2 1st -1.10
3 Crew -0.887
4 3rd -0.274
5 2nd -2.02
6 1st -1.20
7 3rd -0.0952
8 Crew 1.77
9 2nd 0.518
10 2nd 0.373
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.71301214 -1.6919907 0.2659665 0.2253828
3rd-1st -0.39014505 -1.3290118 0.5487217 0.6867008
Crew-1st -0.01296422 -0.9355957 0.9096673 0.9999808
3rd-2nd 0.32286709 -0.6379348 1.2836690 0.8070837
Crew-2nd 0.70004792 -0.2448957 1.6449915 0.2123895
Crew-3rd 0.37718083 -0.5261409 1.2805025 0.6834704
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 1st 1.24
2 3rd -2.13
3 2nd 1.14
4 Crew -0.408
5 3rd -0.528
6 3rd 0.415
7 1st -0.142
8 Crew -0.286
9 1st 0.114
10 1st -1.34
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.0832625535 -1.0657594 0.8992343 0.9958776
3rd-1st -0.4965960818 -1.4790929 0.4859007 0.5382667
Crew-1st 0.0006832311 -0.9460744 0.9474408 1.0000000
3rd-2nd -0.4133335283 -1.3958303 0.5691633 0.6783714
Crew-2nd 0.0839457846 -0.8628118 1.0307034 0.9952876
Crew-3rd 0.4972793128 -0.4494783 1.4440369 0.5057813
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 Crew 0.610
2 Crew 0.110
3 1st -1.97
4 1st 0.0303
5 3rd -0.793
6 2nd -1.08
7 Crew -0.413
8 1st -1.92
9 Crew -0.307
10 3rd -0.426
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.1580195 -0.7723181 1.0883571 0.9687642
3rd-1st -0.3578741 -1.3885239 0.6727757 0.7913457
Crew-1st 0.3603245 -0.4811975 1.2018465 0.6661884
3rd-2nd -0.5158935 -1.5076368 0.4758497 0.5140344
Crew-2nd 0.2023051 -0.5910895 0.9956996 0.9042889
Crew-3rd 0.7181986 -0.1907491 1.6271463 0.1664027
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 2nd -0.768
2 1st -0.901
3 Crew -0.449
4 3rd -0.306
5 3rd -1.12
6 2nd -0.625
7 3rd 0.0779
8 Crew 0.0111
9 2nd -0.747
10 2nd -0.825
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st -0.47334284 -1.3710309 0.4243452 0.5024502
3rd-1st -0.29575403 -1.1934421 0.6019340 0.8161022
Crew-1st -0.26589576 -1.1454468 0.6136553 0.8513662
3rd-2nd 0.17758881 -0.7378769 1.0930546 0.9545842
Crew-2nd 0.20744708 -0.6902410 1.1051351 0.9264729
Crew-3rd 0.02985827 -0.8678298 0.9275463 0.9997464
15 / 20
tibble(sex = sample(x = c("1st","2nd","3rd","Crew"),
size = 50,
replace = T)) %>%
mutate(height = rnorm(n = 50)) ->
height_data; height_data
ggplot(height_data) +
aes(x = sex) +
aes(y = height) +
aes(group = sex) +
geom_boxplot() +
geom_jitter(height = 0,
width = .02) +
stat_summary(fun = mean,
geom = "point",
col = "goldenrod3",
size = 5)
# statistical test
TukeyHSD(aov(height_data$height ~ # the continuous variable by (~)
height_data$sex)) # the discrete variable
# A tibble: 50 x 2
sex height
<chr> <dbl>
1 2nd 0.233
2 2nd -0.927
3 1st 0.840
4 1st 0.101
5 1st -0.175
6 3rd -0.687
7 1st -2.54
8 3rd -2.06
9 3rd 0.858
10 2nd 0.351
# … with 40 more rows

Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = height_data$height ~ height_data$sex)
$`height_data$sex`
diff lwr upr p adj
2nd-1st 0.27839893 -0.9749367 1.531735 0.9339733
3rd-1st 0.15182387 -1.0293735 1.333021 0.9859901
Crew-1st 0.19796508 -1.0040174 1.399948 0.9713849
3rd-2nd -0.12657506 -1.3363374 1.083187 0.9923242
Crew-2nd -0.08043385 -1.3104988 1.149631 0.9980906
Crew-3rd 0.04614121 -1.1103348 1.202617 0.9995630
15 / 20
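Every replication above comes out non-significant, which is what we hope for when the heights are pure noise. The sketch below estimates how often Tukey's HSD would still produce a false positive by repeating the same set-up many times in base R. The column is renamed `group` here, and the seed and number of simulations are arbitrary choices. For contrast, it also runs unadjusted pairwise t-tests with `pairwise.t.test(..., p.adjust.method = "none")`, which ignore the fact that six comparisons are being made.

```r
# Sketch (base R): false-positive rate of Tukey's HSD when all groups share one
# distribution. The column name `group`, the seed and n_sims are arbitrary choices.
set.seed(2019)
n_sims <- 1000
groups <- c("1st", "2nd", "3rd", "Crew")

one_sim <- function() {
  dat <- data.frame(
    group  = factor(sample(groups, size = 50, replace = TRUE)),
    height = rnorm(n = 50)  # pure noise: no real group differences
  )
  tukey <- TukeyHSD(aov(height ~ group, data = dat))
  unadjusted <- pairwise.t.test(dat$height, dat$group, p.adjust.method = "none")
  c(tukey      = any(tukey$group[, "p adj"] < 0.05),
    unadjusted = any(unadjusted$p.value < 0.05, na.rm = TRUE))
}

results <- replicate(n_sims, one_sim())  # 2 x n_sims logical matrix
rowMeans(results)  # share of simulations with at least one p < 0.05
```

Under these assumptions, the Tukey share should land near the nominal 5% family-wise level, or a little below it when group sizes are unequal. The unadjusted share should be noticeably higher.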

Chi-squared test

χ²

Two-by-two

16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female right
2 female left
3 male right
4 male right
5 female right
6 female right
7 male right
8 male right
9 female right
10 female left
# … with 90 more rows

y
x left right
female 16 35
male 13 36
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.09797, df = 1, p-value = 0.7543
16 / 20
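To make the printed statistic less of a black box, the sketch below recomputes it by hand from the first contingency table above (female: 16 left, 35 right; male: 13 left, 36 right). It uses only base R. The `yates` value mirrors the rule `chisq.test()` applies to two-by-two tables: the correction is the smaller of 0.5 and the absolute deviations.

```r
# Observed counts copied from the first contingency table above
observed <- matrix(c(16, 13, 35, 36), nrow = 2,
                   dimnames = list(sex = c("female", "male"),
                                   handed = c("left", "right")))

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic without and with Yates' continuity correction
x2_plain <- sum((observed - expected)^2 / expected)
yates    <- min(0.5, abs(observed - expected))  # mirrors chisq.test's 2 x 2 rule
x2_yates <- sum((abs(observed - expected) - yates)^2 / expected)
p_yates  <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

c(x2_plain = x2_plain, x2_yates = x2_yates, p_yates = p_yates)
```

The corrected value should reproduce the `X-squared = 0.09797` and `p-value = 0.7543` printed above. The uncorrected value is what `chisq.test(observed, correct = FALSE)` would report.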
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male right
2 male right
3 female right
4 male right
5 male right
6 male right
7 male right
8 male left
9 male left
10 female right
# … with 90 more rows

y
x left right
female 12 31
male 24 33
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 1.5726, df = 1, p-value = 0.2098
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female right
2 female left
3 male right
4 male right
5 male right
6 male right
7 male right
8 male left
9 male right
10 male left
# … with 90 more rows

y
x left right
female 16 39
male 21 24
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 2.5692, df = 1, p-value = 0.109
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male left
2 male left
3 female right
4 female right
5 female right
6 male left
7 male right
8 male right
9 male left
10 female left
# … with 90 more rows

y
x left right
female 13 40
male 15 32
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.35756, df = 1, p-value = 0.5499
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female right
2 female right
3 male right
4 female right
5 male right
6 male right
7 male right
8 female left
9 male right
10 male right
# … with 90 more rows

y
x left right
female 8 43
male 13 36
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 1.1781, df = 1, p-value = 0.2777
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female left
2 female left
3 female right
4 male left
5 female right
6 female right
7 male right
8 male left
9 male right
10 female left
# … with 90 more rows

y
x left right
female 18 33
male 15 34
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.081245, df = 1, p-value = 0.7756
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male right
2 female right
3 male right
4 female right
5 female left
6 male right
7 male left
8 female left
9 male right
10 female right
# … with 90 more rows

y
x left right
female 10 34
male 16 40
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.18638, df = 1, p-value = 0.6659
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female left
2 female right
3 female left
4 female right
5 female left
6 male left
7 female right
8 female right
9 female left
10 female left
# … with 90 more rows

y
x left right
female 21 33
male 13 33
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.82159, df = 1, p-value = 0.3647
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male left
2 male right
3 female left
4 male left
5 male right
6 female right
7 female right
8 female left
9 male right
10 male right
# … with 90 more rows

y
x left right
female 22 33
male 14 31
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.5068, df = 1, p-value = 0.4765
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female right
2 female right
3 female right
4 male right
5 male left
6 female left
7 male right
8 female right
9 female left
10 female right
# … with 90 more rows

y
x left right
female 16 35
male 20 29
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.60087, df = 1, p-value = 0.4382
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male right
2 male left
3 male left
4 female left
5 male right
6 female left
7 male right
8 female right
9 female left
10 male right
# … with 90 more rows

y
x left right
female 19 35
male 18 28
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.039791, df = 1, p-value = 0.8419
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female right
2 female right
3 female right
4 female left
5 female right
6 female right
7 female right
8 female right
9 female right
10 female left
# … with 90 more rows

y
x left right
female 14 46
male 13 27
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.61094, df = 1, p-value = 0.4344
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male right
2 male right
3 male right
4 male right
5 female left
6 male left
7 male right
8 male right
9 female right
10 female left
# … with 90 more rows

y
x left right
female 15 34
male 15 36
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 1.6836e-30, df = 1, p-value = 1
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male right
2 female right
3 female right
4 female right
5 female right
6 female left
7 male left
8 male right
9 male right
10 female right
# … with 90 more rows

y
x left right
female 19 35
male 14 32
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.084193, df = 1, p-value = 0.7717
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male right
2 female left
3 male right
4 male right
5 male left
6 female right
7 male right
8 female right
9 female left
10 male right
# … with 90 more rows

y
x left right
female 11 35
male 16 38
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.17288, df = 1, p-value = 0.6776
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female right
2 female right
3 female right
4 female left
5 male left
6 male right
7 female right
8 male right
9 male left
10 female right
# … with 90 more rows

y
x left right
female 13 44
male 14 29
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.73942, df = 1, p-value = 0.3898
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male left
2 female right
3 male left
4 female right
5 male right
6 male left
7 female right
8 male right
9 female right
10 female left
# … with 90 more rows

y
x left right
female 9 38
male 22 31
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 4.8243, df = 1, p-value = 0.02806
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 male right
2 female right
3 male left
4 male right
5 male right
6 male right
7 female right
8 female right
9 male right
10 female right
# … with 90 more rows

y
x left right
female 8 33
male 17 42
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.67521, df = 1, p-value = 0.4112
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female left
2 female left
3 male left
4 male right
5 male right
6 male right
7 female left
8 male right
9 male left
10 male left
# … with 90 more rows

y
x left right
female 20 29
male 11 40
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 3.4752, df = 1, p-value = 0.0623
16 / 20
library(tidyverse)
tibble(sex = sample(x = c("male","female"),
size = 100,
replace = T),
handed = sample(x = c("left", "right"),
size = 100,
replace = T,
prob = c(.3,.7))) ->
my_data ; my_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = my_data) +
geom_mosaic(aes(x = product(sex),
fill = handed))
# contingency table or two by two table
table(x = my_data$sex, y = my_data$handed)
chisq.test(x = my_data$sex,
y = my_data$handed)
# A tibble: 100 x 2
sex handed
<chr> <chr>
1 female right
2 male right
3 female right
4 male right
5 male right
6 female right
7 female right
8 male right
9 female left
10 female right
# … with 90 more rows

y
x left right
female 13 41
male 9 37
Pearson's Chi-squared test with Yates' continuity correction
data: my_data$sex and my_data$handed
X-squared = 0.090181, df = 1, p-value = 0.7639
16 / 20
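In each of the twenty replications above, sex and handedness were generated independently. Even so, one table (X-squared = 4.8243, p-value = 0.02806) crossed the conventional 0.05 line: a false positive. The sketch below gives a rough estimate of how often that happens. It assumes the same data-generating process and an arbitrary seed, simulates many tables, and records both the default (Yates-corrected) and the uncorrected p-value for each one.

```r
# Sketch (base R): how often does chisq.test() reject when sex and handedness
# are generated independently, as in the replications above?
# The seed and n_sims are arbitrary choices.
set.seed(2019)
n_sims <- 2000

one_sim <- function() {
  sex    <- sample(c("male", "female"), size = 100, replace = TRUE)
  handed <- sample(c("left", "right"), size = 100, replace = TRUE, prob = c(0.3, 0.7))
  tab <- table(sex, handed)
  c(yates = chisq.test(tab)$p.value,                  # R's default for 2 x 2 tables
    plain = chisq.test(tab, correct = FALSE)$p.value) # no continuity correction
}

p_values <- replicate(n_sims, one_sim())  # 2 x n_sims matrix of p-values
rowMeans(p_values < 0.05)                 # false-positive rate at alpha = 0.05
```

The correction can only shrink the statistic, so the Yates rate should never exceed the uncorrected one. The uncorrected rate should sit near 5%, and the corrected rate at or below it.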
17 / 20
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk milk
2 milk milk
3 milk tea
4 tea tea
5 milk tea
6 tea tea
7 tea milk
8 tea milk

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
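Why does this test keep printing exactly `X-squared = 0` or `X-squared = 0.5`? The design has eight cups, four of each kind, and the taster labels exactly four of them as milk first. That means the whole table is determined by how many milk-first cups are labelled correctly. Every expected count is 2, so R also warns that the chi-squared approximation may be incorrect. The base-R sketch below enumerates the five possible tables; `n_correct` is a name introduced here for illustration. `dhyper()` gives the chance of each table when the taster is guessing at random.

```r
# Sketch (base R): every possible tea-tasting table when 4 of 8 cups are milk first
# and the taster labels exactly 4 cups "milk". `n_correct` is an illustrative name.
n_correct <- 0:4  # milk-first cups correctly labelled "milk"

x2 <- sapply(n_correct, function(k) {
  tab <- matrix(c(k, 4 - k, 4 - k, k), nrow = 2,
                dimnames = list(truth = c("milk", "tea"), guess = c("milk", "tea")))
  # every expected count is 2, so R warns that the approximation may be incorrect
  unname(suppressWarnings(chisq.test(tab))$statistic)
})

# probability of each table if the taster is guessing at random
prob <- dhyper(n_correct, m = 4, n = 4, k = 4)

data.frame(n_correct, x2, prob_times_70 = prob * 70)
```

Only three values of the statistic are possible (0, 0.5 and 4.5). The most likely value, 0, arises 36 times in 70, which is why so many of the replications show `p-value = 1`.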
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea milk
2 tea tea
3 milk milk
4 milk milk
5 milk tea
6 milk milk
7 tea tea
8 tea tea

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea tea
2 tea milk
3 tea milk
4 milk milk
5 tea tea
6 milk tea
7 milk tea
8 milk milk

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk tea
2 tea tea
3 milk milk
4 tea milk
5 milk tea
6 tea milk
7 tea tea
8 milk milk

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk milk
2 milk tea
3 milk milk
4 tea milk
5 tea tea
6 tea milk
7 tea tea
8 milk tea

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk milk
2 milk tea
3 tea milk
4 milk tea
5 tea tea
6 tea milk
7 milk milk
8 tea tea

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea tea
2 milk tea
3 tea tea
4 tea milk
5 milk milk
6 milk milk
7 tea tea
8 milk milk

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea tea
2 milk tea
3 milk tea
4 tea milk
5 milk milk
6 tea milk
7 milk tea
8 tea milk

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea tea
2 milk milk
3 milk tea
4 tea milk
5 tea tea
6 tea tea
7 milk milk
8 milk milk

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk milk
2 milk tea
3 tea tea
4 milk milk
5 milk milk
6 tea tea
7 tea tea
8 tea milk

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
library(ggmosaic) # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea milk
2 milk tea
3 tea tea
4 milk milk
5 milk tea
6 milk milk
7 tea milk
8 tea tea

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea tea
2 tea milk
3 milk milk
4 milk tea
5 milk milk
6 milk milk
7 tea tea
8 tea tea

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk tea
2 tea tea
3 tea milk
4 tea tea
5 milk milk
6 milk milk
7 tea tea
8 milk milk

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk tea
2 tea milk
3 milk tea
4 tea tea
5 milk milk
6 milk tea
7 tea milk
8 tea milk

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk tea
2 milk milk
3 milk tea
4 tea milk
5 tea milk
6 milk milk
7 tea tea
8 tea tea

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk milk
2 tea milk
3 milk milk
4 tea milk
5 tea tea
6 milk tea
7 milk tea
8 tea tea

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk tea
2 tea tea
3 tea milk
4 tea tea
5 tea milk
6 milk milk
7 milk tea
8 milk milk

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea milk
2 milk tea
3 tea tea
4 milk milk
5 tea milk
6 milk milk
7 milk tea
8 tea tea

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 milk tea
2 tea milk
3 tea milk
4 milk tea
5 tea tea
6 tea milk
7 milk tea
8 milk milk

Pearson's Chi-squared test with Yates' continuity correction
data: tea_data$truth and tea_data$guess
X-squared = 0.5, df = 1, p-value = 0.4795
18 / 20
library(tidyverse)
tibble(truth = sample(rep(c("tea", "milk"), 4))) %>%
mutate(guess = sample(rep(c("tea", "milk"), 4))) ->
tea_data ; tea_data
#
#
#
#
#
#
#
#
#
#
#
#
#
#
library(ggmosaic) # geom_mosaic() and product() # geom_mosaic() and product()
ggplot(data = tea_data) +
geom_mosaic(aes(x = product(truth),
fill = guess))
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
chisq.test(x = tea_data$truth,
y = tea_data$guess)
# A tibble: 8 x 2
truth guess
<chr> <chr>
1 tea milk
2 milk tea
3 milk milk
4 tea tea
5 milk milk
6 tea tea
7 milk tea
8 tea milk

Pearson's Chi-squared test
data: tea_data$truth and tea_data$guess
X-squared = 0, df = 1, p-value = 1
18 / 20
19 / 20
20 / 20
library(tidyverse)
# class and survive are drawn independently, so the null (independence) is true
tibble(class = sample(x = c("1st", "2nd", "3rd", "Crew"),
                      size = 50,
                      replace = TRUE)) %>%
  mutate(survive = sample(x = c("Survive", "Perish"),
                          size = 50,
                          replace = TRUE,
                          prob = c(.3, .7))) ->  # about 30% survive
  fake_titanic
fake_titanic

library(ggmosaic) # geom_mosaic() and product()
ggplot(data = fake_titanic) +
  geom_mosaic(aes(x = product(class),
                  fill = survive)) -> g1
ggplot(data = fake_titanic) +
  aes(x = class) +
  aes(y = survive) +
  geom_count(alpha = .5, color = "blue") +
  geom_jitter(width = .2, height = .2) -> g2
library(patchwork) # g1 + g2 places the two plots side by side
g1 + g2

chisq.test(x = fake_titanic$class,
           y = fake_titanic$survive)
# A tibble: 50 x 2
class survive
<chr> <chr>
1 2nd Survive
2 1st Survive
3 2nd Perish
4 Crew Perish
5 1st Perish
6 2nd Survive
7 1st Perish
8 3rd Survive
9 2nd Perish
10 Crew Survive
# … with 40 more rows

Pearson's Chi-squared test
data: fake_titanic$class and fake_titanic$survive
X-squared = 2.3518, df = 3, p-value = 0.5027
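In the fake Titanic data, `class` and `survive` are drawn independently. The null hypothesis is therefore true by construction, and any p-value below 0.05 is a false positive.

The minimal sketch below estimates how often that happens. It reruns the same data-generating step many times and records the share of "significant" results. The helper name `simulate_p_value()`, the 2,000 replications and the seed are arbitrary choices for illustration.

Warnings are suppressed because, with roughly 4 expected survivors per class, R flags the chi-squared approximation as possibly inaccurate.

```r
set.seed(2019)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical helper: one fake dataset under independence, returns its p-value
simulate_p_value <- function(n = 50) {
  class   <- sample(c("1st", "2nd", "3rd", "Crew"), size = n, replace = TRUE)
  survive <- sample(c("Survive", "Perish"), size = n, replace = TRUE,
                    prob = c(.3, .7))
  # Small expected counts trigger an approximation warning; silence it here
  suppressWarnings(chisq.test(x = class, y = survive)$p.value)
}

p_values <- replicate(2000, simulate_p_value())

# Share of datasets that look "statistically significant" at the 0.05 level
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```

The rate should land near 0.05. When the null is true, a 0.05 threshold flags roughly 1 dataset in 20 as significant by chance alone. The match is only approximate here, because the chi-squared distribution is itself an approximation for counts this small.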
20 / 20