Which pair of plots would have the greatest chi-squared distance between them? (Consider one of them the “observed” and the other the “expected”.)
\[ \begin{aligned} \frac{(1-1)^2}{1} + \frac{(10 - 1)^2}{1} + \frac{(1 - 10)^2}{10} &= 0 + 81 + \frac{81}{10} \\ &= 89.1 \end{aligned} \]
\[ \begin{aligned} \frac{(3-5)^2}{5} + \frac{(4-4)^2}{4} + \frac{(5-3)^2}{3} &= \frac{4}{5} + 0 + \frac{4}{3} \\ &\approx 2.13 \end{aligned} \]
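The two calculations above can be checked in R. This is a minimal sketch: the helper name `chisq_dist` is made up for illustration, and the observed and expected counts are read off the numerators and denominators of the formulas above.

```r
# Hypothetical helper: chi-squared distance between observed and expected counts
chisq_dist <- function(observed, expected) {
  sum((observed - expected)^2 / expected)
}

# First pair: observed (1, 10, 1), expected (1, 1, 10)
d1 <- chisq_dist(c(1, 10, 1), c(1, 1, 10))

# Second pair: observed (3, 4, 5), expected (5, 4, 3)
d2 <- chisq_dist(c(3, 4, 5), c(5, 4, 3))

d1            # 89.1
round(d2, 2)  # 2.13
```

Note that the distance is not symmetric: swapping which plot is “observed” and which is “expected” changes the denominators.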
In order to demonstrate how to conduct a hypothesis test through simulation, we will be collecting data from this class using a poll.
You will have only 15 seconds to answer the following multiple-choice question, so please get ready at pollev.com
…
The two shapes above have simple first names:
Which of the two names belongs to the shape on the left?
What is a statement of the null hypothesis that corresponds to the notion that the link between names and shapes is arbitrary?
\[\hat{p}_k = \frac{\textrm{Number who chose "Kiki"}}{\textrm{Total number of people}}\]
Note: you could also simply use \(n_k\), the number of people who chose “Kiki”, as the test statistic.
Our technique: simulate data from a world in which the null is true, then calculate the test statistic on the simulated data.
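As a rough base-R sketch of this technique, assume that under the null each person picks either name with probability 0.5, independently of everyone else. The class size `n` and the number of repetitions are placeholders, not values from the poll.

```r
set.seed(1)  # for reproducibility of this illustration only

n <- 60  # placeholder class size; replace with the real number of responses

# One simulated data set from a world where the null is true
sim_names <- sample(c("Kiki", "Booba"), size = n, replace = TRUE)

# Test statistic on the simulated data: proportion who "chose" Kiki
sim_p_hat <- mean(sim_names == "Kiki")

# Repeat many times to approximate the null distribution of the statistic
null_stats <- replicate(
  500,
  mean(sample(c("Kiki", "Booba"), size = n, replace = TRUE) == "Kiki")
)
```

The spread of `null_stats` shows which values of the statistic are typical when the null holds.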
Which simulation method(s) align with the null hypothesis and our data collection process?
infer
library(tidyverse)
library(infer)
# update these based on the poll
n_k <- 40
n_b <- 20
shapes <- data.frame(name = c(rep("Kiki", n_k),
                              rep("Booba", n_b)))
# one simulated data set under the null, and its test statistic
shapes |>
  specify(response = name,
          success = "Kiki") |>
  hypothesize(null = "point", p = .5) |>
  generate(reps = 1, type = "draw") |>
  calculate(stat = "prop")
# null distribution: the test statistic from 500 simulated data sets
null <- shapes |>
  specify(response = name,
          success = "Kiki") |>
  hypothesize(null = "point", p = .5) |>
  generate(reps = 500, type = "draw") |>
  calculate(stat = "prop")
# observed statistic: no hypothesize() or generate() step,
# because it is calculated directly from the poll data
obs_p_hat <- shapes |>
  specify(response = name,
          success = "Kiki") |>
  calculate(stat = "prop")
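One way to get a p-value from these pieces is infer's `get_p_value()`, which compares the observed statistic to the simulated null distribution. The sketch below rebuilds the objects so it runs on its own. The counts are the same placeholders as above, and the two-sided direction is an assumption about the alternative hypothesis, not something fixed by the slides.

```r
library(infer)

set.seed(1)  # for reproducibility of this illustration only

# placeholder counts; update these based on the poll
n_k <- 40
n_b <- 20
shapes <- data.frame(name = c(rep("Kiki", n_k),
                              rep("Booba", n_b)))

# null distribution of the sample proportion under p = .5
null <- shapes |>
  specify(response = name, success = "Kiki") |>
  hypothesize(null = "point", p = .5) |>
  generate(reps = 500, type = "draw") |>
  calculate(stat = "prop")

# observed sample proportion
obs_p_hat <- shapes |>
  specify(response = name, success = "Kiki") |>
  calculate(stat = "prop")

# proportion of simulated statistics at least as extreme as the observed one
# (direction = "two-sided" is an assumption for this sketch)
p_val <- null |>
  get_p_value(obs_stat = obs_p_hat, direction = "two-sided")

p_val
```

With only 500 repetitions the estimate is coarse, and infer warns if it reports a p-value of exactly 0, since that is an artifact of the number of repetitions.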
What is the proper interpretation of this p-value?