The goal of this lab is to introduce the RMarkdown file format. This format is convenient for including R code and output in an html or pdf document. By default the Rmd file you downloaded produces html output, because that is the format you will find the lab in on the external course website. We can change the output type to pdf_document if we want to create a pdf.
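For example, the output type is set in the YAML header at the top of the Rmd file; a minimal sketch (the title is hypothetical) looks like this:

```yaml
---
title: "Lab: Binomial and Normal Models"
output: pdf_document
---
```

Changing `pdf_document` back to `html_document` restores the default html output.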
We will need the following packages for this file.
library("tidyverse")
## ── Attaching core tidyverse packages ─────────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4.9000 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ───────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("Sleuth3") # data
library("rmarkdown") # compiling RMarkdown files
If you want to compile to a pdf, you will need a LaTeX installation. On a Mac, I use MacTeX. On a PC, I have used MiKTeX. On either platform, you can instead install the R package tinytex from CRAN and follow its installation instructions.
A binomial random variable counts the number of successes out of some fixed number of attempts, where each attempt is independent and has the same probability of success.
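As a quick illustration (the numbers here are chosen arbitrarily), R's built-in d/p/r functions compute binomial quantities directly:

```r
# P(Y = 3) when Y ~ Bin(n = 10, theta = 0.2); about 0.201
dbinom(3, size = 10, prob = 0.2)

# P(Y <= 3), the cumulative probability
pbinom(3, size = 10, prob = 0.2)

# Simulate 5 draws of Y (results are random)
rbinom(5, size = 10, prob = 0.2)
```

These same functions, with their beta-distribution analogues, appear throughout the analyses below.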
Many datasets can be converted to a binomial random variable. For example, consider the case0401 data set in the Sleuth3 R package (make sure you have installed and loaded it). You can find information about this data set using ?case0401.
case0401
## Incidents Launch
## 1 1 Cool
## 2 1 Cool
## 3 1 Cool
## 4 3 Cool
## 5 0 Warm
## 6 0 Warm
## 7 0 Warm
## 8 0 Warm
## 9 0 Warm
## 10 0 Warm
## 11 0 Warm
## 12 0 Warm
## 13 0 Warm
## 14 0 Warm
## 15 0 Warm
## 16 0 Warm
## 17 0 Warm
## 18 0 Warm
## 19 0 Warm
## 20 0 Warm
## 21 0 Warm
## 22 1 Warm
## 23 1 Warm
## 24 2 Warm
Let \(y\) be the number of non-incident Warm launches and \(n\) be the total number of Warm launches. Assume \(Y \sim Bin(n,\theta)\).
n <- sum(case0401$Launch == "Warm")
y <- sum(case0401$Incidents[case0401$Launch == "Warm"] == 0)
We can use some built-in functions to calculate quantities for us.
# Frequentist analysis
binom.test(y, n)
##
## Exact binomial test
##
## data: y and n
## number of successes = 17, number of trials = 20, p-value = 0.002577
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.6210732 0.9679291
## sample estimates:
## probability of success
## 0.85
prop.test(y, n) # based on CLT (not reasonable here)
##
## 1-sample proportions test with continuity correction
##
## data: y out of n, null probability 0.5
## X-squared = 8.45, df = 1, p-value = 0.00365
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6113749 0.9604337
## sample estimates:
## p
## 0.85
On homework, rather than printing out the entire analysis, you should print out only the quantities relevant to the questions asked. For example, if question 1.5 requested an approximate 95% confidence interval, you could use the following:
binom.test(y, n)$conf.int # Q1.5
## [1] 0.6210732 0.9679291
## attr(,"conf.level")
## [1] 0.95
We can also do more manual calculations, like those for a Bayesian analysis.
# Bayesian analysis
(1 + y) / (2 + n)
## [1] 0.8181818
qbeta(c(.025, .975), 1 + y, 1 + n - y) # Credible interval
## [1] 0.6365760 0.9455364
1 - pbeta(0.9, 1 + y, 1 + n - y) # P(theta > 0.9 | y)
## [1] 0.1519653
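These closed-form quantities can be sanity-checked by simulation. A minimal sketch (the seed is arbitrary; y and n repeat the Warm-launch counts from above) draws from the Beta posterior and compares Monte Carlo estimates to the formulas:

```r
set.seed(20)  # arbitrary seed, for reproducibility only
y <- 17       # non-incident Warm launches, as above
n <- 20       # total Warm launches, as above

# Draws from the Beta(1 + y, 1 + n - y) posterior
theta <- rbeta(1e5, 1 + y, 1 + n - y)

mean(theta)        # should be close to (1 + y) / (2 + n) = 0.818
mean(theta > 0.9)  # should be close to 1 - pbeta(0.9, 1 + y, 1 + n - y)
```

Agreement between the two approaches is a useful check that the posterior parameters were typed correctly.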
Make sure your code and text help the reader understand what you are doing.
We use a normal model when our data are continuous. Consider the data in case0101.
Let \(y_i\) be the creativity score for the \(i\)th individual in the Extrinsic group. Assume \(Y_i \stackrel{ind}{\sim} N(\mu,\sigma^2)\).
extrinsic <- case0101 |>
filter(Treatment == "Extrinsic")
summary(extrinsic)
## Score Treatment
## Min. : 5.00 Extrinsic:23
## 1st Qu.:12.15 Intrinsic: 0
## Median :17.20
## Mean :15.74
## 3rd Qu.:18.95
## Max. :24.00
Frequentist and Bayesian analyses are both accomplished through the t.test function.
t.test(extrinsic$Score)
##
## One Sample t-test
##
## data: extrinsic$Score
## t = 14.37, df = 22, p-value = 1.16e-12
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 13.46774 18.01052
## sample estimates:
## mean of x
## 15.73913
The default hypothesized value for the mean is \(0\). Since the possible range for creativity scores is likely 0 to 30, this hypothesized value is completely unreasonable. Therefore, the p-value in this output is meaningless.
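If some null value were substantively interesting, it can be supplied through the mu argument of t.test; a sketch (the null value 15 is purely illustrative, not taken from the lab):

```r
library("Sleuth3")

# Extrinsic-group creativity scores, via base subsetting
extrinsic_scores <- case0101$Score[case0101$Treatment == "Extrinsic"]

# Tests H0: mu = 15 rather than the default (and unreasonable) H0: mu = 0
t.test(extrinsic_scores, mu = 15)
```

The confidence interval is unchanged by mu; only the test statistic and p-value depend on the hypothesized value.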
Analyses get more interesting when we compare two different populations. Using the shuttle launch data (case0401), we might be interested in comparing the probability of no O-ring issues depending on temperature, which is recorded as Warm or Cool. Thus, we have two populations (Warm and Cool) and within each we have the number of successes out of some number of attempts.
Let \(Y_g\) be the number of non-incident launches out of \(n_g\) attempts in group \(g\). Define \(g=1\) to be Cool and \(g=2\) to be Warm. Assume \(Y_g \stackrel{ind}{\sim} Bin(n_g, \theta_g)\). We are interested in the quantity \(\theta_1 - \theta_2\) (or the reverse).
# Compute summary statistics from data
d <- case0401 |>
group_by(Launch) |>
summarize(
n = n(),
y = sum(Incidents == 0),
p = y/n
)
d
## # A tibble: 2 × 4
## Launch n y p
## <fct> <int> <int> <dbl>
## 1 Cool 4 0 0
## 2 Warm 20 17 0.85
Unfortunately, frequentist analyses generally require data asymptotics, i.e. large sample sizes.
prop.test(d$y, d$n)
## Warning in prop.test(d$y, d$n): Chi-squared approximation may be incorrect
##
## 2-sample test for equality of proportions with continuity correction
##
## data: d$y out of d$n
## X-squared = 7.9059, df = 1, p-value = 0.004927
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -1.0000000 -0.5435094
## sample estimates:
## prop 1 prop 2
## 0.00 0.85
Fortunately, prop.test warns us (with a cryptic message) that the sample size is not large enough.
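One frequentist option that avoids the large-sample approximation is Fisher's exact test on the 2x2 table of launches; a sketch using the counts computed above:

```r
# Rows: launch temperature; columns: (no incident, incident)
tab <- matrix(c(0, 4,    # Cool: 0 of 4 launches without incident
                17, 3),  # Warm: 17 of 20 launches without incident
              nrow = 2, byrow = TRUE)

fisher.test(tab)  # exact p-value, valid at any sample size
```

The exact p-value is small here, agreeing qualitatively with prop.test while not relying on the chi-squared approximation.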
We can perform a Bayesian analysis with any sample size, but our prior will have more impact on the analysis when the sample size is small.
# Simulate from the posterior for each probability of success
nreps <- 1e5 # make this large
prob_cool <- rbeta(nreps, 1 + d$y[1], 1 + d$n[1] - d$y[1])
prob_warm <- rbeta(nreps, 1 + d$y[2], 1 + d$n[2] - d$y[2])
diff <- prob_cool - prob_warm
# Calculate quantities of interest
mean(diff < 0) # P(prob_warm > prob_cool | y)
## [1] 0.99928
quantile(-diff, probs = c(.025, 0.975)) # 95% credible interval for prob_warm - prob_cool
## 2.5% 97.5%
## 0.2628982 0.8907850
This interval is pretty wide for the difference between two probabilities, but with only 24 launches, and just 4 of them in the Cool group, what did you expect?
When we have continuous data that arises from two populations, we can often use two normal distributions to compare them.
Consider the case0101 data with creativity scores that may depend on the treatment an individual received. We can compute summary statistics.
# Summary statistics
case0101 |>
group_by(Treatment) |>
summarize(
n = n(),
mean = mean(Score),
sd = sd(Score)
)
## # A tibble: 2 × 4
## Treatment n mean sd
## <fct> <int> <dbl> <dbl>
## 1 Extrinsic 23 15.7 5.25
## 2 Intrinsic 24 19.9 4.44
We can also plot the data.
ggplot(case0101,
aes(x = Treatment,
y = Score,
color = Treatment,
shape = Treatment)) +
geom_jitter(width = 0.1) # jittering the points helps prevent overlap
A common assumption is that the two populations have the same variance. We can check this by looking at the sample standard deviations as well as the plot above. Here the variability looks fairly similar between the two groups.
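A more formal (though normality-sensitive) check is the F test for equality of variances; a sketch, assuming case0101 is loaded from Sleuth3:

```r
library("Sleuth3")

# H0: the two group variances are equal; the estimate is the variance ratio
var.test(Score ~ Treatment, data = case0101)
```

A large p-value here is consistent with the var.equal = TRUE assumption used below, though the informal checks (standard deviations and plots) are often sufficient.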
We can perform both frequentist and Bayesian analyses using t.test.
(tt <- t.test(Score ~ Treatment, data = case0101, var.equal = TRUE))
##
## Two Sample t-test
##
## data: Score by Treatment
## t = -2.9259, df = 45, p-value = 0.005366
## alternative hypothesis: true difference in means between group Extrinsic and group Intrinsic is not equal to 0
## 95 percent confidence interval:
## -6.996973 -1.291432
## sample estimates:
## mean in group Extrinsic mean in group Intrinsic
## 15.73913 19.88333
Here the hypothesized difference between the means of the two groups is zero, and thus the p-value is relevant. Before drawing rash conclusions, we should check that independence and normality are reasonable assumptions.
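Normality can be assessed informally with a normal quantile-quantile plot for each group; a sketch, assuming case0101 is loaded from Sleuth3:

```r
library("Sleuth3")

# One Q-Q plot per treatment group; roughly linear points support normality
par(mfrow = c(1, 2))
for (g in levels(case0101$Treatment)) {
  s <- case0101$Score[case0101$Treatment == g]
  qqnorm(s, main = g)
  qqline(s)
}
```

With only about two dozen observations per group, some wiggle in these plots is expected even under normality.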
For the above two populations, consider whether the data are a random sample and whether the treatment has been randomly applied.
The shuttle launch data is probably the entire population over some time period. That said, we can consider it a sample of shuttle launches that could have taken place, but it is unlikely that this sample is random. Ultimately, for these data the statistics provide a reasonable summary of the data but cannot be used to infer something about a larger population.
Psychology experiments like case0101 are typically performed with a convenience sample of undergraduate students at the researchers' institution. Thus, the sample is not random and certainly not representative of the population at large, and we cannot infer results about some larger population.
The shuttle launch data collection does not have a randomized treatment; that is, we did not randomly assign the temperature to a given day, since that is impossible. Thus we may only say there is an association (a strong one here) between temperature and the probability of no O-ring incidents on a given launch. Even a strong association does not (on its own) imply a causal relationship between temperature and the probability of no incident.
As indicated in the help file for case0101, individuals were randomized to the two treatment groups. Since treatment was randomized, we can make a causal claim about the effect of motivation type (extrinsic vs. intrinsic) on creativity score. In this case, intrinsic motivation appears to increase creativity scores on average by 1.3 to 7.0 points (a 95% confidence interval) on a 30-point scale.