The goal of this lab is to introduce the RMarkdown file format. This format is convenient for including R code and output in an html or pdf document. By default the Rmd file you downloaded produces html output, because that is the format you will find the lab in on the external course website. We can change the output type to pdf_document if we want to create a pdf.
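For example, the output type is set in the YAML header at the top of the Rmd file; a minimal sketch (the title is hypothetical) looks like this:

```yaml
---
title: "Lab: Binomial and Normal Models"
output: pdf_document
---
```

Changing `pdf_document` back to `html_document` restores the default html output.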
We will need the following packages for this file.
library("tidyverse")
## ── Attaching core tidyverse packages ─────────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4.9000 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ───────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("Sleuth3") # data
library("rmarkdown") # compiling RMarkdown files
If you want to compile to a pdf, you will need a LaTeX installation. On a Mac, I use MacTeX. On a PC, I have used MiKTeX. On either platform, you can instead install the R package tinytex from CRAN and follow its installation instructions.
A binomial random variable counts the number of successes out of some fixed number of attempts, where each attempt is independent and has the same probability of success.
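As a quick illustration (the numbers here are chosen arbitrarily), R's built-in d/p/r functions compute binomial quantities directly:

```r
# P(Y = 3) when Y ~ Bin(n = 10, theta = 0.2); about 0.201
dbinom(3, size = 10, prob = 0.2)

# P(Y <= 3), the cumulative probability
pbinom(3, size = 10, prob = 0.2)

# Simulate 5 draws of Y (results are random)
rbinom(5, size = 10, prob = 0.2)
```

These same functions, with their beta-distribution analogues, appear throughout the analyses below.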
Many datasets can be converted to a binomial random variable. For example, consider the case0401 data set in the Sleuth3 R package (make sure you have installed and loaded it). You can find information about this data set using ?case0401.
case0401
## Incidents Launch
## 1 1 Cool
## 2 1 Cool
## 3 1 Cool
## 4 3 Cool
## 5 0 Warm
## 6 0 Warm
## 7 0 Warm
## 8 0 Warm
## 9 0 Warm
## 10 0 Warm
## 11 0 Warm
## 12 0 Warm
## 13 0 Warm
## 14 0 Warm
## 15 0 Warm
## 16 0 Warm
## 17 0 Warm
## 18 0 Warm
## 19 0 Warm
## 20 0 Warm
## 21 0 Warm
## 22 1 Warm
## 23 1 Warm
## 24 2 Warm
Let \(y\) be the number of non-incident Warm launches and \(n\) be the total number of Warm launches. Assume \(Y \sim Bin(n,\theta)\).
n <- sum(case0401$Launch == "Warm")
y <- sum(case0401$Incidents[case0401$Launch == "Warm"] == 0)
We can use some built-in functions to calculate quantities for us.
# Frequentist analysis
binom.test(y, n)
##
## Exact binomial test
##
## data: y and n
## number of successes = 17, number of trials = 20, p-value = 0.002577
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.6210732 0.9679291
## sample estimates:
## probability of success
## 0.85
prop.test(y, n) # based on CLT (not reasonable here)
##
## 1-sample proportions test with continuity correction
##
## data: y out of n, null probability 0.5
## X-squared = 8.45, df = 1, p-value = 0.00365
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6113749 0.9604337
## sample estimates:
## p
## 0.85
On homework, rather than printing out the entire analysis, you should print out only the quantities relevant to the questions asked. For example, if question 1.5 requested an approximate 95% confidence interval, you could use the following:
binom.test(y, n)$conf.int # Q1.5
## [1] 0.6210732 0.9679291
## attr(,"conf.level")
## [1] 0.95
We can also do more manual calculations, like those for a Bayesian analysis.
# Bayesian analysis
(1 + y) / (2 + n)
## [1] 0.8181818
qbeta(c(.025, .975), 1 + y, 1 + n - y) # Credible interval
## [1] 0.6365760 0.9455364
1 - pbeta(0.9, 1 + y, 1 + n - y) # P(theta > 0.9 | y)
## [1] 0.1519653
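These closed-form quantities can be sanity-checked by simulation. A minimal sketch (the seed is arbitrary; y and n repeat the Warm-launch counts from above) draws from the Beta posterior and compares Monte Carlo estimates to the formulas:

```r
set.seed(20)  # arbitrary seed, for reproducibility only
y <- 17       # non-incident Warm launches, as above
n <- 20       # total Warm launches, as above

# Draws from the Beta(1 + y, 1 + n - y) posterior
theta <- rbeta(1e5, 1 + y, 1 + n - y)

mean(theta)        # should be close to (1 + y) / (2 + n) = 0.818
mean(theta > 0.9)  # should be close to 1 - pbeta(0.9, 1 + y, 1 + n - y)
```

Agreement between the two approaches is a useful check that the posterior parameters were typed correctly.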
Make sure your code and text help the reader understand what you are doing.
We use a normal model when our data are continuous. Consider the data in case0101.
Let \(y_i\) be the creativity score for the \(i\)th individual in the Extrinsic group. Assume \(Y_i \stackrel{ind}{\sim} N(\mu,\sigma^2)\).
extrinsic <- case0101 |>
filter(Treatment == "Extrinsic")
summary(extrinsic)
## Score Treatment
## Min. : 5.00 Extrinsic:23
## 1st Qu.:12.15 Intrinsic: 0
## Median :17.20
## Mean :15.74
## 3rd Qu.:18.95
## Max. :24.00
Frequentist and Bayesian analyses are both accomplished through the t.test function.
t.test(extrinsic$Score)
##
## One Sample t-test
##
## data: extrinsic$Score
## t = 14.37, df = 22, p-value = 1.16e-12
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 13.46774 18.01052
## sample estimates:
## mean of x
## 15.73913
The default hypothesized value for the mean is \(0\). Since the possible range for creativity scores is likely 0 to 30, this hypothesized value is completely unreasonable. Therefore, the p-value in this output is meaningless.
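If some null value were substantively interesting, it can be supplied through the mu argument of t.test; a sketch (the null value 15 is purely illustrative, not taken from the lab):

```r
library("Sleuth3")

# Extrinsic-group creativity scores, via base subsetting
extrinsic_scores <- case0101$Score[case0101$Treatment == "Extrinsic"]

# Tests H0: mu = 15 rather than the default (and unreasonable) H0: mu = 0
t.test(extrinsic_scores, mu = 15)
```

The confidence interval is unchanged by mu; only the test statistic and p-value depend on the hypothesized value.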
Analyses get more interesting when we compare two different populations. Using the shuttle launch data (case0401), we might be interested in comparing the probability of no O-ring issues depending on temperature, which is recorded as Warm or Cool. Thus, we have two populations (Warm and Cool) and within each we have the number of successes out of some number of attempts.
Let \(Y_g\) be the number of non-incident launches out of \(n_g\) attempts in group \(g\). Define \(g=1\) to be Cool and \(g=2\) to be Warm. Assume \(Y_g \stackrel{ind}{\sim} Bin(n_g, \theta_g)\). We are interested in the quantity \(\theta_1 - \theta_2\) (or the reverse).
# Compute summary statistics from data
d <- case0401 |>
group_by(Launch) |>
summarize(
n = n(),
y = sum(Incidents == 0),
p = y/n
)
d
## # A tibble: 2 × 4
## Launch n y p
## <fct> <int> <int> <dbl>
## 1 Cool 4 0 0
## 2 Warm 20 17 0.85
Unfortunately, frequentist analyses generally require data asymptotics, i.e. large sample sizes.
prop.test(d$y, d$n)
## Warning in prop.test(d$y, d$n): Chi-squared approximation may be incorrect
##
## 2-sample test for equality of proportions with continuity correction
##
## data: d$y out of d$n
## X-squared = 7.9059, df = 1, p-value = 0.004927
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -1.0000000 -0.5435094
## sample estimates:
## prop 1 prop 2
## 0.00 0.85
Fortunately, prop.test warns us (with a cryptic message) that the sample size is not large enough.
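One frequentist option that avoids the large-sample approximation is Fisher's exact test on the 2x2 table of launches; a sketch using the counts computed above:

```r
# Rows: launch temperature; columns: (no incident, incident)
tab <- matrix(c(0, 4,    # Cool: 0 of 4 launches without incident
                17, 3),  # Warm: 17 of 20 launches without incident
              nrow = 2, byrow = TRUE)

fisher.test(tab)  # exact p-value, valid at any sample size
```

The exact p-value is small here, agreeing qualitatively with prop.test while not relying on the chi-squared approximation.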
We can perform a Bayesian analysis with any sample size, but our prior will have more impact on the analysis when the sample size is small.
# Simulate from the posterior for each probability of success
nreps <- 1e5 # make this large
prob_cool <- rbeta(nreps, 1 + d$y[1], 1 + d$n[1] - d$y[1])
prob_warm <- rbeta(nreps, 1 + d$y[2], 1 + d$n[2] - d$y[2])
diff <- prob_cool - prob_warm
# Calculate quantities of interest
mean(diff < 0) # P(prob_warm > prob_cool | y)
## [1] 0.99928
quantile(-diff, probs = c(.025, 0.975)) # 95% credible interval for prob_warm - prob_cool
## 2.5% 97.5%
## 0.2628982 0.8907850
This interval is pretty wide for the difference between two probabilities, but with only 24 launches, and just 4 of them in the Cool group, what did you expect?
When we have continuous data that arises from two populations, we can often use two normal distributions to compare them.
Consider the case0101 data with creativity scores that may depend on the treatment an individual received. We can compute summary statistics.
# Summary statistics
case0101 |>
group_by(Treatment) |>
summarize(
n = n(),
mean = mean(Score),
sd = sd(Score)
)
## # A tibble: 2 × 4
## Treatment n mean sd
## <fct> <int> <dbl> <dbl>
## 1 Extrinsic 23 15.7 5.25
## 2 Intrinsic 24 19.9 4.44
We can also plot the data.
ggplot(case0101,
aes(x = Treatment,
y = Score,
color = Treatment,
shape = Treatment)) +
geom_jitter(width = 0.1) # jittering the points helps prevent overlap
A common assumption is that the two populations have the same variance. We can check this by looking at the sample standard deviations as well as the plot above. Here the variability looks fairly similar between the two groups.
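A more formal (though normality-sensitive) check is the F test for equality of variances; a sketch, assuming case0101 is loaded from Sleuth3:

```r
library("Sleuth3")

# H0: the two group variances are equal; the estimate is the variance ratio
var.test(Score ~ Treatment, data = case0101)
```

A large p-value here is consistent with the var.equal = TRUE assumption used below, though the informal checks (standard deviations and plots) are often sufficient.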
We can perform both frequentist and Bayesian analyses using t.test.
(tt <- t.test(Score ~ Treatment, data = case0101, var.equal = TRUE))
##
## Two Sample t-test
##
## data: Score by Treatment
## t = -2.9259, df = 45, p-value = 0.005366
## alternative hypothesis: true difference in means between group Extrinsic and group Intrinsic is not equal to 0
## 95 percent confidence interval:
## -6.996973 -1.291432
## sample estimates:
## mean in group Extrinsic mean in group Intrinsic
## 15.73913 19.88333
Here the hypothesized difference between the means of the two groups is zero, and thus the p-value is relevant. Before drawing rash conclusions, we should check that independence and normality are reasonable assumptions.
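Normality can be assessed informally with a normal quantile-quantile plot for each group; a sketch, assuming case0101 is loaded from Sleuth3:

```r
library("Sleuth3")

# One Q-Q plot per treatment group; roughly linear points support normality
par(mfrow = c(1, 2))
for (g in levels(case0101$Treatment)) {
  s <- case0101$Score[case0101$Treatment == g]
  qqnorm(s, main = g)
  qqline(s)
}
```

With only about two dozen observations per group, some wiggle in these plots is expected even under normality.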
For the above two populations, consider whether the data are a random sample and whether the treatment has been randomly applied.
The shuttle launch data is probably the entire population over some time period. That said, we can consider it a sample of shuttle launches that could have taken place, but it is unlikely that this sample is random. Ultimately, for these data the statistics provide a reasonable summary of the data but cannot be used to infer something about a larger population.
Psychology experiments like case0101 are typically performed with a convenience sample of undergraduate students at the researchers' institution. Thus, the sample is not random and certainly not representative of the population at large, and we cannot infer results about some larger population.
The shuttle launch data collection does not have a randomized treatment; that is, we did not randomly assign the temperature to a given day, since that is impossible. Thus we may only say there is an association (a strong one here) between temperature and the probability of no O-ring incidents on a given launch. Even a strong association does not (on its own) imply a causal relationship between temperature and the probability of no incident.
As indicated in the help file for case0101, individuals were randomized to the two treatment groups. Since treatment was randomized, we can make a causal claim about the effect of motivation type (extrinsic vs. intrinsic) on creativity score. In this case, intrinsic motivation appears to increase creativity scores on average by 1.3 to 7.0 points (a 95% confidence interval) on a 30-point scale.