library("tidyverse")
## ── Attaching core tidyverse packages ─── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4.9000 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ───────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
For reproducibility, set the seed.
set.seed(20190219)
Let \(y\sim Bin(n,\theta)\). Recall that the MLE is \(\widehat{\theta}_{MLE} = \frac{y}{n}\). If we assume the prior \(\theta \sim Be(a,b)\), then the posterior is \(\theta|y \sim Be(a+y,b+n-y)\) and the posterior expectation is \[ \widehat{\theta}_{Bayes} = E[\theta|y] = \frac{a+y}{a+b+n}. \] Specifically, if we assume a uniform prior on (0, 1), i.e. \(a=b=1\), then \(\theta|y \sim Be(1+y,1+n-y)\) and the posterior expectation is \[ \widehat{\theta}_{Bayes} = E[\theta|y] = \frac{1+y}{2+n}. \] This is equivalent to adding one success and one failure to the data.
Suppose you randomly sample 22 buildings in Ames and find that 15 of them are residential buildings. Then the MLE for the probability of a building in Ames being residential is
n <- 22
y <- 15
(mle <- y/n)
## [1] 0.6818182
the Bayes estimator is
a <- b <- 1
(bayes <- (a+y)/(a+b+n))
## [1] 0.6666667
and a 95% credible interval is
qbeta(c(.025,.975), a+y, b+n-y)
## [1] 0.4708083 0.8362364
Suppose that you take a larger sample of buildings and find that out of 222 buildings 135 of them are residential. Calculate the MLE, Bayes estimator, and a 90% credible interval for the probability of a house in Ames being residential.
n <- 222
y <- 135
(mle <- y/n)
## [1] 0.6081081
(bayes <- (a+y)/(a+b+n))
## [1] 0.6071429
qbeta(c(.05,.95), a+y, b+n-y)
## [1] 0.5530313 0.6601642
Can you calculate a posterior median?
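Since the posterior is \(Be(a+y, b+n-y)\), the posterior median is its 0.5 quantile; a minimal sketch using the larger sample above:
# Posterior median: the 0.5 quantile of the Be(a+y, b+n-y) posterior
qbeta(0.5, a + y, b + n - y)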
Recall that estimators have the properties of bias and consistency.
In addition, uncertainty intervals, i.e. credible intervals, have a property called coverage.
The bias of an estimator is the expected value of the estimator minus the true value. Generally, we don’t know the true value, but we can simulate data with a known truth. We can estimate the bias by performing repeated simulations and taking the average value of the estimator across those simulations. This average value is an estimate of the expected value. If we take enough values in the average, then the Central Limit Theorem tells us the distribution of this average. Specifically,
\[\overline{x} \sim N(\mu,\sigma^2/n)\] where \(\mu=E[\overline{x}]\) and we estimate \(\sigma^2\) with the sample variance.
Let’s use Monte Carlo simulations to estimate the bias of the MLE and Bayes estimator. To do this, we are going to repeatedly 1) simulate a binomial random variable and 2) compute the MLE and the Bayes estimator. Then we will average each estimator across the simulations and subtract the true value of \(\theta\).
# Binomial model with
# n attempts and probability of success theta
n <- 10
theta <- 0.5
# Storage for estimators
n_reps <- 1e4
mle <- numeric(n_reps)
bayes <- numeric(n_reps)
# Simulate n_reps binomial random variables and
# compute the MLE and Bayes estimators
for (i in 1:n_reps) {
  y <- rbinom(1, size = n, prob = theta)
  mle[i] <- y / n
  bayes[i] <- (a + y) / (a + b + n)
}
mean(mle) - theta # estimate of MLE bias
## [1] 0.00096
mean(bayes) - theta # estimate of Bayes bias
## [1] 8e-04
Now, we probably want some idea of how close we are to the true bias. We will use the CLT. \[ \overline{\widehat{\theta}} \pm z_{0.025} SE(\overline{\widehat{\theta}}) \] where the \(SE(\overline{\widehat{\theta}})\) is estimated by the sample standard deviation of \(\widehat{\theta}\) divided by the square root of the number of values in the average, i.e. the sample size. The formula for a 95% interval is then \[ \overline{\widehat{\theta}} \pm z_{0.025} s_{\widehat{\theta}}/\sqrt{n} \]
# Binomial bias with Monte Carlo uncertainty
mean(mle ) + c(-1, 1) * qnorm(.975) * sd(mle ) / sqrt(length(mle )) - theta
## [1] -0.002153667 0.004073667
mean(bayes) + c(-1, 1) * qnorm(.975) * sd(bayes) / sqrt(length(bayes)) - theta
## [1] -0.001794722 0.003394722
We could have written the code a bit more succinctly (and, perhaps, obtusely).
# Succinct binomial Monte Carlo study
y <- rbinom(n_reps, size = n, prob = theta)
mle <- y / n
bayes <- (a + y) / (a + b + n)
# Calculate bias with Monte Carlo uncertainty
mean(mle ) + c(-1, 1) * qnorm(.975) * sd(mle ) / sqrt(length(mle )) - theta
## [1] -0.002545906 0.003605906
mean(bayes) + c(-1, 1) * qnorm(.975) * sd(bayes) / sqrt(length(bayes)) - theta
## [1] -0.002121588 0.003004922
Repeat the simulation procedure we had above but using \(n=5\) and \(\theta=0.2\). What do you find for the bias of the MLE and the Bayes estimator?
# Adjust n and theta
n <- 5
theta <- 0.2
# Simulate binomials and calculate estimators
y <- rbinom(n_reps, size = n, prob = theta)
mle <- y/n
bayes <- (a+y)/(a+b+n)
# Calculate bias with Monte Carlo uncertainty
mean(mle ) + c(-1, 1) * qnorm(.975) * sd(mle ) / sqrt(length(mle )) - theta
## [1] -0.004077641 0.002877641
mean(bayes) + c(-1, 1) * qnorm(.975) * sd(bayes) / sqrt(length(bayes)) - theta
## [1] 0.08280169 0.08776974
Now the Bayes estimator appears to be quite biased.
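In fact, since \(E[y]=n\theta\), the exact bias of the Bayes estimator is \(\frac{a+n\theta}{a+b+n}-\theta\), which serves as a quick check on the Monte Carlo interval above:
# Exact bias of the Bayes estimator, since E[y] = n * theta
(a + n * theta) / (a + b + n) - theta
## [1] 0.08571429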
Let’s study what happens to the bias of the Bayes estimator as we change \(n\) and \(\theta\). The following will look at \(n=1,10,100,1000\) and \(\theta\) from \(0\) to \(1\) in increments of 0.1.
# Perform the bias calculation for different combinations
# of n and theta
settings <- expand.grid(n = 10^(0:3),
                        theta = seq(0, 1, by = 0.1))
We will use the dplyr package to help us iterate over all of these values of \(n\) and \(\theta\). First we need to construct the function to do a simulation study for every value of n and theta.
# A function to estimate the bias based on a data.frame x
# that contains a single row with two columns: n and theta
sim_function <- function(x) {
  y <- rbinom(1e4, size = x$n, prob = x$theta)
  mle <- y / x$n
  bayes <- (a + y) / (a + b + x$n)

  d <- data.frame(
    estimator = c("mle", "bayes"),
    bias = c(mean(mle), mean(bayes)) - x$theta,
    var = c(var( mle), var( bayes)))

  # d$se <- sqrt(d$var / x$n)
  # d$lower <- d$bias - qnorm(.975) * d$se
  # d$upper <- d$bias + qnorm(.975) * d$se
  return(d)
}
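To see what this function returns, we can call it once on a single setting (example values):
# Quick check: one simulation study at a single combination of n and theta
sim_function(data.frame(n = 10, theta = 0.5))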
Then we can loop over all the values of n and theta.
# Run the simulation study
sim_study <- settings |>
  group_by(n, theta) |> # for each combination of n and theta
  do(sim_function(.))
Let’s plot the Monte Carlo estimated bias.
# Plot estimated bias from the simulation study
ggplot(sim_study,
       aes(
         x = theta,
         y = bias, # plotting bias
         color = estimator,
         linetype = estimator)) +
  geom_line() +
  facet_wrap(~n, scales = "free_y") +
  theme_bw()
We can see that the Bayes estimator has the most bias when 1) the sample size is small and 2) \(\theta\) is far away from 0.5. But the variance of the Bayes estimator is smaller than the variance of the MLE estimator.
# Plot estimated variance from the simulation study
ggplot(sim_study,
       aes(
         x = theta,
         y = var, # plotting variance
         color = estimator,
         linetype = estimator)) +
  geom_line() +
  facet_wrap(~n, scales = "free_y") +
  theme_bw()
We can see that the Bayes estimator always has smaller variance than the MLE.
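This can be verified analytically: since \(Var(y)=n\theta(1-\theta)\), \[ Var(\widehat{\theta}_{MLE}) = \frac{\theta(1-\theta)}{n} \quad\mbox{ and }\quad Var(\widehat{\theta}_{Bayes}) = \frac{n\theta(1-\theta)}{(a+b+n)^2} = \left(\frac{n}{a+b+n}\right)^2 Var(\widehat{\theta}_{MLE}), \] so the Bayes estimator’s variance is smaller by the factor \((n/(a+b+n))^2 < 1\).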
Thus, we often use a new property of an estimator called the mean squared error (MSE): \[ MSE(\widehat{\theta}) = E[(\widehat\theta-\theta)^2] = Var(\widehat{\theta}) + Bias(\widehat{\theta},\theta)^2 \] This property combines both the bias and the variance into a single term. Generally the estimator with the smallest MSE should be utilized.
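This decomposition follows by adding and subtracting \(E[\widehat\theta]\) inside the square: \[ E[(\widehat\theta-\theta)^2] = E[(\widehat\theta-E[\widehat\theta])^2] + (E[\widehat\theta]-\theta)^2 = Var(\widehat{\theta}) + Bias(\widehat{\theta},\theta)^2 \] since the cross term vanishes because \(E[\widehat\theta-E[\widehat\theta]]=0\).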
# Plot estimated mse from the simulation study
sim_study$mse <- sim_study$var + sim_study$bias^2
ggplot(sim_study,
       aes(
         x = theta,
         y = mse, # plotting MSE
         color = estimator,
         linetype = estimator)) +
  geom_line() +
  facet_wrap(~n, scales = "free_y") +
  theme_bw()
Generate a curve for the MSE for the MLE and Bayes estimators for a binomial probability of success when \(n=13\) for various values of \(\theta\).
settings <- expand.grid(n = 13,
                        theta = seq(0, 1, by = 0.1))
sim_study <- settings |>
  group_by(n, theta) |>
  do(sim_function(.))
sim_study$mse <- sim_study$var + sim_study$bias^2
ggplot(sim_study, aes(x = theta, y = mse, color = estimator)) +
  geom_line() +
  facet_wrap(~n) + # This line isn't necessary since n takes only one value
  theme_bw()
Recall that an estimator is consistent if \[ P(|\widehat\theta_n-\theta|<\epsilon) \to 1 \quad\mbox{ as }\quad n\to\infty \] for any \(\epsilon>0\). Here I have made explicit that the estimator depends on \(n\).
To evaluate consistency by simulation, we will perform a sequence of experiments, i.e. one coin flip at a time, and calculate the estimator after each coin flip. Then we will see how many times the estimator is more than \(\epsilon\) away from \(\theta\).
theta <- 0.51
n_max <- 999
sim_function <- function(d) {
  # The argument d is unused; it is required by do() below

  # Simulate a sequence of Bernoulli trials (coin flips)
  x <- rbinom(n_max, size = 1, prob = theta)

  # Storage structure for estimators
  mle <- bayes <- numeric(n_max)

  # Compute estimators for all sample sizes from 1 to n_max
  for (n in 1:n_max) {
    y <- sum(x[1:n])
    mle[n] <- y / n
    bayes[n] <- (a + y) / (a + b + n)
  }

  # Record results
  data.frame(n = 1:n_max,
             mle = mle,
             bayes = bayes)
}
# Repeat 1000 times
d <- data.frame(rep = 1:1e3) |>
  group_by(rep) |>
  do(sim_function(.))
# Calculate probability
epsilon <- 0.05 # feel free to change this
sum <- d |>
  gather(estimator, estimate, mle, bayes) |>
  group_by(n, estimator) |>
  summarize(prob = mean(abs(estimate - theta) < epsilon))
## `summarise()` has grouped output by 'n'. You can override
## using the `.groups` argument.
Now plot the results.
ggplot(sum,
       aes(
         x = n,
         y = prob,
         color = estimator,
         linetype = estimator)) +
  geom_line() +
  theme_bw()
This is an illustration of the property called consistency. For both the Bayes estimator and the MLE, the probability of being within \(\epsilon\) of \(\theta\) (the true success probability) converges to 1 as the number of observations increases.
Via simulation, show that if \(\theta=0.01\) that both the MLE and Bayes estimators are consistent.
theta <- 0.01
d <- data.frame(rep = 1:1e3) |>
  group_by(rep) |>
  do(sim_function(.))
sum <- d |>
  gather(estimator, estimate, mle, bayes) |>
  group_by(n, estimator) |>
  summarize(prob = mean(abs(estimate - theta) < epsilon))
## `summarise()` has grouped output by 'n'. You can override
## using the `.groups` argument.
ggplot(sum, aes(x = n, y = prob, color = estimator, linetype = estimator)) +
  geom_line() +
  theme_bw()
Uncertainty intervals have a property called coverage. Coverage is the probability the interval will contain the truth when the method used to construct the interval is used repeatedly on different data sets.
We will evaluate coverage using simulations by repeatedly 1) simulating data, 2) constructing a credible interval from each simulated data set, and 3) recording whether the interval contains the true value of \(\theta\).
# Binomial parameters
n <- 100
theta <- 0.5
# Simulate binomial random variables
n_reps <- 1e4
y <- rbinom(n_reps, size = n, prob = theta)
# Calculate all endpoints
lower <- qbeta(.025, a + y, b + n - y)
upper <- qbeta(.975, a + y, b + n - y)
# Calculate coverage
mean( lower <= theta & theta <= upper )
## [1] 0.9407
If we want to understand our Monte Carlo uncertainty in this coverage, we can use the CLT for a proportion.
# Monte Carlo uncertainty
p <- mean( lower <= theta & theta <= upper)
p + c(-1, 1) * qnorm(.975) * sqrt(p * (1 - p) / n_reps)
## [1] 0.9360709 0.9453291
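Because \(y\) can only take the values \(0,1,\ldots,n\), the coverage can also be computed exactly by summing the binomial probabilities of the values of \(y\) whose intervals contain \(\theta\); a minimal sketch:
# Exact coverage: sum P(Y = y) over the y whose interval covers theta
y_all <- 0:n
lower_all <- qbeta(.025, a + y_all, b + n - y_all)
upper_all <- qbeta(.975, a + y_all, b + n - y_all)
sum(dbinom(y_all, size = n, prob = theta) * (lower_all <= theta & theta <= upper_all))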
Via simulation, determine the coverage for a credible interval when \(n=10\) and \(\theta = 0.2\) and use the CLT to provide an uncertainty on this coverage.
n <- 10
theta <- 0.2
n_reps <- 1e4
y <- rbinom(n_reps, size = n, prob = theta)
lower <- qbeta(.025, a + y, b + n - y)
upper <- qbeta(.975, a + y, b + n - y)
(p <- mean( lower <= theta & theta <= upper))
## [1] 0.9695
p + c(-1, 1) * qnorm(.975) * sqrt(p * (1 - p)/n_reps)
## [1] 0.9661297 0.9728703
Now try with \(\theta=0.01\).
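A sketch of one solution, reusing the code above:
theta <- 0.01
y <- rbinom(n_reps, size = n, prob = theta)
lower <- qbeta(.025, a + y, b + n - y)
upper <- qbeta(.975, a + y, b + n - y)
(p <- mean(lower <= theta & theta <= upper))
p + c(-1, 1) * qnorm(.975) * sqrt(p * (1 - p) / n_reps)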
Let’s now look at coverage across a wide range of \(n\) and \(\theta\).
# Binomial coverage function
binomial_coverage_function <- function(d) {
  y <- rbinom(1e4, size = d$n, prob = d$theta)

  # Calculate endpoints of credible intervals
  lower <- qbeta(.025, a + y, b + d$n - y)
  upper <- qbeta(.975, a + y, b + d$n - y)

  data.frame(coverage = mean(lower <= d$theta & d$theta <= upper))
}
sim_study <- expand.grid(n = 10^(0:3),
                         theta = seq(0, 1, by = 0.1)) |>
  group_by(n, theta) |>
  do(binomial_coverage_function(.))
Now plot the results.
# Binomial coverage plot
ggplot(sim_study,
       aes(
         x = theta,
         y = coverage)) +
  geom_line() +
  facet_wrap(~n) +
  geom_hline(
    yintercept = 0.95,
    color = "red") +
  ylim(0, 1) +
  theme_bw()
The coverage is extremely poor when \(\theta\) is 0 (or 1) because \(y\) is always 0 (or \(n\)). In these cases, a simple way to fix the interval is to use a one-sided interval when \(y\) is 0 (or \(n\)). Specifically,
binomial_coverage_function_fixed <- function(d) {
  n     <- d$n
  theta <- d$theta

  y <- rbinom(1e4, size = n, prob = theta)
  lower <- qbeta(.025, a + y, b + n - y)
  upper <- qbeta(.975, a + y, b + n - y)

  # Fix intervals when y = 0: use a one-sided interval [0, 95th percentile]
  lower[y == 0] <- 0
  upper[y == 0] <- qbeta(.95, a + 0, b + n)

  # Fix intervals when y = n: use a one-sided interval [5th percentile, 1]
  upper[y == n] <- 1
  lower[y == n] <- qbeta(.05, a + n, b + 0)

  data.frame(coverage = mean(lower <= theta & theta <= upper))
}
sim_study <- expand.grid(n = 10^(0:3),
                         theta = seq(0, 1, by = 0.1)) |>
  group_by(n, theta) |>
  do(binomial_coverage_function_fixed(.))
Now plot the results.
ggplot(sim_study,
       aes(
         x = theta,
         y = coverage)) +
  geom_line() +
  facet_wrap(~n) +
  geom_hline(
    yintercept = 0.95,
    color = "red") +
  ylim(0, 1) +
  theme_bw()
The plot here has been constructed to have y-axis limits set to 0 and 1 for comparison with the previous plot.
In this class, we will use a uniform prior for the probability. An alternative is to use Jeffreys prior, which is \(Be(0.5,0.5)\). Run a simulation study to look at coverage for Jeffreys prior.
a <- b <- 0.5
sim_study <- expand.grid(n = 10^(0:3),
                         theta = seq(0, 1, by = 0.1)) |>
  group_by(n, theta) |>
  do(binomial_coverage_function_fixed(.))
ggplot(sim_study, aes(x = theta, y = coverage)) +
  geom_line() +
  facet_wrap(~n) +
  geom_hline(yintercept = 0.95, color = "red") +
  ylim(0, 1) +
  theme_bw()