Many of the standard probability distributions have functions in R to calculate:
The letter after each indicates the letter the function will start with for all distributions.
If the support of the random variable is a finite or countably infinite number of values, then the random variable is discrete. Discrete random variables have a probability mass function (pmf). This pmf gives the probability that a random variable will take on each value in its support. The cumulative distribution function (cdf) provides the probability the random variable is less than or equal to a particular value. The quantile function is the inverse of the cumulative distribution function, i.e. you provide a probability and the quantile function returns the value of the random variable such that the cdf will return that probability. (This is not very useful for discrete distributions.)
Whenever you repeat an experiment and you assume
Then, if you record the number of success out of the total, you have a binomial distribution.
For example, consider an experiment with probability of success of 0.7 and 13 trials, i.e. \(X\sim Bin(13,0.7)\).
We can use the pmf to calculate the probability of a particular
outcome of the experiment. For example, what is the probability of
seeing 6 successes? We can use the dbinom
function.
n <- 13
p <- 0.7
dbinom(6, size = n, prob = p)
## [1] 0.0441524
The entire pmf is
x <- 0:n
plot(x, dbinom(x, size = n, prob = p),
main = "Probability mass function for Bin(13,0.7)")
If we want to calculate the probability of observing an outcome less than or equal to a particular value, we can use the cumulative distribution function. For example, what is the probability of observing 9 or fewer successes?
pbinom(9, size = n, prob = p)
## [1] 0.5793944
Here is the entire cdf.
plot(x, pbinom(x, size = n, prob = p),
type="s",
main = "Cumulative distribution function for Bin(13,0.7)")
For a discrete random variable, the cdf is a step function since the function jumps whenever it comes across a value in the support for the random variable.
Finally, we can draw random values from this binomial distribution
using the rbinom
function.
draws <- rbinom(100, size = n, prob = p) # draw 100
brks <- (0:(n+1)) - 0.5
hist(draws, breaks = brks,
main = "Random draws from Bin(13,0.7)")
Rather than having the number of draws, we often want the percentage of draws. For reference, we can add the true probabilities using the pmf (shown in red).
hist(draws, breaks = brks, probability = TRUE)
points(x, dbinom(x, size = n, prob = p), col="red")
Plot the pmf and cdf function for the binomial distribution with probability of success 0.25 and 39 trials, i.e. \(X\sim Bin(39,0.25)\). Then sample 999 random binomials with 39 trials and probability of success 0.25 and plot them on a histogram with the true probability mass function.
n <- 39
p <- 0.25
x <- 0:n
plot(x, dbinom(x, size = n, prob = p),
main = 'Probability mass function for Bin(39,0.25)')
plot(x, pbinom(x, size = n, prob = p), type = 's',
main = 'Cumulative distribution function for Bin(39, 0.25)')
draws <- rbinom(999, size = n, prob = p)
brks <- (0:(n+1)) - 0.5
hist(draws, breaks = brks,
probability = TRUE,
main = "Random draws from Bin(39,0.25)")
points(x, dbinom(x, size = n, prob = p), col="red")
While the binomial distribution has an upper limit (\(n\)), we sometimes run an experiment and are counting successes without any technical upper limit. These experiments are usually run for some amount of time or over some amount of space or both. For example,
When there is no technical upper limit (even if the probability of more is extremely small), then a Poisson distribution can be used. The Poisson distribution has a single parameter, the rate that describes, on average, how many of the things are expected to be observed. We write \(X\sim Po(\lambda)\) where \(\lambda\) is the rate parameter.
Suppose we record the number of network failures in a day and on average we see 2 failures per day. The number of network failures in a day has no upper limit, so we’ll use the Poisson distribution.
Here is the pmf:
rate <- 2
x <- 0:10 # with no upper limit we need to decide on an upper limit
plot(x, dpois(x, lambda = rate),
main = "Probability mass function for Po(2)")
Here is the cdf:
plot(x, ppois(x, lambda = rate),
type="s",
main = "Cumulative distribution function for Po(2)")
And random draws
draws <- rpois(100, lambda = rate)
hist(draws,
breaks = (0:(max(draws)+1)) - 0.5,
probability = TRUE,
main = "Random draws from Po(2)")
points(x, dpois(x, lambda = rate), col="red")
Plot the pmf and cdf function for a Poisson distribution with rate 23, i.e. \(X\sim Po(23)\). Then sample 999 Poisson random variables with rate 23 and plot them on a histogram with the true probability mass function.
rate <- 23
x <- 5:40 # with no upper limit we need to decide on an upper limit
plot(x, dpois(x, lambda = rate),
main = 'Probability mass function for Po(23)')
plot(x, ppois(x, lambda = rate), type="s",
main = 'Cumulative distribution function for Po(23)')
draws <- rpois(999, lambda = rate)
hist(draws,
breaks = (min(draws):(max(draws)+1)) - 0.5,
probability = TRUE,
main = "Random draws from Po(23)")
points(x, dpois(x, lambda = rate), col="red")
Suppose a manufacturing line produces temperature sensors and has a sensor failure proportion equal to 1.5%. In a given day, the plant tests 70 sensors.
If we assume the sensors are independent (given the failure proportion), then \(X\sim Bin(70,0.015)\).
Answer the following questions:
The second question is tricky, because we need \(P(X\ge 3) = 1-P(X<3) = 1-P(X\le 2)\).
n <- 70
p <- 0.015
dbinom( 0, size = n, prob = p) # Probability of no failures
## [1] 0.3471652
1 - pbinom(2, size = n, prob = p) # Probability of 3 or more sensor failures
## [1] 0.08833028
If we approximate the binomial distribution with a Poisson distribution,
then \(X\stackrel{\cdot}{\sim} Po(70\times
0.015)\).
Using this Poisson distribution, answer the following questions.
The second question is tricky, because we need \(P(X\ge 3) = 1-P(X<3) = 1-P(X\le 2)\).
rate <- n*p
dpois( 0, lambda = rate) # Approximate probability of no failures
## [1] 0.3499377
1 - ppois(2, lambda = rate) # Approximate probability of 3 or more sensor failures
## [1] 0.08972443
How good is the Poisson approximation to the binomial? Can you plot the
pmf for both?