Many of the standard probability distributions have functions in R to calculate:

The letter after each indicates the letter the function will start with for all distributions.

Discrete distributions

If the support of the random variable is a finite or countably infinite number of values, then the random variable is discrete. Discrete random variables have a probability mass function (pmf). This pmf gives the probability that a random variable will take on each value in its support. The cumulative distribution function (cdf) provides the probability the random variable is less than or equal to a particular value. The quantile function is the inverse of the cumulative distribution function, i.e. you provide a probability and the quantile function returns the value of the random variable such that the cdf will return that probability. (This is not very useful for discrete distributions.)

Binomial distribution

Whenever you repeat an experiment and you assume

  1. the probability of success is the same each time and
  2. each trial of the experiment is independent of the rest (as long as you know the probability of success).

Then, if you record the number of success out of the total, you have a binomial distribution.

For example, consider an experiment with probability of success of 0.7 and 13 trials, i.e. \(X\sim Bin(13,0.7)\).

We can use the pmf to calculate the probability of a particular outcome of the experiment. For example, what is the probability of seeing 6 successes? We can use the dbinom function.

n <- 13
p <- 0.7
dbinom(6, size = n, prob = p)
## [1] 0.0441524

The entire pmf is

x <- 0:n
plot(x, dbinom(x, size = n, prob = p), 
     main = "Probability mass function for Bin(13,0.7)")

If we want to calculate the probability of observing an outcome less than or equal to a particular value, we can use the cumulative distribution function. For example, what is the probability of observing 9 or fewer successes?

pbinom(9, size = n, prob = p)
## [1] 0.5793944

Here is the entire cdf.

plot(x, pbinom(x, size = n, prob = p), 
     type="s", 
     main = "Cumulative distribution function for Bin(13,0.7)")

For a discrete random variable, the cdf is a step function since the function jumps whenever it comes across a value in the support for the random variable.

Finally, we can draw random values from this binomial distribution using the rbinom function.

draws <- rbinom(100, size = n, prob = p) # draw 100
brks  <- (0:(n+1)) - 0.5
hist(draws, breaks = brks, 
     main = "Random draws from Bin(13,0.7)") 

Rather than having the number of draws, we often want the percentage of draws. For reference, we can add the true probabilities using the pmf (shown in red).

hist(draws, breaks = brks, probability = TRUE)
points(x, dbinom(x, size = n, prob = p), col="red")

Activity

Plot the pmf and cdf function for the binomial distribution with probability of success 0.25 and 39 trials, i.e. \(X\sim Bin(39,0.25)\). Then sample 999 random binomials with 39 trials and probability of success 0.25 and plot them on a histogram with the true probability mass function.

Click for solution
n <- 39
p <- 0.25
x <- 0:n

plot(x, dbinom(x, size = n, prob = p),                     
     main = 'Probability mass function for Bin(39,0.25)')

plot(x, pbinom(x, size = n, prob = p), type = 's', 
     main = 'Cumulative distribution function for Bin(39, 0.25)')

draws <- rbinom(999, size = n, prob = p)

brks  <- (0:(n+1)) - 0.5
hist(draws, breaks = brks,
     probability = TRUE, 
     main = "Random draws from Bin(39,0.25)")

points(x, dbinom(x, size = n, prob = p), col="red")






Poisson distribution

While the binomial distribution has an upper limit (\(n\)), we sometimes run an experiment and are counting successes without any technical upper limit. These experiments are usually run for some amount of time or over some amount of space or both. For example,

  • the number of photos observed by a detector in a minute
  • the number of times an a/c unit comes on in an hour
  • the number of buildings in a square mile
  • the number of transistors on a circuit board

When there is no technical upper limit (even if the probability of more is extremely small), then a Poisson distribution can be used. The Poisson distribution has a single parameter, the rate that describes, on average, how many of the things are expected to be observed. We write \(X\sim Po(\lambda)\) where \(\lambda\) is the rate parameter.

Suppose we record the number of network failures in a day and on average we see 2 failures per day. The number of network failures in a day has no upper limit, so we’ll use the Poisson distribution.

Here is the pmf:

rate <- 2
x <- 0:10 # with no upper limit we need to decide on an upper limit

plot(x, dpois(x, lambda = rate), 
     main = "Probability mass function for Po(2)") 

Here is the cdf:

plot(x, ppois(x, lambda = rate), 
     type="s", 
     main = "Cumulative distribution function for Po(2)")

And random draws

draws <- rpois(100, lambda = rate)

hist(draws, 
     breaks = (0:(max(draws)+1)) - 0.5, 
     probability = TRUE, 
     main = "Random draws from Po(2)")

points(x, dpois(x, lambda = rate), col="red")

Activity

Plot the pmf and cdf function for a Poisson distribution with rate 23, i.e. \(X\sim Po(23)\). Then sample 999 Poisson random variables with rate 23 and plot them on a histogram with the true probability mass function.

Click for solution
rate <- 23
x <- 5:40 # with no upper limit we need to decide on an upper limit

plot(x, dpois(x, lambda = rate), 
     main = 'Probability mass function for Po(23)')

plot(x, ppois(x, lambda = rate), type="s", 
     main = 'Cumulative distribution function for Po(23)')

draws <- rpois(999, lambda = rate) 

hist(draws, 
     breaks = (min(draws):(max(draws)+1)) - 0.5, 
     probability = TRUE, 
     main = "Random draws from Po(23)")

points(x, dpois(x, lambda = rate), col="red")






Application

Suppose a manufacturing line produces temperature sensors and has a sensor failure proportion equal to 1.5%. In a given day, the plant tests 70 sensors.

Binomial

If we assume the sensors are independent (given the failure proportion), then \(X\sim Bin(70,0.015)\).

Answer the following questions:

  • What is the probability no sensors fail in one day?
  • What is the probability 3 or more sensors fail in one day?
Click for solution

The second question is tricky, because we need \(P(X\ge 3) = 1-P(X<3) = 1-P(X\le 2)\).

n <- 70
p <- 0.015

dbinom(    0, size = n, prob = p) # Probability of no failures
## [1] 0.3471652
1 - pbinom(2, size = n, prob = p) # Probability of 3 or more sensor failures
## [1] 0.08833028

Poisson



If we approximate the binomial distribution with a Poisson distribution, then \(X\stackrel{\cdot}{\sim} Po(70\times 0.015)\).

Using this Poisson distribution, answer the following questions.

  • What is the probability no sensors fail in one day?
  • What is the probability 3 or more sensors fail in one day?
Click for solution

The second question is tricky, because we need \(P(X\ge 3) = 1-P(X<3) = 1-P(X\le 2)\).

rate <- n*p

dpois(    0, lambda = rate) # Approximate probability of no failures
## [1] 0.3499377
1 - ppois(2, lambda = rate) # Approximate probability of 3 or more sensor failures
## [1] 0.08972443



How good is the Poisson approximation to the binomial? Can you plot the pmf for both?