4. Probability and the Binomial Distributions

4.2 Binomial Distribution

Given a success/failure situation (or yes/no, black/white, or any other two-outcome, dichotomous situation) with a probability of success P(S)=p (and therefore a probability of failure P(F) = q = 1-p), what is the probability of achieving x successes in n trials? In symbols[1], what is P(x successes | n trials)? Or, with simpler notation, what is P(x \mid n)? The answer is :

(4.2)   \begin{equation*} P(x \mid n) = \left( \begin{array}{c} n \\ x \end{array} \right) p^{x} q^{n-x}. \end{equation*}
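As a minimal sketch, Equation (4.2) translates almost directly into Python. The function name binomial_pmf is our own illustrative choice, and math.comb (Python 3.8 or later) supplies the binomial coefficient. Summing the values over x = 0, ..., n is a quick check that they form a probability distribution.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x | n) from Equation (4.2): C(n, x) * p**x * q**(n - x)."""
    if not 0 <= x <= n:
        return 0.0          # impossible number of successes
    q = 1.0 - p             # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# The probabilities for x = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(x, 10, 0.2) for x in range(11))
print(round(total, 10))     # 1.0
```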

Proof of the P(x \mid n) formula

Use the boxes we used in defining the fundamental counting rule to represent each trial.

Consider n = 1.

The probability that a success occurs is the definition of p. So

    \[P(1 \mid 1) = p = \left( \begin{array}{c} 1 \\ 1 \end{array} \right) p^{1} q^{0}.\]

Consider n = 2. What is P(0 \mid 2)? This is the case where both trials are failures, the sequence FF.

The probability of each failure is q, and the two trials are independent, so the probability of getting FF is q \cdot q = q^{2}. So

    \[ P(0 \mid 2) = q^{2} = \left( \begin{array}{c} 2 \\ 0 \end{array} \right) p^{0} q^{2}. \]

(Note that \left( \begin{array}{c} 2 \\ 0 \end{array} \right) = 1 by definition. There is exactly one way to draw no things from a collection of 2.)

What is P(1 \mid 2)? There are two sequences with exactly one success, SF and FS, and each has probability p \cdot q (p \cdot q for SF, q \cdot p for FS). So

    \[ P(1 \mid 2) = 2 \cdot p \cdot q = \left( \begin{array}{c} 2 \\ 1 \end{array} \right) p^{1} q^{1}. \]

For x = 2 we have

    \[ P(2 \mid 2) = \left( \begin{array}{c} 2 \\ 2 \end{array} \right) p^{2} q^{0}. \]

We could continue this way for n = 3, 4, \ldots, but this is clearly tedious. Mathematical induction is the formal way to proceed, but let’s try a more intuitive approach.

For x successes in n trials, consider our n boxes. Any given sequence with x successes will have n-x failures, so that sequence has probability p^{x}q^{n-x}. But how many specific sequences with x successes are there? Think of it this way: of the n boxes, how many ways are there to write x S’s into them? There are n possibilities (n boxes are available) for the first S, n-1 for the second S, and so on down to n-x+1 for the x-th S, giving n(n-1)\cdots(n-x+1) = \frac{n!}{(n-x)!} ordered ways. But we don’t care in which order we wrote the S’s into the boxes, and each choice of x boxes has been counted x! times, so we divide by x!. In other words, there are \left( \begin{array}{c} n\\ x \end{array} \right) = \frac{n!}{x!(n-x)!} specific sequences with x successes. Putting it all together :

    \[ P(x \mid n) = \left( \begin{array}{c} n \\ x \end{array} \right) p^{x} q^{n-x}. \]
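The counting argument can be checked by brute force for a small n. This sketch (the function name brute_force_prob and the choice n = 5, p = 0.3 are arbitrary, for illustration only) lists every way of filling the n boxes with S or F, keeps the sequences with exactly x S’s, multiplies the box probabilities along each sequence, and compares the count with C(n, x) and the total with Equation (4.2).

```python
from itertools import product
from math import comb, isclose

def brute_force_prob(x, n, p):
    """Count S/F sequences of length n with exactly x S's and add up their probabilities."""
    q = 1 - p
    count, total = 0, 0.0
    for seq in product("SF", repeat=n):      # every way to fill the n boxes
        if seq.count("S") != x:
            continue
        count += 1
        prob_seq = 1.0
        for box in seq:                      # independent trials: multiply box by box
            prob_seq *= p if box == "S" else q
        total += prob_seq
    return count, total

n, p = 5, 0.3
for x in range(n + 1):
    count, prob = brute_force_prob(x, n, p)
    assert count == comb(n, x)                                   # C(n, x) sequences
    assert isclose(prob, comb(n, x) * p**x * (1 - p)**(n - x))   # matches Equation (4.2)
print("counting argument confirmed for n =", n, "and p =", p)
```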

Example 4.6 : In a bucket of 100 toys with 20 dinosaurs and 80 bugs, consider drawing a dinosaur to be a success. So P(S)=p=0.2 and P(F)=q=1-p=0.8. Let us make an approximation and assume that p does not change with each draw.[2]

Say we want to know P(3 successes \mid 10 trials). In other words, if I take 10 toys out of the bucket, what is the probability that exactly 3 of them are dinosaurs? Using Equation (4.2) we find

    \[ P(3 \mid 10) = \left( \begin{array}{c} 10 \\ 3 \end{array} \right) 0.2^{3} 0.8^{7} = 0.201. \]

The actual process of doing this calculation is somewhat tedious and therefore error-prone. So in a test, for example, you will want to use the Binomial Distribution Table in the Appendix of this text. In that table, you simply find the appropriate n and then x in the column on the left, and then look under the appropriate p column to find P(x \mid n).
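If a computer is available, the Example 4.6 value can also be computed directly instead of looked up. The sketch below simply evaluates Equation (4.2) with n = 10, x = 3 and p = 0.2 using Python’s math.comb; the variable names are illustrative.

```python
from math import comb

n, x, p = 10, 3, 0.2          # 10 toys drawn, 3 dinosaurs wanted, P(dinosaur) = 0.2
q = 1 - p                     # P(bug)
prob = comb(n, x) * p**x * q**(n - x)
print(comb(n, x))             # 120 ways to place 3 dinosaurs among 10 draws
print(round(prob, 3))         # 0.201
```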

The complete binomial distribution specifies the probabilities for every number of successes x from 0 to n, and can be plotted as a histogram. Note that there is a different binomial distribution for each combination of n and p. Let’s plot the binomial distribution for getting x successes (dinosaurs) when forming a sample of n=10 toys with p=0.2. The Binomial Distribution Table contains the relative frequency table for the histogram that represents this binomial distribution, shown in Figure 4.1.

Figure 4.1 : The binomial distribution for the example of forming samples of n=10 toys with x representing the number of dinosaurs in the sample and p = 0.2 being the probability of selecting a dinosaur in forming the sample. Note that the probability of x = 8, 9 or 10 is not zero, just less than 0.001.
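The relative frequency table behind Figure 4.1 can be regenerated with a short sketch like the one below. It rounds each P(x | 10) to three decimals, as a printed table would, and draws a crude text histogram with one # per 0.01 of probability. The helper name binomial_table is hypothetical, and its rounding may not match the Appendix table digit for digit.

```python
from math import comb

def binomial_table(n, p, digits=3):
    """Relative-frequency table {x: P(x | n)} for x = 0..n, rounded like a printed table."""
    q = 1 - p
    return {x: round(comb(n, x) * p**x * q**(n - x), digits) for x in range(n + 1)}

table = binomial_table(10, 0.2)
for x, prob in table.items():
    # one '#' per 0.01 of probability; x = 8, 9, 10 round to 0.000 and get no bar
    print(f"{x:2d}  {prob:.3f}  {'#' * round(prob * 100)}")
```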

The binomial distribution is an example of a discrete probability distribution. It is a histogram of relative frequencies obtained by counting possibilities in the sample space.[3]

The mean and variance of any discrete distribution are given by

    \[ \mu = \sum_{x} x \cdot P(x) \]

    \[ \sigma^{2} = \sum_{x}(x-\mu)^2 \cdot P(x) = \left [ \sum_{x} x^2 \cdot P(x) \right] - \mu^2 \]
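The two formulas above can be sketched for any discrete distribution stored as a mapping from x to P(x). The frequency data below are made up purely for illustration. The sketch converts frequencies f(x) into relative frequencies P(x) = f(x)/n (n being the total count) and checks that the P(x) formulas, the frequency-weighted grouped-data formulas, and the shortcut form of the variance all agree. The function name discrete_mean_var is our own.

```python
def discrete_mean_var(dist):
    """Mean and variance of a discrete distribution given as {x: P(x)}."""
    mu = sum(x * px for x, px in dist.items())
    var = sum((x - mu) ** 2 * px for x, px in dist.items())
    return mu, var

# Hypothetical grouped data: value x occurs f(x) times.
freq = {0: 3, 1: 5, 2: 2}
n_total = sum(freq.values())                       # total number of data points
rel = {x: f / n_total for x, f in freq.items()}    # P(x) = f(x) / n

mu, var = discrete_mean_var(rel)

# Grouped-data versions using the raw frequencies f(x).
mu_grouped = sum(f * x for x, f in freq.items()) / n_total
var_grouped = sum(f * (x - mu_grouped) ** 2 for x, f in freq.items()) / n_total

# Shortcut form: [sum of x^2 P(x)] - mu^2
var_shortcut = sum(x**2 * px for x, px in rel.items()) - mu**2
print(mu, var, var_shortcut)
```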

These two formulae come from the grouped data expressions \mu = \sum f(x) x/n and \sigma^{2} = \sum f(x)(x - \mu)^2/n (where f(x) is the frequency of x and n is the total number of data points, not the number of trials) by substituting P(x) = f(x)/n. If we substitute Equation (4.2) for P(x) in these general equations, we get

    \[ \mu = n p \]

    \[ \sigma^2 = npq \]

which are the mean and variance for a binomial distribution with parameters n and p. The mean is the expected value.
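As a numerical sanity check (not a proof) of \mu = np and \sigma^2 = npq, the sketch below builds the distribution from Equation (4.2), applies the general sums for the mean and the shortcut variance, and compares the results with the closed forms for a few (n, p) pairs chosen arbitrarily. The function name binomial_mean_var is hypothetical.

```python
from math import comb, isclose

def binomial_mean_var(n, p):
    """Mean and variance computed directly from the sums over Equation (4.2)."""
    q = 1 - p
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mu = sum(x * px for x, px in enumerate(pmf))
    var = sum(x**2 * px for x, px in enumerate(pmf)) - mu**2   # shortcut form
    return mu, var

for n, p in [(10, 0.2), (20, 0.05), (7, 0.5)]:
    mu, var = binomial_mean_var(n, p)
    assert isclose(mu, n * p) and isclose(var, n * p * (1 - p))
print("mu = np and sigma^2 = npq agree with the direct sums")
```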

Example 4.7 : For the bucket of toys example:

    \[ \mu = n \cdot p = 10 \cdot 0.20 = 2 \]

So in a random sample of 10 toys we expect, on average, 2 of them to be dinosaurs.
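To see what "expect" means here, one can simulate many samples of 10 toys. This sketch assumes, as Example 4.6 does, that every draw is independent with p = 0.2 (equivalent to drawing with replacement); the seed, the number of repetitions and the helper name count_dinosaurs are arbitrary choices. The average count should land close to \mu = 2, and the variance of the counts close to \sigma^2 = npq = 1.6.

```python
import random

def count_dinosaurs(rng, n=10, p=0.2):
    """One sample of n toys, each independently a dinosaur with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(42)                         # fixed seed so the run is repeatable
samples = [count_dinosaurs(rng) for _ in range(100_000)]

avg = sum(samples) / len(samples)               # should be near mu = n*p = 2
var = sum((s - avg) ** 2 for s in samples) / len(samples)   # near npq = 1.6
print(avg, var)
```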

4.2.1 Practical Binomial Distribution Examples

The examples given here illustrate the sampling theory for forming samples from a dichotomous population, one whose items are either successes or failures (items of interest or of no interest). In this situation we know exactly what is in the population, and we ask what kinds of samples can be formed and what their probabilities are. This sampling theory is completely described by the binomial distribution. Later, we will have a sampling theory based on the Central Limit Theorem, which will lead us to the normal distribution.

When solving these kinds of problems in practice, keep in mind that you need to identify n, p and x.

Example 4.8 : It was reported that 5\% of Americans are afraid of being alone in a house at night. In a random sample of 20 Americans, what are the probabilities that the sample contains

  1. exactly 5 afraid people?
  2. at most 3 afraid people?
  3. at least 3 afraid people?

Solution : First identify n=20 and p=0.05; the value(s) of x are specific to each question :

  1. For this case x = 5, so from the Binomial Distribution Table we get P(x=5)=0.002.
  2. For this case x = 0, 1, 2 or 3, and we have to add up their probabilities.
    From the Binomial Distribution Table:
    P(x=0) = 0.358
    P(x=1) = 0.377
    P(x=2) = 0.189
    P(x=3) = 0.060
    So P(x \textrm{ is at most 3}) = 0.358 + 0.377 + 0.189 + 0.060 = 0.984
  3. x = 3, 4, 5, 6, 7, \hdots, 20
    From the Binomial Distribution Table:
    P(x=3) = 0.060
    P(x=4) = 0.013
    P(x=5) = 0.002
    P(x=6 \textrm{ or more}) = \textrm{ approximately zero}

Since the probabilities for large x are too small to appear in the Binomial Distribution Table (and there would be many terms to add even if they did), we should use the following trick, which relies on all the probabilities adding up to 1 :

    \begin{eqnarray*} P(x=3 \mbox{ or more}) &=& 1 - P(x \mbox{ is less than } 3) \\ & = & 1 - [P(0)+P(1)+P(2)]\\ & = & 1-[0.358+0.377+0.189]=0.076 \end{eqnarray*}
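The three answers of Example 4.8 can be checked with the sketch below, which evaluates Equation (4.2) for n = 20 and p = 0.05 without rounding and also confirms that the complement trick gives the same result as adding all 18 terms for x = 3, ..., 20. Variable names are illustrative.

```python
from math import comb, isclose

n, p = 20, 0.05   # Example 4.8: sample size and P(afraid)
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]  # pmf[x] = P(x | 20)

exactly_5 = pmf[5]
at_most_3 = sum(pmf[0:4])            # x = 0, 1, 2, 3
at_least_3_direct = sum(pmf[3:])     # x = 3, 4, ..., 20 (18 terms)
at_least_3 = 1 - sum(pmf[0:3])       # complement: 1 - [P(0) + P(1) + P(2)]

assert isclose(at_least_3, at_least_3_direct)
print(round(exactly_5, 3), round(at_most_3, 3), round(at_least_3, 3))  # 0.002 0.984 0.075
```

Computed without rounding, part 3 comes to about 0.07548, which rounds to 0.075 rather than 0.076; the small difference arises because the hand calculation adds table entries that were already rounded to three decimals.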


  1. Here the | is read as "given".
  2. By assuming that p does not change, we are led to the binomial distribution. If we more accurately assume that P(S) changes with each draw, we are led to the hypergeometric distribution. For fun, let's consider the case where P(S) changes with each draw. It's just another application of the fundamental counting rule. To begin, there are \left( \begin{array}{c} 100 \\ 10 \end{array}\right) = 17.3 \times 10^{12} ways of drawing 10 toys from the bucket without caring whether each one is a dinosaur or a bug. This is the size of the sample space: it is how many ways there are to make a sample of size 10 from the bucket of 100 toys, and it is n(S) in Equation (4.1). In other words, there are 17.3 \times 10^{12} possible samples of 10 toys. If we want 3 dinosaurs in our sample, as in the example in the text, then of the 20 dinosaurs in the bucket there are \left( \begin{array}{c} 20 \\ 3 \end{array} \right) = 1140 ways to get 3 dinosaurs, and of the 80 bugs there are \left( \begin{array}{c} 80 \\ 7 \end{array} \right) = 3.18 \times 10^{9} ways to get 7 bugs. So there are \left( \begin{array}{c} 20 \\ 3 \end{array} \right) \cdot \left( \begin{array}{c} 80 \\ 7 \end{array} \right) = 3.62 \times 10^{12} ways to draw 3 dinosaurs and 7 bugs from the bucket. This number is n(E) in Equation (4.1). And so

        \[ P(3 \mbox{ dinosaurs} \mid 10 \mbox{ toys}) = \frac{\left( \begin{array}{c} 20 \\ 3 \end{array} \right) \left( \begin{array}{c} 80 \\ 7 \end{array} \right)}{\left( \begin{array}{c} 100 \\ 10 \end{array} \right)} = \frac{3.62 \times 10^{12}}{17.3 \times 10^{12}} = 0.209 \]

    Note how close this is to the answer from the binomial distribution of 0.201.
  3. Sample space is the set of all possible samples.
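The comparison in footnote 2 can be reproduced with a short sketch. The hypergeometric probability below is the footnote’s counting ratio C(20,3) C(80,7) / C(100,10), written as a general function; the function names and the parameter names N (population size) and K (number of dinosaurs in the population) are our own labels.

```python
from math import comb

def hypergeometric_pmf(k, N, K, n):
    """P(k successes) drawing n items without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)   # n(E) / n(S)

def binomial_pmf(k, n, p):
    """Equation (4.2): p held fixed for every draw."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

exact = hypergeometric_pmf(3, N=100, K=20, n=10)   # without replacement
approx = binomial_pmf(3, 10, 0.2)                  # p fixed at 20/100
print(comb(100, 10), comb(20, 3), comb(80, 7))     # 17310309456440 1140 3176716400
print(round(exact, 3), round(approx, 3))           # 0.209 0.201
```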
