4. Probability and the Binomial Distributions

# 4.2 Binomial Distribution

Given a success/failure situation (or yes/no, black/white, or any other two-outcome, dichotomous situation) and a probability of success $p$ (and so a probability of failure $q = 1 - p$), what is the probability of achieving $x$ successes in $n$ trials? In symbols^{[1]} what is $P(x \text{ successes} \mid n \text{ trials})$? Or with simpler notation, what is $P(x)$? The answer is:

$$P(x) = {}_nC_x \, p^x q^{n-x} = \frac{n!}{x!\,(n-x)!} \, p^x q^{n-x} \tag{4.2}$$
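As a rough illustration, Equation (4.2) can be written as a short Python function. The name `binomial_pmf` and its argument names are assumptions made for this sketch and are not part of the text; `math.comb` from the standard library supplies the number of combinations.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials, following Equation (4.2)."""
    q = 1.0 - p  # probability of failure
    return comb(n, x) * p**x * q**(n - x)


# Sanity check: the probabilities of all possible outcomes x = 0..n add up to 1.
total = sum(binomial_pmf(x, 5, 0.3) for x in range(6))
print(total)
```

The choice of 5 trials with a success probability of 0.3 in the check is arbitrary; any valid pair of values should give a total of 1 up to floating-point rounding.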

**Proof of the formula**

Use the boxes we used in defining the fundamental counting rule to represent each trial.

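Consider $n = 1$.

The probability that a success occurs is the definition of $p$. So

$$P(1) = p = {}_1C_1 \, p^1 q^0.$$

Consider $n = 2$. What is $P(0)$? This is all failures: FF.

The probability of each failure is $q$ so the probability of getting FF is $q \cdot q = q^2$. So

$$P(0) = q^2 = {}_2C_0 \, p^0 q^2.$$

(Note that ${}_2C_0 = 1$ *by definition*. There is exactly one way to draw no things from a collection of 2.)

What is $P(2)$? Each success has a probability of $p$ ($p$ for the first one, $p$ for the second one). So

$$P(2) = p^2 = {}_2C_2 \, p^2 q^0.$$

For $P(1)$ we have the two sequences SF and FS, each with probability $pq$:

$$P(1) = pq + qp = 2pq = {}_2C_1 \, p^1 q^1.$$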

We can continue this way for $n = 3, 4, \ldots$ but this is clearly tedious. The method of “mathematical induction” is the formal way to proceed but let’s try a more intuitive approach.

For $x$ successes in $n$ trials, consider our $n$ boxes. Any given sequence with $x$ successes will have $n - x$ failures, and so that given sequence will have a probability of $p^x q^{n-x}$. But how many specific sequences with $x$ successes are there? Think of it this way. Of the $n$ boxes, how many ways are there to write $x$ S’s in the boxes? There are $n$ possibilities ($n$ boxes are available) to write the first S, $n - 1$ ways after that to write the second S, etc., for a total of $n(n-1)\cdots(n-x+1) = n!/(n-x)!$ ways. But we don’t care which order we wrote the S’s into the boxes, so divide by $x!$. In other words there are ${}_nC_x = \frac{n!}{x!\,(n-x)!}$ *specific* sequences with $x$ successes. Putting it all together:

$$P(x) = {}_nC_x \, p^x q^{n-x}$$

▢
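The counting argument in the proof can be checked by brute force for a small case. The sketch below lists every sequence of S's and F's for $n = 4$ trials with a success probability of 0.3 (both values chosen arbitrarily for illustration), counts how many sequences have each number of successes, and adds up their probabilities. The variable names are assumptions of this sketch.

```python
from itertools import product
from math import comb

n, p = 4, 0.3  # illustrative values, not taken from the text
q = 1 - p

counts = {x: 0 for x in range(n + 1)}   # number of sequences with x successes
probs = {x: 0.0 for x in range(n + 1)}  # total probability of those sequences

# Every way of filling the n boxes with S (success) or F (failure).
for seq in product("SF", repeat=n):
    x = seq.count("S")
    counts[x] += 1
    probs[x] += p**x * q**(n - x)  # probability of this one specific sequence

for x in range(n + 1):
    # The enumerated count should match nCx, and the summed probability
    # should match the formula nCx * p^x * q^(n-x).
    print(x, counts[x], comb(n, x), round(probs[x], 4))
```

If the argument holds, the second and third columns agree on every row.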

**Example 4.6** : In a bucket of 100 toys with 20 dinosaurs and 80 bugs, consider drawing a dinosaur a success. So $p = 20/100 = 0.2$ and $q = 1 - p = 0.8$. Let us make an approximation and assume that $p$ does not change with each draw.^{[2]}

Say we want to know $P(3 \text{ successes} \mid 10 \text{ trials})$. In other words, what is the probability that if I take 10 toys out of the bucket, exactly 3 of them are dinosaurs? Using Equation (4.2) we find

$$P(3) = {}_{10}C_3 \, (0.2)^3 (0.8)^7 = 120 \times 0.008 \times 0.2097 \approx 0.201$$

The actual process of doing this calculation is somewhat tedious and therefore error prone. So in a test, for example, you will want to use the **Binomial Distribution Table** included in the Appendix of this text. In the **Binomial Distribution Table**, you simply find the appropriate $n$ and then $x$ in the column on the left and then look under the appropriate $p$ column to find $P(x)$ for the given $x$.

▢
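The calculation in Example 4.6 can be reproduced in a few lines of Python. The sketch also evaluates the exact probability for drawing without replacement, which the footnote to this example describes as the hypergeometric case, so that the size of the approximation can be seen. Variable names are assumptions of this sketch.

```python
from math import comb

n, x, p = 10, 3, 0.2  # 10 draws, 3 dinosaurs wanted, P(dinosaur) = 20/100

# Binomial approximation: p is treated as constant from draw to draw.
binomial = comb(n, x) * p**x * (1 - p) ** (n - x)

# Exact count without replacement: choose 3 of the 20 dinosaurs and
# 7 of the 80 bugs, out of all ways to choose 10 of the 100 toys.
hypergeometric = comb(20, 3) * comb(80, 7) / comb(100, 10)

print(round(binomial, 4))        # 0.2013
print(round(hypergeometric, 4))  # 0.2092
```

The two answers differ by less than 0.01, which suggests the constant-$p$ assumption is a reasonable approximation for a bucket this size.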

The complete binomial distribution specifies the probabilities of all numbers of successes from 0 to $n$, and can be plotted as a histogram. Note that there is a different binomial distribution for each $n$ and $p$. Let’s plot the binomial distribution for getting $x$ successes (dinosaurs) in forming a sample of $n = 10$ toys with $p = 0.2$. The **Binomial Distribution Table** contains the relative frequency table for the histogram that represents the binomial distribution shown in Figure 4.1.
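For readers without the table at hand, a sketch like the following generates the whole distribution for a sample of 10 toys with a success probability of 0.2 and prints a crude text histogram. It is only an illustration: the list name `dist` and the bar scaling are arbitrary choices made here, and the output is not a reproduction of Figure 4.1.

```python
from math import comb

n, p = 10, 0.2
q = 1 - p

# One probability for each possible number of successes, 0 through n.
dist = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

for x, prob in enumerate(dist):
    bar = "#" * round(prob * 100)  # one '#' per percentage point
    print(f"{x:2d}  {prob:.3f}  {bar}")
```

The tallest bar falls at 2 successes, and the bars for 7 or more successes are too small to show at this scale.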

The binomial distribution is an example of a *discrete probability distribution*. It is a histogram of relative frequencies obtained by counting possibilities in sample space.^{[3]}

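The mean and variance of any discrete distribution are given by

$$\mu = \sum x \, P(x), \qquad \sigma^2 = \sum (x - \mu)^2 \, P(x)$$

These two formulae come from the grouped data expressions $\mu = \frac{\sum f \cdot x}{N}$ and $\sigma^2 = \frac{\sum f \cdot (x - \mu)^2}{N}$, by substituting $P(x) = f/N$. If we substitute Equation (4.2) for $P(x)$ in these general equations we get

$$\mu = np, \qquad \sigma^2 = npq$$

which are the mean and variance for a binomial distribution with parameters $n$ and $p$. The mean is the *expected value*.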

**Example 4.7** : For the bucket of toys example:

$$\mu = np = 10 \times 0.2 = 2, \qquad \sigma^2 = npq = 10 \times 0.2 \times 0.8 = 1.6$$

So given any random sample of 10 toys we *expect* that 2 of them will be dinosaurs.

▢
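The shortcut formulas for the binomial mean and variance can be checked numerically against the general sums over the distribution. This is a minimal sketch for the bucket of toys (10 draws, success probability 0.2); the variable names are assumptions made here.

```python
from math import comb

n, p = 10, 0.2
q = 1 - p
dist = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# General definitions for a discrete distribution.
mean = sum(x * prob for x, prob in enumerate(dist))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(dist))

# Binomial shortcuts: n*p and n*p*q.
print(round(mean, 6), n * p)
print(round(variance, 6), round(n * p * q, 6))
```

Both lines should show matching pairs, 2 for the mean and 1.6 for the variance, up to floating-point rounding.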

**4.2.1 Practical Binomial Distribution Examples**

The examples given here illustrate the *sampling theory* for forming samples from a dichotomous (with success/fail items; items of interest and no interest) population. In this situation we know exactly what is in the population and ask questions about what kind of samples can be formed and what is their probability. The sampling theory is completely described by the binomial distribution. Later, we will have a sampling theory based on the Central Limit Theorem which will lead us to the normal distribution.

In practically solving these kinds of problems keep in mind that you need to identify: $p$, $n$ and $x$.

**Example 4.8** : It was reported that 5% of Americans are afraid of being alone in a house at night. In a random sample of 20 Americans, what are the probabilities that the sample contains

- exactly 5 afraid people?
- at most 3 afraid people?
- at least 3 afraid people?

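**Solution** : First identify $p = 0.05$, $n = 20$, and the $x$ as specific to each question:

- For this case, $x = 5$, so from the **Binomial Distribution Table** get $P(5) = 0.002$.
- For this case $x = 0$, 1, 2 and 3 and we have to add up the probabilities:

  $$P(x \le 3) = P(0) + P(1) + P(2) + P(3)$$

  From the **Binomial Distribution Table**: $P(0) = 0.358$, $P(1) = 0.377$, $P(2) = 0.189$ and $P(3) = 0.060$.

  So $P(x \le 3) = 0.358 + 0.377 + 0.189 + 0.060 = 0.984$.
- For this case $x = 3$, 4, 5, 6, 7, $\ldots$, 20:

  $$P(x \ge 3) = P(3) + P(4) + P(5) + \cdots + P(20)$$

  Since the probabilities of high $x$ are too small to appear in the **Binomial Distribution Table** (and there would be many terms to consider if they weren’t) we should use the following trick:

  $$P(x \ge 3) = 1 - P(x \le 2) = 1 - [P(0) + P(1) + P(2)] = 1 - (0.358 + 0.377 + 0.189) = 0.076$$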

▢
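The three answers in Example 4.8 can also be computed directly rather than read from the table. The sketch below assumes the example's values of a 5% success probability and a sample of 20; the helper name `binomial_pmf` is an assumption of this sketch. Small differences from table-based answers are expected because the table is rounded to three decimals.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials (Equation 4.2)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 20, 0.05

exactly_5 = binomial_pmf(5, n, p)
at_most_3 = sum(binomial_pmf(x, n, p) for x in range(0, 4))       # x = 0, 1, 2, 3
# Complement trick: "at least 3" is everything except x = 0, 1, 2.
at_least_3 = 1 - sum(binomial_pmf(x, n, p) for x in range(0, 3))

print(f"exactly 5:  {exactly_5:.4f}")
print(f"at most 3:  {at_most_3:.4f}")
print(f"at least 3: {at_least_3:.4f}")
```

Using the complement for the last case avoids adding up eighteen terms, most of which are negligibly small.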

1. Here the | is read as "given".
2. By assuming that $p$ does not change, we will be led to the binomial distribution. If we more accurately assume that $p$ changes with each draw we will be led to the hypergeometric distribution. For fun, let's consider the case where $p$ changes with each draw. It's just another application of the fundamental counting rule. To begin, there are ${}_{100}C_{10}$ ways of drawing 10 toys from the bucket without caring if each one is a dinosaur or a bug. This is the size of the *sample space*; it is how many ways there are to make a sample of size 10 from the bucket of 100 choices; it is the denominator in Equation (4.1). There are ${}_{100}C_{10} = 17{,}310{,}309{,}456{,}440$ samples of 10 in the bucket. If we want 3 dinosaurs in our sample, as in the example in the text, then of the 20 dinosaurs in the bucket there are ${}_{20}C_3 = 1140$ ways to get 3 dinosaurs and ${}_{80}C_7 = 3{,}176{,}716{,}400$ ways to get 7 bugs from the 80 in the bucket. So there are ${}_{20}C_3 \times {}_{80}C_7$ ways to draw 3 dinosaurs and 7 bugs from the bucket. This number is the numerator in Equation (4.1). And so

   $$P(3 \text{ dinosaurs}) = \frac{{}_{20}C_3 \times {}_{80}C_7}{{}_{100}C_{10}} \approx 0.209$$
3. Sample space is the set of all possible samples.