5. The Normal Distributions
5.2 **The Normal Distribution as a Limit of Binomial Distributions
The results of the derivation given here may be used to understand the origin of the Normal Distribution as a limit of Binomial Distributions[1]. A mathematical “trick” using logarithmic differentiation will be used.
First, recall the definition of the Binomial Distribution[2] as
(5.2)
where is the probability of success,
is probability of failure and
(5.3)
is the binomial coefficient that counts the number of ways to select items from
items without caring about the order of selection. Here
is a discrete variable,
, with
.
The trick is to find a way to deal with the fact that (
is a discrete variable) for the Binomial Distribution and
(
is a continuous variable) for the Normal Distribution[3] In other words as we let
we need to come up with a way to let
shrink[4] so that a probability density limit (the Normal Distribution) is reached from a sequence of probability distributions (modified Binomial Distributions). So let
represent the Normal Distribution with mean
and variance
. We will show how
where each Binomial Distribution
also has mean
and variance
.
The heart of the trick is to notice[5] that
(5.4)
This is perfectly true for the density . The trick is to substitute the distribution
for the density
in the RHS of Equation (5.4) to get :
(5.5)
because . The trick is to now pretend that
is a continuous function defined at all
; we just don’t know what its values should be for non-integer
. With such a “continuation” of
we can write[6]
(..)
Equation (5.8) has no limit; it blows up as . We need to transform
in such a way to gain control on
(getting it to shrink as
) and to get something that converges. To do that we introduce
and a new variable
. With this transformation of variables, the chain rule gives
(5.9)
and the RHS of Equation (5.8) becomes, using
(..)
Using Equation (5.9), for the LHS, and Equation (5.14), for the RHS, Equation (5.8) becomes
(..)
where means terms that will go to zero as
, and we have used the relation
to get Equation (5.16}) and
to go from Equation (5.17) to Equation (5.18). Dividing both sides of Equation (5.19) by
leaves
(5.20)
Our transformation, with its , has given us the exact control we need to keep the limit from disappearing or blowing up. Integrating Equation (5.20) gives
(5.21)
where is the a constant of integration. Switching back to the
variable
(..)
To evaluate the constant of integration, , we impose
because we want
to be a probability distribution. So
(5.25)
so
(5.26)
and
(5.27)
which is the Normal Distribution that approximates Binomial Distributions with the same mean and variance as gets large.












You may be wondering why that transformation worked because it seems to have been pulled from the air. According to Lindsay & Margenau, it was Laplace who first used this transformation and derivation in 1812. What this transformation does is pull the Binomial Distribution
back to have a mean of zero (by subtracting
) which keeps
from running off to infinity and, more importantly, allows us to define a function
with
that has a constant variance of
that we can match to
when we transform back to
at each
, see Figure 5.1. Looking at it the other way around, the Normal Distribution[7]
with
is an approximation for Binomial Distribution
that “asymptotically” approaches
as
.
This is not the only way to form a probability density limit from a sequence of Binomial distributions. It is one that gives a good approximation of the Binomial Distribution when is fairly small if the term
in Equation (5.18) becomes small quickly. If
is very small, this does not happen and another limit of Binomial Distributions that leads to the Poisson Distribution is more appropriate. When
and
are close to 0.5 or more generally when
and
then the Normal approximation is a good one. Either way, the density limit is a mathematical idealization, a convenience really, that is based on a discrete probability distribution that just summarizes the result of counting outcomes. Counting gives the foundation for probability theory.
- The formula for the Binomial Distribution was apparently derived by Newton according to: Lindsay RB, Margenau. Foundations of Physics. Dover, New York, 1957 (originally published 1936). For that claim, Lindsay & Margenau quote: von Mises R. Probability, Statistics, and Truth. Macmillan, New York, 1939 (originally published 1928). The derivation of the Normal Distribution presented here largely follows that given in Lindsay & Margenau's book. ↵
- In class we denoted the Binomial distribution as
. Here we use
to avoid using too many P's and p's. ↵
- Remember that the Normal Distribution is technically a probability density but we slur the use of the word distribution between probability distribution (discrete
) and probability density (continuous
) like everyone else. ↵
for the Binomial Distribution. ↵
- Remember that
and use the chain rule to notice this. ↵
- You can probably imagine many ways to continue the Binomial Distribution from
to
. It doesn't matter which one you pick as long as the behaviour of your new function is not too crazy between the integers; that is,
should exist at all
. ↵
- Our symbols here are not mathematically clean; we should write something like
instead of
or
composed with
at
,
, instead of
. But to emphasize the intuition we use
. In clean symbols, the function
asymptotically approaches
where
. ↵