The Poisson distribution is a discrete probability
distribution used to describe the number of events occurring within a
given time interval, length, volume or area. In many practical situations we
are interested in how many times a certain event occurs in such an interval
or region, for example:
The number of phone calls received at a call centre in an hour,
The number of cases of a disease in a week,
The number of flaws on a length of cable,
The number of cases of a disease in different towns,
The number of defects per square yard, etc.
The Poisson distribution is based on four assumptions:
1. The probability of observing a single event over a small interval is approximately proportional to the size of that interval.
2. The probability of two events occurring in the same narrow interval is negligible.
3. The probability of an event within a certain interval does not change over different intervals.
4. The probability of an event in one interval is independent of the probability of an event in any other non-overlapping interval.
Under the above assumptions, let λ be the rate at which events
occur, t be the length of a time interval, and Y be the total number of events
that occur in that time interval. Then Y is called a Poisson random variable and
the probability distribution of Y is called the Poisson distribution.
Writing µ = λt for the expected number of events in the interval, the probability mass
function of Y is:
$$P(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
The mean and variance of the Poisson distribution are both equal to µ:
E(Y) = µ and Var(Y) = σ² = µ
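The equality of mean and variance can be checked numerically. The following is a minimal sketch, assuming the standard Poisson probability mass function P(Y = y) = e^(-µ) µ^y / y! with µ = rate × interval length; the function name `poisson_pmf` and the values of `rate` and `t` are hypothetical and chosen only for illustration.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson variable with mean mu (mu must be positive)."""
    # Work on the log scale so that large y does not overflow the factorial.
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

rate, t = 3.0, 2.0   # hypothetical: 3 events per hour, observed for 2 hours
mu = rate * t        # expected number of events in the interval

# The support is infinite; for mu = 6 the mass beyond y = 100 is negligible.
support = range(0, 100)
total = sum(poisson_pmf(y, mu) for y in support)
mean = sum(y * poisson_pmf(y, mu) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)

print(f"total={total:.6f} mean={mean:.6f} var={var:.6f}")
```

Up to floating-point and truncation error, the total probability should be 1 and both the mean and the variance should agree with `mu`.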
The Poisson distribution can be identified as the
limiting case of the binomial distribution. If the
binomial distribution Bin(n, p) meets the following conditions, then Bin(n, p) can be well approximated by the
Poisson distribution Poi(µ):
The number of trials (n) gets larger and the probability of success (p) gets smaller.
The two distributions have the same mean; i.e. µ = np.
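The limiting behaviour can be illustrated numerically. This is a rough sketch rather than a proof: it holds the mean np fixed, lets n grow, and reports the largest pointwise difference between the two probability mass functions. The helper names and the value of `mu` are assumptions made for this example.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Bin(n, p)."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, mu: float) -> float:
    """P(K = k) for K ~ Poi(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def max_abs_diff(n: int, mu: float, k_max: int = 20) -> float:
    """Largest pointwise gap between Bin(n, mu/n) and Poi(mu) for k = 0..k_max."""
    p = mu / n  # keep the binomial mean n*p equal to mu
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu))
               for k in range(k_max + 1))

mu = 2.0  # hypothetical common mean
for n in (10, 100, 1000, 10000):
    print(n, max_abs_diff(n, mu))
```

The printed gap should shrink as n increases, which is the sense in which Bin(n, p) approaches the Poisson distribution with the same mean.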
Negative binomial distribution
The negative binomial distribution is a discrete probability distribution that is
used with discrete random variables. It is also known as the Pascal distribution
or as Pólya's distribution.
Negative binomial random variables and distributions are based on an experiment satisfying the following conditions:
1. The experiment consists of a sequence of independent trials.
2. Each trial can result in either a success (S) or a failure (F).
3. The probability of success is constant from trial to trial, so P(S on trial i) = p for i = 1, 2, 3, …
4. The experiment continues (trials are performed) until a total of r successes have been observed, where r is a specified positive integer.
Under the above conditions, consider a sequence of Bernoulli trials with probability of success
p, where r is a fixed integer. Let X be the number of trials needed to get the rth
success. Then X is called a negative binomial random variable and the probability
distribution of X is called the negative binomial distribution with parameters r and
p. The probability mass function of X is:
$$P(X = x) = \binom{x-1}{r-1} p^{r} (1-p)^{x-r}, \qquad x = r, r+1, r+2, \ldots$$
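As a small sketch of the trials-based form, the function below evaluates the standard probability mass function P(X = x) = C(x-1, r-1) p^r (1-p)^(x-r) and checks that it sums to one. The name `nb_trials_pmf` and the parameter values are hypothetical.

```python
import math

def nb_trials_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): probability that the r-th success occurs on trial x."""
    if x < r:
        return 0.0  # at least r trials are needed for r successes
    # The last trial is a success; the other r-1 successes fall in the first x-1 trials.
    return math.comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

r, p = 3, 0.4  # hypothetical parameters

# Smallest possible value: all of the first r trials succeed, probability p**r.
print(nb_trials_pmf(r, r, p))

# Truncate the infinite support where the remaining tail is negligible.
total = sum(nb_trials_pmf(x, r, p) for x in range(r, 400))
mean_trials = sum(x * nb_trials_pmf(x, r, p) for x in range(r, 400))
print(total, mean_trials)
```

The total should be close to 1, and the mean number of trials should be close to r / p.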
Alternative form of the negative binomial distribution
Let Y be the number of failures before the rth success. Sometimes the
negative binomial distribution is defined in terms of the random variable Y, where Y = X − r. The
probability mass function of Y is:
$$P(Y = y) = \binom{y+r-1}{y} p^{r} (1-p)^{y}, \qquad y = 0, 1, 2, \ldots$$
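In this form (number of failures Y), the negative binomial distribution with parameters r and p has mean (µ) and variance (σ²):
$$\mu = \frac{r(1-p)}{p}, \qquad \sigma^2 = \frac{r(1-p)}{p^2}$$
For the number of trials X = Y + r, the mean is r/p and the variance is unchanged.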
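The moments of the failures-based form can be verified by direct summation. This sketch assumes the standard results mean = r(1-p)/p and variance = r(1-p)/p² for the number of failures Y before the rth success; the function name and parameter values are hypothetical.

```python
import math

def nb_failures_pmf(y: int, r: int, p: float) -> float:
    """P(Y = y): probability of y failures before the r-th success."""
    return math.comb(y + r - 1, y) * p**r * (1 - p) ** y

r, p = 3, 0.4        # hypothetical parameters
ys = range(0, 400)   # truncated support; the tail beyond 400 is negligible here

mean = sum(y * nb_failures_pmf(y, r, p) for y in ys)
var = sum((y - mean) ** 2 * nb_failures_pmf(y, r, p) for y in ys)

print("numerical:  ", mean, var)
print("closed form:", r * (1 - p) / p, r * (1 - p) / p**2)
```

Because the variance equals the mean divided by p, and p lies between 0 and 1, the variance always exceeds the mean.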
The Negative Binomial distribution as a Gamma–Poisson distribution
The mixture of a family of Poisson distributions with a Gamma distribution is one of
the most important applications of the negative binomial distribution. The
negative binomial distribution can then be viewed as a Poisson distribution whose
parameter (λ) is itself a random variable that follows a Gamma distribution.
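Let Y be the number of events that occur in a given time interval. The conditional
probability mass function of Y given the rate λ is the Poisson
distribution defined by
$$P(Y = y \mid \lambda) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
Suppose λ has a Gamma distribution with scale parameter θ
and shape parameter α. Then the probability density function of λ is given by
$$f(\lambda) = \frac{\lambda^{\alpha-1} e^{-\lambda/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}, \qquad \lambda > 0$$
The unconditional distribution of Y is obtained by integrating out λ:
$$P(Y = y) = \int_0^\infty P(Y = y \mid \lambda)\, f(\lambda)\, d\lambda = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha)\, y!} \left(\frac{1}{1+\theta}\right)^{\alpha} \left(\frac{\theta}{1+\theta}\right)^{y}, \qquad y = 0, 1, 2, \ldots$$
This is of the form of the negative binomial distribution. Y is called a negative binomial
random variable and is distributed as a negative binomial distribution with
parameters r = α and p = 1/(1 + θ).
The negative binomial distribution with parameters α and θ has mean (µ) and variance (σ²):
$$\mu = \alpha\theta, \qquad \sigma^2 = \alpha\theta(1+\theta) = \mu + \frac{\mu^2}{\alpha}$$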
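The mixture result can be checked numerically. The sketch below integrates the Poisson probability against a Gamma density with a simple midpoint rule and compares the result with the closed-form negative binomial probability with r equal to the Gamma shape and p = 1 / (1 + scale). The function names, the integration settings and the parameter values are assumptions made for this illustration, and the midpoint rule is only one of several ways to approximate the integral.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y | lam) for a Poisson variable with rate lam > 0."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def gamma_pdf(lam: float, shape: float, scale: float) -> float:
    """Gamma density in the shape/scale parameterisation, for lam > 0."""
    return math.exp((shape - 1) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def mixture_pmf(y: int, shape: float, scale: float,
                upper: float = 60.0, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of Poisson * Gamma over lam."""
    h = upper / steps
    return h * sum(
        poisson_pmf(y, (i + 0.5) * h) * gamma_pdf((i + 0.5) * h, shape, scale)
        for i in range(steps)
    )

def nb_pmf(y: int, r: float, p: float) -> float:
    """Negative binomial pmf for real-valued r, written with gamma functions."""
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(p) + y * math.log(1 - p))

shape, scale = 2.5, 1.5   # hypothetical Gamma parameters
p = 1 / (1 + scale)       # implied negative binomial success probability

for y in range(5):
    print(y, mixture_pmf(y, shape, scale), nb_pmf(y, shape, p))
```

The two columns should agree to several decimal places. Note that the shape parameter need not be an integer, which is why the closed form uses the gamma function instead of a binomial coefficient.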
The Poisson-Gamma mixture distribution was developed to account for the over-dispersion
that is commonly observed in real-life discrete or count data.
Because the Poisson distribution requires the mean and variance to be equal, it is
unsuitable for data whose variance is larger than the mean. For the negative binomial
distribution, the variance is always larger than the mean.
Therefore, the negative binomial distribution is appropriate in such settings.
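A quick way to look for over-dispersion in a sample is to compare the sample variance with the sample mean. The sketch below computes their ratio (often called the index of dispersion); the counts are invented purely for illustration and do not come from any real data set, and the function name is hypothetical.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; roughly 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

# Hypothetical counts, made up for this example only.
counts = [0, 1, 0, 3, 12, 0, 2, 7, 0, 1, 15, 0, 4, 0, 9]

d = dispersion_index(counts)
print(f"mean={mean(counts):.2f} variance={variance(counts):.2f} index={d:.2f}")
```

A ratio well above 1 suggests that a Poisson model would understate the variability and that a negative binomial model may be worth considering.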
and Negative Binomial Distributions”, online: http://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.5.pdf
Generalized Linear Models (GLM)
Generalized linear models are a class
of non-linear regression models that can be used in certain cases where linear
models are not appropriate. They are a powerful generalization of linear
regression to the more general exponential family of distributions. In generalized linear models,
the dependent variable is linearly related to the factors and covariates via a
specific link function. Further, the model allows the dependent variable to
have a non-normal distribution. Linear regression, ANOVA, logistic models, log-linear
models, Poisson regression and multinomial response models are the most common generalized
linear models.
A generalized linear model is specified
by three components: the random component, which is the response and its
associated probability distribution; the systematic component, which includes the
explanatory variables and the relationships among them; and the link component,
which provides the relationship between the systematic component and the random
component.
The random component consists of the response variable Y, with
independent observations having a distribution which belongs
to the exponential family. Examples of distributions belonging to the exponential
family: exponential, Poisson, binomial, gamma, normal, negative binomial (with r fixed), etc.
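The systematic component specifies the explanatory variables (X1, X2, …, Xk) as a
linear predictor (η) in the model. In a generalized linear model, this always
takes the form
$$\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
where β0, β1, …, βk are the model
parameters and X1, …, Xk are the explanatory variables.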
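The link component of a GLM is a link between the random component and the systematic
component. Suppose E(Y) = µ; then µ is linked to the linear predictor η by η = g(µ), where g(.)
is any monotonic differentiable function and is known as the link function. The
generalized linear model takes the form
$$g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$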
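To make the three components concrete, here is a minimal sketch of how a fitted mean is obtained from a linear predictor under a log link, as would be used in Poisson regression. It assumes g(µ) = log(µ), so µ = exp(η); the coefficient and covariate values are hypothetical and no model fitting is performed.

```python
import math

def linear_predictor(beta, x):
    """eta = beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1]."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def mean_under_log_link(beta, x):
    """Invert the log link g(mu) = log(mu) to recover mu = exp(eta)."""
    return math.exp(linear_predictor(beta, x))

beta = [0.5, 0.2, -0.1]   # hypothetical intercept and two slopes
x = [3.0, 1.0]            # hypothetical values of two explanatory variables

eta = linear_predictor(beta, x)      # 0.5 + 0.2*3.0 - 0.1*1.0
mu = mean_under_log_link(beta, x)    # exp(eta), always positive

print(f"eta={eta:.4f} mu={mu:.4f}")
```

The log link guarantees a positive mean for any value of the linear predictor, which is one reason it is commonly paired with count responses.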
The link function you choose will
depend on which exponential family distribution you are choosing for the dependent
variable. Here are some examples of link functions: the identity link function, used
with any distribution; the log link function, also used with any distribution; the logit
link function, used with the binomial distribution; the probit link function, appropriate
only for the binomial distribution; etc.
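The link functions named above can be written down together with their inverses. This is a small sketch using only the Python standard library; the dictionary name `LINKS` and the test value of `mu` are assumptions made for this example. The probit link is taken to be the inverse of the standard normal cumulative distribution function.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1

# Each entry maps a link name to the pair (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
}

mu = 0.3  # hypothetical mean; it lies in (0, 1) so that every link is defined
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:8s} eta={eta:+.4f} round-trip mu={g_inv(eta):.4f}")
```

Applying a link and then its inverse should return the original mean, which reflects the requirement that a link function be monotonic and therefore invertible.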
James K. Lindsey, "Applying Generalized Linear Models"