METHODOLOGY

1. Poisson Distribution

The Poisson distribution is a discrete probability distribution used to describe the number of events occurring within a given time interval, length, volume or area. In many practical situations we are interested in how many times a certain event occurs in such an interval or region.

For instance:

· The number of phone calls received at a call centre in an hour,
· The number of cases of a disease in a week,
· The number of flaws on a length of cable,
· The number of cases of a disease in different towns,
· The number of defects per square yard, etc.

The Poisson distribution is based on four assumptions [1]:

1. The probability of observing a single event over a small interval is approximately proportional to the size of that interval.
2. The probability of two events occurring in the same narrow interval is negligible.
3. The probability of an event within a certain interval does not change over different intervals.
4. The probability of an event in one interval is independent of the probability of an event in any other non-overlapping interval.

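Under the above assumptions, let λ be the rate at which events occur, t be the length of a time interval, and Y be the total number of events that occur in that time interval. Then Y is called a Poisson random variable and the probability distribution of Y is called the Poisson distribution.

Taking the interval to be of unit length (t = 1), the probability mass function of Y is:

$$P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$

For an interval of length t, λ is replaced by λt.

The mean and variance of the Poisson distribution are both equal to λ:

$$E(Y) = \lambda \quad \text{and} \quad \mathrm{Var}(Y) = \sigma^{2} = \lambda$$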
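As a quick numerical check that the mean and variance coincide, the sketch below evaluates the standard Poisson probability mass function P(Y = y) = e^{-λ} λ^y / y! over a truncated support. The function name `poisson_pmf`, the rate 3.5 and the truncation point 200 are illustrative assumptions, not values from the source.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson random variable with mean lam (lam > 0)."""
    # Work on the log scale to avoid overflow in lam**y and y!.
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


lam = 3.5                # illustrative rate, not taken from the text
support = range(0, 200)  # assumed truncation point; the tail beyond it is negligible here
probs = [poisson_pmf(y, lam) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"total probability = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Both the mean and the variance should agree with `lam` up to floating-point error, which is the equality the text states.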

The Poisson distribution can be obtained as a limiting case of the binomial distribution. The binomial distribution Bin(n, p) is well approximated by the Poisson distribution Poi(λ) when the following conditions are met:

· the number of trials (n) is large and the probability of success (p) is small,
· the two distributions have the same mean, i.e. λ = np.
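The limiting behaviour can be illustrated numerically. The sketch below holds the mean λ = np fixed while n grows, and reports the largest pointwise gap between the Bin(n, p) and Poi(λ) probabilities. The helper `max_abs_gap`, the value λ = 2 and the cut-off `k_max` are assumptions chosen for illustration.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for K ~ Poi(lam)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def max_abs_gap(n: int, lam: float, k_max: int = 30) -> float:
    """Largest |Bin(n, lam/n) - Poi(lam)| over k = 0..min(n, k_max)."""
    p = lam / n  # keeps the binomial mean n*p equal to lam
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, k_max) + 1)
    )


lam = 2.0  # illustrative mean, not taken from the text
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}, p = {lam / n:.4f}, max gap = {max_abs_gap(n, lam):.2e}")
```

The gap shrinks as n increases and p decreases, which is the sense in which the approximation improves.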

References

1. http://www.pmean.com/definitions/poisson.htm

2. Negative Binomial Distribution

The negative binomial distribution is a discrete probability distribution for count-valued random variables. It is also known as the Pascal distribution or as Polya's distribution.

The negative binomial random variable and its distribution are based on the following conditions [1]:

1. The experiment consists of a sequence of independent trials.
2. Each trial can result in either a success (S) or a failure (F).
3. The probability of success is constant from trial to trial, so P(S on trial i) = p for i = 1, 2, 3, …
4. The experiment continues (trials are performed) until a total of r successes have been observed, where r is a specified positive integer.

Under the above conditions, the experiment is a sequence of Bernoulli trials with probability of success p, and r is a fixed positive integer. Let X be the number of trials needed to obtain the rth success. Then X is called a negative binomial random variable, and the probability distribution of X is called the negative binomial distribution with parameters r and p.

The probability mass function of X is:

$$P(X = x) = \binom{x-1}{r-1}\, p^{r} (1-p)^{x-r}$$

where x = r, r+1, r+2, …

Alternative form of the negative binomial distribution

Let Y be the number of failures before the rth success. Sometimes the negative binomial distribution is defined in terms of the random variable Y, where Y = X − r.

The probability mass function of Y is:

$$P(Y = y) = \binom{y+r-1}{y}\, p^{r} (1-p)^{y}$$

where y = 0, 1, 2, …

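In this form, the negative binomial distribution with parameters r and p has mean (µ) and variance (σ²):

$$\mu = E(Y) = \frac{r(1-p)}{p} \quad \text{and} \quad \sigma^{2} = \mathrm{Var}(Y) = \frac{r(1-p)}{p^{2}}$$

The number of trials X = Y + r has mean r/p and the same variance.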
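The two forms of the negative binomial distribution can be compared directly. The sketch below assumes the standard probability mass functions for the number of trials X (support x = r, r+1, …) and the number of failures Y = X − r (support y = 0, 1, …), and checks the mean r(1 − p)/p and variance r(1 − p)/p² of Y numerically. The function names and the values r = 3, p = 0.4 are illustrative assumptions.

```python
import math


def nb_trials_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): the r-th success occurs on trial x (x = r, r+1, ...)."""
    if x < r:
        return 0.0
    return math.comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


def nb_failures_pmf(y: int, r: int, p: float) -> float:
    """P(Y = y): y failures occur before the r-th success (y = 0, 1, ...)."""
    if y < 0:
        return 0.0
    return math.comb(y + r - 1, y) * p**r * (1 - p) ** y


r, p = 3, 0.4      # illustrative parameters, not taken from the text
ys = range(0, 400)  # assumed truncation point; the remaining tail is negligible
probs = [nb_failures_pmf(y, r, p) for y in ys]

mean_y = sum(y * q for y, q in zip(ys, probs))
var_y = sum((y - mean_y) ** 2 * q for y, q in zip(ys, probs))

print(f"mean of Y = {mean_y:.6f}, expected r(1-p)/p   = {r * (1 - p) / p:.6f}")
print(f"var of Y  = {var_y:.6f}, expected r(1-p)/p^2 = {r * (1 - p) / p**2:.6f}")
```

Shifting the support by r converts one form into the other, so `nb_trials_pmf(y + r, r, p)` and `nb_failures_pmf(y, r, p)` give the same probability.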

3. The Negative Binomial distribution as a Gamma–Poisson distribution

One of the most important applications of the negative binomial distribution is as a mixture of a family of Poisson distributions with a gamma mixing distribution. The negative binomial distribution can then be viewed as a Poisson distribution whose parameter λ is itself a random variable following a gamma distribution.

Let Y be the number of events occurring in a given time interval. Given the rate λ, the conditional probability mass function of Y is the Poisson distribution defined by

$$P(Y = y \mid \lambda) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$

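Suppose λ has a gamma distribution with scale parameter β and shape parameter α. Then the probability density function of λ is given by

$$f(\lambda) = \frac{\lambda^{\alpha-1} e^{-\lambda/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad \lambda > 0$$

The unconditional distribution of Y is obtained by integrating λ out of the joint distribution:

$$P(Y = y) = \int_{0}^{\infty} P(Y = y \mid \lambda)\, f(\lambda)\, d\lambda = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha)\, y!} \left(\frac{1}{1+\beta}\right)^{\alpha} \left(\frac{\beta}{1+\beta}\right)^{y}, \qquad y = 0, 1, 2, \ldots$$

This has the form of a negative binomial distribution. Y is called a negative binomial random variable and follows the negative binomial distribution with parameters r = α and p = 1/(1+β).

The negative binomial distribution with parameters α and β has mean (µ) and variance (σ²):

$$\mu = \alpha\beta \quad \text{and} \quad \sigma^{2} = \alpha\beta(1+\beta) = \mu + \frac{\mu^{2}}{\alpha}$$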
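The mixture argument can be checked numerically. The sketch below assumes a gamma density with shape `shape` and scale `scale` for the Poisson rate, integrates the rate out with a simple midpoint rule, and compares the result with the negative binomial form in which r equals the shape and p = 1/(1 + scale). The parameter values, the integration limit and the step count are illustrative assumptions.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Conditional pmf P(Y = y | lam)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def gamma_pdf(lam: float, shape: float, scale: float) -> float:
    """Gamma density of the rate lam, with the given shape and scale."""
    return math.exp(
        (shape - 1) * math.log(lam) - lam / scale
        - math.lgamma(shape) - shape * math.log(scale)
    )


def mixture_pmf(y: int, shape: float, scale: float,
                upper: float = 60.0, steps: int = 60_000) -> float:
    """Midpoint-rule approximation of the integral of pmf * density over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += poisson_pmf(y, lam) * gamma_pdf(lam, shape, scale)
    return total * h


def nb_pmf(y: int, shape: float, scale: float) -> float:
    """Negative binomial pmf with r = shape and p = 1 / (1 + scale)."""
    p = 1.0 / (1.0 + scale)
    log_coeff = math.lgamma(y + shape) - math.lgamma(shape) - math.lgamma(y + 1)
    return math.exp(log_coeff + shape * math.log(p) + y * math.log(1 - p))


shape, scale = 2.5, 1.5  # illustrative values, not taken from the text
rows = [(y, mixture_pmf(y, shape, scale), nb_pmf(y, shape, scale)) for y in range(6)]
for y, mixed, closed in rows:
    print(f"y = {y}: integral = {mixed:.6f}, negative binomial = {closed:.6f}")
```

The upper limit of 60 is adequate only because the gamma tail is negligible there for these parameters; a larger scale would need a larger limit.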

This Poisson-gamma mixture distribution was developed to account for the over-dispersion that is commonly observed in real-life discrete or count data.

Because the Poisson distribution requires the mean and variance to be equal, it is unsuitable for data whose variance is larger than the mean. For the negative binomial distribution, the variance is always larger than the mean. Therefore, the negative binomial distribution is appropriate in such settings.
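Over-dispersion can be seen in a small simulation. The sketch below draws counts from a plain Poisson distribution and from a Poisson distribution whose rate is gamma distributed, then compares the variance-to-mean ratio of each sample. The sampler `sample_poisson` (Knuth's multiplication method), the seed, the sample size and the gamma parameters are all assumptions made for illustration.

```python
import math
import random
import statistics


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate by Knuth's multiplication method (suits small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion(xs: list) -> float:
    """Variance-to-mean ratio; close to 1 for Poisson data."""
    return statistics.pvariance(xs) / statistics.fmean(xs)


rng = random.Random(12345)  # fixed seed so the run is repeatable
shape, scale = 2.0, 1.5     # illustrative gamma parameters; mean rate = shape * scale
n = 20_000

# Fixed rate: ordinary Poisson counts.
poisson_draws = [sample_poisson(shape * scale, rng) for _ in range(n)]
# Random rate: each count uses its own gamma-distributed rate.
mixed_draws = [sample_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]

print(f"Poisson       variance/mean = {dispersion(poisson_draws):.3f}")
print(f"Gamma-Poisson variance/mean = {dispersion(mixed_draws):.3f}")
```

Both samples have roughly the same mean, but only the mixed sample shows a variance well above it, which is the situation in which a negative binomial model is preferred.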

Reference

1. "Hypergeometric and Negative Binomial Distributions", online: http://www.stat.purdue.edu/~zhanghao/STAT511/handout/Stt511%20Sec3.5.pdf

4. Generalized Linear Models (GLM)

Generalized linear models (GLMs) are a class of regression models that can be used in certain cases where ordinary linear models are not appropriate. They are a powerful generalization of linear regression to response distributions from the exponential family. In a generalized linear model, the mean of the dependent variable is related to a linear combination of the factors and covariates via a specified link function. Further, the model allows the dependent variable to have a non-normal distribution. Linear regression, ANOVA, logistic models, log-linear models, Poisson regression and multinomial response models are the most common generalized linear models.

A generalized linear model is specified by three components: the random component, which is the response and its associated probability distribution; the systematic component, which comprises the explanatory variables and the relationships among them; and the link component, which provides the relationship between the systematic component and the random component.

i. The random component

The random component consists of independent observations Y1, Y2, …, Yn whose distribution belongs to the exponential family. Examples of distributions belonging to the exponential family are the exponential, Poisson, binomial, gamma, normal and negative binomial (with known dispersion parameter) distributions.

ii. The systematic component

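The systematic component specifies the explanatory variables (X1, X2, …, Xk), which enter the model through the linear predictor (η). In a generalized linear model, this is always done via

$$\eta = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \cdots + \beta_{k}X_{k}$$

where β0, β1, …, βk are the model parameters and Xi are the explanatory variables.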

iii. The link component

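This component of a GLM is the link between the random component and the systematic component. Suppose µ = E(Y); then µ is linked to the linear predictor η by η = g(µ), where g(·) is any monotonic differentiable function, known as the link function. The generalized linear model takes the form

$$g(\mu) = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \cdots + \beta_{k}X_{k}$$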
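To make the three components concrete, the sketch below computes a linear predictor from a set of coefficients and maps it to a mean through the inverse of a log link, as would be done for a Poisson response. The coefficient values, the covariate values and the function names are hypothetical and serve only to show the arithmetic.

```python
import math


def linear_predictor(coefs: list, x: list) -> float:
    """eta = b0 + b1*x1 + ... + bk*xk, where coefs[0] is the intercept b0."""
    if len(coefs) != len(x) + 1:
        raise ValueError("expected one coefficient per covariate plus an intercept")
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


def poisson_mean(coefs: list, x: list) -> float:
    """Invert the log link g(mu) = log(mu): mu = exp(eta)."""
    return math.exp(linear_predictor(coefs, x))


coefs = [0.5, 0.2, -0.1]  # hypothetical b0, b1, b2
x = [3.0, 1.0]            # hypothetical covariate values x1, x2

eta = linear_predictor(coefs, x)  # 0.5 + 0.2*3.0 - 0.1*1.0
mu = poisson_mean(coefs, x)

print(f"eta = {eta:.4f}, mu = exp(eta) = {mu:.4f}, log(mu) = {math.log(mu):.4f}")
```

Because the log link is inverted with the exponential function, the resulting mean is always positive, which matches the support of a count response.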

The choice of link function depends on which exponential family distribution is chosen for the dependent variable. Some examples of link functions are: the identity link, which can be used with any distribution; the log link, which can be used with any distribution whose mean is positive; the logit link, which is used with the binomial distribution; and the probit link, which is appropriate only for the binomial distribution.
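The link functions named above can be written down as pairs of a link g and its inverse. The sketch below is one possible arrangement, assuming the usual definitions (logit as the log-odds, probit as the standard normal quantile function); the dictionary name `LINKS` and the test value 0.3 are illustrative choices.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry maps a link name to (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
}

mu = 0.3  # a mean inside (0, 1), so that every link above is defined
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:>8}: eta = g(mu) = {eta:+.4f}, g_inverse(eta) = {g_inv(eta):.4f}")
```

Each link is monotonic and differentiable on its domain, so applying the inverse to η recovers the original mean.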

Reference

1. James K. Lindsey, "Applying Generalized Linear Models"