Difference between revisions of "Forest ErrAna StatDist"

From New IAC Wiki
Revision as of 05:23, 21 February 2014

Parent Distribution

Let <math>x_i</math> represent our ith attempt to measure the quantity x.

Due to the random errors present in any experiment we should not expect <math>x_i = x</math>.

If we neglect systematic errors, then we should expect <math>x_i</math> to, on average, follow some probability distribution around the correct value x.

This probability distribution can be referred to as the "parent population".


Average and Variance

Average

The word "average" is used to describe a property of a "parent" probability distribution or a set of observations/measurements made in an experiment which gives an indication of a likely outcome of an experiment.

The symbol <math>\mu</math> is usually used to represent the "mean" of a known probability (parent) distribution (parent mean), while the "average" of a set of observations/measurements is denoted as <math>\bar{x}</math> and is commonly referred to as the "sample" average or "sample mean".



Definition of the mean

:<math>\mu \equiv \lim_{N \rightarrow \infty} \frac{\sum x_i}{N}</math>


Here the average of a parent distribution is defined in terms of an infinite sum of observations (<math>x_i</math>) of an observable x divided by the number of observations.

<math>\bar{x}</math> is a calculation of the mean using a finite number of observations

:<math>\bar{x} \equiv \frac{\sum x_i}{N}</math>


This definition uses the assumption that the result of an experiment, measuring a sample average <math>\bar{x}</math>, asymptotically approaches the "true" average of the parent distribution <math>\mu</math>.

Variance

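The word "variance" is used to describe a property of a probability distribution or a set of observations/measurements made in an experiment which gives an indication of how much an observation will deviate from an average value.

A deviation (<math>d_i</math>) of any measurement (<math>x_i</math>) from a parent distribution with a mean <math>\mu</math> can be defined as

:<math>d_i \equiv x_i - \mu</math>

The deviations should average to ZERO for an infinite number of observations by definition of the mean.

Definition of the average

:<math>\mu \equiv \lim_{N \rightarrow \infty} \frac{\sum x_i}{N}</math>
:<math>\lim_{N \rightarrow \infty} \frac{\sum (x_i - \mu)}{N}</math>
:<math>= \left( \lim_{N \rightarrow \infty} \frac{\sum x_i}{N} \right) - \mu</math>
:<math>= \left( \lim_{N \rightarrow \infty} \frac{\sum x_i}{N} \right) - \lim_{N \rightarrow \infty} \frac{\sum x_i}{N} = 0</math>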


But the AVERAGE DEVIATION (<math>\bar{d}</math>) is given by an average of the magnitude of the deviations

:<math>\bar{d} = \lim_{N \rightarrow \infty} \frac{\sum \left| x_i - \mu \right|}{N}</math> = a measure of the dispersion of the expected observations about the mean

Taking the absolute value, though, is cumbersome when performing a statistical analysis, so one may instead express this dispersion in terms of the variance.

A typical variable used to denote the variance is <math>\sigma^2</math>, which is defined as

:<math>\sigma^2 = \lim_{N \rightarrow \infty} \left[ \frac{\sum (x_i - \mu)^2}{N} \right]</math>


Standard Deviation

The standard deviation is defined as the square root of the variance

:S.D. = <math>\sqrt{\sigma^2}</math>


The mean should be thought of as a parameter which characterizes the observations we are making in an experiment. In general the mean specifies the probability distribution that is representative of the observable we are trying to measure through experimentation.


The variance characterizes the uncertainty associated with our experimental attempts to determine the "true" value. Although the mean and true value may not be equal, their difference should be less than the uncertainty given by the governing probability distribution.

Another Expression for Variance

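Using the definition of variance (omitting the limit as <math>N \rightarrow \infty</math>)

Evaluating the definition of variance
:<math>\sigma^2 \equiv \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum \left( x_i^2 - 2 x_i \mu + \mu^2 \right)}{N} = \frac{\sum x_i^2}{N} - 2\mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N}</math>
:<math>= \frac{\sum x_i^2}{N} - 2\mu^2 + \mu^2 = \frac{\sum x_i^2}{N} - \mu^2</math>


:<math>\frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2}{N} - \mu^2</math>

You can recast the above in terms of the expectation value, where

:<math>E[x] \equiv \sum x_i P_x(x_i)</math>

:<math>\sigma^2 = E[(x - \mu)^2] = \sum_{x=0}^{n} (x_i - \mu)^2 P(x_i)</math>
:<math>= E[x^2] - \left( E[x] \right)^2 = \sum_{x=0}^{n} x_i^2 P(x_i) - \left( \sum_{x=0}^{n} x_i P(x_i) \right)^2</math>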
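As a quick numerical check of the identity above, the sketch below evaluates both sides for a small made-up data set. The function name `variance_two_ways` and the sample values are illustrative assumptions, not part of the original notes; exact fractions are used so the two results can be compared without rounding.

```python
from fractions import Fraction


def variance_two_ways(values):
    """Return (mean squared deviation, mean square minus squared mean)."""
    n = len(values)
    mu = Fraction(sum(values), n)
    # Left-hand side: sum of (x_i - mu)^2 divided by N
    direct = sum((Fraction(v) - mu) ** 2 for v in values) / n
    # Right-hand side: sum of x_i^2 divided by N, minus mu^2
    shortcut = Fraction(sum(v * v for v in values), n) - mu ** 2
    return direct, shortcut


direct, shortcut = variance_two_ways([2, 4, 4, 4, 5, 5, 7, 9])
print(direct, shortcut)
```

Both numbers agree for any list of values, which is the content of the identity.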

Average for an unknown probability distribution (parent population)

If the "Parent Population" is not known, and you are just given a list of numbers with no indication of the probability distribution that they were drawn from, then the average and variance may be calculated as shown below.

Arithmetic Mean and variance

If N observations are made in an experiment, then the arithmetic mean of those observations is defined as

:<math>\bar{x} = \frac{\sum_{i=1}^{i=N} x_i}{N}</math>


The "unbiased" variance of the above sample is defined as

:<math>s^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N-1}</math>

If you were told that the average is <math>\bar{x}</math>, then you can calculate the "true" variance of the above sample as

:<math>\sigma^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N}</math> = (RMS Error)<math>^2</math>, where RMS Error = Root Mean Squared Error

Note
:RMS = Root Mean Square = <math>\sqrt{\frac{\sum_i^N x_i^2}{N}}</math>
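The two variance definitions above can be tried with Python's standard `statistics` module, where `variance` divides by N-1 and `pvariance` divides by N. The data values below are invented for illustration only.

```python
import statistics

# Hypothetical repeated measurements of the same quantity
data = [9.8, 10.1, 10.0, 9.9, 10.2]

x_bar = statistics.mean(data)        # arithmetic mean
s2 = statistics.variance(data)       # "unbiased" variance, divides by N - 1
sigma2 = statistics.pvariance(data)  # divides by N

print(x_bar, s2, sigma2, s2 / sigma2)
```

The ratio of the two results is N/(N-1), so the difference matters most for small samples.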

Statistical Variance decreases with N

The repetition of an experiment can decrease the STATISTICAL error of the experiment

Consider the following:

The average value of the mean of a sample of n observations drawn from the parent population is the same as the average value of each observation. (The average of the averages is the same as one of the averages)

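:<math>\bar{x} = \frac{\sum x_i}{N}</math> = sample mean
:<math>\overline{\left( \bar{x} \right)} = \frac{\sum \bar{x}_i}{N} = \frac{1}{N} N \bar{x}_i = \bar{x}</math> if all means are the same

This is the reason why the sample mean is a measure of the population average (<math>\bar{x} \approx \mu</math>).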

Now consider the variance of the average of the averages (this is not the variance of the individual measurements but the variance of their means)

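:<math>\sigma^2_{\bar{x}} = \frac{\sum \left( \bar{x}_i - \overline{\left( \bar{x} \right)} \right)^2}{N} = \frac{\sum \bar{x}_i^2}{N} - \left( \overline{\left( \bar{x} \right)} \right)^2</math>
:<math>= \frac{\sum \bar{x}_i^2}{N} - \left( \bar{x} \right)^2</math>
:<math>= \frac{\sum \left( \frac{\sum x_i}{N} \right)^2}{N} - \left( \bar{x} \right)^2</math>
:<math>= \frac{1}{N^2} \frac{\sum \left( \sum x_i \right)^2}{N} - \left( \bar{x} \right)^2</math>
:<math>= \frac{1}{N^2} \frac{\sum \left( \sum x_i^2 + \sum_{i \ne j} x_i x_j \right)}{N} - \left( \bar{x} \right)^2</math>
:<math>= \frac{1}{N^2} \left[ \frac{\sum \left( \sum x_i^2 \right)}{N} + \frac{\sum \left( \sum_{i \ne j} x_i x_j \right)}{N} \right] - \left( \bar{x} \right)^2</math>


If the measurements are all independent, then
:<math>\frac{\sum_{i \ne j} x_i x_j}{N} = \frac{\sum x_i}{N} \frac{\sum x_j}{N}</math> : if <math>x_i</math> is independent of <math>x_j</math> (<math>i \ne j</math>)
:<math>= \left( \frac{\sum x_i}{N} \right)^2 = \bar{x}^2</math>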
example:

(x_1x_2 + x_1x_3 + x_2x_1+x_2x_3+x_3x_1+x_3x_2+ ...) = (x_1+x_2+x_3)
The above part of the proof needs work
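:<math>\sigma^2_{\bar{x}} = \frac{1}{N^2} \left[ \frac{\sum \left( \sum x_i^2 \right)}{N} + \sum_{i \ne j} \bar{x}^2 \right] - \left( \bar{x} \right)^2</math>

I use the expression <math>\sigma^2 = E[x^2] - \left( E[x] \right)^2</math> again, except for <math>x_i</math> and not <math>\bar{x}</math>, and turn it around so

:<math>\frac{\sum x_i^2}{N} = \sigma^2 + \left( \frac{\sum x_i}{N} \right)^2</math>

Now I have

:<math>\sigma^2_{\bar{x}} = \frac{1}{N^2} \left[ \sum \left( \sigma^2 + \left( \frac{\sum x_i}{N} \right)^2 \right) + \sum_{i \ne j} \bar{x}^2 \right] - \left( \bar{x} \right)^2</math>
:<math>= \frac{1}{N^2} \left[ N \sigma^2 + N \left( \frac{\sum x_i}{N} \right)^2 + \sum_{i \ne j} \bar{x}^2 \right] - \left( \bar{x} \right)^2</math>
:<math>= \frac{1}{N^2} \left[ N \sigma^2 + N \left( \frac{\sum x_i}{N} \right)^2 + N(N-1) \bar{x}^2 \right] - \left( \bar{x} \right)^2</math> : the number of cross terms is N(N-1)
:<math>= \frac{1}{N^2} \left[ N \sigma^2 + N \left( \frac{\sum x_i}{N} \right)^2 + \left( N^2 - N \right) \left( \frac{\sum x_i}{N} \right)^2 \right] - \left( \bar{x} \right)^2</math>
:<math>= \left[ \frac{\sigma^2}{N} + \left( \frac{\sum x_i}{N} \right)^2 \right] - \left( \bar{x} \right)^2</math>
:<math>= \left[ \frac{\sigma^2}{N} + \left( \bar{x} \right)^2 \right] - \left( \bar{x} \right)^2</math>
:<math>= \frac{\sigma^2}{N}</math>


The above is the essence of counting statistics.

It says that the variance of the mean falls as <math>\frac{1}{N}</math>, so the STATISTICAL error in an experiment (the standard deviation of the mean) decreases as <math>\frac{1}{\sqrt{N}}</math>.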
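The result that the variance of the mean is the parent variance divided by N can be checked with a small simulation. This is only a sketch: the uniform parent distribution on [0, 1] (variance 1/12), the number of trials, the seed, and the function name `variance_of_sample_means` are all choices made here for illustration.

```python
import random
import statistics


def variance_of_sample_means(n, trials=20000, seed=1):
    """Variance of the means of `trials` samples, each of n uniform draws."""
    rng = random.Random(seed)
    means = [sum(rng.uniform(0.0, 1.0) for _ in range(n)) / n
             for _ in range(trials)]
    return statistics.pvariance(means)


parent_variance = 1.0 / 12.0  # variance of a uniform distribution on [0, 1]
for n in (4, 16):
    print(n, variance_of_sample_means(n), parent_variance / n)
```

Quadrupling N should reduce the variance of the mean by about a factor of four, and therefore the statistical error by about a factor of two.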

Biased and Unbiased variance

Where does this idea of an unbiased variance come from?


Using the same procedure as the previous section let's look at the average variance of the variances.

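A sample variance of n measurements of <math>x_i</math> is

:<math>\sigma_n^2 = \frac{\sum (x_i - \bar{x})^2}{n} = E[x^2] - \left( E[x] \right)^2 = \frac{\sum x_i^2}{n} - \left( \bar{x} \right)^2</math>


To determine the "true" variance consider taking the average of several sample variances (this is the same argument used above which led to <math>\overline{\left( \bar{x} \right)} = \bar{x}</math>)

:<math>\frac{\sum_j \left[ \sigma_n^2 \right]_j}{N} = \frac{\sum_j \left[ \frac{\sum_i x_i^2}{n} - \left( \bar{x} \right)^2 \right]_j}{N}</math>
:<math>= \frac{1}{n} \sum_i \left( \frac{\sum_j x_j^2}{N} \right)_i - \frac{\sum_j \left( \bar{x} \right)^2}{N}</math>
:<math>= \frac{1}{n} \sum_i \left( \frac{\sum_j x_j^2}{N} \right)_i - \left[ \left( \frac{\sum_j \bar{x}}{N} \right)^2 + \sigma_{\bar{x}}^2 \right]</math> : as shown previously <math>E[\bar{x}^2] = \left( E[\bar{x}] \right)^2 + \sigma_{\bar{x}}^2</math>
:<math>= \frac{1}{n} \sum_i \left( \left[ \left( \frac{\sum_j x_j}{N} \right)^2 + \sigma^2 \right] \right)_i - \left[ \left( \frac{\sum_j x_j}{N} \right)^2 + \frac{\sigma^2}{n} \right]</math> : also shown previously <math>\overline{\left( \bar{x} \right)} = \bar{x}</math>, the universe average is the same as the sample average
:<math>= \frac{1}{n} \left( n \left( \frac{\sum_j x_j}{N} \right)^2 + n \sigma^2 \right) - \left[ \left( \frac{\sum_j x_j}{N} \right)^2 + \frac{\sigma^2}{n} \right]</math>
:<math>= \sigma^2 - \frac{\sigma^2}{n}</math>
:<math>= \frac{n-1}{n} \sigma^2</math>
:<math>\Rightarrow \sigma^2 = \frac{n}{n-1} \frac{\sum \sigma_i^2}{N}</math>


Here

:<math>\sigma^2</math> = the sample variance
:<math>\frac{\sum \sigma_i^2}{N}</math> = an average of all possible sample variances which should be equivalent to the "true" population variance.
:<math>\frac{\sum \sigma_i^2}{N} \approx \frac{\sum (x_i - \bar{x})^2}{n}</math> : if all the variances are the same this would be equivalent
:<math>\sigma^2 = \frac{n}{n-1} \frac{\sum (x_i - \bar{x})^2}{n}</math>
:<math>= \frac{\sum (x_i - \bar{x})^2}{n-1}</math> = unbiased sample variance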
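The factor (n-1)/n can be verified exactly for a small case by averaging the divide-by-n sample variance over every possible sample. The sketch below assumes a fair six-sided die as the parent population (variance 35/12) and enumerates all samples of size n; the function name `mean_biased_variance` is a hypothetical helper introduced here.

```python
from fractions import Fraction
from itertools import product


def mean_biased_variance(outcomes, n):
    """Average the divide-by-n sample variance over all equally likely samples."""
    total = Fraction(0)
    count = 0
    for sample in product(outcomes, repeat=n):
        x_bar = Fraction(sum(sample), n)
        total += sum((x - x_bar) ** 2 for x in sample) / n
        count += 1
    return total / count


parent_variance = Fraction(35, 12)  # variance of one roll of a fair die
for n in (2, 3):
    average = mean_biased_variance(range(1, 7), n)
    print(n, average, Fraction(n - 1, n) * parent_variance)
```

Multiplying the averaged value by n/(n-1) recovers the parent variance, which is the motivation for dividing by n-1.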

Probability Distributions

Mean(Expectation value) and variance

Mean of Discrete Probability Distribution

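In the case that you know the probability distribution you can calculate the mean (<math>\mu</math>), or expectation value E(x), and the standard deviation as follows.

For a Discrete probability distribution

:<math>\mu = E[x] = \lim_{N \rightarrow \infty} \frac{\sum_{i=1}^n x_i N P(x_i)}{N} = \sum_{i=1}^n x_i P(x_i)</math>

where

:N = number of observations

:n = number of different possible observable variables

:<math>x_i</math> = ith observable quantity

:<math>P(x_i)</math> = probability of observing <math>x_i</math> = Probability Mass Distribution for a discrete probability distribution

:<math>N P(x_i)</math> = expected number of times <math>x_i</math> is observed in N observations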

Mean of a continuous probability distribution

The average (mean) of a sample drawn from any probability distribution is defined in terms of the expectation value E(x).

The expectation value for a continuous probability distribution is calculated as

:<math>\mu = E(x) = \int_{-\infty}^{\infty} x P(x) dx</math>

Variance

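Variance of a discrete PDF

:<math>\sigma^2 = \sum_{i=1}^n \left[ (x_i - \mu)^2 P(x_i) \right]</math>

Variance of a Continuous PDF

:<math>\sigma^2 = \int_{-\infty}^{\infty} \left[ (x - \mu)^2 P(x) \right] dx</math>

Expectation of Arbitrary function

If f(x) is an arbitrary function of a variable x governed by a probability distribution P(x)

then the expectation value of f(x) is

:<math>E[f(x)] = \sum_{i=1}^N f(x_i) P(x_i)</math>

or, for a continuous distribution,

:<math>E[f(x)] = \int_{-\infty}^{\infty} f(x) P(x) dx</math>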

Uniform

The Uniform probability distribution function is a continuous probability function over a specified interval in which any value within the interval has the same probability of occurring.

Mathematically the uniform distribution over an interval from a to b is given by


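:<math>P_U(x) = \left\{ \begin{array}{ll} \frac{1}{b-a} & x > a \mbox{ and } x < b \\ 0 & x > b \mbox{ or } x < a \end{array} \right.</math>

Mean of Uniform PDF

:<math>\mu = \int_{-\infty}^{\infty} x P_U(x) dx = \int_a^b \frac{x}{b-a} dx = \left. \frac{x^2}{2(b-a)} \right|_a^b = \frac{1}{2} \frac{b^2 - a^2}{b-a} = \frac{1}{2}(b+a)</math>

Variance of Uniform PDF

:<math>\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 P_U(x) dx = \int_a^b \frac{\left( x - \frac{b+a}{2} \right)^2}{b-a} dx = \left. \frac{\left( x - \frac{b+a}{2} \right)^3}{3(b-a)} \right|_a^b</math>
:<math>= \frac{1}{3(b-a)} \left[ \left( b - \frac{b+a}{2} \right)^3 - \left( a - \frac{b+a}{2} \right)^3 \right]</math>
:<math>= \frac{1}{3(b-a)} \left[ \left( \frac{b-a}{2} \right)^3 - \left( \frac{a-b}{2} \right)^3 \right]</math>
:<math>= \frac{1}{24(b-a)} \left[ (b-a)^3 - (-1)^3 (b-a)^3 \right]</math>
:<math>= \frac{1}{12} (b-a)^2</math>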
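The closed forms for the mean and variance of the uniform distribution can be compared with a direct numerical integration of the defining integrals. The sketch below uses a simple midpoint rule; the interval endpoints a = 2 and b = 5, the step count, and the function name `uniform_moments` are arbitrary illustrative choices.

```python
def uniform_moments(a, b, steps=10000):
    """Midpoint-rule estimates of the mean and variance of a uniform PDF on [a, b]."""
    width = (b - a) / steps
    density = 1.0 / (b - a)  # P_U(x) inside the interval
    xs = [a + (k + 0.5) * width for k in range(steps)]
    mean = sum(x * density * width for x in xs)
    variance = sum((x - mean) ** 2 * density * width for x in xs)
    return mean, variance


a, b = 2.0, 5.0
mean, variance = uniform_moments(a, b)
print(mean, (a + b) / 2)
print(variance, (b - a) ** 2 / 12)
```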


Now use ROOT to generate uniform distributions. http://wiki.iac.isu.edu/index.php/TF_ErrAna_InClassLab#Day_3

Binomial Distribution

A binomial random variable describes experiments in which the outcome has only 2 possibilities. The two possible outcomes can be labeled as "success" or "failure". The probabilities may be defined as

:p = the probability of a success

and

:q = the probability of a failure.


If we let X represent the number of successes after repeating the experiment n times, then X is the Binomial random variable with parameters n and p.

Experiments with n=1 are also known as Bernoulli trials.

The number of ways in which the x successful outcomes can be organized in n repeated trials is

:<math>\frac{n!}{[(n-x)! x!]}</math> where the ! denotes a factorial such that <math>5! = 5 \times 4 \times 3 \times 2 \times 1</math>.

The expression is known as the binomial coefficient and is represented as

:<math>{n \choose x} = \frac{n!}{x!(n-x)!}</math>


The probability of any one ordering of the successes and failures is given by

:<math>P(\mbox{experimental ordering}) = p^x q^{n-x}</math>


This means the probability of getting exactly x successes after n trials is

:<math>P_B(x) = {n \choose x} p^x q^{n-x}</math>

Mean

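It can be shown that the Expectation Value of the distribution is

:<math>\mu = np</math>


:<math>\mu = \sum_{x=0}^n x P_B(x) = \sum_{x=0}^n x \frac{n!}{x!(n-x)!} p^x q^{n-x}</math>
:<math>= \sum_{x=1}^n \frac{n!}{(x-1)!(n-x)!} p^x q^{n-x}</math> : summation starts from x=1 and not x=0 now
:<math>= np \sum_{x=1}^n \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1} q^{n-x}</math> : factor out np : replace n-1 with m everywhere and it looks like a binomial distribution
:<math>= np \sum_{y=0}^{n-1} \frac{(n-1)!}{y!(n-y-1)!} p^y q^{n-y-1}</math> : change summation index so y=x-1, now n becomes n-1
:<math>= np \sum_{y=0}^{n-1} \frac{(n-1)!}{y!(n-1-y)!} p^y q^{n-1-y}</math>
:<math>= np (q+p)^{n-1}</math> : definition of binomial expansion
:<math>= np 1^{n-1}</math> : q+p = 1
:<math>= np</math>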


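variance

:<math>\sigma^2 = npq</math>

Remember
:<math>\frac{\sum (x_i - \mu)^2}{N} = \frac{\sum \left( x_i^2 - 2 x_i \mu + \mu^2 \right)}{N} = \frac{\sum x_i^2}{N} - 2\mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N}</math>
:<math>= \frac{\sum x_i^2}{N} - 2\mu^2 + \mu^2 = \frac{\sum x_i^2}{N} - \mu^2</math>


:<math>\frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2}{N} - \mu^2</math>
:<math>\sigma^2 = E[(x - \mu)^2] = \sum_{x=0}^n (x_i - \mu)^2 P_B(x_i)</math>
:<math>= E[x^2] - \left( E[x] \right)^2 = \sum_{x=0}^n x_i^2 P_B(x_i) - \left( \sum_{x=0}^n x_i P_B(x_i) \right)^2</math>


To calculate the variance of the Binomial distribution I will just calculate <math>E[x^2]</math> and then subtract off <math>\left( E[x] \right)^2</math>.

:<math>E[x^2] = \sum_{x=0}^n x^2 P_B(x)</math>
:<math>= \sum_{x=1}^n x^2 P_B(x)</math> : x=0 term is zero so no contribution
:<math>= \sum_{x=1}^n x^2 \frac{n!}{x!(n-x)!} p^x q^{n-x}</math>
:<math>= np \sum_{x=1}^n x \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1} q^{n-x}</math>

Let m=n-1 and y=x-1

:<math>= np \sum_{y=0}^m (y+1) \frac{m!}{y!(m-y)!} p^y q^{m-y}</math>
:<math>= np \sum_{y=0}^m (y+1) P(y)</math>
:<math>= np \left( \sum_{y=0}^m y P(y) + \sum_{y=0}^m (1) P(y) \right)</math>
:<math>= np (mp + 1)</math>
:<math>= np ((n-1)p + 1)</math>


:<math>\sigma^2 = E[x^2] - \left( E[x] \right)^2 = np((n-1)p + 1) - (np)^2 = np(1-p) = npq</math>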
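The results for the mean (np) and the variance (npq) can be confirmed by summing over the binomial distribution directly. In this sketch the parameter values n = 12 and p = 1/4 and the helper names `binomial_pmf` and `binomial_moments` are illustrative assumptions; exact fractions avoid any rounding in the comparison.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """P_B(x) = C(n, x) p^x q^(n - x) with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_moments(n, p):
    """Mean and variance from E[x] and E[x^2] - (E[x])^2."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    second_moment = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, second_moment - mean ** 2


n, p = 12, Fraction(1, 4)
mean, variance = binomial_moments(n, p)
print(mean, n * p)
print(variance, n * p * (1 - p))
```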

Examples

The number of times a coin toss is heads.

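The probability of a coin landing with the head of the coin facing up is

:<math>P = \frac{\mbox{number of desired outcomes}}{\mbox{number of possible outcomes}} = \frac{1}{2}</math> = Uniform distribution with a=0 (tails) b=1 (heads).

Suppose you toss a coin 4 times. Here are the possible outcomes


{| border="1"
! Order Number !! Trial 1 !! Trial 2 !! Trial 3 !! Trial 4 !! # of Heads
|-
| 1 || t || t || t || t || 0
|-
| 2 || h || t || t || t || 1
|-
| 3 || t || h || t || t || 1
|-
| 4 || t || t || h || t || 1
|-
| 5 || t || t || t || h || 1
|-
| 6 || h || h || t || t || 2
|-
| 7 || h || t || h || t || 2
|-
| 8 || h || t || t || h || 2
|-
| 9 || t || h || h || t || 2
|-
| 10 || t || h || t || h || 2
|-
| 11 || t || t || h || h || 2
|-
| 12 || t || h || h || h || 3
|-
| 13 || h || t || h || h || 3
|-
| 14 || h || h || t || h || 3
|-
| 15 || h || h || h || t || 3
|-
| 16 || h || h || h || h || 4
|}


The probability of order #1 happening is

:P( order #1) = <math>\left( \frac{1}{2} \right)^0 \left( \frac{1}{2} \right)^4 = \frac{1}{16}</math>

:P( order #2) = <math>\left( \frac{1}{2} \right)^1 \left( \frac{1}{2} \right)^3 = \frac{1}{16}</math>

The probability of observing the coin land on heads 3 times out of 4 trials is

:<math>P(x=3) = \frac{4}{16} = \frac{1}{4} = {n \choose x} p^x q^{n-x} = \frac{4!}{[(4-3)! 3!]} \left( \frac{1}{2} \right)^3 \left( \frac{1}{2} \right)^{4-3} = \frac{24}{1 \times 6} \frac{1}{16} = \frac{1}{4}</math>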
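The table of 16 orderings can be generated by brute force, which gives a direct check of the binomial coefficients. This is a small sketch using the standard library; the variable names are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordering of heads (h) and tails (t) in 4 tosses
outcomes = list(product("ht", repeat=4))

# How many orderings give 0, 1, 2, 3, 4 heads
heads_count = Counter(sequence.count("h") for sequence in outcomes)

# Each ordering has probability (1/2)^4, so P(x = 3) is a ratio of counts
p_three_heads = Fraction(heads_count[3], len(outcomes))
print(dict(heads_count), p_three_heads)
```

The counts 1, 4, 6, 4, 1 are the binomial coefficients for n = 4.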

A 6 sided die

A die is a 6 sided cube with dots on each side. Each side has a unique number of dots with at most 6 dots on any one side.

P=1/6 = probability of landing on any side of the cube.

Expectation value :

The expected (average) value for rolling a single die.
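:<math>E(\mbox{Roll With 6 Sided Die}) = \sum_i x_i P(x_i) = 1 \left( \frac{1}{6} \right) + 2 \left( \frac{1}{6} \right) + 3 \left( \frac{1}{6} \right) + 4 \left( \frac{1}{6} \right) + 5 \left( \frac{1}{6} \right) + 6 \left( \frac{1}{6} \right) = \frac{1+2+3+4+5+6}{6} = 3.5</math>

The variance:

:<math>\sigma^2(\mbox{Roll With 6 Sided Die}) = \sum_i (x_i - \mu)^2 P(x_i)</math>
:<math>= (1-3.5)^2 \left( \frac{1}{6} \right) + (2-3.5)^2 \left( \frac{1}{6} \right) + (3-3.5)^2 \left( \frac{1}{6} \right) + (4-3.5)^2 \left( \frac{1}{6} \right) + (5-3.5)^2 \left( \frac{1}{6} \right) + (6-3.5)^2 \left( \frac{1}{6} \right) = 2.92</math>
:<math>= \sum_i (x_i)^2 P(x_i) - \mu^2 = \left[ 1 \left( \frac{1}{6} \right) + 4 \left( \frac{1}{6} \right) + 9 \left( \frac{1}{6} \right) + 16 \left( \frac{1}{6} \right) + 25 \left( \frac{1}{6} \right) + 36 \left( \frac{1}{6} \right) \right] - (3.5)^2 = 2.92</math>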


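If we roll the die 10 times, what is the probability that x of the rolls will show a 6?

A success will be that the die landed with 6 dots face up.

So the probability of this is 1/6 (p=1/6), we toss it 10 times (n=10), so the binomial distribution function for a success/fail experiment says

:<math>P_B(x) = {n \choose x} p^x q^{n-x} = \frac{10!}{[(10-x)! x!]} \left( \frac{1}{6} \right)^x \left( \frac{5}{6} \right)^{10-x}</math>

So the probability the die will have 6 dots face up in 4/10 rolls is

:<math>P_B(x=4) = \frac{10!}{[(10-4)! 4!]} \left( \frac{1}{6} \right)^4 \left( \frac{5}{6} \right)^{10-4}</math>

:<math>= \frac{10!}{[6! 4!]} \left( \frac{1}{6} \right)^4 \left( \frac{5}{6} \right)^6 = \frac{210 \times 5^6}{6^{10}} = 0.054</math>

Mean = <math>np = \mu = 10/6 = 1.67</math>, Variance = <math>\sigma^2 = 10 (1/6)(5/6) = 1.39</math>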
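The die example can be reproduced in a few lines. This is a minimal sketch with hypothetical variable names; it evaluates the binomial expression for n = 10 and p = 1/6 with exact fractions and then converts to decimals.

```python
from fractions import Fraction
from math import comb

n = 10              # number of rolls
p = Fraction(1, 6)  # probability that a roll shows a 6
q = 1 - p

# Probability that exactly 4 of the 10 rolls show a 6
p_four = comb(n, 4) * p ** 4 * q ** (n - 4)

mean = n * p
variance = n * p * q

print(float(p_four), float(mean), float(variance))
```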

Poisson Distribution

The Poisson distribution is an approximation to the binomial distribution in the event that the probability of a success is quite small (<math>p \ll 1</math>). As the number of repeated observations (n) gets large, the binomial distribution becomes more difficult to evaluate because of the leading term

:<math>\frac{n!}{[(n-x)! x!]}</math>


The Poisson distribution overcomes this problem by defining the probability in terms of the average <math>\mu</math>.

:<math>P_P(x) = \frac{\mu^x e^{-\mu}}{x!}</math>


Poisson as approximation to Binomial

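To drive home the idea that the Poisson distribution approximates a Binomial distribution at small p and large n, consider the following derivation

The Binomial Probability Distribution is


:<math>P_B(x) = \frac{n!}{x!(n-x)!} p^x q^{n-x}</math>

The term

:<math>\frac{n!}{(n-x)!} = \frac{(n-x)! (n-x+1)(n-x+2) \cdots (n-1)(n)}{(n-x)!}</math>
:<math>= n(n-1)(n-2) \cdots (n-x+2)(n-x+1)</math>
:IFF <math>x \ll n \Rightarrow</math> we have x terms above
:then <math>\frac{n!}{(n-x)!} \approx n^x</math>
:example: <math>\frac{100!}{(100-1)!} = \frac{99! \times 100}{99!} = 100^1</math>

This leaves us with

:<math>P(x) = \frac{n^x}{x!} p^x q^{n-x} = \frac{(np)^x}{x!} (1-p)^{n-x}</math>
:<math>= \frac{\mu^x}{x!} \frac{(1-p)^n}{(1-p)^x}</math>
:<math>(1-p)^{-x} = \frac{1}{(1-p)^x} \approx 1 + px \approx 1</math> : <math>p \ll 1</math>
:<math>P(x) = \frac{\mu^x}{x!} (1-p)^n</math>


:<math>(1-p)^n = \left[ (1-p)^{1/p} \right]^{\mu}</math>


:<math>\lim_{p \rightarrow 0} \left[ (1-p)^{1/p} \right]^{\mu} = \left( \frac{1}{e} \right)^{\mu} = e^{-\mu}</math>
:For <math>x \ll n</math>
:<math>\lim_{p \rightarrow 0} P_B(x,n,p) = P_P(x,\mu)</math>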
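The limit above can be illustrated numerically by comparing the two distributions for a large n and small p. The values n = 1000 and p = 0.002 (so that the mean is 2) and the function names are assumptions chosen for this sketch.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    """Poisson probability of x events when the mean is mu."""
    return mu ** x * exp(-mu) / factorial(x)


n, p = 1000, 0.002
mu = n * p
for x in range(6):
    print(x, binomial_pmf(x, n, p), poisson_pmf(x, mu))

max_difference = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu))
                     for x in range(11))
```

Repeating the comparison with a larger p (for example n = 10 and p = 0.2, which has the same mean) shows the agreement getting noticeably worse.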

Derivation of Poisson Distribution

The mean free path of a particle traversing a volume of material is a common problem in nuclear and particle physics. If you want to shield your apparatus or yourself from radiation you want to know how far the radiation travels through material.

The mean free path is the average distance a particle travels through a material before interacting with the material.

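If we let <math>\lambda</math> represent the mean free path, then the probability of having an interaction after a distance x is

:<math>\frac{x}{\lambda}</math>

as a result

:<math>1 - \frac{x}{\lambda} = P(0,x,\lambda)</math> = probability of getting no events after a length dx

When we consider <math>\frac{x}{\lambda} \ll 1</math> (we are looking for small distances such that the probability of no interactions is high)


:<math>P(0,x,\lambda) = e^{-\frac{x}{\lambda}} \approx 1 - \frac{x}{\lambda}</math>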

Now we wish to find the probability of finding N events over a distance x given the mean free path.

This is calculated as a joint probability. If we wanted to know the probability of only one interaction over a distance L, then we would multiply the probability that an interaction happened after a distance dx by the probability that no more interactions happen by the time the particle reaches the distance L.

For the case of N interactions, we have a series of N interactions happening over N intervals of dx with the probability <math>dx/\lambda</math>


:<math>P(N,x,\lambda)</math> = probability of finding N events within the length x
:<math>= \frac{dx_1}{\lambda} \frac{dx_2}{\lambda} \frac{dx_3}{\lambda} \cdots \frac{dx_N}{\lambda} e^{-\frac{x}{\lambda}}</math>


The above expression represents the probability for a particular sequence of events in which an interaction occurs after a distance <math>dx_1</math>, then an interaction after <math>dx_2</math>, and so on.

So in essence the above expression is a "probability element" where another probability element may be


:<math>P(N,x,\lambda) = \frac{dx_2}{\lambda} \frac{dx_1}{\lambda} \frac{dx_3}{\lambda} \cdots \frac{dx_N}{\lambda} e^{-\frac{x}{\lambda}}</math>

where the first interaction occurs after the distance <math>x_2</math>.

:<math>= \Pi_{i=1}^N \left[ \frac{dx_i}{\lambda} \right] e^{-\frac{x}{\lambda}}</math>


So we can write a differential probability element which we need to add up as

:<math>d^N P(N,x,\lambda) = \frac{1}{N!} \Pi_{i=1}^N \left[ \frac{dx_i}{\lambda} \right] e^{-\frac{x}{\lambda}}</math>


The N! accounts for the degeneracy in which for every N! permutations there is really only one new combination. ie we are double counting when we integrate.


Using the integral formula

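:<math>\Pi_{i=1}^N \left[ \int_0^x \frac{dx_i}{\lambda} \right] = \left[ \frac{x}{\lambda} \right]^N</math>


we end up with

:<math>P(N,x,\lambda) = \frac{\left[ \frac{x}{\lambda} \right]^N}{N!} e^{-\frac{x}{\lambda}}</math>

Mean of Poisson Dist

:<math>\mu = \sum_{i=1}^{\infty} i P(i,x,\lambda)</math>
:<math>= \sum_{i=1}^{\infty} i \frac{\left[ \frac{x}{\lambda} \right]^i}{i!} e^{-\frac{x}{\lambda}} = \frac{x}{\lambda} \sum_{i=1}^{\infty} \frac{\left[ \frac{x}{\lambda} \right]^{(i-1)}}{(i-1)!} e^{-\frac{x}{\lambda}} = \frac{x}{\lambda}</math>


:<math>P_P(x,\mu) = \frac{\mu^x e^{-\mu}}{x!}</math>

Variance of Poisson Dist

For Homework you will show, in a manner similar to the above mean calculation, that the variance of the Poisson distribution is

:<math>\sigma^2 = \mu</math>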
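A numerical sum over the Poisson distribution gives a sanity check on the mean and variance (it does not replace the analytic homework derivation). In this sketch the mean of 3.5, the truncation at 200 terms, and the function name `poisson_moments` are arbitrary assumptions.

```python
from math import exp


def poisson_moments(mu, terms=200):
    """Mean and variance of a Poisson distribution by direct summation."""
    probability = exp(-mu)  # P_P(0, mu)
    mean = 0.0
    second_moment = 0.0
    for k in range(1, terms):
        probability *= mu / k  # P_P(k) = P_P(k - 1) * mu / k
        mean += k * probability
        second_moment += k * k * probability
    return mean, second_moment - mean ** 2


mean, variance = poisson_moments(3.5)
print(mean, variance)
```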

Gaussian

The Gaussian (Normal) distribution is an approximation of the Binomial distribution for the case of a large number of possible different observations. The Poisson distribution approximated the binomial distribution for the case when <math>p \ll 1</math> (the average number of successes is a lot smaller than the number of trials, <math>\mu = np</math>).

The Gaussian distribution is accepted as one of the most likely distributions to describe measurements.

A Gaussian distribution which is normalized such that its integral is unity is referred to as the Normal distribution. You could mathematically construct a Gaussian distribution which is not normalized to unity (this is often done when fitting experimental data).

:<math>P_G(x,\mu,\sigma) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}</math> = probability of observing x from a Gaussian parent distribution with a mean <math>\mu</math> and standard deviation <math>\sigma</math>.

Half-Width <math>\Gamma</math> (a.k.a. Full Width at Half Max)

The half width <math>\Gamma</math> is used to describe the range of x through which the distribution's amplitude decreases to half of its maximum value.

:i.e. <math>P_G(\mu \pm \frac{\Gamma}{2}, \mu, \sigma) = \frac{P_G(\mu,\mu,\sigma)}{2}</math>

Side note
:the point of steepest descent is located at <math>x = \mu \pm \sigma</math> such that
:<math>P_G(\mu \pm \sigma, \mu, \sigma) = e^{-1/2} P_G(\mu,\mu,\sigma)</math>

Probable Error (P.E.)

The probable error is the range of x in which half of the observations (values of x) are expected to fall.

:<math>x = \mu \pm P.E.</math>
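Both quantities can be expressed in units of the standard deviation. Solving the half-maximum condition gives a full width of 2 sqrt(2 ln 2) times sigma (about 2.355 sigma), a standard result that is not derived in these notes. The probable error has no closed form, so the sketch below finds it by bisection on the error function; the function names and the bisection tolerance are assumptions made for illustration.

```python
from math import erf, log, sqrt


def gaussian_fwhm(sigma):
    """Full width at half maximum: solve exp(-x^2 / (2 sigma^2)) = 1/2."""
    return 2.0 * sqrt(2.0 * log(2.0)) * sigma


def probable_error(sigma, tolerance=1e-12):
    """Half-width of the interval about the mean that contains half the area."""
    low, high = 0.0, 5.0 * sigma
    while high - low > tolerance:
        middle = 0.5 * (low + high)
        # erf(d / (sigma sqrt(2))) is the area within +-d of the mean
        if erf(middle / (sigma * sqrt(2.0))) < 0.5:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


print(gaussian_fwhm(1.0), probable_error(1.0))
```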

Binomial with Large N becomes Gaussian

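Consider the binomial distribution in which a fair coin is tossed a large number of times (N is very large and an EVEN number, N=2n)

What is the probability you get exactly <math>\frac{1}{2}N - s</math> heads and <math>\frac{1}{2}N + s</math> tails, where s is an integer?

The Binomial Probability distribution is given as

:<math>P_B(x) = {N \choose x} p^x q^{N-x} = \frac{N!}{x!(N-x)!} p^x q^{N-x}</math>

p = probability of success = 1/2

q = 1-p = 1/2

N = number of trials = 2n

x = number of successes = n-s


:<math>P_B(n-s) = \frac{(2n)!}{(n-s)!(2n-n+s)!} p^{n-s} q^{2n-n+s}</math>
:<math>= \frac{(2n)!}{(n-s)!(n+s)!} p^{n-s} q^{n+s}</math>
:<math>= \frac{(2n)!}{(n-s)!(n+s)!} \left( \frac{1}{2} \right)^{n-s} \left( \frac{1}{2} \right)^{n+s}</math>
:<math>= \frac{(2n)!}{(n-s)!(n+s)!} \left( \frac{1}{2} \right)^{2n}</math>


Now let's cast this probability with respect to the probability that we get an equal number of heads and tails by defining the following ratio R such that

:<math>R \equiv \frac{P_B(n-s)}{P_B(n)}</math>
:<math>P_B(x=n) = \frac{N!}{n!(N-n)!} p^n q^{N-n} = \frac{(2n)!}{n! n!} p^n q^n = \frac{(2n)!}{n! n!} \left( \frac{1}{2} \right)^{2n}</math>
:<math>R = \frac{\frac{(2n)!}{(n-s)!(n+s)!} \left( \frac{1}{2} \right)^{2n}}{\frac{(2n)!}{n! n!} \left( \frac{1}{2} \right)^{2n}} = \frac{n! n!}{(n-s)!(n+s)!}</math>

Take the natural logarithm of both sides

:<math>\ln(R) = \ln \left( \frac{n! n!}{(n-s)!(n+s)!} \right) = \ln(n!) + \ln(n!) - \ln[(n-s)!] - \ln[(n+s)!] = 2\ln(n!) - \ln[(n-s)!] - \ln[(n+s)!]</math>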


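Stirling's Approximation says

:<math>n! \sim (2 \pi n)^{1/2} n^n e^{-n}</math>
:<math>\Rightarrow \ln(n!) \sim \ln \left[ (2 \pi n)^{1/2} n^n e^{-n} \right] = \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ n^{1/2} \right] + \ln \left[ n^n \right] + \ln \left[ e^{-n} \right]</math>
:<math>= \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ n^{1/2} \right] + n \ln[n] + (-n)</math>
:<math>= \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ n^{1/2} \right] + n (\ln[n] - 1)</math>

similarly

:<math>\ln[(n-s)!] \sim \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ (n-s)^{1/2} \right] + (n-s)(\ln[(n-s)] - 1)</math>
:<math>\ln[(n+s)!] \sim \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ (n+s)^{1/2} \right] + (n+s)(\ln[(n+s)] - 1)</math>
:<math>\Rightarrow \ln(R) = 2 \times \left( \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ n^{1/2} \right] + n(\ln[n] - 1) \right)</math>
:<math>- \left( \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ (n-s)^{1/2} \right] + (n-s)(\ln[(n-s)] - 1) \right)</math>
:<math>- \left( \ln \left[ (2\pi)^{1/2} \right] + \ln \left[ (n+s)^{1/2} \right] + (n+s)(\ln[(n+s)] - 1) \right)</math>
:<math>= 2 \ln \left[ n^{1/2} \right] + 2n(\ln[n] - 1) - \ln \left[ (n-s)^{1/2} \right] - (n-s)(\ln[(n-s)] - 1) - \ln \left[ (n+s)^{1/2} \right] - (n+s)(\ln[(n+s)] - 1)</math>


:<math>\ln \left[ n^{1/2} \right] \approx \ln \left[ (n-s)^{1/2} \right] \approx \ln \left[ (n+s)^{1/2} \right]</math> For Large n (with <math>s \ll n</math>)
:<math>\Rightarrow \ln(R) = 2n(\ln[n] - 1) - (n-s)(\ln[(n-s)] - 1) - (n+s)(\ln[(n+s)] - 1)</math>
:<math>= 2n(\ln[n] - 1) - (n-s)(\ln[n(1 - s/n)] - 1) - (n+s)(\ln[n(1 + s/n)] - 1)</math>
:<math>= 2n \ln(n) - 2n - (n-s)[\ln(n) + \ln(1 - s/n) - 1] - (n+s)[\ln(n) + \ln(1 + s/n) - 1]</math>
:<math>= -2n - (n-s)[\ln(1 - s/n) - 1] - (n+s)[\ln(1 + s/n) - 1]</math>
:<math>= -(n-s)[\ln(1 - s/n)] - (n+s)[\ln(1 + s/n)]</math>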

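If <math>-1 < s/n \le 1</math>

Then

:<math>\ln(1 + s/n) = s/n - \frac{s^2}{2n^2} + \frac{s^3}{3n^3} \cdots</math>

:<math>\Rightarrow \ln(R) = -(n-s) \left[ -s/n - \frac{s^2}{2n^2} - \frac{s^3}{3n^3} \cdots \right] - (n+s) \left[ s/n - \frac{s^2}{2n^2} + \frac{s^3}{3n^3} \cdots \right]</math>
:<math>= -\frac{s^2}{n} = -\frac{2s^2}{N}</math>

or

:<math>R \sim e^{-2s^2/N}</math>

as a result

:<math>P(n-s) = R P_B(n)</math>


:<math>P_B(x=n) = \frac{(2n)!}{n! n!} \left( \frac{1}{2} \right)^{2n} = \frac{(2 \pi 2n)^{1/2} (2n)^{2n} e^{-2n}}{\left( (2 \pi n)^{1/2} n^n e^{-n} \right) \left( (2 \pi n)^{1/2} n^n e^{-n} \right)} \left( \frac{1}{2} \right)^{2n}</math>
:<math>= \left( \frac{1}{\pi n} \right)^{1/2} = \left( \frac{2}{\pi N} \right)^{1/2}</math>


:<math>P(n-s) = \left( \frac{2}{\pi N} \right)^{1/2} e^{-2s^2/N}</math>


In binomial distributions

:<math>\sigma^2 = Npq = \frac{N}{4}</math> for this problem

or

:<math>N = 4\sigma^2</math>


:<math>P(n-s) = \left( \frac{2}{\pi 4 \sigma^2} \right)^{1/2} e^{-2s^2/N} = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{2s^2}{4\sigma^2}} = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{s}{\sigma} \right)^2}</math> = probability of exactly <math>\left( \frac{N}{2} - s \right)</math> heads AND <math>\left( \frac{N}{2} + s \right)</math> tails after flipping the coin N times (N is an even number and s is an integer).

If we let <math>x = n - s</math> and realize that for a binomial distribution

:<math>\mu = Np = N/2 = n</math>

Then

:<math>P(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{n-x}{\sigma} \right)^2} = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2}</math>


So when N gets big the Gaussian distribution is a good approximation to the Binomial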
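To see how good the approximation is, the exact binomial probability can be compared with the Gaussian form derived above. The choice N = 1000, the sampled values of s, and the function names are assumptions made for this sketch.

```python
from math import comb, exp, pi, sqrt


def exact_probability(N, s):
    """Exact binomial probability of N/2 - s heads in N fair tosses (N even)."""
    return comb(N, N // 2 - s) / 2 ** N


def gaussian_probability(N, s):
    """The large-N result sqrt(2 / (pi N)) exp(-2 s^2 / N)."""
    return sqrt(2.0 / (pi * N)) * exp(-2.0 * s * s / N)


N = 1000
relative_errors = {}
for s in (0, 5, 10, 20):
    exact = exact_probability(N, s)
    approximate = gaussian_probability(N, s)
    relative_errors[s] = abs(approximate / exact - 1.0)
    print(s, exact, approximate)
```

The agreement is expected to degrade as s approaches n, where the series expansion of the logarithm used in the derivation is no longer a good approximation.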

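Gaussian approximation to Poisson when <math>\mu \gg 1</math>

:<math>P_P(r) = \frac{\mu^r e^{-\mu}}{r!}</math> = Poisson probability distribution

substitute

:<math>x \equiv r - \mu</math>


:<math>P_P(x + \mu) = \frac{\mu^{x+\mu} e^{-\mu}}{(x+\mu)!} = e^{-\mu} \frac{\mu^{\mu} \mu^x}{(\mu + x)!} = e^{-\mu} \frac{\mu^{\mu} \mu^x}{\mu! (\mu+1) \cdots (\mu+x)}</math>
:<math>= e^{-\mu} \frac{\mu^{\mu}}{\mu!} \left[ \frac{\mu}{(\mu+1)} \cdot \frac{\mu}{(\mu+2)} \cdots \frac{\mu}{(\mu+x)} \right]</math>


:<math>e^{-\mu} \frac{\mu^{\mu}}{\mu!} = e^{-\mu} \frac{\mu^{\mu}}{\sqrt{2 \pi \mu} \mu^{\mu} e^{-\mu}} = \frac{1}{\sqrt{2 \pi \mu}}</math> Stirling's Approximation when <math>\mu \gg 1</math>


:<math>P_P(x + \mu) = \frac{1}{\sqrt{2 \pi \mu}} \left[ \frac{\mu}{(\mu+1)} \cdot \frac{\mu}{(\mu+2)} \cdots \frac{\mu}{(\mu+x)} \right]</math>


:<math>\left[ \frac{\mu}{(\mu+1)} \cdot \frac{\mu}{(\mu+2)} \cdots \frac{\mu}{(\mu+x)} \right] = \frac{1}{1 + \frac{1}{\mu}} \cdot \frac{1}{1 + \frac{2}{\mu}} \cdots \frac{1}{1 + \frac{x}{\mu}}</math>


:<math>e^{x/\mu} \approx 1 + \frac{x}{\mu}</math> : if <math>x/\mu \ll 1</math> Note: <math>x \equiv r - \mu</math>


:<math>P_P(x + \mu) = \frac{1}{\sqrt{2 \pi \mu}} \left[ \frac{1}{1 + \frac{1}{\mu}} \times \frac{1}{1 + \frac{2}{\mu}} \cdots \frac{1}{1 + \frac{x}{\mu}} \right] = \frac{1}{\sqrt{2 \pi \mu}} \left[ e^{-1/\mu} \times e^{-2/\mu} \cdots e^{-x/\mu} \right] = \frac{1}{\sqrt{2 \pi \mu}} e^{-1 \left[ \frac{1}{\mu} + \frac{2}{\mu} \cdots \frac{x}{\mu} \right]}</math>
:<math>= \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \sum_{i=1}^x i \right]}</math>

another mathematical identity

:<math>\sum_{i=1}^{x} i = \frac{x}{2}(1 + x)</math>


:<math>P_P(x + \mu) = \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x}{2}(1+x) \right]}</math>

if <math>x \gg 1</math> then

:<math>\frac{x}{2}(1 + x) \approx \frac{x^2}{2}</math>
:<math>P_P(x + \mu) = \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x^2}{2} \right]} = \frac{1}{\sqrt{2 \pi \mu}} e^{-\frac{x^2}{2\mu}}</math>

In the Poisson distribution

:<math>\sigma^2 = \mu</math>

replacing dummy variable x with r - <math>\mu</math>

:<math>P_P(r) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{\frac{-(r - \mu)^2}{2\sigma^2}} = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2} \left( \frac{r - \mu}{\sigma} \right)^2}</math> = Gaussian distribution when <math>\mu \gg 1</math>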
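The quality of this approximation can be examined numerically. The sketch below compares the Poisson probability with the Gaussian form (with the variance set equal to the mean) for a mean of 100; that value, the range of r scanned, and the function names are illustrative assumptions. The Poisson probability is evaluated through logarithms (`lgamma`) to avoid very large intermediate factorials.

```python
from math import exp, lgamma, log, pi, sqrt


def poisson_pmf(r, mu):
    """Poisson probability, computed as exp(r ln(mu) - mu - ln(r!))."""
    return exp(r * log(mu) - mu - lgamma(r + 1))


def gaussian_pdf(r, mu):
    """Gaussian with mean mu and variance sigma^2 = mu."""
    sigma2 = mu
    return exp(-(r - mu) ** 2 / (2.0 * sigma2)) / sqrt(2.0 * pi * sigma2)


mu = 100
max_abs_difference = max(abs(poisson_pmf(r, mu) - gaussian_pdf(r, mu))
                         for r in range(50, 151))
peak_relative_difference = abs(poisson_pmf(mu, mu) / gaussian_pdf(mu, mu) - 1.0)
print(max_abs_difference, peak_relative_difference)
```

The Poisson distribution remains slightly asymmetric about the mean, so the two curves differ a little on either side of the peak even when they agree well at the peak itself.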

Integral Probability (Cumulative Distribution Function)

The Poisson and Binomial distributions are discrete probability distributions (integers).

The Gaussian distribution is our first continuous distribution as the variables are real numbers. It is not very meaningful to speak of the probability that the variate (x) assumes a specific value.

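One could consider defining a probability element <math>A_G</math> which is really an integral over a finite region <math>\Delta x</math> such that

:<math>A_G(\Delta x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \int_{\mu - \Delta x}^{\mu + \Delta x} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx</math>

The advantage of this definition becomes apparent when you are interested in quantifying the probability that a measurement would fall outside a range <math>\Delta x</math>.


:<math>P_G(x < \mu - \Delta x \mbox{ or } x > \mu + \Delta x) = 1 - A_G(\Delta x, \mu, \sigma)</math>

The Cumulative Distribution Function (CDF), however, is defined in terms of the integral from the variate's minimum value

:<math>CDF \equiv \int_{x_{min}}^{x} P_G(x,\mu,\sigma) dx = \int_{-\infty}^{x} P_G(x,\mu,\sigma) dx = P_G(X \le x)</math> = Probability that you measure a value less than or equal to x

discrete CDF example

The probability that a student fails this class is 7.3%.

What is the probability that 5 or more students will fail in a class of 32 students?

Answ: <math>P_B(x \ge 5) = \sum_{x=5}^{32} P_B(x) = 1 - \sum_{x=0}^{4} P_B(x) = 1 - CDF(x < 5)</math>

:<math>= 1 - P_B(x=0) - P_B(x=1) - P_B(x=2) - P_B(x=3) - P_B(x=4)</math>
:<math>= 1 - 0.088 - 0.223 - 0.272 - 0.214 - 0.122 = 1 - 0.92 \Rightarrow P_B(x \ge 5) = 0.08</math> = 8%


There is an 8% probability that 5 or more students will fail the class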
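The class example can be reproduced with a direct sum over the binomial distribution. This is a minimal sketch; the variable and function names are assumptions introduced here.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n_students = 32
p_fail = 0.073

# Probabilities of exactly 0, 1, 2, 3, 4 failures
terms = [binomial_pmf(x, n_students, p_fail) for x in range(5)]
p_five_or_more = 1.0 - sum(terms)

print([round(term, 3) for term in terms], round(p_five_or_more, 3))
```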

2 SD rule of thumb for Gaussian PDF

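In the above example you calculated the probability that 5 or more students will fail a class. You can extend this principle to calculate the probability of taking a measurement which exceeds the expected mean value.

One of the more common consistency checks you can make on a sample data set which you expect to be from a Gaussian distribution is to ask how many data points appear more than 2 S.D. (<math>\sigma</math>) from the mean value.


The CDF for this is

:<math>P_G(X \le \mu - 2\sigma, \mu, \sigma) = \int_{-\infty}^{\mu - 2\sigma} P_G(x,\mu,\sigma) dx</math>
:<math>= \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\mu - 2\sigma} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx</math>


Let

:<math>z = \frac{x - \mu}{\sigma}</math>
:<math>dz = \frac{dx}{\sigma}</math>
:<math>\Rightarrow P_G(X \le \mu - 2\sigma, \mu, \sigma) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x = \mu - 2\sigma} e^{-\frac{z^2}{2}} dz</math>


The above integral can only be done numerically, for example by expanding the exponential in a power series

:<math>e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}</math>
:<math>\Rightarrow e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} \cdots</math>
:<math>\Rightarrow e^{-z^2/2} = 1 - \frac{z^2}{2} + \frac{z^4}{8} - \frac{z^6}{48} \cdots</math>
:<math>P_G(X \le \mu - 2\sigma, \mu, \sigma) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x = \mu - 2\sigma} \left( 1 - \frac{z^2}{2} + \frac{z^4}{8} - \frac{z^6}{48} \cdots \right) dz</math>
:<math>= \left. \frac{1}{\sqrt{2\pi}} \left( z - \frac{z^3}{6} + \frac{z^5}{40} - \frac{z^7}{48 \times 7} \cdots \right) \right|^{x = \mu - 2\sigma}</math>
:<math>= \left. \frac{1}{\sqrt{\pi}} \sum_{j=0}^{\infty} \frac{(-1)^j \left( \frac{z}{\sqrt{2}} \right)^{2j+1}}{j!(2j+1)} \right|^{x = \mu - 2\sigma}</math>

There is no analytical form for the probability, but it is one which you can compute.

Below is a table representing the cumulative probability <math>P_G(x < \mu - \delta \mbox{ or } x > \mu + \delta, \mu, \sigma)</math> for events to occur outside an interval of <math>\pm \delta</math> in a Gaussian distribution


{| border="1"
! <math>P_G(x < \mu - \delta \mbox{ or } x > \mu + \delta, \mu, \sigma)</math> !! <math>\delta</math>
|-
| <math>3.2 \times 10^{-1}</math> || <math>1\sigma</math>
|-
| <math>4.6 \times 10^{-2}</math> || <math>2\sigma</math>
|-
| <math>2.7 \times 10^{-3}</math> || <math>3\sigma</math>
|-
| <math>6.3 \times 10^{-5}</math> || <math>4\sigma</math>
|}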
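In practice the tail probabilities are usually obtained from the complementary error function rather than from the power series. The sketch below uses `math.erfc` from the Python standard library; the function name `two_sided_tail` is a hypothetical helper.

```python
from math import erfc, sqrt


def two_sided_tail(k):
    """Probability that a Gaussian variate lies more than k sigma from the mean."""
    return erfc(k / sqrt(2.0))


for k in (1, 2, 3, 4):
    print(k, two_sided_tail(k))
```

The one-sided probability of falling below the mean by more than k sigma is half of the two-sided value.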


[Figure: TF Error CDF Gauss.png]

Cauchy/Lorentzian/Breit-Wigner Distribution

In Mathematics, the Cauchy distribution is written as

:<math>P_{CL}(x, x_0, \Gamma) = \frac{1}{\pi} \frac{\Gamma/2}{(x - x_0)^2 + (\Gamma/2)^2}</math> = Cauchy-Lorentzian Distribution

Note: The probability does not fall as rapidly to zero as the Gaussian. As a result, the Gaussian's central peak contributes more to the area than the Lorentzian's.

This distribution happens to be a solution to physics problems involving forced resonances (spring systems driven by a source, or a nuclear interaction which induces a metastable state).

:<math>P_{BW} = \sigma(E) = \frac{1}{2\pi} \frac{\Gamma}{(E - E_0)^2 + (\Gamma/2)^2}</math> = Breit-Wigner distribution
:<math>E_0</math> = mass resonance
:<math>\Gamma</math> = FWHM
:<math>\Delta E \Delta t = \Gamma \tau = \frac{h}{2\pi}</math> = uncertainty principle
:<math>\tau</math> = lifetime of resonance/intermediate state particle


A Breit-Wigner function fit to a cross section measured as a function of energy will allow one to evaluate the rate increases that are produced when the probing energy excites a resonant state that has a mass <math>E_0</math> and lasts for the time <math>\tau</math> derived from the Half Width <math>\Gamma</math>.

mean

Mean is not defined

Mode = Median = x0 or E0

Variance

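The variance is also not defined but rather the distribution is parameterized in terms of the Half Width <math>\Gamma</math>


Let

:<math>z = \frac{x - \mu}{\Gamma/2}</math>

Then

:<math>\sigma^2 = \frac{\Gamma^2}{4\pi} \int_{-\infty}^{\infty} \frac{z^2}{1 + z^2} dz</math>


The above integral does not converge for large deviations <math>(x - \mu)</math>. The width of the distribution is instead characterized by <math>\Gamma</math> = FWHM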

Landau

:<math>P_L(x) = \frac{1}{2 \pi i} \int_{c - i\infty}^{c + i\infty} e^{s \log s + x s} ds</math>

where c is any positive real number.

To simplify computation it is more convenient to use the equivalent expression

:<math>P_L(x) = \frac{1}{\pi} \int_0^{\infty} e^{-t \log t - x t} \sin(\pi t) dt.</math>


The above distribution was derived by Landau (L. Landau, "On the Energy Loss of Fast Particles by Ionization", J. Phys., vol 8 (1944), pg 201) to describe the energy loss by particles traveling through thin material (materials with a thickness on the order of a few radiation lengths).

Bethe-Bloch derived an expression to determine the AVERAGE amount of energy lost by a particle traversing a given material <math>\left( \frac{dE}{dx} \right)</math> assuming several collisions which span the physical limits of the interaction.

For the case of thin absorbers, the number of collisions is so small that the central limit theorem used to average over several collisions doesn't apply and there is a finite possibility of observing large energy losses.

As a result one would expect a distribution which is Gaussian-like but with a "tail" on the <math>\mu + \sigma</math> side of the distribution.

Gamma

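:<math>P_{\gamma}(x, k, \theta) = x^{k-1} \frac{e^{-x/\theta}}{\theta^k \, \Gamma(k)}</math> for <math>x > 0</math> and <math>k, \theta > 0</math>.

where

:<math>\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} dt</math>


The distribution is used for "waiting time" models. How long do you need to wait for a rain storm, how long do you need to wait to die, ...

Climatologists use this for predicting how rain fluctuates from season to season.

If k is an integer, then the above distribution describes a sum of k independent exponentially distributed variables, and its cumulative distribution is

:<math>CDF_{\gamma}(x, k, \theta) = 1 - e^{-x/\theta} \sum_{j=0}^{k-1} \frac{1}{j!} \left( \frac{x}{\theta} \right)^j</math>

Mean

:<math>\mu = k\theta</math>

Variance

:<math>\sigma^2 = k\theta^2</math>

Properties

:<math>\lim_{x \rightarrow 0} P_{\gamma}(x, k, \theta) = \left\{ \begin{array}{ll} \infty & k < 1 \\ 0 & k > 1 \end{array} \right.</math>
:<math>= \frac{1}{\theta}</math> if <math>k = 1</math>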

Beta

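:<math>P_{\beta}(x; \alpha, \beta) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1-u)^{\beta-1} du}</math>
:<math>= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}</math>
:<math>= \frac{1}{\mathrm{B}(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}</math>

Mean

:<math>\mu = \frac{\alpha}{\alpha + \beta}</math>

Variance

:<math>\sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}</math>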

Exponential

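The exponential distribution may be used to describe the processes that are in between Binomial and Poisson (exponential decay)

:<math>P_e(x, \lambda) = \left\{ \begin{array}{ll} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{array} \right.</math>


:<math>CDF_e(x, \lambda) = \left\{ \begin{array}{ll} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{array} \right.</math>

Mean

:<math>\mu = \frac{1}{\lambda}</math>

Variance

:<math>\sigma^2 = \frac{1}{\lambda^2}</math>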
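One way to check the mean and variance is to draw samples by inverting the cumulative distribution: if u is uniform on [0, 1), then x = -ln(1 - u)/lambda follows the exponential distribution. The rate of 2, the sample size, the seed, and the function name in this sketch are arbitrary assumptions.

```python
import random
from math import log


def sample_exponential(rate, rng):
    """Draw one exponential variate by inverting CDF(x) = 1 - exp(-rate x)."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return -log(1.0 - u) / rate


rng = random.Random(12345)
rate = 2.0
samples = [sample_exponential(rate, rng) for _ in range(100000)]

sample_mean = sum(samples) / len(samples)
sample_variance = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)
print(sample_mean, 1.0 / rate)
print(sample_variance, 1.0 / rate ** 2)
```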

Skewness and Kurtosis

Distributions may also be characterized by how they look in terms of Skewness and Kurtosis

Skewness

Measures the symmetry of the distribution


:Skewness = <math>\frac{\sum (x_i - \bar{x})^3}{(N-1) s^3} = \frac{\mbox{3rd moment}}{(\mbox{2nd moment})^{3/2}}</math>

where

:<math>s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}</math>

The higher the number the more asymmetric (or skewed) the distribution is. The closer to zero the more symmetric.


A negative skewness indicates a tail on the left side of the distribution. Positive skewness indicates a tail on the right.

Kurtosis

Measures the "pointyness" of the distribution

:Kurtosis = <math>\frac{\sum (x_i - \bar{x})^4}{(N-1) s^4}</math>

where

:<math>s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}</math>



K=3 for Normal Distribution

In ROOT the Kurtosis entry in the statistics box is really the "excess kurtosis" which is the subtraction of the kurtosis by 3

:Excess Kurtosis = <math>\frac{\sum (x_i - \bar{x})^4}{(N-1) s^4} - 3</math>

In this case a Positive excess Kurtosis will indicate a peak that is sharper than a gaussian while a negative value will indicate a peak that is flatter than a comparable Gaussian distribution.
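The definitions above translate directly into code. The sketch below follows the formulas as written in these notes (dividing by N-1 and using s from the unbiased variance); the function name and the two small data sets are illustrative assumptions, and this is not a reproduction of how ROOT computes its statistics box.

```python
def skewness_and_kurtosis(data):
    """Return (skewness, kurtosis, excess kurtosis) using the N - 1 definitions."""
    n = len(data)
    x_bar = sum(data) / n
    s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
    s = s2 ** 0.5
    skewness = sum((x - x_bar) ** 3 for x in data) / ((n - 1) * s ** 3)
    kurtosis = sum((x - x_bar) ** 4 for x in data) / ((n - 1) * s ** 4)
    return skewness, kurtosis, kurtosis - 3.0


symmetric = [1, 2, 3, 4, 5]     # no tail on either side
right_tailed = [1, 1, 1, 1, 10]  # one large value produces a tail on the right

print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(right_tailed))
```

The symmetric set gives zero skewness, while the set with a single large value gives a positive skewness, matching the sign convention described above.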


[Figure: ForeErrAna Gaus-Cauchy SkeKurt.gif]

[Figure: ForeErrAna Gaus-Landau SkeKurt.gif]

[Figure: ForeErrAna Gaus-gamma SkeKurt.gif]

[1] Forest_Error_Analysis_for_the_Physical_Sciences