Difference between revisions of "Forest ErrAna StatDist"
Revision as of 18:07, 6 February 2010
Parent Distribution
Let <math>x_i</math> represent our ith attempt to measure the quantity <math>x</math>.
Due to the random errors present in any experiment, we should not expect <math>x_i = x</math>.
If we neglect systematic errors, then we should expect <math>x_i</math> to, on average, follow some probability distribution around the correct value <math>x</math>.
This probability distribution can be referred to as the "parent population".
Average and Variance
Average
The word "average" is used to describe a property of a "parent" probability distribution or a set of observations/measurements made in an experiment which gives an indication of a likely outcome of an experiment.
The symbol <math>\mu</math> is usually used to represent the "mean" of a known probability (parent) distribution (parent mean), while the "average" of a set of observations/measurements is denoted as <math>\bar{x}</math> and is commonly referred to as the "sample" average or "sample mean".
Definition of the mean
Here the average of a parent distribution is defined in terms of an infinite sum of observations (<math>x_i</math>) of an observable <math>x</math> divided by the number of observations:
:<math>\mu \equiv \lim_{N \rightarrow \infty} \frac{1}{N}\sum_{i=1}^N x_i</math>
:<math>\bar{x} \equiv \frac{1}{N}\sum_{i=1}^N x_i</math> is a calculation of the mean using a finite number of observations.
This definition uses the assumption that the result of an experiment, measuring a sample average <math>\bar{x}</math>, asymptotically approaches the "true" average of the parent distribution <math>\mu</math>.
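A minimal Python sketch of this assumption (an illustration, not part of the original notes): it draws samples of increasing size from a parent distribution assumed, for the example only, to be uniform on [0, 1), whose parent mean is 0.5, and prints the sample mean for each size. The helper name `sample_mean` is hypothetical.

```python
import random


def sample_mean(xs):
    """Sample mean x-bar: the sum of the observations divided by their number."""
    return sum(xs) / len(xs)


# Assumed parent population for illustration: uniform on [0, 1), so mu = 0.5.
random.seed(1)  # fixed seed so the run is reproducible
means = {}
for n in (10, 1_000, 100_000):
    xs = [random.random() for _ in range(n)]
    means[n] = sample_mean(xs)
    print(f"N = {n:>6}: sample mean = {means[n]:.4f}  (parent mean = 0.5)")
```

As N grows, the printed sample means should cluster ever more tightly around 0.5; the exact digits depend on the seed.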
Variance
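The word "variance" is used to describe a property of a probability distribution or a set of observations/measurements made in an experiment which gives an indication of how much an observation will deviate from an average value.
A deviation <math>d_i</math> of any measurement <math>x_i</math> from a parent distribution with a mean <math>\mu</math> can be defined as
:<math>d_i = x_i - \mu</math>
The deviations should average to ZERO for an infinite number of observations, by definition of the mean:
:<math>\lim_{N \rightarrow \infty} \frac{1}{N}\sum_{i=1}^N (x_i - \mu) = \lim_{N \rightarrow \infty} \frac{1}{N}\sum_{i=1}^N x_i - \mu = \mu - \mu = 0</math>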
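Definition of the average deviation
But the AVERAGE DEVIATION <math>\alpha</math> is given by an average of the magnitude of the deviations:
:<math>\alpha = \lim_{N \rightarrow \infty} \frac{1}{N}\sum_{i=1}^N \left|x_i - \mu\right|</math>
- <math>\alpha</math> = a measure of the dispersion of the expected observations about the mean
Taking the absolute value, though, is cumbersome when performing a statistical analysis, so one may express this dispersion in terms of the variance.
A typical variable used to denote the variance is <math>\sigma^2</math>, and it is defined as
:<math>\sigma^2 = \lim_{N \rightarrow \infty} \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2</math>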
Standard Deviation
The standard deviation is defined as the square root of the variance:
- S.D. = <math>\sigma = \sqrt{\sigma^2}</math>
The mean should be thought of as a parameter which characterizes the observations we are making in an experiment.  In general the mean specifies the probability distribution that is representative of the observable we are trying to measure through experimentation.
The variance characterizes the uncertainty associated with our experimental attempts to determine the "true" value.  Although the mean and true value may not be equal, their difference should be less than the uncertainty given by the governing probability distribution.
Another Expression for Variance
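Using the definition of variance (omitting the limit as <math>N \rightarrow \infty</math>):
- Evaluating the definition of variance, and using <math>\frac{1}{N}\sum_{i=1}^N x_i = \mu</math>,
:<math>\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^N \left(x_i^2 - 2x_i\mu + \mu^2\right) = \frac{1}{N}\sum_{i=1}^N x_i^2 - 2\mu\frac{1}{N}\sum_{i=1}^N x_i + \mu^2 = \frac{1}{N}\sum_{i=1}^N x_i^2 - \mu^2 = \langle x^2 \rangle - \mu^2</math>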
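The deviation, average deviation and variance definitions, together with the shortcut just derived, can be checked on a small data set. This is a sketch under an assumption made only for the example: the eight values are treated as the whole parent population, so their mean is exactly the parent mean mu = 5 and the finite-N identity holds exactly. The function names are illustrative.

```python
import math


def mean_deviation(xs, mu):
    """Average of d_i = x_i - mu; zero when mu is the mean of xs."""
    return sum(x - mu for x in xs) / len(xs)


def average_deviation(xs, mu):
    """alpha = (1/N) * sum |x_i - mu|."""
    return sum(abs(x - mu) for x in xs) / len(xs)


def variance_by_definition(xs, mu):
    """sigma^2 = (1/N) * sum (x_i - mu)^2."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def variance_by_shortcut(xs, mu):
    """sigma^2 = (1/N) * sum x_i^2 - mu^2 (exact only when mu equals the mean of xs)."""
    return sum(x * x for x in xs) / len(xs) - mu ** 2


xs = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data whose mean is exactly 5
mu = 5
print("mean deviation   :", mean_deviation(xs, mu))            # 0.0
print("average deviation:", average_deviation(xs, mu))         # 1.5
print("variance (def.)  :", variance_by_definition(xs, mu))    # 4.0
print("variance (short) :", variance_by_shortcut(xs, mu))      # 4.0
print("std. deviation   :", math.sqrt(variance_by_definition(xs, mu)))  # 2.0
```

Note that the average deviation (1.5) and the standard deviation (2.0) are different measures of the same spread; they are not expected to agree.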
Average for an unknown probability distribution (parent population)
If the "Parent Population" is not known, and you are just given a list of numbers with no indication of the probability distribution that they were drawn from, then the average and variance may be calculated as shown below.
Arithmetic Mean and variance
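If <math>N</math> observations are made in an experiment, then the arithmetic mean of those observations is defined as
:<math>\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i</math>
The "unbiased" variance of the above sample is defined as
:<math>s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \bar{x})^2</math>
- If you were told that the average is <math>\mu</math>, then you can calculate the "true" variance of the above sample as
:<math>\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2</math>
- <math>\sigma</math> = RMS Error = Root Mean Squared Error
- Note
- RMS = Root Mean Square = <math>\sqrt{\langle x^2 \rangle} = \sqrt{\frac{1}{N}\sum_{i=1}^N x_i^2}</math>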
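A short Python sketch of the sample formulas in this section, assuming the data are a plain list of numbers. The standard-library functions `statistics.variance` (which divides by N-1) and `statistics.pvariance` (which divides by N and accepts a known mean via `mu`) are used only as a cross-check; the other function names are hypothetical.

```python
import math
import statistics


def arithmetic_mean(xs):
    """x-bar = (1/N) * sum x_i."""
    return sum(xs) / len(xs)


def unbiased_variance(xs):
    """s^2 = 1/(N-1) * sum (x_i - x-bar)^2, used when the parent mean is unknown."""
    xbar = arithmetic_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def variance_known_mean(xs, mu):
    """sigma^2 = 1/N * sum (x_i - mu)^2, used when the parent mean mu is given."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def rms(xs):
    """Root mean square: sqrt((1/N) * sum x_i^2)."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


xs = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
print(arithmetic_mean(xs), unbiased_variance(xs), statistics.variance(xs))
print(variance_known_mean(xs, 5), statistics.pvariance(xs, mu=5))
print("RMS error (sigma):", math.sqrt(variance_known_mean(xs, 5)), " RMS of x:", rms(xs))
```

Here the unbiased variance (32/7, about 4.57) comes out larger than the known-mean variance (4.0) because it divides by N-1 instead of N.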
Statistical Variance decreases with N
The repetition of an experiment can decrease the STATISTICAL error of the experiment.
Consider the following:
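The average value of the mean of a sample of <math>N</math> observations drawn from the parent population is the same as the average value of each observation. (The average of the averages is the same as one of the averages.)
- <math>\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i</math> = sample mean
- <math>\langle \bar{x} \rangle = \frac{1}{N}\sum_{i=1}^N \langle x_i \rangle = \frac{1}{N} N \langle x \rangle = \langle x \rangle</math> if all means are the same
This is the reason why the sample mean is a measure of the population average (<math>\langle \bar{x} \rangle = \mu</math>).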
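Now consider the variance of the average of the averages (this is not the variance of the individual measurements but the variance of their means):
:<math>\sigma^2_{\bar{x}} = \langle \bar{x}^2 \rangle - \langle \bar{x} \rangle^2 = \frac{1}{N^2}\left\langle \left(\sum_{i=1}^N x_i\right)^2 \right\rangle - \mu^2 = \frac{1}{N^2}\left[\sum_{i=1}^N \langle x_i^2 \rangle + \sum_{i \ne j} \langle x_i x_j \rangle\right] - \mu^2</math>
- If the measurements are all independent
- Then <math>\langle x_i x_j \rangle = \langle x_i \rangle \langle x_j \rangle = \mu^2</math> : if <math>x_i</math> is independent of <math>x_j</math> (<math>i \ne j</math>)
The above part of the proof needs work
I use the expression <math>\sigma^2 = \langle x^2 \rangle - \mu^2</math> again, except for <math>x_i</math> and not <math>\bar{x}</math>, and turn it around so
:<math>\langle x_i^2 \rangle = \sigma^2 + \mu^2</math>
Now I have
:<math>\sigma^2_{\bar{x}} = \frac{1}{N^2}\left[N\left(\sigma^2 + \mu^2\right) + N(N-1)\mu^2\right] - \mu^2 = \frac{\sigma^2}{N} + \mu^2 - \mu^2 = \frac{\sigma^2}{N}</math>
- Number of cross terms is N*(N-1)
The above is the essence of counting statistics.
It says that the STATISTICAL error in an experiment decreases as a function of <math>\frac{1}{\sqrt{N}}</math>:
:<math>\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}</math>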
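The 1/sqrt(N) behaviour can be checked with a quick Monte Carlo sketch (a numerical check swapped in for illustration, not the proof above). It assumes a parent distribution uniform on [0, 1) with variance 1/12, repeats an experiment of N = 16 observations many times, and compares the variance of the resulting means with sigma^2/N. The constants are arbitrary choices.

```python
import random
import statistics

random.seed(2)        # reproducible run
SIGMA2 = 1 / 12       # variance of the assumed uniform [0, 1) parent distribution
N = 16                # observations per experiment
TRIALS = 20_000       # number of repeated experiments

# Mean of each simulated experiment.
means = [statistics.fmean(random.random() for _ in range(N)) for _ in range(TRIALS)]

var_of_means = statistics.pvariance(means)
predicted = SIGMA2 / N
print(f"variance of the means = {var_of_means:.6f}")
print(f"sigma^2 / N           = {predicted:.6f}")
print(f"sigma_xbar / sigma    = {(var_of_means / SIGMA2) ** 0.5:.4f} (1/sqrt(N) = {N ** -0.5:.4f})")
```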
Biased and Unbiased variance
Where does this idea of an unbiased variance come from?
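Using the same procedure as the previous section, let's look at the average of the sample variances.
A sample variance of <math>N</math> measurements of <math>x_i</math> is
:<math>\sigma_s^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^N x_i^2 - \bar{x}^2</math>
To determine the "true" variance, consider taking the average of several sample variances (this is the same argument used above which led to <math>\langle \bar{x} \rangle = \mu</math>):
:<math>\langle \sigma_s^2 \rangle = \left\langle \frac{1}{N}\sum_{i=1}^N x_i^2 \right\rangle - \langle \bar{x}^2 \rangle = \left(\sigma^2 + \mu^2\right) - \left(\frac{\sigma^2}{N} + \mu^2\right) = \frac{N-1}{N}\sigma^2</math>
- <math>\langle \bar{x}^2 \rangle = \sigma^2_{\bar{x}} + \mu^2 = \frac{\sigma^2}{N} + \mu^2</math> : as shown previously
- <math>\left\langle \frac{1}{N}\sum_{i=1}^N x_i^2 \right\rangle = \langle x^2 \rangle = \sigma^2 + \mu^2</math> : also shown previously; the universe average is the same as the sample average
Here
- <math>\sigma_s^2</math> = the sample variance
- <math>\langle \sigma_s^2 \rangle</math> = an average of all possible sample variances; multiplied by <math>\frac{N}{N-1}</math>, it should be equivalent to the "true" population variance <math>\sigma^2</math>
- <math>s^2 = \frac{N}{N-1}\sigma_s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \bar{x})^2</math> = unbiased sample variance, since <math>\langle s^2 \rangle = \sigma^2</math>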
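A Monte Carlo sketch of this result, assuming (for illustration only) a uniform [0, 1) parent distribution with sigma^2 = 1/12 and small samples of N = 5. Averaging the 1/N sample variance over many samples should give roughly (N-1)/N sigma^2, while the 1/(N-1) version should give roughly sigma^2. Function names and constants are hypothetical.

```python
import random
import statistics

random.seed(3)   # reproducible run
SIGMA2 = 1 / 12  # true variance of the assumed parent distribution
N = 5            # observations per sample
TRIALS = 50_000  # number of samples averaged over


def biased_variance(xs):
    """sigma_s^2 = (1/N) * sum (x_i - x-bar)^2."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)


def unbiased_variance(xs):
    """s^2 = N/(N-1) * sigma_s^2 = 1/(N-1) * sum (x_i - x-bar)^2."""
    return biased_variance(xs) * len(xs) / (len(xs) - 1)


samples = [[random.random() for _ in range(N)] for _ in range(TRIALS)]
avg_biased = statistics.fmean(biased_variance(s) for s in samples)
avg_unbiased = statistics.fmean(unbiased_variance(s) for s in samples)

print(f"<sigma_s^2> = {avg_biased:.5f}   (N-1)/N * sigma^2 = {(N - 1) / N * SIGMA2:.5f}")
print(f"<s^2>       = {avg_unbiased:.5f}   sigma^2           = {SIGMA2:.5f}")
```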
Probability Distributions
Mean(Expectation value) and variance
Mean of Discrete Probability Distribution
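In the case that you know the probability distribution, you can calculate the mean or expectation value E(x) and standard deviation as shown below.
For a discrete probability distribution
:<math>\mu = E(x) = \lim_{N \rightarrow \infty} \frac{1}{N}\sum_{i=1}^n x_i N P(x_i) = \sum_{i=1}^n x_i P(x_i)</math>
where
<math>N</math> = number of observations
<math>n</math> = number of different possible observable variables
<math>x_i</math> = ith observable quantity
<math>P(x_i)</math> = probability of observing <math>x_i</math> = probability mass function for a discrete probability distribution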
Mean of a continuous probability distribution
The average (mean) of a sample drawn from any probability distribution is defined in terms of the expectation value E(x) such that
:<math>\mu = \lim_{N \rightarrow \infty} \bar{x} = E(x)</math>
The expectation value for a continuous probability distribution is calculated as
:<math>E(x) = \int_{-\infty}^{\infty} x P(x)\, dx</math>
Variance
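Variance of a discrete PDF
:<math>\sigma^2 = \sum_{i=1}^n (x_i - \mu)^2 P(x_i)</math>
Variance of a Continuous PDF
:<math>\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 P(x)\, dx</math>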
Expectation of Arbitrary function
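If <math>f(x)</math> is an arbitrary function of a variable <math>x</math> governed by a probability distribution <math>P(x)</math>,
then the expectation value of <math>f(x)</math> is
:<math>E[f(x)] = \sum_{i=1}^n f(x_i) P(x_i)</math>
or, if it is a continuous distribution,
:<math>E[f(x)] = \int_{-\infty}^{\infty} f(x) P(x)\, dx</math>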
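For a discrete distribution these sums are straightforward to evaluate exactly. The sketch below uses Python's `fractions.Fraction` so the arithmetic is exact; the three-outcome distribution is a made-up example, and `expectation` and `variance` are illustrative names.

```python
from fractions import Fraction


def expectation(values, probs, f=lambda x: x):
    """E[f(x)] = sum_i f(x_i) P(x_i) for a discrete probability distribution."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(f(x) * p for x, p in zip(values, probs))


def variance(values, probs):
    """sigma^2 = sum_i (x_i - mu)^2 P(x_i)."""
    mu = expectation(values, probs)
    return expectation(values, probs, lambda x: (x - mu) ** 2)


# Hypothetical distribution: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4.
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

mu = expectation(values, probs)                              # 1
second_moment = expectation(values, probs, lambda x: x * x)  # 3/2
print(mu, variance(values, probs), second_moment - mu ** 2)  # 1 1/2 1/2
```

The last line also confirms the identity sigma^2 = E(x^2) - mu^2 for this example.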
Uniform
The Uniform probability distribution function is a continuous probability function over a specified interval in which any value within the interval has the same probability of occurring.
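Mathematically the uniform distribution over an interval from a to b is given by
:<math>P_U(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \mbox{otherwise} \end{cases}</math>
Mean of Uniform PDF
:<math>\mu = \int_a^b \frac{x}{b-a}\, dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}</math>
Variance of Uniform PDF
:<math>\sigma^2 = \int_a^b \frac{(x-\mu)^2}{b-a}\, dx = \frac{1}{b-a}\left[\frac{(x-\mu)^3}{3}\right]_a^b = \frac{(b-a)^2}{12}</math>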
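The mean and variance of the uniform PDF can also be checked by numerical integration. This sketch applies the midpoint rule (a simple quadrature chosen here for illustration, not something prescribed by these notes) to the continuous expectation-value integrals on an arbitrary interval a = 2, b = 5, where the closed forms give (a+b)/2 = 3.5 and (b-a)^2/12 = 0.75.

```python
def expect_continuous(pdf, f, a, b, steps=100_000):
    """Approximate E[f(x)] = integral_a^b f(x) pdf(x) dx with the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += f(x) * pdf(x)
    return total * h


A, B = 2.0, 5.0  # hypothetical interval


def uniform_pdf(x):
    """P_U(x) = 1/(b-a) inside [a, b], 0 outside."""
    return 1.0 / (B - A) if A <= x <= B else 0.0


mu = expect_continuous(uniform_pdf, lambda x: x, A, B)
var = expect_continuous(uniform_pdf, lambda x: (x - mu) ** 2, A, B)
print(f"mean     = {mu:.6f}  (a+b)/2     = {(A + B) / 2}")
print(f"variance = {var:.6f}  (b-a)^2/12 = {(B - A) ** 2 / 12}")
```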
Now use ROOT to generate uniform distributions.
http://wiki.iac.isu.edu/index.php/TF_ErrAna_InClassLab#Day_3
Binomial Distribution
A binomial random variable describes experiments in which the outcome has only 2 possibilities. The two possible outcomes can be labeled as "success" or "failure". The probabilities may be defined as
- <math>p</math> = the probability of a success
and
- <math>q = 1 - p</math> = the probability of a failure.
If we let <math>X</math> represent the number of successes after repeating the experiment <math>n</math> times, then <math>X</math> is the Binomial random variable with parameters <math>n</math> and <math>p</math>.
Experiments with <math>n = 1</math> are also known as Bernoulli trials.
The number of ways in which the <math>x</math> successful outcomes can be organized in <math>n</math> repeated trials is
:<math>\frac{n!}{x!\,(n-x)!}</math>
- where the <math>!</math> denotes a factorial such that <math>n! = n(n-1)(n-2)\cdots 1</math>.
The expression is known as the binomial coefficient and is represented as
:<math>{n \choose x} = \frac{n!}{x!\,(n-x)!}</math>
The probability of any one ordering of the successes and failures is given by
:<math>p^x q^{n-x}</math>
This means the probability of getting exactly <math>x</math> successes after <math>n</math> trials is
:<math>P_B(x) = {n \choose x} p^x q^{n-x}</math>
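The binomial probability can be evaluated directly with `math.comb` (available in Python 3.8 and later). A minimal sketch, with `binomial_pmf` as an illustrative name:

```python
from math import comb


def binomial_pmf(x, n, p):
    """P_B(x) = C(n, x) p^x q^(n-x): probability of exactly x successes in n trials."""
    q = 1 - p  # probability of a failure
    return comb(n, x) * p ** x * q ** (n - x)


n, p = 4, 0.5
print("orderings with 2 successes:", comb(n, 2))  # 6
for x in range(n + 1):
    print(x, binomial_pmf(x, n, p))
print("sum of all probabilities:", sum(binomial_pmf(x, n, p) for x in range(n + 1)))
```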
Mean
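It can be shown that the expectation value of the distribution is <math>\mu = np</math>:
:<math>E(x) = \sum_{x=0}^n x \frac{n!}{x!\,(n-x)!} p^x q^{n-x}</math>
- <math>= \sum_{x=1}^n \frac{n!}{(x-1)!\,(n-x)!} p^x q^{n-x}</math> : summation starts from x=1 and not x=0 now
- <math>= np \sum_{x=1}^n \frac{(n-1)!}{(x-1)!\,(n-x)!} p^{x-1} q^{n-x}</math> : factor out <math>np</math>; replace n-1 with m everywhere and it looks like a binomial distribution
- <math>= np \sum_{y=0}^{n-1} \frac{(n-1)!}{y!\,(n-1-y)!} p^{y} q^{n-1-y}</math> : change summation index so y=x-1; now n becomes n-1
- <math>= np \sum_{y=0}^{m} \frac{m!}{y!\,(m-y)!} p^{y} q^{m-y}</math> : with m = n-1
- <math>= np\,(q+p)^m</math> : definition of binomial expansion
- <math>= np</math> : q+p = 1
Variance
- Remember <math>\sigma^2 = E(x^2) - \mu^2</math>
To calculate the variance of the Binomial distribution I will just calculate <math>E(x^2)</math> and then subtract off <math>\mu^2</math>.
:<math>E(x^2) = \sum_{x=0}^n x^2 \frac{n!}{x!\,(n-x)!} p^x q^{n-x}</math>
- <math>= np \sum_{x=1}^n x \frac{(n-1)!}{(x-1)!\,(n-x)!} p^{x-1} q^{n-x}</math> : x=0 term is zero so no contribution
Let m=n-1 and y=x-1
:<math>E(x^2) = np \sum_{y=0}^{m} (y+1) \frac{m!}{y!\,(m-y)!} p^{y} q^{m-y} = np\left[mp + 1\right] = np\left[(n-1)p + 1\right] = n^2p^2 - np^2 + np</math>
- the sum weighted by <math>y</math> is the mean <math>mp</math> of a binomial distribution with <math>m</math> trials, and the sum weighted by 1 is <math>(q+p)^m = 1</math>
:<math>\sigma^2 = E(x^2) - \mu^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1-p) = npq</math>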
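Both results, mu = np and sigma^2 = npq, can be confirmed exactly for particular n and p by summing the distribution term by term with rational arithmetic. The (n, p) pairs below are arbitrary examples, and `binomial_moments` is an illustrative helper.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Return (E(x), E(x^2) - E(x)^2) by summing the binomial distribution term by term."""
    q = 1 - p
    probs = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * prob for x, prob in enumerate(probs))
    second_moment = sum(x * x * prob for x, prob in enumerate(probs))
    return mean, second_moment - mean ** 2


for n, p in [(4, Fraction(1, 2)), (10, Fraction(1, 6)), (25, Fraction(3, 10))]:
    mean, var = binomial_moments(n, p)
    print(f"n={n:2d} p={p}: mean={mean} (np={n * p}), variance={var} (npq={n * p * (1 - p)})")
```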
Examples
The number of times a coin toss is heads.
The probability of a coin landing with the head of the coin facing up is
- <math>P = \frac{1}{2}</math> = a (discrete) uniform distribution over the two outcomes a=0 (tails) and b=1 (heads).
Suppose you toss a coin 4 times. Here are the possible outcomes
| Order # | Trial 1 | Trial 2 | Trial 3 | Trial 4 | # of Heads |
| 1 | t | t | t | t | 0 | 
| 2 | h | t | t | t | 1 | 
| 3 | t | h | t | t | 1 | 
| 4 | t | t | h | t | 1 | 
| 5 | t | t | t | h | 1 | 
| 6 | h | h | t | t | 2 | 
| 7 | h | t | h | t | 2 | 
| 8 | h | t | t | h | 2 | 
| 9 | t | h | h | t | 2 | 
| 10 | t | h | t | h | 2 | 
| 11 | t | t | h | h | 2 | 
| 12 | t | h | h | h | 3 | 
| 13 | h | t | h | h | 3 | 
| 14 | h | h | t | h | 3 | 
| 15 | h | h | h | t | 3 | 
| 16 | h | h | h | h | 4 | 
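The probability of order #1 happening is
P(order #1) = <math>\left(\frac{1}{2}\right)^0\left(\frac{1}{2}\right)^4 = \frac{1}{16}</math>
P(order #2) = <math>\left(\frac{1}{2}\right)^1\left(\frac{1}{2}\right)^3 = \frac{1}{16}</math>
The probability of observing the coin land on heads 3 times out of 4 trials is
:<math>P_B(3) = \frac{4!}{3!\,1!}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^1 = \frac{4}{16} = \frac{1}{4}</math>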
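The table can be regenerated by brute force: enumerate all 2^4 = 16 equally likely orderings, count the heads in each, and read off the probability of exactly 3 heads. A minimal sketch using only the Python standard library:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordering of 4 tosses, e.g. ('h', 't', 't', 'h'); each has probability (1/2)^4.
outcomes = list(product("th", repeat=4))
heads = Counter(seq.count("h") for seq in outcomes)

for k in range(5):
    print(f"{k} heads: {heads[k]} orderings, P = {Fraction(heads[k], len(outcomes))}")

p_three_heads = Fraction(heads[3], len(outcomes))
print("P(3 heads in 4 tosses) =", p_three_heads)  # 1/4
```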
A 6 sided die
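A die is a 6-sided cube with dots on each side. Each side has a unique number of dots, from 1 to 6.
P=1/6 = probability of landing on any side of the cube.
Expectation value:
:<math>E(x) = \sum_{i=1}^6 x_i P(x_i) = \frac{1}{6}(1+2+3+4+5+6) = \frac{21}{6} = 3.5</math>
- The expected (average) value for rolling a single die.
The variance:
:<math>\sigma^2 = \sum_{i=1}^6 (x_i - \mu)^2 P(x_i) = \frac{1}{6}\left(6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25\right) = \frac{17.5}{6} = \frac{35}{12} \approx 2.92</math>
If we roll the die 10 times, what is the probability that X dice will show a 6?
A success will be that the die landed with 6 dots face up.
So the probability of this is 1/6 (p=1/6), and we toss it 10 times (n=10), so the binomial distribution function for a success/fail experiment says
:<math>P_B(x) = {10 \choose x}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{10-x}</math>
So the probability that the die will have 6 dots face up in 4 of the 10 rolls is
:<math>P_B(4) = \frac{10!}{4!\,6!}\left(\frac{1}{6}\right)^4\left(\frac{5}{6}\right)^6 = 210 \times \frac{5^6}{6^{10}} \approx 0.054</math>
Mean = <math>np = \frac{10}{6} \approx 1.67</math>, Variance = <math>npq = 10 \times \frac{1}{6} \times \frac{5}{6} = \frac{25}{18} \approx 1.39</math>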
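The die numbers above can be reproduced with exact fractions; this sketch is only a check of the arithmetic, using the same p = 1/6 and n = 10.

```python
from fractions import Fraction
from math import comb

p_face = Fraction(1, 6)
faces = range(1, 7)

# Single roll: expectation value and variance.
mean = sum(face * p_face for face in faces)
variance = sum((face - mean) ** 2 * p_face for face in faces)
print("single roll: E(x) =", mean, " sigma^2 =", variance)  # 7/2 and 35/12

# Ten rolls, "success" = a six.
n, p = 10, Fraction(1, 6)
p_four_sixes = comb(n, 4) * p ** 4 * (1 - p) ** (n - 4)
print("P(four sixes in ten rolls) =", float(p_four_sixes))  # about 0.054
print("mean =", n * p, " variance =", n * p * (1 - p))      # 5/3 and 25/18
```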
Poisson Distribution
The Poisson distribution is an approximation to the binomial distribution in the event that the probability of a success is quite small (<math>p \ll 1</math>). As the number of repeated observations (n) gets large, the binomial distribution becomes more difficult to evaluate because of the leading term
:<math>\frac{n!}{x!\,(n-x)!}</math>
The Poisson distribution overcomes this problem by defining the probability in terms of the average <math>\mu = np</math>:
:<math>P_P(x) = \frac{\mu^x e^{-\mu}}{x!}</math>
where
- <math>\mu</math> = average number of occurrences of an event per unit interval
Poisson as approximation to Binomial
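To drive home the idea that the Poisson distribution approximates a Binomial distribution at small p and large n, consider the following derivation.
The Binomial Probability Distribution is
:<math>P_B(x) = \frac{n!}{x!\,(n-x)!} p^x (1-p)^{n-x}</math>
The term
:<math>\frac{n!}{(n-x)!} = n(n-1)(n-2)\cdots(n-x+1)</math>
- There are <math>x</math> factors above, each close to <math>n</math> if <math>x \ll n</math>
- then <math>\frac{n!}{(n-x)!} \approx n^x</math>
- example: <math>\frac{1000!}{997!} = 1000 \times 999 \times 998 \approx 0.997 \times 10^9 \approx 1000^3</math>
This leaves us with
:<math>P_B(x) \approx \frac{n^x}{x!} p^x (1-p)^{n-x} = \frac{(np)^x}{x!}(1-p)^{n-x} = \frac{\mu^x}{x!}(1-p)^{n-x}</math>
- For <math>p \ll 1</math> and <math>x \ll n</math>: <math>(1-p)^{n-x} \approx (1-p)^n = \left(1 - \frac{\mu}{n}\right)^n \rightarrow e^{-\mu}</math> as <math>n \rightarrow \infty</math>, so
:<math>P_B(x) \rightarrow \frac{\mu^x e^{-\mu}}{x!} = P_P(x)</math>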
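A numerical illustration of this limit, comparing the two probability functions term by term. The parameter choices (n = 10, p = 0.2 versus n = 1000, p = 0.002, both with mu = np = 2) are arbitrary; for the large-n, small-p case the largest difference should come out much smaller.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """P_B(x) = C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    """P_P(x) = mu^x e^(-mu) / x!."""
    return mu ** x * exp(-mu) / factorial(x)


def max_difference(n, p, x_max=10):
    """Largest |P_B(x) - P_P(x)| for x = 0..x_max, with mu = n p."""
    mu = n * p
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(x_max + 1))


small_n = max_difference(10, 0.2)      # mu = 2, modest n, not-so-small p
large_n = max_difference(1000, 0.002)  # mu = 2, large n, small p
print(f"n = 10,   p = 0.2  : max |P_B - P_P| = {small_n:.5f}")
print(f"n = 1000, p = 0.002: max |P_B - P_P| = {large_n:.5f}")
```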
Derivation of Poisson Distribution
The mean free path of a particle traversing a volume of material is a common problem in nuclear and particle physics. If you want to shield your apparatus or yourself from radiation you want to know how far the radiation travels through material.
The mean free path is the average distance a particle travels through a material before interacting with the material.
- If we let <math>\lambda</math> represent the mean free path
- Then the probability of having an interaction after a (small) distance <math>x</math> is <math>\frac{x}{\lambda}</math>
as a result
- <math>1 - \frac{dx}{\lambda}</math> = probability of getting no events after a length dx
When we consider <math>\frac{x}{\lambda} \ll 1</math> (we are looking for small distances such that the probability of no interactions is high)
:<math>P(0,x, \lambda) = e^{\frac{-x}{\lambda}} \approx 1 - \frac{x}{\lambda}</math>
Now we wish to find the probability of finding <math>N</math> events over a distance <math>x</math> given the mean free path.
This is calculated as a joint probability. If we wanted to know the probability of only one interaction over a distance <math>L</math>, then we would multiply the probability that an interaction happened after a distance <math>dx</math> by the probability that no more interactions happen by the time the particle reaches the distance <math>L</math>.
In other words, we need the probability of the event happening within a distance <math>dx</math> times the probability of it not happening over the rest of the length.
1.) Assume that the average rate of an event is constant over a given time interval and that the events are randomly distributed over that time interval.
2.) The probability of NO events occurring over the time interval t is exponential such that
:<math>P(0,t,\tau) = e^{\frac{-t}{\tau}}</math>
where <math>\tau</math> is a constant of proportionality associated with the mean time.
The change in the probability as a function of time is given by
:<math>\frac{dP(0,t,\tau)}{dt} = -\frac{1}{\tau} e^{\frac{-t}{\tau}}</math>
To find the probability of actually observing a non-zero value of <math>x</math> events in the time interval t, you would integrate the differential probability.
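The result this derivation leads to, that the number of interactions over a length follows a Poisson distribution with mean L/lambda, can be checked with a Monte Carlo sketch. This swaps simulation in for the analytic integral: it assumes each free path is drawn independently from an exponential distribution with mean lambda (consistent with P(0,x,lambda) = e^(-x/lambda)), counts interactions along a path of length L, and compares the observed frequencies with mu^N e^(-mu)/N!. The values L = 3 and lambda = 1.5 are hypothetical.

```python
import math
import random
from collections import Counter


def count_interactions(length, mfp, rng):
    """Count interactions along a path of the given length.

    Successive free paths are drawn from an exponential distribution with
    mean `mfp` (the mean free path lambda).
    """
    position = rng.expovariate(1 / mfp)  # distance to the first interaction
    count = 0
    while position < length:
        count += 1
        position += rng.expovariate(1 / mfp)  # distance to the next interaction
    return count


rng = random.Random(4)                   # reproducible run
LENGTH, MFP, TRIALS = 3.0, 1.5, 100_000  # hypothetical L, lambda, number of particles
mu = LENGTH / MFP                        # expected number of interactions, L / lambda = 2
counts = Counter(count_interactions(LENGTH, MFP, rng) for _ in range(TRIALS))

for n in range(6):
    simulated = counts[n] / TRIALS
    poisson = mu ** n * math.exp(-mu) / math.factorial(n)
    print(f"N = {n}: simulated {simulated:.4f}   Poisson {poisson:.4f}")
```

The N = 0 row is the "no interaction" probability e^(-L/lambda) from the start of this derivation.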
Gaussian
Lorentzian
Gamma
Beta
Breit Wigner
Cauchy
Chi-squared
Exponential
Landau
Log Normal
t-Distribution
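The Student's t-distribution is the distribution of the statistic
:<math>t = \frac{\bar{x} - \mu}{s/\sqrt{N}}</math>
where <math>s</math> is the unbiased sample standard deviation; it has <math>N-1</math> degrees of freedom.
The t-distribution is typically used for small sample sizes, where the parent population mean <math>\mu</math> is known (or hypothesized) but the parent variance is not and must be estimated by <math>s^2</math>.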
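A sketch of computing the t statistic for a small sample, assuming the statistic form given here with s taken from `statistics.stdev` (the N-1 sample standard deviation). The data and the hypothesized mean mu = 4 are made up; turning t into a probability would additionally require the t-distribution's CDF with N-1 degrees of freedom, which the Python standard library does not provide.

```python
import math
import statistics


def t_statistic(xs, mu):
    """t = (x-bar - mu) / (s / sqrt(N)), with s the unbiased sample standard deviation."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # divides by N - 1
    return (xbar - mu) / (s / math.sqrt(n))


xs = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
t = t_statistic(xs, mu=4.0)
print(f"t = {t:.4f} with {len(xs) - 1} degrees of freedom")
```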
F-distribution
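Skewness and Kurtosis
Skewness
Skewness = <math>\frac{1}{N}\sum_{i=1}^N \left(\frac{x_i - \bar{x}}{\sigma}\right)^3</math>
where <math>\sigma</math> is the standard deviation of the sample
Measures the symmetry of the distribution (zero for a symmetric distribution)
Kurtosis
Kurtosis = <math>\frac{1}{N}\sum_{i=1}^N \left(\frac{x_i - \bar{x}}{\sigma}\right)^4</math>
where <math>\sigma</math> is the standard deviation of the sample
Measures the "pointiness" of the distribution
K=3 for a Normal Distribution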
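A small sketch of both quantities, assuming the moment definitions written above with sigma taken as the 1/N (population) standard deviation of the data; other conventions (for example N-1 corrections, or "excess" kurtosis K-3) exist and give slightly different numbers. The data sets are made up.

```python
import math
import random


def standardized_moment(xs, k):
    """(1/N) * sum ((x_i - x-bar) / sigma)^k, with sigma the 1/N standard deviation."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return sum(((x - xbar) / sigma) ** k for x in xs) / n


def skewness(xs):
    return standardized_moment(xs, 3)


def kurtosis(xs):
    return standardized_moment(xs, 4)


print(skewness([1, 2, 3, 4, 5]), kurtosis([1, 2, 3, 4, 5]))  # symmetric: ~0 and ~1.7
print(skewness([1, 1, 1, 10]))                                # long right tail: positive

rng = random.Random(5)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
print(f"kurtosis of a normal sample = {kurtosis(normal_sample):.3f}")  # close to 3
```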