Statistical Inference

=Frequentist -vs- Bayesian Inference=

When it comes to testing a hypothesis, there are two dominant philosophies: the Frequentist and the Bayesian perspective.

The dominant discussion for this class will be from the Frequentist perspective.

==Frequentist statistical inference==

:Statistical inference is made using a null-hypothesis test; that is, one that answers the question: assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?

The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.
Thus, if <math>n_t</math> is the total number of trials and <math>n_x</math> is the number of trials in which the event <math>x</math> occurred, the probability <math>P(x)</math> of the event occurring is approximated by the relative frequency:

:<math>P(x) \approx \frac{n_x}{n_t}.</math>
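
For example, this definition can be checked by simulation. A minimal C++ sketch (the die, seed, and trial count are made-up choices for illustration):

<syntaxhighlight lang="cpp">
#include <iostream>
#include <random>

int main() {
    std::mt19937 gen(42);                          // fixed seed, reproducible
    std::uniform_int_distribution<int> die(1, 6);  // a fair six-sided die

    const int n_t = 100000;   // total number of trials
    int n_x = 0;              // trials in which the event (rolling a 6) occurred
    for (int i = 0; i < n_t; ++i)
        if (die(gen) == 6) ++n_x;

    // the relative frequency n_x/n_t approximates P(6) = 1/6
    std::cout << "P(6) ~ " << static_cast<double>(n_x) / n_t << '\n';
}
</syntaxhighlight>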

==Bayesian inference==

:Statistical inference is made by using evidence or observations to update or to newly infer the probability that a hypothesis may be true. The name "Bayesian" comes from the frequent use of Bayes' theorem in the inference process.

Bayes' theorem relates the conditional and marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:

:<math>P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}.</math>

Each term in Bayes' theorem has a conventional name:
* P(''A'') is the prior probability or marginal probability of ''A''. It is "prior" in the sense that it does not take into account any information about ''B''.
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant.
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.

Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.

===Example===
Suppose there is a school having 60% boys and 40% girls as students.

The female students wear trousers or skirts in equal numbers; the boys all wear trousers.

An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers.

What is the probability this student is a girl?

The correct answer can be computed using Bayes' theorem.

:<math>P(A) \equiv</math> probability that the student observed is a girl = 0.4
:<math>P(B) \equiv</math> probability that the student observed is wearing trousers = (60+20)/100 = 0.8
:<math>P(B|A) \equiv</math> probability the student is wearing trousers given that the student is a girl
:<math>P(A|B) \equiv</math> probability the student is a girl given that the student is wearing trousers

:<math>P(B|A) = 0.5</math>

:<math>P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.</math>
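
The arithmetic can be checked directly; a small C++ sketch of the example above:

<syntaxhighlight lang="cpp">
#include <iostream>

int main() {
    const double pA  = 0.4;   // P(A): the student is a girl
    const double pBA = 0.5;   // P(B|A): a girl wearing trousers
    const double pB  = 0.8;   // P(B): any student wearing trousers = (60+20)/100

    // Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
    std::cout << "P(girl | trousers) = " << pBA * pA / pB << '\n';  // 0.25
}
</syntaxhighlight>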

=Method of Maximum Likelihood=

;The principle of maximum likelihood is the cornerstone of Frequentist-based hypothesis testing and may be written as
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observation is a maximum.

=Least Squares Fit to a Line=

==Applying the Method of Maximum Likelihood==

Our objective is to find the best straight-line fit for an expected linear relationship between dependent variate <math>(y)</math> and independent variate <math>(x)</math>.

If we let <math>y_0(x)</math> represent the "true" linear relationship between independent variate <math>x</math> and dependent variate <math>y</math> such that

:<math>y_0(x) = A + B x</math>

then the probability of observing the value <math>y_i</math> with a standard deviation <math>\sigma_i</math> is given by

:<math>P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}</math>

assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.

If you repeat the experiment <math>N</math> times then the probability of deducing the values <math>A</math> and <math>B</math> from the data can be expressed as the joint probability of finding <math>N</math> <math>y_i</math> values, one for each <math>x_i</math>:

:<math>P(A,B) = \prod_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}</math>
:<math>= \left ( \prod_i \frac{1}{\sigma_i \sqrt{2 \pi}}\right ) e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}</math> = Max

The maximum probability will result in the best values for <math>A</math> and <math>B</math>.

This means

:<math>\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2</math> = Min

The minimum of <math>\chi^2</math> occurs where its partial derivative with respect to each of the parameters A and B vanishes; i.e.,

:<math>\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0</math>

:<math>\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0</math>

;If <math>\sigma_i = \sigma</math> : all variances are the same (weighted fits don't make this assumption)

Then

:<math>\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0</math>

:<math>\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0</math>

or

:<math>\sum \left ( y_i - A - B x_i \right)=0</math>
:<math>\sum  x_i \left( y_i - A - B x_i \right)=0</math>

The above represents a set of 2 simultaneous equations with 2 unknowns which can be solved.

:<math>\sum y_i = N A + B \sum x_i</math>
:<math>\sum x_i y_i = A \sum x_i + B \sum x_i^2</math>

:<math>\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)</math>

===The Method of Determinants===
For the matrix problem

:<math>\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)</math>

the above can be written as

:<math>y_1 = a_{11} x_1 + a_{12} x_2</math>
:<math>y_2 = a_{21} x_1 + a_{22} x_2</math>

Solving for <math>x_1</math>, assuming <math>y_1</math> and <math>y_2</math> are known, multiply the first equation by <math>a_{22}</math> and the second by <math>-a_{12}</math>:

:<math>a_{22} \left( y_1 = a_{11} x_1 + a_{12} x_2 \right)</math>
:<math>-a_{12} \left( y_2 = a_{21} x_1 + a_{22} x_2 \right)</math>

Adding the two gives

:<math>\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1</math>
:<math>\left| \begin{array}{cc} y_1 & a_{12}\\ y_2 & a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| x_1</math>

or
:<math>x_1 = \frac{\left| \begin{array}{cc} y_1 & a_{12}\\ y_2 & a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| }</math> and similarly <math>x_2 = \frac{\left| \begin{array}{cc}  a_{11} & y_1 \\  a_{21} & y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| }</math>

Solutions exist as long as

:<math>\left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| \ne 0</math>
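
A minimal numerical sketch of the 2x2 case (the coefficients and right-hand side are made up; any nonsingular system would do):

<syntaxhighlight lang="cpp">
#include <iostream>

// Solve y = a x for a 2x2 system by the method of determinants (Cramer's rule).
int main() {
    double a11 = 2, a12 = 1, a21 = 1, a22 = 3;   // example coefficients
    double y1 = 5, y2 = 10;

    double det = a11 * a22 - a12 * a21;          // must be nonzero for a solution
    double x1 = (y1 * a22 - a12 * y2) / det;
    double x2 = (a11 * y2 - y1 * a21) / det;
    std::cout << "x1 = " << x1 << ", x2 = " << x2 << '\n';  // x1 = 1, x2 = 3
}
</syntaxhighlight>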

Applying the method of determinants to the least squares problem above gives

:<math>A = \frac{\left| \begin{array}{cc} \sum y_i & \sum x_i\\ \sum x_i y_i & \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i  & \sum x_i^2 \end{array}\right|}</math>
:<math>B = \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i  & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i  & \sum x_i^2 \end{array}\right|}</math>
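
A sketch of these closed-form expressions in C++ (the data points are made up and lie exactly on y = 1 + 2x, so the fit should return A = 1, B = 2):

<syntaxhighlight lang="cpp">
#include <cstddef>
#include <iostream>
#include <vector>

// Unweighted least-squares line y = A + B x via the determinant formulas above.
void fit_line(const std::vector<double>& x, const std::vector<double>& y,
              double& A, double& B) {
    const std::size_t N = x.size();
    double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (std::size_t i = 0; i < N; ++i) {
        Sx += x[i]; Sy += y[i]; Sxx += x[i] * x[i]; Sxy += x[i] * y[i];
    }
    const double det = N * Sxx - Sx * Sx;   // denominator determinant
    A = (Sy * Sxx - Sx * Sxy) / det;        // intercept
    B = (N * Sxy - Sx * Sy) / det;          // slope
}

int main() {
    std::vector<double> x = {0, 1, 2, 3}, y = {1, 3, 5, 7};  // made-up data
    double A, B;
    fit_line(x, y, A, B);
    std::cout << "A = " << A << ", B = " << B << '\n';       // A = 1, B = 2
}
</syntaxhighlight>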

If the uncertainty is not the same for all the measurements then we need to insert <math>\sigma_i</math> back into the system of equations:

:<math>A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} & \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} & \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}</math>

==Uncertainty in the Linear Fit parameters==

As always, the uncertainty is determined by the Taylor expansion in quadrature such that

:<math>\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]</math> = error in parameter P; here the covariance has been assumed to be zero.

By definition of variance

:<math>\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}</math> : there are 2 parameters and N data points, which translate to (N-2) degrees of freedom.

The least squares fit (assuming equal <math>\sigma</math>) has the following solution for the parameters A and B:

:<math>A = \frac{\left| \begin{array}{cc} \sum y_i & \sum x_i\\ \sum x_i y_i & \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i  & \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i  & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i  & \sum x_i^2 \end{array}\right|}</math>

===Uncertainty in A===

Let

:<math>D \equiv \frac{1}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i  & \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \left( \sum x_i \right)^2 }</math>

Then

:<math>\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \, D \left( \sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i \right)</math>
:<math> = D \left ( \sum x_i^2 - x_j\sum x_i \right)</math> : only the <math>y_j</math> term survives the derivative

:<math>\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]</math>
:<math>= \sum_{j=1}^N \sigma_j^2 D^2 \left ( \sum x_i^2 - x_j\sum x_i \right)^2</math>
:<math> = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2</math> : assume <math>\sigma_j = \sigma</math>
:<math> = \sigma^2 D^2 \sum_{j=1}^N \left [ \left ( \sum x_i^2\right )^2  - 2 x_j \sum x_i^2 \sum x_i + x_j^2 \left ( \sum x_i \right )^2 \right ]</math>
:<math> = \sigma^2 D^2 \left [ N \left ( \sum x_i^2\right )^2  - 2 \sum x_i^2 \sum x_i \sum_{j=1}^N x_j + \left ( \sum x_i \right )^2 \sum_{j=1}^N x_j^2 \right ]</math>

Since <math>\sum_{j=1}^N x_j = \sum x_i</math> and <math>\sum_{j=1}^N x_j^2 = \sum x_i^2</math> (both sums run over the same <math>N</math> observations),

:<math> = \sigma^2 D^2 \sum x_i^2 \left [  N  \sum x_i^2 -  2 \left ( \sum x_i \right )^2  +  \left ( \sum x_i \right )^2   \right ]</math>
:<math> = \sigma^2 D^2 \sum x_i^2 \left [  N  \sum x_i^2 -  \left ( \sum x_i \right )^2   \right ] = \sigma^2 D^2\sum x_i^2 \frac{1}{D}</math>
:<math> \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}</math>
:<math> \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}</math>

If we redefine our origin in the linear plot so the line is centered at x = 0 then

:<math>\sum{x_i} = 0</math>

:<math>\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}</math>

or

:<math> \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}</math>

;Note
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in y.

===Uncertainty in B===
:<math>B = \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i  & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i  & \sum x_i^2 \end{array}\right|}</math>

:<math>\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]</math>

:<math>\frac{\partial B}{\partial y_j} = \frac{\partial}{\partial y_j} \, D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )</math>
:<math>=  D \left ( N x_j - \sum x_i \right)</math>

:<math>\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]</math>
:<math>= \sigma^2 D^2 \sum_{j=1}^N \left ( N x_j - \sum x_i \right)^2</math> : assuming <math>\sigma_j = \sigma</math>
:<math>= \sigma^2 D^2 \sum_{j=1}^N \left [  N^2 x_j^2 - 2N x_j \sum x_i + \left ( \sum x_i \right )^2 \right ]</math>
:<math>= \sigma^2 D^2  \left [  N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left ( \sum x_i \right )^2 \right ]</math>
:<math>= \sigma^2 D^2  \left [  N^2 \sum x_i^2 - 2N \left ( \sum x_i \right )^2 + N \left ( \sum x_i \right )^2 \right ]</math>
:<math>= N \sigma^2 D^2  \left [  N \sum x_i^2 - \left ( \sum x_i \right )^2 \right ]</math>
:<math> = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2</math>

:<math> \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}</math>
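
A sketch evaluating <math>\sigma_A</math> and <math>\sigma_B</math> together with the fit (the data are made up; <math>s^2</math> is estimated from the residuals with N-2 degrees of freedom as above):

<syntaxhighlight lang="cpp">
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Uncertainties of the unweighted fit parameters, using the expressions above:
// s^2 = sum(y_i-A-B x_i)^2/(N-2),  sigma_A^2 = s^2 Sxx/det,  sigma_B^2 = s^2 N/det.
int main() {
    std::vector<double> x = {0, 1, 2, 3}, y = {1.1, 2.9, 5.2, 6.8};  // made-up data
    const std::size_t N = x.size();
    double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (std::size_t i = 0; i < N; ++i) {
        Sx += x[i]; Sy += y[i]; Sxx += x[i] * x[i]; Sxy += x[i] * y[i];
    }
    const double det = N * Sxx - Sx * Sx;
    const double A = (Sy * Sxx - Sx * Sxy) / det;
    const double B = (N * Sxy - Sx * Sy) / det;

    double s2 = 0;                                  // residual variance estimate
    for (std::size_t i = 0; i < N; ++i)
        s2 += std::pow(y[i] - A - B * x[i], 2);
    s2 /= (N - 2);                                  // N-2 degrees of freedom

    std::cout << "A = " << A << " +/- " << std::sqrt(s2 * Sxx / det) << '\n'
              << "B = " << B << " +/- " << std::sqrt(s2 * N / det)   << '\n';
}
</syntaxhighlight>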

==Linear Fit with error==

From above we know that if each independent measurement has a different error <math>\sigma_i</math> then the fit parameters are given by

:<math>A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} & \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} & \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}</math>

===Weighted Error in A===

:<math>\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]</math>

:<math>A = D \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]</math>

where

:<math>D \equiv \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= \frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left ( \sum \frac{x_i}{\sigma_i^2} \right )^2}</math>

:<math>\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]</math>
:<math>=  D \left [ \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ]</math>

:<math>\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]</math>
:<math>= D^2 \sum_{j=1}^N \frac{1}{\sigma_j^2} \left [  \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 x_j \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + x_j^2 \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]</math>
:<math>= D^2   \left [  \sum \frac{ 1}{\sigma_i^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum\frac{ x_i^2}{\sigma_i^2} \left ( \sum\frac{ x_i}{\sigma_i^2} \right )^2 + \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]</math>
:<math>= D^2    \sum\frac{ x_i^2}{\sigma_i^2} \left [  \sum \frac{ 1}{\sigma_i^2}  \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]</math>
:<math>= D    \sum\frac{ x_i^2}{\sigma_i^2}  = \frac{ \sum\frac{ x_i^2}{\sigma_i^2} }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left ( \sum \frac{x_i}{\sigma_i^2} \right )^2}</math>

;Compare with the unweighted error

:<math> \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}</math>

===Weighted Error in B===
:<math>\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]</math>

:<math>B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}</math>

:<math>\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]</math>
:<math>=  D \left [ \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ]</math>

:<math>\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]</math>
:<math>= D^2 \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left [  x_j^2 \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 x_j \sum\frac{1}{\sigma_i^2} \sum\frac{ x_i}{\sigma_i^2} + \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]</math>
:<math>= D^2 \left [ \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \sum\frac{1}{\sigma_i^2} \left ( \sum\frac{ x_i}{\sigma_i^2} \right )^2 + \sum\frac{1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]</math>
:<math>= D^2 \sum\frac{ 1}{\sigma_i^2} \left [  \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}  -  \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]</math>
:<math>= D \sum \frac{ 1}{\sigma_i^2}</math>
:<math>\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left ( \sum \frac{x_i}{\sigma_i^2} \right )^2}</math>
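
A sketch of the weighted fit and the weighted errors above (the data and per-point errors are made up):

<syntaxhighlight lang="cpp">
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Weighted least-squares line using the determinant expressions above, with a
// different error sigma_i per point.  S1, Sx, Sy, Sxx, Sxy are the weighted sums.
int main() {
    std::vector<double> x = {0, 1, 2, 3}, y = {1.1, 2.9, 5.2, 6.8};
    std::vector<double> sig = {0.1, 0.2, 0.1, 0.3};   // made-up measurement errors

    double S1 = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double w = 1.0 / (sig[i] * sig[i]);     // weight 1/sigma_i^2
        S1 += w; Sx += w * x[i]; Sy += w * y[i];
        Sxx += w * x[i] * x[i]; Sxy += w * x[i] * y[i];
    }
    const double det = S1 * Sxx - Sx * Sx;
    const double A = (Sy * Sxx - Sx * Sxy) / det;
    const double B = (S1 * Sxy - Sx * Sy) / det;

    // sigma_A^2 = Sxx/det and sigma_B^2 = S1/det: no extra s^2 factor, since
    // the measurement errors already set the scale
    std::cout << "A = " << A << " +/- " << std::sqrt(Sxx / det) << '\n'
              << "B = " << B << " +/- " << std::sqrt(S1  / det) << '\n';
}
</syntaxhighlight>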

==Correlation Probability==

Once the linear fit has been performed, the next step is to determine a probability that the fit actually describes the data.

The correlation probability (R) is one method used to try and determine this probability.

This method evaluates the "slope" parameter to determine if there is a correlation between the independent and dependent variables, x and y.

The linear fit above was done to minimize <math>\chi^2</math> for the following model:

:<math>y = A + Bx</math>

What if we turn this equation around such that

:<math>x = A^{\prime} + B^{\prime}y</math>

If there is no correlation between <math>x</math> and <math>y</math> then <math>B^{\prime} =0</math>.

If there is complete correlation between <math>x</math> and <math>y</math> then

<math>\Rightarrow</math>
:<math>A = -\frac{A^{\prime}}{B^{\prime}}</math>        and     <math>B = \frac{1}{B^{\prime}}</math>

: and <math>BB^{\prime} = 1</math>

So one can define a metric <math>BB^{\prime}</math>, which has the natural range between 0 and 1, such that

:<math>R \equiv \sqrt{B B^{\prime}}</math>

Since

:<math>B = \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i  & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i  & \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \left ( \sum x_i \right )^2 }</math>

and one can show that

:<math>B^{\prime} = \frac{\left| \begin{array}{cc} N & \sum x_i\\ \sum y_i  & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum y_i\\ \sum y_i  & \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \left ( \sum y_i \right )^2 }</math>

Thus

:<math>R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \left ( \sum x_i \right )^2 } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \left ( \sum y_i \right )^2 } }</math>
:<math>=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \left ( \sum x_i \right )^2  \right ) \left (N\sum y_i^2 - \left ( \sum y_i \right )^2 \right)  } }</math>
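
A sketch evaluating R from the last expression (the data are made up):

<syntaxhighlight lang="cpp">
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Linear-correlation coefficient R from the expression above.
double corr(const std::vector<double>& x, const std::vector<double>& y) {
    const double N = static_cast<double>(x.size());
    double Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        Sx += x[i]; Sy += y[i];
        Sxx += x[i] * x[i]; Syy += y[i] * y[i]; Sxy += x[i] * y[i];
    }
    return (N * Sxy - Sx * Sy) /
           std::sqrt((N * Sxx - Sx * Sx) * (N * Syy - Sy * Sy));
}

int main() {
    std::vector<double> x = {0, 1, 2, 3}, y = {1.1, 2.9, 5.2, 6.8};  // made-up data
    std::cout << "R = " << corr(x, y) << '\n';  // close to 1: strongly correlated
}
</syntaxhighlight>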

;Note
:The correlation coefficient (R) CAN'T by itself be used to indicate the degree of correlation. The probability distribution of <math>R</math> can be derived from a two-dimensional Gaussian, but knowledge of the correlation coefficient of the parent population <math>(\rho)</math> is required to evaluate the <math>R</math> of the sample distribution.

Instead, one assumes a correlation of <math>\rho=0</math> in the parent distribution and then compares the sample value of <math>R</math> with what you would get if there were no correlation.

The smaller the probability of obtaining the observed <math>R</math> from uncorrelated data, the more likely it is that the data are correlated and that the linear fit is justified.

:<math>P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}</math> = probability that any random sample of UNCORRELATED data would yield the correlation coefficient <math>R</math>

where
:<math>\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt</math>  (ROOT::Math::tgamma(double x))

:<math>\nu=N-2</math> = number of degrees of freedom = number of data points - number of parameters in the fit function

Derived in Pugh and Winslow, ''The Analysis of Physical Measurements'', Addison-Wesley, 1966.
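
A sketch evaluating this distribution with the standard library, where std::tgamma plays the role of ROOT::Math::tgamma; the value of <math>\nu</math> is a made-up example:

<syntaxhighlight lang="cpp">
#include <cmath>
#include <iostream>

// Probability density that a sample of uncorrelated data yields the
// correlation coefficient R, with nu = N - 2 degrees of freedom.
double P_R(double R, double nu) {
    const double pi = std::acos(-1.0);
    return (1.0 / std::sqrt(pi)) *
           std::tgamma((nu + 1.0) / 2.0) / std::tgamma(nu / 2.0) *
           std::pow(1.0 - R * R, (nu - 2.0) / 2.0);
}

int main() {
    const double nu = 10;                 // e.g. N = 12 data points
    for (double R : {0.0, 0.5, 0.9})      // larger |R| is less likely
        std::cout << "P_R(" << R << ", " << nu << ") = " << P_R(R, nu) << '\n';
}
</syntaxhighlight>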

=Least Squares fit to a Polynomial=

Let's assume we wish to now fit a polynomial instead of a straight line to the data:

:<math>y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)</math>

:<math>f_j(x) =</math> a function which does not depend on <math>a_j</math>

Then the probability of observing the value <math>y_i</math> with a standard deviation <math>\sigma_i</math> is given by

:<math>P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}</math>

assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.

If you repeat the experiment <math>N</math> times then the probability of deducing the values <math>a_n</math> from the data can be expressed as the joint probability of finding <math>N</math> <math>y_i</math> values, one for each <math>x_i</math>:

:<math>P(a_0,a_1, \cdots ,a_n) = \prod_i P_i(a_0,a_1, \cdots ,a_n) =\prod_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}</math>

Once again the probability is maximized when the sum in the exponent is a minimum.

Let

:<math>\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2</math>

where <math>N</math> = number of data points and <math>n</math> = order of the polynomial used to fit the data.

The minimum in <math>\chi^2</math> is found by setting the partial derivative with respect to each fit parameter <math>\left (\frac{\partial \chi^2}{\partial a_k} \right)</math> to zero:

:<math>\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2</math>
:<math>= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}</math>
:<math>= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0</math>

:<math>\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)</math>

You now have a system of coupled equations, one for each parameter <math>a_j</math>, with each equation summing over the <math>N</math> measurements.

The first equation <math>(k=0)</math> looks like this:

:<math>\sum_i^N f_0(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_0(x_i)}{\sigma_i^2} \left ( a_0 f_0(x_i) + a_1 f_1(x_i) + \cdots + a_n f_n(x_i)\right )</math>

You could use the method of determinants, as we did to find the parameters of a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.
==Regression Analysis==

The parameters <math>a_j</math> in the previous section are linear parameters of a general function, which may be a polynomial.

The system of equations is composed of one equation per parameter, where the <math>k^{\mbox{th}}</math> equation,

:<math>\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i),</math>

may be represented in matrix form as

:<math>\tilde{\beta} = \tilde{a} \tilde{\alpha}</math>

where

:<math>\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  </math>

:<math>\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)</math>

or in matrix form

:<math>\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) </math>  = a row matrix of order <math>n</math>
:<math>\tilde{a} =( a_1, a_2, \cdots , a_n)</math> = a row matrix of the parameters

:<math>\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  & \alpha_{12} & \cdots & \alpha_{1j} \\ \alpha_{21}  & \alpha_{22}&\cdots&\alpha_{2j} \\ \vdots &\vdots &\ddots &\vdots \\ \alpha_{k1} &\alpha_{k2} &\cdots &\alpha_{kj}\end{matrix} \right )</math> = a <math>k \times j = n \times n</math> matrix

;The objective is to find the parameters <math>a_j</math>
: To find <math>a_j</math>, just invert the matrix: multiply both sides on the right by <math>\tilde{\alpha}^{-1}</math>

:<math>\tilde{\beta} = \tilde{a} \tilde{\alpha}</math>
:<math>\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}</math>
:<math>\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}</math>

:<math> \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} </math>

;Thus if you invert the matrix <math>\tilde{\alpha}</math> you find <math>\tilde{\alpha}^{-1}</math> and, as a result, the parameters <math>a_j</math>.

==Matrix inversion==

The first thing to note is that for the inverse of a matrix <math>(\tilde{A})</math> to exist its determinant can not be zero:
:<math>\left |\tilde{A} \right | \ne 0</math>

The inverse of a matrix <math>(\tilde{A}^{-1})</math> is defined such that

:<math>\tilde{A} \tilde{A}^{-1} = \tilde{1}</math>

If we formally "divide" both sides by the matrix <math>\tilde {A}</math> then we have

:<math>\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }</math>

The above ratio of the unity matrix to the matrix <math>\tilde{A}</math> remains equal to <math>\tilde{A}^{-1}</math> as long as the same operations are applied to both the numerator and the denominator.

If we do such operations we can transform the ratio such that the denominator becomes the unity matrix, and then the numerator will be the inverse matrix.

This is the principle of Gauss-Jordan elimination.

===Gauss-Jordan Elimination===
If Gauss-Jordan elimination is applied to a square matrix, it can be used to calculate the inverse matrix. This is done by augmenting the square matrix with the identity matrix of the same dimensions and applying the following matrix operations:

:<math>\tilde{A} \tilde{1} \Rightarrow \tilde{1} \tilde{A}^{-1} .</math>

If the original square matrix, <math>\tilde{A}</math>, is given by the following expression:

:<math> \tilde{A} =
\begin{bmatrix}
2 & -1 & 0 \\
-1 & 2 & -1 \\
0 & -1 & 2
\end{bmatrix}.
</math>

Then, after augmenting by the identity, the following is obtained:

:<math> \tilde{A}\tilde{1} = 
\begin{bmatrix}
2 & -1 & 0 & 1 & 0 & 0\\
-1 & 2 & -1 & 0 & 1 & 0\\
0 & -1 & 2 & 0 & 0 & 1
\end{bmatrix}.
</math>

By performing elementary row operations on the <math>\tilde{A}\tilde{1}</math> matrix until it reaches reduced row echelon form, the following is the final result:

:<math> \tilde{1}\tilde{A}^{-1}  = 
\begin{bmatrix}
1 & 0 & 0 & \frac{3}{4} & \frac{1}{2} & \frac{1}{4}\\
0 & 1 & 0 & \frac{1}{2} & 1 & \frac{1}{2}\\
0 & 0 & 1 & \frac{1}{4} & \frac{1}{2} & \frac{3}{4}
\end{bmatrix}.
</math>

The matrix augmentation can now be undone, which gives the following:

:<math> \tilde{1} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}\qquad
 \tilde{A}^{-1} =
\begin{bmatrix}
\frac{3}{4} & \frac{1}{2} & \frac{1}{4}\\
\frac{1}{2} & 1 & \frac{1}{2}\\
\frac{1}{4} & \frac{1}{2} & \frac{3}{4}
\end{bmatrix}.
</math>

or

:<math> 
 \tilde{A}^{-1} =\frac{1}{4}
\begin{bmatrix}
3 & 2 & 1\\
2 & 4 & 2\\
1 & 2 & 3
\end{bmatrix}=\frac{1}{\det(A)}
\begin{bmatrix}
3 & 2 & 1\\
2 & 4 & 2\\
1 & 2 & 3
\end{bmatrix}.
</math>

A matrix is non-singular (meaning that it has an inverse) if and only if the identity matrix can be obtained from it using only elementary row operations.
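
A C++ sketch of Gauss-Jordan inversion applied to the 3x3 example above (no pivoting, which is adequate here but not in general); it should reproduce <math>\frac{1}{4}\begin{bmatrix} 3 & 2 & 1\\ 2 & 4 & 2\\ 1 & 2 & 3 \end{bmatrix}</math>:

<syntaxhighlight lang="cpp">
#include <array>
#include <iostream>

// Invert a 3x3 matrix by Gauss-Jordan elimination on the augmented [A | 1].
using Mat = std::array<std::array<double, 6>, 3>;

int main() {
    Mat m = {{{ 2, -1,  0,  1, 0, 0},
              {-1,  2, -1,  0, 1, 0},
              { 0, -1,  2,  0, 0, 1}}};

    for (int col = 0; col < 3; ++col) {
        double piv = m[col][col];                   // assumed nonzero (det != 0)
        for (int j = 0; j < 6; ++j) m[col][j] /= piv;          // scale pivot row
        for (int row = 0; row < 3; ++row) {
            if (row == col) continue;
            double f = m[row][col];
            for (int j = 0; j < 6; ++j) m[row][j] -= f * m[col][j];  // eliminate
        }
    }

    // the right half now holds A^{-1}
    for (int i = 0; i < 3; ++i) {
        for (int j = 3; j < 6; ++j) std::cout << m[i][j] << ' ';
        std::cout << '\n';
    }
}
</syntaxhighlight>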

==Error Matrix==

As always, the uncertainty is determined by the Taylor expansion in quadrature such that

:<math>\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]</math> = error in parameter P; here the covariance has been assumed to be zero.

where, by the definition of variance,

:<math>\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}</math> : there are <math>n</math> parameters and <math>N</math> data points, which translate to <math>(N-n)</math> degrees of freedom.

Applying this to the parameter <math>a_k</math> indicates that

:<math>\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]</math>

;But what if there are covariances?
In that case the following general expression applies:

:<math>\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]</math>

:<math>\frac{\partial a_k}{\partial y_i} = </math>?

:<math> \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} </math>

where

:<math>\tilde{a} =( a_1, a_2, \cdots , a_n)</math> = a row matrix of the parameters
:<math>\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) </math>  = a row matrix of order <math>n</math>
:<math>\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  & \alpha_{12}^{-1} & \cdots & \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  & \alpha_{22}^{-1}&\cdots&\alpha_{2j}^{-1} \\ \vdots &\vdots &\ddots &\vdots \\ \alpha_{k1}^{-1} &\alpha_{k2}^{-1} &\cdots &\alpha_{kj}^{-1}\end{matrix} \right )</math> = a <math>k \times j = n \times n</math> matrix

:<math>\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  & \alpha_{12}^{-1} & \cdots & \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  & \alpha_{22}^{-1}&\cdots&\alpha_{2j}^{-1} \\ \vdots &\vdots &\ddots &\vdots \\ \alpha_{k1}^{-1} &\alpha_{k2}^{-1} &\cdots &\alpha_{kj}^{-1}\end{matrix} \right )</math>

:<math>\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}</math>

:<math>\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}</math>
:<math>= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}</math>
:<math>= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}</math> : only one <math>y_i</math> in the sum over <math>N</math> survives the derivative

similarly

:<math>\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}</math>

Substituting,

:<math>\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]</math>
:<math>= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]</math>

One factor of <math>\sigma_i^2</math> cancels.
;Move the outermost sum to the inside

:<math>\sigma_{a_k,a_l}^2 =  \sum_j^n   \alpha_{jk}^{-1}\sum_p^n \left( \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}</math>
:<math>=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n  \alpha_{jp} \alpha_{pl}^{-1}</math>
:<math>=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}</math>

where

:<math>\tilde{1}_{jl}</math>  = the <math>j,l</math> element of the unity matrix = 1 if <math>j = l</math> and 0 otherwise

;Note
:<math>\alpha_{jk} = \alpha_{kj} </math> : the matrix is symmetric.

:<math> \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}</math> = covariance/error matrix element

The inverse matrix <math>\alpha_{kl}^{-1}</math> tells you the variance and covariance for the calculation of the total error.

;Remember
:<math>Y = \sum_i^n a_i f_i(x)</math>
:<math>\sigma_{a_i}^2 = \alpha_{ii}^{-1}</math> = error in the parameters
:<math>s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} </math> = error in the model's prediction
:<math>= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}</math> if Y is a power series in x <math>(f_i(x) = x^i)</math>
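
Putting the pieces together, a sketch of a quadratic least squares fit by this matrix method, with Gauss-Jordan inversion supplying both the parameters and their errors (the data and errors are made up, chosen to lie near y = 1 + x²):

<syntaxhighlight lang="cpp">
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Polynomial fit y = a0 + a1 x + a2 x^2 via the matrix method above:
// beta_k = sum_i y_i f_k(x_i)/sigma_i^2, alpha_kj = sum_i f_k(x_i) f_j(x_i)/sigma_i^2,
// a = beta alpha^{-1}; the inverse alpha^{-1} is the covariance (error) matrix.
int main() {
    const int n = 3;                                    // number of parameters
    std::vector<double> x = {0, 1, 2, 3, 4};
    std::vector<double> y = {1.0, 2.1, 5.1, 9.9, 17.2};
    std::vector<double> sig(x.size(), 0.1);             // made-up errors

    double alpha[n][2 * n] = {};                        // augmented [alpha | 1]
    double beta[n] = {};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double w = 1.0 / (sig[i] * sig[i]);
        double f[n];
        for (int k = 0; k < n; ++k) f[k] = std::pow(x[i], k);   // f_k(x) = x^k
        for (int k = 0; k < n; ++k) {
            beta[k] += w * y[i] * f[k];
            for (int j = 0; j < n; ++j) alpha[k][j] += w * f[k] * f[j];
        }
    }
    for (int k = 0; k < n; ++k) alpha[k][n + k] = 1.0;  // augment with identity

    for (int c = 0; c < n; ++c) {                       // Gauss-Jordan inversion
        const double piv = alpha[c][c];
        for (int j = 0; j < 2 * n; ++j) alpha[c][j] /= piv;
        for (int r = 0; r < n; ++r) {
            if (r == c) continue;
            const double f = alpha[r][c];
            for (int j = 0; j < 2 * n; ++j) alpha[r][j] -= f * alpha[c][j];
        }
    }

    for (int k = 0; k < n; ++k) {
        double a = 0;
        for (int j = 0; j < n; ++j) a += beta[j] * alpha[j][n + k];  // a = beta alpha^{-1}
        std::cout << "a" << k << " = " << a << " +/- "
                  << std::sqrt(alpha[k][n + k]) << '\n';   // sqrt of diagonal element
    }
}
</syntaxhighlight>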
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (ie; when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is miniscule then you can ague that adding higher orders to the fit does not change the fit.  There are techniu&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is a good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with s sample mean&amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; from a parent distribution of with &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you make N measurements of two variates (x_i and y_i) which may be correlated using a function with n parameters.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you received from your fit is as larger or larger than what you would get from a function described by the parent distribution is given &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the average value &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; or larger is roughly 0.5 (the distribution is skewed, so the value is somewhat below 0.5 for small &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; and approaches 0.5 as &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; grows).&lt;br /&gt;
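&lt;br /&gt;
As a quick numerical check of how close this probability is to 0.5, one can evaluate the upper-tail probability at &amp;lt;math&amp;gt;x = \nu&amp;lt;/math&amp;gt; with ROOT's MathCore functions (a minimal sketch; the values of &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; are illustrative):&lt;br /&gt;
&lt;br /&gt;
 // check_chi2_mean.C -- P(chi^2 &amp;gt;= nu) tends toward 0.5 as nu grows&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 &lt;br /&gt;
 void check_chi2_mean() {&lt;br /&gt;
    for (double nu : {2., 5., 10., 50.}) {&lt;br /&gt;
       // upper-tail cumulative distribution: integral from nu to infinity&lt;br /&gt;
       double p = ROOT::Math::chisquared_cdf_c(nu, nu);&lt;br /&gt;
       printf(&amp;quot;nu = %4.0f   P = %.3f\n&amp;quot;, nu, p);&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;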
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of the distribution &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, &lt;br /&gt;
&lt;br /&gt;
the mean of the scaled variable &amp;lt;math&amp;gt;x/\nu&amp;lt;/math&amp;gt; is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left \langle \frac{x}{\nu} \right \rangle = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H, of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value means the observed data are unlikely under the hypothesis H; in a null-hypothesis test, a small p-value is evidence against the null hypothesis.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning, which is that the p-value is the chance that the null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' still be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
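&lt;br /&gt;
For example (a minimal sketch; the fit results &amp;lt;math&amp;gt;\chi^2 = 25.8&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu = 20&amp;lt;/math&amp;gt; are made up for illustration):&lt;br /&gt;
&lt;br /&gt;
 // pvalue.C -- p-value for an illustrative fit result&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 #include &amp;quot;TMath.h&amp;quot;&lt;br /&gt;
 &lt;br /&gt;
 void pvalue() {&lt;br /&gt;
    double chi2 = 25.8;  // hypothetical chi^2 returned by a fit&lt;br /&gt;
    int    nu   = 20;    // hypothetical degrees of freedom&lt;br /&gt;
    double p1 = ROOT::Math::chisquared_cdf_c(chi2, nu); // upper-tail integral&lt;br /&gt;
    double p2 = TMath::Prob(chi2, nu);                  // the same quantity&lt;br /&gt;
    printf(&amp;quot;p-value = %.3f = %.3f\n&amp;quot;, p1, p2);  // ~0.17 here&lt;br /&gt;
 }&lt;br /&gt;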
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the P-value would be the probability, under the assumption of a hypothesis H, of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic as extreme as the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you form the statistic&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = a quantity that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; the ratio therefore compares only the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the probability distribution&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
&lt;br /&gt;
== The chi^2 difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of 2 different fits.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution, with &amp;lt;math&amp;gt;(\nu_1 - \nu_2, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F  distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m}} &amp;lt;/math&amp;gt;&lt;br /&gt;
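&lt;br /&gt;
As an illustration of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test (all numbers are hypothetical): suppose adding one parameter drops &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; from 42.0 to 30.1 with N = 30 data points and m = 2 original parameters.  The tail probability of the F-distribution is available in ROOT as ROOT::Math::fdistribution_cdf_c.&lt;br /&gt;
&lt;br /&gt;
 // ftest.C -- chi^2 difference test for one added parameter (illustrative numbers)&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 &lt;br /&gt;
 void ftest() {&lt;br /&gt;
    double chi2_m  = 42.0;  // hypothetical chi^2 with m parameters&lt;br /&gt;
    double chi2_m1 = 30.1;  // hypothetical chi^2 with m+1 parameters&lt;br /&gt;
    int N = 30, m = 2;&lt;br /&gt;
    double nu = N - m - 1;                        // dof of the m-parameter fit&lt;br /&gt;
    double F  = (chi2_m - chi2_m1)/(chi2_m/nu);   // F_chi as defined above&lt;br /&gt;
    // probability that an unnecessary parameter would give an F this large&lt;br /&gt;
    double p  = ROOT::Math::fdistribution_cdf_c(F, 1., nu);&lt;br /&gt;
    printf(&amp;quot;F = %.2f, p = %.4f\n&amp;quot;, F, p);   // small p =&amp;gt; keep the parameter&lt;br /&gt;
 }&lt;br /&gt;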
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:i.e. &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You can't separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting your matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in Lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to apply the maximum-likelihood principle to the probability function in order to find the correct fit to the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distributions as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] - 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
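&lt;br /&gt;
A minimal sketch of such a search for the straight-line model &amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt; is shown below; the data arrays, step size, and scan range are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
 // gridsearch.C -- one-parameter-at-a-time chi^2 minimization for y = A + B x&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 double chi2(double A, double B, const std::vector&amp;lt;double&amp;gt;&amp;amp; x,&lt;br /&gt;
             const std::vector&amp;lt;double&amp;gt;&amp;amp; y, const std::vector&amp;lt;double&amp;gt;&amp;amp; s) {&lt;br /&gt;
    double c = 0;&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
       double r = (y[i] - A - B*x[i])/s[i];&lt;br /&gt;
       c += r*r;&lt;br /&gt;
    }&lt;br /&gt;
    return c;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 void gridsearch() {&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};             // made-up data&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; y = {1.1, 2.9, 5.2, 6.8, 9.1};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; s(5, 0.3);                       // equal errors&lt;br /&gt;
    double par[2] = {0., 0.};                              // A and B&lt;br /&gt;
    for (int pass = 0; pass &amp;lt; 50; ++pass) {              // repeat until stable&lt;br /&gt;
       for (int k = 0; k &amp;lt; 2; ++k) {                     // one parameter at a time&lt;br /&gt;
          double save = par[k], best = save;&lt;br /&gt;
          double bestChi2 = chi2(par[0], par[1], x, y, s);&lt;br /&gt;
          for (double t = save - 1; t &amp;lt;= save + 1; t += 0.001) {&lt;br /&gt;
             par[k] = t;&lt;br /&gt;
             double c = chi2(par[0], par[1], x, y, s);&lt;br /&gt;
             if (c &amp;lt; bestChi2) { bestChi2 = c; best = t; }&lt;br /&gt;
          }&lt;br /&gt;
          par[k] = best;                                   // freeze at the scan minimum&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    printf(&amp;quot;A = %.3f  B = %.3f  chi2 = %.2f\n&amp;quot;, par[0], par[1],&lt;br /&gt;
           chi2(par[0], par[1], x, y, s));&lt;br /&gt;
 }&lt;br /&gt;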
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] - 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the uncertainties &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; set the scale of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and that moving a fit parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; away from its optimum value alters &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;  dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by finding how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
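&lt;br /&gt;
A sketch of this procedure for a single parameter is shown below; chi2(a) stands for the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; evaluated with the remaining parameters held at their optimum values, and the step size is an arbitrary choice.&lt;br /&gt;
&lt;br /&gt;
 // Walk one fit parameter up from its optimum value a until chi^2 has&lt;br /&gt;
 // grown by 1; the distance walked is roughly the 1 standard deviation error.&lt;br /&gt;
 double errorScan(double a, double chi2min, double (*chi2)(double)) {&lt;br /&gt;
    double da = 0;&lt;br /&gt;
    while (chi2(a + da) &amp;lt; chi2min + 1.0) da += 1e-4;  // climb the parabola&lt;br /&gt;
    return da;                                           // ~1 sigma&lt;br /&gt;
 }&lt;br /&gt;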
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by attempting to search in a direction towards the minimum as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
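&lt;br /&gt;
A sketch of a single gradient-search step is shown below; the rate and the finite-difference step are arbitrary illustrative choices.&lt;br /&gt;
&lt;br /&gt;
 // One gradient-search step: estimate the chi^2 gradient by finite&lt;br /&gt;
 // differences, then move all parameters downhill simultaneously.&lt;br /&gt;
 void gradientStep(double par[], int n, double (*chi2)(const double*),&lt;br /&gt;
                   double rate = 0.01, double h = 1e-5) {&lt;br /&gt;
    double grad[10];                        // sketch assumes n &amp;lt;= 10&lt;br /&gt;
    double c0 = chi2(par);&lt;br /&gt;
    for (int k = 0; k &amp;lt; n; ++k) {&lt;br /&gt;
       par[k] += h;&lt;br /&gt;
       grad[k] = (chi2(par) - c0)/h;        // forward-difference derivative&lt;br /&gt;
       par[k] -= h;&lt;br /&gt;
    }&lt;br /&gt;
    for (int k = 0; k &amp;lt; n; ++k) par[k] -= rate*grad[k];&lt;br /&gt;
 }&lt;br /&gt;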
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91709</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91709"/>
		<updated>2014-03-21T19:18:47Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability that the student observed is wearing trousers = (60+20)/100 = 0.8 (all 60 boys wear trousers, and half of the 40 girls do)&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e. the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our object is to find the best straight line fit for an expected linear relationship between dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_o(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}}\right ) e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The minimum for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum with respect to both parameters A &amp;amp; B : i.e.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations with 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming the &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; are known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Applying the method of determinants to the maximum likelihood problem above&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
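&lt;br /&gt;
A minimal sketch of this weighted straight-line fit in code (the data and errors are made up; the unweighted case is recovered by setting all &amp;lt;math&amp;gt;\sigma_i = 1&amp;lt;/math&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
 // wlinfit.C -- closed-form weighted straight-line fit y = A + B x&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void wlinfit() {&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};            // made-up data&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; y = {1.1, 2.9, 5.2, 6.8, 9.1};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; s = {0.2, 0.3, 0.2, 0.4, 0.3};  // per-point errors&lt;br /&gt;
    double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
       double w = 1.0/(s[i]*s[i]);        // weight = 1/sigma^2&lt;br /&gt;
       S   += w;            Sx  += w*x[i];      Sy += w*y[i];&lt;br /&gt;
       Sxx += w*x[i]*x[i];  Sxy += w*x[i]*y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double D = S*Sxx - Sx*Sx;              // the common determinant&lt;br /&gt;
    double A = (Sy*Sxx - Sx*Sxy)/D;        // intercept&lt;br /&gt;
    double B = (S*Sxy - Sx*Sy)/D;          // slope&lt;br /&gt;
    printf(&amp;quot;A = %.3f  B = %.3f\n&amp;quot;, A, B);&lt;br /&gt;
 }&lt;br /&gt;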
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always, the uncertainty is determined by Taylor expansion, adding the contributions in quadrature, such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solution for the parameters A &amp;amp; B&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left [ \left ( \sum x_i^2\right )^2  + x_j^2 \left (\sum x_i \right )^2 - 2 x_j \sum x_i^2 \sum x_i \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N \left ( \sum x_i^2\right )^2 + \left (\sum x_i \right )^2 \sum_{j=1}^N x_j^2  -  2 \sum x_i^2 \sum x_i \sum_{j=1}^N x_j  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: Since &amp;lt;math&amp;gt; \sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; \sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt; (both sums run over the same &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2\left [  N  \sum x_i^2 -  2 \left (\sum x_i \right )^2  +  \left (\sum x_i \right )^2   \right ] = \sigma^2 D^2\sum x_i^2\left [  N  \sum x_i^2 - \left (\sum x_i \right )^2   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so the line is centered at x=0 then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in Y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [   N^2 x_j^2 - 2N x_j \sum x_i + \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [   N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left (\sum x_i \right )^2  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [   N \sum x_i^2 - 2 \left (\sum x_i \right )^2  +  \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [   N \sum x_i^2 - \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} - 2 \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 + \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left (  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \left [  \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2  \frac{ x_j}{\sigma_j^2} \sum\frac{1}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{1}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [  \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \sum\frac{1}{\sigma_i^2} + \sum\frac{1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2}  \left [  \sum\frac{ x_i^2}{\sigma_i^2} \sum\frac{ 1}{\sigma_i^2} - \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum\frac{ 1}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the Linear Fit has been performed, the next step will be to determine a probability that the Fit is actually describing the data.&lt;br /&gt;
&lt;br /&gt;
The Correlation Probability (R) is one method used to try and determine this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the dependent and independent variables x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has the natural range between 0 and 1 such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R) CAN'T be used to indicate the degree of correlation.  The probability distribution &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D gaussian but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate R of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller the probability that uncorrelated data would produce a sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; this large, the more likely it is that the data are correlated and that the linear fit is justified.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
= Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
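&lt;br /&gt;
A minimal sketch computing the sample correlation coefficient from the sums above (the data arrays are made up for illustration):&lt;br /&gt;
&lt;br /&gt;
 // corrcoef.C -- linear correlation coefficient R&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void corrcoef() {&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};            // made-up data&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; y = {1.1, 2.9, 5.2, 6.8, 9.1};&lt;br /&gt;
    int N = x.size();&lt;br /&gt;
    double Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
       Sx += x[i];  Sy += y[i];  Sxy += x[i]*y[i];&lt;br /&gt;
       Sxx += x[i]*x[i];  Syy += y[i]*y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double R = (N*Sxy - Sx*Sy) /&lt;br /&gt;
               std::sqrt((N*Sxx - Sx*Sx)*(N*Syy - Sy*Sy));&lt;br /&gt;
    printf(&amp;quot;R = %.4f  (nu = %d)\n&amp;quot;, R, N - 2);&lt;br /&gt;
 }&lt;br /&gt;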
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the sum in the exponent is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit but it is more convenient to use matrices in a technique referred to as regression analysis&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; The objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;, just invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
: multiply both sides from the right by &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
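&lt;br /&gt;
Below is a minimal, self-contained C++ sketch of this procedure (the data values, the polynomial basis &amp;lt;math&amp;gt;f_j(x) = x^j&amp;lt;/math&amp;gt;, and the number of parameters are assumptions made only for the example):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // basis functions f_j(x) = x^j  (a polynomial basis is assumed here)&lt;br /&gt;
 double f(int j, double x) { return std::pow(x, j); }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   // assumed example data: N measurements (x_i, y_i) with errors sigma_i&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; y = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; s = {0.2, 0.2, 0.2, 0.2, 0.2};&lt;br /&gt;
   const int N = 5, n = 2;   // N data points, n fit parameters&lt;br /&gt;
 &lt;br /&gt;
   // beta_k = sum_i y_i f_k(x_i)/sigma_i^2 ; alpha_kj = sum_i f_k(x_i) f_j(x_i)/sigma_i^2&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; beta(n, 0.0);&lt;br /&gt;
   std::vector&amp;lt;std::vector&amp;lt;double&amp;gt; &amp;gt; alpha(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
   std::vector&amp;lt;std::vector&amp;lt;double&amp;gt; &amp;gt; inv(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     double w = 1.0 / (s[i] * s[i]);&lt;br /&gt;
     for (int k = 0; k &amp;lt; n; ++k) {&lt;br /&gt;
       beta[k] += y[i] * f(k, x[i]) * w;&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; ++j) alpha[k][j] += f(k, x[i]) * f(j, x[i]) * w;&lt;br /&gt;
     }&lt;br /&gt;
   }&lt;br /&gt;
 &lt;br /&gt;
   // invert alpha by Gauss-Jordan elimination (no pivoting; alpha is&lt;br /&gt;
   // well conditioned for this small example)&lt;br /&gt;
   for (int k = 0; k &amp;lt; n; ++k) inv[k][k] = 1.0;&lt;br /&gt;
   for (int c = 0; c &amp;lt; n; ++c) {&lt;br /&gt;
     double p = alpha[c][c];&lt;br /&gt;
     for (int j = 0; j &amp;lt; n; ++j) { alpha[c][j] /= p; inv[c][j] /= p; }&lt;br /&gt;
     for (int r = 0; r &amp;lt; n; ++r) {&lt;br /&gt;
       if (r == c) continue;&lt;br /&gt;
       double m = alpha[r][c];&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; ++j) {&lt;br /&gt;
         alpha[r][j] -= m * alpha[c][j];&lt;br /&gt;
         inv[r][j]   -= m * inv[c][j];&lt;br /&gt;
       }&lt;br /&gt;
     }&lt;br /&gt;
   }&lt;br /&gt;
 &lt;br /&gt;
   // a_k = sum_j beta_j (alpha^-1)_jk&lt;br /&gt;
   for (int k = 0; k &amp;lt; n; ++k) {&lt;br /&gt;
     double a = 0.0;&lt;br /&gt;
     for (int j = 0; j &amp;lt; n; ++j) a += beta[j] * inv[j][k];&lt;br /&gt;
     std::printf(&amp;quot;a_%d = %f\n&amp;quot;, k, a);&lt;br /&gt;
   }&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;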
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist, its determinant cannot be zero:&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Formally we may write the inverse as the ratio&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unity matrix to the matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same elementary row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
By applying such operations we can transform the ratio so that the denominator becomes the unity matrix; the numerator is then the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied to a square matrix, it can be used to calculate the inverse matrix. This is done by augmenting the square matrix with the identity matrix of the same dimensions and then performing elementary row operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{\det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
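&lt;br /&gt;
The same reduction can be scripted.  Here is a short stand-alone C++ sketch (plain C++ rather than ROOT is assumed) that inverts the example matrix above:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   const int n = 3;&lt;br /&gt;
   // the example matrix A augmented with the 3x3 identity: [ A | 1 ]&lt;br /&gt;
   double m[3][6] = { { 2, -1,  0,  1, 0, 0},&lt;br /&gt;
                      {-1,  2, -1,  0, 1, 0},&lt;br /&gt;
                      { 0, -1,  2,  0, 0, 1} };&lt;br /&gt;
 &lt;br /&gt;
   // elementary row operations reduce the left block to the identity;&lt;br /&gt;
   // the right block then contains the inverse (no pivoting needed here)&lt;br /&gt;
   for (int c = 0; c &amp;lt; n; ++c) {&lt;br /&gt;
     double p = m[c][c];&lt;br /&gt;
     for (int j = 0; j &amp;lt; 2 * n; ++j) m[c][j] /= p;   // scale the pivot row&lt;br /&gt;
     for (int r = 0; r &amp;lt; n; ++r) {&lt;br /&gt;
       if (r == c) continue;&lt;br /&gt;
       double fac = m[r][c];&lt;br /&gt;
       for (int j = 0; j &amp;lt; 2 * n; ++j) m[r][j] -= fac * m[c][j];&lt;br /&gt;
     }&lt;br /&gt;
   }&lt;br /&gt;
 &lt;br /&gt;
   // prints 0.75 0.50 0.25 / 0.50 1.00 0.50 / 0.25 0.50 0.75,&lt;br /&gt;
   // i.e. (1/4)(3 2 1; 2 4 2; 1 2 3) as derived above&lt;br /&gt;
   for (int r = 0; r &amp;lt; n; ++r)&lt;br /&gt;
     std::printf(&amp;quot;%5.2f %5.2f %5.2f\n&amp;quot;, m[r][3], m[r][4], m[r][5]);&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;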
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
where the variance is estimated from the fit residuals as&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A factor of &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; cancels between the numerator and the denominator.&lt;br /&gt;
;Move the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unity matrix = 1 if &amp;lt;math&amp;gt;j = l&amp;lt;/math&amp;gt; and 0 otherwise&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
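&lt;br /&gt;
As a sketch of how the last two formulas are used (the straight-line basis, the parameter values, and the covariance matrix below are illustrative assumptions, not fit results from these notes):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   // assumed fit results for Y = a_0 + a_1 x&lt;br /&gt;
   double a[2] = {-1.01, 0.0431};&lt;br /&gt;
   // assumed covariance/error matrix sigma_ij = (alpha^-1)_ij&lt;br /&gt;
   double cov[2][2] = { { 4.0e-4, -1.0e-5},&lt;br /&gt;
                        {-1.0e-5,  5.0e-7} };&lt;br /&gt;
 &lt;br /&gt;
   double xx = 10.0;   // point at which the model is evaluated&lt;br /&gt;
   double Y = 0.0, s2 = 0.0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; 2; ++i) {&lt;br /&gt;
     Y += a[i] * std::pow(xx, i);          // Y = sum a_i x^i&lt;br /&gt;
     for (int j = 0; j &amp;lt; 2; ++j)          // s^2 = sum x^(i+j) sigma_ij&lt;br /&gt;
       s2 += std::pow(xx, i + j) * cov[i][j];&lt;br /&gt;
   }&lt;br /&gt;
   std::printf(&amp;quot;Y(%g) = %f +/- %f\n&amp;quot;, xx, Y, std::sqrt(s2));&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;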
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (i.e., when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit, all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not change the fit.  There are quantitative techniques for making this judgment.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with sample mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; and parent standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you make &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements of two variates (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;) that are related through a function with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters, the joint probability of the measured set is a product of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; such Gaussian factors, and its exponent contains the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; statistic used below.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you obtained from your fit is as large or larger than what you would get from a function described by the parent distribution is given by&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the mean value &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is approximately 0.5 (exact only in the large-&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; limit, where the mean and median of the distribution coincide).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of the distribution &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, &lt;br /&gt;
&lt;br /&gt;
the mean of the scaled variable&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{\nu}&amp;lt;/math&amp;gt; is 1.&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value means the observed data are improbable under the hypothesis H; i.e., it is evidence for rejecting H.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning, which is that the p-value is the chance that the null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' still be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic as extreme as the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
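&lt;br /&gt;
If ROOT is not at hand, the p-value integral can be evaluated numerically.  Below is a minimal C++ sketch using Simpson's rule (&amp;lt;math&amp;gt;\nu \ge 2&amp;lt;/math&amp;gt; is assumed so that the integrand is finite at the origin; the inputs in main are an arbitrary example):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // chi-square probability density P(x,nu) for nu &amp;gt;= 2 degrees of freedom&lt;br /&gt;
 double chi2_pdf(double x, double nu) {&lt;br /&gt;
   return std::pow(x, 0.5 * nu - 1.0) * std::exp(-0.5 * x)&lt;br /&gt;
          / (std::pow(2.0, 0.5 * nu) * std::tgamma(0.5 * nu));&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 // p-value = integral of P(x,nu) from chi2 to infinity,&lt;br /&gt;
 // computed as 1 - integral from 0 to chi2 by Simpson's rule&lt;br /&gt;
 double pvalue(double chi2, double nu) {&lt;br /&gt;
   const int n = 10000;   // even number of intervals&lt;br /&gt;
   double h = chi2 / n;&lt;br /&gt;
   double sum = chi2_pdf(0.0, nu) + chi2_pdf(chi2, nu);&lt;br /&gt;
   for (int i = 1; i &amp;lt; n; ++i) sum += chi2_pdf(i * h, nu) * (i % 2 ? 4.0 : 2.0);&lt;br /&gt;
   return 1.0 - sum * h / 3.0;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   // e.g. chi^2 = 15 with nu = 10 degrees of freedom; prints ~0.132,&lt;br /&gt;
   // in agreement with TMath::Prob(15, 10)&lt;br /&gt;
   std::printf(&amp;quot;p = %f\n&amp;quot;, pvalue(15.0, 10.0));&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;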
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function and the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you form the function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; thus the function is only comparing the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
&lt;br /&gt;
== The chi^2 difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of 2 different fits.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, &amp;lt;math&amp;gt;F_{\chi}&amp;lt;/math&amp;gt; will have an F distribution with &amp;lt;math&amp;gt;(\nu_1 - \nu_2, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m-2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
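&lt;br /&gt;
A small numerical sketch of the difference test (the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values, &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; below are made-up inputs for illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   // assumed example: chi^2 before and after adding one fit parameter&lt;br /&gt;
   double chi2_m  = 25.0;   // fit with m parameters&lt;br /&gt;
   double chi2_m1 = 18.0;   // fit with m+1 parameters&lt;br /&gt;
   int N = 20, m = 2;&lt;br /&gt;
   int nu = N - m - 1;      // degrees of freedom of the m-parameter fit&lt;br /&gt;
 &lt;br /&gt;
   // F_chi = (chi^2(m) - chi^2(m+1)) / (chi^2(m)/nu)&lt;br /&gt;
   double F = (chi2_m - chi2_m1) / (chi2_m / nu);&lt;br /&gt;
   std::printf(&amp;quot;F_chi = %f with (1, %d) degrees of freedom\n&amp;quot;, F, nu);&lt;br /&gt;
   // the probability of an F value at least this large can then be taken&lt;br /&gt;
   // from the F distribution, e.g. ROOT::Math::fdistribution_cdf_c(F, 1, nu)&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;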
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:i.e. &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You can't separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting your matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in Lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to apply the maximum-likelihood principle to the probability of observing the data for a given set of fit parameters. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distribution as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
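&lt;br /&gt;
A minimal C++ sketch of such a grid search for the straight-line model &amp;lt;math&amp;gt;y = a_1 + a_2 x&amp;lt;/math&amp;gt; (the data, the common error, and the grid ranges are assumptions made for illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // chi^2 for the model y = a1 + a2*x, assuming equal errors sigma&lt;br /&gt;
 double chi2(const std::vector&amp;lt;double&amp;gt;&amp;amp; x, const std::vector&amp;lt;double&amp;gt;&amp;amp; y,&lt;br /&gt;
             double a1, double a2, double sigma) {&lt;br /&gt;
   double c = 0.0;&lt;br /&gt;
   for (std::size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
     double r = (y[i] - a1 - a2 * x[i]) / sigma;&lt;br /&gt;
     c += r * r;&lt;br /&gt;
   }&lt;br /&gt;
   return c;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   // assumed example data&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; x = {0, 1, 2, 3, 4};&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; y = {-1.00, -0.95, -0.93, -0.88, -0.84};&lt;br /&gt;
   double sigma = 0.01;&lt;br /&gt;
 &lt;br /&gt;
   // step both parameters over a grid and keep the smallest chi^2&lt;br /&gt;
   // found on the hypersurface&lt;br /&gt;
   double best1 = 0, best2 = 0, bestChi2 = 1e30;&lt;br /&gt;
   for (double a1 = -1.2; a1 &amp;lt;= -0.8; a1 += 0.001)&lt;br /&gt;
     for (double a2 = 0.0; a2 &amp;lt;= 0.1; a2 += 0.0005) {&lt;br /&gt;
       double c = chi2(x, y, a1, a2, sigma);&lt;br /&gt;
       if (c &amp;lt; bestChi2) { bestChi2 = c; best1 = a1; best2 = a2; }&lt;br /&gt;
     }&lt;br /&gt;
   std::printf(&amp;quot;min chi^2 = %f at a1 = %f, a2 = %f\n&amp;quot;, bestChi2, best1, best2);&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;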
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the measurement errors &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; set the scale over which &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; changes as a parameter is varied, and hence they determine the uncertainties in the fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;  dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by finding how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
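&lt;br /&gt;
A sketch of this scan is shown below (a parabolic &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; with an assumed minimum and curvature stands in for the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of a real fit, so the answer is known in advance):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // chi^2 as a function of one parameter with the others held at the minimum;&lt;br /&gt;
 // a parabola with an assumed curvature stands in for a real fit here&lt;br /&gt;
 double chi2(double a, double aMin, double chi2Min, double sigmaA) {&lt;br /&gt;
   double d = (a - aMin) / sigmaA;&lt;br /&gt;
   return chi2Min + d * d;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   double aMin = 0.0431, chi2Min = 3.2, sigmaA = 0.002;   // assumed values&lt;br /&gt;
 &lt;br /&gt;
   // step the parameter away from the minimum until chi^2 has risen by 1;&lt;br /&gt;
   // the distance covered is the one-standard-deviation error on the parameter&lt;br /&gt;
   double a = aMin, step = 1.0e-6;&lt;br /&gt;
   while (chi2(a, aMin, chi2Min, sigmaA) - chi2Min &amp;lt; 1.0) a += step;&lt;br /&gt;
   std::printf(&amp;quot;sigma_a = %f (input value %f)\n&amp;quot;, a - aMin, sigmaA);&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;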
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by attempting to search in a direction towards the minima as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91708</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91708"/>
		<updated>2014-03-21T19:18:31Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability  that the student observed is wearing trousers = (60+20)/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
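&lt;br /&gt;
The same arithmetic as a tiny C++ sketch:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   double pA  = 0.4;   // P(A):   the student is a girl&lt;br /&gt;
   double pB  = 0.8;   // P(B):   the student wears trousers&lt;br /&gt;
   double pBA = 0.5;   // P(B|A): a girl wears trousers&lt;br /&gt;
 &lt;br /&gt;
   // Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)&lt;br /&gt;
   std::printf(&amp;quot;P(girl | trousers) = %f\n&amp;quot;, pBA * pA / pB);   // 0.25&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;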
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our objective is to find the best straight-line fit for an expected linear relationship between the dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and the independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_o(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \frac{1}{\sigma \sqrt{2 \pi}}\right )^N e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The min for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum with respect to both parameters A &amp;amp; B; i.e.,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations with 2 unknowns, which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
For the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_2&amp;lt;/math&amp;gt; are known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Apply the method of determinants to the maximum likelihood problem above&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
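&lt;br /&gt;
As a sketch, the two determinant ratios translate directly into code (the data below are assumed for illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   // assumed example data for y = A + B x&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; y = {3.0, 5.1, 6.9, 9.2, 10.9};&lt;br /&gt;
   const int N = 5;&lt;br /&gt;
 &lt;br /&gt;
   // accumulate the sums that enter the determinants&lt;br /&gt;
   double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     Sx  += x[i];        Sy  += y[i];&lt;br /&gt;
     Sxx += x[i] * x[i]; Sxy += x[i] * y[i];&lt;br /&gt;
   }&lt;br /&gt;
 &lt;br /&gt;
   // Cramer's rule with the 2x2 determinants written out explicitly&lt;br /&gt;
   double det = N * Sxx - Sx * Sx;&lt;br /&gt;
   double A = (Sy * Sxx - Sx * Sxy) / det;&lt;br /&gt;
   double B = (N * Sxy - Sx * Sy) / det;&lt;br /&gt;
   std::printf(&amp;quot;A = %f   B = %f\n&amp;quot;, A, B);&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;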
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solution for the parameters A &amp;amp; B&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left [ \left ( \sum x_i^2\right )^2  + x_j^2 \left (\sum x_i \right )^2 - 2 x_j \sum x_i^2 \sum x_i \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N \left ( \sum x_i^2\right )^2 + \left (\sum x_i \right )^2 \sum_{j=1}^N x_j^2 - 2 \sum x_i^2 \sum x_i \sum_{j=1}^N x_j \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sum_{j=1}^N x_j^2 = \sum x_i^2 \;\;\;\mbox{and}\;\;\; \sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; Both sums run over the same &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N \left ( \sum x_i^2\right )^2 - \left (\sum x_i \right )^2 \sum x_i^2 \right ] = \sigma^2 D^2\sum x_i^2\left [  N \sum x_i^2 - \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so the line is centered at &amp;lt;math&amp;gt;x=0&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in Y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  N^2 x_j^2 - 2N x_j \sum x_i + \left (\sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [  N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left (\sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [  N^2 \sum x_i^2 - 2N \left (\sum x_i \right)^2  + N \left (\sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [  N \sum x_i^2 - \left (\sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
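&lt;br /&gt;
Continuing the determinant sketch from above, the two error formulas in code (same assumed data; &amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt; is estimated from the residuals):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
   // same assumed data as in the earlier sketch&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};&lt;br /&gt;
   std::vector&amp;lt;double&amp;gt; y = {3.0, 5.1, 6.9, 9.2, 10.9};&lt;br /&gt;
   const int N = 5;&lt;br /&gt;
 &lt;br /&gt;
   double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     Sx  += x[i];        Sy  += y[i];&lt;br /&gt;
     Sxx += x[i] * x[i]; Sxy += x[i] * y[i];&lt;br /&gt;
   }&lt;br /&gt;
   double det = N * Sxx - Sx * Sx;&lt;br /&gt;
   double A = (Sy * Sxx - Sx * Sxy) / det;&lt;br /&gt;
   double B = (N * Sxy - Sx * Sy) / det;&lt;br /&gt;
 &lt;br /&gt;
   // sigma^2 estimated from the residuals with N - 2 degrees of freedom&lt;br /&gt;
   double s2 = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     double r = y[i] - A - B * x[i];&lt;br /&gt;
     s2 += r * r;&lt;br /&gt;
   }&lt;br /&gt;
   s2 /= (N - 2);&lt;br /&gt;
 &lt;br /&gt;
   // sigma_A^2 = sigma^2 Sxx / det  and  sigma_B^2 = N sigma^2 / det&lt;br /&gt;
   std::printf(&amp;quot;sigma_A = %f   sigma_B = %f\n&amp;quot;,&lt;br /&gt;
               std::sqrt(s2 * Sxx / det), std::sqrt(s2 * N / det));&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;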
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2}  \sum\frac{ x_i^2}{\sigma_i^2} - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i}{\sigma_i^2} + \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2}  \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_j}{\sigma_j^2} \right)^2 \right ] = D^2 \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left (  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \left [  \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i}{\sigma_i^2} + \frac{ 1}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [ \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \sum\frac{ 1}{\sigma_i^2} + \sum\frac{ 1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2} \left [ \sum\frac{ x_i^2}{\sigma_i^2} \sum\frac{ 1}{\sigma_i^2} - \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \right ] = D^2 \sum\frac{ 1}{\sigma_i^2} \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum_{j=1}^N \frac{ 1}{\sigma_j^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
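Continuing the weighted-fit sketch from above, the two error formulas reduce to&lt;br /&gt;
&lt;br /&gt;
 double sigmaA2 = Sxx/Delta;   // sigma_A^2 from the Weighted Error in A result&lt;br /&gt;
 double sigmaB2 = S/Delta;     // sigma_B^2 from the Weighted Error in B result&lt;br /&gt;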
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the linear fit has been performed, the next step is to determine the probability that the fit actually describes the data.&lt;br /&gt;
&lt;br /&gt;
The correlation coefficient (R) is one quantity used to estimate this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the independent and dependent variables, x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has the natural range between 0 and 1 such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
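The last expression is straightforward to compute directly from the data; a short sketch (again assuming arrays x[i] and y[i] of length N) is&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;   // std::sqrt&lt;br /&gt;
 double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;&lt;br /&gt;
 for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
   sx += x[i];  sy += y[i];&lt;br /&gt;
   sxx += x[i]*x[i];  syy += y[i]*y[i];  sxy += x[i]*y[i];&lt;br /&gt;
 }&lt;br /&gt;
 double R = (N*sxy - sx*sy)&lt;br /&gt;
          / std::sqrt((N*sxx - sx*sx)*(N*syy - sy*sy));&lt;br /&gt;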
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R) CAN'T by itself be used to indicate the degree of correlation.  The probability distribution of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D Gaussian, but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate the &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller the probability of obtaining the observed &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; from an uncorrelated parent distribution, the more likely it is that the data are correlated and that the linear fit is justified.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
= Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
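&lt;br /&gt;
As a numerical check, the density above can be evaluated with ROOT's gamma function; a minimal sketch (the helper name corrDensity is hypothetical) is&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;Math/SpecFuncMathCore.h&amp;quot;   // ROOT::Math::tgamma&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 // probability density of R for an uncorrelated parent distribution, nu = N - 2&lt;br /&gt;
 double corrDensity(double R, double nu) {&lt;br /&gt;
   return ROOT::Math::tgamma((nu + 1.0)/2.0)&lt;br /&gt;
        / (std::sqrt(M_PI)*ROOT::Math::tgamma(nu/2.0))&lt;br /&gt;
        * std::pow(1.0 - R*R, (nu - 2.0)/2.0);&lt;br /&gt;
 }&lt;br /&gt;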
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the summation in the argument of the exponential is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; the objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
: multiply both sides from the right by &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
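&lt;br /&gt;
In ROOT this whole procedure can be sketched with the TMatrixD and TVectorD classes; the sketch below assumes a power-series basis &amp;lt;math&amp;gt;f_j(x) = x^j&amp;lt;/math&amp;gt; and data arrays x[i], y[i], sig[i] of length N filled elsewhere.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;TMatrixD.h&amp;quot;&lt;br /&gt;
 #include &amp;quot;TVectorD.h&amp;quot;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 const int n = 3;                         // number of fit parameters a_j&lt;br /&gt;
 TMatrixD alpha(n, n);                    // both start out filled with zeros&lt;br /&gt;
 TVectorD beta(n);&lt;br /&gt;
 for (int i = 0; i &amp;lt; N; i++)&lt;br /&gt;
   for (int k = 0; k &amp;lt; n; k++) {&lt;br /&gt;
     double w = 1.0/(sig[i]*sig[i]);&lt;br /&gt;
     beta(k) += y[i]*std::pow(x[i], k)*w;&lt;br /&gt;
     for (int j = 0; j &amp;lt; n; j++)&lt;br /&gt;
       alpha(k, j) += std::pow(x[i], k)*std::pow(x[i], j)*w;&lt;br /&gt;
   }&lt;br /&gt;
 TMatrixD alphaInv(alpha);&lt;br /&gt;
 alphaInv.Invert();                       // the inverse matrix&lt;br /&gt;
 TVectorD a = alphaInv*beta;              // the fit parameters a_j&lt;br /&gt;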
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist its determinant cannot be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we formally &amp;quot;divide&amp;quot; both sides by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt; then we have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unit matrix to the matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same elementary row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we apply row operations that transform the denominator into the unit matrix, then the numerator becomes the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied on a square matrix, it can be used to calculate the inverse matrix. This can be done by augmenting the square matrix with the identity matrix of the same dimensions, and through the following matrix operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
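&lt;br /&gt;
A bare-bones Gauss-Jordan inversion (no pivoting or singularity checks, so it assumes well-behaved pivots as in the 3x3 example above) can be sketched as&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 typedef std::vector&amp;lt;std::vector&amp;lt;double&amp;gt; &amp;gt; Mat;&lt;br /&gt;
 Mat invert(Mat A) {                       // A is copied and reduced in place&lt;br /&gt;
   int n = A.size();&lt;br /&gt;
   Mat I(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++) I[i][i] = 1.0;   // augment with the identity&lt;br /&gt;
   for (int k = 0; k &amp;lt; n; k++) {&lt;br /&gt;
     double p = A[k][k];                         // pivot (assumed nonzero)&lt;br /&gt;
     for (int j = 0; j &amp;lt; n; j++) { A[k][j] /= p;  I[k][j] /= p; }&lt;br /&gt;
     for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
       if (i == k) continue;&lt;br /&gt;
       double f = A[i][k];                       // clear column k in row i&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; j++) { A[i][j] -= f*A[k][j];  I[i][j] -= f*I[k][j]; }&lt;br /&gt;
     }&lt;br /&gt;
   }&lt;br /&gt;
   return I;        // A is now the identity and I holds the inverse&lt;br /&gt;
 }&lt;br /&gt;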
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always, the uncertainty is determined by propagating the errors in quadrature using a Taylor expansion, such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
Where the definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The overall factor of &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; cancels one of the two &amp;lt;math&amp;gt; \frac{1}{\sigma_i^2}&amp;lt;/math&amp;gt; factors.&lt;br /&gt;
;Move the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unity matrix = 1 if &amp;lt;math&amp;gt;j = l&amp;lt;/math&amp;gt; and 0 otherwise&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
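&lt;br /&gt;
Continuing the regression sketch above, the diagonal of alphaInv gives the parameter variances and the full double sum gives the error of the model at a chosen point (X = 1.5 is an arbitrary example value):&lt;br /&gt;
&lt;br /&gt;
 double sigma_a1 = std::sqrt(alphaInv(1, 1));   // error in parameter a_1&lt;br /&gt;
 double X = 1.5, s2 = 0.0;                      // model error at x = X&lt;br /&gt;
 for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
   for (int j = 0; j &amp;lt; n; j++)&lt;br /&gt;
     s2 += std::pow(X, i)*std::pow(X, j)*alphaInv(i, j);&lt;br /&gt;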
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (ie; when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not improve the fit.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with a sample mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; and a parent standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you make &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements of two variates (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;) which are related by a function with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters, then the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; statistic built from those measurements follows the distribution given in the next section.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you received from your fit is as large or larger than what you would get from a function described by the parent distribution is given by&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the average value for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger is approximately 0.5.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of &amp;lt;math&amp;gt;P(x,\nu) = \nu&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
then the mean of &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(\frac{x}{\nu},\nu) = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value is strong evidence against the hypothesis H.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning which is that the p-value is the chance that null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
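For example, a fit that returned &amp;lt;math&amp;gt;\chi^2 = 25&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\nu = 18&amp;lt;/math&amp;gt; (hypothetical numbers) would give&lt;br /&gt;
&lt;br /&gt;
 double p1 = ROOT::Math::chisquared_cdf_c(25.0, 18.0);  // integral from chi^2 to infinity&lt;br /&gt;
 double p2 = TMath::Prob(25.0, 18);                     // the same quantity&lt;br /&gt;
&lt;br /&gt;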
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic no more extreme than the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you construct the ratio&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{\nu s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant, where &amp;lt;math&amp;gt;s^2 = \frac{1}{\nu}\sum \left (y_i-y(x_i)\right)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} = \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = a function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; thus the ratio compares only the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
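&lt;br /&gt;
MathCore also provides the complementary cumulative distribution, which gives the probability of obtaining an F ratio at least as large by chance; for hypothetical values&lt;br /&gt;
&lt;br /&gt;
 double F = 1.8, nu1 = 10.0, nu2 = 12.0;                   // example inputs&lt;br /&gt;
 double p = ROOT::Math::fdistribution_cdf_c(F, nu1, nu2);  // P(F' &amp;gt;= F)&lt;br /&gt;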
&lt;br /&gt;
== The chi^2 difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values using 2 different fit functions.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, &amp;lt;math&amp;gt;F_{\chi}&amp;lt;/math&amp;gt; will have an F distribution with &amp;lt;math&amp;gt;(\nu_1 - \nu_2, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the &amp;lt;math&amp;gt;F_{\chi}&amp;lt;/math&amp;gt; calculated from the data is greater than the critical value of the F distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m-2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
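&lt;br /&gt;
A sketch of the test (the chi-square values below are hypothetical fit results for m and m+1 parameters):&lt;br /&gt;
&lt;br /&gt;
 double chi2_m = 32.1, chi2_m1 = 25.4;   // hypothetical fits with m and m+1 parameters&lt;br /&gt;
 int    N = 20, m = 2;&lt;br /&gt;
 double Fchi = (chi2_m - chi2_m1)/(chi2_m/(N - m - 1));&lt;br /&gt;
 double p = ROOT::Math::fdistribution_cdf_c(Fchi, 1.0, N - m - 1);&lt;br /&gt;
 // a small p suggests the added parameter improves the fit significantly&lt;br /&gt;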
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:ie; &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; are non-linear with respect to the parameters.  You can't separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting your matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to use the max-likelihood principle on the probability function for finding the correct fit using the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = number of parameters in the fit function.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distributions as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
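&lt;br /&gt;
A sketch for two parameters, assuming a user-supplied function chi2(a1, a2) that evaluates &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the data:&lt;br /&gt;
&lt;br /&gt;
 double a1 = -1.0, a2 = 0.04;             // initial guesses&lt;br /&gt;
 double step1 = 0.01, step2 = 0.001;      // step sizes, one per parameter&lt;br /&gt;
 for (int iter = 0; iter &amp;lt; 100; iter++) {&lt;br /&gt;
   // walk each parameter downhill in turn, holding the others fixed&lt;br /&gt;
   while (chi2(a1 + step1, a2) &amp;lt; chi2(a1, a2)) a1 += step1;&lt;br /&gt;
   while (chi2(a1 - step1, a2) &amp;lt; chi2(a1, a2)) a1 -= step1;&lt;br /&gt;
   while (chi2(a1, a2 + step2) &amp;lt; chi2(a1, a2)) a2 += step2;&lt;br /&gt;
   while (chi2(a1, a2 - step2) &amp;lt; chi2(a1, a2)) a2 -= step2;&lt;br /&gt;
 }&lt;br /&gt;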
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the measurement errors &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; enter &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; directly, so the uncertainty in a fit parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; is reflected in how &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; changes as that parameter is varied away from its best value.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by determining how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
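&lt;br /&gt;
A sketch of this scan, reusing the hypothetical chi2(a1, a2) function from the grid-search example:&lt;br /&gt;
&lt;br /&gt;
 double chi2min = chi2(a1, a2);   // value at the minimum found above&lt;br /&gt;
 double da = 0.0, step = 1.0e-4;&lt;br /&gt;
 while (chi2(a1 + da, a2) &amp;lt; chi2min + 1.0) da += step;&lt;br /&gt;
 double sigma_a1 = da;            // one-standard-deviation error on a1&lt;br /&gt;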
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by attempting to search in a direction towards the minima as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91707</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91707"/>
		<updated>2014-03-21T19:18:12Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probabability of event ''A'' given ''B'' is related to the converse conditional probabablity of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability  that the student observed is wearing trousers = 60+20/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values are the most likely to occur;ie: the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our object is to find the best straight line fit for an expected linear relationship between dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_o(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \Pi \frac{1}{\sigma_i \sqrt{2 \pi}}\right ) e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The minimum of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is minimized with respect to both parameters A &amp;amp; B; i.e.,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations with 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;: multiply the first equation by &amp;lt;math&amp;gt;a_{22}&amp;lt;/math&amp;gt; and the second by &amp;lt;math&amp;gt;-a_{12}&amp;lt;/math&amp;gt;, then add them&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Applying the method of determinants to the maximum likelihood problem above gives&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
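&lt;br /&gt;
These determinant formulas translate directly into code.  A minimal sketch in C++ (the data arrays are hypothetical):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 // Unweighted least squares line fit using the determinant formulas above.&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; y = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
    double N = x.size(), Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
       Sx  += x[i];        Sy  += y[i];&lt;br /&gt;
       Sxx += x[i]*x[i];   Sxy += x[i]*y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double det = N*Sxx - Sx*Sx;          // the common denominator&lt;br /&gt;
    double A = (Sy*Sxx - Sx*Sxy)/det;    // intercept&lt;br /&gt;
    double B = (N*Sxy - Sx*Sy)/det;      // slope&lt;br /&gt;
    printf("A = %f  B = %f\n", A, B);&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;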
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solution for the parameters A &amp;amp; B&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N \left [ \left ( \sum x_i^2\right )^2  + x_j^2 \left (\sum x_i \right )^2 - 2 x_j \sum x_i^2 \sum x_i \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N  \left ( \sum x_i^2\right )^2 -  2 \sum x_i^2 \sum x_i \sum_{j=1}^N x_j  +  \left (\sum x_i \right )^2 \sum_{j=1}^N x_j^2   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sum_{j=1}^N x_j = \sum x_i \;\;\;\;\; \sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt; : both sums run over the same &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum x_i^2 \left [  N  \sum x_i^2 -  2 \left ( \sum x_i \right )^2  +  \left ( \sum x_i \right )^2   \right ] = \sigma^2 D^2\sum x_i^2\left [  N \sum x_i^2 -  \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so that the data are centered at x = 0, then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [   N^2 x_j^2 - 2N x_j \sum x_i + \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2    \left [  N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left ( \sum x_i \right )^2   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2    \left [  N^2 \sum x_i^2 - 2N \left ( \sum x_i \right )^2  + N \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2    \left [   N \sum x_i^2 -  \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 \sum\frac{1}{\sigma_i^2} - 2 \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 + \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt; : the sums over &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; reproduce the corresponding sums over &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2}  \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_A^2 = D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{  \sum\frac{ x_i^2}{\sigma_i^2} }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left (  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \left [  \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2  \frac{ x_j}{\sigma_j^2} \sum\frac{1}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{1}{\sigma_j^2}\left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [  \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \sum\frac{1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2  + \sum\frac{1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2}  \left [  \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}  - \left (\sum\frac{x_i}{\sigma_i^2}\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum \frac{ 1}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
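&lt;br /&gt;
The weighted case is just as direct.  A minimal C++ sketch (the data and the per-point errors are hypothetical):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 // Weighted straight-line fit: each point carries its own sigma_i.&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; y = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; s = {0.2, 0.2, 0.3, 0.3, 0.4};   // sigma_i&lt;br /&gt;
    double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
       double w = 1.0/(s[i]*s[i]);       // weight 1/sigma_i^2&lt;br /&gt;
       S += w;  Sx += w*x[i];  Sy += w*y[i];&lt;br /&gt;
       Sxx += w*x[i]*x[i];  Sxy += w*x[i]*y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double det = S*Sxx - Sx*Sx;           // = 1/D in the notation above&lt;br /&gt;
    double A = (Sy*Sxx - Sx*Sxy)/det;&lt;br /&gt;
    double B = (S*Sxy - Sx*Sy)/det;&lt;br /&gt;
    double sigA = std::sqrt(Sxx/det);     // sigma_A^2 = D Sxx&lt;br /&gt;
    double sigB = std::sqrt(S/det);       // sigma_B^2 = D S&lt;br /&gt;
    printf("A = %f +/- %f   B = %f +/- %f\n", A, sigA, B, sigB);&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;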
&lt;br /&gt;
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the linear fit has been performed, the next step is to determine a probability that the fit actually describes the data.&lt;br /&gt;
&lt;br /&gt;
The correlation coefficient (R) is one quantity used to try to determine this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the dependent and independent variables, x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has the natural range between 0 and 1 such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
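&lt;br /&gt;
In code, the sample &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; is a one-line combination of the same sums (a C++ sketch; the data arrays are hypothetical):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 // Sample correlation coefficient R from the expression above.&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x = {1, 2, 3, 4, 5};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; y = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
    double N = x.size(), Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
       Sx += x[i];  Sy += y[i];&lt;br /&gt;
       Sxx += x[i]*x[i];  Syy += y[i]*y[i];  Sxy += x[i]*y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double R = (N*Sxy - Sx*Sy)/std::sqrt((N*Sxx - Sx*Sx)*(N*Syy - Sy*Sy));&lt;br /&gt;
    printf("R = %f\n", R);&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;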
&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R) CAN'T by itself be used to indicate the degree of correlation.  The probability distribution of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D Gaussian, but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate the &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller the probability that uncorrelated data would produce the observed value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;, the more likely it is that the data are correlated and that the linear fit is justified.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
= Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
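&lt;br /&gt;
The density &amp;lt;math&amp;gt;P_R(R,\nu)&amp;lt;/math&amp;gt; is straightforward to evaluate; a C++ sketch using std::tgamma (ROOT::Math::tgamma, cited above, would serve equally well):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 // P_R(R,nu): probability density of R for uncorrelated parent data.&lt;br /&gt;
 double probR(double R, double nu) {&lt;br /&gt;
    const double pi = std::acos(-1.0);&lt;br /&gt;
    return (1.0/std::sqrt(pi))&lt;br /&gt;
           * std::tgamma(0.5*(nu + 1.0)) / std::tgamma(0.5*nu)&lt;br /&gt;
           * std::pow(1.0 - R*R, 0.5*(nu - 2.0));&lt;br /&gt;
 }&lt;br /&gt;
 int main() {&lt;br /&gt;
    // e.g. N = 7 data points fit to a line  -&amp;gt;  nu = N - 2 = 5&lt;br /&gt;
    printf("P_R(0.99, 5) = %g\n", probR(0.99, 5.0));&lt;br /&gt;
    printf("P_R(0.10, 5) = %g\n", probR(0.10, 5.0));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;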
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the summation in the exponent is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivatives with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots + a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; the objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \tilde{\beta} = \tilde{a} \tilde{\alpha} \right) \tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist its determinant can not be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If, loosely speaking, we divide both sides by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt; then we have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unity matrix to matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we do such operations we can transform the ratio so that the denominator becomes the unity matrix, at which point the numerator is the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied on a square matrix, it can be used to calculate the inverse matrix. This can be done by augmenting the square matrix with the identity matrix of the same dimensions, and through the following matrix operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
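&lt;br /&gt;
The procedure is easy to mechanize.  A C++ sketch of Gauss-Jordan inversion (no pivoting, which is adequate for this well-conditioned example), checked against the 3x3 matrix above:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 // Gauss-Jordan inversion: augment with the identity and row-reduce.&lt;br /&gt;
 using Mat = std::vector&amp;lt;std::vector&amp;lt;double&amp;gt;&amp;gt;;&lt;br /&gt;
 Mat gaussJordanInverse(Mat a) {&lt;br /&gt;
    size_t n = a.size();&lt;br /&gt;
    Mat inv(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; n; ++i) inv[i][i] = 1.0;    // augment&lt;br /&gt;
    for (size_t c = 0; c &amp;lt; n; ++c) {&lt;br /&gt;
       double piv = a[c][c];                 // assumed non-zero&lt;br /&gt;
       for (size_t j = 0; j &amp;lt; n; ++j) { a[c][j] /= piv; inv[c][j] /= piv; }&lt;br /&gt;
       for (size_t r = 0; r &amp;lt; n; ++r) {&lt;br /&gt;
          if (r == c) continue;&lt;br /&gt;
          double f = a[r][c];&lt;br /&gt;
          for (size_t j = 0; j &amp;lt; n; ++j) {&lt;br /&gt;
             a[r][j] -= f * a[c][j];&lt;br /&gt;
             inv[r][j] -= f * inv[c][j];&lt;br /&gt;
          }&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    return inv;&lt;br /&gt;
 }&lt;br /&gt;
 int main() {&lt;br /&gt;
    Mat A = {{2, -1, 0}, {-1, 2, -1}, {0, -1, 2}};&lt;br /&gt;
    Mat Ainv = gaussJordanInverse(A);&lt;br /&gt;
    for (size_t r = 0; r &amp;lt; Ainv.size(); ++r) {   // expect (1/4)[3 2 1; 2 4 2; 1 2 3]&lt;br /&gt;
       for (size_t c = 0; c &amp;lt; Ainv.size(); ++c) printf("%7.3f", Ainv[r][c]);&lt;br /&gt;
       printf("\n");&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;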
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
Where the definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; factor in front cancels one of the two &amp;lt;math&amp;gt;\frac{1}{\sigma_i^2}&amp;lt;/math&amp;gt; factors.&lt;br /&gt;
;Move the outer most sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unity matrix = 1 if &amp;lt;math&amp;gt;j=l&amp;lt;/math&amp;gt;, 0 otherwise&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
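&lt;br /&gt;
The whole recipe, from building &amp;lt;math&amp;gt;\tilde{\beta}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; to reading the errors off &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;, fits in a short program.  A C++ sketch for a power-series basis &amp;lt;math&amp;gt;f_j(x) = x^j&amp;lt;/math&amp;gt; (the data, errors, and quadratic model are hypothetical; the inversion routine is the one sketched in the Gauss-Jordan section):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 using Mat = std::vector&amp;lt;std::vector&amp;lt;double&amp;gt;&amp;gt;;&lt;br /&gt;
 // Same Gauss-Jordan routine as in the matrix-inversion sketch above.&lt;br /&gt;
 Mat gaussJordanInverse(Mat a) {&lt;br /&gt;
    size_t n = a.size();&lt;br /&gt;
    Mat inv(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; n; ++i) inv[i][i] = 1.0;&lt;br /&gt;
    for (size_t c = 0; c &amp;lt; n; ++c) {&lt;br /&gt;
       double piv = a[c][c];&lt;br /&gt;
       for (size_t j = 0; j &amp;lt; n; ++j) { a[c][j] /= piv; inv[c][j] /= piv; }&lt;br /&gt;
       for (size_t r = 0; r &amp;lt; n; ++r) {&lt;br /&gt;
          if (r == c) continue;&lt;br /&gt;
          double f = a[r][c];&lt;br /&gt;
          for (size_t j = 0; j &amp;lt; n; ++j) { a[r][j] -= f*a[c][j]; inv[r][j] -= f*inv[c][j]; }&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    return inv;&lt;br /&gt;
 }&lt;br /&gt;
 int main() {&lt;br /&gt;
    // Hypothetical data with individual errors, fit to a_0 + a_1 x + a_2 x^2.&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x = {0, 1, 2, 3, 4, 5};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; y = {1.1, 2.3, 5.2, 10.1, 16.8, 26.2};&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; s = {0.3, 0.3, 0.4, 0.4, 0.5, 0.5};&lt;br /&gt;
    const int n = 3;&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; beta(n, 0.0);&lt;br /&gt;
    Mat alpha(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
    for (size_t i = 0; i &amp;lt; x.size(); ++i) {&lt;br /&gt;
       double w = 1.0/(s[i]*s[i]);&lt;br /&gt;
       for (int k = 0; k &amp;lt; n; ++k) {&lt;br /&gt;
          double fk = std::pow(x[i], k);      // f_k(x_i) = x_i^k&lt;br /&gt;
          beta[k] += w * y[i] * fk;&lt;br /&gt;
          for (int j = 0; j &amp;lt; n; ++j) alpha[k][j] += w * fk * std::pow(x[i], j);&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    Mat eps = gaussJordanInverse(alpha);      // the error matrix alpha^-1&lt;br /&gt;
    for (int k = 0; k &amp;lt; n; ++k) {&lt;br /&gt;
       double ak = 0.0;&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; ++j) ak += beta[j] * eps[j][k];   // a = beta alpha^-1&lt;br /&gt;
       printf("a_%d = %8.4f +/- %.4f\n", k, ak, std::sqrt(eps[k][k]));&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;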
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (i.e., when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not change the fit.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; and standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you make N measurements of two variates &amp;lt;math&amp;gt;(x_i, y_i)&amp;lt;/math&amp;gt; and describe them with a function containing n parameters, then the sum of the N squared, normalized residuals follows the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution given in the next section.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you obtained from your fit is as large or larger than what you would get from a function described by the parent distribution is given by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the mean value &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; or larger is approximately 0.5.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; distribution &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, &lt;br /&gt;
&lt;br /&gt;
the mean of &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{\nu}&amp;lt;/math&amp;gt; is 1.&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value means the observed data are unlikely under the hypothesis H.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning which is that the p-value is the chance that the null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
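&lt;br /&gt;
As a usage sketch (a ROOT macro; the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; values are hypothetical, and the header names are the usual MathCore/TMath ones):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include "TMath.h"&lt;br /&gt;
 #include "Math/ProbFuncMathCore.h"&lt;br /&gt;
 // p-value of a fit that returned chi2 = 12.6 with nu = 10 dof.&lt;br /&gt;
 void pvalue() {&lt;br /&gt;
    double chi2 = 12.6;&lt;br /&gt;
    double nu   = 10;&lt;br /&gt;
    double p1 = ROOT::Math::chisquared_cdf_c(chi2, nu);&lt;br /&gt;
    double p2 = TMath::Prob(chi2, 10);   // same quantity&lt;br /&gt;
    printf("p-value = %g (= %g)\n", p1, p2);&lt;br /&gt;
 }&lt;br /&gt;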
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic as extreme as the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function and the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you form the statistic&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; thus the function only compares the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the probability distribution&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
&lt;br /&gt;
== The chi^2 difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of 2 different fits.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, &amp;lt;math&amp;gt;F_{\chi}&amp;lt;/math&amp;gt; will have an F distribution, with &amp;lt;math&amp;gt;(\nu_1 - \nu_2, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F  distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine whether you need to continue adding parameters to the fit, then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m}} &amp;lt;/math&amp;gt;&lt;br /&gt;
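&lt;br /&gt;
As a minimal sketch of this &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test in ROOT (the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values, &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; below are illustrative assumptions, not real fit results):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;  // ROOT::Math::fdistribution_cdf_c&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void ftest()&lt;br /&gt;
 {&lt;br /&gt;
    // illustrative chi-square values for an m and an (m+1) parameter fit&lt;br /&gt;
    double chi2_m  = 32.0;&lt;br /&gt;
    double chi2_m1 = 24.0;&lt;br /&gt;
    double N = 25.0, m = 2.0;&lt;br /&gt;
 &lt;br /&gt;
    double Fchi = (chi2_m - chi2_m1) / (chi2_m / (N - m - 1.0));&lt;br /&gt;
 &lt;br /&gt;
    // probability of an F at least this large for (1, N-m-1) dof;&lt;br /&gt;
    // keep the extra parameter if this probability is small (e.g. below 0.05)&lt;br /&gt;
    double p = ROOT::Math::fdistribution_cdf_c(Fchi, 1.0, N - m - 1.0);&lt;br /&gt;
    printf(&amp;quot;F = %f, p = %f\n&amp;quot;, Fchi, p);&lt;br /&gt;
 }&lt;br /&gt;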
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:i.e., &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You cannot separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting your matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all of the parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in Lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
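&lt;br /&gt;
A sketch of how such a hypersurface could be generated in ROOT is given below; the data arrays, uncertainty, and axis ranges are placeholders to be replaced by the Lab 14 measurements:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;TH2D.h&amp;quot;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void chi2surface()&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;                             // placeholder data&lt;br /&gt;
    double x[N] = {100, 200, 300, 400, 500};     // voltage (hypothetical)&lt;br /&gt;
    double y[N] = {3.3, 7.6, 11.9, 16.2, 20.5};  // temperature (hypothetical)&lt;br /&gt;
    double sig  = 0.5;                           // common uncertainty (assumed)&lt;br /&gt;
 &lt;br /&gt;
    TH2D *h = new TH2D(&amp;quot;hchi2&amp;quot;, &amp;quot;#chi^{2};a_{1};a_{2}&amp;quot;,&lt;br /&gt;
                       100, -3.0, 1.0, 100, 0.03, 0.06);&lt;br /&gt;
 &lt;br /&gt;
    for (int i = 1; i &amp;lt;= 100; i++) {&lt;br /&gt;
       for (int j = 1; j &amp;lt;= 100; j++) {&lt;br /&gt;
          double a1 = h-&amp;gt;GetXaxis()-&amp;gt;GetBinCenter(i);   // y-intercept&lt;br /&gt;
          double a2 = h-&amp;gt;GetYaxis()-&amp;gt;GetBinCenter(j);   // slope&lt;br /&gt;
          double chi2 = 0;&lt;br /&gt;
          for (int k = 0; k &amp;lt; N; k++)&lt;br /&gt;
             chi2 += pow((y[k] - a1 - a2 * x[k]) / sig, 2);&lt;br /&gt;
          h-&amp;gt;SetBinContent(i, j, chi2);   // chi2 at this grid point&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    h-&amp;gt;Draw(&amp;quot;surf1&amp;quot;);   // draw the hypersurface&lt;br /&gt;
 }&lt;br /&gt;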
&lt;br /&gt;
If we return to the beginning, the original problem is to use the max-likelihood principle on the probability function for finding the correct fit using the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distributions as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
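&lt;br /&gt;
The sketch below illustrates this one-parameter-at-a-time search on a hypothetical two-parameter linear model (the data, initial values, and step sizes are assumptions for illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // chi-square for a hypothetical linear model y = a1 + a2 x&lt;br /&gt;
 double chi2(double a1, double a2)&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5}, y[N] = {2.1, 3.9, 6.2, 7.8, 10.1}, sig = 0.2;&lt;br /&gt;
    double s = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) s += pow((y[i] - a1 - a2 * x[i]) / sig, 2);&lt;br /&gt;
    return s;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 void gridsearch()&lt;br /&gt;
 {&lt;br /&gt;
    double a[2] = {0.0, 1.0};  // initial values (critical, as noted above)&lt;br /&gt;
    double step = 0.5;&lt;br /&gt;
 &lt;br /&gt;
    for (int iter = 0; iter &amp;lt; 50; iter++) {&lt;br /&gt;
       for (int k = 0; k &amp;lt; 2; k++) {         // minimize one parameter at a time&lt;br /&gt;
          double bestA = a[k], save = a[k];&lt;br /&gt;
          double bestC = chi2(a[0], a[1]);&lt;br /&gt;
          for (int j = -10; j &amp;lt;= 10; j++) {  // local grid around current value&lt;br /&gt;
             a[k] = save + j * step;&lt;br /&gt;
             double c = chi2(a[0], a[1]);&lt;br /&gt;
             if (c &amp;lt; bestC) { bestC = c; bestA = a[k]; }&lt;br /&gt;
          }&lt;br /&gt;
          a[k] = bestA;                       // freeze at the grid minimum&lt;br /&gt;
       }&lt;br /&gt;
       step *= 0.7;                           // refine until chi2 is stable&lt;br /&gt;
    }&lt;br /&gt;
    printf(&amp;quot;a1 = %f  a2 = %f  chi2 = %f\n&amp;quot;, a[0], a[1], chi2(a[0], a[1]));&lt;br /&gt;
 }&lt;br /&gt;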
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; represents the error in the parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; and that the error alters &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;  dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by determining how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
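&lt;br /&gt;
A sketch of this &amp;lt;math&amp;gt;\Delta\chi^2 = 1&amp;lt;/math&amp;gt; scan for a hypothetical one-parameter model (data and scan ranges are assumptions for illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // chi-square for a hypothetical one-parameter model y = a x&lt;br /&gt;
 double chi2(double a)&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5}, y[N] = {2.0, 4.1, 5.9, 8.2, 9.8}, sig = 0.2;&lt;br /&gt;
    double s = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) s += pow((y[i] - a * x[i]) / sig, 2);&lt;br /&gt;
    return s;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 void parerror()&lt;br /&gt;
 {&lt;br /&gt;
    // locate the minimum with a crude scan (a grid search would do this in practice)&lt;br /&gt;
    double aMin = 0, cMin = 1e30;&lt;br /&gt;
    for (double a = 0; a &amp;lt; 4; a += 1e-4)&lt;br /&gt;
       if (chi2(a) &amp;lt; cMin) { cMin = chi2(a); aMin = a; }&lt;br /&gt;
 &lt;br /&gt;
    // step the parameter until chi2 has grown by exactly 1&lt;br /&gt;
    double a = aMin;&lt;br /&gt;
    while (chi2(a) &amp;lt; cMin + 1.0) a += 1e-4;&lt;br /&gt;
    printf(&amp;quot;a = %f +/- %f\n&amp;quot;, aMin, a - aMin);&lt;br /&gt;
 }&lt;br /&gt;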
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by attempting to search in a direction towards the minimum as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
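&lt;br /&gt;
A minimal sketch of such a search, stepping all parameters simultaneously down the numerical gradient of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; (the data, starting point, and step sizes are assumptions for illustration):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 double chi2(double a1, double a2)&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5}, y[N] = {2.1, 3.9, 6.2, 7.8, 10.1}, sig = 0.2;&lt;br /&gt;
    double s = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) s += pow((y[i] - a1 - a2 * x[i]) / sig, 2);&lt;br /&gt;
    return s;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 void gradsearch()&lt;br /&gt;
 {&lt;br /&gt;
    double a1 = 0.0, a2 = 1.0;     // starting point (assumed)&lt;br /&gt;
    double h = 1e-5, rate = 1e-4;  // finite-difference step and step size&lt;br /&gt;
 &lt;br /&gt;
    for (int iter = 0; iter &amp;lt; 10000; iter++) {&lt;br /&gt;
       // numerical gradient of chi2 with respect to each parameter&lt;br /&gt;
       double g1 = (chi2(a1 + h, a2) - chi2(a1 - h, a2)) / (2 * h);&lt;br /&gt;
       double g2 = (chi2(a1, a2 + h) - chi2(a1, a2 - h)) / (2 * h);&lt;br /&gt;
       a1 -= rate * g1;  // step both parameters together,&lt;br /&gt;
       a2 -= rate * g2;  // downhill along the negative gradient&lt;br /&gt;
    }&lt;br /&gt;
    printf(&amp;quot;a1 = %f  a2 = %f  chi2 = %f\n&amp;quot;, a1, a2, chi2(a1, a2));&lt;br /&gt;
 }&lt;br /&gt;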
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91706</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91706"/>
		<updated>2014-03-21T19:17:16Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability that the student observed is wearing trousers = (60+20)/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist-based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our objective is to find the best straight-line fit for an expected linear relationship between the dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_0(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \frac{1}{\sigma \sqrt{2 \pi}}\right )^N e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The min for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum with respect to both parameters A &amp;amp; B; i.e.,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations in 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming the &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; are known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Applying the method of determinants to the maximum likelihood problem above gives&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
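&lt;br /&gt;
As a minimal numerical sketch of these determinant solutions (the data arrays are hypothetical):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void linfit()&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;   // hypothetical measurements with a common uncertainty&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5}, y[N] = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
 &lt;br /&gt;
    double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       Sx += x[i]; Sy += y[i]; Sxx += x[i] * x[i]; Sxy += x[i] * y[i];&lt;br /&gt;
    }&lt;br /&gt;
 &lt;br /&gt;
    double det = N * Sxx - Sx * Sx;          // the common denominator&lt;br /&gt;
    double A = (Sy * Sxx - Sx * Sxy) / det;  // numerator determinant for A&lt;br /&gt;
    double B = (N * Sxy - Sx * Sy) / det;    // numerator determinant for B&lt;br /&gt;
    printf(&amp;quot;A = %f  B = %f\n&amp;quot;, A, B);&lt;br /&gt;
 }&lt;br /&gt;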
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solution for the parameters A &amp;amp; B:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N \left [ \left ( \sum x_i^2\right )^2  + x_j^2 \left (\sum x_i \right )^2 - 2 x_j \sum x_i^2 \sum x_i \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N \left ( \sum x_i^2\right )^2 + \sum x_i^2 \left (\sum x_i \right )^2 - 2 \sum x_i^2 \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt; : since &amp;lt;math&amp;gt; \sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; \sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; (both sums run over the same &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum x_i^2 \left [  N \sum x_i^2 - \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so the line is centered at x=0 then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in Y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left (  N^2 x_j^2 - 2N x_j \sum x_i + \left (\sum x_i \right )^2 \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left ( N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left (\sum x_i \right )^2 \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left ( N^2 \sum x_i^2 - 2N \left (\sum x_i \right )^2  + N \left (\sum x_i \right )^2 \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left ( N \sum x_i^2 -  \left (\sum x_i \right )^2 \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
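&lt;br /&gt;
A sketch extending the fit above to the parameter uncertainties just derived (same hypothetical data):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void linfiterr()&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;   // hypothetical data, as in the fit sketch above&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5}, y[N] = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
 &lt;br /&gt;
    double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       Sx += x[i]; Sy += y[i]; Sxx += x[i] * x[i]; Sxy += x[i] * y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double det = N * Sxx - Sx * Sx;&lt;br /&gt;
    double A = (Sy * Sxx - Sx * Sxy) / det;&lt;br /&gt;
    double B = (N * Sxy - Sx * Sy) / det;&lt;br /&gt;
 &lt;br /&gt;
    double s2 = 0;                  // sample variance about the fit&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) s2 += pow(y[i] - A - B * x[i], 2);&lt;br /&gt;
    s2 /= (N - 2);                  // N points minus 2 fit parameters&lt;br /&gt;
 &lt;br /&gt;
    double sigA2 = s2 * Sxx / det;  // variance of the intercept&lt;br /&gt;
    double sigB2 = s2 * N / det;    // variance of the slope&lt;br /&gt;
    printf(&amp;quot;A = %f +/- %f   B = %f +/- %f\n&amp;quot;, A, sqrt(sigA2), B, sqrt(sigB2));&lt;br /&gt;
 }&lt;br /&gt;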
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} - 2 \left (\sum \frac{ x_i}{\sigma_i^2} \right)^2 + \left (\sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ] = D^2 \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left (  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \left [  \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i}{\sigma_i^2} + \frac{ 1}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2}  \left [  \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} - \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \right ] = D^2 \sum\frac{ 1}{\sigma_i^2} \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum_{j=1}^N \frac{ 1}{\sigma_j^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
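&lt;br /&gt;
A sketch of the full weighted fit using these expressions (the data and the point-by-point uncertainties are hypothetical):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void wlinfit()&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5};&lt;br /&gt;
    double y[N] = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
    double s[N] = {0.1, 0.2, 0.1, 0.3, 0.2};   // sigma_i for each point&lt;br /&gt;
 &lt;br /&gt;
    // weighted sums appearing in the determinant solution above&lt;br /&gt;
    double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       double w = 1.0 / (s[i] * s[i]);&lt;br /&gt;
       S += w; Sx += w * x[i]; Sy += w * y[i];&lt;br /&gt;
       Sxx += w * x[i] * x[i]; Sxy += w * x[i] * y[i];&lt;br /&gt;
    }&lt;br /&gt;
 &lt;br /&gt;
    double det = S * Sxx - Sx * Sx;&lt;br /&gt;
    double A = (Sy * Sxx - Sx * Sxy) / det;&lt;br /&gt;
    double B = (S * Sxy - Sx * Sy) / det;&lt;br /&gt;
    double sigA2 = Sxx / det;   // weighted error in A&lt;br /&gt;
    double sigB2 = S / det;     // weighted error in B&lt;br /&gt;
    printf(&amp;quot;A = %f +/- %f   B = %f +/- %f\n&amp;quot;, A, sqrt(sigA2), B, sqrt(sigB2));&lt;br /&gt;
 }&lt;br /&gt;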
&lt;br /&gt;
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the Linear Fit has been performed, the next step will be to determine a probability that the Fit is actually describing the data.&lt;br /&gt;
&lt;br /&gt;
The Correlation Probability (R) is one method used to try and determine this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the dependent and independent variables , x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has the natural range between 0 and 1 such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
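&lt;br /&gt;
A minimal sketch of this computation (hypothetical data arrays):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void corrcoef()&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 5;&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5}, y[N] = {2.1, 3.9, 6.2, 7.8, 10.1};&lt;br /&gt;
 &lt;br /&gt;
    double Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       Sx += x[i]; Sy += y[i];&lt;br /&gt;
       Sxx += x[i] * x[i]; Syy += y[i] * y[i]; Sxy += x[i] * y[i];&lt;br /&gt;
    }&lt;br /&gt;
 &lt;br /&gt;
    double R = (N * Sxy - Sx * Sy)&lt;br /&gt;
             / sqrt((N * Sxx - Sx * Sx) * (N * Syy - Sy * Sy));&lt;br /&gt;
    printf(&amp;quot;R = %f\n&amp;quot;, R);&lt;br /&gt;
 }&lt;br /&gt;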
&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R) CAN'T be used to indicate the degree of correlation.  The probability distribution &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D gaussian but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate R of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller this probability is, the more likely it is that the data are correlated and that the linear fit is correct.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
= Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
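&lt;br /&gt;
A sketch of evaluating this probability density in ROOT (the values of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; are illustrative):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;Math/SpecFuncMathCore.h&amp;quot;  // ROOT::Math::tgamma&lt;br /&gt;
 #include &amp;quot;TMath.h&amp;quot;                  // TMath::Pi&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void probR()&lt;br /&gt;
 {&lt;br /&gt;
    double R  = 0.9;   // sample correlation coefficient (illustrative)&lt;br /&gt;
    double nu = 3.0;   // nu = N - 2 degrees of freedom (illustrative)&lt;br /&gt;
 &lt;br /&gt;
    double PR = ROOT::Math::tgamma((nu + 1.0) / 2.0)&lt;br /&gt;
              / (sqrt(TMath::Pi()) * ROOT::Math::tgamma(nu / 2.0))&lt;br /&gt;
              * pow(1.0 - R * R, (nu - 2.0) / 2.0);&lt;br /&gt;
    printf(&amp;quot;P_R(R,nu) = %f\n&amp;quot;, PR);&lt;br /&gt;
 }&lt;br /&gt;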
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the argument of the exponential is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants, as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; the objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \tilde{\beta} = \tilde{a} \tilde{\alpha} \right) \tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
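&lt;br /&gt;
A minimal sketch of this inversion in ROOT for a quadratic fit &amp;lt;math&amp;gt;f_j(x) = x^j&amp;lt;/math&amp;gt; (the data and common uncertainty are hypothetical; quoting the diagonal of &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; as the parameter variances anticipates the Error Matrix section below):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;TMatrixD.h&amp;quot;&lt;br /&gt;
 #include &amp;quot;TVectorD.h&amp;quot;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void polyfit()&lt;br /&gt;
 {&lt;br /&gt;
    const int N = 6, n = 3;   // hypothetical data, quadratic fit&lt;br /&gt;
    double x[N] = {0, 1, 2, 3, 4, 5};&lt;br /&gt;
    double y[N] = {1.1, 2.0, 5.2, 9.8, 17.1, 26.0};&lt;br /&gt;
    double sig  = 0.3;        // common uncertainty (assumed)&lt;br /&gt;
 &lt;br /&gt;
    TMatrixD alpha(n, n);     // alpha_kj, starts at zero&lt;br /&gt;
    TVectorD beta(n);         // beta_k&lt;br /&gt;
    for (int k = 0; k &amp;lt; n; k++) {&lt;br /&gt;
       for (int i = 0; i &amp;lt; N; i++)&lt;br /&gt;
          beta(k) += y[i] * pow(x[i], k) / (sig * sig);&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; j++)&lt;br /&gt;
          for (int i = 0; i &amp;lt; N; i++)&lt;br /&gt;
             alpha(k, j) += pow(x[i], k) * pow(x[i], j) / (sig * sig);&lt;br /&gt;
    }&lt;br /&gt;
 &lt;br /&gt;
    alpha.Invert();              // alpha now holds alpha^{-1}&lt;br /&gt;
    TVectorD a = alpha * beta;   // a = beta alpha^{-1} (alpha is symmetric)&lt;br /&gt;
    for (int j = 0; j &amp;lt; n; j++)&lt;br /&gt;
       printf(&amp;quot;a_%d = %f +/- %f\n&amp;quot;, j, a(j), sqrt(alpha(j, j)));&lt;br /&gt;
 }&lt;br /&gt;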
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist its determinant can not be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we divide both sides by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt; then we have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unity matrix to matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we do such operations we can transform the ratio such that the denominator has the unity matrix and then the numerator will have the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied on a square matrix, it can be used to calculate the inverse matrix. This can be done by augmenting the square matrix with the identity matrix of the same dimensions, and through the following matrix operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
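&lt;br /&gt;
The sketch below carries out this Gauss-Jordan procedure numerically on the example matrix above (no pivot swapping is attempted, so it assumes no zero pivot is encountered):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // invert a 3 x 3 matrix by row-reducing the augmented system [A | 1]&lt;br /&gt;
 void invert(double A[3][3], double Inv[3][3])&lt;br /&gt;
 {&lt;br /&gt;
    const int n = 3;&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; j++) Inv[i][j] = (i == j) ? 1.0 : 0.0;&lt;br /&gt;
 &lt;br /&gt;
    for (int p = 0; p &amp;lt; n; p++) {&lt;br /&gt;
       double piv = A[p][p];                 // scale the pivot row to 1&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; j++) { A[p][j] /= piv; Inv[p][j] /= piv; }&lt;br /&gt;
       for (int i = 0; i &amp;lt; n; i++) {        // clear the pivot column elsewhere&lt;br /&gt;
          if (i == p) continue;&lt;br /&gt;
          double f = A[i][p];&lt;br /&gt;
          for (int j = 0; j &amp;lt; n; j++) {&lt;br /&gt;
             A[i][j] -= f * A[p][j];&lt;br /&gt;
             Inv[i][j] -= f * Inv[p][j];&lt;br /&gt;
          }&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 void gaussjordan()&lt;br /&gt;
 {&lt;br /&gt;
    double A[3][3] = {{2, -1, 0}, {-1, 2, -1}, {0, -1, 2}};  // the example above&lt;br /&gt;
    double Inv[3][3];&lt;br /&gt;
    invert(A, Inv);   // expect (1/4)[[3,2,1],[2,4,2],[1,2,3]]&lt;br /&gt;
    for (int i = 0; i &amp;lt; 3; i++)&lt;br /&gt;
       printf(&amp;quot;%6.3f %6.3f %6.3f\n&amp;quot;, Inv[i][0], Inv[i][1], Inv[i][2]);&lt;br /&gt;
 }&lt;br /&gt;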
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
where, by the definition of variance,&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; term appears in both the numerator and denominator, so one factor cancels.&lt;br /&gt;
;Move the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the identity matrix (1 when &amp;lt;math&amp;gt;j = l&amp;lt;/math&amp;gt;, 0 otherwise)&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
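&lt;br /&gt;
As a concrete illustration, here is a short C++ sketch of how the error matrix is used once &amp;lt;math&amp;gt;\alpha^{-1}&amp;lt;/math&amp;gt; is in hand.  It assumes a 2-parameter power-series fit &amp;lt;math&amp;gt;y = a_0 + a_1 x&amp;lt;/math&amp;gt;, and the numerical values of the error matrix are made up for illustration only:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    // err[i][j] = alpha^-1[i][j], the covariance/error matrix from the fit&lt;br /&gt;
    double err[2][2] = {{0.04, -0.002}, {-0.002, 0.0004}};&lt;br /&gt;
    // parameter errors are the square roots of the diagonal elements&lt;br /&gt;
    printf("sigma_a0 = %f  sigma_a1 = %f\n", sqrt(err[0][0]), sqrt(err[1][1]));&lt;br /&gt;
    // error in the model prediction at a point x: s^2 = sum_ij x^(i+j) err[i][j]&lt;br /&gt;
    double x = 2.0, s2 = 0.0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; 2; i++)&lt;br /&gt;
       for (int j = 0; j &amp;lt; 2; j++)&lt;br /&gt;
          s2 += pow(x, i + j) * err[i][j];   // off-diagonal terms carry the covariance&lt;br /&gt;
    printf("prediction variance at x = %g : %f\n", x, s2);&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;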
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (i.e., when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not change the fit.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with sample mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; and parent standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Suppose you make N measurements of two variates (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;), which may be correlated, and fit them using a function with n parameters.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you obtained from your fit is as large or larger than what you would get from a function described by the parent distribution is given by&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx = 0.5&amp;lt;/math&amp;gt; = probability of getting the average value for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger is 0.5.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of &amp;lt;math&amp;gt;P(x,\nu) = \nu&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
then the mean of &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(\frac{x}{\nu},\nu) = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  small p-values indicate strong evidence against the hypothesis H.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning, which is that the p-value is the chance that the null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c(double chi2, double nu, double x0 = 0)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(chi2, nu)&lt;br /&gt;
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic as extreme as the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
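&lt;br /&gt;
For example, a short ROOT sketch of computing the p-value for a fit (the numbers are illustrative):&lt;br /&gt;
&lt;br /&gt;
 // p-value for a fit that returned chi^2 = 25.3 with nu = 18 degrees of freedom&lt;br /&gt;
 double chi2 = 25.3;&lt;br /&gt;
 int    nu   = 18;&lt;br /&gt;
 double p1 = TMath::Prob(chi2, nu);                   // upper-tail probability&lt;br /&gt;
 double p2 = ROOT::Math::chisquared_cdf_c(chi2, nu);  // the same quantity&lt;br /&gt;
 // here p is about 0.12: the chance of a chi^2 this large or larger&lt;br /&gt;
 // from a model that truly describes the data&lt;br /&gt;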
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you construct the function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \nu \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; = constant, since &amp;lt;math&amp;gt;s^2 = \sum_i^N \left (y_i-y(x_i)\right)^2 / \nu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} = \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = a function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; the statistic compares only the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
&lt;br /&gt;
== The chi^2 difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion to the above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of 2 different fits: one using &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; parameters and one using &amp;lt;math&amp;gt;n &amp;gt; m&amp;lt;/math&amp;gt; parameters.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution, with &amp;lt;math&amp;gt;(\nu_2 − \nu_1, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F  distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m}} &amp;lt;/math&amp;gt;&lt;br /&gt;
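&lt;br /&gt;
A sketch of this test in ROOT (this assumes the F-distribution upper-tail function ROOT::Math::fdistribution_cdf_c is available in your ROOT version; the numbers are illustrative):&lt;br /&gt;
&lt;br /&gt;
 // did adding one parameter (m to m+1) significantly improve the fit?&lt;br /&gt;
 double chi2_m  = 30.2;   // chi^2 of the fit with m parameters&lt;br /&gt;
 double chi2_m1 = 24.1;   // chi^2 after adding one more parameter&lt;br /&gt;
 int    N = 25, m = 3;&lt;br /&gt;
 double Fchi = (chi2_m - chi2_m1) / (chi2_m / (N - m - 1));&lt;br /&gt;
 // probability that a chance fluctuation alone would give F this large or larger&lt;br /&gt;
 double p = ROOT::Math::fdistribution_cdf_c(Fchi, 1, N - m - 1);&lt;br /&gt;
 // reject the hypothesis of no improvement if p is below the chosen level (e.g. 0.05)&lt;br /&gt;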
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:i.e., &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You can't separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting the matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range, thereby generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in Lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to use the max-likelihood principle on the probability function for finding the correct fit using the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to each fit parameter &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distributions as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
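&lt;br /&gt;
The following minimal C++ sketch (the data, step size, and starting values are all illustrative) implements this one-parameter-at-a-time search for the linear model &amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 const int N = 5;&lt;br /&gt;
 double X[N] = {1, 2, 3, 4, 5};&lt;br /&gt;
 double Y[N] = {2.1, 2.9, 4.2, 4.8, 6.1};&lt;br /&gt;
 double sig  = 0.2;                       // common measurement error&lt;br /&gt;
 &lt;br /&gt;
 double chi2(double A, double B) {&lt;br /&gt;
    double s = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       double r = (Y[i] - A - B * X[i]) / sig;&lt;br /&gt;
       s += r * r;&lt;br /&gt;
    }&lt;br /&gt;
    return s;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    double a[2] = {0.0, 1.0};             // starting guesses for A and B&lt;br /&gt;
    for (int pass = 0; pass &amp;lt; 20; pass++) {     // repeat until chi^2 is stable&lt;br /&gt;
       for (int k = 0; k &amp;lt; 2; k++) {     // minimize in one parameter at a time&lt;br /&gt;
          double bestT = a[k], bestC = chi2(a[0], a[1]);&lt;br /&gt;
          for (double t = a[k] - 1.0; t &amp;lt;= a[k] + 1.0; t += 0.001) {&lt;br /&gt;
             double save = a[k];&lt;br /&gt;
             a[k] = t;&lt;br /&gt;
             double c = chi2(a[0], a[1]);&lt;br /&gt;
             if (c &amp;lt; bestC) { bestC = c; bestT = t; }&lt;br /&gt;
             a[k] = save;&lt;br /&gt;
          }&lt;br /&gt;
          a[k] = bestT;                   // fix this parameter and move to the next&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    printf("A = %f  B = %f  chi2 = %f\n", a[0], a[1], chi2(a[0], a[1]));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;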
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the measurement errors &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; set the scale of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, so an error in a fit parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; shows up as a change in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by finding how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
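&lt;br /&gt;
A sketch of this rule in C++, reusing the chi2() function and the fitted values from the grid search sketch above (in this simple version the other parameter is held fixed while one is scanned):&lt;br /&gt;
&lt;br /&gt;
 // one-standard-deviation error on the slope B from the Delta-chi^2 = 1 rule&lt;br /&gt;
 double errB(double A, double B) {&lt;br /&gt;
    double c0 = chi2(A, B);                     // chi^2 at the minimum&lt;br /&gt;
    double b  = B;&lt;br /&gt;
    while (chi2(A, b) &amp;lt; c0 + 1.0) b += 1e-4;  // step until chi^2 rises by 1&lt;br /&gt;
    return b - B;&lt;br /&gt;
 }&lt;br /&gt;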
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by searching in a direction toward the minimum as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
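&lt;br /&gt;
A minimal sketch of such a search in C++, using a finite-difference gradient and again reusing chi2() from the grid search sketch above (the step sizes are illustrative and would be tuned per problem):&lt;br /&gt;
&lt;br /&gt;
 void gradientSearch(double a[2]) {&lt;br /&gt;
    double rate = 1e-4;                   // overall step-size factor, adjustable&lt;br /&gt;
    for (int iter = 0; iter &amp;lt; 10000; iter++) {&lt;br /&gt;
       double h = 1e-6, g[2];&lt;br /&gt;
       for (int k = 0; k &amp;lt; 2; k++) {    // numerical gradient of chi^2&lt;br /&gt;
          double save = a[k];&lt;br /&gt;
          a[k] = save + h; double cp = chi2(a[0], a[1]);&lt;br /&gt;
          a[k] = save - h; double cm = chi2(a[0], a[1]);&lt;br /&gt;
          a[k] = save;&lt;br /&gt;
          g[k] = (cp - cm) / (2 * h);&lt;br /&gt;
       }&lt;br /&gt;
       for (int k = 0; k &amp;lt; 2; k++) a[k] -= rate * g[k];  // move all parameters downhill&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;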
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91705</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91705"/>
		<updated>2014-03-21T19:16:57Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability  that the student observed is wearing trousers = (60+20)/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our object is to find the best straight line fit for an expected linear relationship between dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_0(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; values &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;, one for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \Pi \frac{1}{\sigma_i \sqrt{2 \pi}}\right ) e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The minimum for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum with respect to both parameters A &amp;amp; B; i.e.,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations with 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_2&amp;lt;/math&amp;gt; are known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Applying the method of determinants to the maximum likelihood problem above gives&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
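&lt;br /&gt;
These determinant formulas translate directly into code.  A minimal C++ sketch (the data values are illustrative):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    const int N = 5;&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5};&lt;br /&gt;
    double y[N] = {2.1, 2.9, 4.2, 4.8, 6.1};&lt;br /&gt;
    double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {       // accumulate the sums in the determinants&lt;br /&gt;
       Sx  += x[i];        Sy  += y[i];&lt;br /&gt;
       Sxx += x[i] * x[i]; Sxy += x[i] * y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double det = N * Sxx - Sx * Sx;      // the common denominator determinant&lt;br /&gt;
    double A = (Sy * Sxx - Sx * Sxy) / det;   // y-intercept&lt;br /&gt;
    double B = (N * Sxy - Sx * Sy) / det;     // slope&lt;br /&gt;
    printf("A = %f  B = %f\n", A, B);&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;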
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solutions for the parameters A &amp;amp; B&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left [ \left ( \sum x_i^2\right )^2  + x_j^2 \left (\sum x_i \right )^2 - 2  x_j \sum x_i^2 \sum x_i \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N \left ( \sum x_i^2\right )^2 + \left (\sum x_i \right )^2 \sum_{j=1}^N x_j^2  -  2 \sum x_i^2 \sum x_i \sum_{j=1}^N x_j  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt; : both sums run over the same &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2\left [  N  \sum x_i^2 + \left (\sum x_i \right )^2 -  2 \left (\sum x_i \right )^2   \right ] = \sigma^2 D^2\sum x_i^2\left [  N  \sum x_i^2 - \left (\sum x_i \right )^2   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so the line is centered at x=0 then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  N^2 x_j^2 - 2N x_j \sum x_i + \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [  N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [  N \sum x_i^2 - 2 \left (\sum x_i \right )^2  +  \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [  N \sum x_i^2 - \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
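&lt;br /&gt;
Continuing the C++ sketch from the determinant solution above, the errors on A and B in the equal-&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; case follow directly (these lines can be appended inside main() of that sketch):&lt;br /&gt;
&lt;br /&gt;
 // estimate sigma^2 from the fit residuals with N-2 degrees of freedom&lt;br /&gt;
 double s2 = 0;&lt;br /&gt;
 for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
    double r = y[i] - A - B * x[i];&lt;br /&gt;
    s2 += r * r;&lt;br /&gt;
 }&lt;br /&gt;
 s2 /= (N - 2);&lt;br /&gt;
 double sigA2 = s2 * Sxx / det;   // sigma_A^2 = s^2 Sum(x^2) / (N Sum(x^2) - (Sum x)^2)&lt;br /&gt;
 double sigB2 = s2 * N   / det;   // sigma_B^2 = N s^2 / (N Sum(x^2) - (Sum x)^2)&lt;br /&gt;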
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left [ \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2} - 2 \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 + \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left [ \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left (  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left [  x_j^2 \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2  x_j \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i}{\sigma_i^2} + \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [  \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 \sum\frac{ x_i^2}{\sigma_i^2} - 2 \sum\frac{ 1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 + \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2 \sum\frac{ 1}{\sigma_i^2} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2} \left [  \sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \left (\sum \frac{x_i}{\sigma_i^2}\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum_{j=1}^N \frac{ 1}{\sigma_j^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
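&lt;br /&gt;
A minimal C++ sketch of the weighted fit and its parameter errors (the data and uncertainties are illustrative):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    const int N = 4;&lt;br /&gt;
    double x[N]   = {1, 2, 3, 4};&lt;br /&gt;
    double y[N]   = {1.9, 3.1, 3.9, 5.2};&lt;br /&gt;
    double sig[N] = {0.1, 0.2, 0.1, 0.3};   // individual measurement errors&lt;br /&gt;
    double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       double w = 1.0 / (sig[i] * sig[i]); // weight 1/sigma_i^2&lt;br /&gt;
       S   += w;        Sx  += w * x[i];   Sy += w * y[i];&lt;br /&gt;
       Sxx += w * x[i] * x[i];  Sxy += w * x[i] * y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double det = S * Sxx - Sx * Sx;        // weighted denominator determinant&lt;br /&gt;
    double A = (Sy * Sxx - Sx * Sxy) / det;&lt;br /&gt;
    double B = (S * Sxy - Sx * Sy) / det;&lt;br /&gt;
    printf("A = %f +- %f\n", A, sqrt(Sxx / det));  // sigma_A^2 = Sxx / det&lt;br /&gt;
    printf("B = %f +- %f\n", B, sqrt(S / det));    // sigma_B^2 = S / det&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;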
&lt;br /&gt;
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the Linear Fit has been performed, the next step will be to determine a probability that the Fit is actually describing the data.&lt;br /&gt;
&lt;br /&gt;
The correlation coefficient (R) is one quantity used to try to determine this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the dependent and independent variables, x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;B B^{\prime}&amp;lt;/math&amp;gt; which has the natural range between 0 and 1, such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R) by itself CAN'T be used to indicate the degree of correlation.  The probability distribution for &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D Gaussian, but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate the &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead, one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller the probability of obtaining the observed &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; from uncorrelated data, the more likely it is that the data are correlated and that the linear fit is justified.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
: = the probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 ROOT::Math::tgamma(double x)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
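&lt;br /&gt;
The following is a short, self-contained C++ sketch (with hypothetical data) that computes the sample &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; and evaluates &amp;lt;math&amp;gt;P_R(R,\nu)&amp;lt;/math&amp;gt;; std::lgamma is used in place of ROOT::Math::tgamma so the example compiles without ROOT.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // P_R(R, nu): probability density that a random sample of nu + 2&lt;br /&gt;
 // UNCORRELATED data points would yield the correlation coefficient R.&lt;br /&gt;
 double PR(double R, double nu) {&lt;br /&gt;
    const double pi = std::acos(-1.0);&lt;br /&gt;
    double lg = std::lgamma(0.5 * (nu + 1.0)) - std::lgamma(0.5 * nu);&lt;br /&gt;
    return std::exp(lg) / std::sqrt(pi) * std::pow(1.0 - R * R, 0.5 * nu - 1.0);&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    const int N = 5;&lt;br /&gt;
    double x[N] = {1, 2, 3, 4, 5};              // hypothetical data&lt;br /&gt;
    double y[N] = {1.1, 2.9, 5.2, 7.1, 8.8};&lt;br /&gt;
    double Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       Sx += x[i]; Sy += y[i];&lt;br /&gt;
       Sxx += x[i] * x[i]; Syy += y[i] * y[i]; Sxy += x[i] * y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double R = (N * Sxy - Sx * Sy)&lt;br /&gt;
             / std::sqrt((N * Sxx - Sx * Sx) * (N * Syy - Sy * Sy));&lt;br /&gt;
    std::printf(&amp;quot;R = %f   P_R(R, N-2) = %f\n&amp;quot;, R, PR(R, N - 2.0));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;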
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the numerator of the exponential is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= -2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )  f_k(x_i) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants, as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section enter linearly into a general fit function, which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
;The objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;, just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
Multiplying both sides from the right by &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist its determinant can not be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Informally, treating multiplication by &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as division by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt;, we may write&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unity matrix to the matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same elementary row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we apply such operations to transform the denominator into the unity matrix, then the numerator becomes the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied on a square matrix, it can be used to calculate the inverse matrix. This can be done by augmenting the square matrix with the identity matrix of the same dimensions, and through the following matrix operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
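&lt;br /&gt;
The procedure above is straightforward to code.  Below is a minimal C++ sketch of Gauss-Jordan inversion (pivoting is omitted for brevity, so a non-zero pivot is assumed at every step), applied to the example matrix from this section; the printout should reproduce the inverse found above.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 const int n = 3;&lt;br /&gt;
 &lt;br /&gt;
 // Invert the n x n matrix a in place by Gauss-Jordan elimination,&lt;br /&gt;
 // applying every row operation to the augmented identity matrix inv.&lt;br /&gt;
 // Assumes no zero pivot is encountered (no pivoting, for brevity).&lt;br /&gt;
 void gaussJordan(double a[n][n], double inv[n][n]) {&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; j++) inv[i][j] = (i == j) ? 1.0 : 0.0;&lt;br /&gt;
    for (int k = 0; k &amp;lt; n; k++) {&lt;br /&gt;
       double p = a[k][k];                     // pivot&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; j++) { a[k][j] /= p; inv[k][j] /= p; }&lt;br /&gt;
       for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
          if (i == k) continue;&lt;br /&gt;
          double f = a[i][k];                  // eliminate column k from row i&lt;br /&gt;
          for (int j = 0; j &amp;lt; n; j++) {&lt;br /&gt;
             a[i][j] -= f * a[k][j]; inv[i][j] -= f * inv[k][j];&lt;br /&gt;
          }&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    double A[n][n] = {{2, -1, 0}, {-1, 2, -1}, {0, -1, 2}};  // matrix from the text&lt;br /&gt;
    double Ainv[n][n];&lt;br /&gt;
    gaussJordan(A, Ainv);&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++)              // prints (1/4){{3,2,1},{2,4,2},{1,2,3}}&lt;br /&gt;
       std::printf(&amp;quot;%6.3f %6.3f %6.3f\n&amp;quot;, Ainv[i][0], Ainv[i][1], Ainv[i][2]);&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;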
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
Where the definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; term cancels one of the two &amp;lt;math&amp;gt; 1/\sigma_i^2&amp;lt;/math&amp;gt; factors.&lt;br /&gt;
;Cancel it and move the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unity matrix = 1 if &amp;lt;math&amp;gt;j=l&amp;lt;/math&amp;gt; and 0 otherwise&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
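&lt;br /&gt;
To tie the regression and error-matrix formulas together, here is a sketch of a weighted quadratic fit &amp;lt;math&amp;gt;(f_j(x) = x^j)&amp;lt;/math&amp;gt;; the data are hypothetical, and the gaussJordan routine (with &amp;lt;math&amp;gt;n = 3&amp;lt;/math&amp;gt;) from the Matrix inversion sketch above is assumed to be in the same file.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 // ... plus 'const int n = 3;' and the gaussJordan routine from above&lt;br /&gt;
 &lt;br /&gt;
 // Weighted quadratic fit y = a0 + a1 x + a2 x^2 by regression analysis.&lt;br /&gt;
 int main() {&lt;br /&gt;
    const int N = 6;&lt;br /&gt;
    double x[N]   = {0, 1, 2, 3, 4, 5};                 // hypothetical data&lt;br /&gt;
    double y[N]   = {1.0, 1.8, 5.1, 10.2, 17.0, 26.1};&lt;br /&gt;
    double sig[N] = {0.3, 0.3, 0.3, 0.4, 0.4, 0.5};&lt;br /&gt;
    double alpha[n][n] = {}, beta[n] = {}, inv[n][n];&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       double w = 1.0 / (sig[i] * sig[i]);&lt;br /&gt;
       double f[n] = {1.0, x[i], x[i] * x[i]};          // f_j(x_i) = x_i^j&lt;br /&gt;
       for (int k = 0; k &amp;lt; n; k++) {&lt;br /&gt;
          beta[k] += w * y[i] * f[k];                   // beta_k&lt;br /&gt;
          for (int j = 0; j &amp;lt; n; j++) alpha[k][j] += w * f[k] * f[j];  // alpha_kj&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    gaussJordan(alpha, inv);                            // inv = the error matrix&lt;br /&gt;
    for (int k = 0; k &amp;lt; n; k++) {&lt;br /&gt;
       double a = 0;&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; j++) a += beta[j] * inv[j][k];  // a_k = sum_j beta_j alpha^-1_jk&lt;br /&gt;
       std::printf(&amp;quot;a%d = %f +/- %f\n&amp;quot;, k, a, std::sqrt(inv[k][k]));&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;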
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (ie; when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit, all the parameters have the potential to change in value.  If their change is minuscule, then you can argue that adding higher orders to the fit does not change the fit.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with a sample mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; from a parent distribution with standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now suppose you make N measurements of two variates &amp;lt;math&amp;gt;(x_i, y_i)&amp;lt;/math&amp;gt;, which may be correlated, and describe them using a function with n parameters.  The joint probability is then a product of N such Gaussians, and the sum in its exponent defines the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; statistic whose distribution is given in the next section.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you obtained from your fit is as large or larger than what you would get from a function described by the parent distribution is given by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the average value for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger is approximately 0.5.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, &lt;br /&gt;
&lt;br /&gt;
the mean of the reduced chi-square is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\nu}{\nu} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value is strong evidence against the hypothesis H.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning which is that the p-value is the chance that null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
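&lt;br /&gt;
For example, a small ROOT macro (the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; values are hypothetical) showing that the two calls agree:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 #include &amp;quot;TMath.h&amp;quot;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void pvalue() {&lt;br /&gt;
    double chi2 = 15.3;   // hypothetical fit result&lt;br /&gt;
    double nu   = 10;     // degrees of freedom&lt;br /&gt;
    std::printf(&amp;quot;p-value = %f = %f\n&amp;quot;,&lt;br /&gt;
                ROOT::Math::chisquared_cdf_c(chi2, nu),&lt;br /&gt;
                TMath::Prob(chi2, 10));&lt;br /&gt;
 }&lt;br /&gt;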
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic as extreme as the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then the function of interest is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = a function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; thus the function only compares the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
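&lt;br /&gt;
The corresponding cumulative (tail) probability is also available in MathCore; a usage sketch with hypothetical &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void ftest() {&lt;br /&gt;
    double chi2_1 = 18.2, nu1 = 12;   // hypothetical fit 1&lt;br /&gt;
    double chi2_2 = 11.6, nu2 = 10;   // hypothetical fit 2&lt;br /&gt;
    double F = (chi2_1 / nu1) / (chi2_2 / nu2);&lt;br /&gt;
    // probability of an F at least this large if the two fits are equivalent&lt;br /&gt;
    std::printf(&amp;quot;F = %f   P = %f\n&amp;quot;, F,&lt;br /&gt;
                ROOT::Math::fdistribution_cdf_c(F, nu1, nu2));&lt;br /&gt;
 }&lt;br /&gt;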
&lt;br /&gt;
== The chi^2 difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from fits using two different fit functions, one with &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; parameters and the other with &amp;lt;math&amp;gt;n &amp;gt; m&amp;lt;/math&amp;gt; parameters.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution with &amp;lt;math&amp;gt;(\nu_1 - \nu_2,\ \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F  distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m}} &amp;lt;/math&amp;gt;&lt;br /&gt;
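&lt;br /&gt;
A sketch of the difference test for one added parameter (hypothetical numbers; &amp;lt;math&amp;gt;\nu_1 = 1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu_2 = N - m - 1&amp;lt;/math&amp;gt; as above):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 void chi2diff() {&lt;br /&gt;
    double chi2_m  = 25.4;   // hypothetical fit with m parameters&lt;br /&gt;
    double chi2_m1 = 18.9;   // hypothetical fit with m+1 parameters&lt;br /&gt;
    double nu      = 20;     // N - m - 1 for the m-parameter fit&lt;br /&gt;
    double Fchi = (chi2_m - chi2_m1) / (chi2_m / nu);&lt;br /&gt;
    // a small tail probability favors keeping the extra parameter&lt;br /&gt;
    std::printf(&amp;quot;F_chi = %f   P = %f\n&amp;quot;, Fchi,&lt;br /&gt;
                ROOT::Math::fdistribution_cdf_c(Fchi, 1, nu));&lt;br /&gt;
 }&lt;br /&gt;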
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:ie; &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You can no longer separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; terms from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; functions and thereby solve the system by inverting the matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to step all parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in Lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to use the max-likelihood principle on the probability function for finding the correct fit using the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distributions as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
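&lt;br /&gt;
A self-contained C++ sketch of the grid search for the two-parameter line fit (hypothetical data); each pass scans one parameter at a time and then refines the grid spacing:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 const int N = 5;&lt;br /&gt;
 double xd[N] = {1, 2, 3, 4, 5};            // hypothetical data&lt;br /&gt;
 double yd[N] = {1.1, 2.9, 5.2, 7.1, 8.8};&lt;br /&gt;
 double sd[N] = {0.2, 0.2, 0.3, 0.3, 0.4};&lt;br /&gt;
 &lt;br /&gt;
 double chi2(double a1, double a2) {        // model y(x) = a1 + a2 x&lt;br /&gt;
    double c = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
       double r = (yd[i] - a1 - a2 * xd[i]) / sd[i];&lt;br /&gt;
       c += r * r;&lt;br /&gt;
    }&lt;br /&gt;
    return c;&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    double a1 = 0, a2 = 0, step = 0.1;      // initial values and grid spacing&lt;br /&gt;
    for (int pass = 0; pass &amp;lt; 40; pass++) {&lt;br /&gt;
       // scan a1 with a2 fixed, then a2 with a1 fixed&lt;br /&gt;
       while (chi2(a1 + step, a2) &amp;lt; chi2(a1, a2)) a1 += step;&lt;br /&gt;
       while (chi2(a1 - step, a2) &amp;lt; chi2(a1, a2)) a1 -= step;&lt;br /&gt;
       while (chi2(a1, a2 + step) &amp;lt; chi2(a1, a2)) a2 += step;&lt;br /&gt;
       while (chi2(a1, a2 - step) &amp;lt; chi2(a1, a2)) a2 -= step;&lt;br /&gt;
       step /= 2;                           // refine the grid&lt;br /&gt;
    }&lt;br /&gt;
    std::printf(&amp;quot;a1 = %f   a2 = %f   chi2 = %f\n&amp;quot;, a1, a2, chi2(a1, a2));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;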
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the measurement uncertainties &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; enter &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; directly, so the shape of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; near its minimum encodes the uncertainty in each fit parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;  dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by finding how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
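&lt;br /&gt;
A sketch of this scan for the slope parameter, reusing the chi2 function from the grid-search sketch above (a1min and a2min denote the values found at the minimum; holding a1 fixed during the scan is a simplification):&lt;br /&gt;
&lt;br /&gt;
 // Step a2 away from its best-fit value until chi^2 rises by 1;&lt;br /&gt;
 // the distance traveled estimates the standard deviation of a2.&lt;br /&gt;
 double errA2(double a1min, double a2min) {&lt;br /&gt;
    double chi2min = chi2(a1min, a2min);&lt;br /&gt;
    double a2 = a2min, da = 1e-4;&lt;br /&gt;
    while (chi2(a1min, a2) - chi2min &amp;lt; 1.0) a2 += da;&lt;br /&gt;
    return a2 - a2min;&lt;br /&gt;
 }&lt;br /&gt;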
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The gradient search method improves on the grid search method by searching in the direction of the minimum as determined by simultaneous changes in all parameters.  The magnitude of the step may differ from parameter to parameter and is adjustable for each.&lt;br /&gt;
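&lt;br /&gt;
A sketch of a single gradient-search step, again reusing the chi2 function above; rate is the adjustable step-size parameter:&lt;br /&gt;
&lt;br /&gt;
 // Move all parameters downhill at once, along the numerically&lt;br /&gt;
 // estimated gradient of chi^2.&lt;br /&gt;
 void gradientStep(double &amp;amp;a1, double &amp;amp;a2, double rate) {&lt;br /&gt;
    double h  = 1e-6;&lt;br /&gt;
    double g1 = (chi2(a1 + h, a2) - chi2(a1 - h, a2)) / (2 * h);&lt;br /&gt;
    double g2 = (chi2(a1, a2 + h) - chi2(a1, a2 - h)) / (2 * h);&lt;br /&gt;
    a1 -= rate * g1;   // the effective step can be tuned per parameter&lt;br /&gt;
    a2 -= rate * g2;&lt;br /&gt;
 }&lt;br /&gt;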
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91704</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91704"/>
		<updated>2014-03-21T19:16:22Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probabability of event ''A'' given ''B'' is related to the converse conditional probabablity of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability  that the student observed is wearing trousers = 60+20/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values are the most likely to occur;ie: the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our object is to find the best straight line fit for an expected linear relationship between dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_o(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \frac{1}{\sigma \sqrt{2 \pi}}\right )^N e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The min for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum for both parameters A &amp;amp; B : ie&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of simultaneous of 2 equations and 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt; is known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
test&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{12} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{12} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{12} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Apply the method of determinant for the maximum likelihood problem above&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming all measurements have the same uncertainty &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solutions for the parameters A &amp;amp; B&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left [ \left ( \sum x_i^2\right )^2  - 2 x_j \sum x_i \sum x_i^2 + x_j^2 \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N \left ( \sum x_i^2\right )^2  - 2 \sum x_i^2 \sum x_i \sum_{j=1}^N x_j + \left ( \sum x_i \right )^2 \sum_{j=1}^N x_j^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt; : both sums run over the same &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations, so the substitution is exact&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2\left [  N \sum x_i^2 -  2 \left ( \sum x_i \right )^2  +  \left ( \sum x_i \right )^2   \right ] = \sigma^2 D^2\sum x_i^2\left [  N \sum x_i^2 - \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so that the data are centered at &amp;lt;math&amp;gt;x=0&amp;lt;/math&amp;gt;, then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  N^2 x_j^2 - 2N x_j \sum x_i + \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [  N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [  N^2 \sum x_i^2 - 2N \left ( \sum x_i \right )^2  + N \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt; : using &amp;lt;math&amp;gt;\sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [  N \sum x_i^2 - \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
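These two error formulas extend the earlier determinant sketch directly; A, B, det, and the data arrays are those computed there.&lt;br /&gt;
&lt;br /&gt;
    // residual variance s^2 with N-2 degrees of freedom&lt;br /&gt;
    double s2 = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
       double r = y[i] - A - B*x[i];&lt;br /&gt;
       s2 += r*r;&lt;br /&gt;
    }&lt;br /&gt;
    s2 /= (N - 2);&lt;br /&gt;
 &lt;br /&gt;
    double sigA2 = s2*Sxx/det;  // error in the intercept A&lt;br /&gt;
    double sigB2 = s2*N/det;    // error in the slope B&lt;br /&gt;
&lt;br /&gt;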
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left [ \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} - 2 \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 + \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt; : using &amp;lt;math&amp;gt;\sum_{j=1}^N \frac{x_j}{\sigma_j^2} = \sum\frac{ x_i}{\sigma_i^2}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N \frac{x_j^2}{\sigma_j^2} = \sum\frac{ x_i^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left [ \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left (  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left [  x_j^2 \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 x_j \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i}{\sigma_i^2} + \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [  \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 \sum\frac{ x_i^2}{\sigma_i^2} - 2 \sum\frac{ 1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 + \sum\frac{ 1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2} \left [  \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} - \left (\sum\frac{x_i}{\sigma_i^2}\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum\frac{ 1}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
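A minimal C++ sketch collecting the weighted formulas; the array names, the per-point uncertainties sig[], and the function name weightedFit are hypothetical.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // weighted linear fit y = A + B x with per-point uncertainties sig[i]&lt;br /&gt;
 void weightedFit(int N, const double* x, const double* y, const double* sig) {&lt;br /&gt;
    double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
       double w = 1.0/(sig[i]*sig[i]);   // weight 1/sigma_i^2&lt;br /&gt;
       S   += w;&lt;br /&gt;
       Sx  += w*x[i];&lt;br /&gt;
       Sy  += w*y[i];&lt;br /&gt;
       Sxx += w*x[i]*x[i];&lt;br /&gt;
       Sxy += w*x[i]*y[i];&lt;br /&gt;
    }&lt;br /&gt;
    double det = S*Sxx - Sx*Sx;          // weighted denominator determinant&lt;br /&gt;
    double A = (Sy*Sxx - Sx*Sxy)/det;&lt;br /&gt;
    double B = (S*Sxy - Sx*Sy)/det;&lt;br /&gt;
    double sigA2 = Sxx/det;              // weighted error in A derived above&lt;br /&gt;
    double sigB2 = S/det;                // weighted error in B derived above&lt;br /&gt;
    printf(&amp;quot;A = %f +- %f   B = %f +- %f\n&amp;quot;, A, sqrt(sigA2), B, sqrt(sigB2));&lt;br /&gt;
 }&lt;br /&gt;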
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the linear fit has been performed, the next step is to determine the probability that the fit actually describes the data.&lt;br /&gt;
&lt;br /&gt;
The correlation probability (R) is one method used to try to determine this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the independent and dependent variables x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has a natural range between 0 and 1 such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
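A sketch of R from the same running sums used in the fit (array names hypothetical):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // linear correlation coefficient R between the x and y samples&lt;br /&gt;
 double correlation(int N, const double* x, const double* y) {&lt;br /&gt;
    double Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
       Sx  += x[i];&lt;br /&gt;
       Sy  += y[i];&lt;br /&gt;
       Sxx += x[i]*x[i];&lt;br /&gt;
       Syy += y[i]*y[i];&lt;br /&gt;
       Sxy += x[i]*y[i];&lt;br /&gt;
    }&lt;br /&gt;
    return (N*Sxy - Sx*Sy)/sqrt((N*Sxx - Sx*Sx)*(N*Syy - Sy*Sy));&lt;br /&gt;
 }&lt;br /&gt;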
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R) CAN'T be used to indicate the degree of correlation.  The probability distribution &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D gaussian but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate R of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller this probability is, the more likely it is that the data are correlated and that the linear fit is justified.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt; = Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
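A minimal sketch of this probability density using std::tgamma from the C++ standard library; the ROOT::Math::tgamma mentioned above is interchangeable.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // probability density that a sample of UNCORRELATED data with nu = N-2&lt;br /&gt;
 // degrees of freedom would yield the correlation coefficient R&lt;br /&gt;
 double probR(double R, double nu) {&lt;br /&gt;
    return (1.0/sqrt(M_PI))&lt;br /&gt;
           * std::tgamma(0.5*(nu + 1.0))/std::tgamma(0.5*nu)&lt;br /&gt;
           * pow(1.0 - R*R, 0.5*(nu - 2.0));&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;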
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the summation in the argument of the exponential is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants, as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; the objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \tilde{\beta} = \tilde{a} \tilde{\alpha} \right) \tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
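In practice the inversion can be delegated to a linear-algebra library.  Below is a minimal sketch assuming ROOT's TMatrixD and TVectorD classes and a polynomial basis &amp;lt;math&amp;gt;f_j(x) = x^j&amp;lt;/math&amp;gt;; the data arrays and the function name polyFit are hypothetical.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;TMatrixD.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;TVectorD.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // build the beta and alpha matrices defined above and solve for the&lt;br /&gt;
 // parameters; since alpha is symmetric, a = alpha^{-1} beta&lt;br /&gt;
 TVectorD polyFit(int N, const double* x, const double* y,&lt;br /&gt;
                  const double* sig, int npar) {&lt;br /&gt;
    TMatrixD alpha(npar, npar);&lt;br /&gt;
    TVectorD beta(npar);&lt;br /&gt;
    alpha.Zero();&lt;br /&gt;
    beta.Zero();&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
       double w = 1.0/(sig[i]*sig[i]);&lt;br /&gt;
       for (int k = 0; k &amp;lt; npar; ++k) {&lt;br /&gt;
          beta[k] += w*y[i]*pow(x[i], k);                 // beta_k&lt;br /&gt;
          for (int j = 0; j &amp;lt; npar; ++j)&lt;br /&gt;
             alpha[k][j] += w*pow(x[i], k)*pow(x[i], j);  // alpha_kj&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    alpha.Invert();      // alpha now holds the inverse matrix&lt;br /&gt;
    return alpha*beta;   // the parameters a_k&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;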
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist its determinant can not be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we formally divide both sides by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt; then we have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unit matrix to the matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same elementary row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we perform such operations until the denominator becomes the unit matrix, the numerator will have been transformed into the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied on a square matrix, it can be used to calculate the inverse matrix. This can be done by augmenting the square matrix with the identity matrix of the same dimensions, and through the following matrix operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
&lt;br /&gt;
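A minimal C++ sketch of Gauss-Jordan elimination with partial pivoting; the Matrix typedef and the function name invert are hypothetical.  Applied to the 3x3 example above it reproduces the inverse found by hand.&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 #include &amp;lt;algorithm&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 typedef std::vector&amp;lt;std::vector&amp;lt;double&amp;gt; &amp;gt; Matrix;&lt;br /&gt;
 &lt;br /&gt;
 // invert A by Gauss-Jordan elimination on the augmented matrix [A | 1];&lt;br /&gt;
 // returns an empty matrix if A is singular&lt;br /&gt;
 Matrix invert(Matrix A) {&lt;br /&gt;
    int n = A.size();&lt;br /&gt;
    Matrix I(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; ++i) I[i][i] = 1.0;    // augment with the identity&lt;br /&gt;
 &lt;br /&gt;
    for (int col = 0; col &amp;lt; n; ++col) {&lt;br /&gt;
       int piv = col;                               // partial pivoting&lt;br /&gt;
       for (int r = col + 1; r &amp;lt; n; ++r)&lt;br /&gt;
          if (fabs(A[r][col]) &amp;gt; fabs(A[piv][col])) piv = r;&lt;br /&gt;
       if (A[piv][col] == 0.0) return Matrix();     // singular, no inverse&lt;br /&gt;
       std::swap(A[piv], A[col]);&lt;br /&gt;
       std::swap(I[piv], I[col]);&lt;br /&gt;
 &lt;br /&gt;
       double d = A[col][col];                      // scale the pivot row to 1&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; ++j) {&lt;br /&gt;
          A[col][j] /= d;&lt;br /&gt;
          I[col][j] /= d;&lt;br /&gt;
       }&lt;br /&gt;
 &lt;br /&gt;
       for (int r = 0; r &amp;lt; n; ++r) {                // clear the rest of the column&lt;br /&gt;
          if (r == col) continue;&lt;br /&gt;
          double f = A[r][col];&lt;br /&gt;
          for (int j = 0; j &amp;lt; n; ++j) {&lt;br /&gt;
             A[r][j] -= f*A[col][j];&lt;br /&gt;
             I[r][j] -= f*I[col][j];&lt;br /&gt;
          }&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    return I;   // A has been reduced to 1, so I holds the inverse&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;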
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
Where, by the definition of variance,&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; term appears in both the numerator and the denominator, leaving a single factor of &amp;lt;math&amp;gt;\frac{1}{\sigma_i^2}&amp;lt;/math&amp;gt;.&lt;br /&gt;
;Move the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unit matrix = 1 if &amp;lt;math&amp;gt;j = l&amp;lt;/math&amp;gt; and 0 otherwise&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
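A sketch of the model-error formula for a power series, reusing the hypothetical Matrix typedef from the Gauss-Jordan sketch above, with cov holding the inverse (covariance) matrix elements:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // variance of the model prediction Y(x) = sum_i a_i x^i at the point x&lt;br /&gt;
 double modelVariance(double x, int n, const Matrix&amp;amp; cov) {&lt;br /&gt;
    double s2 = 0.0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; ++i)&lt;br /&gt;
       for (int j = 0; j &amp;lt; n; ++j)&lt;br /&gt;
          s2 += pow(x, i + j)*cov[i][j];   // f_i(x) f_j(x) sigma_ij&lt;br /&gt;
    return s2;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;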
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (ie; when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters when doing so no longer changes them much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not change the fit.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is a good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with sample mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; from a parent distribution of width &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Suppose you make N measurements of two variates (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;) which may be related by a function with n parameters; the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; statistic of the next section quantifies how well that function describes the data.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you received from your fit is as large or larger than what you would get from a function described by the parent distribution is given by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the average value for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger is approximately 0.5 (the distribution is skewed, so its mean and median do not coincide exactly).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of the distribution &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, &lt;br /&gt;
&lt;br /&gt;
then the mean of &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{\nu}&amp;lt;/math&amp;gt; is 1.&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the upper tail of the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value indicates data that are inconsistent with H; this is &amp;quot;good&amp;quot; if H is a null hypothesis you hope to reject, but it signals a poor fit when H is the fit function itself.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning which is that the p-value is the chance that null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
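For example, a hypothetical fit returning &amp;lt;math&amp;gt;\chi^2 = 25.0&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;\nu = 20&amp;lt;/math&amp;gt; degrees of freedom could be evaluated as follows (the numbers are illustrative only):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;TMath.h&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 int main() {&lt;br /&gt;
    double chi2 = 25.0;   // hypothetical chi^2 from a fit&lt;br /&gt;
    int    nu   = 20;     // hypothetical degrees of freedom&lt;br /&gt;
    printf(&amp;quot;p-value = %f\n&amp;quot;, TMath::Prob(chi2, nu));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;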
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic no more extreme than the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the Ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data then the function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant, where here &amp;lt;math&amp;gt;s^2 \equiv \sum_i^N \left (y_i-y(x_i)\right)^2&amp;lt;/math&amp;gt; denotes the residual sum of squares&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = a function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; the function only compares the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
&lt;br /&gt;
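To convert an observed F into a probability one integrates this distribution.  A minimal sketch, assuming MathCore's companion upper-tail cumulative ROOT::Math::fdistribution_cdf_c:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;Math/ProbFuncMathCore.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 // probability of obtaining a ratio F at least this large by chance,&lt;br /&gt;
 // for nu1 and nu2 degrees of freedom&lt;br /&gt;
 double fTestProb(double F, double nu1, double nu2) {&lt;br /&gt;
    return ROOT::Math::fdistribution_cdf_c(F, nu1, nu2);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;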
== The Chi-Square difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of 2 different fits: one using &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; parameters and one using &amp;lt;math&amp;gt;n &amp;gt; m&amp;lt;/math&amp;gt; parameters.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution with &amp;lt;math&amp;gt;(\nu_1 - \nu_2 , \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m-2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:ie; &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You can't separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting your matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range to generate a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; and look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in Lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to use the max-likelihood principle on the probability function for finding the correct fit using the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distribution as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
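&lt;br /&gt;
A minimal sketch of this one-parameter-at-a-time search (Python/numpy, made-up data, equal errors assumed) might look like the following.&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 def chi2(a1, a2):&lt;br /&gt;
     return ((y - a1 - a2 * x) ** 2).sum()   # sigma_i = 1&lt;br /&gt;
 a1, a2 = 0.0, 0.0                 # initial values&lt;br /&gt;
 for sweep in range(50):           # repeat until chi^2 is stable&lt;br /&gt;
     # minimize over a1 with a2 fixed, then over a2 with a1 fixed&lt;br /&gt;
     g1 = np.linspace(a1 - 1.0, a1 + 1.0, 401)&lt;br /&gt;
     a1 = g1[np.argmin([chi2(v, a2) for v in g1])]&lt;br /&gt;
     g2 = np.linspace(a2 - 0.05, a2 + 0.05, 401)&lt;br /&gt;
     a2 = g2[np.argmin([chi2(a1, v) for v in g2])]&lt;br /&gt;
 print(a1, a2, chi2(a1, a2))&lt;br /&gt;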
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the uncertainties &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; enter &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; directly, so they set the scale for how quickly &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; changes as a parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; is moved away from its best value.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;  dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by finding how much the parameter must increase (or decrease) from its optimum value in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1, as in the sketch below.&lt;br /&gt;
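&lt;br /&gt;
A minimal sketch of this &amp;lt;math&amp;gt;\Delta\chi^2 = 1&amp;lt;/math&amp;gt; scan (Python/numpy, made-up data, equal errors, and the other parameter held fixed at its optimum for simplicity):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 N = len(x)&lt;br /&gt;
 def chi2(a1, a2):&lt;br /&gt;
     return ((y - a1 - a2 * x) ** 2).sum()   # sigma_i = 1&lt;br /&gt;
 # best-fit values from the linear-regression solution&lt;br /&gt;
 B = (N * (x * y).sum() - x.sum() * y.sum()) / (N * (x * x).sum() - x.sum() ** 2)&lt;br /&gt;
 A = (y.sum() - B * x.sum()) / N&lt;br /&gt;
 # scan the slope upward from its optimum until chi^2 has risen by 1;&lt;br /&gt;
 # that distance estimates sigma_B&lt;br /&gt;
 grid = B + np.linspace(0.0, 0.05, 2001)&lt;br /&gt;
 delta = np.array([chi2(A, b) for b in grid]) - chi2(A, B)&lt;br /&gt;
 sigma_B = np.interp(1.0, delta, grid - B)&lt;br /&gt;
 print(B, sigma_B)&lt;br /&gt;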
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by stepping in the direction of steepest descent toward the minimum, as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
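&lt;br /&gt;
A minimal gradient-descent sketch (Python/numpy, made-up data; the gradient of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is taken numerically and each parameter gets its own step size):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 def chi2(p):&lt;br /&gt;
     return ((y - p[0] - p[1] * x) ** 2).sum()   # sigma_i = 1&lt;br /&gt;
 def grad(p, h=1e-6):&lt;br /&gt;
     # numerical gradient of chi^2 by central finite differences&lt;br /&gt;
     g = np.zeros(2)&lt;br /&gt;
     for k in range(2):&lt;br /&gt;
         dp = np.zeros(2)&lt;br /&gt;
         dp[k] = h&lt;br /&gt;
         g[k] = (chi2(p + dp) - chi2(p - dp)) / (2 * h)&lt;br /&gt;
     return g&lt;br /&gt;
 p = np.array([0.0, 0.0])&lt;br /&gt;
 steps = np.array([1e-2, 1e-5])   # separate step size for each parameter&lt;br /&gt;
 for it in range(5000):&lt;br /&gt;
     p = p - steps * grad(p)&lt;br /&gt;
 print(p, chi2(p))&lt;br /&gt;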
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91703</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91703"/>
		<updated>2014-03-21T19:13:30Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are two dominant philosophies, known as the Frequentist and the Bayesian perspectives.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== Frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional and marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability  that the student observed is wearing trousers = (60+20)/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
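&lt;br /&gt;
The same number can be checked with a quick frequency count (Python; a hypothetical school of 1000 students):&lt;br /&gt;
&lt;br /&gt;
 boys, girls = 600, 400&lt;br /&gt;
 trousered = boys + girls / 2            # all boys plus half of the girls&lt;br /&gt;
 print((girls / 2) / trousered)          # 0.25, matching Bayes' theorem&lt;br /&gt;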
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observations is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our objective is to find the best straight line fit for an expected linear relationship between the dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and the independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_0(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}}\right ) e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The minimum of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum with respect to both parameters A &amp;amp; B : i.e.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations in 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
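&lt;br /&gt;
This 2x2 system can be solved directly; a minimal sketch (Python/numpy, made-up data):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 N = len(x)&lt;br /&gt;
 # normal equations: (sum y, sum x y) = M (A, B)&lt;br /&gt;
 M = np.array([[N, x.sum()], [x.sum(), (x * x).sum()]])&lt;br /&gt;
 rhs = np.array([y.sum(), (x * y).sum()])&lt;br /&gt;
 A, B = np.linalg.solve(M, rhs)&lt;br /&gt;
 print(A, B)&lt;br /&gt;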
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt; is known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Applying the method of determinants to the maximum likelihood problem above gives&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
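&lt;br /&gt;
A minimal sketch of this weighted determinant solution (Python/numpy; the data and per-point errors are made up):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 sig = np.array([0.1, 0.2, 0.1, 0.3, 0.2])   # hypothetical sigma_i&lt;br /&gt;
 w = 1.0 / sig ** 2&lt;br /&gt;
 # common denominator determinant&lt;br /&gt;
 D = w.sum() * (w * x * x).sum() - (w * x).sum() ** 2&lt;br /&gt;
 A = ((w * y).sum() * (w * x * x).sum() - (w * x).sum() * (w * x * y).sum()) / D&lt;br /&gt;
 B = (w.sum() * (w * x * y).sum() - (w * x).sum() * (w * y).sum()) / D&lt;br /&gt;
 print(A, B)&lt;br /&gt;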
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least square fit ( assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solution for the parameters A &amp;amp; B as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left [ \left ( \sum x_i^2\right )^2  + x_j^2 \left (\sum x_i \right )^2 - 2 x_j \sum x_i^2  \sum x_i \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N  \left ( \sum x_i^2\right )^2 - 2 \sum x_i^2 \left ( \sum x_i \right)^2   +  \sum x_i^2 \left ( \sum x_i \right)^2   \right ]&amp;lt;/math&amp;gt; : carrying out the sum over &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; using &amp;lt;math&amp;gt;\sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2\left [  N   \sum x_i^2 -  \left ( \sum x_i \right)^2   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so the data are centered at &amp;lt;math&amp;gt;x=0&amp;lt;/math&amp;gt;, then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes intuitive sense that the error in the y-intercept would be dominated by the statistical error in Y.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  N^2 x_j^2 - 2N x_j \sum x_i + \left (\sum x_i\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [  N^2 \sum_{j=1}^N x_j^2 - 2N \sum x_i \sum_{j=1}^N x_j  + N \left (\sum x_i\right)^2  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [  N^2 \sum x_i^2 - 2N \left (\sum x_i\right)^2  + N \left (\sum x_i\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [  N \sum x_i^2 - \left (\sum x_i\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
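&lt;br /&gt;
A minimal numerical check of these error formulas (Python/numpy, made-up data, equal errors estimated from the residuals):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 N = len(x)&lt;br /&gt;
 D = N * (x * x).sum() - x.sum() ** 2&lt;br /&gt;
 A = (y.sum() * (x * x).sum() - x.sum() * (x * y).sum()) / D&lt;br /&gt;
 B = (N * (x * y).sum() - x.sum() * y.sum()) / D&lt;br /&gt;
 s2 = ((y - A - B * x) ** 2).sum() / (N - 2)   # variance estimate, N-2 dof&lt;br /&gt;
 sigma_A = np.sqrt(s2 * (x * x).sum() / D)&lt;br /&gt;
 sigma_B = np.sqrt(N * s2 / D)&lt;br /&gt;
 print(A, sigma_A, B, sigma_B)&lt;br /&gt;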
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \frac{ D}{\sigma_j^2} \left [ \sum\frac{ x_i^2}{\sigma_i^2}- x_j \sum\frac{ x_i}{\sigma_i^2} \right ]  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2}  \sum\frac{ x_i^2}{\sigma_i^2} - 2 \left ( \sum\frac{ x_i}{\sigma_i^2} \right)^2 + \left (\sum\frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt; : carrying out the sums over &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2}  \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \frac{ D}{\sigma_j^2} \left [ x_j \sum\frac{ 1}{\sigma_i^2} - \sum\frac{ x_i}{\sigma_i^2} \right ]  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2  \frac{ D^2}{\sigma_j^4} \left ( x_j \sum\frac{ 1}{\sigma_i^2} - \sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left [  x_j^2 \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 x_j \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i}{\sigma_i^2} + \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2  \left [  \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \sum\frac{ 1}{\sigma_i^2} + \sum\frac{ 1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt; : carrying out the sum over &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2}  \left [  \sum\frac{ x_i^2}{\sigma_i^2} \sum\frac{ 1}{\sigma_i^2} - \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum_{j=1}^N \frac{ 1}{\sigma_j^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
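&lt;br /&gt;
A minimal numerical check of the weighted error formulas (Python/numpy; the data points and per-point errors are made up):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 sig = np.array([0.1, 0.2, 0.1, 0.3, 0.2])   # hypothetical sigma_i&lt;br /&gt;
 w = 1.0 / sig ** 2&lt;br /&gt;
 D = w.sum() * (w * x * x).sum() - (w * x).sum() ** 2&lt;br /&gt;
 sigma_A = np.sqrt((w * x * x).sum() / D)&lt;br /&gt;
 sigma_B = np.sqrt(w.sum() / D)&lt;br /&gt;
 print(sigma_A, sigma_B)&lt;br /&gt;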
&lt;br /&gt;
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the Linear Fit has been performed, the next step will be to determine a probability that the Fit is actually describing the data.&lt;br /&gt;
&lt;br /&gt;
The Correlation Probability (R) is one method used to try to determine this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the dependent and independent variables, x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has the natural range between 0 and 1 such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
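&lt;br /&gt;
A minimal sketch of this correlation coefficient (Python/numpy, made-up data; it agrees with numpy's built-in np.corrcoef):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 N = len(x)&lt;br /&gt;
 num = N * (x * y).sum() - x.sum() * y.sum()&lt;br /&gt;
 den = np.sqrt((N * (x * x).sum() - x.sum() ** 2) * (N * (y * y).sum() - y.sum() ** 2))&lt;br /&gt;
 print(num / den, np.corrcoef(x, y)[0, 1])&lt;br /&gt;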
&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R) CAN'T be used to indicate the degree of correlation.  The probability distribution &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D gaussian but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate R of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller the probability that uncorrelated data would produce the observed &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;, the more likely it is that the data are correlated and that the linear fit is justified.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
= Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
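&lt;br /&gt;
A minimal sketch of this probability density (Python; math.gamma plays the role of ROOT::Math::tgamma):&lt;br /&gt;
&lt;br /&gt;
 import math&lt;br /&gt;
 def P_R(R, nu):&lt;br /&gt;
     # density for a correlation coefficient R from nu degrees of&lt;br /&gt;
     # freedom of uncorrelated data&lt;br /&gt;
     return (1.0 / math.sqrt(math.pi)&lt;br /&gt;
             * math.gamma((nu + 1) / 2.0) / math.gamma(nu / 2.0)&lt;br /&gt;
             * (1.0 - R * R) ** ((nu - 2) / 2.0))&lt;br /&gt;
 print(P_R(0.5, 3))&lt;br /&gt;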
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the numerator of the exponential is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(k=1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants, as we did to find the parameters for the linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; the objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \tilde{\beta} = \tilde{a} \tilde{\alpha} \right) \tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
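&lt;br /&gt;
A minimal sketch of this procedure for a polynomial fit with &amp;lt;math&amp;gt;f_j(x) = x^j&amp;lt;/math&amp;gt; (Python/numpy, made-up data; the inverse is taken with numpy's built-in routine, and the next section shows how such an inverse can be computed by hand):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 sig = np.ones_like(y)                     # equal errors for simplicity&lt;br /&gt;
 n = 3                                     # quadratic: f_j(x) = x**j&lt;br /&gt;
 F = np.vstack([x ** j for j in range(n)])          # F[j, i] = f_j(x_i)&lt;br /&gt;
 alpha = (F / sig ** 2) @ F.T              # alpha_kj = sum f_k f_j / sigma^2&lt;br /&gt;
 beta = (F * (y / sig ** 2)).sum(axis=1)   # beta_k = sum y f_k / sigma^2&lt;br /&gt;
 a = beta @ np.linalg.inv(alpha)           # a = beta alpha^(-1)&lt;br /&gt;
 print(a)&lt;br /&gt;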
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist its determinant can not be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we divide both sides by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt; then we have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unity matrix to matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we do such operations we can transform the ratio such that the denominator has the unity matrix and then the numerator will have the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied on a square matrix, it can be used to calculate the inverse matrix. This can be done by augmenting the square matrix with the identity matrix of the same dimensions, and through the following matrix operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
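&lt;br /&gt;
A minimal sketch of Gauss-Jordan inversion (Python/numpy; no pivoting, so nonzero pivots are assumed, which holds for the matrix above):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 def gauss_jordan_inverse(A):&lt;br /&gt;
     # augment with the identity, reduce to reduced row echelon form,&lt;br /&gt;
     # and read the inverse off the right-hand block&lt;br /&gt;
     n = len(A)&lt;br /&gt;
     M = np.hstack([np.array(A, dtype=float), np.eye(n)])&lt;br /&gt;
     for k in range(n):&lt;br /&gt;
         M[k] = M[k] / M[k, k]            # scale the pivot row&lt;br /&gt;
         for r in range(n):               # clear the other rows&lt;br /&gt;
             if r != k:&lt;br /&gt;
                 M[r] = M[r] - M[r, k] * M[k]&lt;br /&gt;
     return M[:, n:]&lt;br /&gt;
 A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]&lt;br /&gt;
 print(gauss_jordan_inverse(A))   # (1/4) [[3,2,1],[2,4,2],[1,2,3]]&lt;br /&gt;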
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
Where the definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; factor cancels between the numerator and the denominator.&lt;br /&gt;
;Move the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unity matrix = 1 if &amp;lt;math&amp;gt;j=l&amp;lt;/math&amp;gt; and 0 otherwise&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
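&lt;br /&gt;
A minimal sketch of these error formulas for a straight-line fit (Python/numpy, made-up data; &amp;lt;math&amp;gt;f_0 = 1, f_1 = x&amp;lt;/math&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 x = np.array([10., 20., 30., 40., 50.])&lt;br /&gt;
 y = np.array([-0.6, -0.1, 0.3, 0.7, 1.2])&lt;br /&gt;
 sig = np.ones_like(y)&lt;br /&gt;
 n = 2&lt;br /&gt;
 F = np.vstack([x ** j for j in range(n)])&lt;br /&gt;
 C = np.linalg.inv((F / sig ** 2) @ F.T)   # covariance/error matrix&lt;br /&gt;
 a = ((F * (y / sig ** 2)).sum(axis=1)) @ C&lt;br /&gt;
 param_err = np.sqrt(np.diag(C))           # errors in the parameters&lt;br /&gt;
 x0 = 25.0                                 # error in the prediction at x0&lt;br /&gt;
 fx0 = np.array([x0 ** j for j in range(n)])&lt;br /&gt;
 pred_err = np.sqrt(fx0 @ C @ fx0)&lt;br /&gt;
 print(a, param_err, pred_err)&lt;br /&gt;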
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (ie; when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not change the fit.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; and standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you make &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements of two variates (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;), which may be correlated, and fit them with a function containing &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters, then the resulting value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; follows the distribution described in the next section.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above is the probability density for obtaining the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know the probability of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt;, it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you obtained from your fit is as large or larger than what you would get from a function described by the parent distribution is given by&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the average value &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is approximately 0.5 (the mean lies slightly above the median of the distribution).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of &amp;lt;math&amp;gt;P(x,\nu) = \nu&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
then the mean of &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(\frac{x}{\nu},\nu) = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value means the observed data are improbable under &amp;lt;math&amp;gt;H&amp;lt;/math&amp;gt;, i.e. strong evidence against the hypothesis.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning, which is that the p-value is the chance of obtaining the observed result if the null hypothesis is true: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' still be explicable by the null hypothesis.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double chi2, double nu, double x0 = 0)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
 TMath::Prob(chi2, nu)&lt;br /&gt;
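&lt;br /&gt;
For concreteness, a minimal ROOT macro sketch (the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; values are made up for illustration) showing that the two calls above return the same p-value:&lt;br /&gt;
&lt;br /&gt;
 // Sketch: chi-square p-value in ROOT (illustrative, made-up numbers).&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;quot;TMath.h&amp;quot;&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 void pvalue_example() {&lt;br /&gt;
   double chi2 = 12.6;   // hypothetical chi^2 from a fit&lt;br /&gt;
   int    nu   = 10;     // hypothetical degrees of freedom&lt;br /&gt;
   double p1 = TMath::Prob(chi2, nu);&lt;br /&gt;
   double p2 = ROOT::Math::chisquared_cdf_c(chi2, nu);&lt;br /&gt;
   printf(&amp;quot;p-value = %f (Prob), %f (cdf_c)\n&amp;quot;, p1, p2);  // identical&lt;br /&gt;
 }&lt;br /&gt;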
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic no more extreme than the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you form the statistic&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = a function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; the ratio only compares the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
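&lt;br /&gt;
In practice one judges an F ratio with the upper-tail cumulative probability rather than the density; a minimal sketch, with made-up numbers, assuming ROOT::Math::fdistribution_cdf_c from the same header family:&lt;br /&gt;
&lt;br /&gt;
 // Sketch: probability of an F ratio at least this large arising by chance&lt;br /&gt;
 // (illustrative, made-up numbers).&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 void ftest_example() {&lt;br /&gt;
   double F   = 1.8;               // hypothetical (chi2_1/nu_1)/(chi2_2/nu_2)&lt;br /&gt;
   double nu1 = 12.0, nu2 = 10.0;  // hypothetical degrees of freedom&lt;br /&gt;
   double prob = ROOT::Math::fdistribution_cdf_c(F, nu1, nu2);&lt;br /&gt;
   printf(&amp;quot;P(F &amp;gt;= %.2f) = %f\n&amp;quot;, F, prob);&lt;br /&gt;
 }&lt;br /&gt;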
&lt;br /&gt;
== The Chi-square difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of 2 different fits.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution with &amp;lt;math&amp;gt;(\nu_1 - \nu_2, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F  distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m-2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
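&lt;br /&gt;
A minimal numerical sketch of the difference test (all numbers made up): compute &amp;lt;math&amp;gt;F_{\chi}&amp;lt;/math&amp;gt; for one added parameter and its chance probability from the F distribution with &amp;lt;math&amp;gt;(1, N-m-1)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: chi-square difference test for one added parameter&lt;br /&gt;
 // (illustrative, made-up numbers).&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;&lt;br /&gt;
 void chi2_difference_example() {&lt;br /&gt;
   int    N       = 20;    // hypothetical number of data points&lt;br /&gt;
   int    m       = 2;     // polynomial order of the simpler fit&lt;br /&gt;
   double chi2_m  = 28.4;  // hypothetical chi^2 of the order-m fit&lt;br /&gt;
   double chi2_m1 = 18.1;  // hypothetical chi^2 of the order-(m+1) fit&lt;br /&gt;
   double Fchi = (chi2_m - chi2_m1) / (chi2_m / (N - m - 1));&lt;br /&gt;
   double prob = ROOT::Math::fdistribution_cdf_c(Fchi, 1.0, N - m - 1);&lt;br /&gt;
   printf(&amp;quot;F_chi = %.2f, chance probability = %f\n&amp;quot;, Fchi, prob);&lt;br /&gt;
 }&lt;br /&gt;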
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:i.e., &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You can't separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting the matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
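&lt;br /&gt;
A minimal C++ sketch of how such a hypersurface scan might look (the data arrays and scan ranges are placeholders, not the lab values):&lt;br /&gt;
&lt;br /&gt;
 // Sketch: brute-force scan of chi^2 over a grid in (a1, a2).&lt;br /&gt;
 // x[], y[], sig[] are hypothetical stand-ins for the lab data.&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 double chi2(double a1, double a2, const double* x, const double* y,&lt;br /&gt;
             const double* sig, int N) {&lt;br /&gt;
   double s = 0.0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     double r = (y[i] - a1 - a2 * x[i]) / sig[i];&lt;br /&gt;
     s += r * r;&lt;br /&gt;
   }&lt;br /&gt;
   return s;&lt;br /&gt;
 }&lt;br /&gt;
 void gridScan(const double* x, const double* y, const double* sig, int N) {&lt;br /&gt;
   double best = 1e30, bestA1 = 0.0, bestA2 = 0.0;&lt;br /&gt;
   for (double a1 = -2.0; a1 &amp;lt;= 0.0; a1 += 0.01)      // illustrative ranges&lt;br /&gt;
     for (double a2 = 0.0; a2 &amp;lt;= 0.1; a2 += 0.0005) {&lt;br /&gt;
       double c = chi2(a1, a2, x, y, sig, N);&lt;br /&gt;
       if (c &amp;lt; best) { best = c; bestA1 = a1; bestA2 = a2; }&lt;br /&gt;
     }&lt;br /&gt;
   printf(&amp;quot;min chi2 = %f at a1 = %f, a2 = %f\n&amp;quot;, best, bestA1, bestA2);&lt;br /&gt;
 }&lt;br /&gt;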
&lt;br /&gt;
If we return to the beginning, the original problem is to use the max-likelihood principle on the probability function for finding the correct fit using the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you could express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distributions as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the uncertainty in a fit parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; is tied to how &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; changes as that parameter is varied about its optimum value.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;  dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by determining how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
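&lt;br /&gt;
A minimal sketch of that prescription, reusing the chi2() helper from the grid-search sketch above; for brevity the other parameter is held fixed, although strictly it should be re-minimized at each step.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: one-standard-deviation error on a2 from the Delta-chi^2 = 1 rule.&lt;br /&gt;
 double parameterError(double a1min, double a2min, double chi2min,&lt;br /&gt;
                       const double* x, const double* y,&lt;br /&gt;
                       const double* sig, int N) {&lt;br /&gt;
   double a2 = a2min, step = 1e-5;       // step size is illustrative&lt;br /&gt;
   while (chi2(a1min, a2, x, y, sig, N) &amp;lt; chi2min + 1.0)&lt;br /&gt;
     a2 += step;                         // walk upward until chi^2 rises by 1&lt;br /&gt;
   return a2 - a2min;                    // estimate of the error in a2&lt;br /&gt;
 }&lt;br /&gt;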
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by attempting to search in a direction towards the minimum as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91702</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91702"/>
		<updated>2014-03-21T19:13:01Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability  that the student observed is wearing trousers = (60+20)/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
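&lt;br /&gt;
The same arithmetic as a tiny C++ check (values copied from the example above; P(B) also follows from the law of total probability):&lt;br /&gt;
&lt;br /&gt;
 // Sketch: the trouser example by direct arithmetic.&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
   double PA  = 0.4;                    // P(girl)&lt;br /&gt;
   double PBA = 0.5;                    // P(trousers | girl)&lt;br /&gt;
   double PB  = 0.6*1.0 + 0.4*0.5;      // P(trousers) = 0.8 (total probability)&lt;br /&gt;
   printf(&amp;quot;P(girl | trousers) = %f\n&amp;quot;, PBA * PA / PB);  // prints 0.25&lt;br /&gt;
   return 0;&lt;br /&gt;
 }&lt;br /&gt;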
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values are the most likely to occur; i.e., the probability of the observations is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our objective is to find the best straight line fit for an expected linear relationship between the dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_0(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}}\right ) e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The minimum of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum with respect to both parameters A &amp;amp; B; i.e.,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations with 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_2&amp;lt;/math&amp;gt; are known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Apply the method of determinants to the maximum likelihood problem above&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solution for the parameters A &amp;amp; B&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N \left [ \left ( \sum x_i^2\right )^2  - 2 x_j \sum x_i^2 \sum x_i + x_j^2 \left (\sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [ N \left ( \sum x_i^2\right )^2  - 2 \sum x_i^2 \left ( \sum x_i \right )^2 + \sum x_i^2 \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt; using &amp;lt;math&amp;gt;\sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum x_i^2 \left [ N \sum x_i^2 - \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so the line is centered at x=0 then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in Y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [  \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [ N^2 x_j^2 - 2N x_j \sum x_i + \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [ N^2 \sum x_i^2 - 2N \left ( \sum x_i \right )^2  + N \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [ N \sum x_i^2 - \left ( \sum x_i \right )^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
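&lt;br /&gt;
Collecting the results above, a minimal C++ sketch (not from the text) of the unweighted fit with both parameter errors; the data arrays are placeholders:&lt;br /&gt;
&lt;br /&gt;
 // Sketch: unweighted least-squares line y = A + B x with the errors&lt;br /&gt;
 // sigma_A and sigma_B derived above (equal sigma for every point).&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 void lineFit(const double* x, const double* y, int N) {&lt;br /&gt;
   double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     Sx += x[i]; Sy += y[i]; Sxx += x[i]*x[i]; Sxy += x[i]*y[i];&lt;br /&gt;
   }&lt;br /&gt;
   double det = N*Sxx - Sx*Sx;               // = 1/D in the notation above&lt;br /&gt;
   double A = (Sy*Sxx - Sx*Sxy) / det;&lt;br /&gt;
   double B = (N*Sxy - Sx*Sy) / det;&lt;br /&gt;
   double s2 = 0;                            // residual variance, N-2 dof&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     double r = y[i] - A - B*x[i];&lt;br /&gt;
     s2 += r*r;&lt;br /&gt;
   }&lt;br /&gt;
   s2 /= (N - 2);&lt;br /&gt;
   printf(&amp;quot;A = %f +- %f\n&amp;quot;, A, std::sqrt(s2*Sxx/det));&lt;br /&gt;
   printf(&amp;quot;B = %f +- %f\n&amp;quot;, B, std::sqrt(s2*N/det));&lt;br /&gt;
 }&lt;br /&gt;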
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= \frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [ \sum\frac{ 1}{\sigma_i^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \left ( \sum\frac{ x_i}{\sigma_i^2}\right)^2 \sum\frac{ x_i^2}{\sigma_i^2} + \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [ \sum \frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} - \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ] = D^2 \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{ 1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{ 1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \left [ \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ 1}{\sigma_j^2}\left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [ \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2\sum\frac{ 1}{\sigma_i^2} + \sum\frac{ 1}{\sigma_i^2}\left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2} \left [ \sum\frac{ x_i^2}{\sigma_i^2} \sum\frac{ 1}{\sigma_i^2} - \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum\frac{ 1}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
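&lt;br /&gt;
The weighted analogue as a minimal sketch (placeholder arrays again); note that here the parameter errors come directly from the sums, with no residual-variance factor:&lt;br /&gt;
&lt;br /&gt;
 // Sketch: weighted least-squares line with per-point errors sigma_i.&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 void weightedLineFit(const double* x, const double* y,&lt;br /&gt;
                      const double* sig, int N) {&lt;br /&gt;
   double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     double w = 1.0 / (sig[i]*sig[i]);&lt;br /&gt;
     S += w; Sx += w*x[i]; Sy += w*y[i];&lt;br /&gt;
     Sxx += w*x[i]*x[i]; Sxy += w*x[i]*y[i];&lt;br /&gt;
   }&lt;br /&gt;
   double det = S*Sxx - Sx*Sx;               // = 1/D in the notation above&lt;br /&gt;
   double A = (Sy*Sxx - Sx*Sxy) / det;&lt;br /&gt;
   double B = (S*Sxy - Sx*Sy) / det;&lt;br /&gt;
   printf(&amp;quot;A = %f +- %f\n&amp;quot;, A, std::sqrt(Sxx/det));  // sigma_A^2&lt;br /&gt;
   printf(&amp;quot;B = %f +- %f\n&amp;quot;, B, std::sqrt(S/det));    // sigma_B^2&lt;br /&gt;
 }&lt;br /&gt;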
&lt;br /&gt;
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the Linear Fit has been performed, the next step will be to determine a probability that the Fit is actually describing the data.&lt;br /&gt;
&lt;br /&gt;
The Correlation Probability (R) is one method used to try to determine this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the dependent and independent variables, x and y.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has the natural range between 0 and 1 such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient (R), by itself, CAN'T be used to indicate the degree of correlation.  The probability distribution of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D Gaussian, but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate the R of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller this probability is, the more likely it is that the data are correlated and that the linear fit is justified.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
= Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
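&lt;br /&gt;
A minimal C++ sketch (not from the text) computing the sample &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; and the density &amp;lt;math&amp;gt;P_R(R,\nu)&amp;lt;/math&amp;gt;; std::tgamma plays the role of the ROOT::Math::tgamma call above, and M_PI is the usual &amp;lt;math&amp;gt;\pi&amp;lt;/math&amp;gt; constant.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: sample correlation coefficient R and the uncorrelated-parent&lt;br /&gt;
 // density P_R(R, nu).&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 double corrCoeff(const double* x, const double* y, int N) {&lt;br /&gt;
   double Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; ++i) {&lt;br /&gt;
     Sx += x[i]; Sy += y[i];&lt;br /&gt;
     Sxx += x[i]*x[i]; Syy += y[i]*y[i]; Sxy += x[i]*y[i];&lt;br /&gt;
   }&lt;br /&gt;
   return (N*Sxy - Sx*Sy) /&lt;br /&gt;
          std::sqrt((N*Sxx - Sx*Sx) * (N*Syy - Sy*Sy));&lt;br /&gt;
 }&lt;br /&gt;
 double P_R(double R, double nu) {&lt;br /&gt;
   return std::tgamma(0.5*(nu + 1.0)) /&lt;br /&gt;
          (std::sqrt(M_PI) * std::tgamma(0.5*nu)) *&lt;br /&gt;
          std::pow(1.0 - R*R, 0.5*(nu - 2.0));&lt;br /&gt;
 }&lt;br /&gt;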
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again, the probability is maximized when the sum in the exponent is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants, as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; the objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
Multiplying both sides on the right by &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
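&lt;br /&gt;
For concreteness, a minimal ROOT macro sketch of this solution is shown below. It assumes ROOT's TMatrixD and TVectorD classes, takes the basis functions to be the powers &amp;lt;math&amp;gt;f_j(x) = x^j&amp;lt;/math&amp;gt;, and the array names (xdata, ydata, sig) are hypothetical placeholders for the measurements.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: build the normal equations and solve for the parameters by&lt;br /&gt;
 // inverting alpha (alpha is symmetric, so the multiplication order is safe).&lt;br /&gt;
 void regression(int npar, int N, double* xdata, double* ydata, double* sig)&lt;br /&gt;
 {&lt;br /&gt;
    TMatrixD alpha(npar, npar);&lt;br /&gt;
    TVectorD beta(npar);&lt;br /&gt;
    for (int i = 0; i != N; ++i) {&lt;br /&gt;
       double w = 1.0/(sig[i]*sig[i]);&lt;br /&gt;
       for (int k = 0; k != npar; ++k) {&lt;br /&gt;
          double fk = TMath::Power(xdata[i], k);&lt;br /&gt;
          beta(k) += ydata[i]*fk*w;&lt;br /&gt;
          for (int j = 0; j != npar; ++j)&lt;br /&gt;
             alpha(k, j) += fk*TMath::Power(xdata[i], j)*w;&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
    TMatrixD alphaInv(alpha);&lt;br /&gt;
    alphaInv.Invert();            // the inverse also serves as the error matrix below&lt;br /&gt;
    TVectorD a = alphaInv*beta;   // the fit parameters a_j&lt;br /&gt;
    a.Print();&lt;br /&gt;
 }&lt;br /&gt;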
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist, its determinant cannot be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we divide both sides by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt; then we have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unity matrix to the matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we perform such operations until the denominator becomes the unity matrix, then the numerator will contain the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied to a square matrix, it can be used to calculate the inverse matrix. This is done by augmenting the square matrix with the identity matrix of the same dimensions and then performing elementary row operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{\det(\tilde{A})}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
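&lt;br /&gt;
Below is a minimal C++ sketch of the procedure, in the ROOT-macro style used elsewhere on this page. It reduces the augmented &amp;lt;math&amp;gt;3 \times 6&amp;lt;/math&amp;gt; matrix &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; in place; no pivoting is attempted, so it assumes the diagonal pivots stay non-zero (as they do for the example above).&lt;br /&gt;
&lt;br /&gt;
 // Sketch: Gauss-Jordan elimination on the augmented matrix [A|1].&lt;br /&gt;
 // On return, the right-hand 3 columns hold the inverse of A.&lt;br /&gt;
 void gaussJordan3(double A[3][6])&lt;br /&gt;
 {&lt;br /&gt;
    const int n = 3;&lt;br /&gt;
    for (int k = 0; k != n; ++k) {&lt;br /&gt;
       double pivot = A[k][k];&lt;br /&gt;
       for (int j = 0; j != 2*n; ++j) A[k][j] /= pivot;        // scale row k so A[k][k] = 1&lt;br /&gt;
       for (int i = 0; i != n; ++i) {&lt;br /&gt;
          if (i == k) continue;&lt;br /&gt;
          double f = A[i][k];&lt;br /&gt;
          for (int j = 0; j != 2*n; ++j) A[i][j] -= f*A[k][j]; // clear column k in the other rows&lt;br /&gt;
       }&lt;br /&gt;
    }&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
Filling A with the example matrix above augmented by the identity and calling gaussJordan3(A) reproduces the inverse entries 3/4, 1/2, 1/4, etc.&lt;br /&gt;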
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
where the variance is estimated by&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
One factor of &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; cancels between the numerator and the denominator.&lt;br /&gt;
;Moving the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unity matrix (1 if &amp;lt;math&amp;gt;j = l&amp;lt;/math&amp;gt;, 0 otherwise)&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
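&lt;br /&gt;
As a sketch of how the error matrix is used in practice (assuming ROOT's TMatrixD, and that a hypothetical matrix cov holds &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; from the regression sketch above), the prediction error of a power-series fit at a point x is&lt;br /&gt;
&lt;br /&gt;
 // Sketch: variance of the model prediction Y(x) for f_i(x) = x^i.&lt;br /&gt;
 double predictionVariance(double x, int npar, const TMatrixD&amp;amp; cov)&lt;br /&gt;
 {&lt;br /&gt;
    double s2 = 0.0;&lt;br /&gt;
    for (int i = 0; i != npar; ++i)&lt;br /&gt;
       for (int j = 0; j != npar; ++j)&lt;br /&gt;
          s2 += TMath::Power(x, i+j)*cov(i, j);   // f_i(x) f_j(x) sigma_ij&lt;br /&gt;
    return s2;                                    // the error band is sqrt(s2)&lt;br /&gt;
 }&lt;br /&gt;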
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (ie; when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not change the fit.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with sample mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; and a parent distribution of width &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Suppose you make N measurements of two variates (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;), which may be correlated, and model them using a function with n parameters.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you received from your fit is as large or larger than what you would get from a function described by the parent distribution is given by&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the average value for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger is approximately 0.5 (the mean lies slightly above the median).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of the &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; distribution is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
then the mean of &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{\nu}&amp;lt;/math&amp;gt; is 1&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  a small p-value signals significant evidence against the hypothesis H.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning which is that the p-value is the chance that null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
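&lt;br /&gt;
For example, with a hypothetical fit result of &amp;lt;math&amp;gt;\chi^2 = 25&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu = 20&amp;lt;/math&amp;gt; degrees of freedom, the two calls below compute the same upper-tail integral:&lt;br /&gt;
&lt;br /&gt;
 double chi2 = 25.0;                                   // hypothetical fit result&lt;br /&gt;
 double nu   = 20.0;                                   // degrees of freedom&lt;br /&gt;
 double p1   = ROOT::Math::chisquared_cdf_c(chi2, nu); // the upper-tail integral above&lt;br /&gt;
 double p2   = TMath::Prob(chi2, 20);                  // the same quantity&lt;br /&gt;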
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic as extreme as the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you form the function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = a function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data; thus the function only compares the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
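&lt;br /&gt;
A short sketch of its use on two hypothetical fits follows; the tail probability is taken from the complementary cumulative distribution, which accompanies the pdf in ROOT's MathCore library.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: compare two hypothetical fits via the ratio of reduced chi-squares.&lt;br /&gt;
 double chi2_1 = 30.0, nu1 = 20.0;   // fit 1 (hypothetical numbers)&lt;br /&gt;
 double chi2_2 = 18.0, nu2 = 15.0;   // fit 2&lt;br /&gt;
 double F    = (chi2_1/nu1)/(chi2_2/nu2);&lt;br /&gt;
 double prob = ROOT::Math::fdistribution_cdf_c(F, nu1, nu2); // chance of an F this large or larger&lt;br /&gt;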
&lt;br /&gt;
== The chi^2 difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of 2 different fits: model 1 with &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; parameters and a larger model 2 with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, &amp;lt;math&amp;gt;F_{\chi}&amp;lt;/math&amp;gt; will have an F distribution with &amp;lt;math&amp;gt;(\nu_1 - \nu_2, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m}} &amp;lt;/math&amp;gt;&lt;br /&gt;
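&lt;br /&gt;
As a numerical sketch with hypothetical values: suppose a fit with &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; parameters gives &amp;lt;math&amp;gt;\chi^2(m) = 30&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;N-m-1 = 20&amp;lt;/math&amp;gt;, and adding one parameter lowers it to &amp;lt;math&amp;gt;\chi^2(m+1) = 24&amp;lt;/math&amp;gt;.  Then &amp;lt;math&amp;gt;F_{\chi} = (30-24)/(30/20) = 4&amp;lt;/math&amp;gt;, to be compared against the F-distribution with &amp;lt;math&amp;gt;(1, 20)&amp;lt;/math&amp;gt; degrees of freedom:&lt;br /&gt;
&lt;br /&gt;
 double Fchi = (30.0 - 24.0)/(30.0/20.0);                        // = 4 for these hypothetical numbers&lt;br /&gt;
 double prob = ROOT::Math::fdistribution_cdf_c(Fchi, 1.0, 20.0); // keep the new parameter only if this is small&lt;br /&gt;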
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:ie; &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You cannot separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; terms from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; functions and thereby solve the system by inverting your matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to apply the maximum-likelihood principle to the probability function in order to find the correct fit to the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distribution as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
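&lt;br /&gt;
A minimal sketch of the procedure for two parameters is given below; the function chi2(a1, a2), the initial guesses, and the scan ranges are all hypothetical placeholders.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: one-parameter-at-a-time grid search.&lt;br /&gt;
 double a1 = 0.0, a2 = 0.0;                 // initial guesses; the result depends on these&lt;br /&gt;
 for (int iter = 0; iter != 50; ++iter) {&lt;br /&gt;
    double best = 1e30, arg = a1;           // scan a1 with a2 held fixed&lt;br /&gt;
    for (double t = a1 - 1.0; t &amp;lt;= a1 + 1.0; t += 0.01)&lt;br /&gt;
       if (chi2(t, a2) &amp;lt; best) { best = chi2(t, a2); arg = t; }&lt;br /&gt;
    a1 = arg;&lt;br /&gt;
    best = 1e30; arg = a2;                  // then scan a2 with a1 held fixed&lt;br /&gt;
    for (double t = a2 - 1.0; t &amp;lt;= a2 + 1.0; t += 0.01)&lt;br /&gt;
       if (chi2(a1, t) &amp;lt; best) { best = chi2(a1, t); arg = t; }&lt;br /&gt;
    a2 = arg;&lt;br /&gt;
 }&lt;br /&gt;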
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; depends on the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; through the probability &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt;, so that moving a parameter away from its optimum value alters &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by determining how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
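&lt;br /&gt;
A sketch of that prescription for one parameter follows; aBest, chi2min, and the function chi2(a) are hypothetical placeholders, and the parabolic shape described above is assumed.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: walk the parameter up from its best value until chi2 rises by 1;&lt;br /&gt;
 // the distance walked is the 1 standard deviation uncertainty.&lt;br /&gt;
 double step = 1e-4;&lt;br /&gt;
 double a = aBest;&lt;br /&gt;
 while (chi2(a) &amp;lt; chi2min + 1.0) a += step;&lt;br /&gt;
 double sigma_a = a - aBest;&lt;br /&gt;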
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by searching in the direction of the minimum, along the negative gradient of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
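&lt;br /&gt;
A sketch of a single gradient step is shown below, with the partial derivatives of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; estimated by finite differences; the arrays a[], da[], grad[], the function chi2, and the rate factor eta are hypothetical placeholders.&lt;br /&gt;
&lt;br /&gt;
 // Sketch: estimate the chi2 gradient numerically, then step all&lt;br /&gt;
 // parameters simultaneously against it.&lt;br /&gt;
 for (int k = 0; k != npar; ++k) {&lt;br /&gt;
    a[k] += da[k];&lt;br /&gt;
    grad[k] = (chi2(a) - chi2min)/da[k];   // numerical partial derivative&lt;br /&gt;
    a[k] -= da[k];&lt;br /&gt;
 }&lt;br /&gt;
 for (int k = 0; k != npar; ++k)&lt;br /&gt;
    a[k] -= eta*grad[k];                   // simultaneous step toward the minimum&lt;br /&gt;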
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91701</id>
		<title>TF ErrorAna StatInference</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_StatInference&amp;diff=91701"/>
		<updated>2014-03-21T19:12:10Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* The Method of Determinants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Statistical Inference&lt;br /&gt;
=Frequentist -vs- Bayesian Inference=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When it comes to testing a hypothesis, there are  two dominant philosophies known as a Frequentist or a Bayesian perspective.&lt;br /&gt;
&lt;br /&gt;
The dominant discussion for this class will be from the Frequentist perspective.&lt;br /&gt;
&lt;br /&gt;
== frequentist statistical inference==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made using a null-hypothesis test; that is, one that answers the question, Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event.&lt;br /&gt;
Thus, if &amp;lt;math&amp;gt;n_t&amp;lt;/math&amp;gt; is the total number of trials and &amp;lt;math&amp;gt;n_x&amp;lt;/math&amp;gt; is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) \approx \frac{n_x}{n_t}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Bayesian inference.==&lt;br /&gt;
&lt;br /&gt;
:Statistical inference is made by using evidence or observations  to update or to newly infer the probability that a hypothesis may be true. The name &amp;quot;Bayesian&amp;quot; comes from the frequent use of Bayes' theorem in the inference process.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem relates the conditional probability, conditional and marginal probability, marginal probabilities of events ''A'' and ''B'', where ''B'' has a non-vanishing probability:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! &amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each term in Bayes' theorem has a conventional name:&lt;br /&gt;
* P(''A'') is the prior probability or marginal probability of ''A''. It is &amp;quot;prior&amp;quot; in the sense that it does not take into account any information about ''B''.&lt;br /&gt;
* P(''B'') is the prior or marginal probability of ''B'', and acts as a normalizing constant. &lt;br /&gt;
* P(''A''|''B'') is the conditional probability of ''A'', given ''B''. It is also called the posterior probability because it is derived from or depends upon the specified value of ''B''.&lt;br /&gt;
* P(''B''|''A'') is the conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event ''A'' given ''B'' is related to the converse conditional probability of ''B'' given ''A''.&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Suppose there is a school having 60% boys and 40% girls as students. &lt;br /&gt;
&lt;br /&gt;
The female students wear trousers or skirts in equal numbers; the boys all wear trousers. &lt;br /&gt;
&lt;br /&gt;
An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. &lt;br /&gt;
&lt;br /&gt;
What is the probability this student is a girl? &lt;br /&gt;
&lt;br /&gt;
The correct answer can be computed using Bayes' theorem.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P(A) \equiv&amp;lt;/math&amp;gt; probability that  the student observed is a girl = 0.4&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B) \equiv&amp;lt;/math&amp;gt; probability  that the student observed is wearing trousers = (60+20)/100 = 0.8&lt;br /&gt;
: &amp;lt;math&amp;gt;P(B|A) \equiv&amp;lt;/math&amp;gt; probability the student is wearing trousers given that the student is a girl&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A|B) \equiv&amp;lt;/math&amp;gt; probability the student is a girl given that the student is wearing trousers&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(B|A) =0.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Method of Maximum Likelihood=&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood is the cornerstone of Frequentist based hypothesis testing and may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
= Least Squares Fit to a Line=&lt;br /&gt;
&lt;br /&gt;
== Applying the Method of Maximum Likelihood==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our object is to find the best straight line fit for an expected linear relationship between dependent variate &amp;lt;math&amp;gt;(y)&amp;lt;/math&amp;gt;  and independent variate &amp;lt;math&amp;gt;(x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;y_0(x)&amp;lt;/math&amp;gt; represent the &amp;quot;true&amp;quot; linear relationship between independent variate &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and dependent variate &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_o(x) = A + B x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(A,B) = \Pi \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left [ \Pi \frac{1}{\sigma_i \sqrt{2 \pi}}\right ] e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt; = Max&lt;br /&gt;
 &lt;br /&gt;
The maximum probability will result in the best values for &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;B&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2&amp;lt;/math&amp;gt; = Min&lt;br /&gt;
&lt;br /&gt;
The min for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; occurs when the function is a minimum with respect to both parameters A &amp;amp; B; i.e.,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt; : All variances are the same  (weighted fits don't make this assumption)&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum \left ( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum  x_i \left( y_i - A - B x_i \right)=0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above equations represent a set of 2 simultaneous equations in 2 unknowns which can be solved.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum y_i = \sum A + B \sum x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum x_i y_i = A \sum x_i + B \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N &amp;amp; \sum x_i\\&lt;br /&gt;
\sum x_i &amp;amp; \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Method of Determinants ===&lt;br /&gt;
for the matrix problem:&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the above can be written as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y_1 = a_{11} x_1 + a_{12} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;y_2 = a_{21} x_1 + a_{22} x_2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
solving for &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y_2&amp;lt;/math&amp;gt; are known&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| x_1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
: &amp;lt;math&amp;gt;x_1 = \frac{\left| \begin{array}{cc} y_1 &amp;amp; a_{12}\\ y_2 &amp;amp; a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt; similarly &amp;lt;math&amp;gt;x_2 = \frac{\left| \begin{array}{cc}  a_{11} &amp;amp; y_1 \\  a_{21} &amp;amp; y_2  \end{array} \right| }{\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Solutions exist as long as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left| \begin{array}{cc} a_{11} &amp;amp; a_{12}\\ a_{21} &amp;amp; a_{22} \end{array} \right| \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Apply the method of determinant for the maximum likelihood problem above&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If the uncertainty in all the measurements is not the same then we need to insert &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; back into the system of equations.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
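&lt;br /&gt;
A minimal C++ sketch of these weighted formulas follows (the array names x, y, sig and the count N are hypothetical placeholders):&lt;br /&gt;
&lt;br /&gt;
 // Sketch: weighted least-squares line y = A + B x via the determinant formulas.&lt;br /&gt;
 double S = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
 for (int i = 0; i != N; ++i) {&lt;br /&gt;
    double w = 1.0/(sig[i]*sig[i]);&lt;br /&gt;
    S   += w;&lt;br /&gt;
    Sx  += w*x[i];&lt;br /&gt;
    Sy  += w*y[i];&lt;br /&gt;
    Sxx += w*x[i]*x[i];&lt;br /&gt;
    Sxy += w*x[i]*y[i];&lt;br /&gt;
 }&lt;br /&gt;
 double D = S*Sxx - Sx*Sx;        // the common denominator determinant&lt;br /&gt;
 double A = (Sy*Sxx - Sx*Sxy)/D;  // numerator determinant for A&lt;br /&gt;
 double B = (S*Sxy - Sx*Sy)/D;    // numerator determinant for B&lt;br /&gt;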
&lt;br /&gt;
== Uncertainty in the Linear Fit parameters==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
By definition of variance&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}&amp;lt;/math&amp;gt;  : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
The least squares fit (assuming equal &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) has the following solution for the parameters A &amp;amp; B&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum y_i &amp;amp; \sum x_i\\ \sum x_i y_i &amp;amp; \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i  }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{(1) \sum x_i^2 - x_j\sum x_i   }{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&amp;lt;/math&amp;gt; only the &amp;lt;math&amp;gt;y_j&amp;lt;/math&amp;gt; term survives &lt;br /&gt;
:&amp;lt;math&amp;gt; = D \left ( \sum x_i^2 - x_j\sum x_i \right)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D \equiv \frac{1}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \sum x_i \sum x_i }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left ( \sum x_i^2 - x_j\sum x_i \right )^2&amp;lt;/math&amp;gt; : Assume &amp;lt;math&amp;gt;\sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \sum_{j=1}^N  \left [ \left ( \sum x_i^2\right )^2  + x_j^2 \left (\sum x_i \right )^2 - 2 x_j \sum x_i^2  \sum x_i  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2 \left [  N  \left ( \sum x_i^2\right )^2 +  \sum x_i^2 \left ( \sum x_i \right )^2 - 2 \sum x_i^2 \left ( \sum x_i \right )^2   \right ]&amp;lt;/math&amp;gt; : using &amp;lt;math&amp;gt; \sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt;, since both sums run over the same &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2\left [  N   \sum x_i^2  -  \left ( \sum x_i \right)^2   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \sigma^2 D^2\sum x_i^2 \frac{1}{D}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \sigma^2  \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we redefine our origin in the linear plot so the line is centered at &amp;lt;math&amp;gt;x=0&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum{x_i} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in Y&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== uncertainty in B===&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( N x_j - \sum x_i \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N   \left ( N x_j - \sum x_i \right)^2 &amp;lt;/math&amp;gt; assuming &amp;lt;math&amp;gt;\sigma_j = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2 \sum_{j=1}^N \left [   N^2 x_j^2 - 2N x_j \sum x_i + \left (\sum x_i \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 D^2  \left [   N^2 \sum x_i^2 - 2N \left (\sum x_i \right)^2  + N \left (\sum x_i \right)^2  \right ]&amp;lt;/math&amp;gt; : using &amp;lt;math&amp;gt; \sum_{j=1}^N x_j = \sum x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sum_{j=1}^N x_j^2 = \sum x_i^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= N \sigma^2 D^2  \left [   N \sum x_i^2 -  \left (\sum x_i \right)^2  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = N D^2 \sigma^2 \frac{1}{D} = ND \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_B^2= \frac{N \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}} {N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
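&lt;br /&gt;
A minimal C++ sketch of these results (the routine name fitLine and its argument layout are illustrative choices, not a library routine):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Unweighted least squares line y = A + B x from the determinant&lt;br /&gt;
// formulas above.  sigma^2 is estimated from the fit residuals.&lt;br /&gt;
void fitLine(const std::vector&amp;lt;double&amp;gt; &amp;amp;x, const std::vector&amp;lt;double&amp;gt; &amp;amp;y,&lt;br /&gt;
             double &amp;amp;A, double &amp;amp;B, double &amp;amp;sigA, double &amp;amp;sigB)&lt;br /&gt;
{&lt;br /&gt;
   const int N = x.size();&lt;br /&gt;
   double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
      Sx += x[i];  Sy += y[i];  Sxx += x[i]*x[i];  Sxy += x[i]*y[i];&lt;br /&gt;
   }&lt;br /&gt;
   const double Delta = N*Sxx - Sx*Sx;   // = 1/D&lt;br /&gt;
   A = (Sy*Sxx - Sx*Sxy)/Delta;          // y-intercept&lt;br /&gt;
   B = (N*Sxy - Sx*Sy)/Delta;            // slope&lt;br /&gt;
   double s2 = 0;                        // sum of squared residuals&lt;br /&gt;
   for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
      const double r = y[i] - A - B*x[i];&lt;br /&gt;
      s2 += r*r;&lt;br /&gt;
   }&lt;br /&gt;
   s2 /= (N - 2);                        // N - 2 degrees of freedom&lt;br /&gt;
   sigA = std::sqrt(s2*Sxx/Delta);       // error in the y-intercept&lt;br /&gt;
   sigB = std::sqrt(s2*N/Delta);         // error in the slope&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;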
&lt;br /&gt;
== Linear Fit with error==&lt;br /&gt;
&lt;br /&gt;
From above we know that if each independent measurement has a different error &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; then the fit parameters are given by &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in A===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} &amp;amp; \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;D = \frac{1}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right| }= &lt;br /&gt;
\frac{1}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial A}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [ \sum\frac{ y_i}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2}- \sum\frac{ x_i}{\sigma_i^2}  \sum\frac{ x_i y_i}{\sigma_i^2}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ] =  \sum_{j=1}^N \left [ \sigma_j^2  D^2 \left ( \frac{ 1}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}- \frac{ x_j}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \sigma_j^2  \left [  \frac{ 1}{\sigma_j^4} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \frac{ x_j}{\sigma_j^4} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \frac{ x_j^2}{\sigma_j^4} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2   \left [  \sum_{j=1}^N \frac{ 1}{\sigma_j^2} \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)^2 - 2 \sum_{j=1}^N \frac{ x_j}{\sigma_j^2} \sum\frac{ x_i^2}{\sigma_i^2}\sum\frac{ x_i}{\sigma_i^2} + \sum_{j=1}^N \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right) ^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: Since the sums over &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; run over the same data, &amp;lt;math&amp;gt;\sum_{j=1}^N \frac{x_j}{\sigma_j^2} = \sum \frac{x_i}{\sigma_i^2}&amp;lt;/math&amp;gt; and so on.  Factoring out &amp;lt;math&amp;gt;\sum\frac{ x_i^2}{\sigma_i^2}&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2}  \sum\frac{ x_i^2}{\sigma_i^2} - 2 \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 + \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) \left [  \sum \frac{ 1}{\sigma_i^2}  \sum\frac{ x_i^2}{\sigma_i^2} -  \left( \sum \frac{ x_i}{\sigma_i^2} \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= D    \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right)  = \frac{ \left ( \sum\frac{ x_i^2}{\sigma_i^2}\right) }{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Compare with the unweighted error&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Weighted Error in B ===&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} &amp;amp; \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2}  &amp;amp; \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial B}{\partial y_j} = D\frac{\partial }{\partial y_j} \left [\sum \frac{1}{\sigma_i^2}\sum \frac{x_i y_i}{\sigma_i^2}  - \sum \frac{ y_i}{\sigma_i^2}\sum \frac{x_i}{\sigma_i^2}   \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  D \left ( \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left (  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2}- \frac{1}{\sigma_j^2}\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum_{j=1}^N \left [  \frac{ x_j^2}{\sigma_j^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2  \frac{ x_j}{\sigma_j^2} \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i}{\sigma_i^2} + \frac{1}{\sigma_j^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \left [  \sum\frac{ x_i^2}{\sigma_i^2} \left (\sum\frac{ 1}{\sigma_i^2}\right)^2 - 2 \left (\sum\frac{ x_i}{\sigma_i^2}\right)^2 \sum\frac{ 1}{\sigma_i^2} + \sum\frac{ 1}{\sigma_i^2} \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D^2 \sum\frac{ 1}{\sigma_i^2} \left [  \sum\frac{ 1}{\sigma_i^2} \sum\frac{ x_i^2}{\sigma_i^2} -  \left (\sum\frac{ x_i}{\sigma_i^2} \right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= D \sum\frac{ 1}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_B^2 = \frac{  \sum\frac{ 1}{\sigma_i^2}}{\sum \frac{1}{\sigma_i^2} \sum \frac{x_i^2}{\sigma_i^2} - \sum \frac{x_i}{\sigma_i^2} \sum \frac{x_i}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
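&lt;br /&gt;
A short sketch of these weighted error formulas (the helper name weightedFitErrors is an illustrative choice; the per-point errors are passed in as sigma):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Variances of A and B for the weighted fit, from the sums above.&lt;br /&gt;
void weightedFitErrors(const std::vector&amp;lt;double&amp;gt; &amp;amp;x,&lt;br /&gt;
                       const std::vector&amp;lt;double&amp;gt; &amp;amp;sigma,&lt;br /&gt;
                       double &amp;amp;sigA2, double &amp;amp;sigB2)&lt;br /&gt;
{&lt;br /&gt;
   double S = 0, Sx = 0, Sxx = 0;&lt;br /&gt;
   for (size_t i = 0; i &amp;lt; x.size(); i++) {&lt;br /&gt;
      const double w = 1.0/(sigma[i]*sigma[i]);   // 1/sigma_i^2&lt;br /&gt;
      S += w;  Sx += w*x[i];  Sxx += w*x[i]*x[i];&lt;br /&gt;
   }&lt;br /&gt;
   const double Delta = S*Sxx - Sx*Sx;            // = 1/D&lt;br /&gt;
   sigA2 = Sxx/Delta;                             // sigma_A^2&lt;br /&gt;
   sigB2 = S/Delta;                               // sigma_B^2&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;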
&lt;br /&gt;
&lt;br /&gt;
==Correlation Probability==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once the linear fit has been performed, the next step is to determine the probability that the fit actually describes the data.&lt;br /&gt;
&lt;br /&gt;
The correlation coefficient &amp;lt;math&amp;gt;(R)&amp;lt;/math&amp;gt; is one method used to estimate this probability.&lt;br /&gt;
&lt;br /&gt;
This method evaluates the &amp;quot;slope&amp;quot; parameter to determine if there is a correlation between the independent and dependent variables, &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The linear fit above was done to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; for the following model&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = A + Bx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
What if we turn this  equation around such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = A^{\prime} + B^{\prime}y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is no correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;B^{\prime} =0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If there is complete correlation between &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  &lt;br /&gt;
:&amp;lt;math&amp;gt;A = -\frac{A^{\prime}}{B^{\prime}}&amp;lt;/math&amp;gt;        and     &amp;lt;math&amp;gt;B = \frac{1}{B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: and &amp;lt;math&amp;gt;BB^{\prime} = 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So one can define a metric &amp;lt;math&amp;gt;BB^{\prime}&amp;lt;/math&amp;gt; which has a natural range between 0 and 1, such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \sqrt{B B^{\prime}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B = \frac{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum x_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum x_i  &amp;amp; \sum x_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and one can show that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;B^{\prime} = \frac{\left| \begin{array}{cc} N &amp;amp; \sum x_i\\ \sum y_i  &amp;amp; \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N &amp;amp; \sum y_i\\ \sum y_i  &amp;amp; \sum y_i^2 \end{array}\right|} = \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thus &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;R = \sqrt{ \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{N\sum x_i^2 - \sum x_i \sum x_i  } \frac{N\sum x_i y_i - \sum x_i \sum y_i  }{N\sum y_i^2 - \sum y_i \sum y_i  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=   \frac{N\sum x_i y_i - \sum y_i \sum x_i  }{\sqrt{\left( N\sum x_i^2 - \sum x_i \sum x_i  \right ) \left (N\sum y_i^2 - \sum y_i \sum y_i\right)  } }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
:The correlation coefficient &amp;lt;math&amp;gt;(R)&amp;lt;/math&amp;gt; by itself CAN'T be used to indicate the degree of correlation.  The probability distribution of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; can be derived from a 2-D Gaussian, but knowledge of the correlation coefficient of the parent population &amp;lt;math&amp;gt;(\rho)&amp;lt;/math&amp;gt; is required to evaluate the &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; of the sample distribution.&lt;br /&gt;
&lt;br /&gt;
Instead one assumes a correlation of &amp;lt;math&amp;gt;\rho=0&amp;lt;/math&amp;gt; in the parent distribution and then compares the sample value of &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; with what you would get if there were no correlation.&lt;br /&gt;
&lt;br /&gt;
The smaller the probability of obtaining the observed &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt; from uncorrelated data (the distribution &amp;lt;math&amp;gt;P_R&amp;lt;/math&amp;gt; below), the more likely it is that the data are correlated and that the linear fit is meaningful.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_R(R,\nu) = \frac{1}{\sqrt{\pi}} \frac{\Gamma\left ( \frac{\nu+1}{2}\right )}{\Gamma \left ( \frac{\nu}{2}\right)} \left( 1-R^2\right)^{\left( \frac{\nu-2}{2}\right)}&amp;lt;/math&amp;gt;&lt;br /&gt;
= Probability that any random sample of UNCORRELATED data would yield the correlation coefficient &amp;lt;math&amp;gt;R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt&amp;lt;/math&amp;gt;&lt;br /&gt;
 (ROOT::Math::tgamma(double x) )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu=N-2&amp;lt;/math&amp;gt; = number of degrees of freedom = Number of data points - Number of parameters in fit function&lt;br /&gt;
&lt;br /&gt;
Derived in &amp;quot;Pugh and Winslow, The Analysis of Physical Measurement, Addison-Wesley Publishing, 1966.&amp;quot;&lt;br /&gt;
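&lt;br /&gt;
A sketch that evaluates this density using the ROOT::Math::tgamma call quoted above (the function name P_R is an illustrative choice, and the header name may vary with ROOT version):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;quot;Math/SpecFuncMathCore.h&amp;quot;   // ROOT::Math::tgamma&lt;br /&gt;
&lt;br /&gt;
// Evaluate the density P_R(R,nu) quoted above.&lt;br /&gt;
double P_R(double R, double nu)&lt;br /&gt;
{&lt;br /&gt;
   return 1.0/std::sqrt(M_PI)&lt;br /&gt;
        * ROOT::Math::tgamma(0.5*(nu + 1.0))/ROOT::Math::tgamma(0.5*nu)&lt;br /&gt;
        * std::pow(1.0 - R*R, 0.5*(nu - 2.0));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;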
&lt;br /&gt;
= Least Squares fit to a Polynomial=&lt;br /&gt;
&lt;br /&gt;
Let's assume we wish to now fit a polynomial instead of a straight line to the data.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) = \sum_{j=0}^{n} a_j x^{j}=\sum_{j=0}^{n} a_j f_j(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f_j(x) =&amp;lt;/math&amp;gt; a function which does not depend on &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then the Probability of observing the value &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; with a standard deviation &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_i(a_0,a_1, \cdots ,a_n) = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you repeat the experiment &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; times then the probability of deducing the values &amp;lt;math&amp;gt;a_n&amp;lt;/math&amp;gt; from the data can be expressed as the joint probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; values for each &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once again the probability is maximized when the sum in the exponent is a minimum&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i - \sum_{j=0}^{n} a_j f_j(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = order of polynomial used to fit the data.&lt;br /&gt;
&lt;br /&gt;
The minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \frac{\partial \left( - \sum_{j=0}^{n} a_j f_j(x_i) \right)}{\partial a_k}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \sum_i^N  \frac{1}{\sigma_i^2}\left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) \left (  -  f_k(x_i) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2\sum_i^N  \frac{1}{\sigma_i^2} \left (  -  f_k(x_i) \right) \left ( y_i - \sum_{j=0}^{n} a_j f_j(x_i)\right ) =0&amp;lt;/math&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =  \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}  \sum_{j=0}^{n} a_j f_j(x_i)=  \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You now have a system of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; coupled equations for the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; with each equation summing over the &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements.&lt;br /&gt;
&lt;br /&gt;
The first equation &amp;lt;math&amp;gt;(f_1)&amp;lt;/math&amp;gt; looks like this&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_i^N f_1(x_i) \frac{y_i}{\sigma_i^2} = \sum_i^N \frac{f_1(x_i)}{\sigma_i^2} \left ( a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots a_n f_n(x_i)\right )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You could use the method of determinants, as we did to find the parameters &amp;lt;math&amp;gt;(a_n)&amp;lt;/math&amp;gt; for a linear fit, but it is more convenient to use matrices in a technique referred to as regression analysis.&lt;br /&gt;
&lt;br /&gt;
==Regression Analysis==&lt;br /&gt;
&lt;br /&gt;
The parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; in the previous section are linear parameters to a general function which may be a polynomial.&lt;br /&gt;
&lt;br /&gt;
The system of equations is composed of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations where the &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation is given as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
may be represented in matrix form as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;\beta_k= \sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{kj} = \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha} = \left ( \begin{matrix} \alpha_{11}  &amp;amp; \alpha_{12} &amp;amp; \cdots &amp;amp; \alpha_{1j} \\ \alpha_{21}  &amp;amp; \alpha_{22}&amp;amp;\cdots&amp;amp;\alpha_{2j} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1} &amp;amp;\alpha_{k2} &amp;amp;\cdots &amp;amp;\alpha_{kj}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
; the objective is to find the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;&lt;br /&gt;
: To find &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; just invert the matrix&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta} = \tilde{a} \tilde{\alpha}&amp;lt;/math&amp;gt;&lt;br /&gt;
: Multiply both sides on the right by &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{\alpha}\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\beta}\tilde{\alpha}^{-1} = \tilde{a} \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \Rightarrow  \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Thus if you invert the matrix &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; you find &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; and as a result the parameters &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt;.&lt;br /&gt;
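&lt;br /&gt;
In ROOT this can be done with the TMatrixD and TVectorD classes.  A minimal sketch (the helper name solveParameters is an illustrative choice, and filling &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\tilde{\beta}&amp;lt;/math&amp;gt; from the data is omitted); because &amp;lt;math&amp;gt;\tilde{\alpha}&amp;lt;/math&amp;gt; is symmetric, multiplying &amp;lt;math&amp;gt;\tilde{\beta}&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;\tilde{\alpha}^{-1}&amp;lt;/math&amp;gt; on either side gives the same parameter vector as the row-vector product above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;quot;TMatrixD.h&amp;quot;&lt;br /&gt;
#include &amp;quot;TVectorD.h&amp;quot;&lt;br /&gt;
&lt;br /&gt;
// Solve beta = a alpha for the parameters a.&lt;br /&gt;
TVectorD solveParameters(TMatrixD alpha, const TVectorD &amp;amp;beta)&lt;br /&gt;
{&lt;br /&gt;
   alpha.Invert();       // alpha now holds alpha^{-1} (must be non-singular)&lt;br /&gt;
   return alpha*beta;    // the parameters a_j&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;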
&lt;br /&gt;
== Matrix inversion==&lt;br /&gt;
&lt;br /&gt;
The first thing to note is that for the inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A})&amp;lt;/math&amp;gt; to exist its determinant can not be zero&lt;br /&gt;
: &amp;lt;math&amp;gt;\left |\tilde{A} \right | \ne 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse of a matrix &amp;lt;math&amp;gt;(\tilde{A}^{-1})&amp;lt;/math&amp;gt; is defined such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{A}^{-1} = \tilde{1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If we divide both sides by the matrix &amp;lt;math&amp;gt;\tilde {A}&amp;lt;/math&amp;gt; then we have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A}^{-1} = \frac{\tilde{1}}{\tilde{A} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above ratio of the unit matrix to the matrix &amp;lt;math&amp;gt;\tilde{A}&amp;lt;/math&amp;gt; remains equal to &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; as long as the same elementary row operations are applied to both the numerator and the denominator.&lt;br /&gt;
&lt;br /&gt;
If we apply row operations that transform the denominator into the unit matrix, the numerator then becomes the inverse matrix.  &lt;br /&gt;
&lt;br /&gt;
This is the principle of Gauss-Jordan Elimination.&lt;br /&gt;
&lt;br /&gt;
===Gauss-Jordan Elimination===&lt;br /&gt;
If Gauss–Jordan elimination is applied on a square matrix, it can be used to calculate the inverse matrix. This can be done by augmenting the square matrix with the identity matrix of the same dimensions, and through the following matrix operations:&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{A} \tilde{1} \Rightarrow&lt;br /&gt;
 \tilde{1} \tilde{A}^{-1} .&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the original square matrix, &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt;, is given by the following expression:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 \\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, after augmenting by the identity, the following is obtained:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A}\tilde{1} = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 &amp;amp; 0 &amp;amp; 1 &amp;amp; 0\\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2 &amp;amp; 0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By performing elementary row operations on the &amp;lt;math&amp;gt;\tilde{A}\tilde{1}&amp;lt;/math&amp;gt; matrix until it reaches reduced row echelon form, the following is the final result:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1}\tilde{A}^{-1}  = &lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 &amp;amp; \frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 &amp;amp; \frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1 &amp;amp; \frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The matrix augmentation can now be undone, which gives the following:&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 1 &amp;amp; 0 \\&lt;br /&gt;
0 &amp;amp; 0 &amp;amp; 1&lt;br /&gt;
\end{bmatrix}\qquad&lt;br /&gt;
 \tilde{A}^{-1} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
\frac{3}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{1}{4}\\&lt;br /&gt;
\frac{1}{2} &amp;amp; 1 &amp;amp; \frac{1}{2}\\&lt;br /&gt;
\frac{1}{4} &amp;amp; \frac{1}{2} &amp;amp; \frac{3}{4}&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt; &lt;br /&gt;
 \tilde{A}^{-1} =\frac{1}{4}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}=\frac{1}{det(A)}&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
3 &amp;amp; 2 &amp;amp; 1\\&lt;br /&gt;
2 &amp;amp; 4 &amp;amp; 2\\&lt;br /&gt;
1 &amp;amp; 2 &amp;amp; 3&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
A matrix is non-singular (meaning that it has an inverse matrix) if and only if the identity matrix can be obtained using only elementary row operations.&lt;br /&gt;
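&lt;br /&gt;
A bare-bones C++ sketch of this procedure (no row pivoting, which a robust routine would add to avoid dividing by small elements; all names are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;stdexcept&amp;gt;&lt;br /&gt;
&lt;br /&gt;
typedef std::vector&amp;lt;std::vector&amp;lt;double&amp;gt; &amp;gt; Matrix;&lt;br /&gt;
&lt;br /&gt;
// Gauss-Jordan inversion: reduce A to the identity while applying the&lt;br /&gt;
// same row operations to an identity matrix, which becomes A^{-1}.&lt;br /&gt;
Matrix invert(Matrix A)&lt;br /&gt;
{&lt;br /&gt;
   const size_t n = A.size();&lt;br /&gt;
   Matrix I(n, std::vector&amp;lt;double&amp;gt;(n, 0.0));&lt;br /&gt;
   for (size_t i = 0; i &amp;lt; n; i++) I[i][i] = 1.0;   // augment with identity&lt;br /&gt;
   for (size_t col = 0; col &amp;lt; n; col++) {&lt;br /&gt;
      const double p = A[col][col];&lt;br /&gt;
      if (std::fabs(p) &amp;lt; 1e-12) throw std::runtime_error(&amp;quot;singular matrix&amp;quot;);&lt;br /&gt;
      for (size_t j = 0; j &amp;lt; n; j++) { A[col][j] /= p; I[col][j] /= p; }&lt;br /&gt;
      for (size_t row = 0; row &amp;lt; n; row++) {        // clear the column&lt;br /&gt;
         if (row == col) continue;&lt;br /&gt;
         const double f = A[row][col];&lt;br /&gt;
         for (size_t j = 0; j &amp;lt; n; j++) {&lt;br /&gt;
            A[row][j] -= f*A[col][j];&lt;br /&gt;
            I[row][j] -= f*I[col][j];&lt;br /&gt;
         }&lt;br /&gt;
      }&lt;br /&gt;
   }&lt;br /&gt;
   return I;   // A has been reduced to the identity; I holds A^{-1}&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;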
&lt;br /&gt;
== Error Matrix==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As always the uncertainty is determined by the Taylor expansion in quadrature such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt; = error in parameter P: here covariance has been assumed to be zero&lt;br /&gt;
&lt;br /&gt;
where the variance is estimated by&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - \sum_{j=0}^{n} a_j f_j(x_i) \right)^2}{N -n}&amp;lt;/math&amp;gt;  : there are &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters and &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; data points which translate to &amp;lt;math&amp;gt;(N-n)&amp;lt;/math&amp;gt; degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
Applying this for the parameter &amp;lt;math&amp;gt;a_k&amp;lt;/math&amp;gt; indicates that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k}^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial a_k}{\partial y_i}\right )^2\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;But what if there are covariances?&lt;br /&gt;
In that case the following general expression applies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = &amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \tilde{a} =\tilde{\beta}\tilde{\alpha}^{-1} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n)&amp;lt;/math&amp;gt; = a row matrix of the parameters&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{\beta}= ( \beta_1, \beta_2, \cdots , \beta_n) &amp;lt;/math&amp;gt;  = a row matrix of order &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{\alpha}^{-1} = \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; = a &amp;lt;math&amp;gt;k \times j = n \times n&amp;lt;/math&amp;gt; matrix&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\tilde{a} =( a_1, a_2, \cdots , a_n) = ( \beta_1, \beta_2, \cdots , \beta_n) \left ( \begin{matrix} \alpha_{11}^{-1}  &amp;amp; \alpha_{12}^{-1} &amp;amp; \cdots &amp;amp; \alpha_{1j}^{-1} \\ \alpha_{21}^{-1}  &amp;amp; \alpha_{22}^{-1}&amp;amp;\cdots&amp;amp;\alpha_{2j}^{-1} \\ \vdots &amp;amp;\vdots &amp;amp;\ddots &amp;amp;\vdots \\ \alpha_{k1}^{-1} &amp;amp;\alpha_{k2}^{-1} &amp;amp;\cdots &amp;amp;\alpha_{kj}^{-1}\end{matrix} \right )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow  a_k = \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_k}{\partial y_i} = \frac{\partial }{\partial y_i} \sum_j^n \beta_j \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\partial }{\partial y_i} \sum_j^n \left( \sum_i^N f_j(x_i) \frac{y_i}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}&amp;lt;/math&amp;gt; :only one &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt; in the sum over &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; survives the derivative&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial a_l}{\partial y_i} = \sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
substituting&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 = \sum_i^N \left [ \sigma_i^2  \frac{\partial a_k}{\partial y_i}\frac{\partial a_l}{\partial y_i}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^N \left [ \sigma_i^2  \sum_j^n \left(  f_j(x_i) \frac{1}{\sigma_i^2}\right) \alpha_{jk}^{-1}\sum_p^n \left(  f_p(x_i) \frac{1}{\sigma_i^2}\right)\alpha_{pl}^{-1} \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \sigma_i^2&amp;lt;/math&amp;gt; term appears in both the numerator and the denominator and cancels.&lt;br /&gt;
;Move the outermost sum to the inside&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_{a_k,a_l}^2 =  \sum_j^n \left(   \alpha_{jk}^{-1}\sum_p^n \sum_i^N  \frac{f_j(x_i)  f_p(x_i)}{\sigma_i^2} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n  \alpha_{jk}^{-1}\sum_p^n \left(  \alpha_{jp} \right)\alpha_{pl}^{-1}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_j^n \alpha_{jk}^{-1}\tilde{1}_{jl}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\tilde{1}_{jl}&amp;lt;/math&amp;gt;  = the &amp;lt;math&amp;gt;j,l&amp;lt;/math&amp;gt; element of the unit matrix = 1 if &amp;lt;math&amp;gt;j=l&amp;lt;/math&amp;gt;, 0 otherwise&lt;br /&gt;
&lt;br /&gt;
;Note&lt;br /&gt;
: &amp;lt;math&amp;gt;\alpha_{jk} = \alpha_{kj} &amp;lt;/math&amp;gt; : the matrix is symmetric.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; \sigma_{a_k,a_l}^2 =  \sum_j^n \alpha_{kj}^{-1}\tilde{1}_{jl}= \alpha_{kl}^{-1}&amp;lt;/math&amp;gt; = Covariance/Error matrix element&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The inverse matrix &amp;lt;math&amp;gt;\alpha_{kl}^{-1}&amp;lt;/math&amp;gt; tells you the variance and covariance for the calculation of the total error.&lt;br /&gt;
&lt;br /&gt;
;Remember&lt;br /&gt;
:&amp;lt;math&amp;gt;Y = \sum_i^n a_i f_i(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{a_i}^2 = \alpha_{ii}^{-1}&amp;lt;/math&amp;gt;= error in the parameters&lt;br /&gt;
: &amp;lt;math&amp;gt;s^2 = \sum_i^n \sum_j^n \left ( \frac{\partial Y}{\partial a_i}\frac{\partial Y}{\partial a_j}\right) \sigma_{ij}=\sum_i^n \sum_j^n \left ( f_i(x)f_j(x)\right) \sigma_{ij} &amp;lt;/math&amp;gt; = error in the model's prediction&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sum_i^n \sum_j^n  x^{j+i} \sigma_{ij}&amp;lt;/math&amp;gt; If Y is a power series in x &amp;lt;math&amp;gt;(f_i(x) = x^i)&amp;lt;/math&amp;gt;&lt;br /&gt;
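&lt;br /&gt;
A short sketch reading off these quantities, assuming the inverse matrix has been stored in a ROOT TMatrixD called alphaInv (an illustrative name) and a power-series model &amp;lt;math&amp;gt;f_i(x) = x^i&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;quot;TMatrixD.h&amp;quot;&lt;br /&gt;
&lt;br /&gt;
// Error in the model prediction at a point x for y = sum a_i x^i,&lt;br /&gt;
// given the inverse (covariance) matrix alphaInv from above.&lt;br /&gt;
double modelVariance(const TMatrixD &amp;amp;alphaInv, double x, int n)&lt;br /&gt;
{&lt;br /&gt;
   double s2 = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++)&lt;br /&gt;
      for (int j = 0; j &amp;lt; n; j++)&lt;br /&gt;
         s2 += std::pow(x, i + j)*alphaInv(i, j);   // f_i f_j sigma_ij&lt;br /&gt;
   return s2;   // and sigma_{a_i}^2 = alphaInv(i, i) gives the parameter errors&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;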
&lt;br /&gt;
=Chi-Square Distribution=&lt;br /&gt;
&lt;br /&gt;
The above tools allow you to perform a least squares fit to data using high order polynomials.&lt;br /&gt;
&lt;br /&gt;
; The question though is how high in order should you go?  (ie; when should you stop adding parameters to the fit?)&lt;br /&gt;
&lt;br /&gt;
One argument is that you should stop increasing the number of parameters if they don't change much.  The parameters, using the above techniques, are correlated such that when you add another order to the fit all the parameters have the potential to change in value.  If their change is minuscule then you can argue that adding higher orders to the fit does not change the fit.  There are quantitative techniques for making this judgment.&lt;br /&gt;
&lt;br /&gt;
A quantitative way to express the above uses the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value of the fit.  The above technique seeks to minimize &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  So if you add higher orders and more parameters but the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; value does not change appreciably, you could argue that the fit is as good as you can make it with the given function.&lt;br /&gt;
&lt;br /&gt;
==Derivation==&lt;br /&gt;
&lt;br /&gt;
If you assume a series of measurements have a Gaussian parent distribution&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2} \left ( \frac{x - \bar{x}}{\sigma}\right)^2}&amp;lt;/math&amp;gt; = probability of measuring the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian distribution with sample mean &amp;lt;math&amp;gt; \bar{x}&amp;lt;/math&amp;gt; and parent distribution width &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you break the above probability up into intervals of &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\bar{x},\sigma) d\bar{x}&amp;lt;/math&amp;gt; = probability that &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; lies within the interval &amp;lt;math&amp;gt;d\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now suppose you make &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; measurements of two variates &amp;lt;math&amp;gt;(x_i, y_i)&amp;lt;/math&amp;gt; which may be related through a function with &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; parameters.&lt;br /&gt;
&lt;br /&gt;
== Chi-Square Cumulative Distribution==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution function is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x,\nu) = \frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; y(x_i)&amp;lt;/math&amp;gt; = the assumed functional dependence of the data on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; =  N - n -1 = degrees of freedom&lt;br /&gt;
:&amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points&lt;br /&gt;
:&amp;lt;math&amp;gt;n + 1&amp;lt;/math&amp;gt; = number of parameters used in the fit (n coefficients + 1 constant term)&lt;br /&gt;
:&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above tells you the probability of getting the value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the number of degrees of freedom &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; in your fit.&lt;br /&gt;
&lt;br /&gt;
While it is useful to know what the probability is of getting a value of &amp;lt;math&amp;gt;\chi^2 = x&amp;lt;/math&amp;gt; it is more useful to use the cumulative distribution function.&lt;br /&gt;
&lt;br /&gt;
The probability that the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; you obtained from your fit is as large or larger than what you would get from a function described by the parent distribution is given by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty} P(x,\nu) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A graph of &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; shows that the mean value of this function is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow P(\chi^2 = \nu ,\nu)=\int_{\chi^2=\nu}^{\infty} P(x,\nu) dx \approx 0.5&amp;lt;/math&amp;gt; = the probability of getting the average value for &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; or larger is about 0.5.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reduced Chi-square==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The reduced Chi-square &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2_{\nu} = \frac{\chi^2}{\nu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since the mean of the distribution &amp;lt;math&amp;gt;P(x,\nu)&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, &lt;br /&gt;
&lt;br /&gt;
the mean of the distribution of &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{\nu}&amp;lt;/math&amp;gt; is 1.&lt;br /&gt;
&lt;br /&gt;
A reduced chi-squared distribution &amp;lt;math&amp;gt;\chi^2_{\nu}&amp;lt;/math&amp;gt; has a mean value of 1.&lt;br /&gt;
&lt;br /&gt;
==p-Value==&lt;br /&gt;
&lt;br /&gt;
For the above fits,&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \frac{\left( y_i-y(x_i)\right)^2}{\sigma_i^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The p-value is defined as the cumulative &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; probability distribution&lt;br /&gt;
&lt;br /&gt;
: p-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The p-value is the probability, under the assumption of a hypothesis H , of obtaining data at least as incompatible with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;  small p-values are evidence against the hypothesis H; when H is a null hypothesis you hope to reject, small p-values are good.&lt;br /&gt;
&lt;br /&gt;
;The p-value is NOT&lt;br /&gt;
&lt;br /&gt;
# the probability that the null hypothesis is true. (This false conclusion is used to justify the &amp;quot;rule&amp;quot; of considering a result to be significant if its p-value is very small (near zero).) &amp;lt;br /&amp;gt; In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. &lt;br /&gt;
# the probability that a finding is &amp;quot;merely a fluke.&amp;quot; (Again, this conclusion arises from the &amp;quot;rule&amp;quot; that small p-values indicate significant differences.) &amp;lt;br /&amp;gt; As the calculation of a p-value is based on the ''assumption'' that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is subtly different from the real meaning which is that the p-value is the chance that null hypothesis explains the result: the result might not be &amp;quot;merely a fluke,&amp;quot; ''and'' be explicable by the null hypothesis with confidence equal to the p-value.&lt;br /&gt;
#the probability of falsely rejecting the null hypothesis. &lt;br /&gt;
#the probability that a replicating experiment would not yield the same conclusion.&lt;br /&gt;
#1&amp;amp;nbsp;−&amp;amp;nbsp;(p-value) is ''not'' the probability of the alternative hypothesis being true.&lt;br /&gt;
#A determination of the significance level of the test. &amp;lt;br /&amp;gt; The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.&lt;br /&gt;
#an indication of  the size or importance of the observed effect.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In ROOT&lt;br /&gt;
&lt;br /&gt;
 double ROOT::Math::chisquared_cdf_c (double &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, double &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;, double x0 = 0 )		&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\int_{\chi^2}^{\infty}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
  TMath::Prob(&amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;,&amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
You can turn things around the other way by defining P-value to be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: P-value= &amp;lt;math&amp;gt;P(\chi^2,\nu) = \int_{0}^{\chi^2}\frac{x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}}{2^{\nu/2} \Gamma(\nu/2)} dx = 1-(p-value)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
here P-value would be the probability, under the assumption of a hypothesis H , of obtaining data at least as '''compatible''' with H as the data actually observed.&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
The P-value is the probability of observing a sample statistic as extreme as the test statistic.&lt;br /&gt;
&lt;br /&gt;
http://stattrek.com/chi-square-test/goodness-of-fit.aspx&lt;br /&gt;
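&lt;br /&gt;
Putting the two ROOT calls above together in a macro (the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nu&amp;lt;/math&amp;gt; values are illustrative only):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;quot;TMath.h&amp;quot;&lt;br /&gt;
#include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;   // ROOT::Math::chisquared_cdf_c&lt;br /&gt;
&lt;br /&gt;
// Illustrative numbers: chi^2 = 25.0 with nu = 18 degrees of freedom.&lt;br /&gt;
double chi2 = 25.0;&lt;br /&gt;
int    nu   = 18;&lt;br /&gt;
double p1 = TMath::Prob(chi2, nu);                  // p-value&lt;br /&gt;
double p2 = ROOT::Math::chisquared_cdf_c(chi2, nu); // the same integral&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;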
&lt;br /&gt;
=F-test=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;math&amp;gt; \chi^2&amp;lt;/math&amp;gt; test in the previous section measures both the difference between the data and the fit function as well as the difference between the fit function and the &amp;quot;parent&amp;quot; function.  The &amp;quot;parent&amp;quot; function is the true functional dependence of the data.&lt;br /&gt;
&lt;br /&gt;
The F-test can be used to determine the difference between the fit function and the parent function, to more directly test if you have come up with the correct fit function.&lt;br /&gt;
&lt;br /&gt;
== F-distribution==&lt;br /&gt;
&lt;br /&gt;
If, for example, you are comparing the ratio of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values from 2 different fits to the data, then you form the statistic&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then since &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 \equiv \sum_i^N \frac{\left (y_i-y(x_i)\right)^2}{\sigma_i^2}= \frac{s^2}{\sigma^2}&amp;lt;/math&amp;gt; if &amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt; = constant&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
and assuming &amp;lt;math&amp;gt;\sigma_1 = \sigma_2&amp;lt;/math&amp;gt; (same data set)&lt;br /&gt;
&lt;br /&gt;
one can argue that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}} = \frac{s_1^2}{\sigma_1^2}\frac{\sigma_2^2}{s_2^2} \frac{\nu_2}{\nu_1} = \frac{\nu_2}{\nu_1} \frac{s_1^2}{s_2^2}&amp;lt;/math&amp;gt; = function that is independent of the error &amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; intrinsic to the data, thus the function is only comparing the fit residuals &amp;lt;math&amp;gt; (s^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Function&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F = \frac{\frac{\chi^2_1}{\nu_1}}{\frac{\chi^2_2}{\nu_2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
can be shown to follow the following probability distribution.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1, \nu_2) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which is available in ROOT as&lt;br /&gt;
&lt;br /&gt;
 ROOT::Math::fdistribution_pdf(double x, double n, double m, double x0 = 0 )&lt;br /&gt;
&lt;br /&gt;
== The Chi-Square difference test==&lt;br /&gt;
&lt;br /&gt;
In a similar fashion as above, one can define another ratio which is based on the difference between the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; values of two different fits.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} = \frac{\frac{\chi^2_1 - \chi^2_2}{\nu_1-\nu_2}}{\frac{\chi^2_1}{\nu_1}} =  \frac{\frac{\chi^2(m) - \chi^2(n)}{n-m}}{\frac{\chi^2(m)}{(N-m-1)}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution with &amp;lt;math&amp;gt;(\nu_1 - \nu_2, \nu_1)&amp;lt;/math&amp;gt; degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F distribution for some desired false-rejection probability (e.g. 0.05).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you want to determine if you need to continue adding parameters to the fit then you can consider an F-test.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_1}{N-m-1}} = \frac{\Delta \chi^2}{\chi^2_{\nu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above statistic will again follow the F-distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_F(f,\nu_1=1, \nu_2=N-m-1) = \frac{\Gamma[\frac{\nu_1 + \nu_2}{2}]}{\Gamma[\frac{\nu_1}{2}]\Gamma[\frac{\nu_2}{2}]} \left ( \frac{\nu_1}{\nu_2}\right )^{\frac{\nu_1}{2}} \frac{f^{\frac{\nu_1}{2}-1}}{(1+f \frac{\nu_1}{\nu_2})^{\frac{\nu_1+\nu_2}{2}}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Note: Some will define the fraction with respect to the next order fit&lt;br /&gt;
::&amp;lt;math&amp;gt;F_{\chi} =\frac{\frac{\chi^2_1(m) - \chi^2_2(m+1)}{1}}{\frac{\chi^2_2}{N-m-2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
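&lt;br /&gt;
A sketch of this test as a small helper (the name addParameterPValue is an illustrative choice; it assumes the MathCore function ROOT::Math::fdistribution_cdf_c for the upper tail of the F distribution):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;quot;Math/ProbFuncMathCore.h&amp;quot;   // ROOT::Math::fdistribution_cdf_c&lt;br /&gt;
&lt;br /&gt;
// Chi-square difference F-test for adding one more parameter.&lt;br /&gt;
// chi2m = chi^2 of the m-parameter fit; chi2m1 = chi^2 after adding one.&lt;br /&gt;
double addParameterPValue(double chi2m, double chi2m1, int N, int m)&lt;br /&gt;
{&lt;br /&gt;
   const double F = (chi2m - chi2m1)/(chi2m/(N - m - 1));&lt;br /&gt;
   // probability of an F this large or larger if the extra parameter is useless&lt;br /&gt;
   return ROOT::Math::fdistribution_cdf_c(F, 1.0, N - m - 1);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;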
&lt;br /&gt;
==The multiple correlation coefficient test==&lt;br /&gt;
&lt;br /&gt;
While the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference F-test above is useful to evaluate the impact of adding another fit parameter, you will also want to evaluate the &amp;quot;goodness&amp;quot; of the entire fit in a manner which can be related to a correlation coefficient R (in this case it is a multiple-correlation coefficient because the fit can go beyond linear).&lt;br /&gt;
&lt;br /&gt;
I usually suggest that you do this after the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; difference test unless you have a theoretical model which constrains the number of fit parameters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Least Squares fit to an Arbitrary Function=&lt;br /&gt;
&lt;br /&gt;
The above Least Squares fit methods work well if your fit function has a linear dependence on the fit parameters.&lt;br /&gt;
&lt;br /&gt;
:ie; &amp;lt;math&amp;gt;y(x) =\sum_i^n a_i x^i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
such fit functions give you a set of  &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; equations that are linear in the parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; when you are minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;.  The &amp;lt;math&amp;gt;k^{\mbox{th}}&amp;lt;/math&amp;gt; equation of this set of n equations is shown below.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sum_i^N   y_i \frac{ f_k(x_i)}{\sigma_i^2} =   \sum_{j=0}^{n} a_j \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If your fit equation looks like&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y(x) =a_1 e^{-\frac{1}{2} \left (\frac{x-a_2}{a_3}\right)^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
now your set of equations minimizing &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is non-linear with respect to the parameters.  You can't separate the &amp;lt;math&amp;gt;a_j&amp;lt;/math&amp;gt; term from the &amp;lt;math&amp;gt;f_k(x_i)&amp;lt;/math&amp;gt; function and thereby solve the system by inverting the matrix&lt;br /&gt;
:&amp;lt;math&amp;gt; \sum_i^N  \frac{ f_k(x_i)}{\sigma_i^2}   f_j(x_i)&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
Because of this, a direct analytical solution is difficult (you need to find roots of coupled non-linear equations) and one resorts to approximation methods for the solution.&lt;br /&gt;
== Grid search==&lt;br /&gt;
&lt;br /&gt;
The fundamental idea of the grid search is to vary all parameters over some natural range, generating a hypersurface in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;, and to look for the smallest value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; that appears on the hypersurface.&lt;br /&gt;
&lt;br /&gt;
In Lab 15 you will generate this hypersurface for the simple least squares linear fit to the Temp -vs- Voltage data in Lab 14.  Your surface might look like the following.&lt;br /&gt;
&lt;br /&gt;
[[File:TF_ErrAna_Lab15.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As shown above, I have plotted the value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; as a function of  the y-Intercept &amp;lt;math&amp;gt;(a_1)&amp;lt;/math&amp;gt; and the slope &amp;lt;math&amp;gt;(a_2)&amp;lt;/math&amp;gt;.  The minimum in this hypersurface should coincide with the values of Y-intercept=&amp;lt;math&amp;gt;a_1=-1.01&amp;lt;/math&amp;gt; and Slope=&amp;lt;math&amp;gt;a_2=0.0431&amp;lt;/math&amp;gt; from the linear regression solution.&lt;br /&gt;
&lt;br /&gt;
If we return to the beginning, the original problem is to use the max-likelihood principle on the probability function for finding the correct fit using the data. &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n) = \Pi_i P_i(a_0,a_1, \cdots ,a_n) =\Pi_i \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2} \propto e^{- \frac{1}{2}\left [ \sum_i^N \left ( \frac{y_i - y(x_i)}{\sigma_i}\right)^2 \right]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = \sum_i^N \left ( \frac{y_i -y(x_i)}{\sigma_i}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; = number of data points and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; = number of parameters in the fit function.&lt;br /&gt;
&lt;br /&gt;
To maximize the probability of finding the best fit we need to find the minimum in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by setting the partial derivative with respect to the fit parameters &amp;lt;math&amp;gt;\left (\frac{\partial \chi^2}{\partial a_k} \right)&amp;lt;/math&amp;gt; to zero&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial a_k}  = \frac{\partial}{\partial a_k}\sum_i^N \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i^N 2 \frac{1}{\sigma_i^2} \left ( y_i - y(x_i)\right ) \frac{\partial \left( - y(x_i) \right)}{\partial a_k} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you could also express &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; in terms of the probability distributions as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(a_0,a_1, \cdots ,a_n)  \propto e^{- \frac{1}{2}\chi^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Grid search method==&lt;br /&gt;
&lt;br /&gt;
The grid search method relies on the independence of the fit parameters &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt;.    The search starts by selecting initial values for all parameters and then searching for a &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; minimum for one of the fit parameters.  You then set the parameter to the determined value and repeat the procedure on the next parameter.  You keep repeating until a stable value of &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; is found.&lt;br /&gt;
&lt;br /&gt;
Obviously, this method strongly depends on the initial values selected for the parameters.&lt;br /&gt;
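&lt;br /&gt;
A brute-force sketch of such a scan for the two-parameter line &amp;lt;math&amp;gt;y = a_1 + a_2 x&amp;lt;/math&amp;gt; (the scan ranges and step sizes are arbitrary illustrative choices, as is the helper name):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Scan a1 and a2 over a grid and record the smallest chi^2 found.&lt;br /&gt;
void gridSearch(const std::vector&amp;lt;double&amp;gt; &amp;amp;x, const std::vector&amp;lt;double&amp;gt; &amp;amp;y,&lt;br /&gt;
                const std::vector&amp;lt;double&amp;gt; &amp;amp;sigma,&lt;br /&gt;
                double &amp;amp;bestA1, double &amp;amp;bestA2, double &amp;amp;bestChi2)&lt;br /&gt;
{&lt;br /&gt;
   bestChi2 = 1e30;&lt;br /&gt;
   for (int i = 0; i &amp;lt; 200; i++) {&lt;br /&gt;
      double a1 = -2.0 + 0.01*i;          // y-intercept scan&lt;br /&gt;
      for (int j = 0; j &amp;lt; 200; j++) {&lt;br /&gt;
         double a2 = 0.0005*j;            // slope scan&lt;br /&gt;
         double chi2 = 0;&lt;br /&gt;
         for (size_t k = 0; k &amp;lt; x.size(); k++) {&lt;br /&gt;
            double r = (y[k] - a1 - a2*x[k])/sigma[k];&lt;br /&gt;
            chi2 += r*r;&lt;br /&gt;
         }&lt;br /&gt;
         if (chi2 &amp;lt; bestChi2) { bestChi2 = chi2; bestA1 = a1; bestA2 = a2; }&lt;br /&gt;
      }&lt;br /&gt;
   }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;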
&lt;br /&gt;
==Parameter Errors==&lt;br /&gt;
&lt;br /&gt;
Returning back to the equation&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\chi^2 = -2\ln\left [P(a_0,a_1, \cdots ,a_n) \right ] + 2\sum \ln(\sigma_i \sqrt{2 \pi})&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
it may be observed that the parameter uncertainties are tied to &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;: moving a parameter &amp;lt;math&amp;gt;a_i&amp;lt;/math&amp;gt; away from its optimum value changes &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by an amount governed by the error in that parameter.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Once a minimum is found, the parabolic nature of the &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt;  dependence on a single parameter is such that an increase of 1 standard deviation in the parameter results in an increase in &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; of 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can turn this around to determine the error in a fit parameter by finding how much the parameter must increase (or decrease) in order to increase &amp;lt;math&amp;gt;\chi^2&amp;lt;/math&amp;gt; by 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Gradient Search Method==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gradient search method improves on the Grid search method by searching in a direction toward the minimum as determined by simultaneous changes in all parameters.  The change in each parameter may differ in magnitude and is adjustable for each parameter.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference  Go Back] [[Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrAna_Homework&amp;diff=91700</id>
		<title>TF ErrAna Homework</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrAna_Homework&amp;diff=91700"/>
		<updated>2014-03-21T19:08:48Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* 3 Eq. 3 unknowns */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Errors=&lt;br /&gt;
&lt;br /&gt;
== Give 5 examples, each of a different type of systematic error.==&lt;br /&gt;
&lt;br /&gt;
==Find 3 published examples of data which are wrongly represented.  ==&lt;br /&gt;
Identify what is incorrect about each.  What does it mean to be wrongly presented?  A typical example is a political poll which does not identify the statistical uncertainty properly or at all.&lt;br /&gt;
&lt;br /&gt;
==Create a Histogram using ROOT==&lt;br /&gt;
&lt;br /&gt;
some commands that may interest you&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
root [1] TH1F *Hist1=new TH1F(&amp;quot;Hist1&amp;quot;,&amp;quot;Hist1&amp;quot;,50,-0.5,49.5);&lt;br /&gt;
root [2] Hist1-&amp;gt;Fill(10);&lt;br /&gt;
root [3] Hist1-&amp;gt;Draw();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can use the above commands but you need to change the names and numbers above to receive credit.  You must also add a title to the histogram which contains your full name.  You will printout the histogram and hand it in with the above two problems.&lt;br /&gt;
&lt;br /&gt;
;Notice how the square rectangle in the histogram is centered at 10!&lt;br /&gt;
;Notice that if you do the commands&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [2] Hist1-&amp;gt;Fill(10);&lt;br /&gt;
root [3] Hist1-&amp;gt;Draw();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the rectangle centered at 10 will reach the value of 2 on the vertical axis.&lt;br /&gt;
&lt;br /&gt;
Two dice are rolled 20 times.  Create a histogram to represent the 20 trials below&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| Trial || Value&lt;br /&gt;
|-&lt;br /&gt;
|1 ||8&lt;br /&gt;
|-&lt;br /&gt;
|2 || 10&lt;br /&gt;
|-&lt;br /&gt;
|3 || 9&lt;br /&gt;
|-&lt;br /&gt;
|4 || 5&lt;br /&gt;
|-&lt;br /&gt;
|5 || 9&lt;br /&gt;
|-&lt;br /&gt;
|6 || 6&lt;br /&gt;
|-&lt;br /&gt;
|7 || 5&lt;br /&gt;
|-&lt;br /&gt;
|8 || 6&lt;br /&gt;
|-&lt;br /&gt;
|9 || 3&lt;br /&gt;
|-&lt;br /&gt;
|10 || 9&lt;br /&gt;
|-&lt;br /&gt;
|11 || 8&lt;br /&gt;
|-&lt;br /&gt;
|12 || 5&lt;br /&gt;
|-&lt;br /&gt;
|13 || 8&lt;br /&gt;
|-&lt;br /&gt;
|14 || 10&lt;br /&gt;
|-&lt;br /&gt;
|15 || 8&lt;br /&gt;
|-&lt;br /&gt;
|16 || 11&lt;br /&gt;
|-&lt;br /&gt;
|17 || 12&lt;br /&gt;
|-&lt;br /&gt;
|18 || 6&lt;br /&gt;
|-&lt;br /&gt;
|19 || 7&lt;br /&gt;
|-&lt;br /&gt;
|20 || 8&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
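&lt;br /&gt;
One possible way to enter the table into ROOT (the histogram name and title here are illustrative; as noted above, your own histogram must carry your name in the title):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
double value[20] = {8,10,9,5,9,6,5,6,3,9,8,5,8,10,8,11,12,6,7,8};&lt;br /&gt;
TH1F *Dice = new TH1F(&amp;quot;Dice&amp;quot;,&amp;quot;Two dice, 20 trials&amp;quot;,13,-0.5,12.5);&lt;br /&gt;
for (int i = 0; i &amp;lt; 20; i++) Dice-&amp;gt;Fill(value[i]);&lt;br /&gt;
Dice-&amp;gt;Draw();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;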
&lt;br /&gt;
=Mean and SD=&lt;br /&gt;
==Electron radius==&lt;br /&gt;
The probability that an electron is a distance &amp;lt;math&amp;gt;r&amp;lt;/math&amp;gt; from the center of the hydrogen atom&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(r) = Cr^2 e^{-2 \frac{r}{R}}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Doing the integrals by hand (no tables),&lt;br /&gt;
&lt;br /&gt;
a.) Find the value of C&lt;br /&gt;
&lt;br /&gt;
b.) Find the mean electron radius and standard deviation for an electron in a hydrogen atom&lt;br /&gt;
&lt;br /&gt;
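For reference, the normalization condition and the definitions of the mean and variance for this continuous distribution are (setting up the integrals, not solving them):&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\int_0^{\infty} P(r)\, dr = 1, \;\;\;\; \bar{r} = \int_0^{\infty} r P(r)\, dr, \;\;\;\; \sigma^2 = \int_0^{\infty} (r-\bar{r})^2 P(r)\, dr&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Each integral can be reduced with repeated integration by parts.&lt;br /&gt;
&lt;br /&gt;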
==Histograms by Hand==&lt;br /&gt;
&lt;br /&gt;
Given the following test scores from 40 students.&lt;br /&gt;
&lt;br /&gt;
 {| border=&amp;quot;1&amp;quot;  |cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0 &lt;br /&gt;
|-&lt;br /&gt;
| Trial || Value || Trial || Value|| Trial || Value|| Trial || Value&lt;br /&gt;
|-&lt;br /&gt;
|1 ||49 || 11 ||90 || 21 || 69 || 31 || 74&lt;br /&gt;
|-&lt;br /&gt;
|2 || 80 || 12 || 84 || 22 ||69 || 32 ||86&lt;br /&gt;
|-&lt;br /&gt;
|3 || 84 || 13 || 59 || 23 || 53 || 33 || 78&lt;br /&gt;
|-&lt;br /&gt;
|4 || 73 || 14 || 56 || 24 || 55 || 34 || 55&lt;br /&gt;
|-&lt;br /&gt;
|5 || 89 || 15 || 62 || 25 || 77 || 35 || 66&lt;br /&gt;
|-&lt;br /&gt;
|6 || 78 || 16 || 53 || 26 || 82 || 36 || 60&lt;br /&gt;
|-&lt;br /&gt;
|7 || 78 || 17 || 83 || 27 || 81 || 37 || 68&lt;br /&gt;
|-&lt;br /&gt;
|8 || 92 || 18 || 81 || 28 || 76 || 38 || 92&lt;br /&gt;
|-&lt;br /&gt;
|9 || 56 || 19 || 65 || 29 || 79 || 39 || 87&lt;br /&gt;
|-&lt;br /&gt;
|10 || 85 || 20 || 81 || 30 || 83 || 40 || 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
a.) calculate the mean and standard deviation&lt;br /&gt;
&lt;br /&gt;
b.) construct a histogram by hand which has 10 bins centered on 10,20,...&lt;br /&gt;
&lt;br /&gt;
c.) Use ROOT to construct a histogram.  Compare the mean and RMS from ROOT with your result in part (a) above.  What is the difference between the RMS report in the ROOT histogram and the standard deviation you calculated in part (a)?&lt;br /&gt;
&lt;br /&gt;
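For part (c), ROOT reports the mean and RMS of a filled histogram directly (a sketch; the histogram name and binning are arbitrary choices):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [1] TH1F *Scores = new TH1F(&amp;quot;Scores&amp;quot;,&amp;quot;Test Scores&amp;quot;,10,0,100);&lt;br /&gt;
root [2] Scores-&amp;gt;Fill(49);    // repeat for the remaining 39 scores&lt;br /&gt;
root [3] Scores-&amp;gt;GetMean();   // compare with part (a)&lt;br /&gt;
root [4] Scores-&amp;gt;GetRMS();    // compare with your standard deviation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;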
== Variance using Probability function==&lt;br /&gt;
&lt;br /&gt;
Given that &lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \lim_{N \rightarrow \infty} \frac{1}{N}\sum_{j=1}^N \left [ \left (x_j - \mu \right)^2 \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Justify that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\lim_{N \rightarrow \infty} \frac{1}{N}\sum_{j=1}^N \left [ \left (x_j - \mu \right)^2  \right ] = \lim_{N \rightarrow \infty} \left [  \frac{1}{N}\sum_{j=1}^N x_j^2\right ] - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Note:The standard deviation (&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) is the root mean square (RMS) of the deviations.&lt;br /&gt;
&lt;br /&gt;
RMS = &amp;lt;math&amp;gt;\sqrt{\frac{1}{N}\sum_i^N x_i^2}&amp;lt;/math&amp;gt; so &amp;lt;math&amp;gt;\sigma = \mbox{RMS}(x_i -\mu)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Binomial Probability Distributions=&lt;br /&gt;
1.) Evaluate the following (at least one by hand):&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
a.) &amp;lt;math&amp;gt;{6\choose 3}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b.) &amp;lt;math&amp;gt;{4\choose 2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
c.) &amp;lt;math&amp;gt;{10\choose 3}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
d.) &amp;lt;math&amp;gt;{52\choose 4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2.) Plot the binomial distribution P(x) for n=6 and p=1/2 from x=0 to 6.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) = {n\choose x}p^{x}q^{n-x} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
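One way to produce the plot in ROOT (a sketch; TMath::Binomial supplies the binomial coefficient):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// binom.C -- plot the binomial P(x) for n=6, p=1/2&lt;br /&gt;
void binom()&lt;br /&gt;
{&lt;br /&gt;
   TH1F *PB = new TH1F(&amp;quot;PB&amp;quot;,&amp;quot;Binomial P(x), n=6, p=1/2&amp;quot;,7,-0.5,6.5);&lt;br /&gt;
   double p = 0.5;&lt;br /&gt;
   for (int x = 0; x &amp;lt;= 6; x++)&lt;br /&gt;
      PB-&amp;gt;SetBinContent(x+1, TMath::Binomial(6,x)*TMath::Power(p,x)*TMath::Power(1-p,6-x));&lt;br /&gt;
   PB-&amp;gt;Draw(&amp;quot;hist&amp;quot;);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;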
&lt;br /&gt;
3.) Given the probability distribution below for the sum of the points on a pair of dice:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) =\left \{  {\frac{x-1}{36} \;\;\;\; 2 \le x \le 7 \atop \frac{13-x}{36} \;\;\; 7 &amp;lt; x \le 12} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
a.) find the mean&lt;br /&gt;
&lt;br /&gt;
b.) find the standard deviation &amp;lt;math&amp;gt;(\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4.) Prove that &amp;lt;math&amp;gt;\sigma^2 = npq&amp;lt;/math&amp;gt; for the Binomial distribution.&lt;br /&gt;
&lt;br /&gt;
=Poisson Prob Dist=&lt;br /&gt;
&lt;br /&gt;
==Variance==&lt;br /&gt;
&lt;br /&gt;
Show that &amp;lt;math&amp;gt;\sigma^2 = \mu&amp;lt;/math&amp;gt; for the Poisson Distribution starting with the definition of variance.&lt;br /&gt;
&lt;br /&gt;
==Binomial/Poisson Statistic==&lt;br /&gt;
The probability that a student will fail this course is 7.3%.&lt;br /&gt;
&lt;br /&gt;
a.) Calculate by hand (i.e. without a computer/calculator) the expected number of students that will fail this course if 32 are enrolled.&lt;br /&gt;
&lt;br /&gt;
b.) Calculate by hand the probability that 5 or more will fail in one semester.&lt;br /&gt;
&lt;br /&gt;
== Deadtime==&lt;br /&gt;
&lt;br /&gt;
In a counting experiment it is possible for a detector to be so busy recording the effects of one detected particle that it is unable to measure another particle traversing the detector during that short time interval.   &amp;quot;Dead time&amp;quot; is a measure of the time interval over  which your detector is unable to make a measurement because it is currently making a measurement.&lt;br /&gt;
&lt;br /&gt;
Assume that particles hit your detector at a rate of &amp;lt;math&amp;gt;1 \times 10^6&amp;lt;/math&amp;gt; particles/sec and that your detector has a deadtime of 200 ns &amp;lt;math&amp;gt;(200 \times 10^{-9} \mbox{ sec})&amp;lt;/math&amp;gt;.  The mean number of particles hitting the detector during this deadtime is &amp;lt;math&amp;gt;\mu = 0.2&amp;lt;/math&amp;gt;.   The detector efficiency is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\epsilon = \frac{\mbox{average number of particles counted}}{\mbox{number of particles passing through the detector in 200 ns}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a.)  Find the efficiency of the detector assuming the process follows the Poisson distribution.&lt;br /&gt;
&lt;br /&gt;
b.) Graph the efficiency as a function of the incident particle flux for rates between 0 and &amp;lt;math&amp;gt;10 \times 10^6&amp;lt;/math&amp;gt; particles/sec.&lt;br /&gt;
&lt;br /&gt;
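For part b.), once you have derived the efficiency as a function of rate, a ROOT TF1 can draw the graph.  The exponential below is only a placeholder formula, not the derived answer; parameter [0] stands for the 200 ns deadtime:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Placeholder only: replace the formula string with your derived efficiency.&lt;br /&gt;
// x = incident rate (particles/sec); [0] = the 200 ns deadtime in seconds.&lt;br /&gt;
TF1 *eff = new TF1(&amp;quot;eff&amp;quot;,&amp;quot;exp(-[0]*x)&amp;quot;,0,10e6);&lt;br /&gt;
eff-&amp;gt;SetParameter(0,200e-9);&lt;br /&gt;
eff-&amp;gt;Draw();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;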
= Gaussian Prob Dist=&lt;br /&gt;
== Counting experiment variance==&lt;br /&gt;
&lt;br /&gt;
a.) What is the standard deviation for a counting experiment with a mean of &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; = 100?&lt;br /&gt;
&lt;br /&gt;
b.) What is the standard deviation if the mean number of counts is increased by a factor of 4?&lt;br /&gt;
&lt;br /&gt;
== Half Width -vs- variance==&lt;br /&gt;
Show that the full-width at half maximum &amp;lt;math&amp;gt;(\Gamma)&amp;lt;/math&amp;gt; is related to the standard deviation by &amp;lt;math&amp;gt;\Gamma = 2.3548 \sigma&amp;lt;/math&amp;gt;  for the Gaussian probability distribution.  Begin with the definition that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_G\left (\mu + \frac{\Gamma}{2} \right ) = \frac{P_G(\mu)}{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
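As a check on the claimed coefficient: substituting the Gaussian form &amp;lt;math&amp;gt;P_G(x) \propto e^{-(x-\mu)^2/2\sigma^2}&amp;lt;/math&amp;gt; into the definition reduces it to&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^{-\Gamma^2/8\sigma^2} = \frac{1}{2} \;\; \Rightarrow \;\; \Gamma = 2\sqrt{2\ln 2}\,\sigma \approx 2.3548\, \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Filling in the algebra between these two lines is the exercise.&lt;br /&gt;
&lt;br /&gt;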
== Gaussian Probability==&lt;br /&gt;
a.) Determine, by direct integration, the probability of observing a value beyond 1 standard deviation from the mean of a Gaussian distribution; i.e., &amp;lt;math&amp;gt;P_G(X \le \mu - \sigma, \mu, \sigma)+ P_G(X \ge \mu + \sigma, \mu, \sigma)&amp;lt;/math&amp;gt; = ?&lt;br /&gt;
&lt;br /&gt;
b.) Find the value of &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;P_G(\mu + \sigma, \mu, \sigma) = A P_G(\mu,\mu,\sigma)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
c.) Repeat parts a.) and b.) above for &amp;lt;math&amp;gt;P_G(\mu + P.E., \mu, \sigma)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;P_G(\mu + \Gamma/2, \mu, \sigma)&amp;lt;/math&amp;gt;.  The values below may be used to check your work:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_G(X =\mu + \sigma, \mu, \sigma)=0.24197 = 0.60653 P_G(X = \mu , \mu, \sigma)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_G(X =\mu + PE, \mu, \sigma)=0.3178 = 0.7965 P_G(X = \mu , \mu, \sigma)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_G(X =\mu - \Gamma/2, \mu, \sigma)=0.19947 = 0.5 P_G(X = \mu , \mu, \sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Error Propagation=&lt;br /&gt;
==Lorentzian==&lt;br /&gt;
&lt;br /&gt;
a.) What fraction of the area of a Lorentzian curve is enclosed within the interval &amp;lt;math&amp;gt;(\mu \pm \frac{3}{2} \Gamma)&amp;lt;/math&amp;gt;?&lt;br /&gt;
&lt;br /&gt;
b.) Find the probability of observing a value from the Lorentzian distribution that is more than 2 half-widths (&amp;lt;math&amp;gt;\Gamma/2&amp;lt;/math&amp;gt;) from the mean.&lt;br /&gt;
&lt;br /&gt;
== Derivatives==&lt;br /&gt;
&lt;br /&gt;
Find the uncertainty &amp;lt;math&amp;gt;\sigma_x&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; as a function of the uncertainties &amp;lt;math&amp;gt;\sigma_u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_v&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; for the following functions:&lt;br /&gt;
&lt;br /&gt;
a.) &amp;lt;math&amp;gt;x = \frac{1}{2(u-v)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b.) &amp;lt;math&amp;gt;x= uv^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
c.) &amp;lt;math&amp;gt;x = u^2 + v^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
d.) &amp;lt;math&amp;gt;x = \frac{u-v}{u+v}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Snell's Law==&lt;br /&gt;
&lt;br /&gt;
Given Snell's Law&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;n_1 \sin(\theta_1) = n_2 \sin(\theta_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Assume &amp;lt;math&amp;gt;n_1=1&amp;lt;/math&amp;gt; is known with absolute certainty and find &amp;lt;math&amp;gt;n_2&amp;lt;/math&amp;gt; and its uncertainty when the following angles are measured:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\theta_1 = (22.03 \pm 0.02)^{\circ}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\theta_2 = (14.45 \pm 0.2)^{\circ}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
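Part of the setup (the general error-propagation formula applied to this function; evaluating the partial derivatives is left to the problem):&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;n_2 = \frac{n_1 \sin \theta_1}{\sin \theta_2}, \;\;\;\; \sigma_{n_2}^2 = \left ( \frac{\partial n_2}{\partial \theta_1} \right )^2 \sigma_{\theta_1}^2 + \left ( \frac{\partial n_2}{\partial \theta_2} \right )^2 \sigma_{\theta_2}^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Remember to convert the angular uncertainties to radians before evaluating the derivatives.&lt;br /&gt;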
&lt;br /&gt;
=Linear Fit=&lt;br /&gt;
&lt;br /&gt;
==2 Eq. 2 unknowns==&lt;br /&gt;
Determine &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; &amp;amp; &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; for the system below&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;7x + 8y =100&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;2x-9y = 10&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
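As a reminder of the method of determinants (Cramer's rule) for a generic 2x2 system:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;ax + by = e, \;\; cx + dy = f \;\; \Rightarrow \;\; x = \frac{ed - bf}{ad - bc}, \;\;\;\; y = \frac{af - ec}{ad - bc}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;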
==3 Eq. 3 unknowns==&lt;br /&gt;
&lt;br /&gt;
Given the system of 3 equations and 3 unknowns:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{11}x + a_{12}y + a_{13}z = A&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{21}x + a_{22}y +a_{23}z= B&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;a_{31}x + a_{32}y +a_{33}z= C&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or in matrix form&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left( \begin{array}{ccc} a_{11} &amp;amp; a_{12} &amp;amp; a_{13}\\ a_{21} &amp;amp; a_{22} &amp;amp; a_{23} \\ a_{31} &amp;amp; a_{32} &amp;amp; a_{33} \end{array} \right)\left( \begin{array}{c} x \\ y  \\z\end{array} \right)= \left( \begin{array}{c} A \\ B \\C \end{array} \right) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
a.) Express the solution for x, y, and z in matrix form, assuming the remaining terms above are known (&amp;lt;math&amp;gt;A,B,C&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;a_{ij}&amp;lt;/math&amp;gt; are known).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
b.) Find expressions for x,y, and z in terms of A,B,C when&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;x + y -z = A&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;z= B&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;2x + y +2z= C&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Linear Data Fit==&lt;br /&gt;
&lt;br /&gt;
A metal rod is placed with its ends in two different temperature baths.  One bath is at zero degrees Celsius and the other at 100.  The temperature along the rod is measured and given in the table below.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot;  |cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0 &lt;br /&gt;
|-&lt;br /&gt;
| Trial || Position (cm) ||Temperature (Celsius) &lt;br /&gt;
|-&lt;br /&gt;
|1 ||1.0||15.6 &lt;br /&gt;
|-&lt;br /&gt;
|2 || 2.0 || 17.5&lt;br /&gt;
|-&lt;br /&gt;
|3 || 3.0|| 36.6&lt;br /&gt;
|-&lt;br /&gt;
|4 || 4.0 || 43.8&lt;br /&gt;
|-&lt;br /&gt;
|5 || 5.0 || 58.2&lt;br /&gt;
|-&lt;br /&gt;
|6 || 6.0 ||61.6&lt;br /&gt;
|-&lt;br /&gt;
|7 || 7.0 || 64.2&lt;br /&gt;
|-&lt;br /&gt;
|8 || 8.0 || 70.4&lt;br /&gt;
|-&lt;br /&gt;
|9 || 9.0 || 98.8&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
a.) Fit the above data with the linear function &lt;br /&gt;
T = A + B X&lt;br /&gt;
&lt;br /&gt;
assuming all of the data have the same uncertainty.  Report the values of A &amp;amp; B and the uncertainty in each parameter.&lt;br /&gt;
&lt;br /&gt;
b.) Repeat part a.) above, except this time assume each data point has an uncertainty given by the Poisson distribution as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma_i^2 \approx T_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
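A sketch of how part a.) can be set up in ROOT (the arrays hold the table values above; the common uncertainty of 1 is an arbitrary choice for the unweighted case):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// rodfit.C -- straight-line fit T = A + B X to the rod data above&lt;br /&gt;
void rodfit()&lt;br /&gt;
{&lt;br /&gt;
   double x[9]  = {1,2,3,4,5,6,7,8,9};&lt;br /&gt;
   double T[9]  = {15.6,17.5,36.6,43.8,58.2,61.6,64.2,70.4,98.8};&lt;br /&gt;
   double ex[9] = {0};                        // no uncertainty in position&lt;br /&gt;
   double eT[9];&lt;br /&gt;
   for (int i = 0; i &amp;lt; 9; i++) eT[i] = 1;    // part b.): eT[i] = sqrt(T[i])&lt;br /&gt;
   TGraphErrors *g = new TGraphErrors(9,x,T,ex,eT);&lt;br /&gt;
   g-&amp;gt;Fit(&amp;quot;pol1&amp;quot;);   // pol1 = A + B*x; parameters and their errors are printed&lt;br /&gt;
   g-&amp;gt;Draw(&amp;quot;AP&amp;quot;);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;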
= Polynomial Least Squares Fit=&lt;br /&gt;
&lt;br /&gt;
==Matrix Inversion==&lt;br /&gt;
&lt;br /&gt;
Use the Gauss-Jordan method to find the inverse matrix &amp;lt;math&amp;gt;\tilde{A}^{-1}&amp;lt;/math&amp;gt; given&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \tilde{A} =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
2 &amp;amp; -1 &amp;amp; 0\\&lt;br /&gt;
-1 &amp;amp; 2 &amp;amp; -1 \\&lt;br /&gt;
0 &amp;amp; -1 &amp;amp; 2&lt;br /&gt;
\end{bmatrix}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
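After doing the Gauss-Jordan reduction by hand, ROOT's TMatrixD can be used to check the result (a sketch):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [1] TMatrixD A(3,3);&lt;br /&gt;
root [2] double d[9] = {2,-1,0, -1,2,-1, 0,-1,2};&lt;br /&gt;
root [3] A.SetMatrixArray(d);&lt;br /&gt;
root [4] A.Invert();&lt;br /&gt;
root [5] A.Print();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;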
==Poly Fit using ROOT==&lt;br /&gt;
&lt;br /&gt;
Hand in your source code and graphs for Lab 13&lt;br /&gt;
&lt;br /&gt;
= Grid Search Fit=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perform a least squares fit to the data in Lab 14 using the Grid search method from &lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/TF_ErrAna_InClassLab#Lab_16 Lab 16].&lt;br /&gt;
&lt;br /&gt;
This will be a way to check your grid search algorithm against a known solution.  Use the Temp -vs- Voltage data and the linear function from Lab 13 above.&lt;br /&gt;
&lt;br /&gt;
Hand in your source code in ROOT and a graph showing the linear fit with the Grid search parameters and the parameters found through Matrix inversion.&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Forest_ErrAna_StatDist&amp;diff=91649</id>
		<title>Forest ErrAna StatDist</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Forest_ErrAna_StatDist&amp;diff=91649"/>
		<updated>2014-03-14T19:27:55Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Variance */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Parent Distribution==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; represent our ith attempt to measure the quantity &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Due to the random errors present in any experiment we should not expect &amp;lt;math&amp;gt;x_i = x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we neglect systematic errors, then we should expect &amp;lt;math&amp;gt; x_i&amp;lt;/math&amp;gt; to, on average, follow some probability distribution around the correct value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This probability distribution can be referred to as the &amp;quot;parent population&amp;quot;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Average and Variance ==&lt;br /&gt;
===Average===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The word &amp;quot;average&amp;quot; is used to describe a property of a &amp;quot;parent&amp;quot; probability distribution, or of a set of observations/measurements made in an experiment, which gives an indication of the likely outcome of an experiment.&lt;br /&gt;
&lt;br /&gt;
The symbol&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
is usually used to represent the &amp;quot;mean&amp;quot; of a known probability (parent) distribution (parent mean) while the &amp;quot;average&amp;quot; of a set of observations/measurements is denoted as &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and is commonly referred to as the &amp;quot;sample&amp;quot; average or &amp;quot;sample mean&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Definition of the mean&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu \equiv \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here the above average of a parent distribution is defined in terms of an infinite sum of observations (x_i) of an observable x divided by the number of observations.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; is a calculation of the mean using a finite number of observations&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \bar{x} \equiv  \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This definition uses the assumption that the result of an experiment, measuring a sample average of &amp;lt;math&amp;gt;(\bar{x})&amp;lt;/math&amp;gt;, asymptotically approaches the &amp;quot;true&amp;quot; average of the parent distribution &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
===Variance===&lt;br /&gt;
&lt;br /&gt;
The word &amp;quot;variance&amp;quot; is used to describe a property of a probability distribution or a set of observations/measurements made in an experiment which gives an indication of how much an observation will deviate from an average value.&lt;br /&gt;
&lt;br /&gt;
A deviation &amp;lt;math&amp;gt;(d_i)&amp;lt;/math&amp;gt; of any measurement &amp;lt;math&amp;gt;(x_i)&amp;lt;/math&amp;gt; from a parent distribution with a mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; can be defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;d_i\equiv x_i - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The deviations should average to ZERO for an infinite number of observations, by definition of the mean.&lt;br /&gt;
&lt;br /&gt;
Definition of the average&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu \equiv \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{N\rightarrow \infty} \frac{\sum (x_i - \mu)}{N}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \lim_{N\rightarrow \infty} \frac{\sum x_i }{N}\right ) - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \lim_{N\rightarrow \infty} \frac{\sum x_i }{N}\right ) - \lim_{N\rightarrow \infty} \frac{\sum x_i}{N} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
But the AVERAGE DEVIATION &amp;lt;math&amp;gt;(\bar{d})&amp;lt;/math&amp;gt; is given by an average of the magnitude of the deviations given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{d} = \lim_{N\rightarrow \infty} \frac{\sum \left | (x_i - \mu)\right |}{N}&amp;lt;/math&amp;gt; = a measure of the dispersion of the expected observations about the mean&lt;br /&gt;
&lt;br /&gt;
Taking the absolute value, though, is cumbersome when performing a statistical analysis, so one may instead express this dispersion in terms of the variance.&lt;br /&gt;
&lt;br /&gt;
A typical variable used to denote the variance is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \lim_{N\rightarrow \infty}\left [ \frac{\sum (x_i-\mu)^2 }{N}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Standard Deviation====&lt;br /&gt;
&lt;br /&gt;
The standard deviation is defined as the square root of the variance&lt;br /&gt;
&lt;br /&gt;
:S.D. = &amp;lt;math&amp;gt;\sqrt{\sigma^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The mean should be thought of as a parameter which characterizes the observations we are making in an experiment.  In general the mean specifies the probability distribution that is representative of the observable we are trying to measure through experimentation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The variance characterizes the uncertainty associated with our experimental attempts to determine the &amp;quot;true&amp;quot; value.  Although the mean and true value may not be equal, their difference should be less than the uncertainty given by the governing probability distribution.&lt;br /&gt;
&lt;br /&gt;
==== Another Expression for Variance====&lt;br /&gt;
&lt;br /&gt;
Using the definition of variance (omitting the limit as &amp;lt;math&amp;gt;N \rightarrow \infty&amp;lt;/math&amp;gt;), &lt;br /&gt;
;Evaluating the definition of variance: &amp;lt;math&amp;gt;\sigma^2 \equiv \frac{\sum(x_i-\mu)^2}{N} = \frac{\sum (x_i^2 -2x_i \mu + \mu^2)}{N} = \frac{\sum x_i^2}{N} - 2 \mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{\sum x_i^2}{N} -2 \mu^2 + \mu^2 =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N}  =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can recast the above in terms of expectation value where &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[x] \equiv \sum x_i P_x(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = E[(x-\mu)^2] = \sum_{x=0}^n (x_i - \mu)^2 P(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= E[x^2] - \left ( E[x]\right )^2 = \sum_{x=0}^n x_i^2 P(x_i) - \left ( \sum_{x=0}^n x_i P(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Average for an unknown probability distribution (parent population)==&lt;br /&gt;
&lt;br /&gt;
If the &amp;quot;Parent Population&amp;quot; is not known, you are just given a list of numbers with no indication of the probability distribution that they were drawn from, then the average and variance may be calculate as shown below.&lt;br /&gt;
&lt;br /&gt;
===Arithmetic Mean and variance===&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations are made in an experiment, then the arithmetic mean of those observations is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} = \frac{\sum_{i=1}^{i=N} x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;unbiased&amp;quot; variance of the above sample is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If you were told that the average is &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;, then you can calculate the &amp;quot;true&amp;quot; variance of the above sample as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N}&amp;lt;/math&amp;gt; = Mean Squared Error (its square root is the RMS error)&lt;br /&gt;
&lt;br /&gt;
;Note: RMS = Root Mean Square = &amp;lt;math&amp;gt;\sqrt{\frac{\sum_{i=1}^N x_i^2}{N}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Statistical Variance decreases with  N====&lt;br /&gt;
&lt;br /&gt;
The repetition of an experiment can decrease the STATISTICAL error of the experiment&lt;br /&gt;
&lt;br /&gt;
Consider the following:&lt;br /&gt;
&lt;br /&gt;
The average value of the mean of a sample of n observations drawn from the parent population is the same as the average value of each observation.  (The average of the averages  is the same as one of the averages)&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\bar{x} = \frac{\sum x_i}{N} =&amp;lt;/math&amp;gt; sample mean&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\overline{\left ( \bar{x} \right ) } = \frac{\sum{\bar{x}_i}}{N} =\frac{1}{N} N \bar{x_i} = \bar{x}&amp;lt;/math&amp;gt; if all means are the same&lt;br /&gt;
&lt;br /&gt;
This is the reason why the sample mean is a measure of the population average ( &amp;lt;math&amp;gt;\bar{x} \sim \mu&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Now consider the variance of the average of the averages (this is not the variance of the individual measurements but the variance of their means)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}} = \frac{\sum \left (\bar{x} -\overline{\left ( \bar{x} \right ) }  \right )^2}{N} =\frac{\sum \bar{x_i}^2}{N}  -\left( \overline{\left ( \bar{x} \right ) }  \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{\sum \bar{x_i}^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{\sum \left( \sum \frac{x_i}{N}\right)^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2}\frac{\sum \left( \sum x_i\right)^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2}\frac{\sum \left (\sum x_i^2 + \sum_{i \ne j} x_ix_j \right )}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ \frac{\sum \left(\sum x_i^2 \right)}{N}  + \frac{ \sum \left (\sum_{i \ne j} x_ix_j \right )}{N} \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;If the measurements are all independent&lt;br /&gt;
:Then &amp;lt;math&amp;gt; \frac{\sum_{i \ne j} x_i x_j}{N} = \frac{\sum x_i}{N} \frac{ \sum x_j}{N}&amp;lt;/math&amp;gt; : if &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; is independent of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (&amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;)&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left ( \frac{\sum x_i}{N} \right)^2 = \bar{x}^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
example (N=3): the cross terms are what remain after the squares are subtracted,&lt;br /&gt;
:&amp;lt;math&amp;gt;(x_1+x_2+x_3)^2 = (x_1^2+x_2^2+x_3^2) + (x_1x_2 + x_1x_3 + x_2x_1+x_2x_3+x_3x_1+x_3x_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
 The above part of the proof needs work&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}}=\frac{1}{N^2} \left [ \frac{\sum \left(\sum x_i^2 \right)}{N}  + \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I use the expression &amp;lt;math&amp;gt;\sigma^2 = E[x^2] - \left ( E[x] \right)^2&amp;lt;/math&amp;gt;  again, except now for &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; rather than &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;, and turn it around so&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\left(\sum x_i^2 \right)}{N} = \sigma^2 + \left ( \frac{\sum x_i}{N}\right)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now I have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}}=\frac{1}{N^2} \left [ \sum \left (\sigma^2 +  \left ( \frac{\sum x_i}{N} \right )^2 \right )+ \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + N(N-1) \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; Number of cross terms is N*(N-1)&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + (N^2 -N) \left ( \frac{\sum x_i}{N} \right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; Number of cross terms is N*(N-1)&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left [ \frac{\sigma^2}{N} +  \left ( \frac{\sum x_i}{N} \right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \left [ \frac{\sigma^2}{N} +  \left (  \bar{x}\right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sigma^2}{N} &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above is the essence of counting statistics.&lt;br /&gt;
&lt;br /&gt;
It says that the STATISTICAL error in an experiment decreases as a function of &amp;lt;math&amp;gt;\frac{1}{\sqrt N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
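A short ROOT macro makes this concrete (a sketch: the Gaussian parent, the sample size N=100, and the 10000 repetitions are arbitrary choices):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// means.C -- the RMS of the sample means shrinks like 1/sqrt(N)&lt;br /&gt;
void means()&lt;br /&gt;
{&lt;br /&gt;
   int N = 100;                        // observations per sample&lt;br /&gt;
   TH1F *h = new TH1F(&amp;quot;h&amp;quot;,&amp;quot;sample means&amp;quot;,100,-0.5,0.5);&lt;br /&gt;
   for (int j = 0; j &amp;lt; 10000; j++) {  // 10000 independent samples&lt;br /&gt;
      double sum = 0;&lt;br /&gt;
      for (int i = 0; i &amp;lt; N; i++) sum += gRandom-&amp;gt;Gaus(0,1);&lt;br /&gt;
      h-&amp;gt;Fill(sum/N);&lt;br /&gt;
   }&lt;br /&gt;
   h-&amp;gt;Draw();  // RMS should be close to sigma/sqrt(N) = 1/sqrt(100) = 0.1&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;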
==== Biased and Unbiased variance====&lt;br /&gt;
&lt;br /&gt;
Where does this idea of an unbiased variance come from?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Using the same procedure as the previous section let's look at the average variance of the variances.&lt;br /&gt;
&lt;br /&gt;
A sample variance of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; measurements of &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; is&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_n^2 = \frac{\sum(x_i-\bar{x})^2}{n} = E[x^2] - \left ( E[x] \right)^2 = \frac{\sum x_i^2}{n} -\left ( \bar{x} \right)^2&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To determine the &amp;quot;true&amp;quot; variance, consider taking the average of several sample variances (this is the same argument used above which led to &amp;lt;math&amp;gt;\overline{(\bar{x})} = \bar{x}&amp;lt;/math&amp;gt; )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum_j \left [ \sigma_n^2 \right ]_j}{N} = \frac{ \sum_j \left [ \frac{\sum_i x_i^2}{n} -\left ( \bar{x} \right)^2 \right ]_j}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{1}{n}\sum_i \left ( \frac{\sum_j x_j^2}{N} \right )_i - \frac {\sum_j \left ( \bar{x} \right)^2 }{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{1}{n}\sum_i \left ( \frac{\sum_j x_j^2}{N} \right )_i - \left [ \left ( \frac {\sum_j \bar{x}}{N} \right)^2  + \sigma_{\bar{x}}^2\right ]&amp;lt;/math&amp;gt;  :  as shown previously      &amp;lt;math&amp;gt;E[\bar{x}^2] = \left ( E[\bar{x}] \right )^2 + \sigma_{\bar{x}}^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{1}{n}\sum_i \left ( \left [ \left (\frac{\sum_j x_j}{N}\right)^2 + \sigma^2 \right ]\right )_i - \left [ \left ( \frac {\sum_j x_j}{N} \right)^2  + \frac{\sigma^2}{n}\right ]&amp;lt;/math&amp;gt; : also shown previously&amp;lt;math&amp;gt;\overline{\left ( \bar{x} \right ) } =  \bar{x}&amp;lt;/math&amp;gt;  the universe average is the same as the sample average&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{1}{n} \left ( n\left [ \left (\frac{\sum_j x_j}{N}\right)^2 + \sigma^2 \right ]\right ) - \left [ \left ( \frac {\sum_j x_j}{N} \right)^2  + \frac{\sigma^2}{n}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 - \frac{\sigma^2}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{n-1}{n}\sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = \frac{n}{n-1}\frac{\sum \sigma_i^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 =&amp;lt;/math&amp;gt; the sample variance&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \sigma_i^2}{N} =&amp;lt;/math&amp;gt; an average of all possible sample variance which should be equivalent to the &amp;quot;true&amp;quot; population variance.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum \sigma_i^2}{N} \sim \frac{\sum (x_i-\bar{x})^2}{n}&amp;lt;/math&amp;gt; : if all the variances are the same this would be equivalent&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \frac{n}{n-1}\frac{\sum(x_i-\bar{x})^2}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\sum(x_i-\bar{x})^2}{n-1} =&amp;lt;/math&amp;gt; unbiased sample variance&lt;br /&gt;
&lt;br /&gt;
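In code the two estimators differ only in the denominator; a minimal sketch (the five measurement values are hypothetical):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// variance.C -- biased (1/n) vs unbiased (1/(n-1)) sample variance&lt;br /&gt;
void variance()&lt;br /&gt;
{&lt;br /&gt;
   const int n = 5;&lt;br /&gt;
   double x[n] = {1,2,3,4,5};        // hypothetical measurements&lt;br /&gt;
   double mean = 0, ss = 0;&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++) mean += x[i]/n;&lt;br /&gt;
   for (int i = 0; i &amp;lt; n; i++) ss += (x[i]-mean)*(x[i]-mean);&lt;br /&gt;
   printf(&amp;quot;biased %f   unbiased %f\n&amp;quot;, ss/n, ss/(n-1));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;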
==Probability Distributions==&lt;br /&gt;
&lt;br /&gt;
=== Mean(Expectation value) and variance===&lt;br /&gt;
====Mean of Discrete Probability Distribution====&lt;br /&gt;
&lt;br /&gt;
In the case that you know the probability distribution, you can calculate the mean &amp;lt;math&amp;gt;(\mu)&amp;lt;/math&amp;gt;, or expectation value E[x], and the standard deviation as follows.&lt;br /&gt;
&lt;br /&gt;
For a Discrete probability distribution&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mu = E[x]= \sum_{i=1}^n x_i P(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;n=&amp;lt;/math&amp;gt; number of different possible observable values&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;x_i =&amp;lt;/math&amp;gt; ith observable value&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x_i) =&amp;lt;/math&amp;gt; probability of observing &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; = Probability Mass Function for a discrete probability distribution&lt;br /&gt;
&lt;br /&gt;
====Mean of a continuous probability distribution====&lt;br /&gt;
The average (mean) of a sample drawn from any probability distribution is defined in terms of the expectation value E(x) such that&lt;br /&gt;
&lt;br /&gt;
The expectation value for a  continuous probability distribution  is calculated as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\mu = E(x) = \int_{-\infty}^{\infty} x P(x)dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Variance===&lt;br /&gt;
&lt;br /&gt;
==== Variance of a discrete PDF====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = \sum_{i=1}^n \left [ (x_i - \mu)^2 P(x_i)\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance of a Continuous PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = \int_{-\infty}^{\infty} \left [ (x - \mu)^2 P(x)\right ]dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Expectation of Arbitrary function====&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;f(x)&amp;lt;/math&amp;gt; is an arbitrary function of a variable &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; governed by a probability distribution &amp;lt;math&amp;gt;P(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then the expectation value of &amp;lt;math&amp;gt;f(x)&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[f(x)] = \sum_{i=1}^n f(x_i) P(x_i) &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
or, for a continuous distribution,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[f(x)] = \int_{-\infty}^{\infty} f(x) P(x)dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Uniform===&lt;br /&gt;
&lt;br /&gt;
The Uniform probability distribution function is a continuous probability function over a specified interval in which any value within the interval has the same probability of occurring.&lt;br /&gt;
&lt;br /&gt;
Mathematically the uniform distribution over an interval from a to b is given by&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_U(x) =\left \{  {\frac{1}{b-a} \;\;\;\; x &amp;gt;a \mbox{ and } x &amp;lt;b \atop 0 \;\;\;\; x&amp;gt;b \mbox{ or } x &amp;lt; a} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean of Uniform PDF====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \int_{-\infty}^{\infty} xP_U(x)dx = \int_{a}^{b} \frac{x}{b-a} dx = \left . \frac{x^2}{2(b-a)} \right |_a^b = \frac{1}{2}\frac{b^2 - a^2}{b-a} = \frac{1}{2}(b+a)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance of Uniform PDF====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 P_U(x)dx = \int_{a}^{b} \frac{\left (x-\frac{b+a}{2}\right )^2}{b-a} dx = \left . \frac{(x -\frac{b+a}{2})^3}{3(b-a)} \right |_a^b &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{3(b-a)}\left [ \left (b -\frac{b+a}{2} \right )^3 - \left (a -\frac{b+a}{2} \right)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{3(b-a)}\left [ \left (\frac{b-a}{2} \right )^3 - \left (\frac{a-b}{2} \right)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{24(b-a)}\left [ (b-a)^3 - (-1)^3 (b-a)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{12}(b-a)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now use ROOT to generate uniform distributions.&lt;br /&gt;
http://wiki.iac.isu.edu/index.php/TF_ErrAna_InClassLab#Day_3&lt;br /&gt;
&lt;br /&gt;
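For example (a sketch; the interval a=0, b=1 and the number of throws are arbitrary), the histogram's mean and RMS should approach &amp;lt;math&amp;gt;(b+a)/2 = 0.5&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(b-a)/\sqrt{12} \approx 0.2887&amp;lt;/math&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [1] TH1F *U = new TH1F(&amp;quot;U&amp;quot;,&amp;quot;Uniform(0,1)&amp;quot;,100,0,1);&lt;br /&gt;
root [2] for (int i = 0; i &amp;lt; 100000; i++) U-&amp;gt;Fill(gRandom-&amp;gt;Uniform(0,1));&lt;br /&gt;
root [3] U-&amp;gt;Draw();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;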
===Binomial Distribution===&lt;br /&gt;
&lt;br /&gt;
A binomial random variable describes experiments in which the outcome has only 2 possibilities.   The two possible outcomes can be labeled as &amp;quot;success&amp;quot; or &amp;quot;failure&amp;quot;. The probabilities may be defined as &lt;br /&gt;
&lt;br /&gt;
;p&lt;br /&gt;
: the probability of a success&lt;br /&gt;
&lt;br /&gt;
and&lt;br /&gt;
&lt;br /&gt;
;q&lt;br /&gt;
:the probability of a failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; represent the number of successes after repeating the experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times&lt;br /&gt;
&lt;br /&gt;
Experiments with &amp;lt;math&amp;gt;n=1&amp;lt;/math&amp;gt; are also known as Bernoulli trials.&lt;br /&gt;
&lt;br /&gt;
Then &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is the Binomial random variable with parameters &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The number of ways in which the &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; successful outcomes can be organized in &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; repeated  trials is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{n !}{ \left [ (n-x) ! x !\right ]}&amp;lt;/math&amp;gt;  where the &amp;lt;math&amp;gt; !&amp;lt;/math&amp;gt; denotes a factorial such that &amp;lt;math&amp;gt;5! = 5\times4\times3\times2\times1&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The expression is known as the binomial coefficient and is represented as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;{n\choose x}=\frac{n!}{x!(n-x)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The probability of any one ordering of the successes and failures is given by &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P( \mbox{experimental ordering}) = p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This means the probability of getting exactly x successes after n trials is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = {n\choose x}p^{x}q^{n-x} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
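For example, the probability of exactly 2 successes in n=4 trials with p=q=1/2 (two heads in four coin tosses, as in the example further below) is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(2) = {4\choose 2}\left ( \frac{1}{2}\right )^2 \left ( \frac{1}{2}\right )^2 = 6 \times \frac{1}{16} = \frac{3}{8}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;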
==== Mean====&lt;br /&gt;
&lt;br /&gt;
It can be shown that the Expectation Value of the distribution is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = n p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \sum_{x=0}^n x P_B(x) = \sum_{x=0}^n x \frac{n!}{x!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt; =  \sum_{x=1}^n  \frac{n!}{(x-1)!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt;  : summation now starts from x=1 and not x=0&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{x=1}^n  \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}q^{n-x}&amp;lt;/math&amp;gt;  : factor out &amp;lt;math&amp;gt;np&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{y=0}^{n-1}  \frac{(n-1)!}{(y)!(n-1-y)!} p^{y}q^{n-1-y}&amp;lt;/math&amp;gt; : change the summation index to y=x-1; the sum over y now runs to n-1 and the summand is a binomial distribution&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np (q+p)^{n-1}&amp;lt;/math&amp;gt;  : by the binomial expansion&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \cdot 1^{n-1}&amp;lt;/math&amp;gt;  : q+p = 1&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====variance ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = npq&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Remember: &amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N} = \frac{\sum (x_i^2 -2x_i \mu + \mu^2)}{N} = \frac{\sum x_i^2}{N} - 2 \mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{\sum x_i^2}{N} -2 \mu^2 + \mu^2 =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N}  =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = E[(x-\mu)^2] = \sum_{x=0}^n (x_i - \mu)^2 P_B(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= E[x^2] - \left ( E[x]\right )^2 = \sum_{x=0}^n x_i^2 P_B(x_i) - \left ( \sum_{x=0}^n x_i P_B(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To calculate the variance of the Binomial distribution I will just calculate &amp;lt;math&amp;gt;E[x^2]&amp;lt;/math&amp;gt; and then subtract off &amp;lt;math&amp;gt;\left ( E[x]\right )^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;E[x^2] = \sum_{x=0}^n x^2 P_B(x)&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{x=1}^n x^2 P_B(x)&amp;lt;/math&amp;gt;   : x=0 term is zero so no contribution&lt;br /&gt;
:&amp;lt;math&amp;gt;=\sum_{x=1}^n x^2 \frac{n!}{x!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{x=1}^n x \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let m=n-1 and y=x-1&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{y=0}^{m} (y+1) \frac{m!}{(y)!(m-y)!} p^{y}q^{m-y}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{y=0}^{m} (y+1) P(y)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( \sum_{y=0}^{m} y P(y) + \sum_{y=0}^{m} (1) P(y) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( mp + 1 \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( (n-1)p + 1 \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = E[x^2] - \left ( E[x] \right)^2 = np \left ( (n-1)p + 1 \right) - (np)^2 = np(1-p) = npq&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples===&lt;br /&gt;
&lt;br /&gt;
==== The number of times a coin toss is heads.====&lt;br /&gt;
&lt;br /&gt;
The probability of a coin landing with the head of the coin facing up is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P = \frac{\mbox{number of desired outcomes}}{\mbox{number of possible outcomes}} = \frac{1}{2}&amp;lt;/math&amp;gt; = a uniform distribution over the two outcomes a=0 (tails) and b=1 (heads).&lt;br /&gt;
&lt;br /&gt;
Suppose you toss a coin 4 times.  Here are the possible outcomes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|order Number&lt;br /&gt;
|colspan= &amp;quot;4&amp;quot; | Trial #&lt;br /&gt;
| # of Heads&lt;br /&gt;
|-&lt;br /&gt;
| || 1|| 2 || 3|| 4 ||&lt;br /&gt;
|-&lt;br /&gt;
|1 ||t || t || t|| t ||0&lt;br /&gt;
|-&lt;br /&gt;
|2||h || t || t|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|3||t || h || t|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|4||t || t || h|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|5||t || t || t|| h ||1&lt;br /&gt;
|-&lt;br /&gt;
|6||h || h || t|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|7||h || t || h|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|8||h || t || t|| h||2&lt;br /&gt;
|-&lt;br /&gt;
|9||t || h || h|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|10||t || h || t|| h ||2&lt;br /&gt;
|-&lt;br /&gt;
|11||t || t || h|| h ||2&lt;br /&gt;
|-&lt;br /&gt;
|12||t|| h || h|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|13||h|| t || h|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|14||h|| h || t|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|15||h|| h || h|| t||3&lt;br /&gt;
|-&lt;br /&gt;
|16||h|| h || h|| h||4&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The probability of order #1 happening is&lt;br /&gt;
&lt;br /&gt;
P( order #1) = &amp;lt;math&amp;gt;\left ( \frac{1}{2} \right )^0\left ( \frac{1}{2} \right )^4 = \frac{1}{16}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
P( order #2) = &amp;lt;math&amp;gt;\left ( \frac{1}{2} \right )^1\left ( \frac{1}{2} \right )^3 = \frac{1}{16}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The probability of observing the coin land on heads 3 times out of 4 trials is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x=3) = \frac{4}{16} = \frac{1}{4} =  {n\choose x}p^{x}q^{n-x}  = \frac{4 !}{ \left [ (4-3) ! 3 !\right ]} \left ( \frac{1}{2}\right )^{3}\left ( \frac{1}{2}\right )^{4-3} = \frac{24}{1 \times 6} \frac{1}{16} = \frac{1}{4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== A 6 sided die====&lt;br /&gt;
&lt;br /&gt;
A die is a 6 sided cube with dots on each side.  Each side has a unique number of dots with at most 6 dots on any one side.&lt;br /&gt;
&lt;br /&gt;
P=1/6 = probability of landing on any side of the cube.&lt;br /&gt;
&lt;br /&gt;
Expectation value :&lt;br /&gt;
; The expected (average) value for rolling a single die.&lt;br /&gt;
: &amp;lt;math&amp;gt;E({\rm Roll\ With\ 6\ Sided\ Die}) =\sum_i x_i P(x_i) =1 \left ( \frac{1}{6} \right) + 2\left ( \frac{1}{6} \right)+ 3\left ( \frac{1}{6} \right)+ 4\left ( \frac{1}{6} \right)+ 5\left ( \frac{1}{6} \right)+ 6\left ( \frac{1}{6} \right)=\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The variance:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\mbox{Var}({\rm Roll\ With\ 6\ Sided\ Die}) =\sum_i (x_i - \mu)^2 P(x_i)  &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= (1-3.5)^2 \left ( \frac{1}{6} \right) + (2-3.5)^2\left ( \frac{1}{6} \right)+ (3-3.5)^2\left ( \frac{1}{6} \right)+ (4-3.5)^2\left ( \frac{1}{6} \right)+ (5-3.5)^2\left ( \frac{1}{6} \right)+ (6-3.5)^2\left ( \frac{1}{6} \right) =2.92&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i (x_i)^2 P(x_i) - \mu^2 = \left [ 1 \left ( \frac{1}{6} \right) + 4\left ( \frac{1}{6} \right)+ 9\left ( \frac{1}{6} \right)+ 16\left ( \frac{1}{6} \right)+ 25\left ( \frac{1}{6} \right)+ 36\left ( \frac{1}{6} \right) \right ] - (3.5)^2 =2.92&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we roll the die 10 times, what is the probability that the die shows a 6 on exactly X of the rolls?&lt;br /&gt;
&lt;br /&gt;
A success will be that the die landed with 6 dots face up.&lt;br /&gt;
&lt;br /&gt;
So the probability of a success is 1/6 (p=1/6).  We toss the die 10 times (n=10), so the binomial distribution function for a success/fail experiment says&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_B(x)  =  {n\choose x}p^{x}q^{n-x}  = \frac{10 !}{ \left [ (10-x) ! x !\right ]} \left ( \frac{1}{6}\right )^{x}\left ( \frac{5}{6}\right )^{10-x} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So the probability the die will have 6 dots face up in 4/10 rolls is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_B(x=4)  =  \frac{10 !}{ \left [ (10-4) ! 4 !\right ]} \left ( \frac{1}{6}\right )^{4}\left ( \frac{5}{6}\right )^{10-4} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  \frac{10 !}{ \left [ (6) ! 4 !\right ]} \left ( \frac{1}{6}\right )^{4}\left ( \frac{5}{6}\right )^{6} = \frac{210 \times 5^6}{6^{10}}=0.054 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Mean = &amp;lt;math&amp;gt;\mu = np = 10 (1/6) = 1.67&amp;lt;/math&amp;gt; &lt;br /&gt;
Variance = &amp;lt;math&amp;gt;\sigma^2 = npq = 10 (1/6)(5/6) = 1.39&amp;lt;/math&amp;gt;&lt;br /&gt;
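&lt;br /&gt;
These numbers are easy to verify in an interactive ROOT session (a minimal sketch using the same MathCore function as above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [0] ROOT::Math::binomial_pdf(4, 1./6., 10)   // P_B(x=4), expect about 0.054&lt;br /&gt;
root [1] 10*(1./6.)                               // mean np, expect 1.67&lt;br /&gt;
root [2] 10*(1./6.)*(5./6.)                       // variance npq, expect 1.39&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;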
&lt;br /&gt;
===Poisson Distribution===&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution is an approximation to the binomial distribution in the event that the probability of a success is quite small &amp;lt;math&amp;gt;(p \ll 1)&amp;lt;/math&amp;gt;.  As the number of repeated observations (n) gets large,  the  binomial distribution becomes more difficult to evaluate because of the leading term&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{n !}{ \left [ (n-x) ! x !\right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution overcomes this problem by defining the probability in terms of the average &amp;lt;math&amp;gt;\mu = np&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x) = \frac{\mu^x e^{-\mu}}{x!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Poisson as approximation to Binomial====&lt;br /&gt;
&lt;br /&gt;
To drive home the idea that the Poisson distribution approximates a Binomial distribution at small p and large n, consider the following derivation.&lt;br /&gt;
&lt;br /&gt;
The Binomial probability distribution is&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = \frac{n!}{x!(n-x)!}p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The term&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \frac{n!}{(n-x)!} = \frac{(n-x)! (n-x+1) (n-x + 2) \dots (n-1)(n)}{(n-x)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= n (n-1)(n-2) \dots (n-x+2) (n-x+1)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;x \ll n \Rightarrow&amp;lt;/math&amp;gt; each of the x factors above is approximately n&lt;br /&gt;
:then &amp;lt;math&amp;gt;\frac{n!}{(n-x)!} \approx n^x&amp;lt;/math&amp;gt;&lt;br /&gt;
:example:&amp;lt;math&amp;gt; \frac{100!}{(100-1)!} = \frac{99! \times 100}{99!} = 100^1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This leaves us with &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) = \frac{n^x}{x!}p^{x}q^{n-x}= \frac{(np)^x}{x!}(1-p)^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{(\mu)^x}{x!}(1-p)^{n}(1-p)^{-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;(1-p)^{-x} = \frac{1}{(1-p)^x} \approx 1+px \approx 1 \;\;\; : \; px \ll 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) = \frac{(\mu)^x}{x!}(1-p)^{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;(1-p)^{n} = \left [(1-p)^{1/p} \right]^{\mu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\lim_{p \rightarrow 0} \left [(1-p)^{1/p} \right]^{\mu} = \left ( \frac{1}{e} \right)^{\mu} = e^{- \mu}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
;For  &amp;lt;math&amp;gt;x \ll n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{p \rightarrow 0}P_B(x,n,p ) = P_P(x,\mu)&amp;lt;/math&amp;gt;&lt;br /&gt;
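&lt;br /&gt;
The limit can also be checked numerically.  The sketch below (again assuming ROOT's MathCore pdf functions) holds &amp;lt;math&amp;gt;\mu = np&amp;lt;/math&amp;gt; fixed while n is large and p is small:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Sketch: binomial with large n and small p vs. Poisson with mu = np = 2.&lt;br /&gt;
#include &amp;quot;Math/DistFunc.h&amp;quot;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void poisson_limit() {&lt;br /&gt;
   unsigned int n = 1000;&lt;br /&gt;
   double p = 0.002;       // mu = np = 2&lt;br /&gt;
   for (unsigned int x = 0; x &amp;lt;= 6; x++)&lt;br /&gt;
      printf(&amp;quot;x=%u  binomial=%f  poisson=%f\n&amp;quot;, x,&lt;br /&gt;
             ROOT::Math::binomial_pdf(x, p, n),&lt;br /&gt;
             ROOT::Math::poisson_pdf(x, n*p));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;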
&lt;br /&gt;
==== Derivation of Poisson Distribution====&lt;br /&gt;
&lt;br /&gt;
The mean free path of a particle traversing a volume of material is a common problem in nuclear and particle physics.  If you want to shield your apparatus or yourself from radiation you want to know how far the radiation travels through material.  &lt;br /&gt;
&lt;br /&gt;
The mean free path is the average distance a particle travels through a material before interacting with the material. &lt;br /&gt;
;If we let &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; represent the mean free path &lt;br /&gt;
;Then the probability of having an interaction after a distance x is&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{x}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
as a result&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;1-\frac{x}{\lambda}= P(0,x, \lambda)&amp;lt;/math&amp;gt; = probability of getting no events over a small length x&lt;br /&gt;
&lt;br /&gt;
When we consider &amp;lt;math&amp;gt;\frac{x}{\lambda} \ll 1&amp;lt;/math&amp;gt; ( we are looking for small distances such that the probability of no interactions is high)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(0,x, \lambda) = e^{\frac{-x}{\lambda}} \approx 1 - \frac{x}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we wish to find the probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; events over a distance &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the mean free path.&lt;br /&gt;
&lt;br /&gt;
This is calculated as a joint probability.  If we wanted to know the probability of only one interaction over a distance &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt;, then we would multiply the probability that an interaction happens within a distance &amp;lt;math&amp;gt;dx&amp;lt;/math&amp;gt; by the probability that no more interactions happen by the time the particle reaches the distance &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
For the case of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; interactions, we have a series of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; interactions happening over N intervals of &amp;lt;math&amp;gt;dx&amp;lt;/math&amp;gt; with the probability &amp;lt;math&amp;gt;dx/\lambda&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
:&amp;lt;math&amp;gt;P(N,x,\lambda)&amp;lt;/math&amp;gt; = probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; events within the length &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{dx_1}{\lambda}\frac{dx_2}{\lambda}\frac{dx_3}{\lambda} \dots \frac{dx_N}{\lambda} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above expression represents the probability for a particular sequence of events in which an interaction occurs after a distance &amp;lt;math&amp;gt;dx_1&amp;lt;/math&amp;gt; then a interaction after &amp;lt;math&amp;gt;dx_2&amp;lt;/math&amp;gt; , &amp;lt;math&amp;gt;\dots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So in essence the above expression is a &amp;quot;probability element&amp;quot; where another probability element may be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; P(N,x, \lambda)=\frac{dx_2}{\lambda}\frac{dx_1}{\lambda}\frac{dx_3}{\lambda} \dots \frac{dx_N}{\lambda} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the first interaction occurs after the distance &amp;lt;math&amp;gt;x_2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= \Pi_{i=1}^{N} \left [ \frac{dx_i}{\lambda} \right ] e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So we can write a differential probability element which we need to add up as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;d^NP(N,x, \lambda)=\frac{1}{N!} \Pi_{i=1}^{N} \left [ \frac{dx_i}{\lambda} \right ] e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The N! accounts for the degeneracy: every N! permutations of the intervals &amp;lt;math&amp;gt;dx_i&amp;lt;/math&amp;gt; correspond to only one distinct combination, i.e. we would be double counting when we integrate.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Using the integral formula&lt;br /&gt;
: &amp;lt;math&amp;gt; \Pi_{i=1}^{N} \left [\int_0^x \frac{dx_i}{\lambda} \right ]= \left [ \frac{x}{\lambda}\right]^N&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
we end up with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(N,x, \lambda) = \frac{\left [ \frac{x}{\lambda}\right]^N}{N!} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean of Poisson Dist====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \sum_{i=1}^{\infty} i P(i,x, \lambda)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_{i=1}^{\infty} i \frac{\left [ \frac{x}{\lambda}\right]^i}{i!} e^{\frac{-x}{\lambda}}&lt;br /&gt;
= \frac{x}{\lambda} \sum_{i=1}^{\infty}  \frac{\left [ \frac{x}{\lambda}\right]^{(i-1)}}{(i-1)!} e^{\frac{-x}{\lambda}} = \frac{x}{\lambda} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x,\mu) = \frac{\mu^x e^{-\mu}}{x!} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance of Poisson Dist====&lt;br /&gt;
&lt;br /&gt;
For [http://wiki.iac.isu.edu/index.php/TF_ErrAna_Homework#Poisson_Prob_Dist Homework] you will show, in a manner similar to the above mean calculation, that the variance of the Poisson distribution is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Gaussian===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gaussian (Normal) distribution is an approximation of the Binomial distribution for the case of a large number of possible different observations.  The Poisson distribution approximated the binomial distribution for the case when p&amp;lt;&amp;lt;1 (the average number of successes &amp;lt;math&amp;gt;\mu = np&amp;lt;/math&amp;gt; is a lot smaller than the number of trials).&lt;br /&gt;
&lt;br /&gt;
The Gaussian distribution is accepted as one of the most likely distributions to describe measurements.&lt;br /&gt;
&lt;br /&gt;
A Gaussian distribution which is normalized such that its integral is unity is referred to as the Normal distribution.  You could mathematically construct a Gaussian distribution which is not normalized to unity (this is often done when fitting experimental data).&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_G(x,\mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{x -\mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; = probability of observing &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian parent distribution with a mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; and standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Half-Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; (a.k.a. Full Width at Half Max)====&lt;br /&gt;
&lt;br /&gt;
The half width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; is used to describe the range of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; over which the distribution's amplitude decreases to half of its maximum value.&lt;br /&gt;
&lt;br /&gt;
;ie: &amp;lt;math&amp;gt;P_G(\mu \pm \frac{\Gamma}{2}, \mu, \sigma) = \frac{P_G(\mu,\mu,\sigma)}{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
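&lt;br /&gt;
Solving this condition makes the relation between &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; explicit:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^{-\frac{1}{2} \left ( \frac{\Gamma/2}{\sigma} \right )^2} = \frac{1}{2} \Rightarrow \Gamma = 2\sqrt{2 \ln 2} \; \sigma \approx 2.35 \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;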
&lt;br /&gt;
;Side note:the points of steepest descent are located at &amp;lt;math&amp;gt;x = \mu \pm \sigma&amp;lt;/math&amp;gt;  such that &lt;br /&gt;
&lt;br /&gt;
; &amp;lt;math&amp;gt;P_G(\mu \pm \sigma, \mu, \sigma) = e^{-1/2} P_G(\mu,\mu,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Probable Error (P.E.)====&lt;br /&gt;
&lt;br /&gt;
The probable error is the range of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in which half of the observations (values of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;) are expected to fall.&lt;br /&gt;
&lt;br /&gt;
; &amp;lt;math&amp;gt;x= \mu \pm P.E.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Binomial with Large N becomes Gaussian====&lt;br /&gt;
&lt;br /&gt;
Consider the binomial distribution in which a fair coin is tossed a large number of times (N is very large and an EVEN number N=2n)&lt;br /&gt;
&lt;br /&gt;
What is the probability you get exactly &amp;lt;math&amp;gt;\frac{1}{2}N -s&amp;lt;/math&amp;gt; heads and &amp;lt;math&amp;gt;\frac{1}{2}N +s&amp;lt;/math&amp;gt; tails where s is an integer?&lt;br /&gt;
&lt;br /&gt;
The Binomial Probability distribution is given as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = {N\choose x}p^{x}q^{N-x}  = \frac{N!}{x!(N-x)!}p^{x}q^{N-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
p = probability of success= 1/2&lt;br /&gt;
&lt;br /&gt;
q= 1-p = 1/2&lt;br /&gt;
&lt;br /&gt;
N = number of trials =2n&lt;br /&gt;
&lt;br /&gt;
x= number of successes=n-s&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(n-s)   = \frac{(2n)!}{(n-s)!(2n-n+s)!}p^{n-s}q^{2n-n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!}p^{n-s}q^{n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{n-s} \left(\frac{1}{2}\right)^{n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now let's cast this probability with respect to the probability that we get an equal number of heads and tails by defining the following ratio R such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \frac{P_B(n-s)}{P_B(n)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x=n) = \frac{N!}{n!(N-n)!}p^{n}q^{N-n} = \frac{(2n)!}{n!(n)!}p^{n}q^{n} =  \frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R = \frac{\frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{2n}}{\frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n}} = \frac{n! n!}{(n-s)! (n+s)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Take the natural logarithm of both sides&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \ln (R) = \ln \left ( \frac{n! n!}{(n-s)! (n+s)!} \right) = \ln(n!)+\ln(n!) - \ln\left[(n-s)!\right ] - \ln \left[(n+s)!\right] = 2 \ln(n!) - \ln\left [ (n-s)! \right ] - \ln \left [ (n+s)! \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Stirling's Approximation says&lt;br /&gt;
:&amp;lt;math&amp;gt;n! \sim \left (2 \pi n\right)^{1/2} n^n e^{-n}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow &amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;\ln(n!) \sim  \ln \left [ \left (2 \pi n\right)^{1/2} n^n e^{-n}\right ] = \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +\ln\left [ n^n \right ] + \ln \left [e^{-n}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n\ln\left [ n \right ] + (-n)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n(\ln\left [ n \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\ln\left [(n-s)! \right ] \sim   \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n-s)^{1/2} \right  ] + (n-s)(\ln\left [ (n-s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\ln\left [(n+s)! \right ] \sim   \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n+s)^{1/2} \right  ] + (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \ln (R) = 2 \times  \left (\ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n(\ln\left [ n \right ] -1 ) \right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;- \left ( \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n-s)^{1/2} \right  ] + (n-s)(\ln\left [ (n-s) \right ] -1 )\right )&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt; -\left ( \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n+s)^{1/2} \right  ] + (n+s)(\ln\left [ (n+s) \right ] -1 )\right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \ln  \left [ n^{1/2} \right  ] +2 n(\ln\left [ n \right ] -1 )  - \ln  \left [ (n-s)^{1/2} \right  ] - (n-s)(\ln\left [ (n-s) \right ] -1 ) -\ln  \left [ (n+s)^{1/2} \right  ] - (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln  \left [ n^{1/2} \right  ] \approx \ln  \left [ (n-s)^{1/2} \right  ]  \approx \ln  \left [ (n+s)^{1/2} \right  ]&amp;lt;/math&amp;gt; For large n with &amp;lt;math&amp;gt;s \ll n&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \ln (R) =  2 n(\ln\left [ n \right ] -1 )   - (n-s)(\ln\left [ (n-s) \right ] -1 ) - (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =2 n(\ln\left [ n \right ] -1 )   - (n-s)(\ln\left [ n(1-s/n) \right ] -1 ) - (n+s)(\ln\left [ n(1+s/n) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2n \ln (n) - 2n - (n-s) \left [ \ln (n) + \ln (1-s/n) -1\right ] - (n+s) \left [ \ln (n) + \ln (1+s/n) -1\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  - 2n - (n-s) \left [  \ln (1-s/n) -1\right ] - (n+s) \left [  \ln (1+s/n) -1\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=   - (n-s) \left [  \ln (1-s/n) \right ] - (n+s) \left [  \ln (1+s/n) \right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;-1 &amp;lt; s/n \le 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln (1+s/n) = s/n - \frac{s^2}{2n^2} + \frac{s^3}{3 n^3} \dots&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln(R) =- (n-s) \left [    -s/n - \frac{s^2}{2n^2} - \frac{s^3}{3 n^3} \right ] - (n+s) \left [   s/n - \frac{s^2}{2n^2} + \frac{s^3}{3 n^3}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= - \frac{s^2}{n} = - \frac{2s^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;R \sim e^{-2s^2/N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
as a result &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(n-s) = R P_B(n)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_B(x=n)=  \frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n} =  \frac{ \left (2 \pi (2n) \right)^{1/2} (2n)^{2n} e^{-2n} }{\left ( \left (2 \pi n\right)^{1/2} n^n e^{-n}\right ) \left ( \left (2 \pi n\right)^{1/2} n^n e^{-n}\right)}  \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left(\frac{1}{\pi n} \right )^{1/2} = \left(\frac{2}{\pi N} \right )^{1/2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(n-s) = \left(\frac{2}{\pi N} \right )^{1/2} e^{-2s^2/N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In binomial distributions&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = Npq = \frac{N}{4}&amp;lt;/math&amp;gt; for this problem&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;N = 4 \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(n-s) = \left(\frac{2}{\pi 4 \sigma^2} \right )^{1/2} e^{-2s^2/N} = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{2s^2}{4 \sigma^2}}  = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{s}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; &lt;br /&gt;
= probability of exactly &amp;lt;math&amp;gt;(\frac{N}{2} -s)&amp;lt;/math&amp;gt; heads AND &amp;lt;math&amp;gt;(\frac{N}{2} +s)&amp;lt;/math&amp;gt; tails after flipping the coin N times (N is an even number and s is an integer).&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;x = n-s&amp;lt;/math&amp;gt;  and realize that  for a binomial distributions&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mu = Np = N/2 = n&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{n-x}{\sigma} \right) ^2} = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
; So when N gets big, the Gaussian distribution is a good approximation to the Binomial&lt;br /&gt;
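&lt;br /&gt;
The quality of this approximation is easy to inspect numerically (a sketch, again assuming ROOT's MathCore pdf functions; the macro name and the N=100 example are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Sketch: binomial (N=100 coin tosses) vs. Gaussian with mu = Np, sigma = sqrt(Npq).&lt;br /&gt;
#include &amp;quot;Math/DistFunc.h&amp;quot;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
&lt;br /&gt;
void gauss_limit() {&lt;br /&gt;
   unsigned int N = 100;&lt;br /&gt;
   double p = 0.5;&lt;br /&gt;
   double mu = N*p, sigma = sqrt(N*p*(1-p));&lt;br /&gt;
   for (unsigned int x = 40; x &amp;lt;= 60; x += 5)&lt;br /&gt;
      printf(&amp;quot;x=%u  binomial=%f  gaussian=%f\n&amp;quot;, x,&lt;br /&gt;
             ROOT::Math::binomial_pdf(x, p, N),&lt;br /&gt;
             ROOT::Math::normal_pdf(x, sigma, mu));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;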
&lt;br /&gt;
==== Gaussian approximation to Poisson when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt; ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(r) = \frac{\mu^r e^{-\mu}}{r!}&amp;lt;/math&amp;gt; = Poisson probability distribution&lt;br /&gt;
&lt;br /&gt;
substitute&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;x \equiv r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) = \frac{\mu^{x + \mu} e^{-\mu}}{(x+\mu)!} = e^{-\mu} \frac{\mu^{\mu} \mu^x}{(\mu + x)!} = e^{-\mu} \mu^{\mu}\frac{\mu^x}{(\mu)! (\mu+1) \dots (\mu+x)}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  e^{-\mu} \frac{\mu^{\mu}}{\mu!} \left [ \frac{\mu}{(\mu+1)} \cdot  \frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^{-\mu} \frac{\mu^{\mu}}{\mu!}  = e^{-\mu} \frac{\mu^{\mu}}{\sqrt{2 \pi \mu} \mu^{\mu}e^{-\mu}}=  \frac{1}{\sqrt{2 \pi \mu}}&amp;lt;/math&amp;gt; '''Stirling's Approximation when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt;''' &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} \left [ \frac{\mu}{(\mu+1)} \cdot  \frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left [ \frac{\mu}{(\mu+1)} \cdot\frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ]  = \frac{1}{1 + \frac{1}{\mu}} \cdot  \frac{1}{1 + \frac{2}{\mu}} \dots  \frac{1}{1 + \frac{x}{\mu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;e^{x/\mu} \approx 1 + \frac{x}{\mu}&amp;lt;/math&amp;gt;  : if &amp;lt;math&amp;gt;x/\mu \ll 1&amp;lt;/math&amp;gt;  Note:&amp;lt;math&amp;gt;x \equiv r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} \left [  \frac{1}{1 + \frac{1}{\mu}} \cdot  \frac{1}{1 + \frac{2}{\mu}} \dots  \frac{1}{1 + \frac{x}{\mu}} \right ]  = \frac{1}{\sqrt{2 \pi \mu}} \left [  e^{-1/\mu} \times e^{-2/\mu} \cdots e^{-x/\mu}  \right ] = \frac{1}{\sqrt{2 \pi \mu}} e^{-1 \left[ \frac{1}{\mu} +\frac{2}{\mu} \cdots \frac{x}{\mu}  \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \sum_1^x i \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
another mathematical identity&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i=1}^{x} i = \frac{x}{2}(1+x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x}{2}(1+x) \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
if&amp;lt;math&amp;gt; x \gg 1&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{2}(1+x) \approx \frac{x^2}{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x^2}{2} \right ]} = \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-x^2}{2\mu} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the Poisson distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
replacing dummy variable x with r - &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(r) =  \frac{1}{\sqrt{2 \pi \sigma^2}} e^{\frac{-(r - \mu)^2}{2\sigma^2} } =\frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{r -\mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; = Gaussian distribution when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt;&lt;br /&gt;
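&lt;br /&gt;
A quick check in a ROOT session (a sketch) shows how close the two are for a large mean:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [0] ROOT::Math::poisson_pdf(110, 100.)                // Poisson, one sigma above the mean&lt;br /&gt;
root [1] ROOT::Math::normal_pdf(110., sqrt(100.), 100.)    // Gaussian with sigma = sqrt(mu)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;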
&lt;br /&gt;
==== Integral Probability (Cumulative Distribution Function)====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Poisson and Binomial distributions are discrete probability distributions (integers).&lt;br /&gt;
&lt;br /&gt;
The Gaussian distribution is our first continuous distribution as the variables are real numbers.  It is not very meaningful to speak of the probability that the variate (x) assumes a specific value.  &lt;br /&gt;
&lt;br /&gt;
One could consider defining a  probability element &amp;lt;math&amp;gt;A_G&amp;lt;/math&amp;gt; which is really an integral over a finite region &amp;lt;math&amp;gt;\Delta x&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;A_G(\Delta x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} \int_{\mu - \Delta x}^{\mu + \Delta x} e^{- \frac{1}{2} \left ( \frac{x - \mu}{\sigma}\right )^2} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The advantage of this definition becomes apparent when you are interested in quantifying the probability that a measurement would fall outside a range &amp;lt;math&amp;gt;\Delta x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G( X &amp;lt; \mu - \Delta x \mbox{ or } X &amp;gt; \mu + \Delta x) = 1 - A_G(\Delta x, \mu, \sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Cumulative Distribution Function (CDF), however, is defined in terms of the integral from the variates min value&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;CDF \equiv \int_{x_{min}}^{x} P_G( x^{\prime}, \mu, \sigma) dx^{\prime} = \int_{-\infty}^{x} P_G( x^{\prime}, \mu, \sigma) dx^{\prime} =  P_G(X \le x) =&amp;lt;/math&amp;gt; Probability that you measure a value less than or equal to &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== discrete CDF example =====&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
The probability that a student fails this class is 7.3%.&lt;br /&gt;
&lt;br /&gt;
What is the probability that 5 or more students will fail in a class of 32 students?&lt;br /&gt;
&lt;br /&gt;
Answ:  &amp;lt;math&amp;gt;P_B(x\ge 5) = \sum_{x=5}^{32} P_B(x) = 1- \sum_{x=0}^4 P_B(x) = 1 - CDF(x \le 4) &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= 1 - P_B(x=0)- P_B(x=1)- P_B(x=2)- P_B(x=3)- P_B(x=4)&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= 1 - (0.088 + 0.223 + 0.272 + 0.214 + 0.122) = 1 - 0.92 = 0.08&amp;lt;/math&amp;gt;= 8%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There is an 8% probability that 5 or more students will fail the class.&lt;br /&gt;
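&lt;br /&gt;
The same answer follows from ROOT's cumulative functions; this sketch assumes the MathCore binomial CDF ROOT::Math::binomial_cdf is available:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [0] 1 - ROOT::Math::binomial_cdf(4, 0.073, 32)   // P_B(x &amp;gt;= 5), expect about 0.08&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;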
&lt;br /&gt;
===== 2 SD rule of thumb for Gaussian PDF =====&lt;br /&gt;
&lt;br /&gt;
In the above example you calculated the probability that 5 or more students will fail a class.  You can extend this principle to calculate the probability of making a measurement which deviates substantially from the expected mean value.  &lt;br /&gt;
&lt;br /&gt;
One of the more common consistency checks you can make on a sample data set which you expect to be from a Gaussian distribution is to ask how many data points appear more than 2 S.D. (&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) from the mean value.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CDF for this is&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G(X \le \mu - 2 \sigma, \mu , \sigma ) = \int_{-\infty}^{\mu - 2\sigma} P_G(x, \mu, \sigma) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^{\mu - 2\sigma} e^{- \frac{1}{2} \left ( \frac{x - \mu}{\sigma}\right )^2} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;z = \frac{x-\mu}{\sigma}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;dz = \frac{dx}{\sigma}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow P_G(X \le \mu - 2 \sigma, \mu , \sigma ) =  \frac{1}{ \sqrt{2 \pi}} \int_{-\infty}^{-2} e^{- \frac{z^2}{2} } dz&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above integral has no closed-form antiderivative; it can be evaluated by expanding the exponential in a power series&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow e^{-x} = 1 -x + \frac{x^2}{2!} - \frac{x^3}{3!} \cdots&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow e^{-z^2/2} = 1 -\frac{z^2}{2}+ \frac{z^4}{8} - \frac{z^6}{48} \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Splitting the integral at z=0 (the polynomial antiderivative can only be evaluated over a finite range):&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G(X \le \mu - 2 \sigma, \mu , \sigma ) =  \frac{1}{ \sqrt{2 \pi}} \int_{-\infty}^{0} e^{-z^2/2} dz - \frac{1}{ \sqrt{2 \pi}} \int_{-2}^{0}  \left ( 1 -\frac{z^2}{2}+ \frac{z^4}{8} - \frac{z^6}{48} \cdots \right)dz&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{1}{2} - \left . \frac{1}{ \sqrt{2 \pi}} \left ( z  -\frac{z^3}{6}+ \frac{z^5}{40} - \frac{z^7}{336} \cdots   \right ) \right |_{-2}^{0}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{2} - \left .  \frac{1}{\sqrt{\pi}} \sum_{j=0}^{\infty} \frac{(-1)^j \left (\frac{z}{\sqrt{2}} \right)^{2j+1}}{j! (2j+1)} \right |_{z=2} \approx 0.023&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no analytic closed form for the probability, but the series above is one which you can compute.&lt;br /&gt;
&lt;br /&gt;
Below is a table representing the cumulative probability &amp;lt;math&amp;gt;P_G(x&amp;lt; \mu - \delta \mbox{ or }  x&amp;gt; \mu + \delta , \mu, \sigma)&amp;lt;/math&amp;gt; for events to occur outside an interval of &amp;lt;math&amp;gt;\pm \delta&amp;lt;/math&amp;gt; in a Gaussian distribution&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;P_G(x&amp;lt; \mu - \delta \mbox{ or } x&amp;gt; \mu + \delta , \mu, \sigma)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;3.2 \times 10^{-1}&amp;lt;/math&amp;gt; ||1&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;4.6 \times 10^{-2}&amp;lt;/math&amp;gt; ||2&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;2.7 \times 10^{-3}&amp;lt;/math&amp;gt; ||3&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;6.3 \times 10^{-5}&amp;lt;/math&amp;gt; ||4&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
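&lt;br /&gt;
The table entries can be reproduced with the same normal_cdf call used earlier in this document (a minimal sketch):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Two-sided probability outside +- delta for a standard Gaussian.&lt;br /&gt;
root [0] 2*ROOT::Math::normal_cdf(-1.)   // expect about 3.2e-1&lt;br /&gt;
root [1] 2*ROOT::Math::normal_cdf(-2.)   // expect about 4.6e-2&lt;br /&gt;
root [2] 2*ROOT::Math::normal_cdf(-3.)   // expect about 2.7e-3&lt;br /&gt;
root [3] 2*ROOT::Math::normal_cdf(-4.)   // expect about 6.3e-5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;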
&lt;br /&gt;
&lt;br /&gt;
[[File:TF_Error_CDF_Gauss.png| 400 px]]&lt;br /&gt;
&lt;br /&gt;
===Cauchy/Lorentzian/Breit-Wigner Distribution===&lt;br /&gt;
In Mathematics, the Cauchy distribution is written as&lt;br /&gt;
:&amp;lt;math&amp;gt;P_{CL}(x, x_0, \Gamma) = \frac{1}{\pi} \frac{\Gamma/2}{(x -x_0)^2 + (\Gamma/2)^2}&amp;lt;/math&amp;gt; = Cauchy-Lorentzian Distribution&lt;br /&gt;
&lt;br /&gt;
:Note: The probability does not fall as rapidly to zero as the Gaussian.  As a result, the Gaussian's central peak contributes more to the total area than the Lorentzian's.&lt;br /&gt;
&lt;br /&gt;
This distribution happens to be a solution to physics problems involving forced resonances (spring systems driven by a source, or a nuclear interaction which induces a metastable state).&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_{BW} = \sigma(E)=  \frac{1}{2\pi}\frac{\Gamma}{(E-E_0)^2 + (\Gamma/2)^2}&amp;lt;/math&amp;gt; = Breit-Wigner distribution&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;E_0 =&amp;lt;/math&amp;gt; mass resonance&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma = &amp;lt;/math&amp;gt;FWHM&lt;br /&gt;
: &amp;lt;math&amp;gt;\Delta E \Delta t = \Gamma \tau = \frac{h}{2 \pi}&amp;lt;/math&amp;gt; = uncertainty principle&lt;br /&gt;
:&amp;lt;math&amp;gt;\tau=&amp;lt;/math&amp;gt;lifetime of resonance/intermediate state particle&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A Breit-Wigner function fit to a cross section measured as a function of energy allows one to evaluate the rate increase produced when the probing energy excites a resonant state that has a mass &amp;lt;math&amp;gt;E_0&amp;lt;/math&amp;gt; and lasts for the time &amp;lt;math&amp;gt;\tau&amp;lt;/math&amp;gt; derived from the Half Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt;.&lt;br /&gt;
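&lt;br /&gt;
ROOT provides this lineshape through TMath::BreitWigner; the sketch below draws it for a hypothetical resonance (the parameter values 91.2 and 2.5 are placeholders, not values from this document):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Sketch: draw a Breit-Wigner lineshape with placeholder resonance parameters.&lt;br /&gt;
TF1 *bw = new TF1(&amp;quot;bw&amp;quot;, &amp;quot;TMath::BreitWigner(x, [0], [1])&amp;quot;, 70, 110);&lt;br /&gt;
bw-&amp;gt;SetParameters(91.2, 2.5);   // E_0 (mass resonance) and Gamma (FWHM)&lt;br /&gt;
bw-&amp;gt;Draw();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;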
&lt;br /&gt;
==== mean====&lt;br /&gt;
&lt;br /&gt;
The mean is not defined (the defining integral does not converge).&lt;br /&gt;
&lt;br /&gt;
Mode = Median = &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;E_0&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The variance is also not defined but rather the distribution is parameterized in terms of the Half Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;z = \frac{x-x_0}{\Gamma/2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\Gamma^2}{4\pi} \int_{-\infty}^{\infty} \frac{z^2}{1+z^2} dz&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above integral does not converge for large deviations &amp;lt;math&amp;gt;(x -x_0)&amp;lt;/math&amp;gt;.  The width of the distribution is instead characterized by &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; = FWHM&lt;br /&gt;
&lt;br /&gt;
===Landau===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_L(x) = \frac{1}{2 \pi i} \int_{c-i\infty}^{c+i\infty}\! e^{s \log s + x s}\, ds &amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; is any positive real number.&lt;br /&gt;
&lt;br /&gt;
To simplify computation it is more convenient to use the equivalent expression&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_L(x) = \frac{1}{\pi} \int_0^\infty\! e^{-t \log t - x t} \sin(\pi t)\, dt.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above distribution was derived by Landau (L. Landau, &amp;quot;On the Energy Loss of Fast Particles by Ionization&amp;quot;, J. Phys., vol 8 (1944), pg 201 ) to describe the energy loss by particles traveling through thin material ( materials with a thickness on the order of a few radiation lengths).&lt;br /&gt;
&lt;br /&gt;
Bethe-Bloch derived an expression to determine the AVERAGE amount of energy lost by a particle traversing a given material &amp;lt;math&amp;gt;(\frac{dE}{dx})&amp;lt;/math&amp;gt; assuming several collisions which span the physical limits of the interaction.&lt;br /&gt;
&lt;br /&gt;
For the case of a thin absorber, the number of collisions is so small that the central limit theorem used to average over several collisions doesn't apply, and there is a finite possibility of observing large energy losses.&lt;br /&gt;
&lt;br /&gt;
As a result one would expect a distribution which is Gaussian-like but with a &amp;quot;tail&amp;quot; on the &amp;lt;math&amp;gt;\mu + \sigma&amp;lt;/math&amp;gt; side of the distribution.&lt;br /&gt;
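&lt;br /&gt;
ROOT implements this distribution as TMath::Landau, which makes the asymmetric tail easy to visualize (a sketch with illustrative parameter values):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Sketch: draw a Landau distribution; note the long tail above the peak.&lt;br /&gt;
TF1 *lan = new TF1(&amp;quot;lan&amp;quot;, &amp;quot;TMath::Landau(x, [0], [1])&amp;quot;, 0, 20);&lt;br /&gt;
lan-&amp;gt;SetParameters(5.0, 1.0);   // most probable value and scale parameter&lt;br /&gt;
lan-&amp;gt;Draw();&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;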
&lt;br /&gt;
===Gamma===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{\gamma}(x,k,\theta) = x^{k-1} \frac{e^{-x/\theta}}{\theta^k \, \Gamma(k)}\text{ for } x &amp;gt; 0\text{ and }k, \theta &amp;gt; 0.\,&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The distribution is used for &amp;quot;waiting time&amp;quot; models.  How long do you need to wait for a rain storm, how long do you need to wait to die,...&lt;br /&gt;
&lt;br /&gt;
Climatologists use this for predicting how rain fluctuates from season to season.&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is an integer, then the above distribution describes the sum of &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; independent exponentially distributed variables, and its cumulative distribution function is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; CDF_{\gamma}(x,k,\theta) = 1 - e^{-x/\theta} \sum_{j=0}^{k-1}\frac{1}{j!}\left ( \frac{x}{\theta}\right)^j &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = k \theta&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = k \theta^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Properties====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{x \rightarrow 0} P_{\gamma}(x,k,\theta) = \left \{  {\infty \;\;\;\; k &amp;lt;1 \atop 0 \;\;\;\; k&amp;gt;1} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\theta} \;\; \mbox{if } k=1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Beta===&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{\beta}(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1-u)^{\beta-1}\, du} \!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt;= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt;= \frac{1}{\mathrm{B}(\alpha,\beta)}\, x&lt;br /&gt;
^{\alpha-1}(1-x)^{\beta-1}\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \frac{\alpha}{\alpha + \beta}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\alpha \beta }{(\alpha + \beta)^2 (\alpha + \beta + 1)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Exponential===&lt;br /&gt;
&lt;br /&gt;
The exponential distribution may be used to describe the waiting time between events which occur at a constant average rate (e.g. exponential decay)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{e}(x,\lambda) = \left \{  {\lambda e^{-\lambda x} \;\;\;\; x \ge 0\atop 0 \;\;\;\; x&amp;lt;0} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; CDF_{e}(x,\lambda) = \left \{  {1-e^{-\lambda x} \;\;\;\; x \ge 0\atop 0 \;\;\;\; x&amp;lt;0} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \frac{1}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{1}{\lambda^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Skewness and Kurtosis==&lt;br /&gt;
&lt;br /&gt;
Distributions may also be characterized by how they look in terms of Skewness and Kurtosis&lt;br /&gt;
&lt;br /&gt;
=== Skewness===&lt;br /&gt;
&lt;br /&gt;
Measures the symmetry of the distribution&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Skewness = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^3}{(N-1)s^3} = \frac{\mbox{3rd central moment}}{(\mbox{standard deviation})^3}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;The higher the number the more asymmetric (or skewed) the distribution is.  The closer to zero the more symmetric.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A negative skewness indicates a tail on the left side of the distribution.&lt;br /&gt;
Positive skewness indicates a tail on the right.&lt;br /&gt;
&lt;br /&gt;
===Kurtosis===&lt;br /&gt;
&lt;br /&gt;
Measures the &amp;quot;peakedness&amp;quot; (sharpness) of the distribution&lt;br /&gt;
&lt;br /&gt;
Kurtosis = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^4}{(N-1)s^4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
K=3 for a Normal distribution&lt;br /&gt;
&lt;br /&gt;
In ROOT the Kurtosis entry in the statistics box is really the &amp;quot;excess kurtosis&amp;quot;, which is the kurtosis minus 3&lt;br /&gt;
&lt;br /&gt;
Excess Kurtosis = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^4}{(N-1)s^4} - 3&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case a positive excess kurtosis indicates a peak that is sharper than a Gaussian's, while a negative value indicates a peak that is flatter than a comparable Gaussian distribution.&lt;br /&gt;
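&lt;br /&gt;
A sketch of how to pull both quantities out of a ROOT histogram (TH1::GetSkewness and TH1::GetKurtosis; the latter returns the excess kurtosis, consistent with the statistics box):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Sketch: fill a histogram with Gaussian variates and inspect its shape statistics.&lt;br /&gt;
TH1D *h = new TH1D(&amp;quot;h&amp;quot;, &amp;quot;gaussian sample&amp;quot;, 100, -5, 5);&lt;br /&gt;
for (int i = 0; i &amp;lt; 100000; i++) h-&amp;gt;Fill(gRandom-&amp;gt;Gaus(0, 1));&lt;br /&gt;
printf(&amp;quot;skewness = %f\n&amp;quot;, h-&amp;gt;GetSkewness());        // expect about 0&lt;br /&gt;
printf(&amp;quot;excess kurtosis = %f\n&amp;quot;, h-&amp;gt;GetKurtosis()); // expect about 0 for a Gaussian&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;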
&lt;br /&gt;
&lt;br /&gt;
[[File:ForeErrAna_Gaus-Cauchy_SkeKurt.gif|200 px]][[File:ForeErrAna_Gaus-Landau_SkeKurt.gif|200 px]][[File:ForeErrAna_Gaus-gamma_SkeKurt.gif|200 px]]&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91448</id>
		<title>TF ErrorAna PropOfErr</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91448"/>
		<updated>2014-02-28T20:45:09Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Example: Table Area */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Taylor Expansion=&lt;br /&gt;
&lt;br /&gt;
A quantity which is calculated using  quantities with known uncertainties will have an uncertainty based upon the uncertainty of the quantities used in the calculation.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity which is a function of other quantities, you can consider the dependence of these quantities in terms of a Taylor expansion&lt;br /&gt;
&lt;br /&gt;
The Taylor series expansion of a function f(x) about the point a is given as &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x) = f(a) + \left . f^{\prime}(x)\right |_{x=a} \frac{x-a}{1!} + \left . f^{\prime \prime}(x)\right |_{x=a} \frac{(x-a)^2}{2!} + ...&amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;= \left . \sum_{n=0}^{\infty} f^{(n)}(x)\right |_{x=a} \frac{(x-a)^n}{n!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For small values of x (x &amp;lt;&amp;lt; 1) we can expand the function about 0 such that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sqrt{1+x} = \left . (1+x)^{1/2} \right |_{x=0} + \left . \frac{1}{2}(1+x)^{-1/2}\right |_{x=0} \frac{x^1}{1!}+ \left . \frac{1}{2}\frac{-1}{2}(1+x)^{-3/2} \right |_{x=0} \frac{x^2}{2!} + \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;=1 + \frac{x}{2} - \frac{x^2}{8} + \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Taylor expansion of a function of two variables&amp;lt;math&amp;gt; (x , y)&amp;lt;/math&amp;gt; about the averages of the two variables&amp;lt;math&amp;gt; (\bar {x} , \bar{y} )&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)=f(\bar {x}, \bar{y})+(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{y})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)-f(\bar {x}, \bar{y})=(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{y})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The average &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f(\bar {x}, \bar{y}) \equiv \frac{\sum f(x,y)_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The term&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\delta f = f(x, y)-f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
represents a small fluctuation &amp;lt;math&amp;gt;(\delta f)&amp;lt;/math&amp;gt; of the function &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from its average &amp;lt;math&amp;gt;f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt;.  If we ignore higher order terms in the Taylor expansion (this means the fluctuations are small), then we can write the variance using the definition as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\sum \left [ f(x,y)_i - f(\bar {x}, \bar{y})\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\sum \left [(x_i-\bar {x}) \frac{\partial f}{\partial x}+(y_i-\bar{y}) \frac{\partial f}{\partial y}\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum (x_i-\bar {x})^2 \left ( \frac{\partial f}{\partial x}\right )^2}{N} + \frac{\sum (y_i-\bar {y})^2 \left ( \frac{\partial f}{\partial y}\right )^2}{N} + 2 \frac{\sum (x_i-\bar {x}) \left ( \frac{\partial f}{\partial x} \right ) (y_i-\bar {y}) \left ( \frac{\partial f}{\partial y}\right )}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sigma_x^2 \left ( \frac{\partial f}{\partial x}\right )^2 + \sigma_y^2\left ( \frac{\partial f}{\partial y}\right )^2 + 2 \sigma_{x,y}^2 \left ( \frac{\partial f}{\partial x} \right )  \left ( \frac{\partial f}{\partial y}\right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{x,y}^2 = \frac{\sum (x_i-\bar {x}) (y_i-\bar {y}) }{N} \equiv&amp;lt;/math&amp;gt; Covariance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above can be reproduced for functions with multiple variables.&lt;br /&gt;
&lt;br /&gt;
=Instrumental and Statistical Uncertainties=&lt;br /&gt;
&lt;br /&gt;
http://www.physics.uoguelph.ca/~reception/2440/StatsErrorsJuly26-06.pdf&lt;br /&gt;
==Counting Experiment Example==&lt;br /&gt;
&lt;br /&gt;
The table below reports 8 measurements of the coincidence rate observed by two scintillators detecting cosmic rays.  The scintillators are placed a distance (x) away from each other in order to detect cosmic rays falling on the earth's surface.  The time and observed coincidence counts are reported in separate columns, as well as the angle made by the normal to the detector with the earth's surface.  &lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;5&amp;quot;&lt;br /&gt;
! Date || Time (hrs) ||  &amp;lt;math&amp;gt;\theta&amp;lt;/math&amp;gt; ||Coincidence Counts || Mean Coinc/Hr  || &amp;lt;math&amp;gt;\sigma_{Poisson} = \sqrt{\mbox{Mean Counts/Hr}}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\left | \sigma \right |&amp;lt;/math&amp;gt; from Mean&lt;br /&gt;
|-&lt;br /&gt;
|9/12/07 || 20.5 || 30|| 2233 || 109 || 10.4 ||1&lt;br /&gt;
|-&lt;br /&gt;
|9/14/07 || 21 || 30 || 1582 || 75 || 8.7||2&lt;br /&gt;
|-&lt;br /&gt;
|10/3/07 || 21 || 30 || 2282 || 100 || 10.4||1&lt;br /&gt;
|-&lt;br /&gt;
|10/4/07 || 21 || 30 || 2029 || 97 ||  9.8|| 0.1&lt;br /&gt;
|-&lt;br /&gt;
|10/15/07 || 21 || 30 || 2180 || 100 ||  10|| 0.6&lt;br /&gt;
|-&lt;br /&gt;
|10/18/07 || 21 ||  30 || 2064 || 99 || 9.9||0.1&lt;br /&gt;
|-&lt;br /&gt;
| 10/23/07 || 21 || 30 || 2003 || 95 || 9.8||0.2&lt;br /&gt;
|-&lt;br /&gt;
| 10/26/07 || 21 || 30 || 1943 || 93 || 9.6 || 0.5&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The average count rate for a given trial is given in the 5th column by dividing column 4 by column 2.  &lt;br /&gt;
&lt;br /&gt;
One can expect a Poisson parent distribution because the probability of a cosmic ray interacting with the scintillator is low.  The variance of measurement in each trial is related to the counting rate by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \mu =&amp;lt;/math&amp;gt; average counting rate&lt;br /&gt;
&lt;br /&gt;
as a result of the assumption that the parent distribution is Poisson.  The value of this &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; is shown in column 6.&lt;br /&gt;
&lt;br /&gt;
; Is the Poisson distribution the parent distribution in this experiment?&lt;br /&gt;
&lt;br /&gt;
To try and answer the above question lets determine the mean and variance of the data:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} =\frac{\sum CPM_i}{8} = 96.44&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;s = \sqrt{\frac{\sum (x_i-\bar{x})^2}{8-1}} = 10.8&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you approximate the Poisson distribution by a Gaussian, then the probability that any one measurement falls within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean is 68%, the probability that a measurement of a Gaussian variate will lie within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean.  For the Poisson distribution with a mean of 97, you would have 66% of the data occur within 1 &amp;lt;math&amp;gt;\sigma = \sqrt{97}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [26] ROOT::Math::poisson_cdf(97-sqrt(97),97)&lt;br /&gt;
(double)1.67580969302001004e-01&lt;br /&gt;
root [30] 1-2*ROOT::Math::poisson_cdf(97-sqrt(97),97)        &lt;br /&gt;
(const double)6.64838061395997992e-01&lt;br /&gt;
&lt;br /&gt;
root [28] ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(double)1.58655253931457185e-01&lt;br /&gt;
root [29] 1-2*ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(const double)6.82689492137085630e-01&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The 7th column above identifies how many sigma the mean of that trial is from the average &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: Expected number of trials within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;0.68 \times 8 \approx 5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looks like we have 7/8  trials within 1&amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; = 87.5%  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How about the average sigma assuming Poisson?&lt;br /&gt;
&lt;br /&gt;
If you take the average of sigma estimate in column 6 you would get&lt;br /&gt;
:&amp;lt;math&amp;gt;\overline{\sigma(Poisson)} = \frac{\sum \sigma_i(Poisson)}{8} = 9.86&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using this one can calculate the spread of these &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; estimates as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \left ( \sigma_i(Poisson) - \overline{\sigma(Poisson)}\right)^2}{8-1} = (0.56)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
comparing the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the 8 trials to the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the Poisson estimate you have&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;10.8 = 9.9 \pm 0.56&amp;lt;/math&amp;gt; In agreement within 2 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What is really required however is an estimate of the probability that the assumption of a Poisson distribution is correct (Hypothesis test).  This will be the subject of future sections.&lt;br /&gt;
&lt;br /&gt;
=== Error Propagation===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f = \bar{x} = \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial f}{\partial x_i} = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\delta f = \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots + \frac{\partial f}{\partial x_n}\sigma_{x_n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots + \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots +  \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 +   2 \left ( \frac{\partial f}{\partial x_1} \right) \left ( \frac{\partial f}{\partial x_2} \right) \sigma^2_{x_1 x_2}  + \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{x_i x_j} = 0 \mbox{ for } i \neq j \mbox{ (independent measurements)} \Rightarrow&amp;lt;/math&amp;gt; no Covariances&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots +  \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt; = \left ( \frac{1}{N} \sigma_{x_1} \right )^2 +  \left ( \frac{1}{N}\sigma_{x_2} \right )^2 + \cdots +  \left ( \frac{1}{N}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{1}{N} \sigma \right )^2 +  \left ( \frac{1}{N}\sigma \right )^2 + \cdots   \left ( \frac{1}{N}\sigma \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=\frac{ \sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
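&lt;br /&gt;
A minimal simulation sketch (an added illustration with made-up parameters, not part of the original notes) showing the &amp;lt;math&amp;gt;\sigma/\sqrt{N}&amp;lt;/math&amp;gt; scaling of the mean:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Draw N Gaussian measurements (mu = 100, sigma = 10) many times and&lt;br /&gt;
// measure the spread of the sample mean; it falls like sigma/sqrt(N).&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;random&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  std::mt19937 gen(42);&lt;br /&gt;
  std::normal_distribution&amp;lt;double&amp;gt; gaus(100.0, 10.0);&lt;br /&gt;
  for (int N : {10, 100, 1000}) {&lt;br /&gt;
    const int reps = 5000;&lt;br /&gt;
    double ss = 0;&lt;br /&gt;
    for (int r = 0; r != reps; ++r) {&lt;br /&gt;
      double m = 0;&lt;br /&gt;
      for (int i = 0; i != N; ++i) m += gaus(gen);&lt;br /&gt;
      m /= N;&lt;br /&gt;
      ss += (m - 100.0) * (m - 100.0);&lt;br /&gt;
    }&lt;br /&gt;
    std::printf(&amp;quot;N = %4d : sigma_mean = %.3f  (sigma/sqrt(N) = %.3f)\n&amp;quot;,&lt;br /&gt;
                N, std::sqrt(ss / reps), 10.0 / std::sqrt(1.0 * N));&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;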
&lt;br /&gt;
;Does this mean that we get an infinitely precise measurement if &amp;lt;math&amp;gt;N \rightarrow \infty&amp;lt;/math&amp;gt;?&lt;br /&gt;
: No!  In reality there are systematic errors in every experiment, so the best you can do is reduce your statistical uncertainty to the point where the systematic errors dominate.  There is also the observation that in practice it is difficult to find an experiment free of &amp;quot;non-statistical fluctuations&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Example: Table Area=&lt;br /&gt;
&lt;br /&gt;
A quantity which is calculated using  quantities with known uncertainties will have an uncertainty based upon the uncertainty of the quantities used in the calculation.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity which is a function of other quantities, you can consider the dependence of these quantities in terms of a Taylor expansion&lt;br /&gt;
&lt;br /&gt;
Consider a calculation of a Table's Area&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A= L \times W&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means that the Area (A) is a function of the Length (L) and the Width (W) of the table.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A = f(L,W)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can write the variance of the area&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_A = \frac{\sum_{i=1}^{i=N} (A_i - \bar{A})^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{ \sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} + (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right] ^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \right ] ^2}{N} + \frac{\sum_{i=1}^{i=N} \left [ (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right] ^2  }{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;+2 \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \frac{\sum_{i=1}^{i=N}  (L-\bar{L}) (W-\bar W) }{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sigma^2_L \left ( \frac{\partial A}{\partial L} \right )^2 +\sigma^2_W \left ( \frac{\partial A}{\partial W} \right )^2 + 2 \sigma^2_{LW} \frac{\partial A}{\partial L}  \frac{\partial A}{\partial W} &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2_{LW} = \frac{\sum_{i=1}^{i=N}  (L-\bar{L}) (W-\bar W) }{N}&amp;lt;/math&amp;gt; is defined as the '''Covariance''' between &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;W&amp;lt;/math&amp;gt;.&lt;br /&gt;
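&lt;br /&gt;
A minimal numerical sketch of this propagation (the lengths and uncertainties below are made-up example values, and the covariance term is assumed negligible):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Propagate uncertainties in L and W to the area A = L*W,&lt;br /&gt;
// using dA/dL = W and dA/dW = L and neglecting the covariance.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  double L = 2.00, sigL = 0.01;  // example length and uncertainty (m)&lt;br /&gt;
  double W = 1.00, sigW = 0.01;  // example width and uncertainty (m)&lt;br /&gt;
  double A = L * W;&lt;br /&gt;
  double sigA = std::sqrt(W * W * sigL * sigL + L * L * sigW * sigW);&lt;br /&gt;
  std::printf(&amp;quot;A = %.4f +/- %.4f m^2\n&amp;quot;, A, sigA);&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;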
&lt;br /&gt;
= Weighted Mean and variance =&lt;br /&gt;
&lt;br /&gt;
The variance &amp;lt;math&amp;gt;(\sigma^2)&amp;lt;/math&amp;gt; in the above examples was assumed to be the same for all measurements from the parent distribution.&lt;br /&gt;
&lt;br /&gt;
What happens when you wish to combine measurements with unequal variances (different experiments measuring the same quantity)?&lt;br /&gt;
&lt;br /&gt;
== Weighted Mean==&lt;br /&gt;
&lt;br /&gt;
Let's assume we have a measured quantity having a mean &amp;lt;math&amp;gt; X&amp;lt;/math&amp;gt;   from a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you attempt to measure X with several different experiments you will likely have a series of results which vary in their precision.&lt;br /&gt;
&lt;br /&gt;
Let's assume you have 2 experiments which obtained the averages &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we assume that each measurement is governed by a Gaussian distribution,&lt;br /&gt;
&lt;br /&gt;
then the probability of one experiment observing the value &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; is given by&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Similarly, the probability of the other experiment observing the average &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt; is &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now the combined probability that the first experiment measures the average &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and the second &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt; is given as the product of the two probabilities such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A,X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A} \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B} = \frac{e^{-\frac{1}{2}\left [  \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2\right ]}}{\sigma_A \sigma_B}\equiv \frac{e^{-\frac{1}{2}\left [  \chi^2\right ]}}{\sigma_A \sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood (to be the cornerstone of hypothesis testing) may be stated as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
Applying this principle to the two experiments means that the best estimate of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is made when &amp;lt;math&amp;gt;P(x=X_A,X_B)&amp;lt;/math&amp;gt; is a maximum which occurs when &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2 = &amp;lt;/math&amp;gt;Minimum&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial X} =2 \left ( \frac{X_A-X}{\sigma_A^2}\right )(-1)+2 \left ( \frac{X_B-X}{\sigma_B^2}\right )(-1)= 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow X = \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
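&lt;br /&gt;
For example, with the illustrative values &amp;lt;math&amp;gt;X_A = 10.0 \pm 0.5&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;X_B = 10.4 \pm 1.0&amp;lt;/math&amp;gt; (made-up numbers, not measurements),&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;X = \frac{\frac{10.0}{0.25} + \frac{10.4}{1.0}}{\frac{1}{0.25} + \frac{1}{1.0}} = \frac{40 + 10.4}{5} = 10.08&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
so the more precise measurement pulls the combined value toward &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt;.&lt;br /&gt;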
&lt;br /&gt;
&lt;br /&gt;
If each observable (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;)  is accompanied by an estimate of the uncertainty in that observable (&amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt;) then the weighted mean is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} = \frac{ \sum_{i=1}^{i=n} \frac{x_i}{\sigma_i^2}}{\sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Weighted Variance==&lt;br /&gt;
&lt;br /&gt;
To determine the variance of the combined measurement, you should follow the Taylor series prescription described above, in that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sum \sigma_i^2 \left ( \frac{\partial X}{\partial X_i}\right)^2 = \sigma_A^2\left ( \frac{\partial X}{\partial X_A}\right)^2 + \sigma_B^2\left ( \frac{\partial X}{\partial X_B}\right)^2&amp;lt;/math&amp;gt; : Assuming no covariance&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial X}{\partial X_A} = \frac{\partial}{\partial X_A}  \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} = \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 =\sigma_A^2 \left ( \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2 + \sigma_B^2 \left ( \frac{\frac{1}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}  + \frac{\frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The variance of the weighted mean is then given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{1}{\sigma^2} = \sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}&amp;lt;/math&amp;gt; = weighted variance&lt;br /&gt;
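&lt;br /&gt;
A minimal sketch combining both formulas for a set of measurements (the values below are illustrative, not data from this page):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Weighted mean and its uncertainty: each measurement is weighted&lt;br /&gt;
// by 1/sigma_i^2, and 1/sigma^2 = sum of 1/sigma_i^2.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  double x[3]   = {10.0, 10.4, 9.7};  // example measurements&lt;br /&gt;
  double sig[3] = {0.5, 1.0, 0.8};    // their uncertainties&lt;br /&gt;
  double num = 0, den = 0;&lt;br /&gt;
  for (int i = 0; i != 3; ++i) {&lt;br /&gt;
    num += x[i] / (sig[i] * sig[i]);&lt;br /&gt;
    den += 1.0 / (sig[i] * sig[i]);&lt;br /&gt;
  }&lt;br /&gt;
  std::printf(&amp;quot;xbar = %.3f +/- %.3f\n&amp;quot;, num / den, 1.0 / std::sqrt(den));&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;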
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91445</id>
		<title>TF ErrorAna PropOfErr</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91445"/>
		<updated>2014-02-28T20:15:42Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Taylor Expansion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Taylor Expansion=&lt;br /&gt;
&lt;br /&gt;
A quantity which is calculated using  quantities with known uncertainties will have an uncertainty based upon the uncertainty of the quantities used in the calculation.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity which is a function of other quantities, you can consider the dependence of these quantities in terms of a Taylor expansion&lt;br /&gt;
&lt;br /&gt;
The Taylor series expansion of a function f(x) about the point a is given as &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x) = f(a) + \left . f^{\prime}(x)\right |_{x=a} \frac{x-a}{1!} + \left . f^{\prime \prime}(x)\right |_{x=a} \frac{(x-a)^2}{2!} + ...&amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;= \left . \sum_{n=0}^{\infty} f^{(n)}(x)\right |_{x=a} \frac{(x-a)^n}{n!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For small values of x (x &amp;lt;&amp;lt; 1) we can expand the function about 0 such that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sqrt{1+x} = \left . (1+x)^{1/2}\right |_{x=0} + \left . \frac{1}{2}(1+x)^{-1/2}\right |_{x=0} \frac{x^1}{1!}+ \left . \frac{1}{2}\frac{-1}{2}(1+x)^{-3/2} \right |_{x=0} \frac{x^2}{2!}&amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;=1 + \frac{x}{2} - \frac{x^2}{8}&amp;lt;/math&amp;gt;&lt;br /&gt;
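&lt;br /&gt;
As a quick numerical check of the expansion, at &amp;lt;math&amp;gt;x = 0.1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sqrt{1.1} = 1.04881\ldots \approx 1 + \frac{0.1}{2} - \frac{(0.1)^2}{8} = 1.04875&amp;lt;/math&amp;gt;&lt;br /&gt;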
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Taylor expansion of a function of two variables&amp;lt;math&amp;gt; (x , y)&amp;lt;/math&amp;gt; about the averages of the two variables&amp;lt;math&amp;gt; (\bar {x} , \bar{y} )&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)=f(\bar {x}, \bar{y})+(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{y})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)-f(\bar {x}, \bar{y})=(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{y})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The average &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f(\bar {x}, \bar{y}) \equiv \frac{\sum f(x,y)_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The term&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\delta f = f(x, y)-f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
represents a small fluctuation &amp;lt;math&amp;gt;(\delta f)&amp;lt;/math&amp;gt; of the function &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from its average &amp;lt;math&amp;gt;f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt;.  If we ignore higher-order terms in the Taylor expansion (this means the fluctuations are small), then we can write the variance using the definition as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\sum \left [ f(x,y)_i - f(\bar {x}, \bar{y})\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\sum \left [(x_i-\bar {x}) \frac{\partial f}{\partial x}+(y_i-\bar{y}) \frac{\partial f}{\partial y}\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum (x_i-\bar {x})^2 \left ( \frac{\partial f}{\partial x}\right )^2}{N} + \frac{\sum (y_i-\bar {y})^2 \left ( \frac{\partial f}{\partial y}\right )^2}{N} + 2 \frac{\sum (x_i-\bar {x}) \left ( \frac{\partial f}{\partial x} \right ) (y_i-\bar {y}) \left ( \frac{\partial f}{\partial y}\right )}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sigma_x^2 \left ( \frac{\partial f}{\partial x}\right )^2 + \sigma_y^2\left ( \frac{\partial f}{\partial y}\right )^2 + 2 \sigma_{x,y}^2 \left ( \frac{\partial f}{\partial x} \right )  \left ( \frac{\partial f}{\partial y}\right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{x,y}^2 = \frac{\sum (x_i-\bar {x}) (y_i-\bar {y}) }{N} \equiv&amp;lt;/math&amp;gt; Covariance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above can be reproduced for functions with multiple variables.&lt;br /&gt;
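&lt;br /&gt;
As a hedged check of the propagation formula (an illustration, not part of the original notes), the program below evaluates both sides for the linear function &amp;lt;math&amp;gt;f(x,y) = x + 2y&amp;lt;/math&amp;gt;, for which the first-order expansion is exact; the sample values are invented for the test.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Verify sigma_f^2 = (df/dx)^2 sx^2 + (df/dy)^2 sy^2 + 2 (df/dx)(df/dy) sxy&lt;br /&gt;
// for f(x,y) = x + 2y, using made-up sample points (N in the denominators).&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
    const int N = 4;&lt;br /&gt;
    double x[] = {1.0, 2.0, 3.0, 4.0};&lt;br /&gt;
    double y[] = {2.1, 1.9, 3.2, 2.8};&lt;br /&gt;
&lt;br /&gt;
    double xbar = 0, ybar = 0;&lt;br /&gt;
    for (int i = 0; i != N; ++i) { xbar += x[i]; ybar += y[i]; }&lt;br /&gt;
    xbar /= N;  ybar /= N;&lt;br /&gt;
&lt;br /&gt;
    double sx2 = 0, sy2 = 0, sxy = 0, sf2 = 0;&lt;br /&gt;
    double fbar = xbar + 2.0*ybar;   // f is linear, so its average is f(xbar, ybar)&lt;br /&gt;
    for (int i = 0; i != N; ++i) {&lt;br /&gt;
        double f = x[i] + 2.0*y[i];&lt;br /&gt;
        sx2 += (x[i]-xbar)*(x[i]-xbar);&lt;br /&gt;
        sy2 += (y[i]-ybar)*(y[i]-ybar);&lt;br /&gt;
        sxy += (x[i]-xbar)*(y[i]-ybar);&lt;br /&gt;
        sf2 += (f-fbar)*(f-fbar);&lt;br /&gt;
    }&lt;br /&gt;
    sx2 /= N;  sy2 /= N;  sxy /= N;  sf2 /= N;&lt;br /&gt;
&lt;br /&gt;
    // df/dx = 1 and df/dy = 2 everywhere&lt;br /&gt;
    double propagated = sx2 + 4.0*sy2 + 2.0*2.0*sxy;&lt;br /&gt;
    std::printf(&amp;quot;direct sigma_f^2 = %.6f   propagated = %.6f\n&amp;quot;, sf2, propagated);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;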
&lt;br /&gt;
=Instrumental and Statistical Uncertainties=&lt;br /&gt;
&lt;br /&gt;
http://www.physics.uoguelph.ca/~reception/2440/StatsErrorsJuly26-06.pdf&lt;br /&gt;
==Counting Experiment Example==&lt;br /&gt;
&lt;br /&gt;
The table below reports 8 measurements of the coincidence rate observed by two scintillators detecting cosmic rays.  The scintillators are placed a distance (x) from each other in order to detect cosmic rays falling on the earth's surface.  The time and observed coincidence counts are reported in separate columns, as well as the angle made by the normal to the detector with the earth's surface.  &lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;5&amp;quot;&lt;br /&gt;
! Date || Time (hrs) ||  &amp;lt;math&amp;gt;\theta&amp;lt;/math&amp;gt; ||Coincidence Counts || Mean Coinc/Hr  || &amp;lt;math&amp;gt;\sigma_{Poisson} = \sqrt{\mbox{Mean Counts/Hr}}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\left | \sigma \right |&amp;lt;/math&amp;gt; from Mean&lt;br /&gt;
|-&lt;br /&gt;
|9/12/07 || 20.5 || 30|| 2233 || 109 || 10.4 ||1&lt;br /&gt;
|-&lt;br /&gt;
|9/14/07 || 21 || 30 || 1582 || 75 || 8.7||2&lt;br /&gt;
|-&lt;br /&gt;
|10/3/07 || 21 || 30 || 2282 || 100 || 10.4||1&lt;br /&gt;
|-&lt;br /&gt;
|10/4/07 || 21 || 30 || 2029 || 97 ||  9.8|| 0.1&lt;br /&gt;
|-&lt;br /&gt;
|10/15/07 || 21 || 30 || 2180 || 100 ||  10|| 0.6&lt;br /&gt;
|-&lt;br /&gt;
|10/18/07 || 21 ||  30 || 2064 || 99 || 9.9||0.1&lt;br /&gt;
|-&lt;br /&gt;
| 10/23/07 || 21 || 30 || 2003 || 95 || 9.8||0.2&lt;br /&gt;
|-&lt;br /&gt;
| 10/26/07 || 21 || 30 || 1943 || 93 || 9.6 || 0.5&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The average count rate for a given trial is given in the 5th column by dividing column 4 by column 2.  &lt;br /&gt;
&lt;br /&gt;
One can expect a Poisson parent distribution because the probability of a cosmic ray interacting with the scintillator is low.  The variance of the measurement in each trial is related to the counting rate by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \mu =&amp;lt;/math&amp;gt; average counting rate&lt;br /&gt;
&lt;br /&gt;
as a result of the assumption that the parent distribution is Poisson.  The value of this &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; is shown in column 6.&lt;br /&gt;
&lt;br /&gt;
; Is the Poisson distribution the parent distribution in this experiment?&lt;br /&gt;
&lt;br /&gt;
To try to answer the above question, let's determine the mean and variance of the data:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} =\frac{\sum CPM_i}{8} = 96.44&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;s = \sqrt{\frac{\sum (x_i-\bar{x})^2}{8-1}} = 10.8&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you approximate the Poisson distribution by a Gaussian, then the probability that any one measurement lies within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean is 68%, the probability that a Gaussian variate lies within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of its mean.  For a Poisson distribution with a mean of 97, about 66% of the data occur within 1 &amp;lt;math&amp;gt;\sigma = \sqrt{97}&amp;lt;/math&amp;gt;, as the ROOT session below shows.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [26] ROOT::Math::poisson_cdf(97-sqrt(97),97)&lt;br /&gt;
(double)1.67580969302001004e-01&lt;br /&gt;
root [30] 1-2*ROOT::Math::poisson_cdf(97-sqrt(97),97)        &lt;br /&gt;
(const double)6.64838061395997992e-01&lt;br /&gt;
&lt;br /&gt;
root [28] ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(double)1.58655253931457185e-01&lt;br /&gt;
root [29] 1-2*ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(const double)6.82689492137085630e-01&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The 7th column above identifies how many sigma the mean of that trial is from the average &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;.  If the parent distribution were Gaussian, you would expect 68% of the 8 trials, i.e.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;0.68 \times 8 \approx 5&amp;lt;/math&amp;gt; trials,&lt;br /&gt;
&lt;br /&gt;
to lie within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean.  Instead, it looks like we have 7/8 trials within 1&amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; = 87.5%.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How about the average sigma, assuming a Poisson distribution?&lt;br /&gt;
&lt;br /&gt;
If you take the average of the sigma estimates in column 6, you get&lt;br /&gt;
:&amp;lt;math&amp;gt;\overline{\sigma(Poisson)} = \frac{\sum \sigma_i(Poisson)}{8} = 9.86&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using this one can calculate the variance of these sigma estimates as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \left ( \sigma_i(Poisson) - \overline{\sigma(Poisson)}\right)^2}{8-1} = (0.56)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Comparing the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the 8 trials with the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the Poisson estimate, you have&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;10.8 = 9.86 \pm 0.56&amp;lt;/math&amp;gt;, in agreement within 2 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
What is really required however is an estimate of the probability that the assumption of a Poisson distribution is correct (Hypothesis test).  This will be the subject of future sections.&lt;br /&gt;
&lt;br /&gt;
=== Error Propagation===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f = \bar{x} = \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial f}{\partial x_i} = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\delta f = \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots \frac{\partial f}{\partial x_n}\sigma_{x_n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 +   2 \left ( \frac{\partial f}{\partial x_1} \right) \left ( \frac{\partial f}{\partial x_2} \right) \sigma_{x_1,x_2}^2  + \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{x_i,x_j}^2 = 0 \mbox{ for independent measurements} \Rightarrow&amp;lt;/math&amp;gt; no covariance terms&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt; = \left ( \frac{1}{N} \sigma_{x_1} \right )^2 +  \left ( \frac{1}{N}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{1}{N}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{1}{N} \sigma \right )^2 +  \left ( \frac{1}{N}\sigma \right )^2 + \cdots   \left ( \frac{1}{N}\sigma \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=\frac{ \sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Does this mean that we get an infinitely precise measurement if &amp;lt;math&amp;gt;N \rightarrow \infty&amp;lt;/math&amp;gt;?&lt;br /&gt;
: No!  In reality there are systematic errors in every experiment, so the best you can do is reduce your statistical uncertainty to the point where the systematic errors dominate.  There is also the observation that in practice it is difficult to find an experiment free of &amp;quot;non-statistical fluctuations&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
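As a sketch of this &amp;lt;math&amp;gt;1/\sqrt{N}&amp;lt;/math&amp;gt; behavior (again an illustration, not part of the original notes), the simulation below averages N uniform random numbers per pseudo-experiment and compares the observed spread of those means with &amp;lt;math&amp;gt;\sigma/\sqrt{N}&amp;lt;/math&amp;gt;, using &amp;lt;math&amp;gt;\sigma^2 = 1/12&amp;lt;/math&amp;gt; for a uniform variate on [0,1).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// The spread of the mean of N measurements falls like sigma/sqrt(N).&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdlib&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
    const int M = 10000;   // number of pseudo-experiments&lt;br /&gt;
    const int N = 100;     // measurements averaged per pseudo-experiment&lt;br /&gt;
    double sum = 0, sum2 = 0;&lt;br /&gt;
    for (int e = 0; e != M; ++e) {&lt;br /&gt;
        double mean = 0;&lt;br /&gt;
        for (int i = 0; i != N; ++i) mean += std::rand() / (RAND_MAX + 1.0);&lt;br /&gt;
        mean /= N;&lt;br /&gt;
        sum  += mean;&lt;br /&gt;
        sum2 += mean * mean;&lt;br /&gt;
    }&lt;br /&gt;
    double var = sum2 / M - (sum / M) * (sum / M);&lt;br /&gt;
    std::printf(&amp;quot;observed sigma of the mean = %.5f, expected = %.5f\n&amp;quot;,&lt;br /&gt;
                std::sqrt(var), std::sqrt(1.0 / 12.0 / N));&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;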
=Example: Table Area=&lt;br /&gt;
&lt;br /&gt;
A quantity calculated from other quantities with known uncertainties will itself have an uncertainty that follows from the uncertainties of the inputs.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity that is a function of other quantities, you can consider the dependence of these quantities in terms of a Taylor expansion.&lt;br /&gt;
&lt;br /&gt;
Consider a calculation of a Table's Area&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A= L \times W&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means that the Area (A) is a function of the Length (L) and the Width (W) of the table.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A = f(L,W)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can write the variance of the area&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_A = \frac{\sum_{i=1}^{i=N} (A_i - \bar{A})^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} + (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right] ^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \right ] ^2}{N} + \frac{\sum_{i=1}^{i=N} \left [ (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right] ^2  }{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;+2  \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L}) (W-\bar W) \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right]}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sigma^2_L \left ( \frac{\partial A}{\partial L} \right )^2 +\sigma^2_W \left ( \frac{\partial A}{\partial W} \right )^2 + 2 \sigma^2_{LW} \frac{\partial A}{\partial L}  \frac{\partial A}{\partial W} &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2_{LW} = \frac{\sum_{i=1}^{i=N}  (L-\bar{L}) (W-\bar W) }{N}&amp;lt;/math&amp;gt; is defined as the '''Covariance''' between &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;W&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
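For a concrete numerical sketch (with invented numbers), take &amp;lt;math&amp;gt;L = 2.00 \pm 0.01&amp;lt;/math&amp;gt; m and &amp;lt;math&amp;gt;W = 1.00 \pm 0.01&amp;lt;/math&amp;gt; m and neglect the covariance; since &amp;lt;math&amp;gt;\partial A/\partial L = W&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\partial A/\partial W = L&amp;lt;/math&amp;gt;, the propagation gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Propagate independent uncertainties into A = L*W:&lt;br /&gt;
// sigma_A^2 = W^2 sigma_L^2 + L^2 sigma_W^2   (covariance assumed zero)&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
    const double L = 2.00, sigL = 0.01;   // hypothetical length (m)&lt;br /&gt;
    const double W = 1.00, sigW = 0.01;   // hypothetical width  (m)&lt;br /&gt;
    double A    = L * W;&lt;br /&gt;
    double sigA = std::sqrt(W*W*sigL*sigL + L*L*sigW*sigW);&lt;br /&gt;
    std::printf(&amp;quot;A = %.4f +/- %.4f m^2\n&amp;quot;, A, sigA);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;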
= Weighted Mean and variance =&lt;br /&gt;
&lt;br /&gt;
The variance &amp;lt;math&amp;gt;(\sigma^2)&amp;lt;/math&amp;gt; in the above examples was assumed to be the same for all measurements from the parent distribution.&lt;br /&gt;
&lt;br /&gt;
What happens when you wish to combine measurements with unequal variances (different experiments measuring the same quantity)?&lt;br /&gt;
&lt;br /&gt;
== Weighted Mean==&lt;br /&gt;
&lt;br /&gt;
Let's assume we have a measured quantity having a mean &amp;lt;math&amp;gt; X&amp;lt;/math&amp;gt;   from a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you attempt to measure X with several different experiments you will likely have a series of results which vary in their precision.&lt;br /&gt;
&lt;br /&gt;
Let's assume you have 2 experiments which obtained the averages &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we assume that each measurement is governed by a Gaussian distribution,&lt;br /&gt;
&lt;br /&gt;
then the probability of one experiment observing the value &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; is given by&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Similarly, the probability of the other experiment observing the average &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt; is &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now the combined probability that the first experiment measures the average &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and the second &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt; is given as the product of the two probabilities, such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A,X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A} \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B} = \frac{e^{-\frac{1}{2}\left [  \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2\right ]}}{\sigma_A \sigma_B}\equiv \frac{e^{-\frac{1}{2}\left [  \chi^2\right ]}}{\sigma_A \sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood (which will be the cornerstone of hypothesis testing) may be stated as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e., the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
Applying this principle to the two experiments means that the best estimate of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is made when &amp;lt;math&amp;gt;P(x=X_A,X_B)&amp;lt;/math&amp;gt; is a maximum which occurs when &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2 = &amp;lt;/math&amp;gt;Minimum&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial X} =2 \left ( \frac{X_A-X}{\sigma_A^2}\right )(-1)+2 \left ( \frac{X_B-X}{\sigma_B^2}\right )(-1)= 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow X = \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If each observable (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;)  is accompanied by an estimate of the uncertainty in that observable (&amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt;) then the weighted mean is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} = \frac{ \sum_{i=1}^{i=n} \frac{x_i}{\sigma_i^2}}{\sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Weighted Variance==&lt;br /&gt;
&lt;br /&gt;
To determine the variance of the measurements you should follow the Taylor series based prescription denoted above in that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sum \sigma_i^2 \left ( \frac{\partial X}{\partial X_i}\right)^2 = \sigma_A^2\left ( \frac{\partial X}{\partial X_A}\right)^2 + \sigma_B^2\left ( \frac{\partial X}{\partial X_B}\right)^2&amp;lt;/math&amp;gt; : Assuming no covariance&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial X}{\partial X_A} = \frac{\partial}{\partial X_A}  \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} = \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 =\sigma_A^2 \left ( \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2 + \sigma_B^2 \left ( \frac{\frac{1}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}  + \frac{\frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The variance of the weighted mean is then given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{1}{\sigma^2} = \sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}&amp;lt;/math&amp;gt; = weighted variance&lt;br /&gt;
&lt;br /&gt;
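To make the weighted-mean prescription concrete, here is a small sketch (with invented measurements of the same quantity):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Combine two measurements with unequal uncertainties:&lt;br /&gt;
// xbar = (sum x_i/s_i^2)/(sum 1/s_i^2),  1/sigma^2 = sum 1/s_i^2.&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
    double X[] = {10.2, 9.7};    // hypothetical results of two experiments&lt;br /&gt;
    double s[] = {0.4, 0.2};     // their quoted uncertainties&lt;br /&gt;
    double num = 0, den = 0;&lt;br /&gt;
    for (int i = 0; i != 2; ++i) {&lt;br /&gt;
        num += X[i] / (s[i] * s[i]);&lt;br /&gt;
        den += 1.0  / (s[i] * s[i]);&lt;br /&gt;
    }&lt;br /&gt;
    // the more precise measurement dominates the combination&lt;br /&gt;
    std::printf(&amp;quot;weighted mean = %.3f +/- %.3f\n&amp;quot;, num / den, 1.0 / std::sqrt(den));&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;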
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91443</id>
		<title>TF ErrorAna PropOfErr</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91443"/>
		<updated>2014-02-28T20:14:33Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Taylor Expansion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Taylor Expansion=&lt;br /&gt;
&lt;br /&gt;
A quantity which is calculated using  quantities with known uncertainties will have an uncertainty based upon the uncertainty of the quantities used in the calculation.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity which is a function of other quantities, you can consider the dependence of these quantities in terms of a tayler expansion&lt;br /&gt;
&lt;br /&gt;
The Taylor series expansion of a function f(x) about the point a is given as &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x) = f(a) + \left . f^{\prime}(x)\right |_{x=a} \frac{x}{1!} + \left . f^{\prime \prime}(x)\right |_{x=a} \frac{x^2}{2!} + ...&amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;= \left . \sum_{n=0}^{\infty} f^{(n)}(x)\right |_{x=a} \frac{x^n}{n!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For small values of x (x &amp;lt;&amp;lt; 1) we can expand the function about 0 such that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sqrt{1+x} = \left . \sqrt{1-0} \frac{1}{2}(1+x)^{-1/2}\right |_{x=0} \frac{x^1}{1!}+ \left . \frac{1}{2}\frac{-1}{2}(1+x)^{-3/2} \right |_{x=0} \frac{x^2}{2!}&amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;=1 + \frac{x}{2} - \frac{x^2}{8}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The taylor expansion of a function with two variables&amp;lt;math&amp;gt; (x_1 , x_2)&amp;lt;/math&amp;gt; about the average of the two variables&amp;lt;math&amp;gt; (\bar {x_1} , \bar{x_2} )&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)=f(\bar {x}, \bar{x})+(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{x})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)-f(\bar {x}, \bar{y})=(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{y})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The average &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f(\bar {x}, \bar{y}) \equiv \frac{\sum f(x,y)_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The term&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\delta f = f(x, y)-f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
represents a small fluctuation &amp;lt;math&amp;gt;(\delta f)&amp;lt;/math&amp;gt; of the function &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from its average &amp;lt;math&amp;gt;f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt; if we ignore higher order terms in the Taylor expansion ( this means the fluctuations are small)then we can write the variance using the definition as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\sum \left [ f(x,y)_i - f(\bar {x}, \bar{y})\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\sum \left [(x_i-\bar {x}) \frac{\partial f}{\partial x}+(y_i-\bar{y}) \frac{\partial f}{\partial y}\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum (x_i-\bar {x})^2 \left ( \frac{\partial f}{\partial x}\right )^2}{N} + \frac{\sum (y_i-\bar {y})^2 \left ( \frac{\partial f}{\partial y}\right )^2}{N} + 2 \frac{\sum (x_i-\bar {x}) \left ( \frac{\partial f}{\partial x} \right ) (y_i-\bar {y}) \left ( \frac{\partial f}{\partial y}\right )}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sigma_x^2 \left ( \frac{\partial f}{\partial x}\right )^2 + \sigma_y^2\left ( \frac{\partial f}{\partial y}\right )^2 + 2 \sigma_{x,y}^2 \left ( \frac{\partial f}{\partial x} \right )  \left ( \frac{\partial f}{\partial y}\right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{x,y}^2 = \frac{\sum (x_i-\bar {x}) (y_i-\bar {y}) }{N} \equiv&amp;lt;/math&amp;gt; Covariance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above can be reproduced for functions with multiple variables.&lt;br /&gt;
&lt;br /&gt;
=Instrumental and Statistical Uncertainties=&lt;br /&gt;
&lt;br /&gt;
http://www.physics.uoguelph.ca/~reception/2440/StatsErrorsJuly26-06.pdf&lt;br /&gt;
==Counting Experiment Example==&lt;br /&gt;
&lt;br /&gt;
The table below reports 8 measurements of the coincidence rate observed by two scintillators detecting cosmic rays.  The scintillator are place a distance (x) away from each other in order to detect cosmic rays falling on the earth's surface.  The time and observed coincidence counts are reported in separate columns as well as the angle made by the normal to the detector with the earths surface.  &lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;5&amp;quot;&lt;br /&gt;
! Date || Time (hrs) ||  &amp;lt;math&amp;gt;\theta&amp;lt;/math&amp;gt; ||Coincidence Counts || Mean Coinc/Hr  || &amp;lt;math&amp;gt;\sigma_{Poisson} = \sqrt{\mbox{Mean Counts/Hr}}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\left | \sigma \right |&amp;lt;/math&amp;gt; from Mean&lt;br /&gt;
|-&lt;br /&gt;
|9/12/07 || 20.5 || 30|| 2233 || 109 || 10.4 ||1&lt;br /&gt;
|-&lt;br /&gt;
|9/14/07 || 21 || 30 || 1582 || 75 || 8.7||2&lt;br /&gt;
|-&lt;br /&gt;
|10/3/07 || 21 || 30 || 2282 || 100 || 10.4||1&lt;br /&gt;
|-&lt;br /&gt;
|10/4/07 || 21 || 30 || 2029 || 97 ||  9.8|| 0.1&lt;br /&gt;
|-&lt;br /&gt;
|10/15/07 || 21 || 30 || 2180 || 100 ||  10|| 0.6&lt;br /&gt;
|-&lt;br /&gt;
|10/18/07 || 21 ||  30 || 2064 || 99 || 9.9||0.1&lt;br /&gt;
|-&lt;br /&gt;
| 10/23/07 || 21 || 30 || 2003 || 95 || 9.8||0.2&lt;br /&gt;
|-&lt;br /&gt;
| 10/26/07 || 21 || 30 || 1943 || 93 || 9.6 || 0.5&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The average count rate for a given trial is given in the 5th column by diving column 4 by column 2.  &lt;br /&gt;
&lt;br /&gt;
One can expect a Poisson parent distribution because the probability of a cosmic ray interacting with the scintillator is low.  The variance of measurement in each trial is related to the counting rate by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \mu =&amp;lt;/math&amp;gt; average counting rate&lt;br /&gt;
&lt;br /&gt;
as a result of the assumption that the parent distribution is Poisson.  The value of this &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; is shown in column 6.&lt;br /&gt;
&lt;br /&gt;
; Is the Poisson distribution the parent distribution in this experiment?&lt;br /&gt;
&lt;br /&gt;
To try and answer the above question lets determine the mean and variance of the data:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} =\frac{\sum CPM_i}{8} = 96.44&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;s = \sqrt{\frac{\sum (x_i-\mu)^2}{8-1}} = 10.8&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you approximate the Poisson distribution by a Gaussian then the probability any one measurement is within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean is  68% = Probability that  a measurement of a Gaussian variant will lie within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean.  For the Poisson distribution with a mean of 97 you would have 66% of the data occur within 1 &amp;lt;math&amp;gt;\sigma = \sqrt{97}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [26] ROOT::Math::poisson_cdf(97-sqrt(97),97)&lt;br /&gt;
(double)1.67580969302001004e-01&lt;br /&gt;
root [30] 1-2*ROOT::Math::poisson_cdf(97-sqrt(97),97)        &lt;br /&gt;
(const double)6.64838061395997992e-01&lt;br /&gt;
&lt;br /&gt;
root [28] ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(double)1.58655253931457185e-01&lt;br /&gt;
root [29] 1-2*ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(const double)6.82689492137085630e-01&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The 7th column above identifies how many sigma the mean of that trial is from the average &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= 0.68 * 8 = 5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Looks like we have 7/8  events within 1&amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; = 87.5%  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How about the average sigma assuming poisson?&lt;br /&gt;
&lt;br /&gt;
If you take the average of sigma estimate in column 6 you would get&lt;br /&gt;
:&amp;lt;math&amp;gt;\overline{\sigma(Poisson)} = \frac{\sum \sigma_i(Poisson)}{8} = 9.86&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using this one can calculate the variance of the variance as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \left ( \sigma_i(Poisson) - \overline{\sigma(Poisson)}\right)^2}{8-1} = (0.56)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
comparing the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the 8 trials to the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the Poisson estimate you have&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;10.9 = 9.9 \pm 0.56&amp;lt;/math&amp;gt; In agreement within 2 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What is really required however is an estimate of the probability that the assumption of a Poisson distribution is correct (Hypothesis test).  This will be the subject of future sections.&lt;br /&gt;
&lt;br /&gt;
=== Error Propagation===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f = \bar{x} = \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial f}{\partial x_i} = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\delta f = \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots \frac{\partial f}{\partial x_n}\sigma_{x_n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 +   2 \left ( \frac{\partial^2 f}{\partial x_1\partial x_2} \right) \sigma_{x_1}\sigma_{x_2}  + \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial^2 f}{\partial x_i\partial x_j} = \frac{\partial }{\partial x_j} \frac{1}{N} = 0 \Rightarrow&amp;lt;/math&amp;gt; no Covariances&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt; = \left ( \frac{1}{N} \sigma_{x_1} \right )^2 +  \left ( \frac{1}{N}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{1}{N}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{1}{N} \sigma \right )^2 +  \left ( \frac{1}{N}\sigma \right )^2 + \cdots   \left ( \frac{1}{N}\sigma \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=\frac{ \sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Does this mean that we get an infinitely precise measurement if &amp;lt;math&amp;gt;N \rightarrow \infty&amp;lt;/math&amp;gt;?&lt;br /&gt;
: No!  In reality there are systematic errors in every experiment so the best you can do is reduce your statistical precision to a point where the systematic errors dominate.  There is also the observation that in practice it is difficult to find an experiment absent of  &amp;quot;non-statistical fluctuations&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Example: Table Area=&lt;br /&gt;
&lt;br /&gt;
A quantity which is calculated using  quantities with known uncertainties will have an uncertainty based upon the uncertainty of the quantities used in the calculation.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity which is a function of other quantities, you can consider the dependence of these quantities in terms of a tayler expansion&lt;br /&gt;
&lt;br /&gt;
Consider a calculation of a Table's Area&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A= L \times W&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The mean that the Area (A) is a function of the Length (L) and the Width (W) of the table.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A = f(L,W)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can write the variance of the area&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_A = \frac{\sum_{i=1}^{i=N} (A_i - \bar{A})^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} + (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar WW} \right] ^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \right ] ^2}{N} + \frac{\sum_{i=1}^{i=N} \left [ (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right] ^2  }{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;+2  \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L}) (W-\bar W) \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right]^2}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sigma^2_L \left ( \frac{\partial A}{\partial L} \right )^2 +\sigma^2_W \left ( \frac{\partial A}{\partial W} \right )^2 + 2 \sigma^2_{LW} \frac{\partial A}{\partial L}  \frac{\partial A}{\partial W} &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2_{LW} = \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L}) (W-\bar W) \right ]^2}{N}&amp;lt;/math&amp;gt; is defined as the '''Covariance''' between &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;W&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Weighted Mean and variance =&lt;br /&gt;
&lt;br /&gt;
The variance &amp;lt;math&amp;gt;(\sigma)&amp;lt;/math&amp;gt; in the above examples was assumed to be the same for all measurement from the parent distribution.&lt;br /&gt;
&lt;br /&gt;
What happens when you wish to combine measurements with unequal variances (different experiments measuring the same quantity)?&lt;br /&gt;
&lt;br /&gt;
== Weighted Mean==&lt;br /&gt;
&lt;br /&gt;
Let's assume we have a measured quantity having a mean &amp;lt;math&amp;gt; X&amp;lt;/math&amp;gt;   from a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you attempt to measure X with several different experiments you will likely have a series of results which vary in their precision.&lt;br /&gt;
&lt;br /&gt;
Lets assume you have 2 experiments which obtained the averages &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we assume that each measurement is governed by a Gaussian distribution,&lt;br /&gt;
&lt;br /&gt;
Then the probability of one experiment observing the value X_A is given by&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
similarly the probability of the other experiment observing the average X_B is &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now the combined probability that the first experiment measures the average &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and the second &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt; is given as the product of the two probabilities suth that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A,X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A} \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B} = \frac{e^{-\frac{1}{2}\left [  \left ( \frac{X_A-x}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2\right ]}}{\sigma_A \sigma_B}\equiv \frac{e^{-\frac{1}{2}\left [  \chi^2\right ]}}{\sigma_A \sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood (to be the cornerstone of hypothesis testing) may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values are the most likely to occur;ie: the probability of the observing is a maximum.&lt;br /&gt;
&lt;br /&gt;
Applying this principle to the two experiments means that the best estimate of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is made when &amp;lt;math&amp;gt;P(x=X_A,X_B)&amp;lt;/math&amp;gt; is a maximum which occurs when &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2 = &amp;lt;/math&amp;gt;Minimum&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial X} =2 \left ( \frac{X_A-X}{\sigma_A^2}\right )(-1)+2 \left ( \frac{X_B-X}{\sigma_B^2}\right )(-1)= 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow X = \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If each observable (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;)  is accompanied by an estimate of the uncertainty in that observable (&amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt;) then the weighted mean is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} = \frac{ \sum_{i=1}^{i=n} \frac{x_i}{\sigma_i^2}}{\sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Weighted Variance==&lt;br /&gt;
&lt;br /&gt;
To determine the variance of this weighted mean you should follow the Taylor series based prescription described above, in that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sum \sigma_i^2 \left ( \frac{\partial X}{\partial X_i}\right)^2 = \sigma_A^2\left ( \frac{\partial X}{\partial X_A}\right)^2 + \sigma_B^2\left ( \frac{\partial X}{\partial X_B}\right)^2&amp;lt;/math&amp;gt; : Assuming no covariance&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial X}{\partial X_A} = \frac{\partial}{\partial X_A}  \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} = \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 =\sigma_A^2 \left ( \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2 + \sigma_B^2 \left ( \frac{\frac{1}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}  + \frac{\frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The variance of the weighted mean then satisfies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{1}{\sigma^2} = \sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}&amp;lt;/math&amp;gt; = weighted variance&lt;br /&gt;
&lt;br /&gt;
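As a concrete illustration, here is a minimal C++ sketch that combines two hypothetical measurements (the values and uncertainties below are invented for illustration) using the weighted mean and weighted variance formulas above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  // Two hypothetical measurements of the same quantity (invented numbers)&lt;br /&gt;
  double XA = 10.2, sA = 0.5;   // experiment A: X_A +/- sigma_A&lt;br /&gt;
  double XB =  9.8, sB = 0.3;   // experiment B: X_B +/- sigma_B&lt;br /&gt;
&lt;br /&gt;
  // Weighted mean: X = (X_A/sA^2 + X_B/sB^2)/(1/sA^2 + 1/sB^2)&lt;br /&gt;
  double wA = 1.0/(sA*sA), wB = 1.0/(sB*sB);&lt;br /&gt;
  double X  = (XA*wA + XB*wB)/(wA + wB);&lt;br /&gt;
&lt;br /&gt;
  // Weighted variance: 1/sigma^2 = 1/sA^2 + 1/sB^2&lt;br /&gt;
  double sigma = sqrt(1.0/(wA + wB));&lt;br /&gt;
&lt;br /&gt;
  printf(&amp;quot;X = %f +/- %f\n&amp;quot;, X, sigma);&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;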
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91442</id>
		<title>TF ErrorAna PropOfErr</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=TF_ErrorAna_PropOfErr&amp;diff=91442"/>
		<updated>2014-02-28T20:13:53Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Taylor Expansion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Taylor Expansion=&lt;br /&gt;
&lt;br /&gt;
A quantity which is calculated using  quantities with known uncertainties will have an uncertainty based upon the uncertainty of the quantities used in the calculation.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity which is a function of other quantities, you can consider the dependence of these quantities in terms of a Taylor expansion&lt;br /&gt;
&lt;br /&gt;
The Taylor series expansion of a function f(x) about the point a=0 is given as &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x) = f(a) + \left . f^{\prime}(x)\right |_{x=a} \frac{x}{1!} + \left . f^{\prime \prime}(x)\right |_{x=a} \frac{x^2}{2!} + ...&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left . \sum_{n=0}^{\infty} f^{(n)}(x)\right |_{x=a} \frac{x^n}{n!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For small values of x (x &amp;lt;&amp;lt; 1) we can expand the function about 0 such that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sqrt{1+x} = \left . \sqrt{1+x} \right |_{x=0} + \left . \frac{1}{2}(1+x)^{-1/2}\right |_{x=0} \frac{x^1}{1!}+ \left . \frac{1}{2}\frac{-1}{2}(1+x)^{-3/2} \right |_{x=0} \frac{x^2}{2!}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=1 + \frac{x}{2} - \frac{x^2}{8}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
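A quick numerical check of this expansion (a minimal C++ sketch; the test point x = 0.1 is an arbitrary small value):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  double x = 0.1;                          // small value, x much less than 1&lt;br /&gt;
  double exact  = sqrt(1.0 + x);           // 1.048808...&lt;br /&gt;
  double series = 1.0 + x/2.0 - x*x/8.0;   // 1.048750&lt;br /&gt;
  printf(&amp;quot;exact = %f   series = %f\n&amp;quot;, exact, series);&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;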
&lt;br /&gt;
&lt;br /&gt;
The Taylor expansion of a function of two variables &amp;lt;math&amp;gt; (x , y)&amp;lt;/math&amp;gt; about the average of the two variables &amp;lt;math&amp;gt; (\bar {x} , \bar{y} )&amp;lt;/math&amp;gt; is given by &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)=f(\bar {x}, \bar{y})+(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{y})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f(x, y)-f(\bar {x}, \bar{y})=(x-\bar {x}) \frac{\partial f}{\partial x}\bigg |_{(x = \bar {x}, y = \bar{y})} +(y-\bar{y}) \frac{\partial f}{\partial y}\bigg |_{(x = \bar {x}, y = \bar{y})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The average &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f(\bar {x}, \bar{y}) \equiv \frac{\sum f(x,y)_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The term&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\delta f = f(x, y)-f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
represents a small fluctuation &amp;lt;math&amp;gt;(\delta f)&amp;lt;/math&amp;gt; of the function &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; from its average &amp;lt;math&amp;gt;f(\bar {x}, \bar{y})&amp;lt;/math&amp;gt;.  If we ignore higher order terms in the Taylor expansion (this means the fluctuations are small), then we can write the variance using the definition as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\sum \left [ f(x,y)_i - f(\bar {x}, \bar{y})\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\sum \left [(x_i-\bar {x}) \frac{\partial f}{\partial x}+(y_i-\bar{y}) \frac{\partial f}{\partial y}\right ]^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum (x_i-\bar {x})^2 \left ( \frac{\partial f}{\partial x}\right )^2}{N} + \frac{\sum (y_i-\bar {y})^2 \left ( \frac{\partial f}{\partial y}\right )^2}{N} + 2 \frac{\sum (x_i-\bar {x}) \left ( \frac{\partial f}{\partial x} \right ) (y_i-\bar {y}) \left ( \frac{\partial f}{\partial y}\right )}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sigma_x^2 \left ( \frac{\partial f}{\partial x}\right )^2 + \sigma_y^2\left ( \frac{\partial f}{\partial y}\right )^2 + 2 \sigma_{x,y}^2 \left ( \frac{\partial f}{\partial x} \right )  \left ( \frac{\partial f}{\partial y}\right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{x,y}^2 = \frac{\sum (x_i-\bar {x}) (y_i-\bar {y}) }{N} \equiv&amp;lt;/math&amp;gt; Covariance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above can be generalized to functions of multiple variables.&lt;br /&gt;
&lt;br /&gt;
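As a sketch of these formulas, the following C++ fragment computes the sample variances and the covariance from paired observations (the data values below are invented for illustration) and propagates them through the example function f(x,y) = xy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  // Hypothetical paired observations (invented numbers)&lt;br /&gt;
  double x[5] = {1.1, 0.9, 1.0, 1.2, 0.8};&lt;br /&gt;
  double y[5] = {2.0, 1.9, 2.1, 2.2, 1.8};&lt;br /&gt;
  const int N = 5;&lt;br /&gt;
&lt;br /&gt;
  double xbar = 0, ybar = 0;&lt;br /&gt;
  for (int i = 0; i &amp;lt; N; i++) { xbar += x[i]/N; ybar += y[i]/N; }&lt;br /&gt;
&lt;br /&gt;
  // sigma_x^2, sigma_y^2 and the covariance sigma_xy^2 as defined above&lt;br /&gt;
  double sx2 = 0, sy2 = 0, sxy = 0;&lt;br /&gt;
  for (int i = 0; i &amp;lt; N; i++) {&lt;br /&gt;
    sx2 += (x[i]-xbar)*(x[i]-xbar)/N;&lt;br /&gt;
    sy2 += (y[i]-ybar)*(y[i]-ybar)/N;&lt;br /&gt;
    sxy += (x[i]-xbar)*(y[i]-ybar)/N;&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  // Propagate through f(x,y) = x*y: df/dx = y, df/dy = x (at the means)&lt;br /&gt;
  double dfdx = ybar, dfdy = xbar;&lt;br /&gt;
  double var_f = sx2*dfdx*dfdx + sy2*dfdy*dfdy + 2.0*sxy*dfdx*dfdy;&lt;br /&gt;
  printf(&amp;quot;var(f) = %f\n&amp;quot;, var_f);&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;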
=Instrumental and Statistical Uncertainties=&lt;br /&gt;
&lt;br /&gt;
http://www.physics.uoguelph.ca/~reception/2440/StatsErrorsJuly26-06.pdf&lt;br /&gt;
==Counting Experiment Example==&lt;br /&gt;
&lt;br /&gt;
The table below reports 8 measurements of the coincidence rate observed by two scintillators detecting cosmic rays.  The scintillators are placed a distance (x) away from each other in order to detect cosmic rays falling on the earth's surface.  The time and observed coincidence counts are reported in separate columns as well as the angle made by the normal to the detector with the earth's surface.  &lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;5&amp;quot;&lt;br /&gt;
! Date || Time (hrs) ||  &amp;lt;math&amp;gt;\theta&amp;lt;/math&amp;gt; ||Coincidence Counts || Mean Coinc/Hr  || &amp;lt;math&amp;gt;\sigma_{Poisson} = \sqrt{\mbox{Mean Counts/Hr}}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\left | \sigma \right |&amp;lt;/math&amp;gt; from Mean&lt;br /&gt;
|-&lt;br /&gt;
|9/12/07 || 20.5 || 30|| 2233 || 109 || 10.4 ||1&lt;br /&gt;
|-&lt;br /&gt;
|9/14/07 || 21 || 30 || 1582 || 75 || 8.7||2&lt;br /&gt;
|-&lt;br /&gt;
|10/3/07 || 21 || 30 || 2282 || 100 || 10.4||1&lt;br /&gt;
|-&lt;br /&gt;
|10/4/07 || 21 || 30 || 2029 || 97 ||  9.8|| 0.1&lt;br /&gt;
|-&lt;br /&gt;
|10/15/07 || 21 || 30 || 2180 || 100 ||  10|| 0.6&lt;br /&gt;
|-&lt;br /&gt;
|10/18/07 || 21 ||  30 || 2064 || 99 || 9.9||0.1&lt;br /&gt;
|-&lt;br /&gt;
| 10/23/07 || 21 || 30 || 2003 || 95 || 9.8||0.2&lt;br /&gt;
|-&lt;br /&gt;
| 10/26/07 || 21 || 30 || 1943 || 93 || 9.6 || 0.5&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The average count rate for a given trial is given in the 5th column by dividing column 4 by column 2.  &lt;br /&gt;
&lt;br /&gt;
One can expect a Poisson parent distribution because the probability of a cosmic ray interacting with the scintillator is low.  The variance of measurement in each trial is related to the counting rate by &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \mu =&amp;lt;/math&amp;gt; average counting rate&lt;br /&gt;
&lt;br /&gt;
as a result of the assumption that the parent distribution is Poisson.  The value of this &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; is shown in column 6.&lt;br /&gt;
&lt;br /&gt;
; Is the Poisson distribution the parent distribution in this experiment?&lt;br /&gt;
&lt;br /&gt;
To try and answer the above question let's determine the mean and variance of the data:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} =\frac{\sum CPM_i}{8} = 96.44&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;s = \sqrt{\frac{\sum (x_i-\bar{x})^2}{8-1}} = 10.8&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you approximate the Poisson distribution by a Gaussian, then the probability that any one measurement lies within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean is 68%, the probability that a measurement of a Gaussian variate will lie within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean.  For the Poisson distribution with a mean of 97 you would have 66% of the data occur within 1 &amp;lt;math&amp;gt;\sigma = \sqrt{97}&amp;lt;/math&amp;gt;, as the ROOT session below checks.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root [26] ROOT::Math::poisson_cdf(97-sqrt(97),97)&lt;br /&gt;
(double)1.67580969302001004e-01&lt;br /&gt;
root [30] 1-2*ROOT::Math::poisson_cdf(97-sqrt(97),97)        &lt;br /&gt;
(const double)6.64838061395997992e-01&lt;br /&gt;
&lt;br /&gt;
root [28] ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(double)1.58655253931457185e-01&lt;br /&gt;
root [29] 1-2*ROOT::Math::normal_cdf(97-sqrt(97),sqrt(97),97)&lt;br /&gt;
(const double)6.82689492137085630e-01&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The 7th column above identifies how many sigma the mean of that trial is from the average &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If the parent distribution were Gaussian, the expected number of trials falling within 1 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; of the mean would be&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;0.68 \times 8 \approx 5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We observe 7/8 trials within 1&amp;lt;math&amp;gt; \sigma&amp;lt;/math&amp;gt; = 87.5%, somewhat more than the expected 5 but not unreasonable for only 8 trials.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How about the average sigma assuming a Poisson parent distribution?&lt;br /&gt;
&lt;br /&gt;
If you take the average of the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; estimates in column 6 you would get&lt;br /&gt;
:&amp;lt;math&amp;gt;\overline{\sigma(Poisson)} = \frac{\sum \sigma_i(Poisson)}{8} = 9.86&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using this one can calculate the variance of these &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; estimates as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \left ( \sigma_i(Poisson) - \overline{\sigma(Poisson)}\right)^2}{8-1} = (0.56)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Comparing the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the 8 trials to the &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; from the Poisson estimate you have&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;10.8 = 9.9 \pm 0.56&amp;lt;/math&amp;gt; In agreement within 2 &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
What is really required however is an estimate of the probability that the assumption of a Poisson distribution is correct (Hypothesis test).  This will be the subject of future sections.&lt;br /&gt;
&lt;br /&gt;
=== Error Propagation===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;f = \bar{x} = \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial f}{\partial x_i} = \frac{1}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\delta f = \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots \frac{\partial f}{\partial x_n}\sigma_{x_n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} + \frac{\partial f}{\partial x_2}\sigma_{x_2} + \cdots \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 +   2 \left ( \frac{\partial f}{\partial x_1} \right ) \left ( \frac{\partial f}{\partial x_2} \right ) \sigma_{x_1 x_2}  + \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma_{x_i x_j} = 0 \; (i \ne j) \mbox{ for independent measurements} \Rightarrow&amp;lt;/math&amp;gt; no Covariances&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{\partial f}{\partial x_1}\sigma_{x_1} \right )^2 +  \left ( \frac{\partial f}{\partial x_2}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{\partial f}{\partial x_n}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt; = \left ( \frac{1}{N} \sigma_{x_1} \right )^2 +  \left ( \frac{1}{N}\sigma_{x_2} \right )^2 + \cdots   \left ( \frac{1}{N}\sigma_{x_n} \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If&lt;br /&gt;
:&amp;lt;math&amp;gt; \sigma_i = \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\left ( \delta f \right)^2 = \left ( \frac{1}{N} \sigma \right )^2 +  \left ( \frac{1}{N}\sigma \right )^2 + \cdots   \left ( \frac{1}{N}\sigma \right )^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=\frac{ \sigma^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Does this mean that we get an infinitely precise measurement if &amp;lt;math&amp;gt;N \rightarrow \infty&amp;lt;/math&amp;gt;?&lt;br /&gt;
: No!  In reality there are systematic errors in every experiment so the best you can do is reduce your statistical precision to a point where the systematic errors dominate.  There is also the observation that in practice it is difficult to find an experiment absent of  &amp;quot;non-statistical fluctuations&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
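The &amp;lt;math&amp;gt;\frac{1}{\sqrt N}&amp;lt;/math&amp;gt; behavior can be illustrated with a short Monte Carlo sketch (a minimal example; the unit Gaussian, the sample sizes, and the number of trials below are arbitrary choices):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
#include &amp;lt;random&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  std::mt19937 gen(42);&lt;br /&gt;
  std::normal_distribution&amp;lt;double&amp;gt; g(0.0, 1.0); // sigma = 1&lt;br /&gt;
&lt;br /&gt;
  // For each sample size N, form many sample means and measure their spread&lt;br /&gt;
  for (int N = 10; N &amp;lt;= 1000; N *= 10) {&lt;br /&gt;
    const int trials = 2000;&lt;br /&gt;
    double sum = 0, sum2 = 0;&lt;br /&gt;
    for (int t = 0; t &amp;lt; trials; t++) {&lt;br /&gt;
      double xbar = 0;&lt;br /&gt;
      for (int i = 0; i &amp;lt; N; i++) xbar += g(gen)/N;&lt;br /&gt;
      sum += xbar; sum2 += xbar*xbar;&lt;br /&gt;
    }&lt;br /&gt;
    double var_mean = sum2/trials - (sum/trials)*(sum/trials);&lt;br /&gt;
    printf(&amp;quot;N = %4d  sigma_mean = %f  (expect sigma/sqrt(N) = %f)\n&amp;quot;,&lt;br /&gt;
           N, sqrt(var_mean), 1.0/sqrt((double)N));&lt;br /&gt;
  }&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;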
=Example: Table Area=&lt;br /&gt;
&lt;br /&gt;
A quantity which is calculated using  quantities with known uncertainties will have an uncertainty based upon the uncertainty of the quantities used in the calculation.&lt;br /&gt;
&lt;br /&gt;
To determine the uncertainty in a quantity which is a function of other quantities, you can consider the dependence of these quantities in terms of a Taylor expansion&lt;br /&gt;
&lt;br /&gt;
Consider a calculation of a Table's Area&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A= L \times W&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This means that the Area (A) is a function of the Length (L) and the Width (W) of the table.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;A = f(L,W)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We can write the variance of the area&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_A = \frac{\sum_{i=1}^{i=N} (A_i - \bar{A})^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} + (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right] ^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L})  \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \right ] ^2}{N} + \frac{\sum_{i=1}^{i=N} \left [ (W-\bar W) \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right] ^2  }{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;+2  \frac{\sum_{i=1}^{i=N} \left [ (L-\bar{L}) (W-\bar W) \frac{\partial A}{\partial L} \bigg |_{\bar L \bar W} \frac{\partial A}{\partial W} \bigg |_{\bar L \bar W} \right]}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sigma^2_L \left ( \frac{\partial A}{\partial L} \right )^2 +\sigma^2_W \left ( \frac{\partial A}{\partial W} \right )^2 + 2 \sigma^2_{LW} \frac{\partial A}{\partial L}  \frac{\partial A}{\partial W} &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2_{LW} = \frac{\sum_{i=1}^{i=N} (L-\bar{L}) (W-\bar W) }{N}&amp;lt;/math&amp;gt; is defined as the '''Covariance''' between &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;W&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
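For instance, with hypothetical measurements L = 2.00 +/- 0.02 and W = 1.00 +/- 0.01 (numbers invented for illustration) and the covariance taken to be zero, the propagation reads:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  // Hypothetical table measurements (invented numbers)&lt;br /&gt;
  double L = 2.00, sL = 0.02;   // length +/- uncertainty&lt;br /&gt;
  double W = 1.00, sW = 0.01;   // width  +/- uncertainty&lt;br /&gt;
  double sLW = 0.0;             // covariance assumed to vanish&lt;br /&gt;
&lt;br /&gt;
  // A = L*W, so dA/dL = W and dA/dW = L&lt;br /&gt;
  double varA = sL*sL*W*W + sW*sW*L*L + 2.0*sLW*W*L;&lt;br /&gt;
  printf(&amp;quot;A = %.3f +/- %.3f\n&amp;quot;, L*W, sqrt(varA));&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;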
= Weighted Mean and variance =&lt;br /&gt;
&lt;br /&gt;
The variance &amp;lt;math&amp;gt;(\sigma)&amp;lt;/math&amp;gt; in the above examples was assumed to be the same for all measurement from the parent distribution.&lt;br /&gt;
&lt;br /&gt;
What happens when you wish to combine measurements with unequal variances (different experiments measuring the same quantity)?&lt;br /&gt;
&lt;br /&gt;
== Weighted Mean==&lt;br /&gt;
&lt;br /&gt;
Let's assume we have a measured quantity having a mean &amp;lt;math&amp;gt; X&amp;lt;/math&amp;gt;   from a Gaussian parent distribution.&lt;br /&gt;
&lt;br /&gt;
If you attempt to measure X with several different experiments you will likely have a series of results which vary in their precision.&lt;br /&gt;
&lt;br /&gt;
Let's assume you have 2 experiments which obtained the averages &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we assume that each measurement is governed by a Gaussian distribution,&lt;br /&gt;
&lt;br /&gt;
Then the probability of one experiment observing the value X_A is given by&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Similarly, the probability of the other experiment observing the average X_B is &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now the combined probability that the first experiment measures the average &amp;lt;math&amp;gt;X_A&amp;lt;/math&amp;gt; and the second &amp;lt;math&amp;gt;X_B&amp;lt;/math&amp;gt; is given as the product of the two probabilities such that &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x=X_A,X_B) \propto \frac{e^{-\frac{1}{2} \left ( \frac{X_A-X}{\sigma_A}\right )^2}}{\sigma_A} \frac{e^{-\frac{1}{2} \left ( \frac{X_B-X}{\sigma_B}\right )^2}}{\sigma_B} = \frac{e^{-\frac{1}{2}\left [  \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2\right ]}}{\sigma_A \sigma_B}\equiv \frac{e^{-\frac{1}{2}\left [  \chi^2\right ]}}{\sigma_A \sigma_B}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;The principle of maximum likelihood (to become the cornerstone of hypothesis testing) may be written as&lt;br /&gt;
:The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur; i.e.: the probability of the observation is a maximum.&lt;br /&gt;
&lt;br /&gt;
Applying this principle to the two experiments means that the best estimate of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is made when &amp;lt;math&amp;gt;P(x=X_A,X_B)&amp;lt;/math&amp;gt; is a maximum which occurs when &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \chi^2 \equiv \left ( \frac{X_A-X}{\sigma_A}\right )^2+\left ( \frac{X_B-X}{\sigma_B}\right )^2 = &amp;lt;/math&amp;gt;Minimum&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\partial \chi^2}{\partial X} =2 \left ( \frac{X_A-X}{\sigma_A^2}\right )(-1)+2 \left ( \frac{X_B-X}{\sigma_B^2}\right )(-1)= 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow X = \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If each observable (&amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt;)  is accompanied by an estimate of the uncertainty in that observable (&amp;lt;math&amp;gt;\sigma_i&amp;lt;/math&amp;gt;) then the weighted mean is defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} = \frac{ \sum_{i=1}^{i=n} \frac{x_i}{\sigma_i^2}}{\sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Weighted Variance==&lt;br /&gt;
&lt;br /&gt;
To determine the variance of this weighted mean you should follow the Taylor series based prescription described above, in that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \sum \sigma_i^2 \left ( \frac{\partial X}{\partial X_i}\right)^2 = \sigma_A^2\left ( \frac{\partial X}{\partial X_A}\right)^2 + \sigma_B^2\left ( \frac{\partial X}{\partial X_B}\right)^2&amp;lt;/math&amp;gt; : Assuming no covariance&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial X}{\partial X_A} = \frac{\partial}{\partial X_A}  \frac{\frac{X_A}{\sigma_A^2} + \frac{X_B}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} = \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 =\sigma_A^2 \left ( \frac{\frac{1}{\sigma_A^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2 + \sigma_B^2 \left ( \frac{\frac{1}{\sigma_B^2}}{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}} \right)^2&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}  + \frac{\frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2}}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{(\frac{1}{\sigma_A^2} + \frac{1}{\sigma_B^2})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The variance of the weighted mean then satisfies&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{1}{\sigma^2} = \sum_{i=1}^{i=n} \frac{1}{\sigma_i^2}&amp;lt;/math&amp;gt; = weighted variance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Forest_ErrAna_StatDist&amp;diff=91110</id>
		<title>Forest ErrAna StatDist</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Forest_ErrAna_StatDist&amp;diff=91110"/>
		<updated>2014-01-31T20:32:03Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Arithmetic Mean and variance */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Parent Distribution==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; represent our ith attempt to measure the quantity &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Due to the random errors present in any experiment we should not expect &amp;lt;math&amp;gt;x_i = x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we neglect systematic errors, then we should expect &amp;lt;math&amp;gt; x_i&amp;lt;/math&amp;gt; to, on average, follow some probability distribution around the correct value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This probability distribution can be referred to as the &amp;quot;parent population&amp;quot;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Average and Variance ==&lt;br /&gt;
===Average===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The word &amp;quot;average&amp;quot; is used to describe a property of a &amp;quot;parent&amp;quot; probability distribution or a set of observations/measurements made in an experiment which gives an indication of a likely outcome of an experiment.&lt;br /&gt;
&lt;br /&gt;
The symbol&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
is usually used to represent the &amp;quot;mean&amp;quot; of a known probability (parent) distribution (parent mean) while the &amp;quot;average&amp;quot; of a set of observations/measurements is denoted as &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and is commonly referred to as the &amp;quot;sample&amp;quot; average or &amp;quot;sample mean&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Definition of the mean&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu \equiv \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here the above average of a parent distribution is defined in terms of an infinite sum of observations (x_i) of an observable x divided by the number of observations.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; is a calculation of the mean using a finite number of observations&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \bar{x} \equiv  \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This definition uses the assumption that the result of an experiment, measuring a sample average of &amp;lt;math&amp;gt;(\bar{x})&amp;lt;/math&amp;gt;, asymptotically approaches the &amp;quot;true&amp;quot; average of the parent distribution &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
===Variance===&lt;br /&gt;
&lt;br /&gt;
The word &amp;quot;variance&amp;quot; is used to describe a property of a probability distribution or a set of observations/measurements made in an experiment which gives an indication of how much an observation will deviate from an average value.&lt;br /&gt;
&lt;br /&gt;
A deviation &amp;lt;math&amp;gt;(d_i)&amp;lt;/math&amp;gt; of any measurement &amp;lt;math&amp;gt;(x_i)&amp;lt;/math&amp;gt; from a parent distribution with a mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; can be defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;d_i\equiv x_i - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The deviations should average to ZERO for an infinite number of observations, by definition of the mean.&lt;br /&gt;
&lt;br /&gt;
Definition of the average&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu \equiv \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{N\rightarrow \infty} \frac{\sum (x_i - \mu)}{N}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}\right ) - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}\right ) - \lim_{N\rightarrow \infty} \frac{\sum x_i}{N} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
But the AVERAGE DEVIATION &amp;lt;math&amp;gt;(\bar{d})&amp;lt;/math&amp;gt; is given by an average of the magnitude of the deviations given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{d} = \lim_{N\rightarrow \infty} \frac{\sum \left | (x_i - \mu)\right |}{N}&amp;lt;/math&amp;gt; = a measure of the dispersion of the expected observations about the mean&lt;br /&gt;
&lt;br /&gt;
Taking the absolute value though is cumbersome when performing a statistical analysis so one may express this dispersion in terms of the variance&lt;br /&gt;
&lt;br /&gt;
A typical variable used to denote the variance is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \lim_{N\rightarrow \infty}\left [ \frac{\sum (x_i-\mu)^2 }{N}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Standard Deviation====&lt;br /&gt;
&lt;br /&gt;
The standard deviation is defined as the square root of the variance&lt;br /&gt;
&lt;br /&gt;
:S.D. = &amp;lt;math&amp;gt;\sqrt{\sigma^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The mean should be thought of as a parameter which characterizes the observations we are making in an experiment.  In general the mean specifies the probability distribution that is representative of the observable we are trying to measure through experimentation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The variance characterizes the uncertainty associated with our experimental attempts to determine the &amp;quot;true&amp;quot; value.  Although the mean and true value may not be equal, their difference should be less than the uncertainty given by the governing probability distribution.&lt;br /&gt;
&lt;br /&gt;
==== Another Expression for Variance====&lt;br /&gt;
&lt;br /&gt;
Using the definition of variance (omitting the limit as &amp;lt;math&amp;gt;n \rightarrow \infty&amp;lt;/math&amp;gt;) &lt;br /&gt;
;Evaluating the definition of variance: &amp;lt;math&amp;gt;\sigma^2 \equiv \frac{\sum(x_i-\mu)^2}{N} = \frac{\sum (x_i^2 -2x_i \mu + \mu^2)}{N} = \frac{\sum x_i^2}{N} - 2 \mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{\sum x_i^2}{N} -2 \mu^2 + \mu^2 =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N}  =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can recast the above in terms of expectation value where &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[x] \equiv \sum x_i P_x(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = E[(x-\mu)^2] = \sum_{x=0}^n (x_i - \mu)^2 P(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= E[x^2] - \left ( E[x]\right )^2 = \sum_{x=0}^n x_i^2 P(x_i) - \left ( \sum_{x=0}^n x_i P(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Average for an unknown probability distribution (parent population)==&lt;br /&gt;
&lt;br /&gt;
If the &amp;quot;Parent Population&amp;quot; is not known, you are just given a list of numbers with no indication of the probability distribution that they were drawn from, then the average and variance may be calculate as shown below.&lt;br /&gt;
&lt;br /&gt;
===Arithmetic Mean and variance===&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations are made in an experiment then the arithmetic mean of those observations is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} = \frac{\sum_{i=1}^{i=N} x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;unbiased&amp;quot; variance of the above sample is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If you were told that the average is &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; then you can calculate the &lt;br /&gt;
&amp;quot;true&amp;quot;  variance of the above sample as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N}&amp;lt;/math&amp;gt; = RMS Error= Root Mean Squared Error&lt;br /&gt;
&lt;br /&gt;
;Note:RMS = Root Mean Square = &amp;lt;math&amp;gt;\sqrt{\frac{\sum_{i=1}^{N} x_i^2}{N}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Statistical Variance decreases with  N====&lt;br /&gt;
&lt;br /&gt;
The repetition of an experiment can decrease the STATISTICAL error of the experiment&lt;br /&gt;
&lt;br /&gt;
Consider the following:&lt;br /&gt;
&lt;br /&gt;
The average value of the mean of a sample of n observations drawn from the parent population is the same as the average value of each observation.  (The average of the averages  is the same as one of the averages)&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\bar{x} = \frac{\sum x_i}{N} =&amp;lt;/math&amp;gt; sample mean&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\overline{\left ( \bar{x} \right ) } = \frac{\sum{\bar{x}_i}}{N} =\frac{1}{N} N \bar{x_i} = \bar{x}&amp;lt;/math&amp;gt; if all means are the same&lt;br /&gt;
&lt;br /&gt;
This is the reason why the sample mean is a measure of the population average ( &amp;lt;math&amp;gt;\bar{x} \sim \mu&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Now consider the variance of the average of the averages (this is not the variance of the individual measurements but the variance of their means)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}} = \frac{\sum \left (\bar{x} -\overline{\left ( \bar{x} \right ) }  \right )^2}{N} =\frac{\sum \bar{x_i}^2}{N}  -\left( \overline{\left ( \bar{x} \right ) }  \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{\sum \bar{x_i}^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{\sum \left( \sum \frac{x_i}{N}\right)^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2}\frac{\sum \left( \sum x_i\right)^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2}\frac{\sum \left (\sum x_i^2 + \sum_{i \ne j} x_ix_j \right )}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ \frac{\sum \left(\sum x_i^2 \right)}{N}  + \frac{ \sum \left (\sum_{i \ne j} x_ix_j \right )}{N} \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;If the measurements are all independent&lt;br /&gt;
:Then &amp;lt;math&amp;gt; \frac{\sum_{i \ne j} x_i x_j}{N} = \frac{\sum x_i}{N} \frac{ \sum x_j}{N}&amp;lt;/math&amp;gt; : if &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; is independent of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (&amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;)&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left ( \frac{\sum x_i}{N} \right)^2 = \bar{x}^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
example: for three measurements&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i \ne j} x_ix_j = x_1x_2 + x_1x_3 + x_2x_1+x_2x_3+x_3x_1+x_3x_2 = (x_1+x_2+x_3)^2 - (x_1^2+x_2^2+x_3^2)&amp;lt;/math&amp;gt;&lt;br /&gt;
For independent measurements each cross term &amp;lt;math&amp;gt;x_ix_j&amp;lt;/math&amp;gt; averages to &amp;lt;math&amp;gt;\bar{x}^2&amp;lt;/math&amp;gt;, which is the result used below.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}}=\frac{1}{N^2} \left [ \frac{\sum \left(\sum x_i^2 \right)}{N}  + \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Using the expression &amp;lt;math&amp;gt;\sigma^2 = E[x^2] - \left ( E[x] \right)^2&amp;lt;/math&amp;gt;  again, this time for the individual &amp;lt;math&amp;gt; x_i&amp;lt;/math&amp;gt; rather than &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;, and turning it around gives&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\left(\sum x_i^2 \right)}{N} = \sigma^2 + \left ( \frac{\sum x_i}{N}\right)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now I have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}}=\frac{1}{N^2} \left [ \sum \left (\sigma^2 +  \left ( \frac{\sum x_i}{N} \right )^2 \right )+ \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + N(N-1) \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; Number of cross terms is N*(N-1)&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + (N^2 -N) \left ( \frac{\sum x_i}{N} \right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; Number of cross terms is N*(N-1)&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left [ \frac{\sigma^2}{N} +  \left ( \frac{\sum x_i}{N} \right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \left [ \frac{\sigma^2}{N} +  \left (  \bar{x}\right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sigma^2}{N} &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above is the essence of counting statistics.&lt;br /&gt;
&lt;br /&gt;
It says that the STATISTICAL error in an experiment decreases as a function of &amp;lt;math&amp;gt;\frac{1}{\sqrt N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Biased and Unbiased variance====&lt;br /&gt;
&lt;br /&gt;
Where does this idea of an unbiased variance come from?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Using the same procedure as the previous section, let's look at the average of the sample variances.&lt;br /&gt;
&lt;br /&gt;
A sample variance of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; measurements of &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; is&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_n^2 = \frac{\sum(x_i-\bar{x})^2}{n} = E[x^2] - \left ( E[x] \right)^2 = \frac{\sum x_i^2}{n} -\left ( \bar{x} \right)^2&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To determine the &amp;quot;true&amp;quot; variance consider taking the average of several sample variances (this is the same argument used above which led to &amp;lt;math&amp;gt;\overline{(\bar{x})} = \bar{x}&amp;lt;/math&amp;gt; )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \sigma_i^2}{N} = \frac{ \sum_j \left [ \frac{\sum_i x_i^2}{n} -\left ( \bar{x} \right)^2 \right ]_j}{N} = E[x^2] - E[\bar{x}^2]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left ( \sigma^2 + \mu^2 \right ) - \left ( \sigma_{\bar{x}}^2 + \mu^2 \right )&amp;lt;/math&amp;gt;  :  using &amp;lt;math&amp;gt;E[x^2] = \left ( E[x] \right )^2 + \sigma^2&amp;lt;/math&amp;gt; for the &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;E[\bar{x}^2] = \left ( E[\bar{x}] \right )^2 + \sigma_{\bar{x}}^2&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;, as shown previously&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 - \frac{\sigma^2}{n}&amp;lt;/math&amp;gt;  :  since &amp;lt;math&amp;gt;\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}&amp;lt;/math&amp;gt; as shown previously&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{n-1}{n}\sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = \frac{n}{n-1}\frac{\sum \sigma_i^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 =&amp;lt;/math&amp;gt; the &amp;quot;true&amp;quot; variance of the parent population&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \sigma_i^2}{N} =&amp;lt;/math&amp;gt; the average over all possible sample variances&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum \sigma_i^2}{N} \sim \frac{\sum (x_i-\bar{x})^2}{n}&amp;lt;/math&amp;gt; : if all the sample variances are about the same, a single sample variance can stand in for the average&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \frac{n}{n-1}\frac{\sum(x_i-\bar{x})^2}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\sum(x_i-\bar{x})^2}{n-1} =&amp;lt;/math&amp;gt; unbiased sample variance&lt;br /&gt;
&lt;br /&gt;
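A minimal sketch contrasting the biased (divide by n) and unbiased (divide by n-1) estimates on a small invented data set:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  double x[6] = {4.8, 5.1, 5.0, 4.9, 5.3, 4.7}; // invented sample&lt;br /&gt;
  const int n = 6;&lt;br /&gt;
&lt;br /&gt;
  double xbar = 0;&lt;br /&gt;
  for (int i = 0; i &amp;lt; n; i++) xbar += x[i]/n;&lt;br /&gt;
&lt;br /&gt;
  double ss = 0; // sum of squared deviations from the sample mean&lt;br /&gt;
  for (int i = 0; i &amp;lt; n; i++) ss += (x[i]-xbar)*(x[i]-xbar);&lt;br /&gt;
&lt;br /&gt;
  printf(&amp;quot;mean              = %f\n&amp;quot;, xbar);&lt;br /&gt;
  printf(&amp;quot;biased variance   = %f\n&amp;quot;, ss/n);&lt;br /&gt;
  printf(&amp;quot;unbiased variance = %f\n&amp;quot;, ss/(n-1));&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;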
==Probability Distributions==&lt;br /&gt;
&lt;br /&gt;
=== Mean(Expectation value) and variance===&lt;br /&gt;
====Mean of Discrete Probability Distribution====&lt;br /&gt;
&lt;br /&gt;
In the case that you know the probability distribution you can calculate the mean&amp;lt;math&amp;gt; (\mu)&amp;lt;/math&amp;gt; or expectation value E(x) and standard deviation as&lt;br /&gt;
&lt;br /&gt;
For a Discrete probability distribution&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mu = E[x]=\lim_{N \rightarrow \infty} \frac{\sum_{i=1}^n x_i N P(x_i)}{N} = \sum_{i=1}^n x_i P(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;N=&amp;lt;/math&amp;gt; number of observations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;n=&amp;lt;/math&amp;gt; number of different possible observable variables&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;x_i =&amp;lt;/math&amp;gt; ith observable quantity&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x_i) =&amp;lt;/math&amp;gt; probability of observing &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; = Probability Mass Distribution for a discrete probability distribution&lt;br /&gt;
&lt;br /&gt;
====Mean of a continuous probability distribution====&lt;br /&gt;
The average (mean) of a sample drawn from any probability distribution is defined in terms of the expectation value E(x) such that&lt;br /&gt;
&lt;br /&gt;
The expectation value for a  continuous probability distribution  is calculated as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\mu = E(x) = \int_{-\infty}^{\infty} x P(x)dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Variance===&lt;br /&gt;
&lt;br /&gt;
==== Variance of a discrete PDF====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = \sum_{i=1}^n \left [ (x_i - \mu)^2 P(x_i)\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance of a Continuous PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = \int_{-\infty}^{\infty} \left [ (x - \mu)^2 P(x)\right ]dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Expectation of Arbitrary function====&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;f(x)&amp;lt;/math&amp;gt; is an arbitrary function of a variable &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; governed by a probability distribution &amp;lt;math&amp;gt;P(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then the expectation value of &amp;lt;math&amp;gt;f(x)&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[f(x)] = \sum_{i=1}^N f(x_i) P(x_i) &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
or for a continuous distribution&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[f(x)] = \int_{-\infty}^{\infty} f(x) P(x)dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Uniform===&lt;br /&gt;
&lt;br /&gt;
The Uniform probability distribution function is a continuous probability function over a specified interval in which any value within the interval has the same probability of occurring.&lt;br /&gt;
&lt;br /&gt;
Mathematically the uniform distribution over an interval from a to b is given by&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_U(x) =\left \{  {\frac{1}{b-a} \;\;\;\; x &amp;gt;a \mbox{ and } x &amp;lt;b \atop 0 \;\;\;\; x&amp;gt;b \mbox{ or } x &amp;lt; a} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean of Uniform PDF====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \int_{-\infty}^{\infty} xP_U(x)dx = \int_{a}^{b} \frac{x}{b-a} dx = \left . \frac{x^2}{2(b-a)} \right |_a^b = \frac{1}{2}\frac{b^2 - a^2}{b-a} = \frac{1}{2}(b+a)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance of Uniform PDF====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 P_U(x)dx = \int_{a}^{b} \frac{\left (x-\frac{b+a}{2}\right )^2}{b-a} dx = \left . \frac{(x -\frac{b+a}{2})^3}{3(b-a)} \right |_a^b &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{3(b-a)}\left [ \left (b -\frac{b+a}{2} \right )^3 - \left (a -\frac{b+a}{2} \right)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{3(b-a)}\left [ \left (\frac{b-a}{2} \right )^3 - \left (\frac{a-b}{2} \right)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{24(b-a)}\left [ (b-a)^3 - (-1)^3 (b-a)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{12}(b-a)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now use ROOT to generate uniform distributions.&lt;br /&gt;
http://wiki.iac.isu.edu/index.php/TF_ErrAna_InClassLab#Day_3&lt;br /&gt;
&lt;br /&gt;
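A minimal ROOT macro sketch along these lines (the interval endpoints a = 2 and b = 5 are arbitrary choices): it fills a histogram with uniform draws and compares the sample mean and RMS with &amp;lt;math&amp;gt;\frac{1}{2}(b+a)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{b-a}{\sqrt{12}}&amp;lt;/math&amp;gt; from the results derived above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// uniform.C -- run inside ROOT with: root -l uniform.C&lt;br /&gt;
void uniform() {&lt;br /&gt;
  double a = 2.0, b = 5.0;&lt;br /&gt;
  TRandom3 rng(0);&lt;br /&gt;
  TH1D *h = new TH1D(&amp;quot;h&amp;quot;, &amp;quot;Uniform(2,5);x;entries&amp;quot;, 100, a, b);&lt;br /&gt;
  for (int i = 0; i &amp;lt; 100000; i++) h-&amp;gt;Fill(rng.Uniform(a, b));&lt;br /&gt;
  // Compare with mu = (a+b)/2 and sigma = (b-a)/sqrt(12)&lt;br /&gt;
  printf(&amp;quot;mean = %f (expect %f)\n&amp;quot;, h-&amp;gt;GetMean(), 0.5*(a+b));&lt;br /&gt;
  printf(&amp;quot;rms  = %f (expect %f)\n&amp;quot;, h-&amp;gt;GetRMS(), (b-a)/sqrt(12.0));&lt;br /&gt;
  h-&amp;gt;Draw();&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;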
===Binomial Distribution===&lt;br /&gt;
&lt;br /&gt;
A binomial random variable describes experiments in which the outcome has only 2 possibilities.   The two possible outcomes can be labeled as &amp;quot;success&amp;quot; or &amp;quot;failure&amp;quot;. The probabilities may be defined as &lt;br /&gt;
&lt;br /&gt;
;p&lt;br /&gt;
: the probability of a success&lt;br /&gt;
&lt;br /&gt;
and&lt;br /&gt;
&lt;br /&gt;
;q&lt;br /&gt;
:the probability of a failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; represent the number of successes after repeating the experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times&lt;br /&gt;
&lt;br /&gt;
Experiments with &amp;lt;math&amp;gt;n=1&amp;lt;/math&amp;gt; are also known as Bernoulli trials.&lt;br /&gt;
&lt;br /&gt;
Then &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is the Binomial random variable with parameters &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The number of ways in which the &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; successful outcomes can be organized in &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; repeated  trials is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{n !}{ \left [ (n-x) ! x !\right ]}&amp;lt;/math&amp;gt;  where the &amp;lt;math&amp;gt; !&amp;lt;/math&amp;gt; denotes a factorial such that &amp;lt;math&amp;gt;5! = 5\times4\times3\times2\times1&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The expression is known as the binomial coefficient and is represented as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;{n\choose x}=\frac{n!}{x!(n-x)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The probability of any one ordering of the successes and failures is given by &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P( \mbox{experimental ordering}) = p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This means the probability of getting exactly &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; successes after n trials is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = {n\choose x}p^{x}q^{n-x} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
It can be shown that the Expectation Value of the distribution is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = n p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \sum_{x=0}^n x P_B(x) = \sum_{x=0}^n x \frac{n!}{x!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt; =  \sum_{x=1}^n  \frac{n!}{(x-1)!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt;  :summation starts from x=1 and not x=0 now&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{x=1}^n  \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}q^{n-x}&amp;lt;/math&amp;gt;  :factor out &amp;lt;math&amp;gt;np&amp;lt;/math&amp;gt; : replace n-1 with m everywhere and it looks like binomial distribution&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{y=0}^{n-1}  \frac{(n-1)!}{(y)!(n-y-1)!} p^{y}q^{n-y-1}&amp;lt;/math&amp;gt; :change summation index so y=x-1, now n become n-1&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{y=0}^{n-1}  \frac{(n-1)!}{(y)!(n-1-y)!} p^{y}q^{n-1-y}&amp;lt;/math&amp;gt; :&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np (q+p)^{n-1}&amp;lt;/math&amp;gt;  :definition of binomial expansion&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np 1^{n-1}&amp;lt;/math&amp;gt;  :q+p =1&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====variance ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = npq&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Remember: &amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N} = \frac{\sum (x_i^2 -2x_i \mu + \mu^2)}{N} = \frac{\sum x_i^2}{N} - 2 \mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{\sum x_i^2}{N} -2 \mu^2 + \mu^2 =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N}  =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = E[(x-\mu)^2] = \sum_{x=0}^n (x_i - \mu)^2 P_B(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= E[x^2] - \left ( E[x]\right )^2 = \sum_{x=0}^n x_i^2 P_B(x_i) - \left ( \sum_{x=0}^n x_i P_B(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To calculate the variance of the Binomial distribution I will just calculate &amp;lt;math&amp;gt;E[x^2]&amp;lt;/math&amp;gt; and then subtract off &amp;lt;math&amp;gt;\left ( E[x]\right )^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;E[x^2] = \sum_{x=0}^n x^2 P_B(x)&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{x=1}^n x^2 P_B(x)&amp;lt;/math&amp;gt;   : x=0 term is zero so no contribution&lt;br /&gt;
:&amp;lt;math&amp;gt;=\sum_{x=1}^n x^2 \frac{n!}{x!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{x=1}^n x \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let m=n-1 and y=x-1&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{y=0}^{m} (y+1) \frac{m!}{y!(m-y)!} p^{y}q^{m-y}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{y=0}^{m} (y+1) P_B(y)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( \sum_{y=0}^{m} y P_B(y) + \sum_{y=0}^{m} P_B(y) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( mp + 1 \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( (n-1)p + 1 \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = E[x^2] - \left ( E[x] \right)^2 = np \left ( (n-1)p + 1 \right) - (np)^2 = np(1-p) = npq&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples===&lt;br /&gt;
&lt;br /&gt;
==== The number of times a coin toss is heads.====&lt;br /&gt;
&lt;br /&gt;
The probability of a coin landing with the head of the coin facing up is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P = \frac{\mbox{number of desired outcomes}}{\mbox{number of possible outcomes}} = \frac{1}{2}&amp;lt;/math&amp;gt; = Uniform distribution with a=0 (tails) b=1 (heads).&lt;br /&gt;
&lt;br /&gt;
Suppose you toss a coin 4 times.  Here are the possible outcomes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|order Number&lt;br /&gt;
|colspan= &amp;quot;4&amp;quot; | Trial #&lt;br /&gt;
| # of Heads&lt;br /&gt;
|-&lt;br /&gt;
| || 1|| 2 || 3|| 4 ||&lt;br /&gt;
|-&lt;br /&gt;
|1 ||t || t || t|| t ||0&lt;br /&gt;
|-&lt;br /&gt;
|2||h || t || t|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|3||t || h || t|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|4||t || t || h|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|5||t || t || t|| h ||1&lt;br /&gt;
|-&lt;br /&gt;
|6||h || h || t|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|7||h || t || h|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|8||h || t || t|| h||2&lt;br /&gt;
|-&lt;br /&gt;
|9||t || h || h|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|10||t || h || t|| h ||2&lt;br /&gt;
|-&lt;br /&gt;
|11||t || t || h|| h ||2&lt;br /&gt;
|-&lt;br /&gt;
|12||t|| h || h|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|13||h|| t || h|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|14||h|| h || t|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|15||h|| h || h|| t||3&lt;br /&gt;
|-&lt;br /&gt;
|16||h|| h || h|| h||4&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The probability of order #1 happening is&lt;br /&gt;
&lt;br /&gt;
P( order #1) = &amp;lt;math&amp;gt;\left ( \frac{1}{2} \right )^0\left ( \frac{1}{2} \right )^4 = \frac{1}{16}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
P( order #2) = &amp;lt;math&amp;gt;\left ( \frac{1}{2} \right )^1\left ( \frac{1}{2} \right )^3 = \frac{1}{16}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The probability of observing the coin land on heads 3 times out of 4 trials is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x=3) = \frac{4}{16} = \frac{1}{4} =  {n\choose x}p^{x}q^{n-x}  = \frac{4 !}{ \left [ (4-3) ! 3 !\right ]} \left ( \frac{1}{2}\right )^{3}\left ( \frac{1}{2}\right )^{4-3} = \frac{24}{1 \times 6} \frac{1}{16} = \frac{1}{4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== A 6 sided die====&lt;br /&gt;
&lt;br /&gt;
A die is a 6 sided cube with dots on each side.  Each side has a unique number of dots with at most 6 dots on any one side.&lt;br /&gt;
&lt;br /&gt;
P=1/6 = probability of landing on any side of the cube.&lt;br /&gt;
&lt;br /&gt;
Expectation value :&lt;br /&gt;
; The expected (average) value for rolling a single die.&lt;br /&gt;
: &amp;lt;math&amp;gt;E({\rm Roll\ With\ 6\ Sided\ Die}) =\sum_i x_i P(x_i) =1 \left ( \frac{1}{6} \right) + 2\left ( \frac{1}{6} \right)+ 3\left ( \frac{1}{6} \right)+ 4\left ( \frac{1}{6} \right)+ 5\left ( \frac{1}{6} \right)+ 6\left ( \frac{1}{6} \right)=\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The variance:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2({\rm Roll\ With\ 6\ Sided\ Die}) =\sum_i (x_i - \mu)^2 P(x_i)  &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= (1-3.5)^2 \left ( \frac{1}{6} \right) + (2-3.5)^2\left ( \frac{1}{6} \right)+ (3-3.5)^2\left ( \frac{1}{6} \right)+ (4-3.5)^2\left ( \frac{1}{6} \right)+ (5-3.5)^2\left ( \frac{1}{6} \right)+ (6-3.5)^2\left ( \frac{1}{6} \right) =2.92&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i (x_i)^2 P(x_i) - \mu^2 = \left [ 1 \left ( \frac{1}{6} \right) + 4\left ( \frac{1}{6} \right)+ 9\left ( \frac{1}{6} \right)+ 16\left ( \frac{1}{6} \right)+ 25\left ( \frac{1}{6} \right)+ 36\left ( \frac{1}{6} \right) \right ] - (3.5)^2 =2.92&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we roll the die 10 times, what is the probability that exactly x rolls will show a 6?&lt;br /&gt;
&lt;br /&gt;
A success will be that the die landed with 6 dots face up.&lt;br /&gt;
&lt;br /&gt;
So the probability of this is 1/6 (p=1/6) , we toss it 10 times (n=10) so the binomial distribution function for a success/fail experiment says&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_B(x)  =  {n\choose x}p^{x}q^{n-x}  = \frac{10 !}{ \left [ (10-x) ! x !\right ]} \left ( \frac{1}{6}\right )^{x}\left ( \frac{5}{6}\right )^{10-x} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So the probability the die will have 6 dots face up in 4/10 rolls is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_B(x=4)  =  \frac{10 !}{ \left [ (10-4) ! 4 !\right ]} \left ( \frac{1}{6}\right )^{4}\left ( \frac{5}{6}\right )^{10-4} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  \frac{10 !}{ \left [ (6) ! 4 !\right ]} \left ( \frac{1}{6}\right )^{4}\left ( \frac{5}{6}\right )^{6} = \frac{210 \times 5^6}{6^{10}}=0.054 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Mean = np =&amp;lt;math&amp;gt;\mu =  10/6 = 1.67&amp;lt;/math&amp;gt; &lt;br /&gt;
Variance = &amp;lt;math&amp;gt;\sigma^2 = 10 (1/6)(5/6) = 1.39&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
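These numbers can be checked with a short C++ sketch of the binomial formula (n = 10 rolls and p = 1/6, as above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;cmath&amp;gt;&lt;br /&gt;
#include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// Binomial coefficient n choose x, computed iteratively&lt;br /&gt;
double choose(int n, int x) {&lt;br /&gt;
  double c = 1.0;&lt;br /&gt;
  for (int i = 1; i &amp;lt;= x; i++) c *= (double)(n - x + i)/i;&lt;br /&gt;
  return c;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main() {&lt;br /&gt;
  int n = 10;&lt;br /&gt;
  double p = 1.0/6.0, q = 1.0 - p;&lt;br /&gt;
  // P_B(x=4): probability of exactly four sixes in ten rolls&lt;br /&gt;
  double P4 = choose(n, 4)*pow(p, 4)*pow(q, n - 4);&lt;br /&gt;
  printf(&amp;quot;P_B(4) = %f  mean = %f  variance = %f\n&amp;quot;, P4, n*p, n*p*q);&lt;br /&gt;
  return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;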
===Poisson Distribution===&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution is an approximation to the binomial distribution in the event that the probability of a success is quite small &amp;lt;math&amp;gt;(p \ll 1)&amp;lt;/math&amp;gt;.  As the number of repeated observations (n) gets large,  the  binomial distribution becomes more difficult to evaluate because of the leading term&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{n !}{ \left [ (n-x) ! x !\right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution overcomes this problem by defining the probability in terms of the average &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x) = \frac{\mu^x e^{-\mu}}{x!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Poisson as approximation to Binomial====&lt;br /&gt;
&lt;br /&gt;
To drive home the idea that the Poisson distribution approximates a Binomial distribution at small p and large n consider the following derivation&lt;br /&gt;
&lt;br /&gt;
The binomial probability distribution is&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = \frac{n!}{x!(n-x)!}p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The term&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \frac{n!}{(n-x)!} = \frac{(n-x)! (n-x+1) (n-x + 2) \dots (n-1)(n)}{(n-x)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= n (n-1)(n-2) \dots (n-x+2) (n-x+1)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;x \ll n&amp;lt;/math&amp;gt; then each of the x terms above is approximately &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;&lt;br /&gt;
:so &amp;lt;math&amp;gt;\frac{n!}{(n-x)!} \approx n^x&amp;lt;/math&amp;gt;&lt;br /&gt;
:example:&amp;lt;math&amp;gt; \frac{100!}{(100-1)!} = \frac{99! \times 100}{99!} = 100^1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This leaves us with &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) = \frac{n^x}{x!}p^{x}q^{n-x}= \frac{(np)^x}{x!}(1-p)^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{(\mu)^x}{x!}(1-p)^{n}(1-p)^{-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;(1-p)^{-x} = \frac{1}{(1-p)^x} \approx 1+px \approx 1&amp;lt;/math&amp;gt; : when &amp;lt;math&amp;gt;px \ll 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) = \frac{(\mu)^x}{x!}(1-p)^{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;(1-p)^{n} = \left [(1-p)^{1/p} \right]^{\mu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\lim_{p \rightarrow 0} \left [(1-p)^{1/p} \right]^{\mu} = \left ( \frac{1}{e} \right)^{\mu} = e^{- \mu}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
;For  &amp;lt;math&amp;gt;x \ll n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{p \rightarrow 0}P_B(x,n,p ) = P_P(x,\mu)&amp;lt;/math&amp;gt;&lt;br /&gt;
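&lt;br /&gt;
The limit can also be seen numerically.  The sketch below (an added illustration; the values n = 1000 and p = 0.002, i.e. &amp;lt;math&amp;gt;\mu = 2&amp;lt;/math&amp;gt;, are arbitrary choices) compares the two probabilities side by side:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    int n = 1000;  double p = 0.002, mu = n * p;   // mu = 2&lt;br /&gt;
    for (int x = 0; x &amp;lt;= 6; x++) {&lt;br /&gt;
       double logC = lgamma(n + 1.0) - lgamma(x + 1.0) - lgamma(n - x + 1.0);&lt;br /&gt;
       double pb = exp(logC + x * log(p) + (n - x) * log(1.0 - p));   // binomial&lt;br /&gt;
       double pp = exp(x * log(mu) - mu - lgamma(x + 1.0));           // Poisson&lt;br /&gt;
       printf(&amp;quot;x=%d  P_B=%.5f  P_P=%.5f\n&amp;quot;, x, pb, pp);&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
For x within a few units of &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; the two columns agree to about three decimal places.&lt;br /&gt;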
&lt;br /&gt;
==== Derivation of Poisson Distribution====&lt;br /&gt;
&lt;br /&gt;
The mean free path of a particle traversing a volume of material is a common problem in nuclear and particle physics.  If you want to shield your apparatus or yourself from radiation you want to know how far the radiation travels through material.  &lt;br /&gt;
&lt;br /&gt;
The mean free path is the average distance a particle travels through a material before interacting with the material. &lt;br /&gt;
;If we let &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; represent the mean free path &lt;br /&gt;
;Then the probability of having an interaction after a distance x is&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{x}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
as a result&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P(0,x, \lambda) = 1-\frac{x}{\lambda}&amp;lt;/math&amp;gt; = probability of getting no events over a length x&lt;br /&gt;
&lt;br /&gt;
When we consider &amp;lt;math&amp;gt;\frac{x}{\lambda} \ll 1&amp;lt;/math&amp;gt; ( we are looking for small distances such that the probability of no interactions is high)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(0,x, \lambda) = e^{\frac{-x}{\lambda}} \approx 1 - \frac{x}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we wish to find the probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; events over a distance &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the mean free path.&lt;br /&gt;
&lt;br /&gt;
This is calculated as a joint probability.  If we wanted to know the probability of only one interaction over a distance &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt;, then we would multiply the probability that an interaction happens within a distance &amp;lt;math&amp;gt;dx&amp;lt;/math&amp;gt; by the probability that no more interactions happen by the time the particle reaches the distance &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
For the case of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; interactions, we have a series of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; interactions happening over N intervals of &amp;lt;math&amp;gt;dx&amp;lt;/math&amp;gt; with the probability &amp;lt;math&amp;gt;dx/\lambda&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
:&amp;lt;math&amp;gt;P(N,x,\lambda)&amp;lt;/math&amp;gt; = probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; events within the length &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{dx_1}{\lambda}\frac{dx_2}{\lambda}\frac{dx_3}{\lambda} \dots \frac{dx_N}{\lambda} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above expression represents the probability for a particular sequence of events in which an interaction occurs after a distance &amp;lt;math&amp;gt;dx_1&amp;lt;/math&amp;gt;, then an interaction after &amp;lt;math&amp;gt;dx_2&amp;lt;/math&amp;gt; , &amp;lt;math&amp;gt;\dots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So in essence the above expression is a &amp;quot;probability element&amp;quot; where another probability element may be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; P(N,x, \lambda)=\frac{dx_2}{\lambda}\frac{dx_1}{\lambda}\frac{dx_3}{\lambda} \dots \frac{dx_N}{\lambda} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the first interaction occurs after the distance &amp;lt;math&amp;gt;x_2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= \Pi_{i=1}^{N} \left [ \frac{dx_i}{\lambda} \right ] e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So we can write a differential probability element which we need to add up as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;d^NP(N,x, \lambda)=\frac{1}{N!} \Pi_{i=1}^{N} \left [ \frac{dx_i}{\lambda} \right ] e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The N! accounts for the degeneracy: the N! permutations of the interaction points correspond to only one distinct combination, i.e., we would be over-counting when we integrate.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Using the integral formula&lt;br /&gt;
: &amp;lt;math&amp;gt; \Pi_{i=1}^{N} \left [\int_0^x \frac{dx_i}{\lambda} \right ]= \left [ \frac{x}{\lambda}\right]^N&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
we end up with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(N,x, \lambda) = \frac{\left [ \frac{x}{\lambda}\right]^N}{N!} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
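&lt;br /&gt;
The result can be checked with a short Monte Carlo (an illustrative sketch of the argument above, not a required part of the derivation): draw exponentially distributed free paths with &amp;lt;math&amp;gt;\lambda = 1&amp;lt;/math&amp;gt;, count how many interactions fall within a length &amp;lt;math&amp;gt;x = 3\lambda&amp;lt;/math&amp;gt;, and compare the frequencies with &amp;lt;math&amp;gt;P(N,x,\lambda)&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::mt19937 gen(0);&lt;br /&gt;
    std::exponential_distribution&amp;lt;double&amp;gt; step(1.0);   // free paths with lambda = 1&lt;br /&gt;
    double x = 3.0;                                      // observation length: x/lambda = 3&lt;br /&gt;
    int counts[12] = {0}, trials = 100000;&lt;br /&gt;
    for (int t = 0; t &amp;lt; trials; t++) {&lt;br /&gt;
       int N = 0;&lt;br /&gt;
       for (double s = step(gen); s &amp;lt; x; s += step(gen)) N++;   // interactions inside [0,x]&lt;br /&gt;
       if (N &amp;lt; 12) counts[N]++;&lt;br /&gt;
    }&lt;br /&gt;
    for (int N = 0; N &amp;lt; 8; N++)&lt;br /&gt;
       printf(&amp;quot;N=%d  simulated=%.4f  Poisson=%.4f\n&amp;quot;, N,&lt;br /&gt;
              counts[N] / (double)trials, exp(N * log(x) - x - lgamma(N + 1.0)));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;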
&lt;br /&gt;
====Mean of Poisson Dist====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \sum_{i=1}^{\infty} i P(i,x, \lambda)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_{i=1}^{\infty} i \frac{\left [ \frac{x}{\lambda}\right]^i}{i!} e^{\frac{-x}{\lambda}} = \frac{x}{\lambda} \sum_{i=1}^{\infty}  \frac{\left [ \frac{x}{\lambda}\right]^{(i-1)}}{(i-1)!} e^{\frac{-x}{\lambda}} = \frac{x}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
since the remaining sum is just &amp;lt;math&amp;gt;e^{\frac{x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Identifying &amp;lt;math&amp;gt;\mu = \frac{x}{\lambda}&amp;lt;/math&amp;gt; (and relabeling the count) gives back the Poisson distribution&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x,\mu) = \frac{\mu^x e^{-\mu}}{x!} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance of Poisson Dist====&lt;br /&gt;
&lt;br /&gt;
For [http://wiki.iac.isu.edu/index.php/TF_ErrAna_Homework#Poisson_Prob_Dist Homework] you will show, in a manner similar to the above mean calculation, that the variance of the Poisson distribution is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Gaussian===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gaussian (Normal) distribution is an approximation of the Binomial distribution for the case of a large number of possible different observations.  The Poisson distribution approximates the binomial distribution for the case &amp;lt;math&amp;gt;p \ll 1&amp;lt;/math&amp;gt; ( the average number of successes &amp;lt;math&amp;gt;\mu = np&amp;lt;/math&amp;gt; is a lot smaller than the number of trials ).&lt;br /&gt;
&lt;br /&gt;
The Gaussian distribution is accepted as one of the most likely distributions to describe measurements.&lt;br /&gt;
&lt;br /&gt;
A Gaussian distribution which is normalized such that its integral is unity is referred to as the Normal distribution.  You could mathematically construct a Gaussian distribution which is not normalized to unity (this is often done when fitting experimental data).&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_G(x,\mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{x -\mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; = probability of observing &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian parent distribution with a mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; and standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Half-Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; (a.k.a. Full Width at Half Max)====&lt;br /&gt;
&lt;br /&gt;
The half width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; is used to describe the range of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; through which the distribution's amplitude decreases to half of its maximum value.&lt;br /&gt;
&lt;br /&gt;
;ie: &amp;lt;math&amp;gt;P_G(\mu \pm \frac{\Gamma}{2}, \mu, \sigma) = \frac{P_G(\mu,\mu,\sigma)}{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Side note:the point of steepest descent is located at &amp;lt;math&amp;gt;x = \mu \pm \sigma&amp;lt;/math&amp;gt;  such that &lt;br /&gt;
&lt;br /&gt;
; &amp;lt;math&amp;gt;P_G(\mu \pm \sigma, \mu, \sigma) = e^{-1/2} P_G(\mu,\mu,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
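&lt;br /&gt;
Solving the defining condition above gives the relation between &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; (a one-line check added here):&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^{-\frac{1}{2} \left ( \frac{\Gamma/2}{\sigma} \right)^2} = \frac{1}{2} \;\; \Rightarrow \;\; \Gamma = 2\sqrt{2 \ln 2}\, \sigma \approx 2.354\, \sigma&amp;lt;/math&amp;gt;&lt;br /&gt;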
&lt;br /&gt;
==== Probable Error (P.E.)====&lt;br /&gt;
&lt;br /&gt;
The probable error is the range of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in which half of the observations (values of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;) are expected to fall.&lt;br /&gt;
&lt;br /&gt;
; &amp;lt;math&amp;gt;x= \mu \pm P.E.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Binomial with Large N becomes Gaussian====&lt;br /&gt;
&lt;br /&gt;
Consider the binomial distribution in which a fair coin is tossed a large number of times (N is very large and an EVEN number, N=2n)&lt;br /&gt;
&lt;br /&gt;
What is the probability you get exactly &amp;lt;math&amp;gt;\frac{1}{2}N -s&amp;lt;/math&amp;gt; heads and &amp;lt;math&amp;gt;\frac{1}{2}N +s&amp;lt;/math&amp;gt; tails where s is an integer?&lt;br /&gt;
&lt;br /&gt;
The Binomial Probability distribution is given as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = {N\choose x}p^{x}q^{N-x}  = \frac{N!}{x!(N-x)!}p^{x}q^{N-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
p = probability of success= 1/2&lt;br /&gt;
&lt;br /&gt;
q= 1-p = 1/2&lt;br /&gt;
&lt;br /&gt;
N = number of trials =2n&lt;br /&gt;
&lt;br /&gt;
x= number of successes=n-s&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(n-s)   = \frac{(2n)!}{(n-s)!(2n-n+s)!}p^{n-s}q^{2n-n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!}p^{n-s}q^{n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{n-s} \left(\frac{1}{2}\right)^{n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now let's cast this probability with respect to the probability that we get an even number of heads and tails by defining the following ratio R such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \frac{P_B(n-s)}{P_B(n)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x=n) = \frac{N!}{n!(N-n)!}p^{n}q^{N-n} = \frac{(2n)!}{n!(n)!}p^{n}q^{n} =  \frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R = \frac{\frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{2n}}{\frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n}} = \frac{n! n!}{(n-s)! (n+s)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Take the natural logarithm of both sides&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \ln (R) = \ln \left ( \frac{n! n!}{(n-s)! (n+s)!} \right) = \ln(n!)+\ln(n!) - \ln\left[(n-s)!\right ] - \ln \left[(n+s)!\right] = 2 \ln(n!) - \ln\left [ (n-s)! \right ] - \ln \left [ (n+s)! \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Stirling's Approximation says&lt;br /&gt;
:&amp;lt;math&amp;gt;n! \sim \left (2 \pi n\right)^{1/2} n^n e^{-n}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow &amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;\ln(n!) \sim  \ln \left [ \left (2 \pi n\right)^{1/2} n^n e^{-n}\right ] = \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +\ln\left [ n^n \right ] + \ln \left [e^{-n}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n\ln\left [ n \right ] + (-n)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n(\ln\left [ n \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\ln\left [(n-s)! \right ] \sim   \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n-s)^{1/2} \right  ] + (n-s)(\ln\left [ (n-s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\ln\left [(n+s)! \right ] \sim   \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n+s)^{1/2} \right  ] + (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \ln (R) = 2 \times  \left (\ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n(\ln\left [ n \right ] -1 ) \right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;- \left ( \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n-s)^{1/2} \right  ] + (n-s)(\ln\left [ (n-s) \right ] -1 )\right )&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt; -\left ( \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n+s)^{1/2} \right  ] + (n+s)(\ln\left [ (n+s) \right ] -1 )\right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \ln  \left [ n^{1/2} \right  ] +2 n(\ln\left [ n \right ] -1 )  - \ln  \left [ (n-s)^{1/2} \right  ] - (n-s)(\ln\left [ (n-s) \right ] -1 ) -\ln  \left [ (n+s)^{1/2} \right  ] - (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln  \left [ n^{1/2} \right  ] \approx \ln  \left [ (n-s)^{1/2} \right  ]  \approx \ln  \left [ (n+s)^{1/2} \right  ]&amp;lt;/math&amp;gt; for large n (&amp;lt;math&amp;gt;s \ll n&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \ln (R) =  2 n(\ln\left [ n \right ] -1 )   - (n-s)(\ln\left [ (n-s) \right ] -1 ) - (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =2 n(\ln\left [ n \right ] -1 )   - (n-s)(\ln\left [ n(1-s/n) \right ] -1 ) - (n+s)(\ln\left [ n(1+s/n) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2n \ln (n) - 2n - (n-s) \left [ \ln (n) + \ln (1-s/n) -1\right ] - (n+s) \left [ \ln (n) + \ln (1+s/n) -1\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  - 2n - (n-s) \left [  \ln (1-s/n) -1\right ] - (n+s) \left [  \ln (1+s/n) -1\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=   - (n-s) \left [  \ln (1-s/n) \right ] - (n+s) \left [  \ln (1+s/n) \right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;-1 &amp;lt; s/n \le 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln (1+s/n) = s/n - \frac{s^2}{2n^2} + \frac{s^3}{3 n^3} \dots&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln(R) =- (n-s) \left [    -s/n - \frac{s^2}{2n^2} - \frac{s^3}{3 n^3} \right ] - (n+s) \left [   s/n - \frac{s^2}{2n^2} + \frac{s^3}{3 n^3}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= - \frac{s^2}{n} = - \frac{2s^2}{N}&amp;lt;/math&amp;gt; (keeping terms up to order &amp;lt;math&amp;gt;s^2/n&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;R \sim e^{-2s^2/N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
as a result &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(n-s) = R P_B(n)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_B(x=n)=  \frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n} =  \frac{ \left (2 \pi 2n\right)^{1/2} (2n)^{2n} e^{-2n} }{\left(\left (2 \pi n\right)^{1/2} n^n e^{-n}\right ) \left ( \left (2 \pi n\right)^{1/2} n^n e^{-n}\right)}  \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left(\frac{1}{\pi n} \right )^{1/2} = \left(\frac{2}{\pi N} \right )^{1/2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(n-s) = \left(\frac{2}{\pi N} \right )^{1/2} e^{-2s^2/N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In binomial distributions&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = Npq = \frac{N}{4}&amp;lt;/math&amp;gt; for this problem&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;N = 4 \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(n-s) = \left(\frac{2}{\pi 4 \sigma^2} \right )^{1/2} e^{-2s^2/N} = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{2s^2}{4 \sigma^2}}  = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{s}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; &lt;br /&gt;
= probability of exactly &amp;lt;math&amp;gt;(\frac{N}{2} -s)&amp;lt;/math&amp;gt; heads AND &amp;lt;math&amp;gt;(\frac{N}{2} +s)&amp;lt;/math&amp;gt; tails after flipping the coin N times (N is an even number and s is an integer).&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;x = n-s&amp;lt;/math&amp;gt;  and realize that  for a binomial distributions&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mu = Np = N/2 = n&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{n-x}{\sigma} \right) ^2} = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
; So when N gets big the Gaussian distribution is a good approximation to the Binomial&lt;br /&gt;
&lt;br /&gt;
==== Gaussian approximation to Poisson when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt; ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(r) = \frac{\mu^r e^{-\mu}}{r!}&amp;lt;/math&amp;gt; = Poisson probability distribution&lt;br /&gt;
&lt;br /&gt;
substitute&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;x \equiv r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) = \frac{\mu^{x + \mu} e^{-\mu}}{(x+\mu)!} = e^{-\mu} \frac{\mu^{\mu} \mu^x}{(\mu + x)!} = e^{-\mu} \mu^{\mu}\frac{\mu^x}{(\mu)! (\mu+1) \dots (\mu+x)}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  e^{-\mu} \frac{\mu^{\mu}}{\mu!} \left [ \frac{\mu}{(\mu+1)} \cdot  \frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^{-\mu} \frac{\mu^{\mu}}{\mu!}  = e^{-\mu} \frac{\mu^{\mu}}{\sqrt{2 \pi \mu} \mu^{\mu}e^{-\mu}}=  \frac{1}{\sqrt{2 \pi \mu}}&amp;lt;/math&amp;gt; '''Stirling's Approximation when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt;''' &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} \left [ \frac{\mu}{(\mu+1)} \cdot  \frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left [ \frac{\mu}{(\mu+1)} \cdot\frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ]  = \frac{1}{1 + \frac{1}{\mu}} \cdot  \frac{1}{1 + \frac{2}{\mu}} \dots  \frac{1}{1 + \frac{x}{\mu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;e^{x/\mu} \approx 1 + \frac{x}{\mu}&amp;lt;/math&amp;gt;  : if &amp;lt;math&amp;gt;x/\mu \ll 1&amp;lt;/math&amp;gt;  Note:&amp;lt;math&amp;gt;x \equiv r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} \left [  \frac{1}{1 + \frac{1}{\mu}} \cdot  \frac{1}{1 + \frac{2}{\mu}} \dots  \frac{1}{1 + \frac{x}{\mu}} \right ]  = \frac{1}{\sqrt{2 \pi \mu}} \left [  e^{-1/\mu} \times e^{-2/\mu} \cdots e^{-x/\mu}  \right ] = \frac{1}{\sqrt{2 \pi \mu}} e^{-1 \left[ \frac{1}{\mu} +\frac{2}{\mu} \cdots \frac{x}{\mu}  \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \sum_1^x i \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
another mathematical identity&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i=1}^{x} i = \frac{x}{2}(1+x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x}{2}(1+x) \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
if &amp;lt;math&amp;gt; x \gg 1&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{2}(1+x) \approx \frac{x^2}{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x^2}{2} \right ]} = \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-x^2}{2\mu} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the Poisson distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
replacing the dummy variable &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(r) =  \frac{1}{\sqrt{2 \pi \sigma^2}} e^{\frac{-(r - \mu)^2}{2\sigma^2} } =\frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{r -\mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; = Gaussian distribution when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Integral Probability (Cumulative Distribution Function)====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Poisson and Binomial distributions are discrete probability distributions (integers).&lt;br /&gt;
&lt;br /&gt;
The Gaussian distribution is our first continuous distribution as the variables are real numbers.  It is not very meaningful to speak of the probability that the variate (x) assumes a specific value.  &lt;br /&gt;
&lt;br /&gt;
One could consider defining a  probability element &amp;lt;math&amp;gt;A_G&amp;lt;/math&amp;gt; which is really an integral over a finite region &amp;lt;math&amp;gt;\Delta x&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;A_G(\Delta x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} \int_{\mu - \Delta x}^{\mu + \Delta x} e^{- \frac{1}{2} \left ( \frac{x - \mu}{\sigma}\right )^2} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The advantage of this definition becomes apparent when you are interested in quantifying the probability that a measurement would fall outside a range &amp;lt;math&amp;gt;\Delta x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G( X &amp;lt; \mu - \Delta x \mbox{ or } X &amp;gt; \mu + \Delta x) = 1 - A_G(\Delta x, \mu, \sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Cumulative Distribution Function (CDF), however, is defined in terms of the integral from the variates min value&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;CDF(x) \equiv \int_{x_{min}}^{x} P_G( x', \mu, \sigma)dx' = \int_{-\infty}^{x} P_G( x', \mu, \sigma)dx' =  P_G(X \le x) =&amp;lt;/math&amp;gt; Probability that you measure a value less than or equal to &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== discrete CDF example =====&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
The probability that a student fails this class is 7.3%.&lt;br /&gt;
&lt;br /&gt;
What is the probability that 5 or more students will fail in a class of 32 students?&lt;br /&gt;
&lt;br /&gt;
Answ:  &amp;lt;math&amp;gt;P_B(x\ge 5) = \sum_{x=5}^{32} P_B(x) = 1- \sum_{x=0}^4 P_B(x) = 1 - CDF(x \le 4) &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= 1 - P_B(x=0)- P_B(x=1)- P_B(x=2)- P_B(x=3)- P_B(x=4)&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= 1 - 0.088 - 0.223 - 0.272 - 0.214 - 0.122 = 0.08&amp;lt;/math&amp;gt; = 8%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There is an 8% probability that 5 or more students will fail the class&lt;br /&gt;
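&lt;br /&gt;
The same sum is easy to reproduce numerically; a short C++ sketch (an added illustration, again using lgamma from cmath):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    int n = 32;  double p = 0.073, cdf = 0;&lt;br /&gt;
    for (int x = 0; x &amp;lt;= 4; x++) {   // accumulate CDF(x &amp;lt;= 4)&lt;br /&gt;
       double logC = lgamma(n + 1.0) - lgamma(x + 1.0) - lgamma(n - x + 1.0);&lt;br /&gt;
       cdf += exp(logC + x * log(p) + (n - x) * log(1.0 - p));&lt;br /&gt;
    }&lt;br /&gt;
    printf(&amp;quot;P(x &amp;gt;= 5) = %.3f\n&amp;quot;, 1.0 - cdf);   // prints ~0.081&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;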
&lt;br /&gt;
===== 2 SD rule of thumb for Gaussian PDF =====&lt;br /&gt;
&lt;br /&gt;
In the above example you calculated the probability that 5 or more students will fail a class.  You can extend this principle to calculate the probability of taking a measurement which deviates substantially from the expected mean value.  &lt;br /&gt;
&lt;br /&gt;
One of the more common consistency checks you can make on a sample data set which you expect to be from a Gaussian distribution is to ask how many data points appear more than 2 S.D. (&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) from the mean value.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CDF for this is&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G(X \le \mu - 2 \sigma, \mu , \sigma ) = \int_{-\infty}^{\mu - 2\sigma} P_G(x, \mu, \sigma) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^{\mu - 2\sigma} e^{- \frac{1}{2} \left ( \frac{x - \mu}{\sigma}\right )^2} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;z = \frac{x-\mu}{\sigma}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;dz = \frac{dx}{\sigma}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow P_G(X \le \mu - 2 \sigma, \mu , \sigma ) =  \frac{1}{ \sqrt{2 \pi}} \int_{-\infty}^{-2} e^{- \frac{z^2}{2} } dz&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above integral has no closed-form antiderivative; it can be evaluated numerically, for example by expanding the exponential in a power series&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow e^{-x} = 1 -x + \frac{x^2}{2!} - \frac{x^3}{3!} \cdots&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow e^{-z^2/2} = 1 -\frac{z^2}{2}+ \frac{z^4}{8} - \frac{z^6}{48} \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G(X \le \mu - 2 \sigma, \mu , \sigma ) = \frac{1}{2} + \frac{1}{ \sqrt{2 \pi}} \int_{0}^{-2}  \left ( 1 -\frac{z^2}{2}+ \frac{z^4}{8} - \frac{z^6}{48} \cdots \right)dz&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{1}{2} + \left . \frac{1}{ \sqrt{2 \pi}} \left ( z  -\frac{z^3}{6}+ \frac{z^5}{40} - \frac{z^7}{336} \cdots   \right ) \right |_{0}^{-2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{2} + \left .  \frac{1}{\sqrt{\pi}} \sum_{j=0}^{\infty} \frac{(-1)^j \left (\frac{z}{\sqrt{2}} \right)^{2j+1}}{j! (2j+1)} \right |_{z=-2} \approx 0.023&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no analytic expression for this probability, but it is straightforward to compute numerically.&lt;br /&gt;
&lt;br /&gt;
Below is a table representing the cumulative probability &amp;lt;math&amp;gt;P_G(x&amp;lt; \mu - \delta \mbox{ or }  x&amp;gt; \mu + \delta , \mu, \sigma)&amp;lt;/math&amp;gt; for events to occur outside an interval of &amp;lt;math&amp;gt;\pm \delta&amp;lt;/math&amp;gt; in a Gaussian distribution&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;P_G(x&amp;lt; \mu - \delta \mbox{ or } x&amp;gt; \mu + \delta , \mu, \sigma)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;3.2 \times 10^{-1}&amp;lt;/math&amp;gt; ||1&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;4.6 \times 10^{-2}&amp;lt;/math&amp;gt; ||2&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;2.7 \times 10^{-3}&amp;lt;/math&amp;gt; ||3&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;6.3 \times 10^{-5}&amp;lt;/math&amp;gt; ||4&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
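&lt;br /&gt;
The table entries can be reproduced with the complement of the error function, since &amp;lt;math&amp;gt;P_G(|x - \mu| &amp;gt; \delta \sigma) = \mbox{erfc}(\delta/\sqrt{2})&amp;lt;/math&amp;gt;; a one-loop C++ check (an added sketch; erfc is part of the standard cmath header):&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    for (int d = 1; d &amp;lt;= 4; d++)&lt;br /&gt;
       printf(&amp;quot;%d sigma : %.2e\n&amp;quot;, d, erfc(d / sqrt(2.0)));   // two-sided tail probability&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;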
&lt;br /&gt;
&lt;br /&gt;
[[File:TF_Error_CDF_Gauss.png| 400 px]]&lt;br /&gt;
&lt;br /&gt;
===Cauchy/Lorentzian/Breit-Wigner Distribution===&lt;br /&gt;
In mathematics, the Cauchy distribution is written as&lt;br /&gt;
:&amp;lt;math&amp;gt;P_{CL}(x, x_0, \Gamma) = \frac{1}{\pi} \frac{\Gamma/2}{(x -x_0)^2 + (\Gamma/2)^2}&amp;lt;/math&amp;gt; = Cauchy-Lorentzian Distribution&lt;br /&gt;
&lt;br /&gt;
:Note: The probability does not fall to zero as rapidly as a Gaussian.  As a result, the Gaussian's central peak contributes more to the total area than the Lorentzian's.&lt;br /&gt;
&lt;br /&gt;
This distribution happens to be a solution to physics problems involving forced resonances (spring systems driven by a source, or a nuclear interaction which induces a metastable state).&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_{BW} = \sigma(E)=  \frac{1}{2\pi}\frac{\Gamma}{(E-E_0)^2 + (\Gamma/2)^2}&amp;lt;/math&amp;gt; = Breit-Wigner distribution&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;E_0 =&amp;lt;/math&amp;gt; mass resonance&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma = &amp;lt;/math&amp;gt;FWHM&lt;br /&gt;
: &amp;lt;math&amp;gt;\Delta E \Delta t = \Gamma \tau = \frac{h}{2 \pi}&amp;lt;/math&amp;gt; = uncertainty principle&lt;br /&gt;
:&amp;lt;math&amp;gt;\tau=&amp;lt;/math&amp;gt;lifetime of resonance/intermediate state particle&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A Breit-Wigner fit to a cross section measured as a function of energy allows one to quantify the rate increase produced when the probing energy excites a resonant state of mass &amp;lt;math&amp;gt;E_0&amp;lt;/math&amp;gt; that lives for a time &amp;lt;math&amp;gt;\tau&amp;lt;/math&amp;gt; derived from the Half Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== mean====&lt;br /&gt;
&lt;br /&gt;
The mean is not defined; the defining integral &amp;lt;math&amp;gt;\int x P_{CL}(x) dx&amp;lt;/math&amp;gt; does not converge.&lt;br /&gt;
&lt;br /&gt;
Mode = Median = &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;E_0&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The variance is also not defined but rather the distribution is parameterized in terms of the Half Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;z = \frac{x-x_0}{\Gamma/2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\Gamma^2}{4\pi} \int_{-\infty}^{\infty} \frac{z^2}{1+z^2} dz&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above integral does not converge for large deviations &amp;lt;math&amp;gt;(x -x_0)&amp;lt;/math&amp;gt;.  The width of the distribution is instead characterized by &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; = FWHM&lt;br /&gt;
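&lt;br /&gt;
This non-convergence is easy to see in a simulation (an added sketch, not part of the derivation): the running sample variance of Cauchy-distributed numbers never settles toward a limit, unlike for a Gaussian sample:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::mt19937 gen(1);&lt;br /&gt;
    std::cauchy_distribution&amp;lt;double&amp;gt; c(0.0, 1.0);   // x_0 = 0, Gamma/2 = 1&lt;br /&gt;
    double sum = 0, sum2 = 0;&lt;br /&gt;
    for (long i = 1; i &amp;lt;= 1000000; i++) {&lt;br /&gt;
       double x = c(gen);&lt;br /&gt;
       sum += x;  sum2 += x * x;&lt;br /&gt;
       if (i % 100000 == 0)   // the running sample variance never settles down&lt;br /&gt;
          printf(&amp;quot;n=%7ld   s^2 = %g\n&amp;quot;, i, sum2 / i - (sum / i) * (sum / i));&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;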
&lt;br /&gt;
===Landau===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_L(x) = \frac{1}{2 \pi i} \int_{c-i\infty}^{c+i\infty}\! e^{s \log s + x s}\, ds &amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; is any positive real number.&lt;br /&gt;
&lt;br /&gt;
To simplify computation it is more convenient to use the equivalent expression&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_L(x) = \frac{1}{\pi} \int_0^\infty\! e^{-t \log t - x t} \sin(\pi t)\, dt.&amp;lt;/math&amp;gt;&lt;br /&gt;
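&lt;br /&gt;
Since the integrand falls off superexponentially through &amp;lt;math&amp;gt;e^{-t \log t}&amp;lt;/math&amp;gt;, the second form can be evaluated with a crude quadrature.  The sketch below is an added illustration (the cutoff t = 50 and step size are ad-hoc choices); ROOT users could instead call TMath::Landau:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 // crude rectangle-rule evaluation of the integral above&lt;br /&gt;
 double landau(double x) {&lt;br /&gt;
    const double PI = acos(-1.0);&lt;br /&gt;
    double sum = 0, dt = 1e-4;&lt;br /&gt;
    for (double t = dt; t &amp;lt; 50.0; t += dt)   // integrand is negligible past t ~ 50&lt;br /&gt;
       sum += exp(-t * log(t) - x * t) * sin(PI * t) * dt;&lt;br /&gt;
    return sum / PI;&lt;br /&gt;
 }&lt;br /&gt;
 int main() {&lt;br /&gt;
    for (double x = -2; x &amp;lt;= 4; x += 1)&lt;br /&gt;
       printf(&amp;quot;P_L(%+.0f) = %.4f\n&amp;quot;, x, landau(x));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;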
&lt;br /&gt;
&lt;br /&gt;
The above distribution was derived by Landau (L. Landau, &amp;quot;On the Energy Loss of Fast Particles by Ionization&amp;quot;, J. Phys., vol 8 (1944), pg 201 ) to describe the energy loss by particles traveling through thin material ( materials with a thickness on the order of a few radiation lengths).&lt;br /&gt;
&lt;br /&gt;
Bethe-Bloch derived an expression to determine the AVERAGE amount of energy lost by a particle traversing a given material &amp;lt;math&amp;gt;(\frac{dE}{dx})&amp;lt;/math&amp;gt; assuming several collisions which span the physical limits of the interaction.&lt;br /&gt;
&lt;br /&gt;
For the case of thin absorbers, the number of collisions is so small that the central limit theorem used to average over many collisions doesn't apply, and there is a finite probability of observing large energy losses.&lt;br /&gt;
&lt;br /&gt;
As a result one would expect a distribution which is Gaussian like but with a &amp;quot;tail&amp;quot; on the &amp;lt;math&amp;gt;\mu + \sigma&amp;lt;/math&amp;gt; side of the distribution.&lt;br /&gt;
&lt;br /&gt;
===Gamma===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{\gamma}(x,k,\theta) = x^{k-1} \frac{e^{-x/\theta}}{\theta^k \, \Gamma(k)}\text{ for } x &amp;gt; 0\text{ and }k, \theta &amp;gt; 0.\,&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The distribution is used for &amp;quot;waiting time&amp;quot; models.  How long do you need to wait for a rain storm, how long do you need to wait to die,...&lt;br /&gt;
&lt;br /&gt;
Climatologists use this for predicting how rain fluctuates from season to season.&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is an integer then the above distribution describes the sum of &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; independent exponentially distributed waiting times, and its cumulative distribution function is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; CDF_{\gamma}(x,k,\theta) = 1 - e^{-x/\theta} \sum_{j=0}^{k-1}\frac{1}{j!}\left ( \frac{x}{\theta}\right)^j &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = k \theta&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = k \theta^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Properties====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{x \rightarrow 0} P_{\gamma}(x,k,\theta) = \left \{  {\infty \;\;\;\; k &amp;lt;1 \atop 0 \;\;\;\; k&amp;gt;1} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\theta} \;\; \mbox{if } k=1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Beta===&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{\beta}(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1-u)^{\beta-1}\, du} \!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt;= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt;= \frac{1}{\mathrm{B}(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \frac{\alpha}{\alpha + \beta}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\alpha \beta }{(\alpha + \beta)^2 (\alpha + \beta + 1)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Exponential===&lt;br /&gt;
&lt;br /&gt;
The exponential distribution describes the waiting time between events in a Poisson process (e.g., the time between radioactive decays)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{e}(x,\lambda) = \left \{  {\lambda e^{-\lambda x} \;\;\;\; x \ge 0\atop 0 \;\;\;\; x&amp;lt;0} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; CDF_{e}(x,\lambda) = \left \{  {1-e^{-\lambda x} \;\;\;\; x \ge 0\atop 0 \;\;\;\; x&amp;lt;0} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \frac{1}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{1}{\lambda^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
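&lt;br /&gt;
Inverting the CDF above gives a standard recipe for generating exponential random numbers from uniform ones: if u is uniform on (0,1) then &amp;lt;math&amp;gt;x = -\ln(1-u)/\lambda&amp;lt;/math&amp;gt; follows &amp;lt;math&amp;gt;P_e&amp;lt;/math&amp;gt;.  A short C++ sketch (an added illustration; &amp;lt;math&amp;gt;\lambda = 2&amp;lt;/math&amp;gt; is an arbitrary choice) confirms the mean and variance:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::mt19937 gen(2);&lt;br /&gt;
    std::uniform_real_distribution&amp;lt;double&amp;gt; u(0.0, 1.0);&lt;br /&gt;
    double lambda = 2.0, sum = 0, sum2 = 0;  int n = 1000000;&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
       double x = -log(1.0 - u(gen)) / lambda;   // invert u = 1 - exp(-lambda x)&lt;br /&gt;
       sum += x;  sum2 += x * x;&lt;br /&gt;
    }&lt;br /&gt;
    double m = sum / n;&lt;br /&gt;
    printf(&amp;quot;mean = %.3f (expect %.3f)   variance = %.3f (expect %.3f)\n&amp;quot;,&lt;br /&gt;
           m, 1 / lambda, sum2 / n - m * m, 1 / (lambda * lambda));&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;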
&lt;br /&gt;
== Skewness and Kurtosis==&lt;br /&gt;
&lt;br /&gt;
Distributions may also be characterized by how they look in terms of Skewness and Kurtosis&lt;br /&gt;
&lt;br /&gt;
=== Skewness===&lt;br /&gt;
&lt;br /&gt;
Measures the symmetry of the distribution&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Skewness = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^3}{(N-1)s^3} = \frac{\mbox{3rd moment}}{(\mbox{2nd moment})^{3/2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;The larger the magnitude, the more asymmetric (skewed) the distribution; the closer to zero, the more symmetric.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A negative skewness indicates a tail on the left side of the distribution.&lt;br /&gt;
Positive skewness indicates a tail on the right.&lt;br /&gt;
&lt;br /&gt;
===Kurtosis===&lt;br /&gt;
&lt;br /&gt;
Measures the &amp;quot;peakedness&amp;quot; of the distribution&lt;br /&gt;
&lt;br /&gt;
Kurtosis = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^4}{(N-1)s^4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
K=3 for Normal Distribution&lt;br /&gt;
&lt;br /&gt;
In ROOT the Kurtosis entry in the statistics box is really the &amp;quot;excess kurtosis&amp;quot;, which is the kurtosis minus 3&lt;br /&gt;
&lt;br /&gt;
Excess Kurtosis = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^4}{(N-1)s^4} - 3&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case a positive excess kurtosis indicates a peak that is sharper than a Gaussian's, while a negative value indicates a peak that is flatter than a comparable Gaussian distribution.&lt;br /&gt;
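&lt;br /&gt;
The two sample formulas above translate directly into code.  The C++ sketch below (an added illustration) computes both for Gaussian random numbers, where each should come out near zero:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::mt19937 gen(3);&lt;br /&gt;
    std::normal_distribution&amp;lt;double&amp;gt; g(0.0, 1.0);&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x(100000);&lt;br /&gt;
    double xbar = 0;&lt;br /&gt;
    for (double &amp;amp;v : x) { v = g(gen); xbar += v; }&lt;br /&gt;
    xbar /= x.size();&lt;br /&gt;
    double m2 = 0, m3 = 0, m4 = 0;&lt;br /&gt;
    for (double v : x) {&lt;br /&gt;
       double d = v - xbar;&lt;br /&gt;
       m2 += d * d;  m3 += d * d * d;  m4 += d * d * d * d;&lt;br /&gt;
    }&lt;br /&gt;
    double n1 = x.size() - 1.0, s2 = m2 / n1;&lt;br /&gt;
    printf(&amp;quot;skewness = %f   excess kurtosis = %f\n&amp;quot;,&lt;br /&gt;
           m3 / (n1 * pow(s2, 1.5)), m4 / (n1 * s2 * s2) - 3.0);   // both ~0 for a Gaussian&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;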
&lt;br /&gt;
&lt;br /&gt;
[[File:ForeErrAna_Gaus-Cauchy_SkeKurt.gif|200 px]][[File:ForeErrAna_Gaus-Landau_SkeKurt.gif|200 px]][[File:ForeErrAna_Gaus-gamma_SkeKurt.gif|200 px]]&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Forest_ErrAna_StatDist&amp;diff=91109</id>
		<title>Forest ErrAna StatDist</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Forest_ErrAna_StatDist&amp;diff=91109"/>
		<updated>2014-01-31T20:18:44Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Standard Deviation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Parent Distribution==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; represent our ith attempt to measure the quantity &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Due to the random errors present in any experiment we should not expect &amp;lt;math&amp;gt;x_i = x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we neglect systematic errors, then we should expect &amp;lt;math&amp;gt; x_i&amp;lt;/math&amp;gt; to, on average, follow some probability distribution around the correct value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This probability distribution can be referred to as the &amp;quot;parent population&amp;quot;.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Average and Variance ==&lt;br /&gt;
===Average===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The word &amp;quot;average&amp;quot; is used to describe a property of a &amp;quot;parent&amp;quot; probability distribution or a set of observations/measurements made in an experiment which gives an indication of a likely outcome of an experiment.&lt;br /&gt;
&lt;br /&gt;
The symbol&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
is usually used to represent the &amp;quot;mean&amp;quot; of a known probability (parent) distribution (parent mean) while the &amp;quot;average&amp;quot; of a set of observations/measurements is denoted as &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and is commonly referred to as the &amp;quot;sample&amp;quot; average or &amp;quot;sample mean&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Definition of the mean&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu \equiv \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here the above average of a parent distribution is defined in terms of an infinite sum of observations (x_i) of an observable x divided by the number of observations.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; is a calculation of the mean using a finite number of observations&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \bar{x} \equiv  \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This definition uses the assumption that the result of an experiment, measuring a sample average of &amp;lt;math&amp;gt;(\bar{x})&amp;lt;/math&amp;gt;, asymptotically approaches the &amp;quot;true&amp;quot; average of the parent distribution &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; .&lt;br /&gt;
&lt;br /&gt;
===Variance===&lt;br /&gt;
&lt;br /&gt;
The word &amp;quot;variance&amp;quot; is used to describe a property of a probability distribution or a set of observations/measurements made in an experiment which gives an indication of how much an observation will deviate from the average value.&lt;br /&gt;
&lt;br /&gt;
A deviation &amp;lt;math&amp;gt;(d_i)&amp;lt;/math&amp;gt; of any measurement &amp;lt;math&amp;gt;(x_i)&amp;lt;/math&amp;gt; from a parent distribution with a mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; can be defined as &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;d_i\equiv x_i - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the deviations should average to ZERO for an infinite number of observations by definition of the mean.&lt;br /&gt;
&lt;br /&gt;
Definition of the average&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu \equiv \lim_{N\rightarrow \infty} \frac{\sum x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{N\rightarrow \infty} \frac{\sum (x_i - \mu)}{N}&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \lim_{N\rightarrow \infty} \frac{\sum x_i }{N}\right ) - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \left ( \lim_{N\rightarrow \infty} \frac{\sum x_i }{N}\right ) - \lim_{N\rightarrow \infty} \frac{\sum x_i}{N} = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
But the AVERAGE DEVIATION &amp;lt;math&amp;gt;(\bar{d})&amp;lt;/math&amp;gt; is given by an average of the magnitude of the deviations given by &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{d} = \lim_{N\rightarrow \infty} \frac{\sum \left | (x_i - \mu)\right |}{N}&amp;lt;/math&amp;gt; = a measure of the dispersion of the expected observations about the mean&lt;br /&gt;
&lt;br /&gt;
Taking the absolute value though is cumbersome when performing a statistical analysis so one may express this dispersion in terms of the variance&lt;br /&gt;
&lt;br /&gt;
A typical variable used to denote the variance is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \lim_{N\rightarrow \infty}\left [ \frac{\sum (x_i-\mu)^2 }{N}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Standard Deviation====&lt;br /&gt;
&lt;br /&gt;
The standard deviation is defined as the square root of the variance&lt;br /&gt;
&lt;br /&gt;
:S.D. = &amp;lt;math&amp;gt;\sqrt{\sigma^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The mean should be thought of as a parameter which characterizes the observations we are making in an experiment.  In general the mean specifies the probability distribution that is representative of the observable we are trying to measure through experimentation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The variance characterizes the uncertainty associated with our experimental attempts to determine the &amp;quot;true&amp;quot; value.  Although the mean and true value may not be equal, their difference should be less than the uncertainty given by the governing probability distribution.&lt;br /&gt;
&lt;br /&gt;
==== Another Expression for Variance====&lt;br /&gt;
&lt;br /&gt;
Using the definition of variance (omitting the limit as &amp;lt;math&amp;gt;n \rightarrow \infty&amp;lt;/math&amp;gt;) &lt;br /&gt;
;Evaluating the definition of variance: &amp;lt;math&amp;gt;\sigma^2 \equiv \frac{\sum(x_i-\mu)^2}{N} = \frac{\sum (x_i^2 -2x_i \mu + \mu^2)}{N} = \frac{\sum x_i^2}{N} - 2 \mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{\sum x_i^2}{N} -2 \mu^2 + \mu^2 =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N}  =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can recast the above in terms of expectation value where &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[x] \equiv \sum x_i P(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = E[(x-\mu)^2] = \sum_{i=1}^n (x_i - \mu)^2 P(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= E[x^2] - \left ( E[x]\right )^2 = \sum_{i=1}^n x_i^2 P(x_i) - \left ( \sum_{i=1}^n x_i P(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Average for an unknown probability distribution (parent population)==&lt;br /&gt;
&lt;br /&gt;
If the &amp;quot;Parent Population&amp;quot; is not known, you are just given a list of numbers with no indication of the probability distribution that they were drawn from, then the average and variance may be calculated as shown below.&lt;br /&gt;
&lt;br /&gt;
===Arithmetic Mean and variance===&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; observations are made in an experiment then the arithmetic mean of those observations is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\bar{x} = \frac{\sum_{i=1}^{i=N} x_i}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;unbiased&amp;quot; variance of the above sample is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If you were given the mean &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt; (rather than estimating it from the same sample) then you can calculate the&lt;br /&gt;
variance of the above sample as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\sum_{i=1}^{i=N} (x_i - \bar{x})^2}{N}&amp;lt;/math&amp;gt; = Mean Squared Error (its square root is the RMS error)&lt;br /&gt;
&lt;br /&gt;
;Note:RMS = Root Mean Square = &amp;lt;math&amp;gt;\sqrt{\frac{\sum_i^n x_i^2}{n}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Statistical Variance decreases with  N====&lt;br /&gt;
&lt;br /&gt;
The repetition of an experiment can decrease the STATISTICAL error of the experiment&lt;br /&gt;
&lt;br /&gt;
Consider the following:&lt;br /&gt;
&lt;br /&gt;
The average value of the mean of a sample of n observations drawn from the parent population is the same as the average value of each observation.  (The average of the averages  is the same as one of the averages)&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\bar{x} = \frac{\sum x_i}{N} =&amp;lt;/math&amp;gt; sample mean&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\overline{\left ( \bar{x} \right ) } = \frac{\sum{\bar{x}_i}}{N} =\frac{1}{N} N \bar{x_i} = \bar{x}&amp;lt;/math&amp;gt; if all means are the same&lt;br /&gt;
&lt;br /&gt;
This is the reason why the sample mean is a measure of the population average ( &amp;lt;math&amp;gt;\bar{x} \sim \mu&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
Now consider the variance of the average of the averages (this is not the variance of the individual measurements but the variance of their means)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}} = \frac{\sum \left (\bar{x} -\overline{\left ( \bar{x} \right ) }  \right )^2}{N} =\frac{\sum \bar{x_i}^2}{N}  -\left( \overline{\left ( \bar{x} \right ) }  \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{\sum \bar{x_i}^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{\sum \left( \sum \frac{x_i}{N}\right)^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2}\frac{\sum \left( \sum x_i\right)^2}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2}\frac{\sum \left (\sum x_i^2 + \sum_{i \ne j} x_ix_j \right )}{N}  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ \frac{\sum \left(\sum x_i^2 \right)}{N}  + \frac{ \sum \left (\sum_{i \ne j} x_ix_j \right )}{N} \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;If the measurements are all independent&lt;br /&gt;
:Then each cross term averages to &amp;lt;math&amp;gt; \overline{x_i x_j} = \bar{x} \cdot \bar{x} = \bar{x}^2&amp;lt;/math&amp;gt; : if &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; is independent of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (&amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
example (N=3):&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i \ne j} x_i x_j = x_1x_2 + x_1x_3 + x_2x_1+x_2x_3+x_3x_1+x_3x_2 = (x_1+x_2+x_3)^2 - (x_1^2+x_2^2+x_3^2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}}=\frac{1}{N^2} \left [ \frac{\sum \left(\sum x_i^2 \right)}{N}  + \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I use the expression &amp;lt;math&amp;gt;\sigma^2 = E[x^2] - \left ( E[x] \right)^2&amp;lt;/math&amp;gt;  again, this time for &amp;lt;math&amp;gt; x_i&amp;lt;/math&amp;gt; rather than &amp;lt;math&amp;gt;\bar{x}&amp;lt;/math&amp;gt;, and turn it around so&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{\left(\sum x_i^2 \right)}{N} = \sigma^2 + \left ( \frac{\sum x_i}{N}\right)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now I have&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2_{\bar{x}}=\frac{1}{N^2} \left [ \sum \left (\sigma^2 +  \left ( \frac{\sum x_i}{N} \right )^2 \right )+ \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + \sum_{i \ne j} \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + N(N-1) \bar{x}^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; Number of cross terms is N*(N-1)&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{N^2} \left [ N\sigma^2 +  N\left ( \frac{\sum x_i}{N} \right )^2 + (N^2 -N) \left ( \frac{\sum x_i}{N} \right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; Number of cross terms is N*(N-1)&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left [ \frac{\sigma^2}{N} +  \left ( \frac{\sum x_i}{N} \right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \left [ \frac{\sigma^2}{N} +  \left (  \bar{x}\right )^2 \right ]  -\left( \bar{x} \right )^2&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{\sigma^2}{N} &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above is the essence of counting statistics.&lt;br /&gt;
&lt;br /&gt;
It says that the STATISTICAL error in an experiment decreases as a function of &amp;lt;math&amp;gt;\frac{1}{\sqrt N}&amp;lt;/math&amp;gt;&lt;br /&gt;
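&lt;br /&gt;
A quick Monte Carlo illustrates the &amp;lt;math&amp;gt;\frac{1}{\sqrt N}&amp;lt;/math&amp;gt; behavior (an added sketch; the sample sizes and M = 20000 repetitions are arbitrary choices): draw M samples of size N from a unit-variance parent and measure the spread of the sample means:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;cmath&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::mt19937 gen(4);&lt;br /&gt;
    std::normal_distribution&amp;lt;double&amp;gt; g(0.0, 1.0);   // parent: mu = 0, sigma = 1&lt;br /&gt;
    int Ns[] = {4, 16, 64, 256};&lt;br /&gt;
    for (int N : Ns) {&lt;br /&gt;
       int M = 20000;                                 // M independent samples of size N&lt;br /&gt;
       double s = 0, s2 = 0;&lt;br /&gt;
       for (int j = 0; j &amp;lt; M; j++) {&lt;br /&gt;
          double xbar = 0;&lt;br /&gt;
          for (int i = 0; i &amp;lt; N; i++) xbar += g(gen);&lt;br /&gt;
          xbar /= N;&lt;br /&gt;
          s += xbar;  s2 += xbar * xbar;&lt;br /&gt;
       }&lt;br /&gt;
       printf(&amp;quot;N=%3d  sigma_xbar=%.4f  sigma/sqrt(N)=%.4f\n&amp;quot;,&lt;br /&gt;
              N, sqrt(s2 / M - (s / M) * (s / M)), 1.0 / sqrt((double)N));&lt;br /&gt;
    }&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;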
&lt;br /&gt;
==== Biased and Unbiased variance====&lt;br /&gt;
&lt;br /&gt;
Where does this idea of an unbiased variance come from?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Using the same procedure as the previous section let's look at the average variance of the variances.&lt;br /&gt;
&lt;br /&gt;
A sample variance of &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; measurements of &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; is&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma_n^2 = \frac{\sum(x_i-\bar{x})^2}{n} = E[x^2] - \left ( E[x] \right)^2 = \frac{\sum x_i^2}{n} -\left ( \bar{x} \right)^2&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To determine the &amp;quot;true&amp;quot; variance, consider taking the average of several sample variances (this is the same argument used above which led to &amp;lt;math&amp;gt;\overline{(\bar{x})} = \bar{x}&amp;lt;/math&amp;gt; )&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \sigma_i^2}{N} = E[\sigma_n^2] = E\left [ \frac{\sum_i x_i^2}{n} \right ] - E[\bar{x}^2]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{1}{n}\sum_i E[x_i^2] - \left [ \left ( E[\bar{x}] \right)^2  + \sigma_{\bar{x}}^2\right ]&amp;lt;/math&amp;gt;  :  as shown previously      &amp;lt;math&amp;gt;E[\bar{x}^2] = \left ( E[\bar{x}] \right )^2 + \sigma_{\bar{x}}^2&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left ( \sigma^2 + \mu^2 \right ) - \left [ \mu^2  + \frac{\sigma^2}{n}\right ]&amp;lt;/math&amp;gt; : using &amp;lt;math&amp;gt;E[x_i^2] = \sigma^2 + \mu^2&amp;lt;/math&amp;gt; , &amp;lt;math&amp;gt;E[\bar{x}] = \mu&amp;lt;/math&amp;gt; , and &amp;lt;math&amp;gt;\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}&amp;lt;/math&amp;gt; from the previous section&lt;br /&gt;
:&amp;lt;math&amp;gt;= \sigma^2 - \frac{\sigma^2}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{n-1}{n}\sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = \frac{n}{n-1}\frac{\sum \sigma_i^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 =&amp;lt;/math&amp;gt; the &amp;quot;true&amp;quot; population variance&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum \sigma_i^2}{N} =&amp;lt;/math&amp;gt; an average over all possible sample variances&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \frac{\sum \sigma_i^2}{N} \sim \frac{\sum (x_i-\bar{x})^2}{n}&amp;lt;/math&amp;gt; : a single sample variance is our best estimate of this average&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2 = \frac{n}{n-1}\frac{\sum(x_i-\bar{x})^2}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{\sum(x_i-\bar{x})^2}{n-1} =&amp;lt;/math&amp;gt; unbiased sample variance&lt;br /&gt;
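&lt;br /&gt;
The bias is easy to see numerically.  In the sketch below (an added illustration; n = 5 is chosen because small samples exaggerate the effect), samples are drawn from a unit-variance Gaussian; the average of &amp;lt;math&amp;gt;\sum(x_i-\bar{x})^2/n&amp;lt;/math&amp;gt; comes out near &amp;lt;math&amp;gt;\frac{n-1}{n} = 0.8&amp;lt;/math&amp;gt;, while dividing by n-1 recovers 1:&lt;br /&gt;
&lt;br /&gt;
 #include &amp;lt;cstdio&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 int main() {&lt;br /&gt;
    std::mt19937 gen(5);&lt;br /&gt;
    std::normal_distribution&amp;lt;double&amp;gt; g(0.0, 1.0);   // parent variance = 1&lt;br /&gt;
    const int n = 5;  int M = 200000;                // small samples exaggerate the bias&lt;br /&gt;
    double biased = 0, unbiased = 0;&lt;br /&gt;
    for (int j = 0; j &amp;lt; M; j++) {&lt;br /&gt;
       double x[n], xbar = 0, ss = 0;&lt;br /&gt;
       for (int i = 0; i &amp;lt; n; i++) { x[i] = g(gen); xbar += x[i]; }&lt;br /&gt;
       xbar /= n;&lt;br /&gt;
       for (int i = 0; i &amp;lt; n; i++) ss += (x[i] - xbar) * (x[i] - xbar);&lt;br /&gt;
       biased += ss / n;  unbiased += ss / (n - 1);&lt;br /&gt;
    }&lt;br /&gt;
    // averages come out near (n-1)/n = 0.8 and 1.0 respectively&lt;br /&gt;
    printf(&amp;quot;divide by n: %.3f   divide by n-1: %.3f\n&amp;quot;, biased / M, unbiased / M);&lt;br /&gt;
    return 0;&lt;br /&gt;
 }&lt;br /&gt;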
&lt;br /&gt;
==Probability Distributions==&lt;br /&gt;
&lt;br /&gt;
=== Mean(Expectation value) and variance===&lt;br /&gt;
====Mean of Discrete Probability Distribution====&lt;br /&gt;
&lt;br /&gt;
In the case that you know the probability distribution you can calculate the mean&amp;lt;math&amp;gt; (\mu)&amp;lt;/math&amp;gt; or expectation value E(x) and standard deviation as&lt;br /&gt;
&lt;br /&gt;
For a Discrete probability distribution&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mu = E[x]=\lim_{N \rightarrow \infty} \frac{\sum_{i=1}^n x_i N P(x_i)}{N} = \sum_{i=1}^n x_i P(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;N=&amp;lt;/math&amp;gt; number of observations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;n=&amp;lt;/math&amp;gt; number of different possible observable variables&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;x_i =&amp;lt;/math&amp;gt; ith observable quantity&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x_i) =&amp;lt;/math&amp;gt; probability of observing &amp;lt;math&amp;gt;x_i&amp;lt;/math&amp;gt; = Probability Mass Distribution for a discrete probability distribution&lt;br /&gt;
&lt;br /&gt;
====Mean of a continuous probability distribution====&lt;br /&gt;
The average (mean) of a sample drawn from any probability distribution is defined in terms of the expectation value E(x) such that&lt;br /&gt;
&lt;br /&gt;
The expectation value for a  continuous probability distribution  is calculated as&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\mu = E(x) = \int_{-\infty}^{\infty} x P(x)dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Variance===&lt;br /&gt;
&lt;br /&gt;
==== Variance of a discrete PDF====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = \sum_{i=1}^n \left [ (x_i - \mu)^2 P(x_i)\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance of a Continuous PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = \int_{-\infty}^{\infty} \left [ (x - \mu)^2 P(x)\right ]dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Expectation of Arbitrary function====&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;f(x)&amp;lt;/math&amp;gt; is an arbitrary function of a variable &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; governed by a probability distribution &amp;lt;math&amp;gt;P(x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then the expectation value of &amp;lt;math&amp;gt;f(x)&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[f(x)] = \sum_{i=1}^n f(x_i) P(x_i) &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
or, for a continuous distribution,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E[f(x)] = \int_{-\infty}^{\infty} f(x) P(x)dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Uniform===&lt;br /&gt;
&lt;br /&gt;
The Uniform probability distribution function is a continuous probability function over a specified interval in which any value within the interval has the same probability of occurring.&lt;br /&gt;
&lt;br /&gt;
Mathematically the uniform distribution over an interval from a to b is given by&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_U(x) =\left \{  {\frac{1}{b-a} \;\;\;\; x &amp;gt;a \mbox{ and } x &amp;lt;b \atop 0 \;\;\;\; x&amp;gt;b \mbox{ or } x &amp;lt; a} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean of Uniform PDF====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \int_{-\infty}^{\infty} xP_U(x)dx = \int_{a}^{b} \frac{x}{b-a} dx = \left . \frac{x^2}{2(b-a)} \right |_a^b = \frac{1}{2}\frac{b^2 - a^2}{b-a} = \frac{1}{2}(b+a)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance of Uniform PDF====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 P_U(x)dx = \int_{a}^{b} \frac{\left (x-\frac{b+a}{2}\right )^2}{b-a} dx = \left . \frac{(x -\frac{b+a}{2})^3}{3(b-a)} \right |_a^b &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{3(b-a)}\left [ \left (b -\frac{b+a}{2} \right )^3 - \left (a -\frac{b+a}{2} \right)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{3(b-a)}\left [ \left (\frac{b-a}{2} \right )^3 - \left (\frac{a-b}{2} \right)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{24(b-a)}\left [ (b-a)^3 - (-1)^3 (b-a)^3\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;=\frac{1}{12}(b-a)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now use ROOT to generate uniform distributions.&lt;br /&gt;
http://wiki.iac.isu.edu/index.php/TF_ErrAna_InClassLab#Day_3&lt;br /&gt;
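&lt;br /&gt;
For example, a minimal ROOT macro sketch along these lines (the binning, sample size, and seed are arbitrary choices) fills a histogram with uniform deviates and compares the sample mean and RMS with the predictions &amp;lt;math&amp;gt;(b+a)/2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(b-a)/\sqrt{12}&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void uniform_demo() {&lt;br /&gt;
   // fill a histogram with uniform deviates on (a,b)&lt;br /&gt;
   const double a = 0., b = 1.;&lt;br /&gt;
   TH1F h(&amp;quot;h&amp;quot;, &amp;quot;Uniform deviates&amp;quot;, 50, a, b);&lt;br /&gt;
   TRandom3 rng(0);&lt;br /&gt;
   for (int i = 0; i &amp;lt; 100000; i++) h.Fill(rng.Uniform(a, b));&lt;br /&gt;
   // expect mean = (a+b)/2 = 0.5 and RMS = (b-a)/sqrt(12) = 0.289&lt;br /&gt;
   printf(&amp;quot;mean = %f   RMS = %f\n&amp;quot;, h.GetMean(), h.GetRMS());&lt;br /&gt;
   h.Draw();&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;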
&lt;br /&gt;
===Binomial Distribution===&lt;br /&gt;
&lt;br /&gt;
A binomial random variable describes experiments in which the outcome has only 2 possibilities.   The two possible outcomes can be labeled as &amp;quot;success&amp;quot; or &amp;quot;failure&amp;quot;. The probabilities may be defined as &lt;br /&gt;
&lt;br /&gt;
;p&lt;br /&gt;
: the probability of a success&lt;br /&gt;
&lt;br /&gt;
and&lt;br /&gt;
&lt;br /&gt;
;q&lt;br /&gt;
:the probability of a failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; represent the number of successes after repeating the experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times&lt;br /&gt;
&lt;br /&gt;
Experiments with &amp;lt;math&amp;gt;n=1&amp;lt;/math&amp;gt; are also known as Bernoulli trials.&lt;br /&gt;
&lt;br /&gt;
Then &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is the Binomial random variable with parameters &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The number of ways in which the &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; successful outcomes can be organized in &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; repeated  trials is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{n !}{ \left [ (n-x) ! x !\right ]}&amp;lt;/math&amp;gt;  where the &amp;lt;math&amp;gt; !&amp;lt;/math&amp;gt; denotes a factorial such that &amp;lt;math&amp;gt;5! = 5\times4\times3\times2\times1&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The expression is known as the binomial coefficient and is represented as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;{n\choose x}=\frac{n!}{x!(n-x)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The probability of any one ordering of the successes and failures is given by &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P( \mbox{experimental ordering}) = p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This means the probability of getting exactly x successes in n trials is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = {n\choose x}p^{x}q^{n-x} &amp;lt;/math&amp;gt;&lt;br /&gt;
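&lt;br /&gt;
The PMF can be evaluated directly; below is a short sketch using ROOT's TMath::Binomial for the binomial coefficient (the values x=3, n=4, p=1/2 anticipate the coin example below):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
double binom_pmf(int x, int n, double p) {&lt;br /&gt;
   // P_B(x) = C(n,x) p^x q^(n-x)&lt;br /&gt;
   return TMath::Binomial(n, x)*TMath::Power(p, x)*TMath::Power(1.-p, n-x);&lt;br /&gt;
}&lt;br /&gt;
void binom_demo() {&lt;br /&gt;
   printf(&amp;quot;P_B(3;4,0.5) = %f\n&amp;quot;, binom_pmf(3, 4, 0.5)); // expect 0.25&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;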
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
It can be shown that the Expectation Value of the distribution is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = n p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \sum_{x=0}^n x P_B(x) = \sum_{x=0}^n x \frac{n!}{x!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt; =  \sum_{x=1}^n  \frac{n!}{(x-1)!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt;  : the x=0 term vanishes, so the summation now starts at x=1&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{x=1}^n  \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}q^{n-x}&amp;lt;/math&amp;gt;  : factor out &amp;lt;math&amp;gt;np&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{y=0}^{n-1}  \frac{(n-1)!}{(y)!(n-y-1)!} p^{y}q^{n-y-1}&amp;lt;/math&amp;gt; : change the summation index to y = x-1; the upper limit becomes n-1&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np \sum_{y=0}^{n-1}  \frac{(n-1)!}{(y)!(n-1-y)!} p^{y}q^{n-1-y}&amp;lt;/math&amp;gt; :&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np (q+p)^{n-1}&amp;lt;/math&amp;gt;  :definition of binomial expansion&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np 1^{n-1}&amp;lt;/math&amp;gt;  :q+p =1&lt;br /&gt;
:&amp;lt;math&amp;gt; =  np &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====variance ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = npq&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;Remember: &amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N} = \frac{\sum (x_i^2 -2x_i \mu + \mu^2)}{N} = \frac{\sum x_i^2}{N} - 2 \mu \frac{\sum x_i}{N} + \frac{N \mu^2}{N} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{\sum x_i^2}{N} -2 \mu^2 + \mu^2 =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\sum(x_i-\mu)^2}{N}  =\frac{\sum x_i^2}{N} - \mu^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \sigma^2 = E[(x-\mu)^2] = \sum_{x=0}^n (x_i - \mu)^2 P_B(x_i)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= E[x^2] - \left ( E[x]\right )^2 = \sum_{x=0}^n x_i^2 P_B(x_i) - \left ( \sum_{x=0}^n x_i P_B(x_i)\right )^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To calculate the variance of the Binomial distribution I will just calculate &amp;lt;math&amp;gt;E[x^2]&amp;lt;/math&amp;gt; and then subtract off &amp;lt;math&amp;gt;\left ( E[x]\right )^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;E[x^2] = \sum_{x=0}^n x^2 P_B(x)&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_{x=1}^n x^2 P_B(x)&amp;lt;/math&amp;gt;   : x=0 term is zero so no contribution&lt;br /&gt;
:&amp;lt;math&amp;gt;=\sum_{x=1}^n x^2 \frac{n!}{x!(n-x)!} p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{x=1}^n x \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let m=n-1 and y=x-1&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{y=0}^m (y+1) \frac{m!}{(y)!(m-y)!} p^{y}q^{m-y}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \sum_{y=0}^m (y+1) P_B(y)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( \sum_{y=0}^m y P_B(y) + \sum_{y=0}^m (1) P_B(y) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( mp + 1 \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= np \left ( (n-1)p + 1 \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = E[x^2] - \left ( E[x] \right)^2 = np \left ( (n-1)p + 1 \right) - (np)^2 = np(1-p) = npq&amp;lt;/math&amp;gt;&lt;br /&gt;
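&lt;br /&gt;
Both moments are easy to verify by sampling; a sketch using ROOT's binomial random number generator (n = 10 and p = 1/6 match the die example below):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void binom_moments() {&lt;br /&gt;
   TRandom3 rng(0);&lt;br /&gt;
   const int n = 10;&lt;br /&gt;
   const double p = 1./6.;&lt;br /&gt;
   TH1F h(&amp;quot;h&amp;quot;, &amp;quot;number of successes&amp;quot;, n+1, -0.5, n+0.5);&lt;br /&gt;
   for (int i = 0; i &amp;lt; 100000; i++) h.Fill(rng.Binomial(n, p));&lt;br /&gt;
   // expect mean = np = 1.67 and variance = npq = 1.39&lt;br /&gt;
   printf(&amp;quot;mean = %f   variance = %f\n&amp;quot;, h.GetMean(), h.GetRMS()*h.GetRMS());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;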
&lt;br /&gt;
=== Examples===&lt;br /&gt;
&lt;br /&gt;
==== The number of times a coin toss is heads.====&lt;br /&gt;
&lt;br /&gt;
The probability of a coin landing with the head of the coin facing up is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P = \frac{\mbox{number of desired outcomes}}{\mbox{number of possible outcomes}} = \frac{1}{2}&amp;lt;/math&amp;gt; = a discrete uniform distribution over the two outcomes a=0 (tails) and b=1 (heads).&lt;br /&gt;
&lt;br /&gt;
Suppose you toss a coin 4 times.  Here are the possible outcomes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|order Number&lt;br /&gt;
|colspan= &amp;quot;4&amp;quot; | Trial #&lt;br /&gt;
| # of Heads&lt;br /&gt;
|-&lt;br /&gt;
| || 1|| 2 || 3|| 4 ||&lt;br /&gt;
|-&lt;br /&gt;
|1 ||t || t || t|| t ||0&lt;br /&gt;
|-&lt;br /&gt;
|2||h || t || t|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|3||t || h || t|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|4||t || t || h|| t ||1&lt;br /&gt;
|-&lt;br /&gt;
|5||t || t || t|| h ||1&lt;br /&gt;
|-&lt;br /&gt;
|6||h || h || t|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|7||h || t || h|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|8||h || t || t|| h||2&lt;br /&gt;
|-&lt;br /&gt;
|9||t || h || h|| t ||2&lt;br /&gt;
|-&lt;br /&gt;
|10||t || h || t|| h ||2&lt;br /&gt;
|-&lt;br /&gt;
|11||t || t || h|| h ||2&lt;br /&gt;
|-&lt;br /&gt;
|12||t|| h || h|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|13||h|| t || h|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|14||h|| h || t|| h||3&lt;br /&gt;
|-&lt;br /&gt;
|15||h|| h || h|| t||3&lt;br /&gt;
|-&lt;br /&gt;
|16||h|| h || h|| h||4&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The probability of order #1 happening is&lt;br /&gt;
&lt;br /&gt;
P( order #1) = &amp;lt;math&amp;gt;\left ( \frac{1}{2} \right )^0\left ( \frac{1}{2} \right )^4 = \frac{1}{16}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
P( order #2) = &amp;lt;math&amp;gt;\left ( \frac{1}{2} \right )^1\left ( \frac{1}{2} \right )^3 = \frac{1}{16}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The probability of observing the coin land on heads 3 times out of 4 trials is.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x=3) = \frac{4}{16} = \frac{1}{4} =  {n\choose x}p^{x}q^{n-x}  = \frac{4 !}{ \left [ (4-3) ! 3 !\right ]} \left ( \frac{1}{2}\right )^{3}\left ( \frac{1}{2}\right )^{4-3} = \frac{24}{1 \times 6} \frac{1}{16} = \frac{1}{4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== A 6 sided die====&lt;br /&gt;
&lt;br /&gt;
A die is a 6 sided cube with dots on each side.  Each side has a unique number of dots with at most 6 dots on any one side.&lt;br /&gt;
&lt;br /&gt;
P=1/6 = probability of landing on any side of the cube.&lt;br /&gt;
&lt;br /&gt;
Expectation value :&lt;br /&gt;
; The expected (average) value for rolling a single die.&lt;br /&gt;
: &amp;lt;math&amp;gt;E({\rm Roll\ With\ 6\ Sided\ Die}) =\sum_i x_i P(x_i) =1 \left ( \frac{1}{6} \right) + 2\left ( \frac{1}{6} \right)+ 3\left ( \frac{1}{6} \right)+ 4\left ( \frac{1}{6} \right)+ 5\left ( \frac{1}{6} \right)+ 6\left ( \frac{1}{6} \right)=\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The variance:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\sigma^2({\rm Roll\ With\ 6\ Sided\ Die}) =\sum_i (x_i - \mu)^2 P(x_i)  &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= (1-3.5)^2 \left ( \frac{1}{6} \right) + (2-3.5)^2\left ( \frac{1}{6} \right)+ (3-3.5)^2\left ( \frac{1}{6} \right)+ (4-3.5)^2\left ( \frac{1}{6} \right)+ (5-3.5)^2\left ( \frac{1}{6} \right)+ (6-3.5)^2\left ( \frac{1}{6} \right) =2.92&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \sum_i (x_i)^2 P(x_i) - \mu^2 = \left [ 1 \left ( \frac{1}{6} \right) + 4\left ( \frac{1}{6} \right)+ 9\left ( \frac{1}{6} \right)+ 16\left ( \frac{1}{6} \right)+ 25\left ( \frac{1}{6} \right)+ 36\left ( \frac{1}{6} \right) \right ] - (3.5)^2 =2.92&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If we roll the die 10 times what is the probability that X dice will show a 6?&lt;br /&gt;
&lt;br /&gt;
A success will be that the die landed with 6 dots face up.&lt;br /&gt;
&lt;br /&gt;
So the probability of a success is 1/6 (p=1/6) and we toss the die 10 times (n=10), so the binomial distribution function for a success/fail experiment says&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_B(x)  =  {n\choose x}p^{x}q^{n-x}  = \frac{10 !}{ \left [ (10-x) ! x !\right ]} \left ( \frac{1}{6}\right )^{x}\left ( \frac{5}{6}\right )^{10-x} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So the probability that the die will have 6 dots face up in 4 of the 10 rolls is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P_B(x=4)  =  \frac{10 !}{ \left [ (10-4) ! 4 !\right ]} \left ( \frac{1}{6}\right )^{4}\left ( \frac{5}{6}\right )^{10-4} &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  \frac{10 !}{ \left [ (6) ! 4 !\right ]} \left ( \frac{1}{6}\right )^{4}\left ( \frac{5}{6}\right )^{6} = \frac{210 \times 5^6}{6^{10}}=0.054 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Mean = np =&amp;lt;math&amp;gt;\mu =  10/6 = 1.67&amp;lt;/math&amp;gt; &lt;br /&gt;
Variance = &amp;lt;math&amp;gt;\sigma^2 = 10 (1/6)(5/6) = 1.39&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Poisson Distribution===&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution is an approximation to the binomial distribution in the event that the probability of a success is quite small &amp;lt;math&amp;gt;(p \ll 1)&amp;lt;/math&amp;gt;.  As the number of repeated observations (n) gets large,  the  binomial distribution becomes more difficult to evaluate because of the leading term&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{n !}{ \left [ (n-x) ! x !\right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution overcomes this problem by defining the probability in terms of the average &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x) = \frac{\mu^x e^{-\mu}}{x!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====Poisson as approximation to Binomial====&lt;br /&gt;
&lt;br /&gt;
To drive home the idea that the Poisson distribution approximates a Binomial distribution at small p and large n consider the following derivation&lt;br /&gt;
&lt;br /&gt;
The binomial probability distribution is&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = \frac{n!}{x!(n-x)!}p^{x}q^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The term&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \frac{n!}{(n-x)!} = \frac{(n-x)! (n-x+1) (n-x + 2) \dots (n-1)(n)}{(n-x)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= n (n-1)(n-2) \dots (n-x+2) (n-x+1)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;If &amp;lt;math&amp;gt;x \ll n \Rightarrow&amp;lt;/math&amp;gt; we have x terms above&lt;br /&gt;
:then &amp;lt;math&amp;gt;\frac{n!}{(n-x)!} \approx n^x&amp;lt;/math&amp;gt;&lt;br /&gt;
:example:&amp;lt;math&amp;gt; \frac{100!}{(100-1)!} = \frac{99! \times 100}{99!} = 100^1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This leaves us with &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) = \frac{n^x}{x!}p^{x}q^{n-x}= \frac{(np)^x}{x!}(1-p)^{n-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{(\mu)^x}{x!}(1-p)^{n}(1-p)^{-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;(1-p)^{-x} = \frac{1}{(1-p)^x} \approx 1+px \approx 1 \;\;\; : \;\; p \ll 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(x) = \frac{(\mu)^x}{x!}(1-p)^{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;(1-p)^{n} = \left [(1-p)^{1/p} \right]^{\mu}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\lim_{p \rightarrow 0} \left [(1-p)^{1/p} \right]^{\mu} = \left ( \frac{1}{e} \right)^{\mu} = e^{- \mu}&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
;For  &amp;lt;math&amp;gt;x \ll n&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{p \rightarrow 0}P_B(x,n,p ) = P_P(x,\mu)&amp;lt;/math&amp;gt;&lt;br /&gt;
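&lt;br /&gt;
The agreement is easy to see numerically; a sketch comparing the two PMFs for n = 100 and p = 0.02 (arbitrary values chosen so that &amp;lt;math&amp;gt;p \ll 1&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x \ll n&amp;lt;/math&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void poisson_vs_binomial() {&lt;br /&gt;
   const int n = 100;&lt;br /&gt;
   const double p = 0.02, mu = n*p; // mu = 2&lt;br /&gt;
   for (int x = 0; x &amp;lt;= 8; x++) {&lt;br /&gt;
      double pb = TMath::Binomial(n, x)*TMath::Power(p, x)*TMath::Power(1.-p, n-x);&lt;br /&gt;
      double pp = TMath::Poisson(x, mu); // mu^x e^(-mu) / x!&lt;br /&gt;
      printf(&amp;quot;x=%d   binomial=%.5f   poisson=%.5f\n&amp;quot;, x, pb, pp);&lt;br /&gt;
   }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;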
&lt;br /&gt;
==== Derivation of Poisson Distribution====&lt;br /&gt;
&lt;br /&gt;
The mean free path of a particle traversing a volume of material is a common problem in nuclear and particle physics.  If you want to shield your apparatus or yourself from radiation you want to know how far the radiation travels through material.  &lt;br /&gt;
&lt;br /&gt;
The mean free path is the average distance a particle travels through a material before interacting with the material. &lt;br /&gt;
;If we let &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; represent the mean free path &lt;br /&gt;
;Then the probability of having an interaction within a short distance &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; is&lt;br /&gt;
: &amp;lt;math&amp;gt;\frac{x}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
as a result&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;1-\frac{x}{\lambda}= P(0,x, \lambda)&amp;lt;/math&amp;gt; = probability of getting no events over a length &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When we consider &amp;lt;math&amp;gt;\frac{x}{\lambda} \ll 1&amp;lt;/math&amp;gt; ( we are looking for small distances such that the probability of no interactions is high)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(0,x, \lambda) = e^{\frac{-x}{\lambda}} \approx 1 - \frac{x}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we wish to find the probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; events over a distance &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; given the mean free path.&lt;br /&gt;
&lt;br /&gt;
This is calculated as a joint probability.  If we wanted to know the probability of only one interaction over a distance &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt;, we would multiply the probability that an interaction happens within a distance &amp;lt;math&amp;gt;dx&amp;lt;/math&amp;gt; by the probability that no further interactions happen by the time the particle reaches the distance &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt;. &lt;br /&gt;
&lt;br /&gt;
For the case of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; interactions, we have a series of &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; interactions happening over N intervals of &amp;lt;math&amp;gt;dx&amp;lt;/math&amp;gt; with the probability &amp;lt;math&amp;gt;dx/\lambda&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
:&amp;lt;math&amp;gt;P(N,x,\lambda)&amp;lt;/math&amp;gt; = probability of finding &amp;lt;math&amp;gt;N&amp;lt;/math&amp;gt; events within the length &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{dx_1}{\lambda}\frac{dx_2}{\lambda}\frac{dx_3}{\lambda} \dots \frac{dx_N}{\lambda} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above expression represents the probability for a particular sequence of events in which an interaction occurs after a distance &amp;lt;math&amp;gt;dx_1&amp;lt;/math&amp;gt;, then an interaction after &amp;lt;math&amp;gt;dx_2&amp;lt;/math&amp;gt; , &amp;lt;math&amp;gt;\dots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So in essence the above expression is a &amp;quot;probability element&amp;quot; where another probability element may be&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt; P(N,x, \lambda)=\frac{dx_2}{\lambda}\frac{dx_1}{\lambda}\frac{dx_3}{\lambda} \dots \frac{dx_N}{\lambda} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the first interaction occurs after the distance &amp;lt;math&amp;gt;x_2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;= \Pi_{i=1}^{N} \left [ \frac{dx_i}{\lambda} \right ] e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So we can write a differential probability element which we need to add up as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;d^NP(N,x, \lambda)=\frac{1}{N!} \Pi_{i=1}^{N} \left [ \frac{dx_i}{\lambda} \right ] e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The N! accounts for the degeneracy: every N! permutations of the interaction points correspond to only one distinct combination, i.e. we would be double counting when we integrate.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Using the integral formula&lt;br /&gt;
: &amp;lt;math&amp;gt; \Pi_{i=1}^{N} \left [\int_0^x \frac{dx_i}{\lambda} \right ]= \left [ \frac{x}{\lambda}\right]^N&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
we end up with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(N,x, \lambda) = \frac{\left [ \frac{x}{\lambda}\right]^N}{N!} e^{\frac{-x}{\lambda}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean of Poisson Dist====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \sum_{i=1}^{\infty} i P(i,x, \lambda)&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=  \sum_{i=1}^{\infty} i \frac{\left [ \frac{x}{\lambda}\right]^i}{i!} e^{\frac{-x}{\lambda}}&lt;br /&gt;
= \frac{x}{\lambda} \sum_{i=1}^{\infty}  \frac{\left [ \frac{x}{\lambda}\right]^{(i-1)}}{(i-1)!} e^{\frac{-x}{\lambda}} = \frac{x}{\lambda} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x,\mu) = \frac{\mu^x e^{-\mu}}{x!} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance of Poisson Dist====&lt;br /&gt;
&lt;br /&gt;
For [http://wiki.iac.isu.edu/index.php/TF_ErrAna_Homework#Poisson_Prob_Dist Homework] you will show, in a manner similar to the above mean calculation, that the variance of the Poisson distribution is &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Gaussian===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Gaussian (Normal) distribution is an approximation of the Binomial distribution for the case of a large number of possible different observations.  The Poisson distribution approximates the binomial distribution for the case when &amp;lt;math&amp;gt;p \ll 1&amp;lt;/math&amp;gt; ( the average number of successes &amp;lt;math&amp;gt;(\mu = np)&amp;lt;/math&amp;gt; is a lot smaller than the number of trials ).&lt;br /&gt;
&lt;br /&gt;
The Gaussian distribution is accepted as one of the most likely distributions to describe measurements.&lt;br /&gt;
&lt;br /&gt;
A Gaussian distribution which is normalized such that its integral is unity is referred to as the Normal distribution.  You could mathematically construct a Gaussian distribution which is not normalized to unity (this is often done when fitting experimental data).&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_G(x,\mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{x -\mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; = probability of observing &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; from a Gaussian parent distribution with a mean &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; and standard deviation &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Half-Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; (a.k.a. Full Width at Half Max)====&lt;br /&gt;
&lt;br /&gt;
The half width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; is used to describe the range of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; over which the distribution's amplitude decreases to half of its maximum value.&lt;br /&gt;
&lt;br /&gt;
;ie: &amp;lt;math&amp;gt;P_G(\mu \pm \frac{\Gamma}{2}, \mu, \sigma) = \frac{P_G(\mu,\mu,\sigma)}{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
:For the Gaussian this gives &amp;lt;math&amp;gt;\Gamma = 2\sqrt{2 \ln 2}\, \sigma \approx 2.35\, \sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
;Side note:the points of steepest descent are located at &amp;lt;math&amp;gt;x = \mu \pm \sigma&amp;lt;/math&amp;gt;  such that &lt;br /&gt;
&lt;br /&gt;
; &amp;lt;math&amp;gt;P_G(\mu \pm \sigma, \mu, \sigma) = e^{-1/2} P_G(\mu,\mu,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
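&lt;br /&gt;
A one-line numerical check of the half-width relation (a sketch; TMath::Gaus with the last argument kTRUE returns the normalized Gaussian):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void fwhm_check() {&lt;br /&gt;
   const double mu = 0., sigma = 1.;&lt;br /&gt;
   const double Gamma = 2.*TMath::Sqrt(2.*TMath::Log(2.))*sigma; // = 2.35 sigma&lt;br /&gt;
   double peak = TMath::Gaus(mu, mu, sigma, kTRUE);&lt;br /&gt;
   double half = TMath::Gaus(mu + Gamma/2., mu, sigma, kTRUE);&lt;br /&gt;
   printf(&amp;quot;ratio = %f (expect 0.5)\n&amp;quot;, half/peak);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;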
&lt;br /&gt;
==== Probable Error (P.E.)====&lt;br /&gt;
&lt;br /&gt;
The probable error is the range of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; in which half of the observations (values of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;) are expected to fall.&lt;br /&gt;
&lt;br /&gt;
; &amp;lt;math&amp;gt;x= \mu \pm P.E.&amp;lt;/math&amp;gt;&lt;br /&gt;
:For a Gaussian distribution &amp;lt;math&amp;gt;P.E. = 0.6745 \sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Binomial with Large N becomes Gaussian====&lt;br /&gt;
&lt;br /&gt;
Consider the binomial distribution in which a fair coin is tossed a large number of times (N is very large and an EVEN number, N=2n)&lt;br /&gt;
&lt;br /&gt;
What is the probability you get exactly &amp;lt;math&amp;gt;\frac{1}{2}N -s&amp;lt;/math&amp;gt; heads and &amp;lt;math&amp;gt;\frac{1}{2}N +s&amp;lt;/math&amp;gt; tails where s is an integer?&lt;br /&gt;
&lt;br /&gt;
The Binomial Probability distribution is given as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x) = {N\choose x}p^{x}q^{N-x}  = \frac{N!}{x!(N-x)!}p^{x}q^{N-x}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
p = probability of success= 1/2&lt;br /&gt;
&lt;br /&gt;
q= 1-p = 1/2&lt;br /&gt;
&lt;br /&gt;
N = number of trials =2n&lt;br /&gt;
&lt;br /&gt;
x= number of successes=n-s&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(n-s)   = \frac{(2n)!}{(n-s)!(2n-n+s)!}p^{n-s}q^{2n-n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!}p^{n-s}q^{n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{n-s} \left(\frac{1}{2}\right)^{n+s}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Now let's cast this probability with respect to the probability that we get an equal number of heads and tails by defining the following ratio R such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R \equiv \frac{P_B(n-s)}{P_B(n)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_B(x=n) = \frac{N!}{n!(N-n)!}p^{n}q^{N-n} = \frac{(2n)!}{n!(n)!}p^{n}q^{n} =  \frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;R = \frac{\frac{(2n)!}{(n-s)!(n+s)!} \left(\frac{1}{2}\right)^{2n}}{\frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n}} = \frac{n! n!}{(n-s)! (n+s)!}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Take the natural logarithm of both sides&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \ln (R) = \ln \left ( \frac{n! n!}{(n-s)! (n+s)!} \right) = \ln(n!)+\ln(n!) - \ln\left[(n-s)!\right ] - \ln \left[(n+s)!\right] = 2 \ln(n!) - \ln\left [ (n-s)! \right ] - \ln \left [ (n+s)! \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Stirling's Approximation says&lt;br /&gt;
:&amp;lt;math&amp;gt;n! \sim \left (2 \pi n\right)^{1/2} n^n e^{-n}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow &amp;lt;/math&amp;gt;&lt;br /&gt;
;&amp;lt;math&amp;gt;\ln(n!) \sim  \ln \left [ \left (2 \pi n\right)^{1/2} n^n e^{-n}\right ] = \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +\ln\left [ n^n \right ] + \ln \left [e^{-n}\right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n\ln\left [ n \right ] + (-n)&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n(\ln\left [ n \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
similarly&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\ln\left [(n-s)! \right ] \sim   \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n-s)^{1/2} \right  ] + (n-s)(\ln\left [ (n-s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\ln\left [(n+s)! \right ] \sim   \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n+s)^{1/2} \right  ] + (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow \ln (R) = 2 \times  \left (\ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ n^{1/2} \right  ] +n(\ln\left [ n \right ] -1 ) \right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;- \left ( \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n-s)^{1/2} \right  ] + (n-s)(\ln\left [ (n-s) \right ] -1 )\right )&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt; -\left ( \ln \left [ \left (2 \pi \right)^{1/2} \right ] +\ln  \left [ (n+s)^{1/2} \right  ] + (n+s)(\ln\left [ (n+s) \right ] -1 )\right ) &amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2 \ln  \left [ n^{1/2} \right  ] +2 n(\ln\left [ n \right ] -1 )  - \ln  \left [ (n-s)^{1/2} \right  ] - (n-s)(\ln\left [ (n-s) \right ] -1 ) -\ln  \left [ (n+s)^{1/2} \right  ] - (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln  \left [ n^{1/2} \right  ] \approx \ln  \left [ (n-s)^{1/2} \right  ]  \approx \ln  \left [ (n+s)^{1/2} \right  ]&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;s \ll n&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \ln (R) =  2 n(\ln\left [ n \right ] -1 )   - (n-s)(\ln\left [ (n-s) \right ] -1 ) - (n+s)(\ln\left [ (n+s) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =2 n(\ln\left [ n \right ] -1 )   - (n-s)(\ln\left [ n(1-s/n) \right ] -1 ) - (n+s)(\ln\left [ n(1+s/n) \right ] -1 )&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= 2n \ln (n) - 2n - (n-s) \left [ \ln (n) + \ln (1-s/n) -1\right ] - (n+s) \left [ \ln (n) + \ln (1+s/n) -1\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=  - 2n - (n-s) \left [  \ln (1-s/n) -1\right ] - (n+s) \left [  \ln (1+s/n) -1\right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;=   - (n-s) \left [  \ln (1-s/n) \right ] - (n+s) \left [  \ln (1+s/n) \right ]&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;-1 &amp;lt; s/n \le 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln (1+s/n) = s/n - \frac{s^2}{2n^2} + \frac{s^3}{3 n^3} \dots&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\ln(R) =- (n-s) \left [    -s/n - \frac{s^2}{2n^2} - \frac{s^3}{3 n^3} \right ] - (n+s) \left [   s/n - \frac{s^2}{2n^2} + \frac{s^3}{3 n^3}  \right ]&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= - \frac{s^2}{n} = - \frac{2s^2}{N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;R \sim e^{-2s^2/N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
as a result &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(n-s) = R P_B(n)&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_B(x=n)=  \frac{(2n)!}{(n)!(n)!} \left(\frac{1}{2}\right)^{2n} =  \frac{\left (2 \pi 2n\right)^{1/2} (2n)^{2n} e^{-2n} }{\left(\left (2 \pi n\right)^{1/2} n^n e^{-n}\right ) \left ( \left (2 \pi n\right)^{1/2} n^n e^{-n}\right)}  \left(\frac{1}{2}\right)^{2n}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= \left(\frac{1}{\pi n} \right )^{1/2} = \left(\frac{2}{\pi N} \right )^{1/2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(n-s) = \left(\frac{2}{\pi N} \right )^{1/2} e^{-2s^2/N}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In binomial distributions&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\sigma^2 = Npq = \frac{N}{4}&amp;lt;/math&amp;gt; for this problem&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;N = 4 \sigma^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(n-s) = \left(\frac{2}{\pi 4 \sigma^2} \right )^{1/2} e^{-2s^2/N} = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{2s^2}{4 \sigma^2}}  = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{s}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; &lt;br /&gt;
= probability of exactly &amp;lt;math&amp;gt;(\frac{N}{2} -s)&amp;lt;/math&amp;gt; heads AND &amp;lt;math&amp;gt;(\frac{N}{2} +s)&amp;lt;/math&amp;gt; tails after flipping the coin N times (N is an even number and s is an integer).&lt;br /&gt;
&lt;br /&gt;
If we let &amp;lt;math&amp;gt;x = n-s&amp;lt;/math&amp;gt;  and realize that  for a binomial distributions&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mu = Np = N/2 = n&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;P(x) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{n-x}{\sigma} \right) ^2} = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
; So when N gets big, the Gaussian distribution is a good approximation to the Binomial&lt;br /&gt;
&lt;br /&gt;
==== Gaussian approximation to Poisson when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt; ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(r) = \frac{\mu^r e^{-\mu}}{r!}&amp;lt;/math&amp;gt; = Poisson probability distribution&lt;br /&gt;
&lt;br /&gt;
substitute&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;x \equiv r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) = \frac{\mu^{x + \mu} e^{-\mu}}{(x+\mu)!} = e^{-\mu} \frac{\mu^{\mu} \mu^x}{(\mu + x)!} = e^{-\mu} \mu^{\mu}\frac{\mu^x}{(\mu)! (\mu+1) \dots (\mu+x)}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; =  e^{-\mu} \frac{\mu^{\mu}}{\mu!} \left [ \frac{\mu}{(\mu+1)} \cdot  \frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^{-\mu} \frac{\mu^{\mu}}{\mu!}  = e^{-\mu} \frac{\mu^{\mu}}{\sqrt{2 \pi \mu} \mu^{\mu}e^{-\mu}}=  \frac{1}{\sqrt{2 \pi \mu}}&amp;lt;/math&amp;gt; '''Stirling's Approximation when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt;''' &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} \left [ \frac{\mu}{(\mu+1)} \cdot  \frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ] &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\left [ \frac{\mu}{(\mu+1)} \cdot\frac{\mu}{(\mu+2)}  \dots \frac{\mu}{(\mu+x)}  \right ]  = \frac{1}{1 + \frac{1}{\mu}} \cdot  \frac{1}{1 + \frac{2}{\mu}} \dots  \frac{1}{1 + \frac{x}{\mu}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;e^{x/\mu} \approx 1 + \frac{x}{\mu}&amp;lt;/math&amp;gt;  : if &amp;lt;math&amp;gt;x/\mu \ll 1&amp;lt;/math&amp;gt;  Note:&amp;lt;math&amp;gt;x \equiv r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} \left [  \frac{1}{1 + \frac{1}{\mu}} \cdot  \frac{1}{1 + \frac{2}{\mu}} \dots  \frac{1}{1 + \frac{x}{\mu}} \right ]  = \frac{1}{\sqrt{2 \pi \mu}} \left [  e^{-1/\mu} \times e^{-2/\mu} \cdots e^{-x/\mu}  \right ] = \frac{1}{\sqrt{2 \pi \mu}} e^{-1 \left[ \frac{1}{\mu} +\frac{2}{\mu} \cdots \frac{x}{\mu}  \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \sum_1^x i \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
another mathematical identity&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sum_{i=1}^{x} i = \frac{x}{2}(1+x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x}{2}(1+x) \right ]}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
if&amp;lt;math&amp;gt; x \gg 1&amp;lt;/math&amp;gt; then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{x}{2}(1+x) \approx \frac{x^2}{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(x + \mu) =  \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-1}{\mu} \left[ \frac{x^2}{2} \right ]} = \frac{1}{\sqrt{2 \pi \mu}} e^{\frac{-x^2}{2\mu} }&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the Poisson distribution &lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
replacing the dummy variable &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;r - \mu&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_P(r) =  \frac{1}{\sqrt{2 \pi \sigma^2}} e^{\frac{-(r - \mu)^2}{2\sigma^2} } =\frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{r -\mu}{\sigma} \right) ^2}&amp;lt;/math&amp;gt; = Gaussian distribution when &amp;lt;math&amp;gt;\mu \gg 1&amp;lt;/math&amp;gt;&lt;br /&gt;
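&lt;br /&gt;
A sketch comparing the exact Poisson probabilities with this Gaussian approximation (&amp;lt;math&amp;gt;\mu = 25&amp;lt;/math&amp;gt; is an arbitrary large mean):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void poisson_vs_gauss() {&lt;br /&gt;
   const double mu = 25.;&lt;br /&gt;
   for (int r = 15; r &amp;lt;= 35; r += 5) {&lt;br /&gt;
      double pp = TMath::Poisson(r, mu);&lt;br /&gt;
      double pg = TMath::Gaus(r, mu, TMath::Sqrt(mu), kTRUE); // sigma^2 = mu&lt;br /&gt;
      printf(&amp;quot;r=%d   poisson=%.5f   gauss=%.5f\n&amp;quot;, r, pp, pg);&lt;br /&gt;
   }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;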
&lt;br /&gt;
==== Integral Probability (Cumulative Distribution Function)====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Poisson and Binomial distributions are discrete probability distributions (integers).&lt;br /&gt;
&lt;br /&gt;
The Gaussian distribution is our first continuous distribution as the variables are real numbers.  It is not very meaningful to speak of the probability that the variate (x) assumes a specific value.  &lt;br /&gt;
&lt;br /&gt;
One could consider defining a  probability element &amp;lt;math&amp;gt;A_G&amp;lt;/math&amp;gt; which is really an integral over a finite region &amp;lt;math&amp;gt;\Delta x&amp;lt;/math&amp;gt; such that&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;A_G(\Delta x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} \int_{\mu - \Delta x}^{\mu + \Delta x} e^{- \frac{1}{2} \left ( \frac{x - \mu}{\sigma}\right )^2} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The advantage of this definition becomes apparent when you are interested in quantifying the probability that a measurement falls outside the range &amp;lt;math&amp;gt;\Delta x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G( x &amp;lt; \mu - \Delta x \mbox{ or } x &amp;gt; \mu + \Delta x) = 1 - A_G(\Delta x, \mu, \sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Cumulative Distribution Function (CDF), however, is defined in terms of the integral from the variate's minimum value&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;CDF \equiv \int_{x_{min}}^{x} P_G( x^{\prime}, \mu, \sigma)dx^{\prime} = \int_{-\infty}^{x} P_G( x^{\prime}, \mu, \sigma)dx^{\prime} =  P_G(X \le x) =&amp;lt;/math&amp;gt; Probability that you measure a value less than or equal to &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== discrete CDF example =====&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
The probability that a student fails this class is 7.3%.&lt;br /&gt;
&lt;br /&gt;
What is the probability that 5 or more students will fail in a class of 32 students?&lt;br /&gt;
&lt;br /&gt;
Answ:  &amp;lt;math&amp;gt;P_B(x\ge 5) = \sum_{x=5}^{32} P_B(x) = 1- \sum_{x=0}^4 P_B(x) = 1 - CDF(x \le 4) &amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;= 1 - P_B(x=0)- P_B(x=1)- P_B(x=2)- P_B(x=3)- P_B(x=4)&amp;lt;/math&amp;gt; &lt;br /&gt;
: &amp;lt;math&amp;gt;= 1 - (0.088 + 0.223 + 0.272 + 0.214 + 0.122) = 1 - 0.92 \Rightarrow P_B(x \ge 5) = 0.08&amp;lt;/math&amp;gt; = 8%&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There is an 8% probability that 5 or more students will fail the class&lt;br /&gt;
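&lt;br /&gt;
The sums above can be checked directly with a short sketch (same n = 32 and p = 0.073 as in the example):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void fail_prob() {&lt;br /&gt;
   const int n = 32;&lt;br /&gt;
   const double p = 0.073;&lt;br /&gt;
   double cdf = 0.;&lt;br /&gt;
   for (int x = 0; x &amp;lt;= 4; x++)&lt;br /&gt;
      cdf += TMath::Binomial(n, x)*TMath::Power(p, x)*TMath::Power(1.-p, n-x);&lt;br /&gt;
   // CDF(x &amp;lt;= 4) should come out near 0.92&lt;br /&gt;
   printf(&amp;quot;P(x &amp;gt;= 5) = %f\n&amp;quot;, 1. - cdf); // expect about 0.08&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;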
&lt;br /&gt;
===== 2 SD rule of thumb for Gaussian PDF =====&lt;br /&gt;
&lt;br /&gt;
In the above example you calculated the probability that 5 or more students will fail a class.  You can extend this principle to calculate the probability of taking a measurement that deviates from the expected mean value by a given amount.  &lt;br /&gt;
&lt;br /&gt;
One of the more common consistency checks you can make on a sample data set which you expect to be from a Gaussian distribution is to ask how many data points appear more than 2 S.D. (&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;) from the mean value.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CDF for this is&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G(X \le \mu - 2 \sigma, \mu , \sigma ) = \int_{-\infty}^{\mu - 2\sigma} P_G(x, \mu, \sigma) dx&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^{\mu - 2\sigma} e^{- \frac{1}{2} \left ( \frac{x - \mu}{\sigma}\right )^2} dx&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;z = \frac{x-\mu}{\sigma}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;dz = \frac{dx}{\sigma}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;\Rightarrow P_G(X \le \mu - 2 \sigma, \mu , \sigma ) =  \frac{1}{ \sqrt{2 \pi}} \int_{-\infty}^{-2} e^{- \frac{z^2}{2} } dz&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above integral has no closed form.  Since the normalized Gaussian integrates to 1/2 over &amp;lt;math&amp;gt;(-\infty, 0]&amp;lt;/math&amp;gt; and the integrand is even, &amp;lt;math&amp;gt;\int_{-\infty}^{-2} = \frac{1}{2} - \int_{0}^{2}&amp;lt;/math&amp;gt;; the remaining finite integral can be evaluated by expanding the exponential in a power series&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow e^{-x} = 1 -x + \frac{x^2}{2!} - \frac{x^3}{3!} \cdots&amp;lt;/math&amp;gt; &lt;br /&gt;
:&amp;lt;math&amp;gt;\Rightarrow e^{-z^2/2} = 1 -\frac{z^2}{2}+ \frac{z^4}{8} - \frac{z^6}{48} \cdots&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;P_G(X \le \mu - 2 \sigma, \mu , \sigma ) = \frac{1}{2} - \frac{1}{ \sqrt{2 \pi}} \int_{0}^{2}  \left ( 1 -\frac{z^2}{2}+ \frac{z^4}{8} - \frac{z^6}{48} \cdots \right)dz&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt; = \frac{1}{2} - \left . \frac{1}{ \sqrt{2 \pi}} \left ( z  -\frac{z^3}{6}+ \frac{z^5}{40} - \frac{z^7}{336} \cdots   \right ) \right |_{0}^{2}&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;=\frac{1}{2} - \left .  \frac{1}{\sqrt{\pi}} \sum_{j=0}^{\infty} \frac{(-1)^j \left (\frac{z}{\sqrt{2}} \right)^{2j+1}}{j! (2j+1)} \right |_{z=2} \approx 0.023&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no closed-form expression for this probability, but the series is easy to evaluate numerically.&lt;br /&gt;
&lt;br /&gt;
Below is a table representing the cumulative probability &amp;lt;math&amp;gt;P_G(x&amp;lt; \mu - \delta \mbox{ or }  x&amp;gt; \mu + \delta , \mu, \sigma)&amp;lt;/math&amp;gt; for events to occur outside an interval of &amp;lt;math&amp;gt;\pm \delta&amp;lt;/math&amp;gt; in a Gaussian distribution&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;P_G(x&amp;lt; \mu - \delta \mbox{ or } x&amp;gt; \mu + \delta , \mu, \sigma)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\delta&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;3.2 \times 10^{-1}&amp;lt;/math&amp;gt; ||1&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;4.6 \times 10^{-2}&amp;lt;/math&amp;gt; ||2&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;2.7 \times 10^{-3}&amp;lt;/math&amp;gt; ||3&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;math&amp;gt;6.3 \times 10^{-5}&amp;lt;/math&amp;gt; ||4&amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:TF_Error_CDF_Gauss.png| 400 px]]&lt;br /&gt;
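&lt;br /&gt;
The table entries are two-sided Gaussian tail probabilities, which can also be written as &amp;lt;math&amp;gt;\mbox{erfc}\left (\frac{\delta}{\sigma\sqrt{2}}\right )&amp;lt;/math&amp;gt;; a sketch reproducing the table with ROOT's complementary error function:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void tail_table() {&lt;br /&gt;
   // two-sided probability of a measurement more than k sigma from the mean&lt;br /&gt;
   for (int k = 1; k &amp;lt;= 4; k++)&lt;br /&gt;
      printf(&amp;quot;%d sigma: %.1e\n&amp;quot;, k, TMath::Erfc(k/TMath::Sqrt2()));&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;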
&lt;br /&gt;
===Cauchy/Lorentzian/Breit-Wigner Distribution===&lt;br /&gt;
In mathematics, the Cauchy distribution is written as&lt;br /&gt;
:&amp;lt;math&amp;gt;P_{CL}(x, x_0, \Gamma) = \frac{1}{\pi} \frac{\Gamma/2}{(x -x_0)^2 + (\Gamma/2)^2}&amp;lt;/math&amp;gt; = Cauchy-Lorentzian Distribution&lt;br /&gt;
&lt;br /&gt;
:Note: the probability does not fall to zero as rapidly as the Gaussian's does.  As a result, the Gaussian's central peak contributes a larger fraction of its total area than the Lorentzian's.&lt;br /&gt;
&lt;br /&gt;
This distribution happens to be a solution to physics problems involving forced resonances (spring systems driven by a source, or a nuclear interaction which induces a metastable state).&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_{BW} = \sigma(E)=  \frac{1}{2\pi}\frac{\Gamma}{(E-E_0)^2 + (\Gamma/2)^2}&amp;lt;/math&amp;gt; = Breit-Wigner distribution&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;E_0 =&amp;lt;/math&amp;gt; mass resonance&lt;br /&gt;
:&amp;lt;math&amp;gt;\Gamma = &amp;lt;/math&amp;gt;FWHM&lt;br /&gt;
: &amp;lt;math&amp;gt;\Delta E \Delta t = \Gamma \tau = \frac{h}{2 \pi}&amp;lt;/math&amp;gt; = uncertainty principle&lt;br /&gt;
:&amp;lt;math&amp;gt;\tau=&amp;lt;/math&amp;gt;lifetime of resonance/intermediate state particle&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A Breit-Wigner function fit to a cross section measured as a function of energy allows one to evaluate the rate increases that are produced when the probing energy excites a resonant state that has a mass &amp;lt;math&amp;gt;E_0&amp;lt;/math&amp;gt; and lasts for the time &amp;lt;math&amp;gt;\tau&amp;lt;/math&amp;gt; derived from the Half Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== mean====&lt;br /&gt;
&lt;br /&gt;
Mean is not defined&lt;br /&gt;
&lt;br /&gt;
Mode = Median = &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;E_0&amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The variance is also not defined but rather the distribution is parameterized in terms of the Half Width &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Let &lt;br /&gt;
:&amp;lt;math&amp;gt;z = \frac{x-x_0}{\Gamma/2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\Gamma^2}{4\pi} \int_{-\infty}^{\infty} \frac{z^2}{1+z^2} dz&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above integral does not converge for large deviations &amp;lt;math&amp;gt;(x -x_0)&amp;lt;/math&amp;gt;.  The width of the distribution is instead characterized by &amp;lt;math&amp;gt;\Gamma&amp;lt;/math&amp;gt; = FWHM&lt;br /&gt;
&lt;br /&gt;
===Landau===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_L(x) = \frac{1}{2 \pi i} \int_{c-i\infty}^{c+i\infty}\! e^{s \log s + x s}\, ds &amp;lt;/math&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; is any positive real number.&lt;br /&gt;
&lt;br /&gt;
To simplify computation it is more convenient to use the equivalent expression&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P_L(x) = \frac{1}{\pi} \int_0^\infty\! e^{-t \log t - x t} \sin(\pi t)\, dt.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The above distribution was derived by Landau (L. Landau, &amp;quot;On the Energy Loss of Fast Particles by Ionization&amp;quot;, J. Phys., vol 8 (1944), pg 201 ) to describe the energy loss by particles traveling through thin material ( materials with a thickness on the order of a few radiation lengths).&lt;br /&gt;
&lt;br /&gt;
Bethe-Bloch derived an expression to determine the AVERAGE amount of energy lost by a particle traversing a given material &amp;lt;math&amp;gt;(\frac{dE}{dx})&amp;lt;/math&amp;gt; assuming several collisions which span the physical limits of the interaction.&lt;br /&gt;
&lt;br /&gt;
For the case of thin absorbers, the number of collisions is so small that the central limit theorem used to average over several collisions doesn't apply, and there is a finite probability of observing large energy losses.&lt;br /&gt;
&lt;br /&gt;
As a result one would expect a distribution which is Gaussian like but with a &amp;quot;tail&amp;quot; on the &amp;lt;math&amp;gt;\mu + \sigma&amp;lt;/math&amp;gt; side of the distribution.&lt;br /&gt;
&lt;br /&gt;
===Gamma===&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{\gamma}(x,k,\theta) = x^{k-1} \frac{e^{-x/\theta}}{\theta^k \, \Gamma(k)}\text{ for } x &amp;gt; 0\text{ and }k, \theta &amp;gt; 0.\,&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt; \Gamma(z) = \int_0^\infty  t^{z-1} e^{-t}\,dt\;&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The distribution is used for &amp;quot;waiting time&amp;quot; models.  How long do you need to wait for a rain storm, how long do you need to wait to die,...&lt;br /&gt;
&lt;br /&gt;
Climatologists use this for predicting how rain fluctuates from season to season.&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; is an integer then the above distribution describes the sum of &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; independent exponentially distributed variables, and its cumulative distribution function is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; CDF_{\gamma}(x,k,\theta) = 1 - e^{-x/\theta} \sum_{j=0}^{k-1}\frac{1}{j!}\left ( \frac{x}{\theta}\right)^j &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = k \theta&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Variance====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = k \theta^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Properties====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\lim_{x \rightarrow 0} P_{\gamma}(x,k,\theta) = \left \{  {\infty \;\;\;\; k &amp;lt;1 \atop 0 \;\;\;\; k&amp;gt;1} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
: &amp;lt;math&amp;gt;= \frac{1}{\theta} \;\; \mbox{if } k=1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Beta===&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{\beta}(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1-u)^{\beta-1}\, du} \!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt;= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
::&amp;lt;math&amp;gt;= \frac{1}{\mathrm{B}(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}\!&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Mean====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \frac{\alpha}{\alpha + \beta}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{\alpha \beta }{(\alpha + \beta)^2 (\alpha + \beta + 1)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Exponential===&lt;br /&gt;
&lt;br /&gt;
The exponential distribution may be used to describe the waiting time between events in a Poisson process (e.g. exponential radioactive decay)&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; P_{e}(x,\lambda) = \left \{  {\lambda e^{-\lambda x} \;\;\;\; x \ge 0\atop 0 \;\;\;\; x&amp;lt;0} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; CDF_{e}(x,\lambda) = \left \{  {1-e^{-\lambda x} \;\;\;\; x \ge 0\atop 0 \;\;\;\; x&amp;lt;0} \right .&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mu = \frac{1}{\lambda}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma^2 = \frac{1}{\lambda^2}&amp;lt;/math&amp;gt;&lt;br /&gt;
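&lt;br /&gt;
Exponential deviates follow from inverse-transform sampling, &amp;lt;math&amp;gt;x = -\ln(1-u)/\lambda&amp;lt;/math&amp;gt; with u uniform on (0,1); ROOT's TRandom::Exp does this given the mean &amp;lt;math&amp;gt;1/\lambda&amp;lt;/math&amp;gt;.  A sketch (&amp;lt;math&amp;gt;\lambda = 2&amp;lt;/math&amp;gt; is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void expo_demo() {&lt;br /&gt;
   TRandom3 rng(0);&lt;br /&gt;
   const double lambda = 2.;&lt;br /&gt;
   TH1F h(&amp;quot;h&amp;quot;, &amp;quot;exponential deviates&amp;quot;, 100, 0., 5.);&lt;br /&gt;
   for (int i = 0; i &amp;lt; 100000; i++) h.Fill(rng.Exp(1./lambda)); // Exp() takes the mean 1/lambda&lt;br /&gt;
   // expect mean = 1/lambda = 0.5 and RMS = 1/lambda = 0.5&lt;br /&gt;
   printf(&amp;quot;mean = %f   RMS = %f\n&amp;quot;, h.GetMean(), h.GetRMS());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;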
&lt;br /&gt;
== Skewness and Kurtosis==&lt;br /&gt;
&lt;br /&gt;
Distributions may also be characterized by how they look in terms of Skewness and Kurtosis&lt;br /&gt;
&lt;br /&gt;
=== Skewness===&lt;br /&gt;
&lt;br /&gt;
Measures the symmetry of the distribution&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Skewness = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^3}{(N-1)s^3} = \frac{\mbox{3rd central moment}}{(\mbox{2nd central moment})^{3/2}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;The higher the number the more asymmetric (or skewed) the distribution is.  The closer to zero the more symmetric.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A negative skewness indicates a tail on the left side of the distribution.&lt;br /&gt;
Positive skewness indicates a tail on the right.&lt;br /&gt;
&lt;br /&gt;
===Kurtosis===&lt;br /&gt;
&lt;br /&gt;
Measures the &amp;quot;pointiness&amp;quot; (peakedness) of the distribution&lt;br /&gt;
&lt;br /&gt;
Kurtosis = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^4}{(N-1)s^4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
:&amp;lt;math&amp;gt;s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
K=3 for Normal Distribution&lt;br /&gt;
&lt;br /&gt;
In ROOT the Kurtosis entry in the statistics box is really the &amp;quot;excess kurtosis&amp;quot;, which is the kurtosis minus 3&lt;br /&gt;
&lt;br /&gt;
Excess Kurtosis = &amp;lt;math&amp;gt;\frac{\sum (x_i - \bar{x})^4}{(N-1)s^4} - 3&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case a positive excess kurtosis indicates a peak that is sharper than a Gaussian's, while a negative value indicates a peak that is flatter than that of a comparable Gaussian distribution.&lt;br /&gt;
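&lt;br /&gt;
A sketch filling a symmetric and an asymmetric distribution and reading back ROOT's skewness and excess kurtosis (for an exponential the expected values are skewness = 2 and excess kurtosis = 6):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void shape_demo() {&lt;br /&gt;
   TRandom3 rng(0);&lt;br /&gt;
   TH1F hg(&amp;quot;hg&amp;quot;, &amp;quot;Gaussian&amp;quot;, 100, -5., 5.);&lt;br /&gt;
   TH1F he(&amp;quot;he&amp;quot;, &amp;quot;Exponential&amp;quot;, 100, 0., 10.);&lt;br /&gt;
   for (int i = 0; i &amp;lt; 100000; i++) {&lt;br /&gt;
      hg.Fill(rng.Gaus(0., 1.));&lt;br /&gt;
      he.Fill(rng.Exp(1.));&lt;br /&gt;
   }&lt;br /&gt;
   // Gaussian: skewness = 0, excess kurtosis = 0&lt;br /&gt;
   printf(&amp;quot;Gaus: skew=%.2f   kurt=%.2f\n&amp;quot;, hg.GetSkewness(), hg.GetKurtosis());&lt;br /&gt;
   printf(&amp;quot;Expo: skew=%.2f   kurt=%.2f\n&amp;quot;, he.GetSkewness(), he.GetKurtosis());&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;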
&lt;br /&gt;
&lt;br /&gt;
[[File:ForeErrAna_Gaus-Cauchy_SkeKurt.gif|200 px]][[File:ForeErrAna_Gaus-Landau_SkeKurt.gif|200 px]][[File:ForeErrAna_Gaus-gamma_SkeKurt.gif|200 px]]&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Forest_Error_Analysis_for_the_Physical_Sciences] [[Forest_Error_Analysis_for_the_Physical_Sciences]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Chriss%27_Work_Page&amp;diff=71906</id>
		<title>Chriss' Work Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Chriss%27_Work_Page&amp;diff=71906"/>
		<updated>2012-03-12T19:19:32Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: Created page with 'This is a temporary blank page for material related to Chriss Jackson's work with the rotating W target.'&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is a temporary blank page for material related to Chriss Jackson's work with the rotating W target.&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=HRRL_Positron_Rotating_W_Target&amp;diff=71905</id>
		<title>HRRL Positron Rotating W Target</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=HRRL_Positron_Rotating_W_Target&amp;diff=71905"/>
		<updated>2012-03-12T19:17:43Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=45 degree angle=&lt;br /&gt;
&lt;br /&gt;
The Tungsten target should make a 45 degree angle with the beam.&lt;br /&gt;
&lt;br /&gt;
 Mount the motor so that the target can be rotated.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Beam Heat Load=&lt;br /&gt;
&lt;br /&gt;
;Accelerator Settings&lt;br /&gt;
{| border=&amp;quot;1&amp;quot;&lt;br /&gt;
| beam energy || 10 MeV&lt;br /&gt;
|-&lt;br /&gt;
| Rep Rate: ||   300 Hz&lt;br /&gt;
|-&lt;br /&gt;
| I_peak:     || 200 mA/pulse&lt;br /&gt;
|-&lt;br /&gt;
| pulse width:|| 0.1 &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;s (100 ns)&lt;br /&gt;
|-&lt;br /&gt;
| Vacuum:|| &amp;lt;math&amp;gt;10^{-8}&amp;lt;/math&amp;gt; Torr&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;200 \frac{\mbox{mA}}{\mbox{pulse}} \times 300 \frac{\mbox{pulses}}{\mbox{sec}} \times 0.1 \mu \mbox{s}= 6 \mu A&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The beam is on for 30 &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;s of each second and off for the remaining 0.99997 s.  The average current is the duty cycle times the peak current: &lt;br /&gt;
&amp;lt;math&amp;gt;3\times 10^{-5} \times 200 \mbox{ mA} = 6 \mu\mbox{A}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
''P'' = (10 MeV) &amp;lt;math&amp;gt;\times&amp;lt;/math&amp;gt; (6 &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;A) = 60 Watts.&lt;br /&gt;
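&lt;br /&gt;
The duty-cycle arithmetic in one place (a sketch using the accelerator settings above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
void beam_power() {&lt;br /&gt;
   const double f     = 300.;    // rep rate (Hz)&lt;br /&gt;
   const double ipeak = 0.200;   // peak current (A)&lt;br /&gt;
   const double width = 1.0e-7;  // pulse width (s), 100 ns&lt;br /&gt;
   const double eMeV  = 10.;     // beam energy (MeV)&lt;br /&gt;
   double duty  = f*width;        // fraction of time the beam is on = 3e-5&lt;br /&gt;
   double iavg  = ipeak*duty;     // average current = 6e-6 A&lt;br /&gt;
   double power = eMeV*1.e6*iavg; // energy (eV) x current (A) = watts&lt;br /&gt;
   printf(&amp;quot;I_avg = %g A, P = %g W\n&amp;quot;, iavg, power);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;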
&lt;br /&gt;
=Target Chamber design=&lt;br /&gt;
&lt;br /&gt;
Latest Design From empire and Sadiq's measurement of the available space&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|[[File:EmpMagHRRLPositronTargetChamberDrw_9-20-11.png | 200 px]]||[[File:HRRLPositronTargetSpace_10-5-11.png | 200 px]] || [[File:HRRLPositronTargetCoupler_10-5-11.png | 200 px]]|| [[File:HRRLPositronTargetHub_10-5-11.png | 200 px]]&lt;br /&gt;
|-&lt;br /&gt;
| Target motor Design  || Available beam line space || Motor axle coupling to rotary union || Hub and spline to attach targets&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Dimensions are in inches&lt;br /&gt;
&lt;br /&gt;
[[File:HRRLPositronTargetChamber_4-20-11.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
There is an 18&amp;quot; space in Z for the target chamber.  If you want to rotate a disk on the 5&amp;quot; wide motor you will need at least a 10&amp;quot; x 10&amp;quot; area.&lt;br /&gt;
&lt;br /&gt;
So, as a first design, we could try a chamber which is 18&amp;quot; x 15&amp;quot; x 15&amp;quot;.&lt;br /&gt;
We need the following:&lt;br /&gt;
&lt;br /&gt;
*2 beam flanges flush with the walls of the chamber.&lt;br /&gt;
*2 flanges to allow access to work on the target motor inside; they should be big enough to fit hands through.&lt;br /&gt;
*a flange on top which will allow a 10&amp;quot; diameter target disk to come out.&lt;br /&gt;
*a flange on the bottom for the ion pump.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
July design from Empire Magnetics&lt;br /&gt;
&lt;br /&gt;
[[File:EmpMagHRRLPositronTargetChamberDrw_7-25-11.png | 200 px]]&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|[[File:EmpMagHRRLPositronTargetChamberPict_7-25-11.png | 200 px]]||[[File:EmpMagStatorCoolLines_7-26-11.png | 200 px]]&lt;br /&gt;
|-&lt;br /&gt;
| Rotary union overlaid on the conceptual drawing.   || Conceptual drawing of the stator cooling lines.  We will need separate connections to them on the vacuum chamber.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=Gate Valve - Target distance=&lt;br /&gt;
&lt;br /&gt;
The distance from the front of the gate valve to the furthest-upstream part of the target chamber is&lt;br /&gt;
&lt;br /&gt;
88.43 cm&lt;br /&gt;
&lt;br /&gt;
The gate valve closes in 0.7 sec.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
So water would need to travel at a speed of more than 88.43 cm / 0.7 sec = 1.26 m/sec to reach the gate valve before it closes.&lt;br /&gt;
&lt;br /&gt;
= Properties of Tungsten=&lt;br /&gt;
&lt;br /&gt;
*Melting Point = 3695 K.&lt;br /&gt;
&lt;br /&gt;
*Heat Capacity = &amp;lt;math&amp;gt;(25~ ^{o}C)~ 24.27 ~ J ~ mol^{-1} ~ K^{-1}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*These data are from: [http://en.wikipedia.org/wiki/Tungsten Tungsten]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*Heat loss due to radiation: &amp;lt;math&amp;gt;\sigma T^4&amp;lt;/math&amp;gt; per unit area&lt;br /&gt;
&lt;br /&gt;
=Tungsten Temperature as a function of heat load=&lt;br /&gt;
&lt;br /&gt;
*IAC beamline pressure = &amp;lt;math&amp;gt;10^{-8} &amp;lt;/math&amp;gt; Torr&lt;br /&gt;
&lt;br /&gt;
*The Tungsten heats up when MeV-energy electrons impinge on its surface.  The target can cool radiatively; the imperfect vacuum inside the beam pipe also allows a small amount of conduction through the residual gas.&lt;br /&gt;
&lt;br /&gt;
*Conduction and Radiation&lt;br /&gt;
&lt;br /&gt;
=Calculating the Radiator's Equilibrium Temperature=&lt;br /&gt;
&lt;br /&gt;
==1. Calculating the number of particles per second==&lt;br /&gt;
&lt;br /&gt;
We have an electron beam with:&lt;br /&gt;
&lt;br /&gt;
Frequency:  f = 1000 Hz&lt;br /&gt;
&lt;br /&gt;
Peak current:  I = 10 mA = 0.01 A&lt;br /&gt;
&lt;br /&gt;
Pulse width: &amp;lt;math&amp;gt; \Delta t= 50~ns=5 \times10^{-8}&amp;lt;/math&amp;gt; seconds&lt;br /&gt;
&lt;br /&gt;
How many electrons hit the target each second?&lt;br /&gt;
&lt;br /&gt;
By Q=It, we have&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
                            &amp;lt;math&amp;gt; N \times e=f \times I \times \Delta t &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
where N is the total number of electrons hitting the target per second, e is the electron charge, and f, I and ∆t are given above. The number of particles per second is:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
                            &amp;lt;math&amp;gt;  N = \frac {f \times I \times \Delta t}{e} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
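Plugging in the beam parameters above, a minimal Python sketch (the electron charge is the standard constant):&lt;br /&gt;
&lt;br /&gt;
 f = 1000.0         # repetition rate (Hz)&lt;br /&gt;
 I = 0.01           # peak current (A)&lt;br /&gt;
 dt = 5.0e-8        # pulse width (s)&lt;br /&gt;
 e = 1.602e-19      # electron charge (C)&lt;br /&gt;
 &lt;br /&gt;
 N = f * I * dt / e                 # electrons hitting the target per second&lt;br /&gt;
 print('N = %.3g electrons/s' % N)  # about 3.1e12&lt;br /&gt;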
&lt;br /&gt;
==2. Calculating the energy deposited per second==&lt;br /&gt;
&lt;br /&gt;
If we find the energy deposited by each electron and multiply it by the total number of electrons per second, we obtain the total energy deposited in the radiator per second.&lt;br /&gt;
&lt;br /&gt;
To find the energy deposited by each electron, we use the formula&lt;br /&gt;
&lt;br /&gt;
                               &amp;lt;math&amp;gt; E_{dep~one}={(\frac{dE}{dx})}_{coll}\times t &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt; E_{dep~one} &amp;lt;/math&amp;gt; is the energy deposited by one electron, &amp;lt;math&amp;gt; {(\frac{dE}{dx})}_{coll} &amp;lt;/math&amp;gt; is the mean energy loss of the electron by collisions (the collision stopping power), and &amp;lt;math&amp;gt; t &amp;lt;/math&amp;gt; is the thickness of the radiator.&lt;br /&gt;
&lt;br /&gt;
The energy loss of an electron comes from two parts: 1) the emission of electromagnetic radiation arising from scattering in the electric field of a nucleus (bremsstrahlung); 2) collisional energy loss when passing through matter. Bremsstrahlung contributes little to the temperature because the radiated photons mostly escape the target.&lt;br /&gt;
&lt;br /&gt;
The stopping power can be found from nuclear data tables, and the thickness is taken to be 0.001 radiation lengths. The radiation length is from the Particle Data Group, and the average collision stopping powers around 15 MeV for electrons in these materials are from the National Institute of Standards and Technology: [http://physics.nist.gov/PhysRefData/Star/Text/ESTAR.html Tungsten Stopping Power].&lt;br /&gt;
&lt;br /&gt;
===Table of Radiation Lengths===&lt;br /&gt;
Note: These data are from the Particle Data Group: [http://pdg.lbl.gov/AtomicNuclearProperties/].&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
|Elements&lt;br /&gt;
|Radiation Lengths &amp;lt;math&amp;gt; (g/cm^{2} )&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|W || 6.76&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Table of energy calculations===&lt;br /&gt;
For a radiator thickness of 0.001 radiation lengths (0.001 RL).&lt;br /&gt;
Note: &amp;lt;math&amp;gt;(dE/dx)_{coll}&amp;lt;/math&amp;gt; is from the National Institute of Standards and Technology: [http://physics.nist.gov/PhysRefData/Star/Text/ESTAR.html ESTAR].&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|Elements&lt;br /&gt;
|&amp;lt;math&amp;gt;(dE/dx)_{coll} (MeV \; cm^2/g)&amp;lt;/math&amp;gt;&lt;br /&gt;
|&amp;lt;math&amp;gt; t~(  g~cm^{-2}&amp;lt;/math&amp;gt;)&lt;br /&gt;
|&amp;lt;math&amp;gt;E_{(dep~one)}&amp;lt;/math&amp;gt; (MeV)&lt;br /&gt;
|&amp;lt;math&amp;gt;E_{dep/s}&amp;lt;/math&amp;gt; (MeV/s)&lt;br /&gt;
|&amp;lt;math&amp;gt;E_{dep/s}&amp;lt;/math&amp;gt; (J/s)&lt;br /&gt;
|-&lt;br /&gt;
|W  ||1.247 ||0.00676 ||0.00842972 ||&amp;lt;math&amp;gt;2.63*10^{10}&amp;lt;/math&amp;gt;  ||&amp;lt;math&amp;gt;4.21*10^{-3}&amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In the above table, we took the total number of electrons per second and multiplied it by the energy deposited by one electron to get the total energy deposited per second (which is the power):&lt;br /&gt;
&lt;br /&gt;
                                          &amp;lt;math&amp;gt;P_{dep}= E_{dep}/s = ( E_{(dep~by~one)})\times(Number~of~electrons~per~second)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
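A minimal Python sketch that reproduces the tungsten row of the table (the stopping power and radiation length are the values quoted above):&lt;br /&gt;
&lt;br /&gt;
 # beam parameters from section 1&lt;br /&gt;
 f, I, dt, e = 1000.0, 0.01, 5.0e-8, 1.602e-19&lt;br /&gt;
 N = f * I * dt / e          # electrons per second, ~3.1e12&lt;br /&gt;
 &lt;br /&gt;
 dEdx = 1.247                # collision stopping power for W (MeV cm^2/g)&lt;br /&gt;
 t = 0.001 * 6.76            # 0.001 radiation lengths (g/cm^2)&lt;br /&gt;
 E_dep_one = dEdx * t        # MeV per electron, ~0.00843&lt;br /&gt;
 &lt;br /&gt;
 P_MeV = E_dep_one * N       # ~2.63e10 MeV/s&lt;br /&gt;
 P_watt = P_MeV * 1.602e-13  # ~4.2e-3 J/s&lt;br /&gt;
 print('%.3g MeV/s = %.3g W' % (P_MeV, P_watt))&lt;br /&gt;
&lt;br /&gt;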
==3. Calculating the equilibrium temperature using the Stefan–Boltzmann law==&lt;br /&gt;
Assume there is no conduction and that the total energy is radiated from the two surfaces of the radiator, each as big as the beam spot (in our case 2 mm in diameter).  According to the Stefan–Boltzmann law, the total power radiated will be&lt;br /&gt;
 &lt;br /&gt;
                                            &amp;lt;math&amp;gt; P_{rad} =  A \sigma T^{4}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where T is the radiating temperature, P is the radiated power, A is the surface area the beam is incident on, and σ is the Stefan–Boltzmann constant. At the equilibrium temperature, the power deposited and the power radiated must be equal. So&lt;br /&gt;
                                            &amp;lt;math&amp;gt;  P_{dep}=P_{rad}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
so&lt;br /&gt;
&lt;br /&gt;
                                         &amp;lt;math&amp;gt; T = [  \frac{P_{dep}}{A\sigma} ]^{1/4} = [  \frac{N*E_{(dep~one)}}{A\sigma} ]^{1/4} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
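A minimal Python sketch of this estimate, assuming (as above) that the deposited power from the table is radiated from both faces of a 2 mm diameter beam spot:&lt;br /&gt;
&lt;br /&gt;
 import math&lt;br /&gt;
 &lt;br /&gt;
 P_dep = 4.21e-3        # deposited power from the table (W)&lt;br /&gt;
 r_spot = 1.0e-3        # beam spot radius, 2 mm diameter (m)&lt;br /&gt;
 A = 2.0 * math.pi * r_spot**2      # radiating area: both faces of the spot&lt;br /&gt;
 sigma = 5.67e-8        # Stefan-Boltzmann constant (W m^-2 K^-4)&lt;br /&gt;
 &lt;br /&gt;
 T = (P_dep / (A * sigma)) ** 0.25  # equilibrium temperature (K)&lt;br /&gt;
 print('T = %.0f K' % T)            # about 330 K&lt;br /&gt;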
&lt;br /&gt;
&lt;br /&gt;
=Vendors=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==MDC Target Chamber==&lt;br /&gt;
&lt;br /&gt;
Target chamber: MDC Vacuum Products can send a prefab one that you just need to weld up.&lt;br /&gt;
&lt;br /&gt;
John Brooks&lt;br /&gt;
&lt;br /&gt;
jbrooks@mdcvacuum.com&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1-800-443-8817&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
510-265-3569&lt;br /&gt;
&lt;br /&gt;
[[File:TargetChamberDesign_1.0.pdf]]&lt;br /&gt;
&lt;br /&gt;
Tom Bogden (John's boss)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
MDC does not have a fast-closing gate valve.  Model GV1500 closes in 0.7 seconds and costs $1600.  One that closes in 10 msec costs about $10k.&lt;br /&gt;
&lt;br /&gt;
==Tungsten Disks==&lt;br /&gt;
&lt;br /&gt;
;Dimensions are in cm&lt;br /&gt;
&lt;br /&gt;
[[File:HRRL_PosTargetDesign_4-20-11.png| 200 px]]&lt;br /&gt;
&lt;br /&gt;
http://www.alibaba.com/product-gs/346924116/Tungsten_Disk_With_Screw.html&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.zlxtech.com.cn/products.asp?productcode=7401000404&lt;br /&gt;
&lt;br /&gt;
http://www.cleveland-tungsten.com/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If the motor has a radius of 2.5&amp;quot;, and I want to shield the motor from the beam using 2&amp;quot; of Pb, and the beam is about 1&amp;quot; in diameter, then I want a Tungsten disk that is&lt;br /&gt;
&lt;br /&gt;
5.21&amp;quot; in radius = 13.25 cm in radius ~ 30 cm in diameter&lt;br /&gt;
&lt;br /&gt;
The density of Tungsten is&lt;br /&gt;
&lt;br /&gt;
19.25 g/cm^3&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Taking a generous disk radius of 30 cm (cf. R = 32 cm under Target specifications below), the mass per unit thickness is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;19.25~g/cm^3 \times \pi \times (30~cm)^2 = 54,428~g/cm&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= 8.7 kg if it is 0.16 cm thick&lt;br /&gt;
&lt;br /&gt;
The moment of inertia of a disk about its center is &amp;lt;math&amp;gt;MR^2/2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So &amp;lt;math&amp;gt;I = 8.7~kg \times (0.3~m)^2/2 = 0.39~kg~m^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I should look into a ring of tungsten attached to a less dense material.&lt;br /&gt;
&lt;br /&gt;
VX-U42 torque is about 50 in-lbs = 5.65 N m&lt;br /&gt;
&lt;br /&gt;
The maximum angular acceleration would be&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\alpha = \frac{5.65~N~m}{0.39~kg~m^2} = 14.4~rad/s^2 = \frac{14.4}{2 \pi}~rev/s^2 = 2.3~rev/s^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
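A short Python sketch of the disk numbers above (assuming a solid tungsten disk of 30 cm radius and 0.16 cm thickness, and the roughly 5.65 N m torque quoted for the VX-U42):&lt;br /&gt;
&lt;br /&gt;
 import math&lt;br /&gt;
 &lt;br /&gt;
 rho = 19.25e3      # density of tungsten (kg/m^3)&lt;br /&gt;
 r = 0.30           # disk radius (m)&lt;br /&gt;
 h = 0.0016         # disk thickness (m)&lt;br /&gt;
 tau = 5.65         # motor torque (N m)&lt;br /&gt;
 &lt;br /&gt;
 M = rho * math.pi * r**2 * h       # disk mass, ~8.7 kg&lt;br /&gt;
 I = 0.5 * M * r**2                 # solid disk: I = M R^2 / 2&lt;br /&gt;
 alpha = tau / I                    # ~14 rad/s^2&lt;br /&gt;
 print('M = %.1f kg, I = %.2f kg m^2, alpha = %.1f rad/s^2' % (M, I, alpha))&lt;br /&gt;
&lt;br /&gt;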
==Brushless Motor==&lt;br /&gt;
&lt;br /&gt;
===Motor wiring diagram===&lt;br /&gt;
&lt;br /&gt;
[[Media:U-42EmpireMagneticsMotorWiringDiag.pdf]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rick Halstead&lt;br /&gt;
&lt;br /&gt;
5830 Commerce Blvd&lt;br /&gt;
&lt;br /&gt;
Rohnert Park, CA 94928	 &lt;br /&gt;
	&lt;br /&gt;
Phone: 707-584-2801&lt;br /&gt;
&lt;br /&gt;
Fax: 707-584-3418	&lt;br /&gt;
&lt;br /&gt;
rick@empiremagnetics.com:&lt;br /&gt;
&lt;br /&gt;
The VX grade motors,  with dry lubes have been used in deep space (Wake&lt;br /&gt;
Shield Module)  where I'm told they were trying to get 10^-12 torr. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We need &amp;lt;math&amp;gt;10^{-8}&amp;lt;/math&amp;gt; Torr&lt;br /&gt;
&lt;br /&gt;
Vacuum lab grade motor U42.&lt;br /&gt;
&lt;br /&gt;
http://www.empiremagnetics.com/prod_vac/prod_vac_vx_stepper_frame42.htm&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Tony:  Some of the couplings I had in mind:&lt;br /&gt;
http://www.sdp-si.com/estore/CoverPg/Couplings.htm  &lt;br /&gt;
&lt;br /&gt;
http://www.servometer.com/products/couplings/   &lt;br /&gt;
&lt;br /&gt;
I'm thinking the coupler with adaptation of a compression seal. &lt;br /&gt;
http://www.jaecofs.com/stainless-steel-compression-fittings.html &lt;br /&gt;
&lt;br /&gt;
The challenge will be to get them to make a version that can be sealed&lt;br /&gt;
at each end,  technically it is feasible,  it's an issue of engineering&lt;br /&gt;
resource to chase them down,  get a quote then propose it to you,  so&lt;br /&gt;
you can get funding.&lt;br /&gt;
&lt;br /&gt;
===Motor Controller===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jason Brickner&lt;br /&gt;
Olympus Controls&lt;br /&gt;
Tel. 208.475.4670 | Email jbrickner@olympus-controls.com&lt;br /&gt;
&lt;br /&gt;
===Torque===&lt;br /&gt;
&lt;br /&gt;
The MRI rotary unions require 2 oz-in of torque each&lt;br /&gt;
&lt;br /&gt;
The U42 has a minimum static torque of 375 oz-in (1 N m = 141.6 oz-in) = 2.65 N m&lt;br /&gt;
&lt;br /&gt;
The moment of inertia of a thin disk about its axis of symmetry is&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;MR^2/2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Moment of Inertia of a ring:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;MR^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The target will be a disk of Aluminum with an outer ring of tungsten.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;I_{\mbox{Al disk}} =M_{\mbox{Al}} R^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;M_{\mbox{Al}}= \rho \times V = 2.7 g/cm^3 (0.2 cm) \times \pi ( 9 in 2.54 cm/in)^2 =2.7 g/cm^3 0.2 cm \pi 523 cm^2 = 886 g&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;I_{\mbox{Al disk}} =886 g \times 523 cm^2 = 463423 g cm^2 \times (1 oz/28 g) (1 in/2.54 cm) (1in/2.54 cm) = 2565 oz-in^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Torque = 2.65~N~m = I \alpha&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\alpha = 2.65~N~m/(0.04~kg~m^2) = 66~rad/s^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For the tungsten ring, &amp;lt;math&amp;gt;I_{\mbox{W}} = MR^2&amp;lt;/math&amp;gt;, with&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;M_{\mbox{W ring}} = 19.25 g/cm^3 \times \pi ((30 cm)^2 - (28.5 cm)^2) \times (1~cm) = 5.3 kg = 11.7 lbs&amp;lt;/math&amp;gt; (taking the ring to be 1 cm thick)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;I = 5.3 kg \times (0.29~m)^2 = 0.4457 kg m^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;I_{tot} = I_{\mbox{Al disk}} + I_{\mbox{W}} = (0.04 + 0.4457) kg m^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\alpha = 2.65~N~m/(0.49~kg~m^2) = 5.4~rad/s^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How much time to accelerate to 50 Hz = &amp;lt;math&amp;gt;2 \pi \times 50&amp;lt;/math&amp;gt; rad/sec?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\omega_f = \alpha t&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;t = \frac{\omega_f}{\alpha} = \frac{2 \pi \times 50}{5.4} \approx 58&amp;lt;/math&amp;gt; seconds&lt;br /&gt;
&lt;br /&gt;
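A minimal Python sketch of the spin-up estimate, using the corrected torque (375 oz-in = 2.65 N m) and the total moment of inertia above:&lt;br /&gt;
&lt;br /&gt;
 import math&lt;br /&gt;
 &lt;br /&gt;
 tau = 2.65         # motor torque, 375 oz-in (N m)&lt;br /&gt;
 I_tot = 0.49       # Al disk + W ring (kg m^2)&lt;br /&gt;
 f_rot = 50.0       # desired rotation rate (Hz)&lt;br /&gt;
 &lt;br /&gt;
 alpha = tau / I_tot                    # angular acceleration (rad/s^2)&lt;br /&gt;
 t = 2.0 * math.pi * f_rot / alpha      # time to reach 50 Hz at constant torque&lt;br /&gt;
 print('alpha = %.1f rad/s^2, t = %.0f s' % (alpha, t))&lt;br /&gt;
&lt;br /&gt;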
==Rotary coupler/Union==&lt;br /&gt;
&lt;br /&gt;
===ORNL rotating tungsten target===&lt;br /&gt;
&lt;br /&gt;
Full Name:	 Thomas J. McManamy&lt;br /&gt;
&lt;br /&gt;
Email Address:	 mcmanamytj@ornl.gov&lt;br /&gt;
&lt;br /&gt;
Phone Number:	 865-576-0039&lt;br /&gt;
&lt;br /&gt;
Fax Number:	 865-241-6909&lt;br /&gt;
&lt;br /&gt;
Postal Address:	 OAK RIDGE NATIONAL LABORATORY&lt;br /&gt;
&lt;br /&gt;
PO BOX 2008 MS6476&lt;br /&gt;
&lt;br /&gt;
OAK RIDGE TN 37831-6476&lt;br /&gt;
&lt;br /&gt;
http://www.ornl.gov/info/reporter/no116/nov09_dw.htm#revolution&lt;br /&gt;
&lt;br /&gt;
===DSTI===&lt;br /&gt;
&lt;br /&gt;
http://www.dsti.com/products/?gclid=CPWEvP6ogagCFRs5gwod6SSXtA&lt;br /&gt;
&lt;br /&gt;
Brett Villella&lt;br /&gt;
brett.villella@dsti.com&lt;br /&gt;
&lt;br /&gt;
Dynamic Sealing Technologies, Inc.&lt;br /&gt;
&lt;br /&gt;
13829 Jay Street NW&lt;br /&gt;
&lt;br /&gt;
Andover, MN 55304&lt;br /&gt;
&lt;br /&gt;
Direct: 763-404-8021&lt;br /&gt;
&lt;br /&gt;
Main: 763-786-3758&lt;br /&gt;
&lt;br /&gt;
Fax: 763-786-9674&lt;br /&gt;
&lt;br /&gt;
Web: www.dsti.com&lt;br /&gt;
&lt;br /&gt;
Model SPS-5510-R.A, a single-pass rotary union; the input connector is 3/4&amp;quot; in diameter, as shown in the drawing below.&lt;br /&gt;
&lt;br /&gt;
The minimum torque needed to rotate it, based on zero pressure, is 4 in-lbs.&lt;br /&gt;
&lt;br /&gt;
[[File:SinglePassRotaryUnionDSTi_v1.pdf]]&lt;br /&gt;
&lt;br /&gt;
 The target chamber design should be able to accommodate a 4&amp;quot; (3.14&amp;quot; + 1&amp;quot;) length in order to accept a rotary union which works at liquid nitrogen temperatures.  The above single-pass rotary union is good to -25 F according to the company.&lt;br /&gt;
&lt;br /&gt;
Place order for 2 with Michel @ 763-404-8024&lt;br /&gt;
&lt;br /&gt;
===GAT===&lt;br /&gt;
&lt;br /&gt;
The guy from Empire Magnetics says that the ferrofluid used in the unions below becomes stiff at low temperatures and puts too much load on the motors.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.gat-mbh.de/index.php?page=rotovac-en&amp;amp;group=produkte-en:rotating_unions-en&lt;br /&gt;
&lt;br /&gt;
http://www.tengxuan.net/old/product.asp?Catalog=8&lt;br /&gt;
&lt;br /&gt;
http://www.rotarysystems.com/?gclid=CID4r6z22acCFQM6gwodoQVQ9w&lt;br /&gt;
    http://rotarysystems.com/series-006&lt;br /&gt;
&lt;br /&gt;
==Motor Controller==&lt;br /&gt;
&lt;br /&gt;
http://www.centent.com/&lt;br /&gt;
&lt;br /&gt;
=Target specifications=&lt;br /&gt;
&lt;br /&gt;
The U42 motor has a 4.2 inch diameter.  With a 2&amp;quot; Pb brick for shielding, this puts the target radius at a minimum of 2.1 + 2 + 1 = 5.1 inches.&lt;br /&gt;
&lt;br /&gt;
== Moment of Inertia==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;I_{Disk} = \frac{1}{2} MR^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;I_{Hoop} = \frac{1}{2} M(R_1^2 +R_2^2)&amp;lt;/math&amp;gt; (an annulus with inner radius &amp;lt;math&amp;gt;R_1&amp;lt;/math&amp;gt; and outer radius &amp;lt;math&amp;gt;R_2&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
R= 32 cm = 12.6 inches.&lt;br /&gt;
&lt;br /&gt;
==Torque==&lt;br /&gt;
&lt;br /&gt;
==Rotational Speed==&lt;br /&gt;
===Mechanical design===&lt;br /&gt;
&lt;br /&gt;
An Aluminum disk will be machined to have several Tungsten targets attached.&lt;br /&gt;
&lt;br /&gt;
The disk has a radius of about 5.1 inches, so the circumference will be &amp;lt;math&amp;gt;2 \pi r \approx 32&amp;lt;/math&amp;gt; inches.  If the targets are 1&amp;quot; wide with 1&amp;quot; gaps, then you will have about 16 targets.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If you rotate the target at 1 Hz then in one second you can hit 16 targets.  This means your rep rate can be 16 Hz.&lt;br /&gt;
&lt;br /&gt;
If you rotate the target at 50 Hz then your rep rate can be 800 Hz.&lt;br /&gt;
&lt;br /&gt;
But since you will also have 16 gaps (empty target positions), you could run at 1.6 kHz.&lt;br /&gt;
&lt;br /&gt;
So a target rotating at about 31 Hz can accept a rep rate of 1 kHz.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The HRRL easily runs at 300 Hz.  For a target with 16 foils (32 positions including the gaps) you would run the motor at 300/32 &amp;lt;math&amp;gt;\approx&amp;lt;/math&amp;gt; 9.4 Hz.&lt;br /&gt;
&lt;br /&gt;
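The relation between rotation rate, target count, and allowed rep rate can be sketched in a few lines of Python (the 5.1 inch radius and 1 inch target/gap widths are the figures used above):&lt;br /&gt;
&lt;br /&gt;
 import math&lt;br /&gt;
 &lt;br /&gt;
 r_in = 5.1                    # disk radius (inches)&lt;br /&gt;
 pitch = 2.0                   # 1 inch target + 1 inch gap (inches)&lt;br /&gt;
 &lt;br /&gt;
 circ = 2.0 * math.pi * r_in               # ~32 inches&lt;br /&gt;
 n_targets = int(circ // pitch)            # ~16 targets&lt;br /&gt;
 n_positions = 2 * n_targets               # targets plus empty gaps&lt;br /&gt;
 &lt;br /&gt;
 for f_rot in (1.0, 9.4, 25.0, 50.0):&lt;br /&gt;
     print('%5.1f Hz: %4.0f Hz (targets), %4.0f Hz (with gaps)'&lt;br /&gt;
           % (f_rot, f_rot * n_targets, f_rot * n_positions))&lt;br /&gt;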
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Heat load===&lt;br /&gt;
You will need at least 2 holes in the Tungsten disk which you can use as empty targets.  The max pulse rate of the Linac is 1 kHz, so a 500 Hz motor would be the max speed needed if you have 2 windows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How long can a 1.6 mm thick piece of Tungsten with an area of 2.5&amp;quot; x 2.5&amp;quot; sit in a 600 Watt heat source before melting?&lt;br /&gt;
&lt;br /&gt;
Specific Heat Capacity = &amp;lt;math&amp;gt; 130  \frac{J}{kg K}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Thermal Conductivity = &amp;lt;math&amp;gt; 173\frac{W}{m K}=100\frac{BTU}{hr ft F}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The melting point of Tungsten is about 3600 Kelvin.&lt;br /&gt;
&lt;br /&gt;
density = 19.25&amp;lt;math&amp;gt;\frac{g}{cm^3}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Volume = &amp;lt;math&amp;gt;(0.16~cm)(6.35~cm)^2 = 6.45~cm^3&amp;lt;/math&amp;gt; (2.5&amp;quot; = 6.35 cm)&lt;br /&gt;
&lt;br /&gt;
Mass = &amp;lt;math&amp;gt;6.45~cm^3 \times 19.25~g/cm^3 = 124~g&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;Q = C_m m \Delta T = 130  \frac{J}{kg~K} \times 0.124~kg \times (3600-300~K) = 53,200~J = 600~J/s \times t &amp;lt;/math&amp;gt; (Watt = J/s)&lt;br /&gt;
:: &amp;lt;math&amp;gt;\Rightarrow t \approx 89~seconds.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
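A minimal Python sketch of this estimate (using the stated dimensions and a 300 K starting temperature; radiative and conductive losses and the heat of fusion are ignored):&lt;br /&gt;
&lt;br /&gt;
 C_w = 130.0            # specific heat of tungsten (J/(kg K))&lt;br /&gt;
 rho_w = 19.25e3        # density (kg/m^3)&lt;br /&gt;
 side = 2.5 * 0.0254    # 2.5 inches (m)&lt;br /&gt;
 thick = 1.6e-3         # 1.6 mm (m)&lt;br /&gt;
 P = 600.0              # heat source power (W)&lt;br /&gt;
 &lt;br /&gt;
 m = rho_w * side**2 * thick        # ~0.12 kg&lt;br /&gt;
 Q = C_w * m * (3600.0 - 300.0)     # heat needed to reach the melting point (J)&lt;br /&gt;
 print('m = %.3f kg, t = %.0f s' % (m, Q / P))   # ~89 s&lt;br /&gt;
&lt;br /&gt;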
If Zinc:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;C_m = 0.39~J/g/K&amp;lt;/math&amp;gt;, density = 7.13 g/cm^3 &amp;lt;math&amp;gt;\Rightarrow M = 7.13 \times 6.45 = 46~g&amp;lt;/math&amp;gt;, melting point 693 K&lt;br /&gt;
:&amp;lt;math&amp;gt;Q = 0.39 \times 46 \times (693-300) = 7,100~J = 10^3~W \times t \Rightarrow t \approx 7~s&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you need to calculate how quickly water can absorb heat.&lt;br /&gt;
&lt;br /&gt;
;Water properties&lt;br /&gt;
:C_m = 4.18 J/g/K, density = 1 g/cm^3; assuming a flow rate of 1 liter/sec &amp;lt;math&amp;gt;\Rightarrow&amp;lt;/math&amp;gt; mass flow = 1000 g/s; boiling point of water = 373.15 K&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Water's specific heat capacity is about 30 times larger than tungsten's.  So for the same mass and power input, tungsten's temperature will change by a factor of 30 more than water's.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In 1 second Tungsten's temperature change will be&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{600~J}{130~J/(kg~K) \times 0.124~kg} \approx 37~K&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Copper coils with water flowing through them are soldered to the Tungsten.  The water acts as a heat sink, carrying heat away from the Tungsten.&lt;br /&gt;
&lt;br /&gt;
;Heat transfer by conduction&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{Q}{t} = \frac{\kappa A \Delta T}{d}&amp;lt;/math&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\kappa \equiv&amp;lt;/math&amp;gt; Heat Conductivity&lt;br /&gt;
&lt;br /&gt;
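As an illustration of this formula, a minimal Python sketch; the contact area, path length, and temperature difference here are made-up example values, not measured ones (the conductivity is the tungsten value quoted above):&lt;br /&gt;
&lt;br /&gt;
 kappa = 173.0      # thermal conductivity of tungsten (W/(m K))&lt;br /&gt;
 A = 1.0e-4         # assumed contact area, 1 cm^2 (m^2)&lt;br /&gt;
 dT = 100.0         # assumed temperature difference (K)&lt;br /&gt;
 d = 1.0e-3         # assumed conduction path length, 1 mm (m)&lt;br /&gt;
 &lt;br /&gt;
 Q_per_t = kappa * A * dT / d       # conducted power (W)&lt;br /&gt;
 print('Q/t = %.0f W' % Q_per_t)    # 1730 W for these example values&lt;br /&gt;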
&lt;br /&gt;
=Sadiq's Calculation =&lt;br /&gt;
&lt;br /&gt;
Moment of inertia of a cylinder rotating about its z axis:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; I = \frac{m r^2}{2} =  \frac{\rho \times V r^2}{2} = \frac{\rho \times \pi r^2 h \times r^2}{2} = \frac{\rho \pi r^4 h}{2} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\alpha = \frac{d w}{dt} = \frac{d^2\theta}{dt^2} = \frac{\tau}{I} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; w = \frac{\tau}{I}t + w_0  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \theta = \frac{1}{2} \frac{\tau}{I} t^2 + w_0 t + \theta_0  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Angular change due to acceleration is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \Delta \theta = \frac{1}{2} \frac{\tau}{I} t^2 + w_0 t + \theta_0  - ( w_0 t + \theta_0) = \frac{1}{2} \frac{\tau}{I} t^2 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Position change at some distance R from the center is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \Delta l = \Delta \theta R = \frac{1}{2} \frac{\tau}{I} t^2 R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \Delta l = \frac{1}{2} \frac{\tau}{I} t^2 R&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Assume we just have one disk, and we are looking for the position change at the radius, r=R. We are also looking at the position change after one revolution, t = T.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \Delta l = \frac{1}{2} \frac{\tau}{\frac{\rho \pi r^4 h}{2}} T^2 r  = \frac{\tau}{\rho \pi r^3 h} \frac{1}{f^2}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; f^2 r^3 = \frac{\tau}{\rho \pi  h} \frac{1}{\Delta l }  &amp;lt;/math&amp;gt; &lt;br /&gt;
&lt;br /&gt;
or    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f^2 r^3 \Delta l = \frac{\tau}{\rho \pi  h}  &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Rough Estimation ==&lt;br /&gt;
&lt;br /&gt;
100 oz-in = 0.706154 N*m&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \tau = 400~oz-in = 2.82462~N*m &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \rho_{Al} = 2.7 ~g~ cm^{-3} = 2.7 \times 10^3 ~kg~ m^{-3}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We choose:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; h = 1~inch = 0.0254 ~m&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; \Delta l = \frac{1}{4}~inch = 0.00635 ~m&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; f^2 r^3 = \frac{2.82462~N*m}{(2.7 \times 10^3 ~kg~ m^{-3} )\times 3.14 \times (0.0254 ~m)}  \times  \frac{1}{0.00635 ~m } = 2.0656616 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt; f^2 r^3  = 2.0656616 &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;  r^3  = \frac{2}{f^2} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;  r  = ( \frac{2}{f^2} ) ^ {1/3}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Which means: if our motor has a torque of &amp;lt;math&amp;gt; 400~oz-in &amp;lt;/math&amp;gt; and is rotating a 1-inch-thick cylindrical aluminium plate, then for a point on the outer edge of the plate to move forward or backward by 1/4 inch during the next revolution, the radius and frequency of the plate should satisfy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;  r^3  = \frac{2}{f^2} &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;20&amp;quot; cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|r (cm) || r (inch)  || Circumference (in)  ||  #tar (number of 1-inch targets that can be mounted on the disk) || f, rotation frequency (Hz) ||&lt;br /&gt;
|-&lt;br /&gt;
| 43.1  || 17.0  || 106.56   ||  53  ||  5 ||&lt;br /&gt;
|-&lt;br /&gt;
| 27.1   || 10.7  ||  67.00  ||   33    ||  10   ||&lt;br /&gt;
|-&lt;br /&gt;
| 17.2   || 6.8  || 42.53    ||   21     ||  20 ||&lt;br /&gt;
|-&lt;br /&gt;
|  13.0   ||  5.1  || 32.14  ||    16    || 30 || &lt;br /&gt;
|-&lt;br /&gt;
|  10.8  ||  4.3  || 26.70  ||   13     || 40 ||&lt;br /&gt;
|-&lt;br /&gt;
|  9.3  || 3.7   || 22.99   ||   11   || 50  ||&lt;br /&gt;
|-&lt;br /&gt;
|  8.2  || 3.2   || 20.27   ||   10   || 60 || &lt;br /&gt;
|-&lt;br /&gt;
|  7.4  || 2.9  || 18.30    ||   9    || 70 || &lt;br /&gt;
|-&lt;br /&gt;
|  6.8  || 2.6  ||  16.81   ||   8   || 80  ||&lt;br /&gt;
|-&lt;br /&gt;
|  6.27  || 2.5 ||  15.50   ||    7    || 90 || &lt;br /&gt;
|-&lt;br /&gt;
|  5.85  || 2.2  ||  14.46  ||    7    ||  100 ||&lt;br /&gt;
|-&lt;br /&gt;
|  3.68  || 1.45  || 9.10   ||   4    ||  200 || &lt;br /&gt;
|-&lt;br /&gt;
|  2.81  || 1.11  ||  6.95  ||    3   || 300 || &lt;br /&gt;
|-&lt;br /&gt;
|  2.0  || 0.787  ||  4.94  ||   2    || 500 || &lt;br /&gt;
|-&lt;br /&gt;
|  1.25  || 0.492 ||  3.09  ||   1    || 1000 ||&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
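The table above can be reproduced with a few lines of Python (r in metres from r^3 = 2/f^2, then converted; the target count assumes 1-inch targets with 1-inch gaps):&lt;br /&gt;
&lt;br /&gt;
 import math&lt;br /&gt;
 &lt;br /&gt;
 for f in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 500, 1000):&lt;br /&gt;
     r = (2.0 / f**2) ** (1.0 / 3.0)    # radius in metres&lt;br /&gt;
     r_inch = r * 100.0 / 2.54          # radius in inches&lt;br /&gt;
     circ = 2.0 * math.pi * r_inch      # circumference in inches&lt;br /&gt;
     n_tar = int(circ // 2.0)           # 1-inch targets with 1-inch gaps&lt;br /&gt;
     print('f = %4d Hz: r = %5.1f cm, %4.1f in, %2d targets' % (f, r * 100.0, r_inch, n_tar))&lt;br /&gt;
&lt;br /&gt;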
=Q=&lt;br /&gt;
 Q: Does the W target have to be in the vacuum?&lt;br /&gt;
    What if we have it in air, and at the end of the beam line we have a very thin vacuum window?&lt;br /&gt;
    In the photofission experiment we had a 1 mil vacuum window.  Under a&lt;br /&gt;
    15 MeV e- beam, only about 1 in 1000 electrons will interact with the window and produce a bremsstrahlung photon.&lt;br /&gt;
&lt;br /&gt;
 Q: Can we make a positron target that is an inner part of the vacuum box, and then rotate the whole box?  Heat would be transferred to the box, and&lt;br /&gt;
    the box could be cooled by water from outside.&lt;br /&gt;
    Maybe the box does not have to rotate; it could make other movements, like up-down and left-right.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
   I think we can make a Tungsten target that is part of the beam line, so that the outer part of the Tungsten target is the beam pipe.&lt;br /&gt;
 Then we can use water cooling to cool our Tungsten target from outside.&lt;br /&gt;
 Before, we had all the electrons hitting the Faraday Cup, which did not melt the copper even when there was no cooling water&lt;br /&gt;
 circulating around the FC.&lt;br /&gt;
 Tungsten has a higher melting point, and we would run cooling water around the Tungsten target, so the Tungsten shouldn't melt.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The first version of a positron converter target will be designed to distribute the heat load by rotating the tungsten target.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Calculate for 1 mm and 2 mm thick Tungsten&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Look for Tungsten disks to attach to brushless motor and fit into beam pipe&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Chriss Jackson's Work=&lt;br /&gt;
[[Chriss' Work Page]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Go Back [[Positrons]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=JacksonLDSwork&amp;diff=71904</id>
		<title>JacksonLDSwork</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=JacksonLDSwork&amp;diff=71904"/>
		<updated>2012-03-12T19:16:46Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a dummy page for Chriss Jackson to edit as needed pertaining to the Rotating W Target Positron Project.&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=JacksonLDSwork&amp;diff=71903</id>
		<title>JacksonLDSwork</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=JacksonLDSwork&amp;diff=71903"/>
		<updated>2012-03-12T19:12:55Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: Created page with 'This page is a dummy page for Chris Jacksson to edit as needed pertaining to the Rotating W Target Positron Project.'&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a dummy page for Chris Jacksson to edit as needed pertaining to the Rotating W Target Positron Project.&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65372</id>
		<title>Work Shop Safety</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65372"/>
		<updated>2011-08-10T20:02:18Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Bench Grinder */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Machine Specific Safety=&lt;br /&gt;
&lt;br /&gt;
==Saws==&lt;br /&gt;
&lt;br /&gt;
-Table Saw-&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Y9V4FyCX97Y&amp;amp;ob=av3e&lt;br /&gt;
&lt;br /&gt;
-Circular Saw-&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VlXl99iZkfM&amp;amp;feature=fvwrel&lt;br /&gt;
&lt;br /&gt;
==Drill Press==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Ox6nXyRtuog&lt;br /&gt;
&lt;br /&gt;
==Bench Grinder==&lt;br /&gt;
&lt;br /&gt;
-Grinder-&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=GTpNPr35OlE&lt;br /&gt;
&lt;br /&gt;
-wire wheel-&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=5kHH12MlkFQ&lt;br /&gt;
&lt;br /&gt;
==Lathe/Milling machine==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=xv6unAYenZE&lt;br /&gt;
&lt;br /&gt;
==Acetylene Torch==&lt;br /&gt;
&lt;br /&gt;
-Safety-&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
***Old****&lt;br /&gt;
http://www.youtube.com/watch?v=3Ln0_hLDQqo&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=9um45rCRcpY&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=IsfwM32OI40&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VfRQxxxLJuo&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
*****Old****&lt;br /&gt;
Brazing&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=ZojIpKCo4TQ&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Note: We are updating the video list for this section. Until updates are complete, please watch all videos listed.&lt;br /&gt;
&lt;br /&gt;
=Graham Resource Center=&lt;br /&gt;
&lt;br /&gt;
http://library.iit.edu/grc/safety/&lt;br /&gt;
&lt;br /&gt;
=ehow safety tips=&lt;br /&gt;
&lt;br /&gt;
http://www.ehow.com/videos-on_7156_workshop-safety-tips.html&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65371</id>
		<title>Work Shop Safety</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65371"/>
		<updated>2011-08-10T20:02:00Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Saws */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Machine Specific Safety=&lt;br /&gt;
&lt;br /&gt;
==Saws==&lt;br /&gt;
&lt;br /&gt;
-Table Saw-&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Y9V4FyCX97Y&amp;amp;ob=av3e&lt;br /&gt;
&lt;br /&gt;
-Circular Saw-&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VlXl99iZkfM&amp;amp;feature=fvwrel&lt;br /&gt;
&lt;br /&gt;
==Drill Press==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Ox6nXyRtuog&lt;br /&gt;
&lt;br /&gt;
==Bench Grinder==&lt;br /&gt;
http://www.youtube.com/watch?v=GTpNPr35OlE&lt;br /&gt;
&lt;br /&gt;
wire wheel&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=5kHH12MlkFQ&lt;br /&gt;
&lt;br /&gt;
==Lathe/Milling machine==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=xv6unAYenZE&lt;br /&gt;
&lt;br /&gt;
==Acetylene Torch==&lt;br /&gt;
&lt;br /&gt;
-Safety-&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
***Old****&lt;br /&gt;
http://www.youtube.com/watch?v=3Ln0_hLDQqo&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=9um45rCRcpY&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=IsfwM32OI40&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VfRQxxxLJuo&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
*****Old****&lt;br /&gt;
Brazing&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=ZojIpKCo4TQ&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Note: We are updating the video list for this section. Until updates are complete, please watch all videos listed.&lt;br /&gt;
&lt;br /&gt;
=Graham Resource Center=&lt;br /&gt;
&lt;br /&gt;
http://library.iit.edu/grc/safety/&lt;br /&gt;
&lt;br /&gt;
=ehow safety tips=&lt;br /&gt;
&lt;br /&gt;
http://www.ehow.com/videos-on_7156_workshop-safety-tips.html&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65370</id>
		<title>Work Shop Safety</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65370"/>
		<updated>2011-08-10T20:01:21Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Acetylene Torch */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Machine Specific Safety=&lt;br /&gt;
&lt;br /&gt;
==Saws==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Y9V4FyCX97Y&amp;amp;ob=av3e&lt;br /&gt;
http://www.youtube.com/watch?v=VlXl99iZkfM&amp;amp;feature=fvwrel&lt;br /&gt;
&lt;br /&gt;
==Drill Press==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Ox6nXyRtuog&lt;br /&gt;
&lt;br /&gt;
==Bench Grinder==&lt;br /&gt;
http://www.youtube.com/watch?v=GTpNPr35OlE&lt;br /&gt;
&lt;br /&gt;
wire wheel&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=5kHH12MlkFQ&lt;br /&gt;
&lt;br /&gt;
==Lathe/Milling machine==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=xv6unAYenZE&lt;br /&gt;
&lt;br /&gt;
==Acetylene Torch==&lt;br /&gt;
&lt;br /&gt;
-Safety-&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
***Old****&lt;br /&gt;
http://www.youtube.com/watch?v=3Ln0_hLDQqo&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=9um45rCRcpY&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=IsfwM32OI40&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VfRQxxxLJuo&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
*****Old****&lt;br /&gt;
Brazing&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=ZojIpKCo4TQ&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Note: We are updating the video list for this section. Until updates are complete, please watch all videos listed.&lt;br /&gt;
&lt;br /&gt;
=Graham Resource Center=&lt;br /&gt;
&lt;br /&gt;
http://library.iit.edu/grc/safety/&lt;br /&gt;
&lt;br /&gt;
=ehow safety tips=&lt;br /&gt;
&lt;br /&gt;
http://www.ehow.com/videos-on_7156_workshop-safety-tips.html&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65369</id>
		<title>Work Shop Safety</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65369"/>
		<updated>2011-08-10T20:00:32Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Acetylene Torch */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Machine Specific Safety=&lt;br /&gt;
&lt;br /&gt;
==Saws==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Y9V4FyCX97Y&amp;amp;ob=av3e&lt;br /&gt;
http://www.youtube.com/watch?v=VlXl99iZkfM&amp;amp;feature=fvwrel&lt;br /&gt;
&lt;br /&gt;
==Drill Press==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Ox6nXyRtuog&lt;br /&gt;
&lt;br /&gt;
==Bench Grinder==&lt;br /&gt;
http://www.youtube.com/watch?v=GTpNPr35OlE&lt;br /&gt;
&lt;br /&gt;
wire wheel&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=5kHH12MlkFQ&lt;br /&gt;
&lt;br /&gt;
==Lathe/Milling machine==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=xv6unAYenZE&lt;br /&gt;
&lt;br /&gt;
==Acetylene Torch==&lt;br /&gt;
&lt;br /&gt;
-Safety-&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
***Old****&lt;br /&gt;
http://www.youtube.com/watch?v=3Ln0_hLDQqo&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=9um45rCRcpY&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=IsfwM32OI40&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VfRQxxxLJuo&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
*****Old****&lt;br /&gt;
Brazing&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=ZojIpKCo4TQ&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
=Graham Resource Center=&lt;br /&gt;
&lt;br /&gt;
http://library.iit.edu/grc/safety/&lt;br /&gt;
&lt;br /&gt;
=ehow safety tips=&lt;br /&gt;
&lt;br /&gt;
http://www.ehow.com/videos-on_7156_workshop-safety-tips.html&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65368</id>
		<title>Work Shop Safety</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65368"/>
		<updated>2011-08-10T19:47:26Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* You Tube Safety Videos */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Machine Specific Safety=&lt;br /&gt;
&lt;br /&gt;
==Saws==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Y9V4FyCX97Y&amp;amp;ob=av3e&lt;br /&gt;
http://www.youtube.com/watch?v=VlXl99iZkfM&amp;amp;feature=fvwrel&lt;br /&gt;
&lt;br /&gt;
==Drill Press==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Ox6nXyRtuog&lt;br /&gt;
&lt;br /&gt;
==Bench Grinder==&lt;br /&gt;
http://www.youtube.com/watch?v=GTpNPr35OlE&lt;br /&gt;
&lt;br /&gt;
wire wheel&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=5kHH12MlkFQ&lt;br /&gt;
&lt;br /&gt;
==Lathe/Milling machine==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=xv6unAYenZE&lt;br /&gt;
&lt;br /&gt;
==Acetylene Torch==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=3Ln0_hLDQqo&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=9um45rCRcpY&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=IsfwM32OI40&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VfRQxxxLJuo&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Brazing&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=ZojIpKCo4TQ&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
=Graham Resource Center=&lt;br /&gt;
&lt;br /&gt;
http://library.iit.edu/grc/safety/&lt;br /&gt;
&lt;br /&gt;
=ehow safety tips=&lt;br /&gt;
&lt;br /&gt;
http://www.ehow.com/videos-on_7156_workshop-safety-tips.html&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65367</id>
		<title>Work Shop Safety</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=Work_Shop_Safety&amp;diff=65367"/>
		<updated>2011-08-10T19:47:19Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Machine Specific Safety */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Machine Specific Safety=&lt;br /&gt;
&lt;br /&gt;
==Saws==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Y9V4FyCX97Y&amp;amp;ob=av3e&lt;br /&gt;
http://www.youtube.com/watch?v=VlXl99iZkfM&amp;amp;feature=fvwrel&lt;br /&gt;
&lt;br /&gt;
==Drill Press==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=Ox6nXyRtuog&lt;br /&gt;
&lt;br /&gt;
==Bench Grinder==&lt;br /&gt;
http://www.youtube.com/watch?v=GTpNPr35OlE&lt;br /&gt;
&lt;br /&gt;
wire wheel&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=5kHH12MlkFQ&lt;br /&gt;
&lt;br /&gt;
==Lathe/Milling machine==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=xv6unAYenZE&lt;br /&gt;
&lt;br /&gt;
==Acetylene Torch==&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=3Ln0_hLDQqo&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=9um45rCRcpY&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=IsfwM32OI40&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=VfRQxxxLJuo&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=DFU4OjEYGBI&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Brazing&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=ZojIpKCo4TQ&amp;amp;feature=related&lt;br /&gt;
&lt;br /&gt;
=Graham Resource Center=&lt;br /&gt;
&lt;br /&gt;
http://library.iit.edu/grc/safety/&lt;br /&gt;
&lt;br /&gt;
=ehow safety tips=&lt;br /&gt;
&lt;br /&gt;
http://www.ehow.com/videos-on_7156_workshop-safety-tips.html&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=You Tube Safety Videos=&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=a64y3Ih_s74&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=jIIHNI8X15Q&amp;amp;feature=mfu_in_order&amp;amp;list=UL&lt;br /&gt;
&lt;br /&gt;
http://www.youtube.com/watch?v=4Tb2v_HU7f0&amp;amp;feature=BFa&amp;amp;list=ULd6ZU7ZYh1fA&amp;amp;index=3&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=File:PMT-Optimizations_Data.xls&amp;diff=64202</id>
		<title>File:PMT-Optimizations Data.xls</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=File:PMT-Optimizations_Data.xls&amp;diff=64202"/>
		<updated>2011-06-30T20:16:49Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64201</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64201"/>
		<updated>2011-06-30T20:16:41Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Experiments=&lt;br /&gt;
*[[October Fission HRRL Measurements 2008]]&lt;br /&gt;
*[[Feb_PhotFisRun_44MeV_March_2011]]&lt;br /&gt;
&lt;br /&gt;
=Analysis=&lt;br /&gt;
&lt;br /&gt;
*[[2008_HRRL_Analysis]]&lt;br /&gt;
*[[2011_44MeV_Analysis_2011]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Simulations=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
==Meeting Notes==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Hand Calculations==&lt;br /&gt;
&lt;br /&gt;
*[[Neutron Polarimeter]]&lt;br /&gt;
&lt;br /&gt;
*[[Dan's parallel calculation]]&lt;br /&gt;
&lt;br /&gt;
*[[2n correlations in Photofission]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Publications==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Radiator Foil]]&lt;br /&gt;
&lt;br /&gt;
[[Eγ]]&lt;br /&gt;
&lt;br /&gt;
[[Using Carbon or Aluminum to block photons]]&lt;br /&gt;
&lt;br /&gt;
[[Eγ vs probability with 5 cm of D20 ]]&lt;br /&gt;
&lt;br /&gt;
[[Eγ vs probability with 8 cm of D20 ]]&lt;br /&gt;
&lt;br /&gt;
[[Thickness of Lead to block photons]]&lt;br /&gt;
&lt;br /&gt;
[[Where does the radiator need to be?]]&lt;br /&gt;
&lt;br /&gt;
[[Determining the uncertainty of Eγ]]&lt;br /&gt;
&lt;br /&gt;
[[Things That Still Need to Get Done]]&lt;br /&gt;
&lt;br /&gt;
[[Tasks March 31, 2009]]&lt;br /&gt;
&lt;br /&gt;
[[Notes from July 2nd, 2008 Meeting]]&lt;br /&gt;
&lt;br /&gt;
[[Brems Intensity Spectrum (with Collimation Factors) Plots]]&lt;br /&gt;
&lt;br /&gt;
[[Temporary Files]]&lt;br /&gt;
&lt;br /&gt;
[[Electronics diagram 2/2/11]]&lt;br /&gt;
&lt;br /&gt;
[[Brems Intensity Spectrum (without Collimation Factors) Plots]]&lt;br /&gt;
&lt;br /&gt;
[[Notes for the July 11th, 2008 Meeting]]&lt;br /&gt;
&lt;br /&gt;
[[Column of Water Length]]&lt;br /&gt;
&lt;br /&gt;
[[Finding the Space Between Lead Shots]]&lt;br /&gt;
&lt;br /&gt;
[[Electronics We Need to Collect]]&lt;br /&gt;
&lt;br /&gt;
[[Compton Scattering]]&lt;br /&gt;
&lt;br /&gt;
[[Constant Fraction Discriminator Details]]&lt;br /&gt;
&lt;br /&gt;
[[Magnet Calculations]]&lt;br /&gt;
&lt;br /&gt;
[[Notes from July 22nd, 2008 Meeting]]&lt;br /&gt;
&lt;br /&gt;
[[Current Conceptual Design of Beamline]]&lt;br /&gt;
&lt;br /&gt;
[[ Matching Detectors]]&lt;br /&gt;
&lt;br /&gt;
[[ Cosmic Ray Shielding Test ]]&lt;br /&gt;
&lt;br /&gt;
[[Neutron Time of Flight Test]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[PhotoFis.C (ROOT program to analyse data)]]&lt;br /&gt;
&lt;br /&gt;
[[Pictures of experimental setup]]&lt;br /&gt;
&lt;br /&gt;
[[Data bank]]&lt;br /&gt;
&lt;br /&gt;
[[Integrated asymmetry]]&lt;br /&gt;
&lt;br /&gt;
[[Peaks fitting]]&lt;br /&gt;
&lt;br /&gt;
[[Neutron Production Thresholds]]&lt;br /&gt;
&lt;br /&gt;
[[Conversion TDC to Energy]] &lt;br /&gt;
&lt;br /&gt;
[[Cross Sections of C,N,O,Si,Pb]]&lt;br /&gt;
&lt;br /&gt;
[[TDC Calibration]]&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Plot FWHM distribution]&lt;br /&gt;
&lt;br /&gt;
[[A kind of analysis]]&lt;br /&gt;
&lt;br /&gt;
[[NaI spectra]]&lt;br /&gt;
&lt;br /&gt;
[[B-field calculation]]&lt;br /&gt;
&lt;br /&gt;
[[Plastic Scintillator Calculation]]&lt;br /&gt;
&lt;br /&gt;
[[New Magnets]]&lt;br /&gt;
&lt;br /&gt;
[[Spectra Analysis]]&lt;br /&gt;
&lt;br /&gt;
[[Bremsstrahlung lineshape for E0 = 16 MeV, Ti radiator]]&lt;br /&gt;
&lt;br /&gt;
[[Flux from NaI vs. Gamma Flash]]&lt;br /&gt;
&lt;br /&gt;
[[ACCAPP_09_PhotFis_Poster]]&lt;br /&gt;
&lt;br /&gt;
[[Pair Spectrometer Info]]&lt;br /&gt;
&lt;br /&gt;
[[Faraday Cup Analysis]]&lt;br /&gt;
&lt;br /&gt;
[[March 2010 Run Report]]&lt;br /&gt;
&lt;br /&gt;
[[Minimum accelerator energy to run experiment]]&lt;br /&gt;
&lt;br /&gt;
[[Collimation geometry for different beam energies]]&lt;br /&gt;
&lt;br /&gt;
[[Counts Rate (44 MeV LINAC)]]&lt;br /&gt;
&lt;br /&gt;
[[Pair Spectrometer]]&lt;br /&gt;
&lt;br /&gt;
[[Aluminum Converter]]&lt;br /&gt;
&lt;br /&gt;
[[Faraday Cup Temperature]]&lt;br /&gt;
&lt;br /&gt;
[[Kicker Magnets]]&lt;br /&gt;
&lt;br /&gt;
[[MCNPX Simulations for Neutron Beam]]&lt;br /&gt;
&lt;br /&gt;
[[IAC experimental cell]]&lt;br /&gt;
&lt;br /&gt;
[[Neutron Detector Setup]]&lt;br /&gt;
&lt;br /&gt;
[[44 MeV Beam Line Optimization]]&lt;br /&gt;
&lt;br /&gt;
[[FFs angular asymmetry]]&lt;br /&gt;
&lt;br /&gt;
[[Electronics &amp;amp; DAQ]]&lt;br /&gt;
&lt;br /&gt;
[[Cool beamline drawings]]&lt;br /&gt;
&lt;br /&gt;
[[Pair Production Rate Calculation]]&lt;br /&gt;
&lt;br /&gt;
[[Ribbon Cable Attenuation Measurements]]&lt;br /&gt;
&lt;br /&gt;
[[Current Electronics]]&lt;br /&gt;
&lt;br /&gt;
[[Draft Run Plan 2/23/11]]&lt;br /&gt;
&lt;br /&gt;
[[44 Setup]]&lt;br /&gt;
&lt;br /&gt;
[[n's detectors threshold measurements 2/28/11]]&lt;br /&gt;
&lt;br /&gt;
[[PS front and back count rates (cosmic) 2/28/11]]&lt;br /&gt;
&lt;br /&gt;
[[FC calibration 44 MeV  3/1/11 ]]&lt;br /&gt;
&lt;br /&gt;
[[FC calibration 25 MeV  3/4/11 ]]&lt;br /&gt;
&lt;br /&gt;
[[D2O photodisintegration simulation]]&lt;br /&gt;
&lt;br /&gt;
[[Things to do different on next run]]&lt;br /&gt;
&lt;br /&gt;
[[Dan's April 2011 2 neutron correlations talk]]&lt;br /&gt;
&lt;br /&gt;
[[Dan's scribbles on the Hamamatsu R580 Voltage Divider Design]]&lt;br /&gt;
&lt;br /&gt;
[[n's det calibration]]&lt;br /&gt;
&lt;br /&gt;
[[2D Faraday Cup Data Analysis]]&lt;br /&gt;
&lt;br /&gt;
[[(gamma,f) versus (gamma,f)]]&lt;br /&gt;
&lt;br /&gt;
[[Bremsstrahlung radiation properties, Ee=25 MeV]]&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/HRRL Go Back]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;br /&gt;
&lt;br /&gt;
[[File:SaintGobainLightGudieQuote.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior TubeA.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior TubeB.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior TubeB(unfiltered).pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT-Optimizations Data.xls]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=File:PMT_Beahvior_TubeB(unfiltered).pdf&amp;diff=64200</id>
		<title>File:PMT Beahvior TubeB(unfiltered).pdf</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=File:PMT_Beahvior_TubeB(unfiltered).pdf&amp;diff=64200"/>
		<updated>2011-06-30T20:15:45Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=File:PMT_Beahvior_TubeB.pdf&amp;diff=64199</id>
		<title>File:PMT Beahvior TubeB.pdf</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=File:PMT_Beahvior_TubeB.pdf&amp;diff=64199"/>
		<updated>2011-06-30T20:15:36Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=File:PMT_Beahvior_TubeA.pdf&amp;diff=64198</id>
		<title>File:PMT Beahvior TubeA.pdf</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=File:PMT_Beahvior_TubeA.pdf&amp;diff=64198"/>
		<updated>2011-06-30T20:15:18Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64197</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64197"/>
		<updated>2011-06-30T20:15:09Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Experiments=&lt;br /&gt;
*[[October Fission HRRL Measurements 2008]]&lt;br /&gt;
*[[Feb_PhotFisRun_44MeV_March_2011]]&lt;br /&gt;
&lt;br /&gt;
=Analysis=&lt;br /&gt;
&lt;br /&gt;
*[[2008_HRRL_Analysis]]&lt;br /&gt;
*[[2011_44MeV_Analysis_2011]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Simulations=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
==Meeting Notes==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Hand Calculations==&lt;br /&gt;
&lt;br /&gt;
*[[Neutron Polarimeter]]&lt;br /&gt;
&lt;br /&gt;
*[[Dan's parallel calculation]]&lt;br /&gt;
&lt;br /&gt;
*[[2n correlations in Photofission]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Publications==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Radiator Foil]]&lt;br /&gt;
&lt;br /&gt;
[[Eγ]]&lt;br /&gt;
&lt;br /&gt;
[[Using Carbon or Aluminum to block photons]]&lt;br /&gt;
&lt;br /&gt;
[[Eγ vs probability with 5 cm of D20 ]]&lt;br /&gt;
&lt;br /&gt;
[[Eγ vs probability with 8 cm of D20 ]]&lt;br /&gt;
&lt;br /&gt;
[[Thickness of Lead to block photons]]&lt;br /&gt;
&lt;br /&gt;
[[Where does the radiator need to be?]]&lt;br /&gt;
&lt;br /&gt;
[[Determining the uncertainty of Eγ]]&lt;br /&gt;
&lt;br /&gt;
[[Things That Still Need to Get Done]]&lt;br /&gt;
&lt;br /&gt;
[[Tasks March 31, 2009]]&lt;br /&gt;
&lt;br /&gt;
[[Notes from July 2nd, 2008 Meeting]]&lt;br /&gt;
&lt;br /&gt;
[[Brems Intensity Spectrum (with Collimation Factors) Plots]]&lt;br /&gt;
&lt;br /&gt;
[[Temporary Files]]&lt;br /&gt;
&lt;br /&gt;
[[Electronics diagram 2/2/11]]&lt;br /&gt;
&lt;br /&gt;
[[Brems Intensity Spectrum (without Collimation Factors) Plots]]&lt;br /&gt;
&lt;br /&gt;
[[Notes for the July 11th, 2008 Meeting]]&lt;br /&gt;
&lt;br /&gt;
[[Column of Water Length]]&lt;br /&gt;
&lt;br /&gt;
[[Finding the Space Between Lead Shots]]&lt;br /&gt;
&lt;br /&gt;
[[Electronics We Need to Collect]]&lt;br /&gt;
&lt;br /&gt;
[[Compton Scattering]]&lt;br /&gt;
&lt;br /&gt;
[[Constant Fraction Discriminator Details]]&lt;br /&gt;
&lt;br /&gt;
[[Magnet Calculations]]&lt;br /&gt;
&lt;br /&gt;
[[Notes from July 22nd, 2008 Meeting]]&lt;br /&gt;
&lt;br /&gt;
[[Current Conceptual Design of Beamline]]&lt;br /&gt;
&lt;br /&gt;
[[ Matching Detectors]]&lt;br /&gt;
&lt;br /&gt;
[[ Cosmic Ray Shielding Test ]]&lt;br /&gt;
&lt;br /&gt;
[[Neutron Time of Flight Test]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[PhotoFis.C (ROOT program to analyse data)]]&lt;br /&gt;
&lt;br /&gt;
[[Pictures of experimental setup]]&lt;br /&gt;
&lt;br /&gt;
[[Data bank]]&lt;br /&gt;
&lt;br /&gt;
[[Integrated asymmetry]]&lt;br /&gt;
&lt;br /&gt;
[[Peaks fitting]]&lt;br /&gt;
&lt;br /&gt;
[[Neutron Production Thresholds]]&lt;br /&gt;
&lt;br /&gt;
[[Conversion TDC to Energy]] &lt;br /&gt;
&lt;br /&gt;
[[Cross Sections of C,N,O,Si,Pb]]&lt;br /&gt;
&lt;br /&gt;
[[TDC Calibration]]&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/Plot FWHM distribution]&lt;br /&gt;
&lt;br /&gt;
[[A kind of analysis]]&lt;br /&gt;
&lt;br /&gt;
[[NaI spectra]]&lt;br /&gt;
&lt;br /&gt;
[[B-field calculation]]&lt;br /&gt;
&lt;br /&gt;
[[Plastic Scintillator Calculation]]&lt;br /&gt;
&lt;br /&gt;
[[New Magnets]]&lt;br /&gt;
&lt;br /&gt;
[[Spectra Analysis]]&lt;br /&gt;
&lt;br /&gt;
[[Bremsstrahlung lineshape for E0 = 16 MeV, Ti radiator]]&lt;br /&gt;
&lt;br /&gt;
[[Flux from NaI vs. Gamma Flash]]&lt;br /&gt;
&lt;br /&gt;
[[ACCAPP_09_PhotFis_Poster]]&lt;br /&gt;
&lt;br /&gt;
[[Pair Spectrometer Info]]&lt;br /&gt;
&lt;br /&gt;
[[Faraday Cup Analysis]]&lt;br /&gt;
&lt;br /&gt;
[[March 2010 Run Report]]&lt;br /&gt;
&lt;br /&gt;
[[Minimum accelerator energy to run experiment]]&lt;br /&gt;
&lt;br /&gt;
[[Collimation geometry for different beam energies]]&lt;br /&gt;
&lt;br /&gt;
[[Counts Rate (44 MeV LINAC)]]&lt;br /&gt;
&lt;br /&gt;
[[Pair Spectrometer]]&lt;br /&gt;
&lt;br /&gt;
[[Aluminum Converter]]&lt;br /&gt;
&lt;br /&gt;
[[Faraday Cup Temperature]]&lt;br /&gt;
&lt;br /&gt;
[[Kicker Magnets]]&lt;br /&gt;
&lt;br /&gt;
[[MCNPX Simulations for Neutron Beam]]&lt;br /&gt;
&lt;br /&gt;
[[IAC experimental cell]]&lt;br /&gt;
&lt;br /&gt;
[[Neutron Detector Setup]]&lt;br /&gt;
&lt;br /&gt;
[[44 MeV Beam Line Optimization]]&lt;br /&gt;
&lt;br /&gt;
[[FFs angular asymmetry]]&lt;br /&gt;
&lt;br /&gt;
[[Electronics &amp;amp; DAQ]]&lt;br /&gt;
&lt;br /&gt;
[[Cool beamline drawings]]&lt;br /&gt;
&lt;br /&gt;
[[Pair Production Rate Calculation]]&lt;br /&gt;
&lt;br /&gt;
[[Ribbon Cable Attenuation Measurements]]&lt;br /&gt;
&lt;br /&gt;
[[Current Electronics]]&lt;br /&gt;
&lt;br /&gt;
[[Draft Run Plan 2/23/11]]&lt;br /&gt;
&lt;br /&gt;
[[44 Setup]]&lt;br /&gt;
&lt;br /&gt;
[[n's detectors threshold measurements 2/28/11]]&lt;br /&gt;
&lt;br /&gt;
[[PS front and back count rates (cosmic) 2/28/11]]&lt;br /&gt;
&lt;br /&gt;
[[FC calibration 44 MeV  3/1/11 ]]&lt;br /&gt;
&lt;br /&gt;
[[FC calibration 25 MeV  3/4/11 ]]&lt;br /&gt;
&lt;br /&gt;
[[D2O photodisintegration simulation]]&lt;br /&gt;
&lt;br /&gt;
[[Things to do different on next run]]&lt;br /&gt;
&lt;br /&gt;
[[Dan's April 2011 2 neutron correlations talk]]&lt;br /&gt;
&lt;br /&gt;
[[Dan's scribbles on the Hamamatsu R580 Voltage Divider Design]]&lt;br /&gt;
&lt;br /&gt;
[[n's det calibration]]&lt;br /&gt;
&lt;br /&gt;
[[2D Faraday Cup Data Analysis]]&lt;br /&gt;
&lt;br /&gt;
[[(gamma,f) versus (gamma,f)]]&lt;br /&gt;
&lt;br /&gt;
[[Bremsstrahlung radiation properties, Ee=25 MeV]]&lt;br /&gt;
&lt;br /&gt;
[http://wiki.iac.isu.edu/index.php/HRRL Go Back]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;br /&gt;
&lt;br /&gt;
[[File:SaintGobainLightGudieQuote.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior TubeA.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior TubeB.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior TubeB(unfiltered).pdf]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64196</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64196"/>
		<updated>2011-06-30T20:14:49Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;br /&gt;
&lt;br /&gt;
[[File:SaintGobainLightGudieQuote.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior|TubeA.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior|TubeB.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT Beahvior|TubeA(unfiltered).pdf]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64195</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64195"/>
		<updated>2011-06-30T20:13:32Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;br /&gt;
&lt;br /&gt;
[[File:SaintGobainLightGudieQuote.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT behavior|TubeA.pdf]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=File:PMT_behavior_D-TubeA.pdf&amp;diff=64194</id>
		<title>File:PMT behavior D-TubeA.pdf</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=File:PMT_behavior_D-TubeA.pdf&amp;diff=64194"/>
		<updated>2011-06-30T20:12:47Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64193</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64193"/>
		<updated>2011-06-30T20:12:09Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;br /&gt;
&lt;br /&gt;
[[File:SaintGobainLightGudieQuote.pdf]]&lt;br /&gt;
&lt;br /&gt;
[[File:PMT behaviors]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64192</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=64192"/>
		<updated>2011-06-30T20:12:02Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;br /&gt;
&lt;br /&gt;
[[File:SaintGobainLightGudieQuote.pdf]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:PMT behaviors]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=User_talk:Stocjas2&amp;diff=63900</id>
		<title>User talk:Stocjas2</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=User_talk:Stocjas2&amp;diff=63900"/>
		<updated>2011-06-23T15:28:02Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Stuff=&lt;br /&gt;
Specs for the R580 PMT can be found at this website:&lt;br /&gt;
http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage&lt;br /&gt;
&lt;br /&gt;
Presentation: [[File:Presentaion.pdf]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=File:SaintGobainLightGudieQuote.pdf&amp;diff=63899</id>
		<title>File:SaintGobainLightGudieQuote.pdf</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=File:SaintGobainLightGudieQuote.pdf&amp;diff=63899"/>
		<updated>2011-06-23T15:27:18Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63898</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63898"/>
		<updated>2011-06-23T15:27:04Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;br /&gt;
&lt;br /&gt;
[[File:SaintGobainLightGudieQuote.pdf]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63897</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63897"/>
		<updated>2011-06-23T15:26:24Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63896</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63896"/>
		<updated>2011-06-23T15:26:01Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
[http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage Hamamatsu Documentation]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63895</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63895"/>
		<updated>2011-06-23T15:25:28Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: /* Jason's Uploads */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
Specs for the R580 PMT can be found at this website: http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;br /&gt;
[Hamamatsu Documentation]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
	<entry>
		<id>https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63894</id>
		<title>PhotoFission with Polarized Photons from HRRL</title>
		<link rel="alternate" type="text/html" href="https://wiki.iac.isu.edu/index.php?title=PhotoFission_with_Polarized_Photons_from_HRRL&amp;diff=63894"/>
		<updated>2011-06-23T15:24:58Z</updated>

		<summary type="html">&lt;p&gt;Stocjas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Image:PhotfissionHeros_10-21-08.jpg | 500 px]]&lt;br /&gt;
&lt;br /&gt;
=Other=&lt;br /&gt;
==Jason's Uploads==&lt;br /&gt;
Specs for the R580 PMT can be found at this website: http://sales.hamamatsu.com/index.php?id=13189439&amp;amp;src=newproductpage&lt;br /&gt;
[[File:PMT-Threshold-Testing.xls]]&lt;/div&gt;</summary>
		<author><name>Stocjas2</name></author>
	</entry>
</feed>