Difference between revisions of "TF ErrorAna StatInference"

From New IAC Wiki

Revision as of 01:21, 8 March 2010

Statistical Inference

Frequentist -vs- Bayesian Inference

When it comes to testing a hypothesis, there are two dominant philosophies: the Frequentist and the Bayesian perspectives.

The dominant discussion for this class will be from the Frequentist perspective.

frequentist statistical inference

Statistical inference is made using a null-hypothesis test, that is, a test that answers the question: assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?


The relative frequency of occurrence of an event, in a number of repetitions of the experiment, is a measure of the probability of that event. Thus, if [math]n_t[/math] is the total number of trials and [math]n_x[/math] is the number of trials where the event [math]x[/math] occurred, the probability [math]P(x)[/math] of the event occurring will be approximated by the relative frequency as follows:

[math]P(x) \approx \frac{n_x}{n_t}.[/math]
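
The relative-frequency estimate can be illustrated with a short simulation. This is only a sketch: the die-rolling experiment, the function name `relative_frequency`, and the fixed seed are assumptions made for illustration and are not part of the original notes.

```python
import random

def relative_frequency(event, trial, n_trials, seed=0):
    """Estimate P(event) as n_x / n_t over n_trials simulated trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_x = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return n_x / n_trials

# Hypothetical experiment: roll a fair six-sided die; event x = "a 6 is rolled".
def roll_die(rng):
    return rng.randint(1, 6)

def is_six(outcome):
    return outcome == 6

# The estimate should settle near 1/6 as the number of trials grows.
for n_t in (100, 10_000, 100_000):
    print(n_t, relative_frequency(is_six, roll_die, n_t))
```

For a fair die the printed values should approach 1/6 as [math]n_t[/math] increases, which is the sense in which the relative frequency approximates [math]P(x)[/math].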

Bayesian inference.

Statistical inference is made by using evidence or observations to update or to newly infer the probability that a hypothesis may be true. The name "Bayesian" comes from the frequent use of Bayes' theorem in the inference process.

Bayes' theorem relates the conditional and marginal probabilities of events A and B, where B has a non-vanishing probability:

[math]P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! [/math].

Each term in Bayes' theorem has a conventional name:

  • P(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not take into account any information about B.
  • P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
  • P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
  • P(B|A) is the conditional probability of B given A.

Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.

Example

Suppose there is a school having 60% boys and 40% girls as students.

The female students wear trousers or skirts in equal numbers; the boys all wear trousers.

An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers.

What is the probability this student is a girl?

The correct answer can be computed using Bayes' theorem.

[math] P(A) \equiv[/math] probability that the student observed is a girl = 0.4
[math]P(B) \equiv[/math] probability that the student observed is wearing trousers = (60+20)/100 = 0.8
[math]P(B|A) \equiv[/math] probability the student is wearing trousers given that the student is a girl
[math]P(A|B) \equiv[/math] probability the student is a girl given that the student is wearing trousers
[math]P(B|A) =0.5[/math]


[math]P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.[/math]
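
The same arithmetic can be written as a small function. This is a sketch under the assumptions of the example; the function name `posterior` and the variable names are hypothetical, and [math]P(B)[/math] is expanded with the law of total probability, [math]P(B) = P(B|A)P(A) + P(B|\bar{A})P(\bar{A})[/math], rather than entered by hand.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from Bayes' theorem.

    P(B) is built from the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A)).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b

# Numbers from the school example: A = "student is a girl", B = "wears trousers".
p_girl = 0.4                 # P(A)
p_trousers_given_girl = 0.5  # P(B|A)
p_trousers_given_boy = 1.0   # P(B|not A): all boys wear trousers

p_girl_given_trousers = posterior(p_girl, p_trousers_given_girl, p_trousers_given_boy)
print(p_girl_given_trousers)  # close to 0.25, matching the hand calculation
```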


Method of Maximum Likelihood

The principle of maximum likelihood is the cornerstone of Frequentist-based hypothesis testing and may be stated as follows:
The best estimate for the mean and standard deviation of the parent population is obtained when the observed set of values is the most likely to occur, i.e., when the probability of observing that set is a maximum.

Least Squares Fit

Applying the Method of Maximum Likelihood

Our object is to find the best straight line fit for an expected linear relationship between dependent variate [math](y)[/math] and independent variate [math](x)[/math].


If we let [math]y_0(x)[/math] represent the "true" linear relationship between independent variate [math]x[/math] and dependent variate [math]y[/math] such that

[math]y_0(x) = A + B x[/math]

Then the probability of observing the value [math]y_i[/math] with a standard deviation [math]\sigma_i[/math] is given by

[math]P_i = \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}[/math]

assuming an experiment done with sufficiently high statistics that it may be represented by a Gaussian parent distribution.

If the measurement is made [math]N[/math] times, then the probability of deducing the values [math]A[/math] and [math]B[/math] from the data can be expressed as the joint probability of finding the [math]N[/math] values [math]y_i[/math], one for each [math]x_i[/math]:

[math]P(A,B) = \prod_{i=1}^N \frac{1}{\sigma_i \sqrt{2 \pi}} e^{- \frac{1}{2} \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}[/math]
[math]= \left ( \prod_{i=1}^N \frac{1}{\sigma_i \sqrt{2 \pi}}\right ) e^{- \frac{1}{2} \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2}[/math] = Max

The maximum probability will result in the best values for [math]A[/math] and [math]B[/math]

This means

[math]\chi^2 = \sum \left ( \frac{y_i - y_0(x_i)}{\sigma_i}\right)^2 = \sum \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2[/math] = Min
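
The link between maximizing the joint probability and minimizing [math]\chi^2[/math] can be checked numerically. The sketch below uses made-up data and two hand-picked trial lines; the function names `chi_square` and `likelihood` are hypothetical. Because the Gaussian prefactor does not depend on [math]A[/math] or [math]B[/math], the trial line with the smaller [math]\chi^2[/math] always has the larger likelihood.

```python
import math

def chi_square(A, B, x, y, sigma):
    """Sum of squared, sigma-normalized residuals about the line y = A + B x."""
    return sum(((yi - A - B * xi) / si) ** 2 for xi, yi, si in zip(x, y, sigma))

def likelihood(A, B, x, y, sigma):
    """Joint Gaussian probability density of the observed y values for a given (A, B)."""
    norm = 1.0
    for si in sigma:
        norm /= si * math.sqrt(2.0 * math.pi)
    return norm * math.exp(-0.5 * chi_square(A, B, x, y, sigma))

# Made-up data scattered about y = 1 + 2x (illustration only).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
sigma = [0.2, 0.2, 0.3, 0.3]

good = (1.0, 2.0)  # trial line close to the data
poor = (0.0, 3.0)  # trial line far from the data

for A, B in (good, poor):
    print(A, B, chi_square(A, B, x, y, sigma), likelihood(A, B, x, y, sigma))
```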

The minimum of [math]\chi^2[/math] occurs where the function is a minimum with respect to both parameters A and B, i.e.,

[math]\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0[/math]
[math]\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2=0[/math]
If [math]\sigma_i = \sigma[/math]
All variances are the same (weighted fits don't make this assumption)

Then

[math]\frac{\partial \chi^2}{\partial A} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial A} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum \left ( y_i - A - B x_i \right)=0[/math]
[math]\frac{\partial \chi^2}{\partial B} = \frac{1}{\sigma^2}\sum \frac{ \partial}{\partial B} \left ( y_i - A - B x_i \right)^2=\frac{-2}{\sigma^2}\sum x_i \left ( y_i - A - B x_i \right)=0[/math]


or

[math]\sum \left ( y_i - A - B x_i \right)=0[/math]
[math]\sum x_i \left( y_i - A - B x_i \right)=0[/math]

The above equations are a set of 2 simultaneous equations in 2 unknowns, which can be solved.

[math]\sum y_i = \sum A + B \sum x_i = N A + B \sum x_i[/math]
[math]\sum x_i y_i = A \sum x_i + B \sum x_i^2[/math]


[math]\left( \begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array} \right) = \left( \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array} \right)\left( \begin{array}{c} A \\ B \end{array} \right)[/math]


The Method of Determinants

for the matrix problem:

[math]\left( \begin{array}{c} y_1 \\ y_2 \end{array} \right) = \left( \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right)\left( \begin{array}{c} x_1 \\ x_2 \end{array} \right)[/math]

the above can be written as

[math]y_1 = a_{11} x_1 + a_{12} x_2[/math]
[math]y_2 = a_{21} x_1 + a_{22} x_2[/math]

solving for [math]x_1[/math] assuming [math]y_1[/math] is known

[math]a_{22} (y_1 = a_{11} x_1 + a_{12} x_2)[/math]
[math]-a_{12} (y_2 = a_{21} x_1 + a_{22} x_2)[/math]
[math]\Rightarrow a_{22} y_1 - a_{12} y_2 = (a_{11}a_{22} - a_{12}a_{21}) x_1[/math]
[math]\left| \begin{array}{cc} y_1 & a_{12}\\ y_2 & a_{22} \end{array} \right| = \left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| x_1[/math]

or

[math]x_1 = \frac{\left| \begin{array}{cc} y_1 & a_{12}\\ y_2 & a_{22} \end{array} \right| }{\left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| }[/math] similarly [math]x_2 = \frac{\left| \begin{array}{cc} a_{11} & y_1\\ a_{21} & y_2 \end{array} \right| }{\left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| }[/math]

Solutions exist as long as

[math]\left| \begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array} \right| \ne 0[/math]
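
The method of determinants (Cramer's rule) for a 2x2 system can be sketched in a few lines. The function names `det2` and `solve_2x2` and the sample coefficients are assumptions for illustration. Each unknown is obtained by replacing its own column of the coefficient matrix with the column [math](y_1, y_2)[/math].

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def solve_2x2(a11, a12, a21, a22, y1, y2):
    """Solve y1 = a11 x1 + a12 x2, y2 = a21 x1 + a22 x2 by Cramer's rule."""
    det = det2(a11, a12, a21, a22)
    if det == 0:
        raise ValueError("coefficient determinant is zero: no unique solution")
    x1 = det2(y1, a12, y2, a22) / det  # first column replaced by (y1, y2)
    x2 = det2(a11, y1, a21, y2) / det  # second column replaced by (y1, y2)
    return x1, x2

# Hypothetical system: 2 x1 + x2 = 5 and x1 + 3 x2 = 10.
x1, x2 = solve_2x2(2.0, 1.0, 1.0, 3.0, 5.0, 10.0)
print(x1, x2)
```

Substituting the returned values back into both equations is a quick check that the signs and column order are right.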

Applying the method of determinants to the maximum likelihood problem above gives

[math]A = \frac{\left| \begin{array}{cc} \sum y_i & \sum x_i\\ \sum x_i y_i & \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|}[/math]
[math]B = \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|}[/math]
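
The determinant expressions for [math]A[/math] and [math]B[/math] translate directly into code. This is a minimal sketch for the equal-[math]\sigma[/math] case; the function name `linear_fit` and the test data are made up for illustration.

```python
def linear_fit(x, y):
    """Unweighted least squares line y = A + B x from the determinant formulas."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # denominator determinant
    if det == 0:
        raise ValueError("all x values are identical: slope is undefined")
    A = (sy * sxx - sx * sxy) / det  # numerator determinant for A
    B = (n * sxy - sx * sy) / det    # numerator determinant for B
    return A, B

# Points lying exactly on y = 1 + 2x, so the fit should recover A = 1, B = 2.
A_fit, B_fit = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(A_fit, B_fit)
```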


If the uncertainty in all the measurements is not the same then we need to insert [math]\sigma_i[/math] back into the system of equations.


[math]A = \frac{\left| \begin{array}{cc} \sum\frac{ y_i}{\sigma_i^2} & \sum\frac{ x_i}{\sigma_i^2}\\ \sum\frac{ x_i y_i}{\sigma_i^2} & \sum\frac{ x_i^2}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2} & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{ y_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2} & \sum \frac{x_i y_i}{\sigma_i^2} \end{array}\right|}{\left| \begin{array}{cc} \sum \frac{1}{\sigma_i^2} & \sum \frac{x_i}{\sigma_i^2}\\ \sum \frac{x_i}{\sigma_i^2} & \sum \frac{x_i^2}{\sigma_i^2} \end{array}\right|}[/math]
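
A weighted fit can be sketched the same way, with each sum carrying the weight [math]1/\sigma_i^2[/math]. In this sketch the upper-left matrix element is taken as [math]\sum 1/\sigma_i^2[/math], which is what the weighted normal equation [math]\sum y_i/\sigma_i^2 = A \sum 1/\sigma_i^2 + B \sum x_i/\sigma_i^2[/math] gives; it reduces to [math]N/\sigma^2[/math] when all [math]\sigma_i[/math] are equal. The function name `weighted_linear_fit` and the data are hypothetical.

```python
def weighted_linear_fit(x, y, sigma):
    """Weighted least squares line y = A + B x with weights w_i = 1 / sigma_i^2."""
    w = [1.0 / (s * s) for s in sigma]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    if det == 0:
        raise ValueError("degenerate x values: slope is undefined")
    A = (swy * swxx - swx * swxy) / det
    B = (sw * swxy - swx * swy) / det
    return A, B

# Points exactly on y = 1 + 2x: any choice of weights should return A = 1, B = 2.
A_w, B_w = weighted_linear_fit([0.0, 1.0, 2.0, 3.0],
                               [1.0, 3.0, 5.0, 7.0],
                               [0.1, 0.5, 0.2, 1.0])
print(A_w, B_w)
```

With equal uncertainties the common factor [math]1/\sigma^2[/math] cancels between numerator and denominator, and the result matches the unweighted expressions.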

Uncertainty in the Linear Fit parameters

As always the uncertainty is determined by the Taylor expansion in quadrature such that

[math]\sigma_P^2 = \sum \left [ \sigma_i^2 \left ( \frac{\partial P}{\partial y_i}\right )^2\right ][/math] = error in parameter P: here covariance has been assumed to be zero

By definition of variance

[math]\sigma_i^2 \approx s^2 = \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2}[/math] : there are 2 parameters and N data points which translate to (N-2) degrees of freedom.


The least squares fit (assuming equal [math]\sigma[/math]) has the following solution for the parameters A and B:

[math]A = \frac{\left| \begin{array}{cc} \sum y_i & \sum x_i\\ \sum x_i y_i & \sum x_i^2 \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|} \;\;\;\; B = \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|}[/math]

uncertainty in A

[math]\frac{\partial A}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i }{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|}[/math]
[math] = \frac{(1) \sum x_i^2 - x_j\sum x_i }{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|}[/math] only the [math]y_j[/math] term survives
[math] = D \left ( \sum x_i^2 - x_j\sum x_i \right)[/math]

where

[math]D \equiv \frac{1}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|}=\frac{1}{N\sum x_i^2 - \left ( \sum x_i \right )^2 }[/math]

Then

[math]\sigma_A^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial A}{\partial y_j}\right )^2\right ][/math]
[math]= \sum_{j=1}^N \sigma_j^2 \left ( D \left ( \sum x_i^2 - x_j\sum x_i \right) \right )^2[/math]
[math] = \sigma^2 D^2 \sum_{j=1}^N \left ( \sum x_i^2 - x_j\sum x_i \right )^2[/math] : Assume [math]\sigma_i = \sigma[/math]
[math] = \sigma^2 D^2 \sum_{j=1}^N \left [ \left ( \sum x_i^2\right )^2 - 2 x_j \sum x_i \sum x_i^2 + x_j^2 \left (\sum x_i \right )^2 \right ][/math]
[math] = \sigma^2 D^2 \left [ N \left ( \sum x_i^2\right )^2 - 2 \left (\sum x_i \right )^2 \sum x_i^2 + \sum x_i^2 \left (\sum x_i \right )^2 \right ][/math] : using [math]\sum_{j=1}^N x_j = \sum x_i[/math] and [math]\sum_{j=1}^N x_j^2 = \sum x_i^2[/math], since both sums run over the same [math]N[/math] observations
[math] = \sigma^2 D^2\sum x_i^2\left [ N \sum x_i^2 - \left (\sum x_i \right )^2 \right ][/math]
[math] = \sigma^2 D^2\sum x_i^2 \frac{1}{D}[/math]
[math] \sigma_A^2= \sigma^2 \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}[/math]
[math] \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2}[/math]
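
The closed form for [math]\sigma_A^2[/math] can be evaluated numerically. The following is a minimal sketch, assuming equal uncertainties estimated from the residuals with [math]N-2[/math] degrees of freedom; the function name `fit_with_intercept_error` and the data are made up for illustration.

```python
import math

def fit_with_intercept_error(x, y):
    """Equal-sigma least squares fit returning (A, B, s2, var_A).

    s2    : residual variance with N - 2 degrees of freedom
    var_A : s2 * sum(x^2) / (N sum(x^2) - (sum x)^2)
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the variance")
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    A = (sy * sxx - sx * sxy) / det
    B = (n * sxy - sx * sy) / det
    s2 = sum((yi - A - B * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    var_A = s2 * sxx / det
    return A, B, s2, var_A

# Made-up data scattered about y = 1 + 2x.
A, B, s2, var_A = fit_with_intercept_error([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8])
print(A, B, s2, var_A, math.sqrt(var_A))
```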


If we redefine our origin in the linear plot so the line is centered at x=0, then

[math]\sum{x_i} = 0[/math]
[math]\Rightarrow \frac{\sum x_i^2 }{N\sum x_i^2 - \left (\sum x_i \right)^2} = \frac{\sum x_i^2 }{N\sum x_i^2 } = \frac{1}{N}[/math]

or

[math] \sigma_A^2= \frac{\sum \left( y_i - A - B x_i \right)^2}{N -2} \frac{1}{N} = \frac{\sigma^2}{N}[/math]
Note
The parameter A is the y-intercept, so it makes some intuitive sense that the error in the y-intercept would be dominated by the statistical error in y.
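
The effect of shifting the origin can be checked numerically. In this sketch the [math]x[/math] values are shifted by their mean so that [math]\sum x_i = 0[/math], and the intercept variance is compared with [math]s^2/N[/math]. The function name `fit_line_with_intercept_variance` and the data are hypothetical.

```python
def fit_line_with_intercept_variance(x, y):
    """Equal-sigma least squares fit; returns (A, B, s2, var_A)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    A = (sy * sxx - sx * sxy) / det
    B = (n * sxy - sx * sy) / det
    s2 = sum((yi - A - B * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return A, B, s2, s2 * sxx / det

# Made-up data; shift x by its mean so that sum(x_centered) is zero.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
x_mean = sum(x) / len(x)
x_centered = [xi - x_mean for xi in x]

A_c, B_c, s2_c, var_A_c = fit_line_with_intercept_variance(x_centered, y)
print(var_A_c, s2_c / len(x))  # the two numbers should agree
```

After the shift the slope is unchanged, and the intercept becomes the fitted value at the mean of [math]x[/math], which for a least squares line equals the mean of the [math]y_i[/math].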


uncertainty in B

[math]B = \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|} [/math]
[math]\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 \left ( \frac{\partial B}{\partial y_j}\right )^2\right ][/math]
[math]\frac{\partial B}{\partial y_j} =\frac{\partial}{\partial y_j} \frac{\left| \begin{array}{cc} N & \sum y_i\\ \sum x_i & \sum x_i y_i \end{array}\right|}{\left| \begin{array}{cc} N & \sum x_i\\ \sum x_i & \sum x_i^2 \end{array}\right|}= \frac{\partial}{\partial y_j} D \left ( N\sum x_i y_i -\sum x_i \sum y_i \right )[/math]
[math]= D \left ( N x_j - \sum x_i \right) [/math]


[math]\sigma_B^2 = \sum_{j=1}^N \left [ \sigma_j^2 D^2 \left ( N x_j - \sum x_i \right)^2 \right ][/math]
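
The sum over [math]j[/math] can be carried through in the same way as for [math]\sigma_A^2[/math]. The steps below are a sketch that is not part of the original revision; they assume equal uncertainties, [math]\sigma_j = \sigma[/math], and use [math]\partial B / \partial y_j = D \left ( N x_j - \sum x_i \right )[/math] with [math]1/D = N\sum x_i^2 - \left (\sum x_i \right )^2[/math].

[math]\sigma_B^2 = \sigma^2 D^2 \sum_{j=1}^N \left ( N x_j - \sum x_i \right )^2[/math]
[math] = \sigma^2 D^2 \left [ N^2 \sum_{j=1}^N x_j^2 - 2 N \sum x_i \sum_{j=1}^N x_j + N \left ( \sum x_i \right )^2 \right ][/math]
[math] = \sigma^2 D^2 N \left [ N \sum x_i^2 - \left ( \sum x_i \right )^2 \right ][/math]
[math] = \sigma^2 D^2 N \frac{1}{D} = \frac{N \sigma^2}{N\sum x_i^2 - \left (\sum x_i \right)^2}[/math]

If the origin is again chosen so that [math]\sum x_i = 0[/math], this reduces to [math]\sigma_B^2 = \sigma^2 / \sum x_i^2[/math].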

Linear Fit with error

[math]\frac{\partial \chi^2}{\partial A} = \sum \frac{ \partial}{\partial A} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2 = -2 \sum \left ( \frac{y_i - A - B x_i }{\sigma_i^2}\right)=0[/math]
[math]\frac{\partial \chi^2}{\partial B} = \sum \frac{ \partial}{\partial B} \left ( \frac{y_i - A - B x_i }{\sigma_i}\right)^2 = -2 \sum x_i \left ( \frac{y_i - A - B x_i }{\sigma_i^2}\right)=0[/math]


Go Back Forest_Error_Analysis_for_the_Physical_Sciences#Statistical_inference