Forest Error Analysis for the Physical Sciences
=Class Admin=

==[[Forest_ErrorAnalysis_Syllabus]]==

==Homework==
 
Homework is due at the beginning of class on the assigned day. If you have a documented excuse for your absence, then you will have 24 hours to hand in the homework after being released by your doctor.
 
  
==Class Policies==

http://wiki.iac.isu.edu/index.php/Forest_Class_Policies
  
==Instructional Objectives==

;Course Catalog Description
: Error Analysis for the Physical Sciences, 3 credits. Lecture course with computation requirements. Topics include: error propagation, probability distributions, least squares fitting, multiple regression, goodness of fit, covariance and correlations.

Prerequisites: Math 360.
 
;Course Description
: The application of statistical inference and hypothesis testing will be the main focus of this course for students who are senior-level undergraduates or beginning graduate students.  The course begins by introducing the basic skills of error analysis and then proceeds to describe fundamental methods for comparing measurements and models.  A freely available data analysis package known as ROOT will be used.  Some programming skills using C/C++ will be needed, but only limited experience is assumed.
==Objectives and Outcomes==

[[Forest_ErrorAnalysis_ObjectivesnOutcomes]]

==Suggested Text==

[http://www.amazon.com/s/ref=nb_ss?url=search-alias%3Daps&field-keywords=0079112439&x=12&y=19 Data Reduction and Error Analysis for the Physical Sciences by Philip Bevington] ISBN: 0079112439

=Homework=

[[TF_ErrAna_Homework]]

=Class Labs=

[[TF_ErrAna_InClassLab]]
=Systematic and Random Uncertainties=

;Precision
:A measure of how exactly the result is determined.  No reference is made to what the result means.
==Systematic Error==

What is a systematic error?

A class of errors that produce reproducible mistakes due to equipment bias or a bias related to its use by the observer.

Example:  A ruler

a.) A ruler could be shorter or longer because of temperature fluctuations.

b.) An observer could be viewing the markings at a glancing angle.

In this case a systematic error is more of a mistake than an uncertainty.

In some cases you can correct for the systematic error.  In the above ruler example you can measure how the ruler's length changes with temperature.  You can then correct this systematic error by measuring the temperature of the ruler during the distance measurement.

Correction Example:
  
A ruler is calibrated at 25 C and has an expansion coefficient of 0.0005 <math>\pm</math> 0.0001 m/C.

You measure the length of a wire at 20 C and find that on average it is <math>1.982 \pm 0.0001</math> m long.

This means that the 1 m ruler is really (1 + (20-25 C)(0.0005 m/C)) = 0.9975 m long.

So the correction becomes

1.982 <math>\times</math> 0.9975 = 1.977 m
;Note: The numbers above without decimal points are integers.  Integers have infinite precision.  We will discuss the propagation of the errors above in a different chapter.

Error from bad technique:

After repeating the experiment several times the observer discovers that he had a tendency to read the meter stick at an angle and not from directly above.  After investigating this misread with repeated measurements the observer estimates that on average he will misread the meter stick by 2 mm.  This is now a systematic error that is estimated using random statistics.

==Reporting Uncertainties==
===Notation===
X <math>\pm</math> Y = X(Y)
  
 
{| border="3"  cellpadding="20" cellspacing="0"
| Measurement || Most sig. digit || Least sig. digit || Num. sig. digits || Scientific notation
|-
| 5 || 5 || 5 || 1* || <math>5</math>
|-
| 5.0 || 5 || 0 || 2 || <math>5.0 </math>
|-
| 50 || 5 || 0 || 2* || <math>5 \times 10^{1}</math>
|-
| 50.1 || 5 || 1 || 3 || <math>5.01 \times 10^{1}</math>
|}
  
;*Note
:The values of "5" and "50" above are ambiguous unless we use scientific notation, in which case we know if the zero is significant or not.  Otherwise, integers have infinite precision.
  
 
===Round Off===

|}
  
==Statistics abuse==

http://www.worldcat.org/oclc/28507867

http://www.worldcat.org/oclc/53814054

[[Forest_StatBadExamples]]

=Statistical Distributions=

[[Forest_ErrAna_StatDist]]

=Propagation of Uncertainties=

[[TF_ErrorAna_PropOfErr]]

=Statistical inference=

[[TF_ErrorAna_StatInference]]
  
http://arxiv.org/abs/1506.09077

Byron Roe (Submitted on 30 Jun 2015)

: The problem of fitting an event distribution when the total expected number of events is not fixed keeps appearing in experimental studies. In a chi-square fit, if overall normalization is one of the parameters to be fit, the fitted curve may be seriously low with respect to the data points, sometimes below all of them. This problem and the solution for it are well known within the statistics community but, apparently, not well known among some of the physics community. The purpose of this note is didactic: to explain the cause of the problem and the easy and elegant solution. The solution is to use maximum likelihood instead of chi-square. The essential difference between the two approaches is that maximum likelihood uses the normalization of each term in the chi-square, assuming it is a normal distribution, 1/sqrt(2 pi sigma-squared). In addition, the normalization is applied to the theoretical expectation, not to the data. In the present note we illustrate what goes wrong and how maximum likelihood fixes the problem in a very simple toy example which illustrates the problem clearly and is the appropriate physics model for event histograms. We then note how a simple modification to the chi-square method gives a result identical to the maximum likelihood method.
  
=References=

1.) "Data Reduction and Error Analysis for the Physical Sciences", Philip R. Bevington, ISBN-10: 0079112439, ISBN-13: 9780079112439

CPP programs for Bevington: [http://www.physics.isu.edu/~tforest/Bevington_CPP/  Bevington programs]

2.) [http://www.uscibooks.com/taylornb.htm An Introduction to Error Analysis, John R. Taylor] ISBN 978-0-935702-75-0

3.) [http://root.cern.ch  ROOT: An Analysis Framework]

4.) [http://nedwww.ipac.caltech.edu/level5/Sept01/Orear/frames.html Orear, "Notes on Statistics for Physicists"]

5.) Statistical Methods for Engineers and Scientists, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion, Marcel Dekker Inc, New York, (1985), QA 276.B425

6.) Data Analysis for Scientists and Engineers, Stuart L. Meyer, John Wiley & Sons Inc, New York, (1975), QA 276.M437

7.) Syntax for ROOT Math Functions: http://root.cern.ch/root/htmldoc/MATH_Index.html

=ROOT commands=
 
 
==A ROOT Primer==

===Download===

You can usually download the ROOT software for multiple platforms for free at the URL below

http://root.cern.ch

There is a lot of documentation which may be reached through the above URL as well.

Some packages may need to be installed first:

 sudo apt-get install git
 sudo apt-get install make
 sudo apt-get install libx11-dev
 sudo apt-get install libxpm-dev
 sudo apt-get install libxft-dev
 sudo apt-get install libxext-dev

====install====

 git clone http://github.com/root-project/root.git
 cd root
 git checkout -b v6-10-08 v6-10-08
 cd ..
 mkdir v6-10-08
 cd v6-10-08
 cmake -D mysql:BOOL=OFF ../root
 cmake --build .
  
===Setup the environment===

There is an environment variable which is usually needed on most platforms in order to run ROOT.  I will just give the commands for setting things up under the bash shell running on a UNIX platform.

;Set the variable ROOTSYS to point to the directory used to install ROOT

 export ROOTSYS=~/src/ROOT/root

OR

 source ~/src/ROOT/root/bin/thisroot.sh

I have installed the latest version of ROOT in the subdirectory ~/src/ROOT/root.

You can now run the program from any subdirectory with the command

 $ROOTSYS/bin/root

If you want to compile programs which use ROOT libraries you should probably put the "bin" subdirectory under ROOTSYS in your path as well as define a variable pointing to the subdirectory used to store all the ROOT libraries (usually $ROOTSYS/lib).

 PATH=$PATH:$ROOTSYS/bin
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROOTSYS/lib

Other UNIX shells and operating systems will have different syntax for doing the same thing.

===Starting and exiting ROOT===

Starting and exiting the ROOT program may be the next important thing to learn.

As already mentioned above you can start ROOT by executing the command
 $ROOTSYS/bin/root

from an operating system command shell.  You will see a "splash" screen appear and go away after a few seconds.  Your command shell is now a ROOT command interpreter known as CINT which has the command prompt "root [0]".  You can type commands within this environment such as the command below to exit the ROOT program.

To EXIT the program just type

 .q

===Starting the ROOT Browser (GUI)===

By now most computer users are accustomed to Graphical User Interfaces (GUIs) for applications. You can launch a GUI from within the ROOT command interpreter with the command

 new TBrowser();

A window should open with pull down menus.  You can now exit ROOT by selecting "Quit ROOT" from the pull down menu labeled "File" in the upper left part of the GUI window.

==Plotting Probability Distributions in ROOT==
http://project-mathlibs.web.cern.ch/project-mathlibs/sw/html/group__PdfFunc.html

The Binomial distribution is plotted below for the case of flipping a coin 60 times (n=60, p=1/2).

;You can see the Integral, Mean, and RMS by right clicking on the Statistics box and choosing "SetOptStat".  A menu box pops up and you enter the binary code "1001111" (without the quotes).

<pre>
root [0] Int_t i;
root [1] Double_t x;
root [2] TH1F *B30=new TH1F("B30","B30",100,0,100);
root [3] for(i=0;i<100;i++) {x=ROOT::Math::binomial_pdf(i,0.5,60);B30->Fill(i,x);}
root [4] B30->Draw();
</pre>
 
 
You can continue creating plots of the various distributions with the commands
<pre>
root [5] TH1F *P30=new TH1F("P30","P30",100,0,100);
root [6] for(i=0;i<100;i++) {x=ROOT::Math::poisson_pdf(i,30);P30->Fill(i,x);}
root [7] P30->Draw("same");
root [8] TH1F *G30=new TH1F("G30","G30",100,0,100);
root [9] for(i=0;i<100;i++) {x=ROOT::Math::gaussian_pdf(i,3.8729,30);G30->Fill(i,x);}
root [10] G30->Draw("same");
</pre>
  
The above commands created the plot below.

[[File:Forest_EA_BinPosGaus_Mean30.png| 200 px]]
[[File:Forest_EA_BinPosGaus_Mean30.eps]]

;Note:  I set each distribution to correspond to a mean (<math>\mu</math>) of 30. The <math>\sigma^2</math> of the Binomial distribution is <math>np(1-p)</math> = 60(1/2)(1/2) = 15. The <math>\sigma^2</math> of the Poisson distribution is defined to equal the mean <math>\mu=30</math>.  I intentionally set the mean and sigma of the Gaussian to agree with the Binomial. As you can see, the Binomial and Gaussian lie on top of each other while the Poisson is wider.  The Gaussian will lie on top of the Poisson if you change its <math>\sigma</math> to 5.477.
 
 
The smaller the p-value, the more evidence we have against H0. It is also a measure of how likely we are to get a certain sample result or a result “more extreme,” assuming H0 is true. The type of hypothesis (right tailed, left tailed or two tailed) will determine what “more extreme” means.
 
 
 
Much research involves making a hypothesis and then collecting data to test that hypothesis. In particular, researchers will set up a null hypothesis, a hypothesis that presumes no change or no effect of a treatment. Then these researchers will collect data and measure the consistency of this data with the null hypothesis.
 
 
 
The p-value measures consistency by calculating the probability of observing the results from your sample of data or a sample with results more extreme, assuming the null hypothesis is true. The smaller the p-value, the greater the inconsistency.
 
 
 
Traditionally, researchers will reject a hypothesis if the p-value is less than 0.05. Sometimes, though, researchers will use a stricter cut-off (e.g., 0.01) or a more liberal cut-off (e.g., 0.10). The general rule is that a small p-value is evidence against the null hypothesis while a large p-value means little or no evidence against the null hypothesis. Please note that little or no evidence against the null hypothesis is not the same as a lot of evidence for the null hypothesis.
 
 
 
It is easiest to understand the p-value in a data set that is already at an extreme. Suppose that a drug company alleges that only 50% of all patients who take a certain drug will have an adverse event of some kind. You believe that the adverse event rate is much higher. In a sample of 12 patients, all twelve have an adverse event.
 
 
 
The data supports your belief because it is inconsistent with the assumption of a 50% adverse event rate. It would be like flipping a coin 12 times and getting heads each time.
 
 
 
The p-value, the probability of getting a sample result of 12 adverse events in 12 patients assuming that the adverse event rate is 50%, is a measure of this inconsistency. The p-value, 0.000244, is small enough that we would reject the hypothesis that the adverse event rate was only 50%.
 
 
 
A large p-value should not automatically be construed as evidence in support of the null hypothesis. Perhaps the failure to reject the null hypothesis was caused by an inadequate sample size. When you see a large p-value in a research study, you should also look for one of two things:
 
 
 
a power calculation that confirms that the sample size in that study was adequate for detecting a clinically relevant difference; and/or
 
a confidence interval that lies entirely within the range of clinical indifference.
 
You should also be cautious about a small p-value, but for different reasons. In some situations, the sample size is so large that even differences that are trivial from a medical perspective can still achieve statistical significance.
 
 
 
As a statistician, I am not in a good position to advise you on whether a difference is trivial or not. As a medical expert, you need to balance the cost and side effects of a treatment against the benefits that the therapy provides.
 
 
 
The authors of the research paper should inform you what size difference is clinically relevant and what sized difference is trivial. But if they don't, you should. Ask yourself how much of a difference would be large enough to cause you to change your practice. Then compare this to the confidence interval in the research paper. If both limits of the confidence interval are smaller than a clinically relevant difference, then you should not change your practice, no matter what the p-value tells you.
 
 
 
You should not interpret the p-value as the probability that the null hypothesis is true. Such an interpretation is problematic because a hypothesis is not a random event that can have a probability.
 
 
 
Bayesian statistics provides an alternative framework that allows you to assign probabilities to hypotheses and to modify these probabilities on the basis of the data that you collect.
 
 
 
Example
 
 
 
A large number of p-values appear in a publication
 
 
 
Consultation Patterns and Provision of Contraception in General Practice Before Teenage Pregnancy: Case-Control Study. Churchill D, Allen J, Pringle M, Hippisley-Cox J, Ebdon D, Macpherson M, Bradley S. British Medical Journal 2000: 321(7259); 486-9. [Abstract] [Full text] [PDF]
 
by Churchill et al 2000. This was a study of consultation practices among teenagers who become pregnant. The researchers selected 240 patients (cases) with a recorded conception before the age of 20. Three controls were selected for each case and were matched on age and practice.
 
 
 
The not-too-surprising finding is that the cases were more likely to have consulted certain health professionals in the year before conception and were more likely to request contraceptive protection. This demonstrates that teenagers are not reluctant to seek advice about contraception.
 
 
 
For example, 91% of the cases (219/240) sought the advice of a general practitioner in the year before conception compared to 82% of the controls (586/719) during a similar time frame. This is a large difference. The odds ratio is 2.37. The p-value is 0.001, which indicates that this ratio is statistically significantly different from 1.0. The 95% confidence interval for the odds ratio is 1.45 to 3.86.
 
 
 
In contrast, 23% of the cases (56/240) sought advice from a practice nurse while 24% of the controls (170/719) sought advice. This is a small difference and the odds ratio is 0.98. The p-value is 0.905, which indicates that this odds ratio does not differ significantly from 1. As with any negative finding, you should be concerned about whether the result is due to an inadequate sample size. The confidence interval, however, is 0.69 to 1.39. This indicates that the research study had a good amount of precision and that the sample size was reasonable.
 
  

Latest revision as of 22:11, 29 April 2020

Class Admin

Forest_ErrorAnalysis_Syllabus

Homework

Homework is due at the beginning of class on the assigned day. If you have a documented excuse for your absence, then you will have 24 hours to hand in the homework after being released by your doctor.

Class Policies

http://wiki.iac.isu.edu/index.php/Forest_Class_Policies

Instructional Objectives

Course Catalog Description
Error Analysis for the Physical Sciences 3 credits. Lecture course with computation requirements. Topics include: Error propagation, Probability Distributions, Least Squares fit, multiple regression, goodness of fit, covariance and correlations.

Prerequisites: Math 360.

Course Description
The application of statistical inference and hypothesis testing will be the main focus of this course for students who are senior level undergraduates or beginning graduate students. The course begins by introducing the basic skills of error analysis and then proceeds to describe fundamental methods comparing measurements and models. A freely available data analysis package known as ROOT will be used. Some programming skills will be needed using C/C++ but a limited amount of experience is assumed.

Objectives and Outcomes

Forest_ErrorAnalysis_ObjectivesnOutcomes

Suggested Text

Data Reduction and Error Analysis for the Physical Sciences by Philip Bevington ISBN: 0079112439

Homework

TF_ErrAna_Homework

Class Labs

TF_ErrAna_InClassLab


Systematic and Random Uncertainties

Although the name of the class is "Error Analysis" for historical reasons, a more accurate description would be "Uncertainty Analysis". "Error" usually means a mistake was made, while "Uncertainty" is a measure of how confident you are in a measurement.

Accuracy -vs- Precision

Accuracy
A measure of how close an experimental result comes to the true value.
Precision
A measure of how exactly the result is determined, without reference to what the result means.

Systematic Error

What is a systematic error?


A class of errors which result in reproducible mistakes due to equipment bias or a bias related to its use by the observer.

Example: A ruler

a.) A ruler could be shorter or longer because of temperature fluctuations

b.) An observer could be viewing the markings at a glancing angle.


In this case a systematic error is more of a mistake than an uncertainty.

In some cases you can correct for the systematic error. In the above Ruler example you can measure how the ruler's length changes with temperature. You can then correct this systematic error by measuring the temperature of the ruler during the distance measurement.


Correction Example:

A ruler is calibrated at 25 C and has an expansion coefficient of (0.0005 [math]\pm[/math] 0.0001) m/C.

You measure the length of a wire at 20 C and find that on average it is [math]1.982 \pm 0.0001[/math] m long.

This means that the 1 m ruler is really (1 + (20 C - 25 C)(0.0005 m/C)) = 0.9975 m long at 20 C.


So the correction becomes

1.982 × 0.9975 = 1.977 m

Note
The numbers above without decimal points are integers. Integers have infinite precision. We will discuss the propagation of the errors above in a different chapter.

Error from bad technique:

After repeating the experiment several times the observer discovers that he had a tendency to read the meter stick at an angle and not from directly above. After investigating this misread with repeated measurements the observer estimates that on average he will misread the meter stick by 2 mm. This is now a systematic error that is estimated using random statistics.

Reporting Uncertainties

Notation

X [math]\pm[/math] Y = X(Y)

Significant Figures and Round off

Significant figures

Most Significant digit
The leftmost non-zero digit is the most significant digit of a reported value
Least Significant digit
The least significant digit is identified using the following criteria
1.) If there is no decimal point, then the rightmost digit is the least significant digit.
2.) If there is a decimal point, then the rightmost digit is the least significant digit, even if it is a zero.

In other words, zero counts as a least significant digit only if it is after the decimal point. So when you report a measurement with a zero in such a position you had better mean it.

The number of significant digits in a measurement is the number of digits which appear between the least and most significant digits, inclusive.

examples:

Measurement   Most Sig. digit   Least Sig. digit   Num. Sig. Dig.   Scientific Notation
5             5                 5                  1*               [math]5[/math]
5.0           5                 0                  2                [math]5.0[/math]
50            5                 0                  2*               [math]5 \times 10^{1}[/math]
50.1          5                 1                  3                [math]5.01 \times 10^{1}[/math]
0.005001      5                 1                  4                [math]5.001\times 10^{-3}[/math]
Note
The values of "5" and "50" above (marked *) are ambiguous unless we use scientific notation, in which case we know whether or not the zero is significant. Otherwise, integers have infinite precision.

Round Off

A result calculated from more than one measured quantity must be reported with the same number of significant digits as the measured quantity having the smallest number of significant digits.

To accomplish this you will need to round off the final value that is reported.

To round off a number you:

1.) Increment the least significant digit by one if the digit below it (in significance) is greater than 5.

2.) Do nothing if the digit below it (in significance) is less than 5.

Then truncate the remaining digits below the least significant digit.

What happens if the next significant digit below the least significant digit is exactly 5?

To avoid a systematic error involving round off you would ideally decide at random whether to truncate or increment. If your calculation is not on a computer with a random number generator, or you don't have one handy, then the typical technique is to increment the least significant digit if it is odd and truncate if it is even (or the reverse convention, applied consistently).

Examples

The table below has three columns: the final value calculated from several measured quantities, the number of significant digits for the measurement with the smallest number of significant digits, and the rounded-off value properly reported using scientific notation.

Value Sig. digits Rounded off value
12.34 3 [math]1.23 \times 10^{1}[/math]
12.36 3 [math]1.24 \times 10^{1}[/math]
12.35 3 [math]1.24 \times 10^{1}[/math]
12.35 2 [math]1.2 \times 10^{1}[/math]


Statistics abuse

http://www.worldcat.org/oclc/28507867

http://www.worldcat.org/oclc/53814054

Forest_StatBadExamples

Statistical Distributions

Forest_ErrAna_StatDist

Propagation of Uncertainties

TF_ErrorAna_PropOfErr

Statistical inference

TF_ErrorAna_StatInference



http://arxiv.org/abs/1506.09077

Byron Roe (Submitted on 30 Jun 2015)

  The problem of fitting an event distribution when the total expected number of events is not fixed, keeps appearing in experimental studies. In a chi-square fit, if overall normalization is one of the parameters to be fit, the fitted curve may be seriously low with respect to the data points, sometimes below all of them. This problem and the solution for it are well known within the statistics community, but, apparently, not well known among some of the physics community. The purpose of this note is didactic, to explain the cause of the problem and the easy and elegant solution. The solution is to use maximum likelihood instead of chi-square. The essential difference between the two approaches is that maximum likelihood uses the normalization of each term in the chi-square assuming it is a normal distribution, 1/sqrt(2 pi sigma-square). In addition, the normalization is applied to the theoretical expectation not to the data. In the present note we illustrate what goes wrong and how maximum likelihood fixes the problem in a very simple toy example which illustrates the problem clearly and is the appropriate physics model for event histograms. We then note how a simple modification to the chi-square method gives a result identical to the maximum likelihood method.

References

1.) "Data Reduction and Error Analysis for the Physical Sciences", Philip R. Bevington, ISBN-10: 0079112439, ISBN-13: 9780079112439

CPP programs for Bevington

Bevington programs

2.)An Introduction to Error Analysis, John R. Taylor ISBN 978-0-935702-75-0

3.) ROOT: An Analysis Framework


4.) Orear, "Notes on Statistics for Physicists"

5.) Statistical Methods for Engineers and Scientists, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion, Marcel Dekker Inc, New York, (1985), QA 276.B425

6.) Data Analysis for Scientists and Engineers, Stuart L. Meyer, John Wiley & Sons Inc, New York, (1975), QA 276.M437

7.) Syntax for ROOT Math Functions

http://root.cern.ch/root/htmldoc/MATH_Index.html

ROOT commands

A ROOT Primer

Download

You can usually download the ROOT software for multiple platforms for free at the URL below

http://root.cern.ch

There is a lot of documentation which may be reached through the above URL as well

Some packages that may need to be installed first (on Debian/Ubuntu systems):

sudo apt-get install git

sudo apt-get install make

sudo apt-get install libx11-dev

sudo apt-get install libxpm-dev

sudo apt-get install libxft-dev

sudo apt-get install libxext-dev

install

git clone http://github.com/root-project/root.git

cd root

git checkout -b v6-10-08 v6-10-08

cd ..

mkdir v6-10-08

cd v6-10-08

cmake -D mysql:BOOL=OFF ../root

cmake --build .

Setup the environment

There is an environmental variable which is usually needed on most platforms in order to run ROOT. I will just give the commands for setting things up under the bash shell running on the UNIX platform.

Set the variable ROOTSYS to point to the directory used to install ROOT
export ROOTSYS=~/src/ROOT/root

OR

source ~/src/ROOT/root/bin/thisroot.sh

I have installed the latest version of ROOT in the subdirectory ~/src/ROOT/root.


You can now run the program from any subdirectory with the command

$ROOTSYS/bin/root

If you want to compile programs which use ROOT libraries you should probably put the "bin" subdirectory under ROOTSYS in your path as well as define a variable pointing to the subdirectory used to store all the ROOT libraries (usually $ROOTSYS/lib).

PATH=$PATH:$ROOTSYS/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROOTSYS/lib


Other UNIX shell and operating systems will have different syntax for doing the same thing.

Starting and exiting ROOT

Starting and exiting the ROOT program may be the next important thing to learn.

As already mentioned above you can start ROOT by executing the command

$ROOTSYS/bin/root

from an operating system command shell. You will see a "splash" screen appear and go away after a few seconds. Your command shell is now a ROOT command interpreter (Cling in ROOT 6; older versions used CINT) with the command prompt "root [0]". You can type commands within this environment, such as the command below to exit the ROOT program.

To EXIT the program just type

.q

Starting the ROOT Browser (GUI)

By now most computer users are accustomed to Graphical User Interfaces (GUIs) for applications. You can launch a GUI from within the ROOT command interpreter interface with the command

new TBrowser();

A window should open with pull down menus. You can now exit root by selecting "Quit ROOT" from the pull down menu labeled "File" in the upper left part of the GUI window.

Plotting Probability Distributions in ROOT

http://project-mathlibs.web.cern.ch/project-mathlibs/sw/html/group__PdfFunc.html

The Binomial distribution is plotted below for the case of flipping a coin 60 times (n=60,p=1/2)

You can see the Integral, Mean, and RMS by right clicking on the Statistics box and choosing "SetOptStat". A menu box pops up and you enter the binary code "1001111" (without the quotes).


root [0] Int_t i;                                                                 
root [1] Double_t x                                                               
root [2] TH1F *B30=new TH1F("B30","B30",100,0,100)                                
root [3] for(i=0;i<100;i++) {x=ROOT::Math::binomial_pdf(i,0.5,60);B30->Fill(i,x);}
root [4] B30->Draw();                                                             

You can continue creating plots of the various distributions with the commands

root [2] TH1F *P30=new TH1F("P30","P30",100,0,100)                                
root [3] for(i=0;i<100;i++) {x=ROOT::Math::poisson_pdf(i,30);P30->Fill(i,x);}
root [4] P30->Draw("same");                                                             
root [2] TH1F *G30=new TH1F("G30","G30",100,0,100)                                
root [3] for(i=0;i<100;i++) {x=ROOT::Math::gaussian_pdf(i,3.8729,30);G30->Fill(i,x);}
root [4] G30->Draw("same");                                                             

The above commands created the plot below.

Forest EA BinPosGaus Mean30.png File:Forest EA BinPosGaus Mean30.eps

Note
I set each distribution to correspond to a mean ([math]\mu[/math]) of 30. The [math]\sigma^2[/math] of the Binomial distribution is [math]np(1-p)[/math] = 60(1/2)(1/2) = 15. The [math]\sigma^2[/math] of the Poisson distribution is defined to equal the mean [math]\mu=30[/math]. I intentionally set the mean and sigma of the Gaussian to agree with the Binomial. As you can see, the Binomial and Gaussian lie on top of each other while the Poisson is wider. The Gaussian will lie on top of the Poisson if you change its [math]\sigma[/math] to 5.477
for(i=0;i<100;i++) {x=ROOT::Math::gaussian_pdf(i,5.477,30);G30->Fill(i,x);}
Also Note
The Gaussian distribution can be used to represent the Binomial and Poisson distributions when the average is large (i.e. [math]\mu \gt 10[/math]). The Poisson distribution, though, is an approximation to the Binomial distribution for the special case where the average number of successes is much smaller than the number of trials ([math]\mu \lt \lt n[/math] because [math]p \lt \lt 1[/math])
Int_t i;                                                                 
Double_t x                                                               
TH1F *B30=new TH1F("B30","B30",100,0,10);
TH1F *P30=new TH1F("P30","P30",100,0,10);
for(i=0;i<10;i++) {x=ROOT::Math::binomial_pdf(i,0.1,10);B30->Fill(i,x);}
for(i=0;i<10;i++) {x=ROOT::Math::poisson_pdf(i,1);P30->Fill(i,x);}


Forest EA BinPos Mean1.png File:Forest EA BinPos Mean1.eps

Other probability distribution functions

double	beta_pdf(double x, double a, double b)
double	binomial_pdf(unsigned int k, double p, unsigned int n)
double	breitwigner_pdf(double x, double gamma, double x0 = 0)
double	cauchy_pdf(double x, double b = 1, double x0 = 0)
double	chisquared_pdf(double x, double r, double x0 = 0)
double	exponential_pdf(double x, double lambda, double x0 = 0)
double	fdistribution_pdf(double x, double n, double m, double x0 = 0)
double	gamma_pdf(double x, double alpha, double theta, double x0 = 0)
double	gaussian_pdf(double x, double sigma = 1, double x0 = 0)
double	landau_pdf(double x, double s = 1, double x0 = 0.)
double	lognormal_pdf(double x, double m, double s, double x0 = 0)
double	normal_pdf(double x, double sigma = 1, double x0 = 0)
double	poisson_pdf(unsigned int n, double mu)
double	tdistribution_pdf(double x, double r, double x0 = 0)
double	uniform_pdf(double x, double a, double b, double x0 = 0)

Sampling from Prob Dist in ROOT

You can also use ROOT to generate random numbers according to a given probability distribution.


The example below fills a histogram with 10,000 random numbers drawn from a Poisson parent population with a mean of 30

root [0] TRandom r                                     
root [1] TH1F *P10=new TH1F("P10","P10",100,0,100)     
root [2] for(i=0;i<10000;i++) P10->Fill(r.Poisson(30));
Warning: Automatic variable i is allocated (tmpfile):1:
root [3] P10->Draw();                                  

Other possibilities

Gaussian with a mean of 30 and sigma of 5
root [1] TH1F *G10=new TH1F("G10","G10",100,0,100)     
root [2] for(i=0;i<10000;i++) G10->Fill(r.Gauss(30,5));  //mean of 30 and sigma of 5
root [3] G10->Draw("same");

There is also

G10->Fill(r.Uniform(20,50)); // Generates a uniform distribution between 20 and 50

G10->Fill(r.Exp(30)); // exponential distribution with a mean of 30

G10->Fill(r.Landau(30,5)); // Landau distribution with a mean of 30 and sigma of 5

G10->Fill(r.BreitWigner(30,5)); // mean of 30 and gamma of 5


Final Exam

The Final exam will be to write a report describing the analysis of the data in TF_ErrAna_InClassLab#Lab_16

Grading Scheme:


Grid Search method results

10% Parameter values

20% Parameter errors

30% Probability fit is correct

40% Grammatically correct written explanation of the data analysis with publication quality plots

Report due in my office and source code in my e-mail by Friday, May 8, 2:30 pm (MST)

Report length is between 3 and 15 pages all inclusive. Min Font Size is 12 pt, 1.5 spacing of lines, max margin 2".

WIP

TF_ErrorAnalysis_WIP