# Forest Error Analysis for the Physical Sciences

## Homework

Homework is due at the beginning of class on the assigned day. If you have a documented excuse for your absence, then you will have 24 hours to hand in the homework after being released by your doctor.

## Instructional Objectives

Course Catalog Description
Error Analysis for the Physical Sciences, 3 credits. Lecture course with computation requirements. Topics include error propagation, probability distributions, least-squares fits, multiple regression, goodness of fit, covariance, and correlations.

Prerequisites: Math 360.

Course Description
The main focus of this course is the application of statistical inference and hypothesis testing, for students who are senior-level undergraduates or beginning graduate students. The course begins by introducing the basic skills of error analysis and then proceeds to describe fundamental methods for comparing measurements and models. A freely available data analysis package known as ROOT will be used. Some programming in C/C++ will be needed, but only a limited amount of experience is assumed.

# Systematic and Random Uncertainties

Although the name of the class is "Error Analysis" for historical reasons, a more accurate description would be "Uncertainty Analysis". "Error" usually means that a mistake was made, while "uncertainty" is a measure of how confident you are in a measurement.

## Accuracy -vs- Precision

Accuracy
How close an experiment comes to the correct result.
Precision
A measure of how exactly the result is determined, without reference to what the result means.
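
The distinction can be illustrated with a small numerical sketch. The readings and the reference value below are invented for illustration; the offset of the mean from the reference stands in for accuracy and the sample standard deviation stands in for precision.

```python
import statistics

# Hypothetical data: repeated readings of a length whose correct value is assumed known.
TRUE_VALUE = 10.00                               # assumed reference value, in cm
readings = [10.21, 10.19, 10.20, 10.22, 10.18]   # made-up readings, in cm

mean = statistics.mean(readings)
# Accuracy: how far the average result sits from the correct value.
offset = mean - TRUE_VALUE
# Precision: how tightly the readings cluster, with no reference to the correct value.
spread = statistics.stdev(readings)

print(f"mean = {mean:.3f} cm, offset = {offset:+.3f} cm, spread = {spread:.3f} cm")
```

In this made-up data set the spread is much smaller than the offset, so the measurement would be described as precise but not accurate.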

## Systematic Error

What is a systematic error?

A class of errors that produce reproducible mistakes because of equipment bias or a bias in how the observer uses the equipment.

Example: A ruler

a.) A ruler could be shorter or longer because of temperature fluctuations.

b.) An observer could be viewing the markings at a glancing angle.

In this case a systematic error is more of a mistake than an uncertainty.

In some cases you can correct for the systematic error. In the ruler example above, you can measure how the ruler's length changes with temperature. You can then correct this systematic error by measuring the temperature of the ruler during the distance measurement.

Correction Example:

A ruler is calibrated at 25 C and has an expansion coefficient of (0.0005 ± 0.0001) m/C.

You measure the length of a wire at 20 C and find that on average it is 1.982 m long.

The ruler is 5 C colder than it was at calibration, so it has contracted. This means that the 1 m ruler is really (1 + (20 - 25 C)(0.0005 m/C)) = 0.9975 m long.

So the correction becomes

1.982 × 0.9975 = 1.977 m

Note
The numbers above without decimal points are integers. Integers have infinite precision. We will discuss the propagation of the errors above in a different chapter.
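
The correction can be sketched in a few lines of code. This is a minimal illustration that assumes the ruler's length changes linearly with temperature and ignores the uncertainties; the names `ruler_length`, `scale`, and `corrected` are chosen here and are not part of the course material.

```python
# Values taken from the ruler example; uncertainties are deliberately ignored here.
T_CALIBRATION = 25.0   # C, temperature at which the ruler is exactly 1 m long
T_MEASUREMENT = 20.0   # C, temperature during the wire measurement
EXPANSION = 0.0005     # m/C, change in length of the 1 m ruler per degree
reading = 1.982        # m, average length read off the ruler

def ruler_length(temperature):
    """Actual length (m) of the nominal 1 m ruler, assuming linear expansion."""
    return 1.0 + (temperature - T_CALIBRATION) * EXPANSION

scale = ruler_length(T_MEASUREMENT)   # true length of one "ruler meter"
corrected = reading * scale           # wire length in true meters

print(f"ruler length = {scale:.4f} m, corrected wire length = {corrected:.3f} m")
```

A colder ruler is shorter than its markings claim, so every reading overstates the length and the corrected value comes out smaller than the raw reading.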

After repeating the experiment several times, the observer discovers that he tends to read the meter stick at an angle and not from directly above. After investigating this misreading with repeated measurements, the observer estimates that on average he misreads the meter stick by 2 mm. This is now a systematic error that is estimated using random statistics.

X ± Y = X(Y)

## Significant Figures and Round off

### Significant figures

Most Significant digit
The leftmost non-zero digit is the most significant digit of a reported value
Least Significant digit
The least significant digit is identified using the following criteria:
1.) If there is no decimal point, then the rightmost non-zero digit is the least significant digit.
2.) If there is a decimal point, then the rightmost digit is the least significant digit, even if it is a zero.

In other words, zero counts as a least significant digit only if it is after the decimal point. So when you report a measurement with a zero in such a position you had better mean it.

The number of significant digits in a measurement is the number of digits from the most significant digit through the least significant digit, counting both.

examples:

| Measurement | Most sig. digit | Least sig. digit | Num. sig. digits | Scientific notation |
|---|---|---|---|---|
| 5 | 5 | 5 | 1* | |
| 5.0 | 5 | 0 | 2 | |
| 50 | 5 | 0 | 2* | |
| 50.1 | 5 | 1 | 3 | |
| 0.005001 | 5 | 1 | 4 | |

Note
The values of "5" and "50" above are ambiguous unless we use scientific notation in which case we know if the zero is significant or not. Otherwise, integers have infinite precision.
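
The counting rules can be sketched as a short function that works on the number as written, since the written form is what carries the information about trailing zeros. The function name `count_significant_digits` and the flag `trailing_zeros_significant` are hypothetical, introduced only for this illustration; the flag exists because trailing zeros in a number without a decimal point are ambiguous, as the note above says. The sketch does not handle scientific notation.

```python
def count_significant_digits(text, trailing_zeros_significant=False):
    """Count significant digits in a plain decimal number given as a string."""
    digits = text.strip().lstrip("+-")
    has_point = "." in digits
    digits = digits.replace(".", "")
    # Most significant digit: the leftmost non-zero digit, so drop leading zeros.
    digits = digits.lstrip("0")
    # Without a decimal point, trailing zeros are ambiguous; drop them unless
    # the caller states that they were actually measured.
    if not has_point and not trailing_zeros_significant:
        digits = digits.rstrip("0")
    return len(digits)

for written in ["5", "5.0", "50", "50.1", "0.005001"]:
    print(written, count_significant_digits(written))
```

Calling `count_significant_digits("50", trailing_zeros_significant=True)` treats the zero as measured and returns 2, while the default returns 1.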

### Round Off

A reported value that is calculated from more than one measured quantity must have the same number of significant digits as the quantity with the smallest number of significant digits.

To accomplish this you will need to round off the final value that is reported.

To round off a number you:

1.) Increment the least significant digit by one if the digit below it (in significance) is greater than 5.

2.) Do nothing if the digit below it (in significance) is less than 5.

Then truncate the remaining digits below the least significant digit.

What happens if the next significant digit below the least significant digit is exactly 5?

To avoid a systematic error involving round off you would ideally randomly decide to just truncate or increment. If your calculation is not on a computer with a random number generator, or you don't have one handy, then the typical technique is to increment the least significant digit if it is odd (or even) and truncate it if it is even (or odd).

Examples

The table below has three columns: the final value calculated from several measured quantities, the number of significant digits for the measurement with the smallest number of significant digits, and the rounded off value properly reported using scientific notation.

| Value | Sig. digits | Rounded off value |
|---|---|---|
| 12.34 | 3 | |
| 12.36 | 3 | |
| 12.35 | 3 | |
| 12.35 | 2 | |
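
The round-off rules can be sketched with Python's `decimal` module, which works on the digits as written and so avoids binary floating-point surprises with values such as 12.35. The helper name `round_to_sig_digits` is hypothetical. For the exact-5 case this sketch assumes the "increment if the least significant digit is odd" convention (round half to even); the opposite convention described above would be equally valid.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_to_sig_digits(text, n):
    """Round a decimal string to n significant digits; exact ties go to the even digit."""
    value = Decimal(text)
    if value == 0:
        return value
    # adjusted() gives the power of ten of the most significant digit.
    exponent = value.adjusted() - (n - 1)
    # Quantize to the decimal place of the least significant digit we keep.
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)

# The value and significant-digit pairs from the examples table.
for text, n in [("12.34", 3), ("12.36", 3), ("12.35", 3), ("12.35", 2)]:
    rounded = round_to_sig_digits(text, n)
    # Report the rounded value in scientific notation with n significant digits.
    print(text, n, f"{rounded:.{n - 1}E}")
```

Passing the value as a string is a deliberate choice: the float `12.35` is not stored exactly, so a tie written on paper may not be a tie in binary.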

# Statistical inference

Byron Roe (Submitted on 30 Jun 2015)

The problem of fitting an event distribution when the total expected number of events is not fixed keeps appearing in experimental studies. In a chi-square fit, if overall normalization is one of the parameters to be fit, the fitted curve may be seriously low with respect to the data points, sometimes below all of them. This problem and the solution for it are well known within the statistics community, but, apparently, not well known among some of the physics community. The purpose of this note is didactic, to explain the cause of the problem and the easy and elegant solution. The solution is to use maximum likelihood instead of chi-square. The essential difference between the two approaches is that maximum likelihood uses the normalization of each term in the chi-square assuming it is a normal distribution, 1/sqrt(2 pi sigma-square). In addition, the normalization is applied to the theoretical expectation not to the data. In the present note we illustrate what goes wrong and how maximum likelihood fixes the problem in a very simple toy example which illustrates the problem clearly and is the appropriate physics model for event histograms. We then note how a simple modification to the chi-square method gives a result identical to the maximum likelihood method.
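
A related and simpler version of this effect can be seen when fitting a single constant level to histogram counts. The sketch below is not the toy example from the note. It uses invented bin counts, a chi-square whose variances are taken from the observed counts, and a Poisson likelihood in place of the Gaussian form described above; all of these are assumptions made for illustration. For a constant model both minima have closed forms: the chi-square estimate is the harmonic mean of the counts, while the maximum likelihood estimate is the arithmetic mean.

```python
import math

# Hypothetical bin counts, invented for illustration only.
counts = [4, 6, 10, 12, 8]

def chi2_data_variance(mu):
    """Chi-square for a constant level mu, with each variance set to the observed count."""
    return sum((n - mu) ** 2 / n for n in counts)

def poisson_nll(mu):
    """Negative log-likelihood for independent Poisson bins sharing the mean mu."""
    return sum(mu - n * math.log(mu) + math.lgamma(n + 1) for n in counts)

# Setting the derivative of each function to zero gives these closed forms.
mu_chi2 = len(counts) / sum(1.0 / n for n in counts)   # harmonic mean of the counts
mu_ml = sum(counts) / len(counts)                      # arithmetic mean of the counts

print(f"chi-square estimate = {mu_chi2:.3f}")
print(f"maximum likelihood estimate = {mu_ml:.3f}")
print(f"total counts = {sum(counts)}, chi-square fit predicts {mu_chi2 * len(counts):.1f}")
```

Because bins that fluctuate low are assigned smaller variances and therefore larger weights, this form of chi-square pulls the fitted level below the average of the data, while the likelihood estimate reproduces the total number of events.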