
Canadian Strategic Highway Research Program
C-SHRP Bayesian Modelling: A User's Guide
Appendix A
BAYES' THEOREM
The purpose of this section is to provide the reader with a brief introduction to the key concepts of Bayesian statistics. Bayes' theorem is used in the following hypothetical example to introduce the concept of a prior probability, new data as an update to the prior, and a posterior probability. This example is based on presentation notes from the C-SHRP Training Sessions in Bayesian Methods and Software (Nesbitt 1994, available on the CD-ROM version of this guide).

Example: Interpreting Test Results

A 30-year-old woman concerned about a lump in her breast visits her doctor. The doctor makes an assessment. Given the particulars of the woman's case and her age, the doctor knows from past cases that there is only about a one in one hundred (i.e. 1 percent) chance of the lump being malignant. She prudently orders a needle biopsy, in which cells from the lump are withdrawn and examined under a microscope for malignancy.

The needle biopsy is described as being '90 percent reliable' based on past experience. That is to say, in the case where the lump is malignant, the needle biopsy will indicate malignancy 90 percent of the time and indicate no malignancy (false-negative) 10 percent of the time. In the case where the lump is non-malignant, the test will correctly show this 90 percent of the time and indicate malignancy (false-positive) 10 percent of the time.

After a tense period of waiting the test comes back, unfortunately indicating a positive result. The doctor cautions that there is reason for concern but not alarm, as it is by no means certain that there is a malignancy. The patient says, "But it looks like it's about ninety percent certain that I have cancer!" The doctor replies, "No, I think we should look at the initial evidence as well. I'd say that the actual chance that you have cancer is a lot closer to my initial odds of one percent than ninety percent."

Most people would intuitively assess that the probability of cancer is neither one percent nor ninety percent, but somewhere in between. But the question of whether the probability is closer to one percent or ninety percent is difficult to solve by intuition alone. Bayes' theorem can be used to calculate the true probability of a malignancy.

To solve the problem, consider conducting needle biopsies in one thousand cases similar to the one previously described. We would expect, on average, that 10 (i.e. 1%) of the 1,000 would actually have cancer and 990 would not. The following table shows the expected results of needle biopsies conducted on this group.

Table 1 : Needle Biopsy Results

                    Test positive    Test negative    Total
Malignant                  9                1            10
Non-malignant             99              891           990
Total                    108              892         1,000
Since the patient has received a positive test result, she belongs to a group of 108 (i.e. 99 + 9 = 108) patients who would be expected to receive positive test results. The probability of any patient in this group, including our patient, having cancer is:

Probability of Malignancy = (# of true positives) / (total # of positives) = 9/108 = 0.083 = 8.3%

So indeed the actual probability of malignancy was still quite low even after the positive test.

Because the solution deals only with the laws of simple probability, it may not be apparent from the preceding example that we have encountered Bayes' theorem at all. However, this identical problem may be restated in terms of Bayes' theorem, which states that for two events A and B:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
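Here A is the event that the lump is malignant and B is the event that the test is positive, so P(A) = 0.01, P(B | A) = 0.90 and P(B | not A) = 0.10. The probability of a positive test, P(B), is the sum of the true-positive and false-positive probabilities. Plugging these values into Bayes' theorem, we see an equation identical to the one we derived earlier based on simple probability theory:

$$P(A \mid B) = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.10 \times 0.99} = \frac{0.009}{0.108} = 0.083$$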
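As a check on the arithmetic, the following Python sketch works the biopsy example both ways: by counting expected cases in a group of 1,000 and by applying Bayes' theorem directly. The function and variable names are illustrative only and do not come from this guide or from XLBayes.

```python
# Biopsy example: counting argument versus Bayes' theorem.
# All names are illustrative; the numbers are those given in the text.

PRIOR = 0.01        # chance the lump is malignant before the test
RELIABILITY = 0.90  # the test is correct 90% of the time in either case


def posterior_by_counting(n_cases):
    """Expected share of positive tests that are true positives."""
    malignant = n_cases * PRIOR
    benign = n_cases - malignant
    true_positives = malignant * RELIABILITY
    false_positives = benign * (1.0 - RELIABILITY)
    return true_positives / (true_positives + false_positives)


def posterior_by_bayes(prior, p_pos_given_malignant, p_pos_given_benign):
    """P(malignant | positive test) from Bayes' theorem."""
    p_positive = (p_pos_given_malignant * prior
                  + p_pos_given_benign * (1.0 - prior))
    return p_pos_given_malignant * prior / p_positive


by_counting = posterior_by_counting(1000)
by_bayes = posterior_by_bayes(PRIOR, RELIABILITY, 1.0 - RELIABILITY)

print(f"counting: {by_counting:.3f}")
print(f"Bayes:    {by_bayes:.3f}")
```

Both routes give about 0.083, matching the 9/108 obtained from the table.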
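Bayes' theorem can be written in a more general form for many mutually exclusive events A1, A2, ..., AJ that together cover all possibilities. The posterior probability for an event Aj is:

$$P(A_j \mid B) = \frac{P(B \mid A_j)\,P(A_j)}{P(B \mid A_1)\,P(A_1) + P(B \mid A_2)\,P(A_2) + \cdots + P(B \mid A_J)\,P(A_J)}$$

This may be expressed in more compact form:

$$P(A_j \mid B) = \frac{P(B \mid A_j)\,P(A_j)}{\sum_{i=1}^{J} P(B \mid A_i)\,P(A_i)}$$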
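The general form can be sketched as a short Python function that takes the prior probabilities P(Aj) and the likelihoods P(B | Aj) and returns the posterior for every event. The function name `posterior` and the three-event numbers are made up for illustration; only the two-event case uses values from the biopsy example.

```python
def posterior(priors, likelihoods):
    """Posterior P(A_j | B) for mutually exclusive, exhaustive events A_j.

    priors[j] is P(A_j); likelihoods[j] is P(B | A_j).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    joint = [p * like for p, like in zip(priors, likelihoods)]
    total = sum(joint)  # P(B), the denominator of Bayes' theorem
    if total == 0.0:
        raise ValueError("the observed event has zero probability")
    return [j / total for j in joint]


# Two-event case: the biopsy example (malignant, non-malignant).
biopsy = posterior([0.01, 0.99], [0.90, 0.10])

# Three-event case with HYPOTHETICAL numbers, purely to show J > 2.
three_way = posterior([0.5, 0.3, 0.2], [0.2, 0.5, 0.9])

print([round(p, 3) for p in biopsy])
print([round(p, 3) for p in three_way])
```

Because every posterior is divided by the same total, the returned values always sum to one.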
Although the multivariate regression case is too involved for this introduction, we will consider a simplified linear regression model as a second example of Bayes' theorem. The following is based on an example originally presented in Design of a Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program (Nesbitt & Sparks 1990). Many of the details of the example have been simplified in this introduction, and the reader may wish to refer to the original document for more information (available on the CD-ROM version of this guide).

Example: A Simple Pavement Deterioration Model

(Those wishing to repeat this example with XLBayes after learning more about Bayesian regression should note that Se2 = 0.01 and degrees of freedom = 20. The variance-covariance matrix is simply the variance of b in this case, 0.0001.)

Figure 1 : Prior for Performance Index Loss

The prior may be summarized with a straight line plot of performance index loss (y) versus age (t). The upper line was plotted by using the mean estimate of b plus its standard deviation. The lower line was plotted using the mean estimate of b minus its standard deviation. Due to uncertainty about the value of parameter b, there is considerable uncertainty about the time it will take a pavement section to reach a performance index loss of 0.8.

The prior may also be summarized by plotting the probability distribution function for the regression coefficient b, as shown in Figure 2. In the figure the mean estimate of b is 0.053. The width of the bell-shaped curve reflects the uncertainty of the estimate: the narrower the curve, the more certain the estimate. Note that the experts are confident that b is no lower than about 0.025 and no higher than about 0.085.

Figure 2 : Prior Probability Distribution for Coefficient b

Assume that a small quantity of early data based on 5 test sections is available. This data is summarized in Table 2.
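Before turning to the data, the prior shown in Figures 1 and 2 can be explored numerically. The sketch below rests on two assumptions made here for illustration: that the model is a straight line through the origin, y = b t, and that the prior for b is normal with mean 0.053 and variance 0.0001 (standard deviation 0.01). The helper names are hypothetical and the original document should be consulted for the full model specification.

```python
import math

B_MEAN = 0.053            # prior mean of b (from the text)
B_SD = math.sqrt(0.0001)  # prior standard deviation of b
LOSS_LIMIT = 0.8          # performance index loss of interest


def age_to_reach(loss, b):
    """Age t at which y = b * t reaches the given loss (assumed model)."""
    return loss / b


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


t_fast = age_to_reach(LOSS_LIMIT, B_MEAN + B_SD)  # upper line in Figure 1
t_mean = age_to_reach(LOSS_LIMIT, B_MEAN)
t_slow = age_to_reach(LOSS_LIMIT, B_MEAN - B_SD)  # lower line in Figure 1

# Prior probability that b lies in the range the experts consider plausible.
p_in_range = normal_cdf(0.085, B_MEAN, B_SD) - normal_cdf(0.025, B_MEAN, B_SD)

print(f"age at loss {LOSS_LIMIT}: {t_fast:.1f} (upper), "
      f"{t_mean:.1f} (mean), {t_slow:.1f} (lower)")
print(f"P(0.025 < b < 0.085) = {p_in_range:.4f}")
```

Under these assumptions the age at which the loss reaches 0.8 ranges from roughly 12.7 to 18.6, with about 15.1 at the prior mean, and more than 99 percent of the prior probability lies between 0.025 and 0.085.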
A classical regression analysis, as described in Appendix B, is performed on the early data. This gives an estimate of the regression coefficient b. The results of the classical regression (denoted 'Data') are shown together with the prior in Figure 3.

Figure 3 : Prior and Early Data Distributions for Coefficient b

We wish to determine with Bayesian regression the posterior probability distribution for the estimate of b. This problem is analogous to determining the posterior probability of a malignancy based on both a prior and additional data. The results of applying Bayesian regression to this problem are shown in Figure 4.

Figure 4 : Probability Distributions for Bayesian Regression

The probability distribution for the posterior estimate of b is 'tighter' than either the prior or the data. This is intuitively reasonable as the prior and data reinforce each other with a similar estimate of the mean of b. One can see from the figure the benefit of using Bayesian regression where good prior information is available. Simple classical regression would have resulted in the broad probability distribution based on the data.

To show the effect of a greater amount of data, observations for an additional 10 test sections are provided (Table 3). Bayesian regression is repeated using all 15 data points and the same prior as before. The result is a posterior with an even smaller confidence interval.

Table 3 : Additional 10 Data Points
Figure 5 : Bayesian Regression Results using 15 Data Points
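The behaviour seen in Figures 4 and 5 can be reproduced with a minimal sketch. It is not the XLBayes procedure and makes several simplifying assumptions: the model is taken to be y = b t, the error variance is treated as known and equal to Se2 = 0.01, and the prior and the data estimate are both treated as normal so that they combine by precision (inverse variance). The observations are hypothetical and are not the values from Table 2 or Table 3. All function names are illustrative.

```python
import math

PRIOR_MEAN = 0.053   # prior mean of b (from the text)
PRIOR_VAR = 0.0001   # prior variance of b (from the text)
SIGMA2 = 0.01        # error variance, treated as known (assumption)

# HYPOTHETICAL observations (age t, performance index loss y); these are
# NOT the values from Table 2 or Table 3.
EARLY = [(1, 0.07), (2, 0.11), (3, 0.20), (4, 0.22), (5, 0.32)]
ADDITIONAL = [(1, 0.05), (2, 0.13), (3, 0.17), (4, 0.26), (5, 0.29),
              (1, 0.06), (2, 0.12), (3, 0.19), (4, 0.23), (5, 0.31)]


def classical_fit(points, sigma2):
    """Least-squares estimate of b in y = b * t, and its variance."""
    s_tt = sum(t * t for t, _ in points)
    s_ty = sum(t * y for t, y in points)
    return s_ty / s_tt, sigma2 / s_tt


def bayes_update(prior_mean, prior_var, data_mean, data_var):
    """Combine a normal prior and a normal data estimate by precision."""
    prior_prec = 1.0 / prior_var
    data_prec = 1.0 / data_var
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var


results = {}
for label, points in (("5 points", EARLY), ("15 points", EARLY + ADDITIONAL)):
    data_mean, data_var = classical_fit(points, SIGMA2)
    post_mean, post_var = bayes_update(PRIOR_MEAN, PRIOR_VAR,
                                       data_mean, data_var)
    results[label] = (data_mean, data_var, post_mean, post_var)
    print(f"{label}: data b = {data_mean:.4f} "
          f"(sd {math.sqrt(data_var):.4f}), "
          f"posterior b = {post_mean:.4f} (sd {math.sqrt(post_var):.4f})")
```

With these illustrative numbers the posterior standard deviation is smaller than that of both the prior and the data, and with 15 points the posterior mean sits closer to the data estimate than it does with 5 points.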
In general, as more and more data is added to the problem, the posterior will continue to become more and more definitive (i.e. more and more confident in its estimate of b). Note that the estimate of b based on data alone is much more definitive with 15 data points than with 5. Because the data is more definitive with 15 data points than it was with 5, the mean of the posterior has also shifted away from the prior and closer to the data. As more and more data is collected, the effect of the prior will continue to diminish.

The question of why one would wish to use Bayesian regression can now be addressed. The difference between classical regression and Bayesian regression is simply that classical regression uses no prior information in making its estimate for the parameter b. In this example, the classical regression result (i.e. the 'Data' result) has a broader probability distribution than the Bayesian result. If no more data were easily obtainable, one would certainly prefer the Bayesian approach.

Bayesian regression is also very useful where the database is large but of low quality. Potential quality problems include 'noisy data', insufficient data in certain categories, and more complex problems such as multi-collinearity. In practice there are numerous data difficulties that can confound a classical regression analysis. Bayesian regression can be used to overcome some of these problems.

Summary

Bayesian statistics and Bayes' theorem have been introduced in this section with two examples. Although other discussions in this user's guide are more in-depth, what is sought with a Bayesian regression analysis is always essentially the same. The goal is to create a regression model based on both prior information and new data, similar to the simple pavement performance model example in this section.

References

Nesbitt, Dale, C-SHRP Training Sessions in Bayesian Methods and Software, Transportation Association of Canada, 1994.

Nesbitt, Dale and Sparks, Gordon, Design of a Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program, Transportation Association of Canada, 1990.

Schmitt, Samuel A., Measuring Uncertainty: An Elementary Introduction to Bayesian Statistics, Addison-Wesley Publishing Co., Don Mills, Ontario, 1969.

Winkler, Robert L., Introduction to Bayesian Inference and Decision, Holt, Rinehart and Winston, Inc., Toronto, 1972.